Abstract: The problem of finding the prime factors of large composite numbers has been of mathematical interest for centuries. With the advent of public key cryptosystems it is also of practical importance, because the security of these cryptosystems, such as the Rivest-Shamir-Adleman (RSA) system, depends on the difficulty of factoring the public keys. In recent years the best known integer factorization algorithms have improved greatly, to the point where it is now easy to factor a 100-decimal-digit number and possible to factor numbers of more than 250 decimal digits, given the availability of enough computing power. However, the problem of integer factorization still appears difficult, both in a practical sense (for numbers of more than 100 decimal digits) and in a theoretical sense (because none of the known algorithms run in polynomial time). In this study we outline some useful recent integer factorization algorithms, including the Elliptic Curve Method (ECM), the Quadratic Sieve (QS) and the Number Field Sieve (NFS), and give some examples of their usage.
INTRODUCTION
Cryptography is an important building block of e-commerce systems. In particular, public key cryptography can be used for ensuring the confidentiality, authenticity and integrity of information in an organization. To protect the sensitive information in an organization, encryption can be applied to conceal sensitive data so that the encrypted data is completely meaningless except to the authorized individuals with the correct decryption key. To preserve the authenticity and integrity of data, digital signature can be performed on the data such that other people can neither impersonate the correct signer nor modify the signed data without being detected.
Modern asymmetric key cryptography uses mathematical operations that are fairly easy to do in one direction, but extremely hard to do in reverse. The standard example (indeed, the one that is almost synonymous with public key encryption) is that of prime factorization. Large primes have at least one practical application: they can be used to construct public key cryptosystems (also known as asymmetric cryptosystems and open encryption key cryptosystems)[1]. Two basic types of public-key schemes emerged in the mid 1970s. The Diffie-Hellman (DH) key agreement protocol, proposed in 1976, relies on the hardness of the Discrete Logarithm Problem (DLP): given p, g and g^a, find a[2,3]. Two years later Rivest, Shamir and Adleman at MIT in the USA proposed the key transport and digital signature schemes known by their initials as RSA[4], which takes its security from the hardness of the Integer Factorization Problem (IFP): given two large prime numbers p and q, it is a straightforward task to multiply them together to find the product n = p · q; however, given a large composite integer that is a product of two large primes, it is extremely difficult to recover those two prime factors[5].
Factorization was once primarily of academic interest. It gained practical importance after the introduction of the RSA public-key cryptosystem[1,4,6]. RSA is one of the most popular public key crypto-algorithms and is widely used today in hardware and software to secure electronic data transport on the Internet, especially in e-commerce, to protect sensitive information such as credit card numbers.
As usual, let us consider our communicating partners, Alice and Bob. Suppose Alice wants to use RSA to send an encrypted message to Bob. Let the public key of Bob be (e, n) and the private key of Bob be (d, n), where n is the product of two prime numbers p and q, with ed ≡ 1 (mod (p − 1)(q − 1)). In this scenario, (e, n) is accessible to anyone (e.g., Alice) who wants to send encrypted messages to Bob, while d is kept secret by Bob; see [1,4,5] for the detailed implementation procedure.
To encrypt a plaintext message M for Bob, Alice computes the ciphertext: C = M^e (mod n). Bob can decrypt C by computing: C^d = (M^e)^d ≡ M (mod n). No one except Bob can decrypt C, since d is known only to Bob. To calculate d from e, it is necessary to factor n to get p and q. With p and q known, it is possible to calculate (p − 1)(q − 1).
Fig. 1: Proposed minimum key sizes (in bits) to be regarded as safe for RSA and ECC
By reversing the key generation procedure, d can be calculated by computing: d = e^-1 (mod (p − 1)(q − 1)); see [1,4] for a detailed implementation.
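The key generation, encryption and decryption steps just described can be checked with a toy computation. The following Python sketch uses artificially small primes; all the numeric values here are illustrative and are not taken from the text:

```python
from math import gcd

# Toy RSA with artificially small primes (illustrative values only;
# real RSA uses primes hundreds of digits long).
p, q = 61, 53
n = p * q                        # public modulus: 3233
phi = (p - 1) * (q - 1)          # (p-1)(q-1) = 3120
e = 17                           # public exponent, coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)              # secret exponent: d = e^-1 mod (p-1)(q-1)

M = 65                           # plaintext, 0 <= M < n
C = pow(M, e, n)                 # Alice encrypts: C = M^e mod n
assert pow(C, d, n) == M         # Bob decrypts:   C^d = M (mod n)
```

Anyone knowing only (e, n) must factor n = 3233 back into 61·53 to recover d, which is exactly the Integer Factorization Problem at toy scale.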
The product n is the modulus, e is the public exponent and d is the secret exponent. You can publish your public key freely, because there are no known easy methods of calculating d, p or q given only your public key (e, n). Authentication, on the other hand, is not as easy to guarantee in public-key cryptography[7]. Since everybody knows everybody else's public key, Eve can easily send a message to Alice claiming to be Bob. The RSA crypto-algorithm can be broken by factoring n into p and q. If n is factored, then (p − 1)(q − 1) can be found and from this d can be computed. Hence, any adversary that factors n can find the private key d and with it decrypt any encrypted message.
Therefore, the RSA crypto-algorithm is secure only if factoring the product of the two carefully chosen, sufficiently large prime numbers requires a super-polynomial amount of time with respect to the size of the modulus n. To date it has not been proved that factoring requires an exponential amount of time. However, no classical polynomial-time algorithm has been found and most researchers generally believe that none will ever be found! This is a practical motivation for the current interest in integer factorization algorithms. Currently, you need a key length of 1024 bits to get the same security a 512-bit key gave in the early 1980s. If you want your keys to remain secure for 20 years, 1024 bits is probably too short (Fig. 1).
Because the security of RSA is so dependent on an adversary's inability to factor a large composite number, much research has been done to find ways to quickly factor such numbers. The key question is: how large is sufficiently large to make this recovery virtually impossible? In the 1980s it was generally held that prime numbers of some fifty-odd digits (i.e., around 10^50) would suffice.
Fig. 2: Comparison of security levels of ECC and RSA/DSA
As a case in point, when Rivest challenged the world in 1977 to factor RSA-129, a 129-digit number (from a special list), he estimated that, on the basis of the computational methods and computer systems of the day, this would take about 10^16 years of computing time. Seventeen years later it took only eight months in a worldwide cooperative effort to do the job[8]. This lent false credence to the 512-bit (155-digit) modulus, which from the mid 80s was a popular RSA key length used, for example, on the Internet and to secure transactions in the financial world: it, too, was factored, at CWI on August 22, 1999, using the Number Field Sieve factoring method (NFS)[9]. This was not only a new record at the time for factoring general numbers but also a big setback to the RSA crypto-community, which relied heavily on 512-bit RSA keys, so this factorization represented a breakthrough in research on RSA-based cryptosystems.
Computing power is measured in MIPS-years: a million-instructions-per-second computer running for one year, or about 3×10^13 instructions. A 100-MHz Pentium is about a 50-MIPS machine; a 1600-node Intel Paragon is about 50,000 MIPS. In 1983, a Cray X-MP supercomputer factored a 71-digit number in 0.1 MIPS-years, using 9.5 CPU hours. That was expensive. Factoring the 129-digit number in 1994 required 5000 MIPS-years and used the idle time on 1600 computers around the world over an eight-month period. Although it took longer, it was essentially free. These two computations used what is called the quadratic sieve, but a newer, more powerful algorithm has arrived. The general number field sieve is faster than the quadratic sieve for numbers well beyond 116 digits and can factor a 512-bit number over 10 times faster: it would take less than a year to run on an 1800-node Intel Paragon. Figure 2 shows currently acceptable security levels, with MIPS-years estimates for the corresponding public-key sizes.
INTEGER FACTORIZATION - A HISTORICAL PERSPECTIVE
It has been known since the days of Euclid that any natural number n > 1 can be represented uniquely as a product of primes, that is:
n = p1^a1 · p2^a2 ··· pk^ak, with p1 < p2 < ··· < pk prime and ai ≥ 1   (1)
It is trivial to compute n from the primes, but how do we get the primes from n? This question is the cornerstone of this study.
In 1970, it was barely possible to factor a 20-digit number. By 1980, asymmetric cryptography had matured and was beginning to see widespread use in real-world applications, and factoring large integers suddenly became important work. The best algorithm of the time was the Morrison-Brillhart continued fraction algorithm, based largely on Maurice Kraitchik's work during the 1920s improving Fermat's difference-of-squares method[10]. Their method was commonly used to factor 70-digit numbers, but no factorizations near 100 digits were reported. After analyzing the complexity of the continued fraction algorithm, Richard Schroeppel discovered what was necessary to improve its efficiency and began working on the linear sieve[11]. Carl Pomerance used some of the same ideas in devising the quadratic sieve, which for many years was the most efficient general factoring method for large numbers[12].
By 1990, with the use of the quadratic sieve factoring algorithm, the record factored number was 116 digits long[13]. The biggest break for the quadratic sieve, and perhaps for factoring in general, was the introduction of a multiple polynomial variant, first by Jim Davis and then by Peter Montgomery[14]. This allowed for straightforward parallelization, followed by a distributed version from Robert Silverman. Arjen Lenstra and Mark Manasse brought the problem to the Internet and in 1994 the 129-digit RSA challenge number was factored using the idle time on over 1600 computers. In 1976 it had been estimated to be safe for 40 quadrillion years[15]. The quadratic sieve was displaced by Pollard's number field sieve in 1996[16]. The Number Field Sieve (NFS) is currently at the cutting edge of research into integer factoring algorithms, capable of factoring large composite numbers of over 100 digits[17]. The current record in factoring a generally hard integer is that of the 200-decimal-digit challenge integer from RSA Data Security, Inc., RSA-200, which was factored with the General Number Field Sieve (GNFS) by Bahr et al.[18]. Among the Cunningham integers, the record is the factorization of a 248-decimal-digit integer by the Special Number Field Sieve (SNFS), by Aoki, Kida, Shimoyama, Sonoda and Ueda (CRYPTREC)[19,20].
Therefore, the baseline trade-off is that the size of n should be chosen such that the time and cost of performing the factorization exceed the value of the secured/encrypted information. But even then, great care must still be taken in the overall crypto-design, as recent developments in integer factorization have come much faster than foreseen and it is a precarious matter for crypto-designers to venture upon quantitative forecasts in this field. Moreover, one should realize that it always remains possible that a new computational method could be invented from an unsuspected quarter which makes factoring easy (e.g., quantum computing, if an operative quantum computer were to be realized in the not-so-distant future): fortunately or unfortunately, depending on which side you are on, no one knows how to build one yet[21,22]!
However, be warned that factoring large numbers is hard but not as hard as it used to be[21,22]. This has grave implications for the effectiveness of public-key cryptography, which relies on the difficulty of factoring long keys for its security. Today, the wise crypto-designer is ultraconservative when choosing key lengths for a public-key system. He/she must consider the intended security, the key's expected lifetime and the current state of the factoring art.
This quick historical perspective shows that the ability to factor huge numbers was not solely the result of advancements in computer technology, but was heavily based on the growth of mathematical algorithms. These advances began with a mathematician named Carl Friedrich Gauss[23].
THE INTEGER FACTORIZATION TECHNIQUES
For a long time factoring belonged to the realm of pure mathematics, without any practical application. Modern algorithms for factoring N fall into two major categories. Algorithms in the first category find factors quickly when they are small, with cost depending mainly on the size of the factor found; these include trial division, Pollard rho, p±1 and the Elliptic Curve Method (ECM). Algorithms in the second category factor a number regardless of the sizes of its prime factors, but cost much more when applied to larger integers. These algorithms include the continued fraction method, the Quadratic Sieve (QS) and the Number Field Sieve (NFS). In practice, algorithms in both categories are important. Given a large integer with no clue about the sizes of its prime factors, one typically tries algorithms in the first category until the cofactor (i.e., the quotient after dividing by known prime factors) of the original number is sufficiently small. Then one tries an algorithm in the second category if the cofactor is not itself prime.
Base concepts
Power-smoothness: A number N is B-smooth if it has no prime divisors larger than some bound B, where B is a positive integer. A positive integer N is B-power-smooth if all prime powers dividing N are less than or equal to B. The power-smoothness of N is the largest B such that N is B-power-smooth. For example, 135 = 3^3 · 5 is 7-smooth and 5-smooth, but not 3-smooth or 2-smooth; since 3^3 = 27 divides it, its power-smoothness is 27. Similarly, 123456789 = 3^2 · 3607 · 3803 is 3803-smooth, but not 3607-smooth. Once B is fixed, one can always take the exponent m = B!; alternatively, one can simply take m = lcm(1, …, B).
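These definitions can be tested directly by trial division. The following Python sketch (an illustration added here; the function names are ours) checks B-smoothness and B-power-smoothness:

```python
def is_smooth(n, B):
    """True if no prime factor of n exceeds B."""
    d = 2
    while d * d <= n:
        while n % d == 0:
            if d > B:
                return False
            n //= d
        d += 1
    return n <= B                   # any leftover n > 1 is a prime factor

def is_powersmooth(n, B):
    """True if every prime power p^e exactly dividing n is <= B."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            pe = 1
            while n % d == 0:
                pe *= d
                n //= d
            if pe > B:              # the full prime power must stay below B
                return False
        d += 1
    return n <= B                   # leftover prime appears to the 1st power

print(is_smooth(135, 5), is_powersmooth(135, 27))   # True True
```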
Fermat's little theorem: Let n be a composite integer with prime factor p. Fermat's little theorem states that:
a^(p−1) ≡ 1 (mod p)   (2)
where a is an integer relatively prime to p (i.e., gcd(a, p) = 1).
TRIAL DIVISION FACTORING METHOD
Almost all factoring programs attempt trial division by the smallest primes. Even if an integer N is 100 decimal digits long, it takes only a few seconds to divide N by all primes up to 10^7. If N is composite, then at least one prime factor of N is at most √N.
Sometimes the prime divisors of N are known to have a special form. For example, if p | a^N − 1 but p ∤ a^k − 1 for any k with k < N and k | N, then p ≡ 1 (mod N). This information facilitates trial division, since it restricts the range of possible divisors. For such numbers, one might try trial division by all qualifying primes below 2^32 or even higher. However, unless N has a special form, trial division is impractical for finding prime divisors above 10^9. The method is not very economical and there are now far better methods available.
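The basic trial division step can be sketched in a few lines of Python (the helper name and default bound are illustrative, not from the text):

```python
def trial_division(N, bound=10**7):
    """Divide out all prime factors of N up to `bound`.

    Returns (factors, cofactor); the cofactor carries any factors > bound."""
    factors = []
    d = 2
    while d <= bound and d * d <= N:
        while N % d == 0:
            factors.append(d)
            N //= d
        d += 1 if d == 2 else 2     # after 2, test odd candidates only
    return factors, N

print(trial_division(963))          # ([3, 3], 107)
```

The returned cofactor is exactly the quantity the section above says one hands to a second-category algorithm when it is not itself prime.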
The newest ones are the Quadratic Sieve (QS) and the Number Field Sieve (NFS), based on an old idea of Pierre de Fermat (1601-1665)[24], who observed that every odd composite number N can be written as the difference of two squares: N = x^2 − y^2 = (x + y)(x − y). Thus, having found such numbers x and y, we have also found factors of N. The numbers are found by a sieving process, in which possible values of x and y are excluded. The process amounts to very efficiently collecting relations: pairs of numbers a and b possessing only small prime factors, such that N divides the difference a − b. For large N, in general very many relations are required to find its prime factors (70 million for RSA-130, for example). In QS the relations consist of ordinary numbers; in NFS they are algebraic numbers (e.g., 3 + 2√2).
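Fermat's difference-of-squares observation translates directly into a simple search; a minimal Python sketch (illustrative only, practical just when the two factors are close together) is:

```python
from math import isqrt

def fermat_factor(N):
    """Fermat's method: find x with x^2 - N a perfect square, so that
    N = x^2 - y^2 = (x + y)(x - y). Assumes N odd and composite."""
    x = isqrt(N)
    if x * x < N:
        x += 1                     # start at ceil(sqrt(N))
    while True:
        y2 = x * x - N
        y = isqrt(y2)
        if y * y == y2:            # x^2 - N is a perfect square
            return x - y, x + y
        x += 1

print(fermat_factor(1513))         # (17, 89)
```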
The difficulty of factoring grows rapidly with the size, i.e., the number of digits, of the number we want to factor. To see this, take a number N with L decimal digits (N ≈ 10^L) and try to factor it by trial division: the divisions by all primes up to √N ≈ 10^(L/2) make the work grow exponentially with the number of digits.
Shor[21] showed that if such a machine could be built, integer factorization and discrete logarithms (including elliptic curve discrete logarithms) could be computed in polynomial time. This result has stimulated an explosion in research on quantum computers[22]. The one comforting factor is that all experts agree that even if quantum computers are eventually built, it will take many years to do so (at least for machines on a scale that will threaten modern public key cryptosystems) and so there will be advance warning about the need to develop and deploy alternative crypto-algorithms.
KRAITCHIK ALGORITHM
In the 1920s, Maurice Kraitchik improved Fermat's difference-of-squares technique (x^2 − y^2) for factorization, which set the basis of most modern factorization algorithms[25]. Kraitchik determined that instead of looking for a difference of squares equal to n, it would be enough to find a difference of squares that is a multiple of n. This can be restated as x^2 ≡ y^2 (mod n). This congruence can have two different types of solutions: interesting and uninteresting. An interesting solution is one with x ≢ ±y (mod n), while an uninteresting solution has x ≡ ±y (mod n).
What makes an interesting solution interesting and an uninteresting solution uninteresting? This can be understood by breaking x^2 − y^2 down into the two factors (x + y)(x − y). In the case of an uninteresting solution, one of the two factors is itself a multiple of the number n we are trying to factor: if x^2 − y^2 = (a·n)·b = ab·n for integers a and b, then n has not been broken into two factors. In the case of an interesting solution, however, the factors of n are split between (x + y) and (x − y), so n can be broken into two factors, say p and q, such that n = p · q.
Now, more importantly, how can the factors of n be pulled out once an interesting solution has been found? The answer can be found in Euclid's Greatest Common Divisor (GCD) algorithm. Applied to x − y (or x + y) together with n, an interesting solution will yield a factor of n. Euclid's algorithm works by repeatedly dividing the previous divisor by the previous remainder until the remainder is zero or one. If the remainder is zero, then the greatest common divisor has been found; if the remainder is one, then the numbers are said to be relatively prime. Mathematically, we pull out the common factors using the GCD as follows: gcd(x + y, n) = p or gcd(x − y, n) = q, achieving our desired aim.
The mechanics of the Kraitchik algorithm: To better understand this theory, an example will be presented using an overly simple composite number: n = 1513. The floor of the square root of this value is ⌊√1513⌋ = 38, so we tabulate the residues x^2 − n for successive x = 39, 40, 41, … and factor each over small primes.
Next we implement Kraitchik's method, which finds a solution by combining several of the results from the above table. The first suitable combination found is x^2 = 39^2·41^2·43^2, since the product of the corresponding residues is a square: y^2 = (2^3)(2^3·3·7)(2^4·3·7) = 2^10·3^2·7^2. The solution x = 39·41·43 = 68757 and y = 2^5·3·7 = 672 yields gcd(x − y, n) = gcd(68757 − 672, n) = n, an uninteresting solution. The next combination found is x = 39·40·47 = 73320 and y = 2^3·3·29 = 696, which is also uninteresting, since (73320 − 696) mod n = 0. Next we check the combination x = 39·45 = 1755 and y = 2^6 = 64, which is interesting, since gcd(1755 − 64, n) = 89. In fact, n = 1513 = 17·89. (Note: if the powers of all primes in a single residue are even, then a square has been found directly, as is the case with x = 53, where 53^2 − n = 1296 = 36^2, so y = 2^2·3^2 = 36 and gcd(x − y, n) = gcd(53 − 36, n) = 17.)
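The interesting combination in this example can be verified mechanically; the following Python sketch (illustrative, with variable names ours) reproduces the computation:

```python
from math import gcd, isqrt

n = 1513
# Residues x^2 - n used above: 39^2 - n = 8 = 2^3 and 45^2 - n = 512 = 2^9,
# so their product 2^12 is a perfect square.
x = 39 * 45                              # 1755
y = isqrt((39 * 39 - n) * (45 * 45 - n)) # sqrt(8 * 512) = 64 = 2^6
assert y * y == (39 * 39 - n) * (45 * 45 - n)
p = gcd(x - y, n)                        # interesting: x is not +-y (mod n)
print(p, n // p)                         # 89 17
```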
Pollard's (p − 1)-method: Pollard's p − 1 algorithm is a number-theoretic integer factorization algorithm invented by Pollard[26]. It is a special-purpose algorithm, meaning that it is only suitable for integers with specific types of factors. The algorithm is based on Fermat's Little Theorem, which we came across in Eq. (2), and on the insight that numbers of the form a^b − 1 tend to be highly composite when b is itself composite. Since it is computationally simple to evaluate numbers of this form in modular arithmetic, the algorithm allows one to check many potential factors with great efficiency. In particular, the method will find a factor p whenever b is divisible by p − 1, hence the name. When p − 1 is smooth (a product of only small primes), this algorithm is well suited to discovering the factor p.
Base concepts: Let n be a composite integer with prime factor p. Invoking Fermat's theorem above, let us further assume that p − 1 is B-powersmooth for some reasonably sized B. Recall that a positive integer m is called B-smooth if all prime factors p_i of m satisfy p_i ≤ B and B-powersmooth if all prime powers p_i^e dividing m satisfy p_i^e ≤ B. For an n with no factor p for which p − 1 has only small factors, however, this algorithm is in the worst case no better than trial division. Therefore, Pollard's p − 1 algorithm can be considered a special-purpose algorithm that factors n efficiently provided n satisfies the property discussed.
Let p_1, …, p_L be the primes less than or equal to B and let e_1, …, e_L be the largest exponents such that p_i^e_i ≤ B. Let:

m = p_1^e_1 · p_2^e_2 ··· p_L^e_L   (3)
As a shortcut, note that m = lcm(1, …, B). As a consequence, (p − 1) divides m and, moreover, if p^e divides m then p^e ≤ B. Since (p − 1) divides m we know that a^m ≡ 1 (mod p) and, because p divides N, this means gcd(a^m − 1, N) > 1. Therefore, if gcd(a^m − 1, N) ≠ N, the gcd is a non-trivial factor of N. Note that if p − 1 is not B-powersmooth, then a^m ≢ 1 (mod p) for at least half of all a.
POLLARD CONCEPTS
Let N = pqr, where p and q are distinct primes and r is an integer, such that p − 1 is B-powersmooth and q − 1 is not. Then gcd(a^m − 1, N) yields a proper factor of N. Note that in the case where q − 1 is also B-powersmooth, the gcd may yield only a trivial factor, because q also divides a^m − 1. This is what makes the algorithm specialized.
As an example, let N = 172189 = 421·409, so that p − 1 = 421 − 1 = 420 = 2^2·3·5·7 and q − 1 = 409 − 1 = 408 = 2^3·3·17. An appropriate value of B would thus be anywhere from 7 to 16. If B were selected less than 7 the gcd would be 1 and if B were selected higher than 16 the gcd would be N. Of course, we do not know the appropriate value of B in advance, so this factors into the algorithm's running time.
To speed up calculations, note that when taking the gcd we can reduce one argument modulo the other, so gcd(a^m − 1, N) = gcd(a^m − 1 (mod N), N). This can be calculated efficiently using modular exponentiation and the Euclidean algorithm.
Algorithm and running time: The basic algorithm can be written as follows:
If d = 1 in step 6, this indicates that for no prime factor p of n was p − 1 B-powersmooth. If d = n in step 7, this usually indicates that all factors were B-powersmooth, but in rare cases it could indicate that a had a small order modulo p. The running time of this algorithm is O(B log B · log^2 n), so it is advantageous to pick a small value of B.
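The procedure just described can be sketched in Python as follows (a hedged illustration, not the paper's listing; the exponent a^(B!) is accumulated incrementally, and the d = 1 and d = n outcomes are handled as in steps 6 and 7):

```python
from math import gcd

def pollard_p_minus_1(n, B, a=2):
    """Pollard's p-1 method: returns a nontrivial factor of n, or None.

    Succeeds when some prime factor p of n has p - 1 B-powersmooth,
    since then (p - 1) divides B! and a^(B!) = 1 (mod p)."""
    m = a
    for k in range(2, B + 1):
        m = pow(m, k, n)              # after the loop, m = a^(B!) mod n
    d = gcd(m - 1, n)
    if 1 < d < n:
        return d                      # nontrivial factor found
    return None                       # d == 1: raise B; d == n: lower B or change a

# Worked example from the text: N = 172189 = 421 * 409;
# 421 - 1 = 420 = 2^2*3*5*7 is B-powersmooth for B >= 7, while 408 is not.
print(pollard_p_minus_1(172189, 10))  # 421
```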
Large prime variant: A variant of the basic algorithm is sometimes used. Statistically, there is often a factor p of n such that p − 1 = fq, where f is B-powersmooth and B < q ≤ B′ for a prime q; B′ is called the semi-smoothness bound. As a starting point, this works into the basic algorithm at step 6 if we encounter gcd = 1 but do not want to increase B: for all primes B < q_1, …, q_L ≤ B′, we check whether gcd(a^(m·q_i) − 1, n) > 1 for each i.
Additional information: Because of this algorithm's effectiveness on certain types of numbers, the RSA specifications require that the primes p and q be chosen such that p − 1 and q − 1 are not B-powersmooth for small values of B.
The mechanics of Pollard's (p − 1)-factoring algorithm: We now illustrate the Pollard (p − 1) algorithm as follows:

• Choose m = B! and suppose that (p − 1) | m for some bound B ∈ ℕ
• Apply Fermat's Little Theorem (FLT) to get 2^m ≡ 1 (mod p). (This is the reason for choosing B! in step 1.)
• Compute d = gcd(2^m − 1, n) using the Euclidean algorithm
• If n ∤ (2^m − 1), then d from step 3 provides a nontrivial factorization of n
In PARI, these computations are carried out as follows[27]:
ELLIPTIC CURVE METHOD (ECM) OF FACTORIZATION
Background of elliptic curve theory: Let K be a field; here K will be either the field ℚ of rational numbers, the field ℝ of real numbers or a finite field F_p. An elliptic curve over K is the set of points (x, y) satisfying:

y^2 = x^3 + ax + b   (4)

together with a single element denoted by O and called the point at infinity, where a, b ∈ K with 4a^3 + 27b^2 ≠ 0.
In all cases the group used when implementing the ECM is the group of points on the curve over ℤ/Nℤ, the integers modulo the number N to be factored.
Points of finite order: The order m of a point P on an elliptic curve is the smallest positive integer such that mP = O; of course, such a finite m need not exist. It is often of interest to find points P of finite order on an elliptic curve, especially for elliptic curves defined over ℚ.
Addition of points on an elliptic curve (mod p): Points are added using a geometric group law which can be expressed algebraically through rational functions involving x and y. That is, two points are added, forming P + Q, or a point is doubled, forming 2P[28].
Let E denote the points on the elliptic curve given by Eq. 4. For any two points P = (x1, y1) and Q = (x2, y2) in E with P + Q ≠ O, define:

P + Q = (x3, y3)   (5)

Next define x3 and y3 as:

x3 = λ^2 − x1 − x2,  y3 = λ(x1 − x3) − y1   (6)

Where:

λ = (y2 − y1)/(x2 − x1) if P ≠ Q,  λ = (3x1^2 + a)/(2y1) if P = Q   (7)
As an example, let the elliptic curve be given by E: y^2 = x^3 + x + 188 over the finite field F_7 (note that 188 ≡ 6 (mod 7)). To find all the points on E, we need to solve the congruence x^3 + x + 188 ≡ y^2 (mod 7) for all x ≤ 6; in other words, we need to find the quadratic residues. For each value of x, one determines whether or not the right-hand side is a quadratic residue modulo 7. If it is, there are two corresponding points in the elliptic group; if not, there is no point with that x in the elliptic group E_7(1, 188). The points modulo 7 are:
Hence, E has eleven points; this count is known as the order of the group, written #E(F_7) = 11. We now show that this is a cyclic group, for which we need only find a nonzero element that is a generator. Choose P = (2, 3) as the generator point that generates all the other points. For example, to find 2P = (2, 3) + (2, 3) = (4, 2), we find λ using Eq. 7 as follows:
Which we enter in the table below:
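The point count and the doubling computed above can be verified directly. The following Python sketch (illustrative; it reduces 188 modulo 7 and implements the addition rule of Eq. 7) enumerates the group and doubles the generator:

```python
# The example curve y^2 = x^3 + x + 188 over F_7 (188 = 6 mod 7)
p = 7
a, b = 1, 188 % p

def ec_add(P, Q):
    """Add affine points on y^2 = x^3 + ax + b over F_p; None stands for O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                              # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

# Enumerate the group: every (x, y) with y^2 = x^3 + x + b (mod 7), plus O
points = [None] + [(x, y) for x in range(p) for y in range(p)
                   if (y * y - (x**3 + a * x + b)) % p == 0]
print(len(points))              # 11 = #E(F_7)
print(ec_add((2, 3), (2, 3)))   # doubling the generator: (4, 2)
```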
Hasse's bound for elliptic curves over F_p: If E is an elliptic curve over F_p for a prime p > 3 and A is the number of points on E, then |A − (p + 1)| < 2√p.
Now that we have explained the basic theory behind elliptic curves, we will next look at how this theory motivated researchers to use it to perform integer factorization, i.e., the ECM.
Addition and reduction of points on elliptic curves (mod N) useful for ECM: Let N be a positive integer with gcd(N, 6) = 1 and let E be an elliptic curve over ℚ given by:

y^2 = x^3 + ax + b, with a, b ∈ ℤ   (8)

Let P1, P2 be points on E where P1 + P2 ≠ O and the denominators of P1, P2 are relatively prime to N. Then P1 + P2 is on E with coordinates having denominators prime to N if and only if there does not exist a prime p | N such that:

P1 + P2 = O (mod p)   (9)

on the elliptic curve reduced modulo p:

E': y^2 = x^3 + a'x + b'   (10)

where a' ≡ a and b' ≡ b (mod p).
The basic idea and motivation of ECM: Recall that in Pollard's p − 1 method we needed to fix an integer B. Further, recall that if N = p·q with p and q prime and neither p − 1 nor q − 1 is B-power-smooth, then Pollard's (p − 1) method is extremely unlikely to work. For example, let B = 20 and suppose that N = 59·107 = 6313. Note that neither 59 − 1 = 2·29 nor 107 − 1 = 2·53 is B-power-smooth. With m = lcm(1, 2, …, 20) = 232792560, we have gcd(2^m − 1, N) = 1: we get nothing.
As remarked earlier, the problem is that p − 1 is not 20-power-smooth for either p = 59 or p = 107. However, notice that for p = 59, p − 2 = 57 = 3·19 is 20-power-smooth! If we could somehow replace the multiplicative group (ℤ/pℤ)*, whose order is fixed at p − 1, with a group whose order varies near p, we would have a chance of hitting a smooth order. This is exactly what the group of points on an elliptic curve modulo p provides: by Hasse's theorem its order lies in an interval of width 4√p around p + 1 and changes with the curve chosen.
Lenstra's Elliptic Curve Factoring Method (ECM): The Elliptic Curve Method (ECM) was invented by Lenstra[32,33]. It is suited for finding small (say 9- to 30-digit) prime factors of large numbers. Among the different factorization algorithms whose complexity mainly depends on the size of the factor searched for (trial division, Pollard rho, Pollard p − 1, Williams p + 1), it is asymptotically the best method known. ECM can be viewed as a generalization of Pollard's p − 1 method, just as ECPP generalizes the N − 1 primality test. ECM relies on Hasse's theorem: if p is prime, then an elliptic curve over F_p has group order p + 1 − t with |t| ≤ 2√p, an order that varies from curve to curve.
The ECM factoring technique is an extension of Pollard's (p − 1) method obtained by replacing the multiplicative group by the group of points on a random elliptic curve. To find a non-trivial divisor of an integer N > 1, one begins by selecting an elliptic curve E over ℤ/Nℤ together with a point P on it and computes mP for a suitable multiple m; a factor of N is exposed when an inversion required by the addition formulas fails.
If the above algorithm fails with a specific elliptic curve E, there is an option that is unavailable with Pollard's (p − 1) method: we may repeat the algorithm with a different choice of E. The number of points on E over ℤ/pℤ varies with the curve chosen, so each new curve gives a fresh chance that this order is smooth.
Now suppose that we wish to factor N. Choose an integer B. Next choose a random point P = (x, y) and a random elliptic curve y^2 = x^3 + ax + b over ℤ/Nℤ passing through it; in practice one picks x, y and a at random and sets b = y^2 − x^3 − ax (mod N).
The mechanics of the ECM algorithm: Let n be an odd composite integer. The following is the algorithm for factoring n.
• In some random fashion, generate a pair (E, P), where E is an elliptic curve over ℤ/nℤ, y^2 = x^3 + ax + b with a, b ∈ ℤ/nℤ, and P is a point on E
• Check that gcd(n, 4a^3 + 27b^2) = 1. If not, then we have a factor of n, unless gcd(n, 4a^3 + 27b^2) = n, in which case we choose a different pair (E, P)
• Choose m ∈ ℕ as a product of prime powers p_i^e_i ≤ B for small primes p_1 < p_2 < … < p_l ≤ B
• Compute mP on E, reducing modulo n throughout
• If the calculation of either (x2 − x1)^-1 or (2y1)^-1 in Eq. 7, for some s | m in step 4, shows a denominator not prime to n, then its gcd with n yields a factor of n; there is then a prime p | n for which sP = O on E reduced modulo p
The implementation of Lenstra's EC factorization method: Here we use an overly small composite number for the purpose of testing the EC method. Let n = 963 and choose the family of curves:

y^2 = x^3 + ax + 9   (11)

with the point P = (0, 3) on the curve for every a and the point Q = (8, 23) on it when a = 1. Choose one of the points, say Q = (8, 23). We now try successive natural numbers a until the process described above succeeds in factoring n. We take B = 3.
Table 1: Lenstra's ECC factoring method using the (E, P) pair (y^2 = x^3 + x + 9, (8, 23)) to factor n = 963
From Hasse's theorem, we find:
Since 1985, many improvements have been proposed to ECM; Lenstra's original algorithm had no second phase[32]. Brent[34] proposed a birthday-paradox second phase and further technical refinements. Montgomery[35] presents different variants of phase two of ECM and Pollard p − 1 and introduces a parameterization with homogeneous coordinates, which avoids inversions modulo n, with only 6 and 5 modular multiplications per addition and duplication on E, respectively. It is also possible to choose elliptic curves with a group order divisible by 12 or 16[35,36].
ECM has been used to find factors of Cunningham numbers a^n ± 1 (for a = 2, 3, 5, 6, 7, 10, 11, 12). Fermat numbers F_n = 2^(2^n) + 1 are very good candidates for n ≥ 10, since they are too large for general-purpose factorization methods. Brent[37] completed the factorizations of F11 and F10 using ECM. The largest factor found by the elliptic curve method has 66 digits, found by Dodson[38] in April 2005. Brent[37] maintains a list of the ten largest factors found by ECM; his extrapolation from previous data would give an ECM record of 70 digits in the year 2010, 85 digits in 2018 and 100 digits in 2025.
From the point of view of parallel implementations, the attraction of the elliptic curve method is that we can run many curves in parallel for many different parameter values a and b. This is naïve parallelism, but it works. There are various ways to determine, for a given N, the right number of curves to use in order to be assured that factors that can be found with this method will in fact be found.
PARI implementation of ECM: For simplicity, we use an elliptic curve of the form E: y^2 = x^3 + ax + 4, with b = 4 and a point P = (0, 2) already on it for any given value of a. The following tiny PARI function implements the ECM; save it as ECM.gp. It generates an error message along with a (usually nontrivial) factor of N exactly when the ECM succeeds.
For simplicity we will run the program on a small integer N. (ECM uses the random function, so the results of your run may differ from those below.)
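The PARI listing itself is not reproduced in this extract. As a rough stand-in (a Python sketch rather than the paper's PARI code, with the retry loop and the bound B = 100 as our own choices), the same idea can be expressed as follows: multiply P by k! on the curve y² = x³ + ax + 4 and report a factor exactly when a modular inversion fails.

```python
from math import gcd
from random import randrange

def ecm_trial(n, bound=100):
    """One ECM attempt on y^2 = x^3 + a*x + 4 with P = (0, 2), random a.

    Repeatedly multiplies P by k (so P accumulates the factor k!); a
    nontrivial factor of n surfaces exactly when a point addition needs
    an inverse that does not exist modulo n."""
    a = randrange(1, n)
    P = (0, 2)

    def add(P, Q):
        (x1, y1), (x2, y2) = P, Q
        if P == Q:
            num, den = 3 * x1 * x1 + a, 2 * y1   # tangent slope
        else:
            num, den = y2 - y1, x2 - x1          # chord slope
        g = gcd(den % n, n)
        if g > 1:                                # inversion fails: g | n
            raise ValueError(g)
        s = num * pow(den, -1, n) % n
        x3 = (s * s - x1 - x2) % n
        return (x3, (s * (x1 - x3) - y1) % n)

    def mul(k, P):
        R = None
        while k:
            if k & 1:
                R = P if R is None else add(R, P)
            P = add(P, P)
            k >>= 1
        return R

    try:
        for k in range(2, bound):
            P = mul(k, P)                        # P is now (k!)*P
        return None                              # this curve found nothing
    except ValueError as e:
        g = e.args[0]
        return g if g < n else None              # g == n counts as failure

def ecm(n):
    """Retry with fresh random curves until a nontrivial factor appears."""
    while True:
        g = ecm_trial(n)
        if g:
            return g
```

For instance, ecm(963) returns a nontrivial divisor of 963 = 9·107 after a curve or two.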
Elliptic curve primality test: Here we will extend Lenstra's algorithm and describe a method for modifying it in order to give a primality test, i.e., to tell us whether a given number n ∈ ℕ is prime. Let gcd(n, 6) = 1 and let E be an elliptic curve over the rational field ℚ.
• | |
• | |
• | If P≠ O is a point on E and |
Using the above theorem and picking randomly chosen points P1, P2, …, Pm (m ∈ ℕ) on E, we calculate for each j = 1, 2, …, m. If the condition of the theorem holds for some j = 1, 2, …, m, then n is prime. As a final illustration, we again choose an overly simplistic value of n, which we use to illustrate the primality test without excessive calculation.
As an example, let n = 1103 with our earlier elliptic curve pair (E, P) = (y² = x³ + x + 9, (5, 310)). One calculates the group order |E (mod n)| = 1084 = 2²·271 and
and so both parts 1 and 2 of the theorem are satisfied. If n were prime, then we might be able to generate enough points for testing from a primitive element. Thus, we proceed as with Lenstra's method, namely by successive doubling and reduction of P with powers of 2. Notice that:
271 = 2⁸ + 2³ + 2² + 2¹ + 2⁰
so we will first go for 2⁸ and test
Hence, from the primality test above, we conclude that n = 1103 is a prime number, as we are unable to split it. Moreover, we have been very lucky that the test worked after a few tries, largely due to the small value of n chosen. However, if part 1 of the theorem fails, then we have to test for compositeness by Hasse's theorem. Part 2 of the theorem, however, is very special and does not hold for many elliptic curves. Furthermore, although we were able to calculate the value of |
Pollard's rho method: Pollard[40] introduced a Monte Carlo method for factoring, so named because it rests on randomly chosen integers. Today, we refer to Pollard's method as the Pollard rho method.
Given that n ∈
Table 2: Elliptic curve primality testing method using the (E, P) pair (y² = x³ + x + 9, (5, 310)) on n = 1103 (tested using Sect. 7.3.3 part 5)
Table 3: Pollard rho-method simulation
From Table 3, we can observe that when we reach x9 we enter the period that takes us back and forth between the residue 7 and the residue 21 modulo 31. This is the significance of the left-pointing arrow from the position of x8 back to the position of x7, which is the same as the residue of x9. This completes the circuit. The diagram, when mapped out, depicts the shape of the Greek letter ρ (rho).
Modified Pollard rho method: The rho algorithm is based on Floyd's cycle-finding algorithm and the birthday paradox. It rests on the observation that, by the birthday paradox, two numbers x and y are congruent modulo p with probability 0.5 after about 1.177√p numbers have been randomly chosen. If p is a factor of n, the integer we are aiming to factor, then gcd(|x − y|, n) is divisible by p, since x − y ≡ 0 (mod p).
The rho algorithm therefore uses a function modulo n as a generator of a pseudo-random sequence. It runs one sequence twice as fast as the other; i.e., for every iteration made by one copy of the sequence, the other copy makes two iterations. Let x be the current state of one sequence and y be the current state of the other. The GCD of |x − y| and n is taken at each step. If this GCD ever comes to n, then the algorithm terminates with failure, since this means x = y and therefore, by Floyd's cycle-finding algorithm, the sequence has cycled; continuing any further would only repeat previous work.
Note that this algorithm will return failure for all prime n, but it can also fail for composite n. In that case, use a different f (x) and try again.
To illustrate the modified Pollard rho method we follow this procedure. Given a composite n and p an (as yet unknown) prime divisor of it, perform the following steps:
• Choose an integral polynomial f with deg(f) ≥ 2; usually f(x) = x² + 1 is chosen for simplicity.
• Choose a randomly generated seed x = x0 and compute x1 = f(x0), x2 = f(x1), …, xj+1 = f(xj) for j = 0, 1, …, B, where the bound B is determined by step 3.
• Sieve through all differences xi − xj modulo n until we find xB ≢ xj (mod n) but xB ≡ xj (mod p) for some natural number B > j ≥ 1. Then gcd(xB − xj, n) is a nontrivial divisor of n.
As an example we use an overly small value n = 5561, with x0 = 2 = y0 as the seed and f(x) = x² + 1. We generate the values of the xi and yi as per the table below until gcd(|xi − yi|, n) = d > 1, where d is a nontrivial factor.
We find that gcd(|xi − yi|, n) = 1 for i ≤ 4, until d = gcd(|x5 − y5|, n) = gcd(5080 − 3171, n) = gcd(1909, 5561) = 83, which is a factor of 5561. In fact n = 5561 = 67·83.
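The procedure above can be sketched in a few lines of Python (a minimal Floyd-cycle version with f(x) = x² + 1; the intermediate iterates depend on the reduction convention, so they need not match the table's x5, y5 even though a factor of 5561 still falls out):

```python
from math import gcd

def pollard_rho(n, seed=2):
    """Floyd-cycle Pollard rho with f(x) = x^2 + 1 (mod n).

    Returns a nontrivial factor of n, or None if the sequence
    cycles first (retry with a different f or seed)."""
    f = lambda x: (x * x + 1) % n
    x = y = seed
    while True:
        x = f(x)               # tortoise: one step
        y = f(f(y))            # hare: two steps
        d = gcd(abs(x - y), n)
        if d == n:             # cycled without exposing a factor: failure
            return None
        if d > 1:
            return d

print(pollard_rho(5561))       # prints a factor of 5561 = 67 * 83
```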
Richard Brent's rho variant: Eldershan and Brent[41] published a faster variant of the rho algorithm, using the same core ideas as Pollard[40] but with a method of cycle detection faster than Floyd's original algorithm. In real practice, the algorithm is very fast for numbers with small factors. For example, on a 733 MHz workstation, an implementation of the rho algorithm, without any optimizations, found the factor 274177 of the sixth Fermat number, 18446744073709551617 (20 decimal digits), in about half a second. However, for a semiprime of the same size, the same workstation took around 9 seconds to find a factor of 10023859281455311421 (the product of two 10-digit primes). The rho algorithm's most remarkable success has been the factorization of the eighth Fermat number by Brent and Pollard[42]. They used Brent's variant of the algorithm, which found a previously unknown prime factor. The complete factorization of F8 took, in total, 2 h on a Univac 1100/42.
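Brent's cycle-detection idea can be sketched as follows (our own minimal rendition with a batch size of 128, not Brent's published code; the backtracking pass recovers the factor when the batched gcd overshoots):

```python
from math import gcd

def brent_rho(n, x0=2, c=1):
    """Pollard rho with Brent's power-of-two cycle detection and
    batched gcds (a sketch; x0 and c are arbitrary choices)."""
    f = lambda x: (x * x + c) % n
    y, r, q, g = x0, 1, 1, 1
    while g == 1:
        x = y                       # anchor the tortoise
        for _ in range(r):
            y = f(y)
        k = 0
        while k < r and g == 1:
            ys = y                  # remember where this batch started
            for _ in range(min(128, r - k)):
                y = f(y)
                q = q * abs(x - y) % n
            g = gcd(q, n)
            k += 128
        r *= 2
    if g == n:                      # batch overshot: redo one step at a time
        g = 1
        while g == 1:
            ys = f(ys)
            g = gcd(abs(x - ys), n)
    return g if g < n else None

# The sixth Fermat number mentioned in the text, 18446744073709551617:
print(brent_rho(18446744073709551617))
```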
THE QUADRATIC SIEVE FACTORING ALGORITHM
Background: Pollard's two methods discussed above may be invoked when trial division fails to be useful. However, if Pollard's methods also fail, which they will for large prime factors, say with the number of digits in the high tens, then we need a more powerful number-crunching algorithm, e.g., the quadratic sieving technique[13]. The Quadratic Sieve (QS) factoring algorithm was the most effective general-purpose factoring algorithm of the 1980s and early 1990s. It improves on Fermat's method and Dixon's random squares method, and it is the method of choice for factoring integers between 50 and 100 digits. Unlike the methods above, whose running time depends mainly on the size of p and q (the factors of n = p·q), the running time of the Quadratic Sieve (QS) and its advanced counterpart, the Number Field Sieve (NFS), depends mainly on the size of n.
The basic idea behind the QS algorithm is to find solutions of x² ≡ y² (mod n). This implies (x + y)(x − y) ≡ 0 (mod n). By calculating gcd(x + y, n) and gcd(x − y, n), it may be possible to find a non-trivial divisor of n. In particular, when n is the product of two prime numbers p and q, the probability of finding a non-trivial factor is 2/3.
Recall that the Kraitchik[13] method searches for combinations producing a square by incrementing from the ceiling of the square root until an interesting combination is found. The quadratic sieve aims at a more efficient search for these combinations by using a modified version of the Sieve of Eratosthenes. The Sieve of Eratosthenes finds new primes by crossing off all multiples of primes below the square root of the upper bound of the search. Instead of crossing off numbers, the quadratic sieve uses the search to find numbers which factor completely over a range of small primes. The quadratic sieve searches, by division instead of simply crossing off values, for results which are completely factorable over all primes beneath a certain bound B (an integer whose prime factors all lie below B is called B-smooth). However, it must be noted that the larger the number being factored, the harder it becomes to find values that are smooth for a small bound B.
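The B-smoothness test at the heart of the sieve is just repeated division by the primes of the factor base; a minimal sketch:

```python
def smooth_part(m, base):
    """Divide out every prime in `base`; m is smooth over `base`
    exactly when the remaining cofactor is 1."""
    m = abs(m)                     # sieve values may be negative
    for p in base:
        while m % p == 0:
            m //= p
    return m

# 3087 = 3^2 * 7^3 is smooth over {2, 3, 5, 7}; 4863 = 3 * 1621 is not
assert smooth_part(3087, [2, 3, 5, 7]) == 1
assert smooth_part(4863, [2, 3, 5, 7]) == 1621
```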
The mechanics of the quadratic sieve algorithm: Quadratic sieving (QS) is a process whereby we search numbers up to a prescribed bound and eliminate candidates along the way, leaving only the solutions. Recall Fermat's method, which essentially had as an end goal to solve x² ≡ y² (mod n) in an attempt to find a nontrivial factor of n. The QS method has the same end goal (it is essentially factoring using congruences) and was introduced by Pomerance[13]. To describe it, we begin as usual with a composite n ∈ ℕ:
• | Choose a factor base |
• | Find all values of xi = j + |
• If we find t such congruences as in part (2), then x² ≡ y² (mod n), and gcd(x ± y, n) provides a nontrivial factor of n provided x ≢ ±y (mod n).
By virtue of part (2) of the algorithm, it makes no sense to have any odd primes
p ∈
Table 4: Computation of Q(xj) and extraction of primes in the factor base
Note: X implies not fully factored over T
As an example, and again using an overly small composite number, let n = 45313. Notice that the optimal number of primes in our factor base should be twelve, by part (1) of the above algorithm. To find a factor base, consider the values of (n/p), which we compute using the Legendre symbol:
Therefore, our selected factor base:
Now we use the results from Table 4 to speed up the factorization of Q(x) = x² − n. Below we list the values of x from 200 to 225 (these values are chosen to be roughly centered around 212, the square root of n).
Since most of the entries in columns 5-14 of Table 4 (~75% to be precise) are empty, the sieve saves us a great deal of work: the table tells us which primes we need to consider (tick-marked), and the QS technique requires us to use only the rows that split completely, crossing out those whose prime factors fall outside our factor base (rows marked with X in the last column). This effectively leads to a great saving. For larger numbers, the savings are even greater.
Now we can start to factor Q(x) = x² − n for each value of x, using Table 4 to tell us which primes will divide Q(x). Scanning Table 4, we see possible solutions at j = 14, j = 20 and j = 23.
Hence, we have: 214² ≡ 3·7·23 (mod n), 220² ≡ 3²·7³ (mod n) and 223² ≡ 2⁶·3·23 (mod n), so that (214·220·223)² ≡ (2³·3²·7²·23)² (mod n), namely 31537² ≡ 35831² (mod 45313), and gcd(31537 − 35831, 45313) = 113, or gcd(31537 + 35831, 45313) = 401. In fact: n = 45313 = 113·401.
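The combination just described can be checked mechanically (values taken from the text; only gcd and modular arithmetic are needed):

```python
from math import gcd, prod

n = 45313
xs = [214, 220, 223]                    # rows of Table 4 that split completely
assert [x * x - n for x in xs] == [3 * 7 * 23, 3**2 * 7**3, 2**6 * 3 * 23]

X = prod(xs) % n                        # 214*220*223 mod n = 31537
Y = (2**3 * 3**2 * 7**2 * 23) % n       # square root of the combined RHS = 35831
assert (X * X - Y * Y) % n == 0         # x^2 ≡ y^2 (mod n)
print(X, Y, gcd(X - Y, n), gcd(X + Y, n))   # 31537 35831 113 401
```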
Sometimes we get lucky in our use of the quadratic sieve, since we may find, in the sieving process, that a single xj² − n is itself a square; in other words, t = 1 in part (3) of the above algorithm. In this particular example we were not that lucky!
Now let us implement a more effective and robust binary technique which gives a definitive solution and avoids the trial look-up performed above, which can be very difficult if our composite number is large.
Table 5: Implementation of an effective and robust binary technique for speeding up QS
To achieve this, we transfer to Table 5 our data for the values for which Q(x) = x² − n splits completely over the factor base.
The matrix A extracted from Table 5, columns 5-14 is:
(12) |
Equation 12 has the form Aᵀv ≡ 0 (mod 2); the matrix A is transposed to Aᵀ, as we solve Aᵀv ≡ 0 (mod 2) for the exponent vector v. In this example we have solved it to obtain:
Which we shall identify as follows:
Here
where we read the values of the xj corresponding to the values of the ri as listed by rows in the last column of Table 5, which gives:
This yields the gcds:
gcd(x − y, n) = 113 and gcd(x + y, n) = 401
This gives the factorization n = 45313 = 113·401, the same value as we found earlier via the primitive look-up method.
Traditionally, one solves the system Aᵀv ≡ 0 (mod 2) by Gaussian elimination over GF(2).
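The whole pipeline (Legendre-symbol factor base, sieving for smooth Q(x), then elimination over GF(2)) fits in a short script. The following is our own sketch for n = 45313, with the prime bound 41 matching the text's tables and sieving only x ≥ ⌈√n⌉ so that Q(x) > 0 and the −1 column can be ignored:

```python
from math import gcd, prod

n = 45313

def legendre(a, p):
    """Euler's criterion for an odd prime p."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# Factor base: 2 plus the odd primes up to 41 with (n/p) = 1
base = [2] + [p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)
              if legendre(n, p) == 1]

def exponents(m):
    """Exponent vector of m over `base`, or None if m is not smooth."""
    ev = []
    for p in base:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        ev.append(e)
    return ev if m == 1 else None

rels = [(x, exponents(x * x - n)) for x in range(213, 226)]
rels = [(x, ev) for x, ev in rels if ev]

def dependencies(rows):
    """Gaussian elimination over GF(2); yields index sets whose exponent
    vectors sum to zero mod 2, i.e., whose product is a perfect square."""
    pivots = {}
    for i, row in enumerate(rows):
        v = [e % 2 for e in row]
        combo = {i}
        while any(v):
            c = next(j for j, bit in enumerate(v) if bit)
            if c not in pivots:
                pivots[c] = (v, combo)
                break
            pv, pcombo = pivots[c]
            v = [(s + t) % 2 for s, t in zip(v, pv)]
            combo ^= pcombo
        else:
            yield combo

for dep in dependencies([ev for _, ev in rels]):
    X = prod(rels[i][0] for i in dep) % n
    total = [sum(rels[i][1][j] for i in dep) for j in range(len(base))]
    Y = prod(p ** (e // 2) for p, e in zip(base, total)) % n
    g = gcd(abs(X - Y), n)
    if 1 < g < n:                 # skip trivial dependencies (X = ±Y)
        print(g, n // g)          # 113 401
        break
```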
The term quadratic sieve comes from the above process of sieving over all quadratic congruences xj² ≡ n (mod p) for p in the factor base.
Complexity of the quadratic sieve algorithm
• Conjectured time for optimal choice of B: exp((c + o(1))·(log n)^(1/2)·(log log n)^(1/2)) with c = 1, compared with larger values of c for earlier methods.
• Variant: the self-initializing multiple polynomial quadratic sieve.
• Subexponential time and space, but in practice can factor integers only up to ~400 bits.
• Can we decrease the (log n)^(1/2) term in the exponent?
On April 2, 1994, the factorization of RSA-129 was completed using QS. It was a 129-digit number, the product of two large primes, one of 64 digits and the other of 65[8]. The factor base for this factorization contained 524339 primes. The data collection phase took 5000 MIPS-years, done in distributed fashion over the Internet. The data collected totaled 2 GB. The data processing phase took 45 h on Bellcore's MasPar (massively parallel) supercomputer. This was the largest published factorization by a general-purpose algorithm until the arrival of NFS, which is currently the champion factoring algorithm.
THE MULTIPLE POLYNOMIAL QUADRATIC SIEVE (MPQS)
The Quadratic Sieve (QS) is an efficient algorithm; however, for large n it suffers from limitations. To generate enough relations, the length of the sieve interval M must be large. But the value of Q(x) grows with M, making it less likely that we will find values of x for which Q(x) factors fully over the factor base.
As the name suggests, the MPQS uses several polynomials instead of just the single polynomial Q(x) of QS; although discovered independently by several researchers, it was first suggested by Peter Montgomery. It is also important to note that MPQS goes back to Kraitchik's technique[8]; however, Pomerance[12] was the first to describe and analyze it in its modern form. Davis and Holdridge[46] and, independently, Montgomery[45] proposed the use of multiple polynomials in the quadratic sieve algorithm. The Multiple Polynomial Quadratic Sieve (MPQS) is significantly faster than the single-polynomial version but requires expensive multi-precision and modular inverse operations for each new polynomial. The algorithm spends much time calculating the new zeros of each polynomial compared with the sieving time.
In the integer factoring world, the MPQS method is usually considered as complementary to ECM, since the computing time of ECM depends on the size of the smallest prime factor of the number to be factorized, whereas that of MPQS depends on the size of the number itself. At present, numbers with smallest prime divisor up to 30 decimal digits may be best factorized with the help of ECM, whereas numbers with smallest divisor of more than 30 digits may be best factorized with the help of MPQS, provided that the size of the number to be factorized does not exceed 90 decimal digits. In real practice, however, we usually do not have such knowledge about the size of the prime divisors we are seeking. In best practice, it is always advisable to try ECM before MPQS, in order to eliminate the smaller prime divisors first.
The mechanics of the MPQS algorithm: Let n be the (large) number, known to be composite (e.g., by a Fermat little-theorem test), which we wish to factorize. Further recall that the Quadratic Sieve (QS) method belongs to a class of algorithms which share the common aim of finding two integers X and Y such that:
(13) |
If gcd(X − Y, n) = d satisfies 1 < d < n, then d is a proper divisor of n. In order to find such an (X, Y) pair, one may try to find triples (Ui, ai, Hi), where i = 1, 2, …, such that:
(14) |
where H(xi) = ai·f(xi) is easy to factor, or at least easier than n. If sufficiently many congruences have been found, these can be combined, by multiplying together a subset of them, in order to get a relation of the form of Eq. 13, i.e.:
(15) |
In this study we will implement the version suggested by Montgomery. Multiple polynomials keep the value of Q(x) smaller and therefore more likely to be smooth over the factor base:
(16) |
The Montgomery option starts with a general polynomial of the form:
(17) |
where a, b and c are chosen according to the guidelines below. The motivation is that by using several polynomials we can make the sieving interval much smaller, which makes f(x) smaller, which in turn means that a greater proportion of the values of f(x) factor completely over the factor base.
In choosing our coefficients, let a be a square. Then choose 0 ≤ b < a so that b2 ≡ n(mod a). This can only be true if n is a square mod p for every prime p|a. So we wish to choose a with a known factorization such that (n/p) = 1 for every p|a.
In order for f(x) to generate quadratic residues, the discriminant b² − 4ac must equal n. Since the discriminant will be congruent to 0 or 1 (mod 4) and n will be 1 or 3 (mod 4), if n ≡ 3 (mod 4) we must multiply it by a small constant k (called the multiplier) in order to make kn ≡ 1 (mod 4); we then factor kn instead of n.
Lastly, we choose c so that the discriminant condition holds: with f(x) = ax² + 2bx + c as below, this reads b² − ac = n, i.e., c = (b² − n)/a. When we find an f(x) that factors well, notice that:
(18) |
where:
f(x) = ax² + 2bx + c, H(x) = a·f(x) and
(19) |
So that:
(20) |
Also, as before, if a prime p divides a·f(x), then n is a quadratic residue modulo p, so we do not need to change our factor base.
The polynomial f attains its minimum at x = −b/a. We want to choose a, b and c so as to minimize both |f(−b/a)| = n/a and the extreme values at the edges of the sieved interval:
where M is half of the sieving interval. If M is prescribed, this is accomplished by setting these values equal, i.e., at
If a is chosen to be a prime, then we know how to solve the congruence x² ≡ n (mod a). Finally, we choose b to be a solution of this congruence and c to be c = (b² − n)/a.
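Under these guidelines, generating a polynomial is mechanical; a sketch for small prime a (with a brute-force modular square root standing in for the msqrt function used below):

```python
def msqrt(n, a):
    """Brute-force square root of n modulo a small prime a
    (a stand-in for the paper's msqrt)."""
    for b in range(a):
        if (b * b - n) % a == 0:
            return b

n, a = 45313, 7
b = msqrt(n, a)                 # 3
c = (b * b - n) // a            # -6472, exact since b^2 ≡ n (mod a)
assert b * b - a * c == n       # so a*f(x) = (a*x + b)^2 - n for f = a x^2 + 2 b x + c
print(b, c)                     # 3 -6472
```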
The MPQS described above has a number of nice features. For example, if we do not get enough completely factored Q(x)s in the chosen short sieving range, then we just generate a new polynomial and sieve again over our short interval. However, one of the nicest features is that the sieving parallelizes perfectly. With N processors, one can assign a different polynomial to each processor and the algorithm runs N times as fast. Lenstra and Manasse[15] used this procedure to dramatic effect to produce the first factorization of a hard 100-digit integer into a product of two primes, accomplished by farming out their polynomials to roughly 400 computers around the world.
A simple implementation of MPQS: Here we present an overly simple implementation of MPQS. Note that in real practice we are talking of integers in the range of 50-100 digits. It has been shown that if the number of decimal digits to be factorized is increased by three, then the amount of CPU time needed roughly doubles.
Let n = 45313, giving n ≡ 1 (mod 4) and k = 1, and let us pick a = 7 from our earlier factor base T, so that:
b = msqrt(n, a) = msqrt(45313, 7) = 3, c = (b² − n)/a = −6472, −45 ≤ M ≤ 43
Hence, from Eq. (19):
f(x) = 7x² + 6x − 6472, H(x) = 7·f(x), U(x) = 7x + 3
This can be combined with values of Q(xj) from the QS algorithm to produce a product which allows a complete factorization of n. One such product can be derived from:
Q(200) = −1·3·7·11·23, Q(207) = −1·2⁵·7·11
and
f(38) = 2³·3·7·23, H(38) = 7·f(38) = 2³·3·7²·23, U(38) = 269, giving:
Q(200)Q(207)H(38) ≡ (200·207·269)² (mod n)
Which we can write as follows:
x ≡ 200·207·269 ≡ 34915 (mod n) and
y ≡ 2⁴·3·7²·11·23 ≡ 5987 (mod n)
This yields the gcds as:
gcd(x − y, n) = 113 and gcd(x + y, n) = 401
This gives a factorization of n = 45313 = 113·401, which is the same as found with QS.
The above solution is found using the values read from Appendix 1. Read tvals as M = −45 (i.e., Q(−45)) to M = 43 (i.e., Q(43)). These values give the sieving range −45 ≤ M ≤ 43, which can be merged with other data from Table 4 of the QS scheme to produce squares on both sides of Eq. 13.
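The numbers in this example can be verified directly (Q from the QS section, f and U as above; note that 7·f(x) = U(x)² − n by construction):

```python
from math import gcd

n = 45313
Q = lambda x: x * x - n                 # QS polynomial
f = lambda x: 7 * x * x + 6 * x - 6472  # MPQS polynomial with a = 7, b = 3
U = lambda x: 7 * x + 3                 # so that 7*f(x) = U(x)**2 - n

assert 7 * f(38) == U(38) ** 2 - n
assert Q(200) == -1 * 3 * 7 * 11 * 23 and Q(207) == -1 * 2**5 * 7 * 11

x = (200 * 207 * U(38)) % n             # 34915
y = (2**4 * 3 * 7**2 * 11 * 23) % n     # 5987
assert (x * x - y * y) % n == 0
print(gcd(x - y, n), gcd(x + y, n))     # 113 401
```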
MPQS large prime variation technique: The sieving procedure in QS looks for values of x such that f(x) is smooth with respect to the factor base T. The algorithm is easily modified to also find values of x for which f(x) is a smooth number times a single prime not much larger than the factor base bound, by adjusting the threshold used when inspecting logarithms after sieving. The extra prime in the factorization of f(x) is called a large prime. If one finds two values of x for which f(x) has the same large prime, then the corresponding congruences can be multiplied together and treated as a pair for the rest of the algorithm.
The solution is thus to allow a single large prime factor. The chances that another Q(xj) has the same large prime factor are good; when an even number of such Q(xj) are combined, the large prime appears with an even exponent and does not need to be considered in the matrix.
This procedure, called the large prime variation, is compatible with the use of multiple polynomials described above. For example, both Q(218) = 3·11·67 and f(2) = −1·2⁵·3·67 have 67 as the only prime exceeding 41. After the linear algebra phase, we decide to combine these with two entries from Table 4 of the QS method. One such product can be derived from:
Q(211) = −1·2³·3²·11, Q(218) = 3·11·67, Q(220) = 3²·7³
and
f(2) = −1·2⁵·3·67, H(2) = 7·f(2) = −1·2⁵·3·7·67, U(2) = 17, giving:
Q(211)Q(218)Q(220)H(2) ≡ (211·218·220·17)² ≡ (−1·2³·3²·11)(3·11·67)(3²·7³)(−1·2⁵·3·7·67) ≡ 2⁸·3⁶·7⁴·11²·67² ≡ (2⁴·3³·7²·11·67)² (mod n)
Which we can write as follows:
x ≡ 17·211·218·220 ≡ 24372 (mod n) and
y ≡ 2⁴·3³·7²·11·67 ≡ 13144 (mod n)
This yields the gcds as:
gcd(x − y, n) = 401 and gcd(x + y, n) = 113
This gives a factorization of n = 45313 = 113·401, which is the same as found with QS.
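Again the arithmetic can be checked directly; the point is that the large prime 67 appears once in Q(218) and once in f(2), so the combined product carries it to an even power:

```python
from math import gcd

n = 45313
Q = lambda x: x * x - n                 # QS polynomial
f = lambda x: 7 * x * x + 6 * x - 6472  # MPQS polynomial with a = 7, b = 3
U = lambda x: 7 * x + 3

assert Q(218) == 3 * 11 * 67            # large prime 67 here ...
assert f(2) == -(2**5) * 3 * 67         # ... and here: even power overall

x = (211 * 218 * 220 * U(2)) % n        # 24372
y = (2**4 * 3**3 * 7**2 * 11 * 67) % n  # 13144
assert (x * x - y * y) % n == 0
print(gcd(x - y, n), gcd(x + y, n))     # 401 113
```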
INDEX CALCULUS METHOD
The integer factorization problem has been the subject of intense research, especially in the years since the invention of RSA[4]. Recall that the most basic attack on RSA consists of factoring the modulus N = pq. Let N be an n-bit integer. Most of the subexponential-time algorithms (those that take fewer than 2^(cn) steps with c < 1) are of index calculus type. We now describe an index calculus algorithm to factor N.
The method is based on the elementary observation that if x² ≡ y² (mod N), then N = pq divides (x + y)(x − y), and so p and q must each divide either x + y or x − y. If x and y were formed independently of one another, then one expects that 50% of the time the two primes will divide different factors, say p|(x + y), q|(x − y). In that case, as before, we can factor N by using the Euclidean algorithm to compute gcd(x + y, N) = p.
We start the index calculus factoring algorithm by choosing a factor base T consisting of all primes less than some bound B, along with −1, i.e., T = {p0, p1, …, pr}, where p0 = −1, p1 = 2, p2 = 3. We next choose positive integers a < N (either randomly or according to some convenient criteria) and compute the least absolute residue of a². If this residue cannot be written as a product of numbers in our factor base, we choose another value of a. We eventually arrive at a system of mod N relations of the form:
where i = 1, 2, …, s. We try the product:
where vi ∈ {0,1}, chosen in such a way that we get a perfect square on the right. In other words, we need each prime on the right to occur to an even power; i.e., Σi vi ai,j must be even for each j = 0, 1, …, r. This amounts to solving a system of r + 1 simultaneous equations in s unknowns over the field F2 = {0,1}. Once we have such a product, we can set:
Then x² ≡ y² (mod N) and there is a 50% chance that we can immediately factor N; if not, we find another solution to the simultaneous equations over F2 and try again.
As an example, again using an overly simple composite number, let N = 403 = 13·31 and choose T = {−1, 2, 3, 5, 7, 11, 13}. After squaring some 2-digit numbers, we find that we can take ai, 1 ≤ i ≤ 7, which we use to complete the table below:
As with the Quadratic Sieve (QS), we extract the exponent information from the table
and then solve the equation Aᵀ·v ≡ 0 (mod 2), where v is the exponent vector, which gives three sets of possible solutions:
v = {[0,1,0,0,0,0,0], [0,0,0,0,0,1,0], [1,0,0,0,1,0,0]} |
which we shall identify as v = {[vs1], [vs2], [vs3]}. Here vs2 and vs3 give no factors of N = 403, while vs1 is able to factor our composite number. For example, vs1 = [0,1,0,0,0,0,0] gives x ≡ 22 (mod 403) and y ≡ 3² = 9 (mod 403), since 22² = 484 ≡ 81 = 3⁴ (mod 403). These yield the gcds gcd(x − y, N) = 13 and gcd(x + y, N) = 31, which give the factorization N = 403 = 13·31.
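The vs1 computation is small enough to verify in a couple of lines:

```python
from math import gcd

N = 403
x = 22
assert x * x % N == 81 == 3**4          # least residue of 22^2 is smooth: 3^4
y = 3**2                                # its square root
assert (x * x - y * y) % N == 0
print(gcd(x - y, N), gcd(x + y, N))     # 13 31
```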
It can be shown that the time required to factor an n-bit integer by the above index calculus factorization method is of the order
However, ideas of Pollard[16] led to a major breakthrough in factorization, called the Number Field Sieve (NFS). By carrying index calculus over to algebraic number fields, it was possible to factor an arbitrary n-bit integer in time bounded by
THE GENERAL NUMBER FIELD SIEVE (GNFS)
The General Number Field Sieve (GNFS) algorithm is the fastest known method for factoring large numbers of between 100 and 248 digits. GNFS is an improvement on the quadratic sieve[47-51]. The algorithm uses ideas from diverse fields of mathematics such as algebraic number theory, graph theory, finite fields and linear algebra. One of the improvements is that the polynomial used is not limited to a quadratic, but may be a cubic or an even higher-degree polynomial. Implementations of the number field sieve can be written such that the sieving process takes place on several computers; the results of the sieving are then combined at a later stage.
In GNFS, as compared to QS or its variants, we have two polynomials f and g that are related modulo n, but now the relation is more subtle: f and g have a known common root m modulo n:
f(m) ≡ g(m) ≡ 0(mod n) |
Also, suppose f and g are monic and irreducible. Let α be a complex root of f. Consider the ring ℤ[α], whose elements have the form:
q(α) = q0 + q1·α + q2·α² + … + q(d−1)·α^(d−1), where d = deg f
Operations in this ring are done modulo f(α), simply because f(α) = 0. Similarly, define ℤ[β] for a complex root β of g.
Consider the ring homomorphism φ: ℤ[α] → ℤ/nℤ defined by φ(α) = m.
Suppose we have found a set S of integer pairs (a, b) such that:
Then mod n gives:
Let F(a, b) = b^(deg f)·f(a/b). It turns out that:
Implies:
Moreover, the converse almost holds. Therefore, we can work as before: find (a, b) pairs such that F(a, b) is B-smooth, compute their exponent vectors and find dependencies. To find pairs, we fix values of b and sieve over a. We do the same for g and ℤ[β] and find a set S that satisfies both conditions.
(Figure: MIPS-years required to factor a number with the GNFS.)
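The homogenization F(a, b) = b^(deg f)·f(a/b) is an integer even though a/b is not; a small sketch (the cubic f(x) = x³ − 2 is our own toy choice, not a polynomial from the text):

```python
def homogenize(coeffs, a, b):
    """F(a, b) = b^deg(f) * f(a/b) for f with integer `coeffs`
    (lowest degree first); the b powers clear all denominators."""
    d = len(coeffs) - 1
    return sum(c * a**i * b**(d - i) for i, c in enumerate(coeffs))

# f(x) = x^3 - 2  gives  F(a, b) = a^3 - 2*b^3
assert homogenize([-2, 0, 0, 1], 5, 3) == 5**3 - 2 * 3**3   # = 71
```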
Complexity of the Number Field Sieve (NFS): The best known classical method for factoring large numbers is currently the Number Field Sieve (NFS)[47]. It has an asymptotic run time of approximately exp(c·(ln n)^(1/3)·(ln ln n)^(2/3)), reducing the exponent relative to the continued fraction factorization algorithm and the quadratic sieve. There are three values of c relevant to different flavors of the method[36]. For the special case of the algorithm applied to numbers near a large power, c = (32/9)^(1/3) = 1.526285; for the general case applicable to any odd positive number which is not a power, c = (64/9)^(1/3) = 1.922999; and a version using many polynomials (Coppersmith[43]) attains a slightly smaller c still.
• The point: wisely choose f and g so that their values near 0 are small and therefore likely to be smooth.
• Specifically, we choose f and g of degree d ≈ (log n/log log n)^(1/3), and the F(a, b) and G(a, b) values we test for smoothness have size roughly n^(2/d) (versus n^(1/2) for QS).
• | Conjectured time for optimal parameter choice: |
• | Successfully factored 512-bit and 524-bit composites, at considerable effort. |
• Appears scalable to 1024-bit composites using custom-built hardware, thereby threatening the security of RSA-1024 public key cryptosystems, which rely on the hardness of factoring composite integers of that order.
A recent research trend has been to design special-purpose hardware on which factoring algorithms such as the number field sieve might be faster or more cost-effective than on conventional general-purpose computers. Among the noteworthy proposals are Shamir's TWINKLE machine[52], Bernstein's circuits[53] and the TWIRL machine of Shamir and Tromer[54]. Shamir and Tromer[54] estimate that the relation-generation stage of the number field sieve for factoring a 1024-bit RSA modulus could be completed in less than a year by a machine that would cost about US$ 10 million to build, and that the linear algebra stage is easier. Such special-purpose hardware has yet to be built (unless it has been built in secret), so it remains to be seen whether this work will have any impact on the size of RSA moduli used in practice.
CONCLUSIONS
In this review paper an attempt has been made to present, in a simple manner, the very difficult topic of integer factorization that is the cornerstone of the security behind RSA cryptosystems. As a review paper, most of the work presented here has been sourced from the various research works that came to our notice. We hope it will be of good use to future budding cryptanalysts and number theorists. It has been pointed out that the security of RSA depends on the difficulty of factorizing n into its constituent primes p and q. With the increase in the speed of computers today and the improvement of factorization methods, it becomes increasingly feasible to factorize large numbers. From this perspective, choosing a longer key is essential to safeguard against attacks based on factorization. However, a longer key incurs higher overhead in performing encryption and decryption. The question for the crypto designer to ponder is: given a certain amount of budget and time, what should be the length of the key that is reasonably secure without causing unnecessary performance degradation?