Abstract: This study presents a novel modular multiplication algorithm based on a new multiplier Non-Adjacent Form (NAF). In this new multiplier representation, the sliding window method is applied on the canonical recoded multiplier to reduce the average number of multiplier digits and thereby the average number of the required clock cycle for the modular multiplication algorithm. Moreover, a new and efficient modular exponentiation algorithm is presented based on this new modular multiplication algorithm, Common-Multiplicand-Multiplication (CMM) method and parallel structure. In this new CMM-NAF modular exponentiation algorithm, not only the common part of the modular multiplication is computed once rather than several times but also the modular multiplication and modular squaring operations are performed in parallel. Our analysis shows that the average number of required clock cycle in the proposed CMM-NAF modular exponentiation algorithm is reduced in comparison with other modular exponentiation algorithms considerably.
INTRODUCTION
Recently, cryptography algorithms and protocols have been used to provide the information security (Wang, 2011; Sharma et al., 2006; Zaidan et al., 2010; Jaberi et al., 2012; Nikooghadam et al., 2008). Public-Key Cryptography (PKC) is an important component of the cryptography (Olorunfemi et al., 2007; Xia et al., 2010; Azeez et al., 2012; Salem et al., 2011). The modular exponentiation with large modulus is a crucial operation in many PKCs such as RSA and DSA (Rabah, 2006). This operation is implemented by repeating modular multiplication. So, the efficiency of many PKCs is determined by the efficiency of the modular multiplication algorithm and the number of required modular multiplication in the modular exponentiation algorithm (Nedjah and Murelle, 2009a; Wu, 2009a).
Montgomery modular multiplication algorithm (Montgomery, 1985) is an efficient algorithm for the modular multiplication because it avoids division by the modulus (Wu, 2009a; Nedjah and Murelle, 2009a; Xiangyan et al., 2009).
There are many research efforts to speed up the performance of the Montgomery modular multiplication algorithm such as high-radix design (Pinckney and Harris, 2008; Tawalbeh et al., 2005), scalable design (Shieh and Lin, 2010; Pinckney and Harris, 2008), parallel calculation quotient and partial result (Keshavarzi and Harrison, 1998) and signed-digit recoding (Koc and Hung, 1992; Philips and Burgess, 2004).
Moreover, there are many research efforts to reduce the number of modular multiplication such as sliding window method (Nedjah and Murelle, 2009a, b), signed-digit recoding (Wu, 2009a; Wu et al., 2008) and Common-Multiplicand-Multiplication (CMM) method (Ha and Moon, 1998; Wu, 2009a, b; Rezai and Keshavarzi, 2011).
This study presents a new modular multiplication algorithm based on the Non-Adjacent Form (NAF) and sliding window method. In this new algorithm, sliding window method is applied on the canonical recoded multiplier to reduce the average number of the multiplier digits and thereby, the average number of the required clock cycle (multiplication step) for the modular multiplication algorithm. Moreover, this study presents a novel modular exponentiation algorithm, called CMM-NAF modular exponentiation algorithm, by using this new modular multiplication, the CMM method and the parallel structure. Present analysis shows that the average number of required clock cycle in the CMM-NAF exponentiation algorithm is reduced in comparison with other CMM modular exponentiation algorithms considerably.
PRELIMINARIES
The Montgomery modular multiplication algorithm: Montgomery (1985) proposed a modular multiplication algorithm which speeds up the modular multiplication and modular exponentiation algorithm. The Montgomery modular multiplication algorithm replaces the trial division by the modulus with a simple right shift. Algorithm 1 shows the radix-2 Montgomery modular multiplication algorithm.
Algorithm 1: | The radix-2 Montgomery modular multiplication algorithm(Mont(X,Y)) |
In this algorithm, the inputs are n-bit X, Y and M. The output is S (n) = XY2-n mod M. This algorithm computes S(n) = XY2-n mod M in n-clock cycle. So, it is a time-consuming operation.
The modular exponentiation algorithm: As modular exponentiation consists of series of the modular multiplications, the performance of the modular multiplication is determined by the efficiency of the implementation of the modular multiplication (Wu, 2009a; Nedjah and Murelle, 2009a). The Right-to-Left (R2L) Montgomery modular exponentiation algorithm is shown in Algorithm 2:
Algorithm 2: | The R2L Montgomery modular exponentiation algorithm |
In this algorithm, the inputs are A, E, R = 2n and N. The output is C = AE mod N. In Algorithm 2, when the exponent bit is nonzero, both Mont (S, C) and Mont (S, S), are performed. Ha and Moon (1998) proposed the common part in Mont (S, C) and Mont (S, S) can be computed once rather than twice. There are many attempts (Ha and Moon, 1998; Wu et al., 2008; Wu, 2009a, b) to speed up the performance of the modular exponentiation algorithm based on this idea. One of the recent attempts is the parallel CMM-CSD Montgomery algorithm (Wu, 2009b) which is shown in Algorithm 3.
Algorithm 3: | The parallel CMM-CSD Montgomery modular exponentiation algorithm |
In this algorithm, the inputs are M, ECSD, N and R = 2n
where, ECSD is canonical recoded E. the outputs are
THE PROPOSED CMM-NAF MODULAR EXPONENTIATION
In serial-parallel multiplication, partial result shifts one bit per iteration. Moreover, multiplication by zero bit results in zero but this multiplication by zero is performed and implemented per iteration. In this study, we proposed a new modular multiplication by recoding and then by partitioning the multiplier. This new modular multiplication performs multiplication by zero partition with any length in only one clock cycle instead of several clock cycles. The proposed modular multiplication algorithm is shown in Algorithm 4.
Algorithm 4: | The proposed modular multiplication algorithm |
In this algorithm, the inputs are n-bit integer X, Y and M. The output is P = XY2-n mod M. m is the number of partitions in the multiplier, li is the length (i.e., the number of digits) of ith partition and Vi is the ith partition value.
In the recoding phase of this new algorithm, the canonical recoding is performed on the multiplier. In the partitioning phase, the partitioning is performed on the resulted signed-digit multiplier. The partitioning strategy instrumented in this algorithm scans the multiplier from the least significant digit to the most significant digit according to Algorithm 5. In this strategy, the zero windows are allowed to have an arbitrary length but the maximum length of the nonzero windows should be the exacted value of d digit.
Algorithm 5: | The partitioning algorithm |
In the pre-computation phase of Algorithm 4, the least significant
digit of the nonzero partition is either 1 or
The multiplication phase of Algorithm 4 is performed m times. Recall that, m denotes the number of partitioning in the signed-digit multiplier. In each iteration of the multiplication phase of algorithm 5, li bits of the multiplier and n-bit multiplicand are processed.
We also proposed using this new modular multiplication algorithm in order to speeding up the CMM-CSD Montgomery exponentiation algorithm (Wu, 2009b) as shown in Algorithm 6.
Algorithm 6: | The proposed CMM-NAF modular exponentiation algorithm |
In this algorithm, the inputs are M, ECSD, N and R = 2n
where, ECSD is canonical recoded E. the outputs are C = ME[1]
mod N; D = ME[-1] mod N. In this algorithm, for ei = 1,
both Z.C mod N and Z.S mod N are computed in parallel. Similarly, for ei
=
In step 1 of this algorithm, S is computed by using Algorithm
4. In step 2, Z is computed by executing steps 2-5 of Algorithm
4 on S after computing m0. R. In steps 3 and 4 of Algorithm
6,
EVALUATION
In the proposed modified modular multiplication algorithm, sliding window method is performed on the canonical recoded multiplier. So, the Hamming weight of the multiplier is 3n/3d+4. Therefore, based on the computational analyses of Ha and Moon (1998), for n-bit modulus, both operations Z.C mod N, and Z.D mod N require:
Multiplication steps (clock cycles). Moreover, the operation Z.S mod N requires multiplication steps:
In the proposed CMM-NAF modular exponentiation for ei = 1, both
Z.C mod N and Z.S mod N are performed in parallel. Similarly, for ei
=
However, the Dusse and Kaliskis exponentiation algorithm (Dusse and Kaliski, 1990), the Ha-Moons improved Montgomery exponentiation algorithm (Ha and Moon, 1998), the CMM-MSD algorithm (Wu et al., 2008), the Wus improved CMM-MSD exponentiation algorithm (Wu, 2009a) and the Wus parallel CMM-CSD exponentiation algorithm (Wu, 2009b) require 1.5k (2n2+n), 0.5k(5n2+4n), 0.5k(2n2+2n+0.75), 1.833k(n2-n-2) and 0.5k(2n2+2n+5.33) multiplication steps, respectively. So, the proposed modular exponentiation algorithm reduces the required multiplication steps (clock cycles) in comparison with other similar work as follow:
Figure 1 summarizes these multiplication steps (clock cycles) improvement for the proposed CMM-CSD modular exponentiation algorithm in comparison with exponentiation algorithm by Dusse and Kaliski (1990), Ha and Moon (1998), Wu (2009a,b) and Wu et al. (2008) for various window widths.
The results show that this new CMM-NAF exponentiation algorithm reduces the average number of required clock cycles by about 69.2-88.2, 63.1-85.9, 7.7-64.7, 49.6-80.7 and 7.7-64.7% in comparison with Dusse and Kaliski (1990), Ha and Moon (1998), Wu et al. (2008) and Wu (2009a, b), respectively for d = 3-10. In addition, the algorithm complexity of the proposed CMM-NAF modular exponentiation algorithm is reduced in comparison with Wu (2009a) and Rezai and Keshavarzi (2011).
Fig. 1: | The clok cycle (multiplication step) improvement of the proposed CMM-NAF modular exponentaition algorithm |
CONCLUSIONS
This study presents a novel modular exponentiation algorithm based on a novel modular multiplication, called CMM-NAF modular exponentiation algorithm. The CMM-NAF modular exponentiation uses other techniques such as CMM method and parallel processing. In the proposed modular multiplication algorithm, the sliding window method is performed on the signed-digit multiplier to reduce the average number of multiplier digits and thereby, to reduce the average number of the required clock cycles for the modular multiplication algorithm. Using the CMM method in the CMM-NAF modular exponentiation algorithm, the common part of the modular multiplication is computed once rather than several times. Moreover, by using the parallel processing, the speed of the proposed modular exponentiation increases considerably. The results show that the average number of required clock cycles (multiplication steps) in the proposed CMM-NAF exponentiation algorithm is reduced by 69.2-88.2, 63.1-85.9, 7.7-64.7, 49.6-80.7 and 7.7-64.7% in comparison with Dusse and Kaliski (1990), Ha and Moon (1998), Wu et al. (2008) and Wu (2009a, b), respectively for d = 3-10.