Hardware Implementations of GF (2m) Arithmetic Using Normal Basis

Journal of Applied Sciences

Year: 2006 | Volume: 6 | Issue: 6 | Page No.: 1362-1372
DOI: 10.3923/jas.2006.1362.1372

Hardware Implementations of GF (2^m) Arithmetic Using Normal Basis

Turki F. Al-Somani and Alaaeldin Amin

Abstract: This study presents a survey of algorithms used in field arithmetic over GF (2^m) using normal basis and their hardware implementations. These include the following arithmetic field operations: addition, squaring, multiplication and inversion. This study shows that the type II Sunar-Koc multiplier is the best multiplier with a hardware complexity of m² AND gates + XOR gates and a time complexity of T_A+ (1+ l log₂ (m) l )T_x. The study also show that the Itoh-Tsujii inversion algorithm was the best inverter and it requires almost log₂(m-1) multiplications.

Fulltext PDF Fulltext HTML

How to cite this article

Turki F. Al-Somani and Alaaeldin Amin, 2006. Hardware Implementations of GF (2^m) Arithmetic Using Normal Basis. Journal of Applied Sciences, 6: 1362-1372.

Keywords: inversion, Normal Basis, multiplication and GF (2m) arithmetic

INTRODUCTION

Efficient computations in finite fields and their architectures are important in many applications, including coding theory, computer algebra systems and public-key cryptosystems (e.g., elliptic curve cryptosystems (ECC) (Koblitz, 1987). Although all finite fields of the same cardinality are isomorphic, their arithmetic efficiency depends greatly on the choice of the basis use for field element representation. The most commonly used basis are polynomial basis (PB) and normal basis (NB) (Menezes, 1993; Menezes et al., 1993). Normal basis is more suitable for hardware implementations than polynomial basis since operations in normal basis representation are mainly comprised of rotation, shifting and exclusive-ORing which can be efficiently implemented in hardware (Lidl and Niederreiter, 1994). This study surveys GF (2^m) field arithmetic operations and algorithms.

A normal basis in GF (2^m) is a basis of the form where β ∈ GF (2^m). In normal basis, an element A ∈ GF (2^m) can be uniquely represented in the form where, α_i ∈ {0,1}. Arithmetic operations using normal basis in GF (2^m) are performed as follows:

Addition: Addition is performed by a simple bit-wise exclusive-OR (XOR) operation.

Squaring: Squaring is simply a rotate left operation. Thus, if , then .

Multiplication: ∀ A, B ∈ GF (2^m), where

and

The product C = A * B, is given by:

Multiplication is defined in terms of a set of m multiplication matrices λ^(k) (k = 0, 1,…, m-1).

where,

The number of non-zero elements in the λ matrix defines the complexity of the multiplication process and accordingly the complexity of its hardware implementation. This value is denoted as C_N and is equal to 2m-1 for optimal normal basis (ONB) (Mullin et al., 1989). An optimal normal basis is one with the minimum possible number of non-zero elements in the λ_ij matrix. Such basis typically lead to efficient hardware implementations since operations are mainly comprised of rotation, shifting and exclusive-OR operations.

Derivation of values of the λ matrix elements is dependent on the filed size m. There are two types of optimal normal basis that are refereed to as Type I and Type II (Mullin et al., 1989). An ONB of Type I exists a given field GF (2^m) if:

•	m + 1is a prime
•	2 is a primitive in GF (m + 1)

On the other hand, a Type II optimal normal basis exists in GF (2^m) if:

•	2m + 1 is prime
•	either 2 is a primitive in GF (2m + 1) or 2m + 1 ≡ 3 (mod 4) and 2 generates the quadratic residues in GF (2m + 1)

An ONB exists in GF (2^m) for 23% of all possible values of m (Mullin et al., 1989). The λ^(k) matrix can be constructed by a k-fold cyclic shift to λ⁽⁰⁾ as follows:

for all 0 ≤ i, j, k ≤ m-1

The λ⁽⁰⁾ matrix is derived differently for the two types of ONB. For the Type I ONB, λ_ij⁽⁰⁾=1 iff i and j satisfy one of the following two congruencies (Rosing, 1999):

•	2ⁱ + 2 ≡ 1 mod (m + 1)
•	2ⁱ + 2^j ≡ 0 mod (m + 1)

For Type II ONB, λ_ij^(k)=1 iff i and j satisfy one of the following four congruencies (Rosing, 1999):

•	2ⁱ + 2^j ≡ 2^k mod (2m + 1)
•	2ⁱ + 2^j ≡ -2^k mod (2m + 1)
•	2ⁱ - 2^j ≡ 2^k mod (2m + 1)
•	2ⁱ - 2^j ≡ -2^k mod (2m + 1)

Therefore, λ_ij⁽⁰⁾=1 iff i and j satisfy one of the following four congruencies:

2ⁱ ± 2^j ≡ ±1 mod (2m + 1)

Example 1: The λ matrix of Type I ONB over GF (2⁴) is constructed as:

Example 2: The λ matrix of Type II ONB over GF (2⁵) is constructed as:

Inversion: Inverse of α ∈ GF (2^m), denoted as a^-1, is defined as follows.

aa^-1 ≡ 1 mod 2^m

Most inversion algorithms are derived from Fermat’s Little Theorem, where

for all α ≠ 0 in GF (2^m).

MULTIPLICATION

Multiplication is more complicated than addition and squaring operations in finite field arithmetic. An efficient multiplier is highly needed and is the key for efficient finite field computations. Finite filed multipliers using normal basis can be classified into two main categories: (1) λ-matrix based multipliers and (2) Conversion based multipliers.

λ-Matrix Based Multipliers: Massey and Omura (1986) proposed an efficient normal basis bit-serial multiplier over GF (2^m). The Massey-Omura multiplier requires only two m-bit cyclic shift registers and combinational logic. The combinational logic consists of a set of AND and XOR logic gates. The first implementation of the Massey-Omura multiplier was reported by Wang et al. (1985). The space complexity of the Massey-Omura multiplier is (2m - 1) AND gates + (2m - 2) XOR gates, while the time complexity is T_A + (1 + log₂ (m - 1)) T_X, where T_A and T_X are the delay of one AND gate and one XOR gate, respectively.


Fig. 1:	GF (2^m) bit-serial Massey-Omura multiplier

One advantage of the Massey-Omura multiplier is that it can be used with both types of the optimal normal basis (Type I and Type II). Another advantage is that it is a bit-serial multiplier and hence the same circuitry used to generate c₀ can be used to generate c_i (i = 1, 2,…, m - 1). The bit-parallel version of the Massey-Omura multiplier requires (2m² - m) AND gates + (2m² - 2m) XOR gates.

Example 3: Let A, B and C ∈ GF (2⁵). Using the λ matrix in Example 2, the product C is found as:

The corresponding GF (2^m) bit-serial Massey-Omura multiplier is shown in Fig. 1. c₁ is generated by simple 1-bit rotation of the content of the two registers in Fig. 1. Figure 1 shows the case where c₀ is generated. Other product bits (c₁, c₂,…, c_m-1) are generated in a similar manner.

Let A = B = (11000), | A x B = A² = (11000)²= (10001) = C. Now using the Massey-Omura multiplier we apply the above formulas to get:

Hasan et al. (1993) proposed a modified version of the Massey-Omura parallel multiplier which works only with ONB Type I. The basic idea is to decompose the λ^m-1 matrix as a sum of two matrices P and Q.

λ^m-1 = P +Q (mod 2)

where the (i, j) entry of P is defined as follows:

The product c_m-1-k can be obtained as:

where A^(k) and B^(k) are the k-cyclic shifted vectors of A and B. Hasan et al. (1993) noticed that ĉ is independent of k and is present in each c_m-1-k. Hence, it can be precomputed once at the beginning thus reducing the multiplier’s hardware complexity. Compared to the Massey-Omura parallel multiplier, this multiplier requires only m² - 1 XOR gates and the same number of AND gates. However, the time complexity of this modified parallel multiplier is the same as the Massey-Omura parallel multiplier and the number of XOR gates is still O (m²).

Example 4: Let A, B and C ∈ GF (2⁴). Using the λ₃ matrix in Example 1, P and Q is found as:

Let A = B = (11000), | A x B = A² = (11000)²= (1001) = C.

1. Since ĉ = APB^t, we have:
ĉ = [0 0 1 1] .P. [0 0 1 1]^t = 0,
1. Since ĉ_m-1-k = A^(k) QB^(k)t, we have:

Alternatively, Gao and Sobelman (2000) noticed that the Massey-Omura bit-serial multiplier is constructed using an AND plane and an XOR plane.


Fig. 2:	XOR plane, AND and XOR planes of Gao and Sobelman multiplier

Gao and Sobelman (2000) suggested to rearrange these planes into three planes: XOR-plane, AND-plane and another XOR-plane as shown in Fig. 2. This is achieved by rearranging the multiplication equation as:

The Gao and Sobelman (2000) multiplier requires the same number of XOR gates and has the same time complexity as the Massey-Omura multiplier. The only improvement was reducing the number of AND gates to only m² AND gate compared to (2m²- m) AND gates for the Massey-Omura multiplier.

Example 5: Let A, B and C ∈ GF (2⁵). Using the λ matrix in Example 2 and Gao and Sobelman (2000) multiplier, the product C is found as:

Let A = B = (11000), ⇒ A x B = A² = (11000)²= (10001) = C, we have:

Reyhani-Masoleh and Hasan (2004) presented another multiplier based on the same idea of rearranging the multiplier’s XOR and AND planes (Gao and Sobelman, 2000) but with a different formulation. The product terms are reformulated as:

where Φ_k contains the coordinates of 1’s in the upper part of the λ_k matrix. The space complexity is (m²) AND + (3m²-3m) XOR gates while the time complexity is T_A + R log₂ (2m - 1)X T^X. This means that the Gao and Sobelman multiplier is more efficient than the reported multiplier by Reyhani-Masoleh and Hasan (2004).

Example 6: Let A, B and C ∈ GF (2⁵). Using the λ matrix in Example 2 and Reyhani-Masoleh and Hasan multiplier, we have:

Thus, the product C can be found as follows:

The corresponding GF (2^m) bit-serial Reyhani-Masoleh and Hasan multiplier is shown in Fig. 3. Let A = B = (11000), ⇒ A x B = A² = (11000)²= (10001) = C, we have:

Table 1:	The λ-based multipliers and their space and time complexities


Fig. 3:	GF (2^m) bit-serial Reyhani-Masoleh and Hasan multiplier

Reyhani-Masoleh and Hasan (2002) have also reported a parallel multiplier that takes advantage of the symmetry of the λ matrix and reduced redundancy in the λ matrix. This is achieved by rewriting the λ matrix as λ = U +U^T + D, where D is the diagonal matrix and U is the upper triangular matrix having zeros at diagonal entries. Thus, the multiplication product can be written as follows:

The U matrix is reformulated according to the value of m (even or odd). However, the proposed multiplier’s space complexity is (m²) AND + (m² - 1) XOR and the time complexity is T_A + (1 + log₂ (m -1)) T_X which represents minor gain as compared to other multipliers. Table 1 compares the λ-based multipliers space and time complexities.

CONVERSION BASED MULTIPLIERS

Koc and Sunar (1998) proposed a new bit-parallel multiplier over GF (2^m). The multiplier was employed to perform Type I optimal normal basis multiplication by converting the two operands into canonical basis (a basis of the form (α^m-1,…, α², α¹,1), where α ∈ GF (2^m) is a root of the generating polynomial of degree m is called a canonical basis). After multiplication, the result is converted back to the normal basis. The proposed technique yields a slight improvement in the number of XOR gates as compared to the Massey-Omura multiplier. The number of required XOR gates is reduced down to m²-1 as compared to 2m²-m for the Massey-Omura multiplier, while its time complexity (T_A + (2 + log₂ (m -1)) T_X is slightly more than Massey-Omura’s time complexity. The main advantage of this technique, however, is the opening of a new direction in multiplications based on basis conversion which was first explored by Sunar and Koc (2001).

Sunar and Koc (2001) reported a new Type II optimal normal basis multiplier. The main idea of the work was based on converting the two operands into their equivalent representation in another basis, perform the multiplication in that basis and convert the product back to the normal basis. The conversion step requires only a single clock cycle since it is nothing but a permutation of the normal basis (Sunar and Koe, 2001).

Using optimal normal basis Type II and assuming that p = 2m + 1 is prime, another new basis which is obtained by a simple permutation of the normal basis elements was presented. Let β be γ + γ^-1, where γ is the primitive pth root of unity, the new basis can be written in the form γ + γ^-1, γ² + γ^-2+ γ³, + γ^-z,…, γ^m +γ^-m. Two elements Â and ∈ GF (2^m)can be represented in the new basis as:

and

The product Ĉ = Â. is written as:

This product can transformed to the following form:

The term Ĉ₁ has the property that the exponent (i-j) of γ is already within the proper range, i.e., -m ≤ (i-j) ≤ m for all I, j ∈ [1, m]. The term Ĉ₂ should be ensured to be in the proper range. Hence, Ĉ₂ is computed as follows:

The exponents of the basis elements in are guaranteed to be in the proper range 1 ≤ (I + j) ≤ m for I = 1, 2,…, m and j = 1, 2,…, m-i. If k = i + j, then product contributes to the basis element β_k as i and j take these values. The basis elements of , however, are all out of range. Thus, the identity γ^2m+1 is used to bring them to the proper range as:

Therefore, if k = i + j > m, β_k is replaced by β_2m+1+k and thus the final product can be found as:

The hardware complexity of this bit-parallel multiplier is m² AND gates + 3/2 (m² - m) XOR gates and the time complexity is T_A + (1 + R log₂mX ) T_X. This represents an improvement of about 25 percent less XOR gates compared to the Massey-Omura multiplier but with slightly more delay. A major advantage of this method, however, is the fact that there is no need to store the λ matrix to perform multiplications which requires m³ locations if all λ matrices are stored or m² if only λ⁽⁰⁾ is stored. This is a major advantage in area constrained environments since we only need to store the permutation array which requires only mlog₂ (m) storage locations.

Example 7: Let A, B and C ∈ GF (2⁵). Since 2m + 1 = 2.5 + 1 = 11 and 2 is primitive in modulo 11, there exists an optimal basis of Type II for the field GF (2⁵), which is of the form where, Using the identity γ¹¹ = 1, we find the converted form as follows:

•	The first three exponents 1, 2 and 4 are in the proper range [1, 5].
•	We have 16 = 15 (mod 11), which brings the exponent 16 to the proper range.
•	In order to bring 8 to the range [1, m] = [1, 5], we use the identity γ⁸ = γ^11-8= γ³.


Fig. 4:	The construction of Ĉ₁, , and Ĉ in GF (2⁵) in (Sunar and Koc, 2001)

Thus the converted form is (β₁, β₂, β₄, β₃, β₅). Figure 4 shows the construction of Ĉ₁, , .

The product C can be found as:

Let A = B = (11000) ⇒ A x B = A² = (11000) ² = (10001) C, Â = = (10100), we have:

Ĉ = (10001) and the product C after conversion back to the normal basis is also (10001).

Alternatively, Wu et al. (2002) extended the work described by (Gao and Vanstone, 1995) and presented a new Type II multiplier. The basic idea is the same as that of (Sunar and Koc, 2001). Multiplication is performed by converting the two operands into another basis which is simply a permutation of the normal basis, perform the multiplication and convert the product back to the original normal basis. The difference is only in the multiplication part. The product can be found in the new basis as:

where s (i) is defined as:

Table 2:	The conversion based multipliers and their space and time complexities


Fig. 5:	The Wu et al., (2002) multiplier

The hardware complexity of this multiplier is m² AND gates + 2m² - m XOR gates which is worse than the Sunar and Koc multiplier (2001) as shown in Fig. 5. However, the time complexity is T_A + (1 + R log₂mX ) T_X which is the same as the multiplier reported by (Sunar and Koc, 2001).

Example 8: Let A, B and C ∈ GF (2⁵). Using the same conversion procedure illustrated in Example 7, the product C can be found as:

Let A = B = (11000), ⇒ AxB = A² = (11000)² = (10001) C, Â = = (10100), we have:

Ĉ and the product C after conversion back to the normal basis is also (10001).

Table 2 shows the conversion based multipliers and their space and time complexities.

INVERSION

Inversion using normal basis consists of multiplications and cyclic shifts. Since cyclic shifts require almost negligible time, the number of multiplications is the key parameter for efficient inversion. In the following subsections we survey existing algorithms for inversion over GF (2^m) using normal basis. These inversion algorithms can be classified into three main categories: (1) Standard, (2) Exponent Decomposing and (3) Exponent Grouping inversion algorithms.

Standard inversion algorithm: This category contains only one algorithm, which is the first proposed normal basis inversion algorithm over GF (2^m) in Wang et al. (1985). The basic idea is derived from Fermat’s Little Theorem where the inverse of a ∈ GF (2^m) is given by since , we can express a^-1 as:

Algorithm 1: Wang’s et al. (1985) inversion algorithm.

Inputs: a

Output: a^-

•	B = a², C = 1 and k = 0.
•	D = B x C and k = k + 1.
•	if k = m-1, a^-1 = D, Stop.
•	if k < m-1, B = B² and C = D.
•	Go back to 2.

This only requires multiplication and cyclic shift operations. The algorithm procedure for computing a^-1 as suggested by Wang et al. (1985) is shown in Algorithm 1. It requires (m-2) multiplications + (m-1) cyclic shifts. The advantage of this method is its simplicity while its disadvantage is the large O (m) number of multiplications.

Example 9: A and A^-1C ∈ GF (2⁵). The inverse of A is . By applying Wang’s et al. (1985) inversion algorithm we have:

Step 1:	B =A², C = 1, k = 0
Step 2:	D = BxC = A²x1 =A², k = 1
Step 4:	k =1 < 4 \| B = B² = (A²)² = A⁴, C = A²
Step 2:	D = B x C = A⁴ x A² =A⁶, k = 2
Step 4:	k = 2 < 4 \| B = B² = (A⁴)² = A⁸, C = A⁶
Step 2:	D = BxC = A⁸ xA⁶ =A¹4, k = 3
Step 4:	k = 3 < 4 \| B = B² = (A⁸)² = A¹6, C = A¹4
Step 2:	D = B x C =A¹6xA¹4 = A³⁰, k = 4
Step 3:	k = 4 = 4 \| A^-1 = D = A³⁰, STOP.

Exponent decomposing inversion algorithms: Since the number of multiplications is the dominant factor in determining the computation time of the inversion operation, several algorithms attempted to improve the inversion speed by decomposing the exponent to reduce the required number of multiplications and replace it with squaring operations which are much simpler compared to multiplication.

In 1988 Itoh and Tsujii (1988) proposed a GF (2^m) inversion algorithm derived from Fermat’s Little Theorem using normal basis. The basic idea used was to decompose the exponent m-1 as follows:

The exponent 2m¯1 is further decomposed as follows:

1.If m is odd, then

and

2.If m is even, then

and

The proposed algorithm by (Itoh and Tsujii, 1988) is shown in Algorithm 2 and it requires log₂ (m-1)+v (m-1)-1 multiplications, where v (x) is the number of 1’s in the binary representation of x.

Algorithm 2: Itoh-Tsujii inversion algorithm.

Inputs: a
Output: l = a^-1
1.	set s ← ⌊log₂ (m-1)⌋-1.
2.	set p ← a.
3.	for i = s down to 0 do
3.1.	set r ← shift m-1 to right by s bit(s)
3.2.	set q ← p
3.3.	rotate q to left by⌊ r/2⌋ bit(s)
3.4.	set t ← p x q
3.5.	if last bit of r = 1,
3.5.1.	rotate t to left by 1 bit.
3.5.2.	p ← t x a
3.5.	else
3.5.3.	p ← t
3.6.	s ← s-1
4.	rotate p to left by 1 bit.
5.	set l ← p.
6.	return l.

Example 10: Let A and A^-1 ∈ GF (2⁵). The inverse of A is . By applying Itoh and Tsujii inversion algorithm we have:

step 1.	S = 2-1 = 1
step 2.	P = A
step 3.	i = 1
step 3.1.	r = 2
step 3.2.	q = p = A
step 3.3.	q = q² = A²
step 3.4.	t = p x q = A x A² = A³
step 3.5.3.	p = t = A³
step 3.6.	s = s -1 = 1-1 = 0
step 3.	i = 0
step 3.1.	r = 4
step 3.2.	q = p = A³
step 3.3.	q = (q²)² = (( A³)²)² =A¹²
step 3.4.	t = p x q =A³ x A¹² = A¹⁵
step 3.5.3.	p = t = A¹⁵
step 3.6.	s =s -1 = 0-1 = -1
step 4.	P = P² = (A¹⁵)² = A³⁰
step 5.	i = p = A³⁰
step 6.	return l = A³⁰

Feng (1989) has also proposed an inversion algorithm which requires the same time complexity as the Itoh and Tsujii inversion algorithm, i.e., O (log₂ (m)). The inversion algorithm was also derived from Fermat’s Little Theorem and is also based on exponent decomposition as the Itoh and Tsujii inversion algorithm. Feng defined m-1 as follows:

Let m_qm_q-1… m₁m₀ be the binary representation of (m-1), where m_q = 1 and m_i is 0 or 1 for i = 0 to q-1, i.e., m-1 = m_q 2^q + m_q¯1 2_q¯1 +… + m₁ 2¹ + m₀ 2⁰. The inverse a^-1 can be computed using Algorithm 2.

Algorithm 3: Feng’s inversion algorithm.

Inputs: a

Output: a^-1

1.	b = a
2.	for i = q to 1 do
2.1.	if m_i = 1, then .
2.2.	.
2.3.	if m_i-1 = 1, then b = b x a.
3.	, Stop.

Only step 2.2 and 2.3 in Algorithm 3 require multiplications. Step 2.1 and 3 need only cyclic shifts. Thus, the algorithm has (q + p) multiplications where q = ⌊ log₂ (m) ⌋ and, p = # 1’s in the binary expression of (m-1). Accordingly, the proposed algorithm has the same complexity as the Itoh and Tsujii inversion algorithm. The difference is only that the Itoh and Tsujii algorithm performs squaring during each iteration, while Feng’s algorithm computes the square roots each iteration and squares at the end by 2^m-m0.

Example 11: Let A and A^-1∈ GF (2⁵). The inverse of A is . By applying Feng’s inversion algorithm we have: q = 2, m-1 = 4, m₂ = 1, m₁ =0, m₀ =0 and p = 0,

step 1.	b = A
step 2.	i = 2
step 2.1.
step 2.2.	=
step 2.	i = 1
step 2.2.
step 3.

Takagi et al. (2001) proposed another inversion algorithm which was also based on exponent decomposition. The main idea is as follows: Since

Hence, the algorithm can use the inversion algorithm by (Itoh and Tsujii, 1988) by replacing m by m - h. This method reduces the number of multiplications through performing an exhaustive search for an optimal value of h. The time complexity, however, is O (log₂ (m)).

EXPONENT GROUPING INVERSION ALGORITHMS

Grouping exponent terms is another approach which attracted some researchers. However, the resulting speed is O (m) which is worse than the O (log₂ (m)) of the Itoh and Tsujii inverter. The inversion algorithms in this category are based on the idea of grouping exponent terms as follows:

Since , this allows grouping exponent terms in different ways. Fenn et al. (1996) proposed an inversion algorithm which requires m/2 multiplications. The basic idea was by dividing the exponent terms into two equal groups. The first group contains a², a⁴,.., a^k, where, k = 2^(m-1)/2. The other group contains the remaining terms till . By multiplying the first term in each group and repeated squaring by , the following formula can be applied to give the inverse within m/2 multiplications.

Example 12: Let A and A^-1∈ GF (2⁵), m = 5 is odd. The inverse of A is . By applying Fenn’s inversion algorithm we have:

Example 13: Let A and A^-1∈ GF (2⁸), m = 8 is even. The inverse of A is . By applying Fenn’s inversion algorithm we have:

A further improvement, however, have been reported by Jimenez Calvo and Torres (1997). The Jimenez Calvo and Torres (1997) inversion algorithm uses a fixed seed a². a⁴ = a⁶, which uses the same idea of grouping into two groups but in a different manner. This can be shown as:

Table 3:	The inverters and their time complexities

Example 14: Let A and A^-1∈ GF (2⁸), m = 8 is even. The inverse of A is . By applying Jimenez Calvo and Torres inversion algorithm we have:

This method was generalized by choosing m’ and b such that m = bm' + r, where r = [1,2]. This is basically the same way trying to group more terms in the seed at the beginning.

This requires around m/m’ multiplications but still with the same O (m) time complexity.

Example 15: Let A and A^-1∈ GF (2¹⁶), m = 16 is even. The inverse of A is . By applying Jimenez Calvo and Torres generalized inversion algorithm we have:

Let b = 3, 5 and r = 1. First we calculate as follows:

Then applying the generalized equation we get:

Yen (1997) proposed another approach for grouping terms which results in reduced number of clock cycles. Yen realized that the fundamental idea behind Fenn’s et. al. inverter is to rearrange all terms into (m-1)/2 groups and to extract the common term (). Accordingly, Yen redefined the common term to contain p terms and a^-1 can be found as:

where the common part X is precomputed as:

The best case requires k + (p - 1) multiplications while the worst case requires k + (p - 1) + (p - 2) = k + 2p - 3 multiplications and the optimal selection of p for m = pk + 1 is . The time complexity, however, is still O (m). Table 3 summarizes the inversion algorithms and their time complexities.

CONCLUSIONS

In this study we presented a survey on finite field arithmetic using normal basis over GF (2^m). Addition requires simple XOR operation while squaring requires only a cyclic shift. This is the most attractive property of using normal basis rather than using polynomial basis which is more suitable for software implementations.

We categorized normal basis multipliers into two main categories: (1) λ-matrix based multipliers and (2) Conversion based multipliers. The multipliers in (2) are more efficient since they don’t require storing the λ-matrix. The comparisons shows that the Type II Sunar-Koc multiplier is the best multiplier with a hardware complexity of m² AND gates + 3/2m (m-1) XOR gates and a time complexity of T_A + (1 + ⌈log₂m⌉) T_X.

We also categorized the inversion algorithms into three main categories: (1) Standard, (2) Exponent Decomposing and (3) Exponent Grouping. Inversion algorithms in category (3) performed better than the standard with time complexity O (m). The exponent decomposing inversion algorithm of Itoh and Tsujii, however, is found to be the best inverter requiring only log₂ (m-1) multiplications.

ACKNOWLEDGEMENT

The authors would like to acknowledge the support of King Abdul Aziz City for Science and Technology for the grant of the research No. AR-22-17. The authors would like also to acknowledge the support of King Fahd University of Petroleum and Minerals.

REFERENCES

Feng, G.L., 1989. A VLSI architecture for fast inversion in GF (2m). IEEE Trans. Comput., 38: 1383-1386.
Direct Link

Fenn, S.T.J., M. Benaissa and D. Taylor, 1996. Fast normal basis inversion in GF (2m). Elect. Lett., 32: 1566-1567.
Direct Link

Gao, S. and S. Vanstone, 1995. On orders of optimal normal basis generators. Math. Comput., 64: 1227-1233.
Direct Link

Gao, L. and G.E. Sobelman, 2000. Fast normal basis inversion in GF (2m). Proceedings of 13th Annual IEEE International ASIC/SOC Conference, Sept. 13-16, USA., pp: 97-101.

Hasan, M.A., M.Z. Wang and V.K. Bhargava, 1993. A modified Massey-Omura parallel multiplier for a class of finite fields. IEEE Trans. Comput., 42: 1278-1280.
Direct Link

Itoh, T. and S. Tsujii, 1988. A fast algorithm for computing multiplicative inverses in GF(2^m) using normal basis. Inform. Comput., 78: 171-177.
Direct Link

Menezes, A., 1993. Elliptic Curve Public Key Cryptosystems. 1st Edn., Kluwer Academic Publishers, New York

Koblitz, N., 1987. Elliptic curve cryptosystems. Maths. Comput., 48: 203-209.

Koc, C.K. and B. Sunar, 1998. Low-complexity bit-parallel canonical and normal basis multipliers for a class of finite fields. IEEE Trans. Comput., 47: 353-356.
Direct Link

Lidl, R. and H. Niederreiter, 1994. Introduction to Finite Fields and Their Applications. Cambridge University Press, Cambridge, UK

Massey, J.L. and J.K. Omura, 1986. Computational method and apparatus for finite field arithmetic. US Patent No. 4: 587627. http://www.freepatentsonline.com/4587627.html.

Menezes, A.J., 1993. Elliptic Curve Public Key Cryptosystems. Kluwer Academic Publishers, New York

Menezes, A.J., I.F. Blake, X. Gao, R.C. Mullin, S.A. Vanstone and T. Yaghoobian, 1993. Applications of Finite Fields. Kluwer Academic Publishers, Boston, MA

Mullin, R.C., I.M. Onyszchuk, S.A. Vanstone and R.M. Wilson, 1988. Optimal normal bases in GF(pm). Discrete Applied Maths., 22: 149-161.

Reyhani-Masoleh, A. and M.A. Hasan, 2002. A new construction of Massey-Omura parallel multiplier over GF (2m). IEEE Trans. Comput., 51: 511-520.
Direct Link

Reyhani-Masoleh, A. and M.A. Hasan, 2004. Efficient digit-serial normal basis multipliers over binary extension fields. ACM Trans. Embedded Comput. Systems, 3: 575-592.
Direct Link

Rosing, M., 1999. Implementing Elliptic Curve Cryptography. Manning Publications Co., USA

Sunar, B. and C.K. Koc, 2001. An efficient optimal normal basis type II multiplier. IEEE Trans. Comput., 50: 83-87.
CrossRef Direct Link

Takagi, N., J. Yoshiki and K. Takagi, 2001. A fast algorithm for multiplicative inversion in GF (2m). Using normal basis. IEEE Trans. Comput., 50: 394-398.
CrossRef Direct Link

Wang, C.C., T.K. Truong, H.M. Shao, L.J. Deutsch, J.K. Omura and I.S. Reed, 1985. VLSI architectures for computing multiplications and inverses in GF (2m). IEEE Trans. Comput., 34: 709-716.
CrossRef Direct Link

Wu, H., A. Hasan, I. Blake and S. Gao, 2002. Finite field multiplier using redundant representation. IEEE Trans. Comput., 51: 1306-1316.
CrossRef Direct Link

Yen, S., 1997. Improved normal basis inversion in GF (2m). Elect. Lett., 33: 196-197.
Direct Link

HOME JOURNALS CONTACT