INTRODUCTION
Efficient computations in finite fields and their architectures are important in many applications, including coding theory, computer algebra systems and public-key cryptosystems (e.g., elliptic curve cryptosystems (ECC) (Koblitz, 1987)). Although all finite fields of the same cardinality are isomorphic, their arithmetic efficiency depends greatly on the choice of the basis used for field element representation. The most commonly used bases are the polynomial basis (PB) and the normal basis (NB) (Menezes, 1993; Menezes et al., 1993). Normal basis is more suitable for hardware implementations than polynomial basis since operations in normal basis representation mainly comprise rotation, shifting and exclusive-ORing, which can be efficiently implemented in hardware (Lidl and Niederreiter, 1994). This study surveys GF (2^{m}) field arithmetic operations and algorithms.
A normal basis in GF (2^{m}) is a basis of the form {β, β^{2}, β^{2^{2}},…, β^{2^{m−1}}}, where β ∈ GF (2^{m}). In normal basis, an element A ∈ GF (2^{m}) can be uniquely represented in the form A = a_{0}β + a_{1}β^{2} + … + a_{m−1}β^{2^{m−1}}, where a_{i} ∈ {0,1}. Arithmetic operations using normal basis in GF (2^{m}) are performed as follows:
Addition: Addition is performed by a simple bitwise exclusive-OR (XOR) operation.
Squaring: Squaring is simply a rotate left operation. Thus, if A = (a_{m−1} a_{m−2} … a_{1} a_{0}), then A^{2} = (a_{m−2} a_{m−3} … a_{0} a_{m−1}), i.e., a one-bit left rotation of the coordinate string.
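To make the coordinate conventions concrete, here is a minimal Python sketch of normal basis addition and squaring (our illustration, not from the original paper): coordinates are stored low-index-first as (a_0,…, a_{m−1}), and the bit strings used in the examples print the highest-index coordinate first.

```python
def nb_add(a, b):
    """Normal-basis addition: coordinate-wise XOR."""
    return tuple(x ^ y for x, y in zip(a, b))

def nb_square(a):
    """Squaring maps beta^(2^i) -> beta^(2^(i+1)): a cyclic shift of coordinates."""
    return a[-1:] + a[:-1]

def from_string(s):
    """Parse a bit string written highest-index-first, e.g. '11000'."""
    return tuple(int(c) for c in reversed(s))

def to_string(a):
    return ''.join(str(c) for c in reversed(a))

# Squaring (11000) rotates the string left, giving (10001).
```

This matches the worked examples later in the paper, where (11000)^2 = (10001) in GF (2^5).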
Multiplication: ∀ A, B ∈ GF (2^{m}), where A = a_{0}β + a_{1}β^{2} + … + a_{m−1}β^{2^{m−1}} and B = b_{0}β + b_{1}β^{2} + … + b_{m−1}β^{2^{m−1}}, the product C = A × B is given by:

c_{k} = Σ_{i=0}^{m−1} Σ_{j=0}^{m−1} a_{i} λ_{ij}^{(k)} b_{j}, k = 0, 1,…, m−1

Multiplication is defined in terms of a set of m multiplication matrices λ^{(k)} (k = 0, 1,…, m−1), where,

β^{2^{i}} β^{2^{j}} = Σ_{k=0}^{m−1} λ_{ij}^{(k)} β^{2^{k}}
The number of nonzero elements in the λ matrix defines the complexity of the multiplication process and accordingly the complexity of its hardware implementation. This value is denoted as C_{N} and equals 2m − 1 for an optimal normal basis (ONB) (Mullin et al., 1989). An optimal normal basis is one with the minimum possible number of nonzero elements in the λ matrix. Such bases typically lead to efficient hardware implementations since operations mainly comprise rotation, shifting and exclusive-OR operations.
Derivation of the values of the λ matrix elements depends on the field size m. There are two types of optimal normal bases, referred to as Type I and Type II (Mullin et al., 1989). An ONB of Type I exists in a given field GF (2^{m}) if:

• m + 1 is prime
• 2 is primitive in GF (m + 1)
On the other hand, a Type II optimal normal basis exists in GF (2^{m}) if:

• 2m + 1 is prime
• either 2 is primitive in GF (2m + 1), or 2m + 1 ≡ 3 (mod 4) and 2 generates the quadratic residues in GF (2m + 1)
An ONB exists in GF (2^{m}) for 23% of all possible values of m (Mullin et al., 1989). The λ^{(k)} matrix can be constructed by a k-fold cyclic shift of λ^{(0)} as follows:

λ_{ij}^{(k)} = λ_{i−k, j−k}^{(0)} (indices reduced modulo m), for all 0 ≤ i, j, k ≤ m − 1
The λ^{(0)} matrix is derived differently for the two types of ONB. For the Type I ONB, λ_{ij}^{(0)} = 1 iff i and j satisfy one of the following two congruences (Rosing, 1999):

• 2^{i} + 2^{j} ≡ 1 mod (m + 1)
• 2^{i} + 2^{j} ≡ 0 mod (m + 1)
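As an illustration (ours, not from the paper), the two congruences translate directly into a small routine that builds λ^{(0)} for a Type I field. For m = 4 (m + 1 = 5 is prime and 2 is primitive modulo 5) the nonzero count should equal C_N = 2m − 1 = 7.

```python
def type1_lambda0(m):
    """Build the Type I lambda^(0) matrix from 2^i + 2^j ≡ 0 or 1 (mod m+1)."""
    p = m + 1
    pw = [pow(2, i, p) for i in range(m)]   # residues of 2^i modulo m+1
    return [[1 if (pw[i] + pw[j]) % p in (0, 1) else 0
             for j in range(m)]
            for i in range(m)]

lam = type1_lambda0(4)
nonzeros = sum(map(sum, lam))   # 2*4 - 1 = 7 for an optimal normal basis
```

The same check works for any Type I field, e.g. m = 10 gives 2·10 − 1 = 19 nonzero entries.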
For Type II ONB, λ_{ij}^{(k)} = 1 iff i and j satisfy one of the following four congruences (Rosing, 1999):

• 2^{i} + 2^{j} ≡ 2^{k} mod (2m + 1)
• 2^{i} + 2^{j} ≡ −2^{k} mod (2m + 1)
• 2^{i} − 2^{j} ≡ 2^{k} mod (2m + 1)
• 2^{i} − 2^{j} ≡ −2^{k} mod (2m + 1)

Therefore, λ_{ij}^{(0)} = 1 iff i and j satisfy:

2^{i} ± 2^{j} ≡ ±1 mod (2m + 1)
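Similarly (again an illustrative sketch of ours), λ^{(0)} for a Type II basis can be generated from the 2^i ± 2^j ≡ ±1 (mod 2m + 1) condition. For GF (2^5), where 2m + 1 = 11 is prime and 2 is primitive modulo 11, this produces a 5 × 5 matrix with C_N = 2·5 − 1 = 9 nonzero entries, of the kind used in Example 2.

```python
def type2_lambda0(m):
    """Build the Type II lambda^(0) matrix from 2^i ± 2^j ≡ ±1 (mod 2m+1)."""
    p = 2 * m + 1
    pw = [pow(2, i, p) for i in range(m)]
    pm1 = {1, p - 1}                      # the residues +1 and -1 modulo p
    return [[1 if ((pw[i] + pw[j]) % p in pm1 or
                   (pw[i] - pw[j]) % p in pm1) else 0
             for j in range(m)]
            for i in range(m)]

lam = type2_lambda0(5)   # 9 nonzero entries: C_N = 2m - 1
```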
Example 1: The λ matrix of Type I ONB over GF (2^{4}) is
constructed as:
Example 2: The λ matrix of Type II ONB over GF (2^{5})
is constructed as:
Inversion: The inverse of a ∈ GF (2^{m}), denoted a^{−1}, is defined as the element satisfying a · a^{−1} = 1. Most inversion algorithms are derived from Fermat’s Little Theorem, where a^{2^{m}−1} = 1 for all a ≠ 0 in GF (2^{m}), so that a^{−1} = a^{2^{m}−2}.
MULTIPLICATION
Multiplication is more complicated than addition and squaring in finite field arithmetic. An efficient multiplier is therefore the key to efficient finite field computations. Finite field multipliers using normal basis can be classified into two main categories: (1) λ-matrix based multipliers and (2) conversion based multipliers.
λ-Matrix Based Multipliers: Massey and Omura (1986) proposed an efficient normal basis bit-serial multiplier over GF (2^{m}). The Massey-Omura multiplier requires only two m-bit cyclic shift registers and combinational logic. The combinational logic consists of a set of AND and XOR logic gates. The first implementation of the Massey-Omura multiplier was reported by Wang et al. (1985). The space complexity of the Massey-Omura multiplier is (2m − 1) AND gates + (2m − 2) XOR gates, while the time complexity is T_{A} + (1 + ⌈log_{2} (m − 1)⌉) T_{X}, where T_{A} and T_{X} are the delay of one AND gate and one XOR gate, respectively.

Fig. 1: GF (2^{m}) bit-serial Massey-Omura multiplier
One advantage of the Massey-Omura multiplier is that it can be used with both types of optimal normal basis (Type I and Type II). Another advantage is that it is a bit-serial multiplier and hence the same circuitry used to generate c_{0} can be used to generate c_{i} (i = 1, 2,…, m − 1). The bit-parallel version of the Massey-Omura multiplier requires (2m^{2} − m) AND gates + (2m^{2} − 2m) XOR gates.
Example 3: Let A, B and C ∈ GF (2^{5}). Using the λ matrix in Example 2, the product C is found as:

The corresponding GF (2^{m}) bit-serial Massey-Omura multiplier is shown in Fig. 1 for the case where c_{0} is generated. The other product bits (c_{1}, c_{2},…, c_{m−1}) are generated in a similar manner, by 1-bit rotations of the contents of the two registers.
Let A = B = (11000), ⇒ A × B = A^{2} = (11000)^{2} = (10001) = C. Now, using the Massey-Omura multiplier, we apply the above formulas to get:
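The bit-serial recipe above, evaluating the same λ^{(0)} plane on k-fold rotated operands, can be sketched in Python (our illustration; the GF (2^5) λ^{(0)} matrix is reconstructed here from the Type II ±1 congruences):

```python
# GF(2^5) Type II lambda^(0), reconstructed from 2^i ± 2^j ≡ ±1 (mod 11).
LAM5 = [[0, 1, 0, 0, 0],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 1, 1],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 0, 1]]

def nb_multiply(a, b, lam):
    """c_k = A^(k) . lambda^(0) . (B^(k))^T: one plane reused on rotated inputs."""
    m = len(a)
    return tuple(sum(a[(i + k) % m] & lam[i][j] & b[(j + k) % m]
                     for i in range(m) for j in range(m)) % 2
                 for k in range(m))

# With A = B = (11000), i.e. coordinates (a_0..a_4) = (0, 0, 0, 1, 1),
# the product equals the one-bit rotation (10001), as expected for A^2.
```

In any normal basis the element 1 has the all-ones coordinate vector, which gives a second easy sanity check: multiplying by (11111) must return the operand unchanged.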
Hasan et al. (1993) proposed a modified version of the Massey-Omura parallel multiplier which works only with Type I ONB. The basic idea is to decompose the λ^{(m−1)} matrix as a sum of two matrices P and Q,

where the (i, j) entry of P is defined as follows:

The product c_{m−1−k} can be obtained as:

where A^{(k)} and B^{(k)} are the k-fold cyclically shifted vectors of A and B. Hasan et al. (1993) noticed that ĉ is independent of k and is present in each c_{m−1−k}. Hence, it can be precomputed once at the beginning, thus reducing the multiplier’s hardware complexity. Compared to the Massey-Omura parallel multiplier, this multiplier requires only m^{2} − 1 XOR gates and the same number of AND gates. However, the time complexity of this modified parallel multiplier is the same as that of the Massey-Omura parallel multiplier and the number of XOR gates is still O (m^{2}).
Example 4: Let A, B and C ∈ GF (2^{4}). Using the λ^{(3)} matrix in Example 1, P and Q are found as:

Let A = B = (1100), ⇒ A × B = A^{2} = (1100)^{2} = (1001) = C.

1. Since ĉ = A P B^{t}, we have: ĉ = [0 0 1 1] · P · [0 0 1 1]^{t} = 0,
2. Since c_{m−1−k} = A^{(k)} Q B^{(k)t}, we have:
Alternatively, Gao and Sobelman (2000) noticed that the Massey-Omura bit-serial multiplier is constructed using an AND plane and an XOR plane.

Fig. 2: The XOR, AND and XOR planes of the Gao and Sobelman multiplier
Gao and Sobelman (2000) suggested rearranging these planes into three planes: an XOR-plane, an AND-plane and another XOR-plane, as shown in Fig. 2. This is achieved by rearranging the multiplication equation as:

The Gao and Sobelman (2000) multiplier requires the same number of XOR gates and has the same time complexity as the Massey-Omura multiplier. The only improvement is reducing the number of AND gates to m^{2}, compared to (2m^{2} − m) AND gates for the Massey-Omura multiplier.
Example 5: Let A, B and C ∈ GF (2^{5}). Using the λ matrix in Example 2 and the Gao and Sobelman (2000) multiplier, the product C is found as:

Let A = B = (11000), ⇒ A × B = A^{2} = (11000)^{2} = (10001) = C, we have:
Reyhani-Masoleh and Hasan (2004) presented another multiplier based on the same idea of rearranging the multiplier’s XOR and AND planes (Gao and Sobelman, 2000) but with a different formulation. The product terms are reformulated as:

where Φ_{k} contains the coordinates of the 1’s in the upper part of the λ^{(k)} matrix. The space complexity is m^{2} AND + (3m^{2} − 3m) XOR gates, while the time complexity is T_{A} + ⌈log_{2} (2m − 1)⌉ T_{X}. This means that the Gao and Sobelman multiplier is more efficient than the multiplier reported by Reyhani-Masoleh and Hasan (2004).
Example 6: Let A, B and C ∈ GF (2^{5}). Using the λ matrix in Example 2 and the Reyhani-Masoleh and Hasan multiplier, we have:

Thus, the product C can be found as follows:

The corresponding GF (2^{m}) bit-serial Reyhani-Masoleh and Hasan multiplier is shown in Fig. 3. Let A = B = (11000), ⇒ A × B = A^{2} = (11000)^{2} = (10001) = C, we have:
Table 1: The λ-based multipliers and their space and time complexities

Fig. 3: GF (2^{m}) bit-serial Reyhani-Masoleh and Hasan multiplier
Reyhani-Masoleh and Hasan (2002) have also reported a parallel multiplier that takes advantage of the symmetry of the λ matrix and reduces redundancy in it. This is achieved by rewriting the λ matrix as λ = U + U^{T} + D, where D is the diagonal matrix and U is the upper triangular matrix having zeros at the diagonal entries. Thus, the multiplication product can be written as follows:

The U matrix is reformulated according to the value of m (even or odd). However, the proposed multiplier’s space complexity is m^{2} AND + (m^{2} − 1) XOR and the time complexity is T_{A} + (1 + ⌈log_{2} (m − 1)⌉) T_{X}, which represents a minor gain as compared to other multipliers. Table 1 compares the λ-based multipliers’ space and time complexities.
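A quick, illustrative check (ours) of the λ = U + U^{T} + D splitting on the GF (2^5) Type II matrix, reconstructed from the ±1 congruences:

```python
LAM5 = [[0, 1, 0, 0, 0],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 1, 1],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 0, 1]]

def split_lambda(lam):
    """Split a symmetric lambda matrix into strictly-upper U and diagonal D."""
    m = len(lam)
    U = [[lam[i][j] if i < j else 0 for j in range(m)] for i in range(m)]
    D = [[lam[i][j] if i == j else 0 for j in range(m)] for i in range(m)]
    return U, D

U, D = split_lambda(LAM5)
# Every entry satisfies lam[i][j] == U[i][j] + U[j][i] + D[i][j],
# which is exactly the lambda = U + U^T + D decomposition.
```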
CONVERSION BASED MULTIPLIERS
Koc and Sunar (1998) proposed a new bit-parallel multiplier over GF (2^{m}). The multiplier was employed to perform Type I optimal normal basis multiplication by converting the two operands into the canonical basis (a basis of the form (1, α, α^{2},…, α^{m−1}), where α ∈ GF (2^{m}) is a root of the generating polynomial of degree m). After multiplication, the result is converted back to the normal basis. The proposed technique yields a slight improvement in the number of XOR gates as compared to the Massey-Omura multiplier. The number of required XOR gates is reduced to m^{2} − 1, as compared to 2m^{2} − m for the Massey-Omura multiplier, while its time complexity T_{A} + (2 + ⌈log_{2} (m − 1)⌉) T_{X} is slightly more than the Massey-Omura multiplier’s. The main advantage of this technique, however, is the opening of a new direction in multiplication based on basis conversion, which was first explored by Sunar and Koc (2001).

Sunar and Koc (2001) reported a new Type II optimal normal basis multiplier. The main idea of the work was based on converting the two operands into their equivalent representation in another basis, performing the multiplication in that basis and converting the product back to the normal basis. The conversion step requires only a single clock cycle since it is nothing but a permutation of the normal basis (Sunar and Koc, 2001).
Using optimal normal basis Type II and assuming that p = 2m + 1 is prime, another basis, obtained by a simple permutation of the normal basis elements, was presented. Let β = γ + γ^{−1}, where γ is a primitive pth root of unity; the new basis can be written in the form (γ + γ^{−1}, γ^{2} + γ^{−2}, γ^{3} + γ^{−3},…, γ^{m} + γ^{−m}). Two elements Â and B̂ ∈ GF (2^{m}) can be represented in the new basis as:

Â = Σ_{i=1}^{m} â_{i} (γ^{i} + γ^{−i}) and B̂ = Σ_{j=1}^{m} b̂_{j} (γ^{j} + γ^{−j})

The product Ĉ = Â · B̂ is written as:
This product can be transformed to the following form:

The term Ĉ_{1} has the property that the exponent (i − j) of γ is already within the proper range, i.e., −m ≤ (i − j) ≤ m for all i, j ∈ [1, m]. The term Ĉ_{2} should be ensured to be in the proper range. Hence, Ĉ_{2} is computed as follows:

The exponents of the basis elements in the first part of Ĉ_{2} are guaranteed to be in the proper range 1 ≤ (i + j) ≤ m for i = 1, 2,…, m and j = 1, 2,…, m − i. If k = i + j, then the product contributes to the basis element β_{k} as i and j take these values. The basis elements of the second part of Ĉ_{2}, however, are all out of range. Thus, the identity γ^{2m+1} = 1 is used to bring them into the proper range as:

Therefore, if k = i + j > m, β_{k} is replaced by β_{2m+1−k} and thus the final product can be found as:
The hardware complexity of this bit-parallel multiplier is m^{2} AND gates + (3/2)(m^{2} − m) XOR gates and the time complexity is T_{A} + (1 + ⌈log_{2} m⌉) T_{X}. This represents about 25% fewer XOR gates compared to the Massey-Omura multiplier, but with slightly more delay. A major advantage of this method, however, is that there is no need to store the λ matrix to perform multiplications, which requires m^{3} locations if all λ matrices are stored, or m^{2} if only λ^{(0)} is stored. This is a major advantage in area constrained environments, since we only need to store the permutation array, which requires only m log_{2} (m) storage locations.
Example 7: Let A, B and C ∈ GF (2^{5}). Since 2m + 1 = 2·5 + 1 = 11 is prime and 2 is primitive modulo 11, there exists an optimal normal basis of Type II for the field GF (2^{5}). Its elements β^{2^{i}} = γ^{2^{i}} + γ^{−2^{i}} involve the exponents 1, 2, 4, 8 and 16. Using the identity γ^{11} = 1, we find the converted form as follows:

• The first three exponents 1, 2 and 4 are in the proper range [1, 5].
• We have 16 ≡ 5 (mod 11), which brings the exponent 16 to the proper range.
• In order to bring 8 into the range [1, m] = [1, 5], we use the identity γ^{−8} = γ^{11−8} = γ^{3}.

Fig. 4: The construction of Ĉ_{1}, Ĉ_{2} and Ĉ in GF (2^{5}) (Sunar and Koc, 2001)

Thus the converted form is (β_{1}, β_{2}, β_{4}, β_{3}, β_{5}). Figure 4 shows the construction of Ĉ_{1}, Ĉ_{2} and Ĉ. The product C can be found as:

Let A = B = (11000) ⇒ A × B = A^{2} = (11000)^{2} = (10001) = C. In the new basis, Â = B̂ = (10100) and we have: Ĉ = (10001); the product C after conversion back to the normal basis is also (10001).
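The conversion step of Example 7 is just a permutation and can be computed generically. Here is an illustrative sketch (the function name is ours): the normal basis element β^{2^i} = γ^{2^i} + γ^{−2^i} maps to β_s with s = 2^i mod (2m + 1), folded into the range [1, m] via γ^{2m+1} = 1.

```python
def conversion_permutation(m):
    """Map normal-basis position i to the permuted-basis index in [1, m]."""
    p = 2 * m + 1
    perm = []
    for i in range(m):
        e = pow(2, i, p)                     # exponent of gamma for beta^(2^i)
        perm.append(e if e <= m else p - e)  # gamma^e + gamma^(-e) = beta_(p-e)
    return perm

# For m = 5 this yields [1, 2, 4, 3, 5],
# matching the converted form (beta_1, beta_2, beta_4, beta_3, beta_5).
```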
Alternatively, Wu et al. (2002) extended the work described by Gao and Vanstone (1995) and presented a new Type II multiplier. The basic idea is the same as that of Sunar and Koc (2001). Multiplication is performed by converting the two operands into another basis which is simply a permutation of the normal basis, performing the multiplication and converting the product back to the original normal basis. The difference lies only in the multiplication part. The product can be found in the new basis as:

where s (i) is defined as:
Table 2: The conversion based multipliers and their space and time complexities

Fig. 5: The Wu et al. (2002) multiplier
The hardware complexity of this multiplier, shown in Fig. 5, is m^{2} AND gates + 2m^{2} − m XOR gates, which is worse than that of the Sunar and Koc (2001) multiplier. However, the time complexity is T_{A} + (1 + ⌈log_{2} m⌉) T_{X}, which is the same as that of the multiplier reported by Sunar and Koc (2001).
Example 8: Let A, B and C ∈ GF (2^{5}). Using the same conversion procedure illustrated in Example 7, the product C can be found as:

Let A = B = (11000), ⇒ A × B = A^{2} = (11000)^{2} = (10001) = C, Â = B̂ = (10100), we have:

Ĉ = (10001) and the product C after conversion back to the normal basis is also (10001).
Table 2 shows the conversion based multipliers and their space and time complexities.
INVERSION
Inversion using normal basis consists of multiplications and cyclic shifts. Since cyclic shifts require almost negligible time, the number of multiplications is the key parameter for efficient inversion. In the following subsections we survey existing algorithms for inversion over GF (2^{m}) using normal basis. These inversion algorithms can be classified into three main categories: (1) Standard, (2) Exponent Decomposing and (3) Exponent Grouping inversion algorithms.
Standard inversion algorithm: This category contains only one algorithm, the first normal basis inversion algorithm over GF (2^{m}), proposed in Wang et al. (1985). The basic idea is derived from Fermat’s Little Theorem, where the inverse of a ∈ GF (2^{m}) is given by a^{−1} = a^{2^{m}−2}. Since 2^{m} − 2 = 2 + 2^{2} + … + 2^{m−1}, we can express a^{−1} as:

a^{−1} = a^{2} · a^{2^{2}} · … · a^{2^{m−1}}
Algorithm 1: Wang et al. (1985) inversion algorithm.

1. B = a^{2}, C = 1 and k = 0.
2. D = B × C and k = k + 1.
3. If k = m − 1, a^{−1} = D, Stop.
4. If k < m − 1, B = B^{2} and C = D.
5. Go back to 2.
The algorithm requires only multiplication and cyclic shift operations. The procedure for computing a^{−1} as suggested by Wang et al. (1985) is shown in Algorithm 1. It requires (m − 2) multiplications + (m − 1) cyclic shifts. The advantage of this method is its simplicity, while its disadvantage is the large, O (m), number of multiplications.
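Because multiplication adds exponents of a and squaring doubles them (modulo 2^m − 1, by Fermat's Little Theorem), Algorithm 1 can be traced purely at the exponent level. This illustrative sketch (ours) confirms that the result is a^{2^m − 2}:

```python
def wang_inverse_exponent(m):
    """Trace Algorithm 1 symbolically: return the exponent e with a^(-1) = a^e."""
    mod = (1 << m) - 1       # a^(2^m - 1) = 1, so exponents live modulo 2^m - 1
    B, C, k = 2, 0, 0        # B = a^2, C = a^0 = 1
    while True:
        D = (B + C) % mod    # D = B x C: multiplying powers adds exponents
        k += 1
        if k == m - 1:
            return D
        B = (2 * B) % mod    # B = B^2: one cyclic shift doubles the exponent
        C = D

# For m = 5 the trace ends at exponent 30, i.e. A^(2^5 - 2), as in Example 9.
```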
Example 9: Let A and A^{−1} ∈ GF (2^{5}). The inverse of A is A^{2^{5}−2} = A^{30}. By applying the Wang et al. (1985) inversion algorithm we have:
Step 1: B = A^{2}, C = 1, k = 0
Step 2: D = B × C = A^{2} × 1 = A^{2}, k = 1
Step 4: k = 1 < 4 ⇒ B = B^{2} = (A^{2})^{2} = A^{4}, C = A^{2}
Step 2: D = B × C = A^{4} × A^{2} = A^{6}, k = 2
Step 4: k = 2 < 4 ⇒ B = B^{2} = (A^{4})^{2} = A^{8}, C = A^{6}
Step 2: D = B × C = A^{8} × A^{6} = A^{14}, k = 3
Step 4: k = 3 < 4 ⇒ B = B^{2} = (A^{8})^{2} = A^{16}, C = A^{14}
Step 2: D = B × C = A^{16} × A^{14} = A^{30}, k = 4
Step 3: k = 4 = m − 1 ⇒ A^{−1} = D = A^{30}, Stop.
Exponent decomposing inversion algorithms: Since the number of multiplications is the dominant factor in determining the computation time of the inversion operation, several algorithms attempted to improve the inversion speed by decomposing the exponent to reduce the required number of multiplications, replacing them with squaring operations, which are much simpler than multiplication.
In 1988, Itoh and Tsujii (1988) proposed a GF (2^{m}) inversion algorithm derived from Fermat’s Little Theorem using normal basis. The basic idea is to decompose the exponent as follows:

a^{−1} = a^{2^{m}−2} = (a^{2^{m−1}−1})^{2}

The exponent 2^{m−1} − 1 is further decomposed as follows:

1. If m is odd, then 2^{m−1} − 1 = (2^{(m−1)/2} − 1)(2^{(m−1)/2} + 1) and a^{2^{m−1}−1} = (a^{2^{(m−1)/2}−1})^{2^{(m−1)/2}} · a^{2^{(m−1)/2}−1}
2. If m is even, then 2^{m−1} − 1 = 2 (2^{m−2} − 1) + 1 and a^{2^{m−1}−1} = (a^{2^{m−2}−1})^{2} · a
The algorithm proposed by Itoh and Tsujii (1988) is shown in Algorithm 2 and requires ⌊log_{2} (m − 1)⌋ + v (m − 1) − 1 multiplications, where v (x) is the number of 1’s in the binary representation of x.
Algorithm 2: Itoh-Tsujii inversion algorithm.

Inputs: a
Output: l = a^{−1}

1. set s ← ⌊log_{2} (m − 1)⌋ − 1.
2. set p ← a.
3. for i = s down to 0 do
3.1. set r ← (m − 1) shifted right by s bit(s)
3.2. set q ← p
3.3. rotate q to left by ⌊r/2⌋ bit(s)
3.4. set t ← p × q
3.5. if last bit of r = 1, then
3.5.1. rotate t to left by 1 bit
3.5.2. set p ← t × a
else
3.5.3. set p ← t
3.6. set s ← s − 1
4. rotate p to left by 1 bit.
5. set l ← p.
6. return l.
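Algorithm 2 can likewise be traced at the exponent level (an illustrative sketch of ours: rotations double exponents modulo 2^m − 1, multiplications add them), which also lets us count multiplications against the ⌊log_2 (m − 1)⌋ + v (m − 1) − 1 bound:

```python
def itoh_tsujii_exponent(m):
    """Trace Algorithm 2 symbolically; return (exponent, multiplication count)."""
    mod = (1 << m) - 1
    s = (m - 1).bit_length() - 2              # floor(log2(m - 1)) - 1
    p, muls = 1, 0                            # p = a^1
    while s >= 0:                             # step 3 (i runs in sync with s)
        r = (m - 1) >> s                      # step 3.1
        q = (p * pow(2, r // 2, mod)) % mod   # steps 3.2-3.3: rotate by floor(r/2)
        t = (p + q) % mod                     # step 3.4: t = p x q
        muls += 1
        if r & 1:                             # step 3.5
            t = (t * 2) % mod                 # rotate t left by 1 bit
            p = (t + 1) % mod                 # p = t x a
            muls += 1
        else:
            p = t
        s -= 1                                # step 3.6
    return (p * 2) % mod, muls                # step 4: final rotation squares p

# For m = 5 this returns (30, 2): A^30 after 2 multiplications, as in Example 10.
```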
Example 10: Let A and A^{−1} ∈ GF (2^{5}). The inverse of A is A^{30}. By applying the Itoh and Tsujii inversion algorithm we have:
Step 1: s = 2 − 1 = 1
Step 2: p = A
Step 3: i = 1
Step 3.1: r = 2
Step 3.2: q = p = A
Step 3.3: q = q^{2} = A^{2}
Step 3.4: t = p × q = A × A^{2} = A^{3}
Step 3.5.3: p = t = A^{3}
Step 3.6: s = s − 1 = 1 − 1 = 0
Step 3: i = 0
Step 3.1: r = 4
Step 3.2: q = p = A^{3}
Step 3.3: q = (q^{2})^{2} = ((A^{3})^{2})^{2} = A^{12}
Step 3.4: t = p × q = A^{3} × A^{12} = A^{15}
Step 3.5.3: p = t = A^{15}
Step 3.6: s = s − 1 = 0 − 1 = −1
Step 4: p = p^{2} = (A^{15})^{2} = A^{30}
Step 5: l = p = A^{30}
Step 6: return l = A^{30}
Feng (1989) also proposed an inversion algorithm requiring the same time complexity as the Itoh and Tsujii inversion algorithm, i.e., O (log_{2} (m)). The inversion algorithm was also derived from Fermat’s Little Theorem and is likewise based on exponent decomposition. Feng defined m − 1 as follows:

Let m_{q} m_{q−1} … m_{1} m_{0} be the binary representation of (m − 1), where m_{q} = 1 and m_{i} is 0 or 1 for i = 0 to q − 1, i.e., m − 1 = m_{q} 2^{q} + m_{q−1} 2^{q−1} + … + m_{1} 2^{1} + m_{0} 2^{0}. The inverse a^{−1} can be computed using Algorithm 3.
Algorithm 3: Feng’s inversion algorithm.

1. b = a
2. for i = q down to 1 do
2.1. if m_{i} = 1, then …
2.2. …
2.3. if m_{i−1} = 1, then b = b × a.
3. …, Stop.
Only steps 2.2 and 2.3 in Algorithm 3 require multiplications; steps 2.1 and 3 need only cyclic shifts. Thus, the algorithm requires (q + p) multiplications, where q = ⌊log_{2} (m)⌋ and p is the number of 1’s in the binary representation of (m − 1). Accordingly, the proposed algorithm has the same complexity as the Itoh and Tsujii inversion algorithm. The difference is only that the Itoh and Tsujii algorithm performs squarings during each iteration, while Feng’s algorithm computes square roots during each iteration and squares at the end by raising to the power 2^{m−m_{0}}.
Example 11: Let A and A^{−1} ∈ GF (2^{5}). The inverse of A is A^{30}. By applying Feng’s inversion algorithm we have: q = 2, m − 1 = 4, m_{2} = 1, m_{1} = 0, m_{0} = 0 and p = 0,
Takagi et al. (2001) proposed another inversion algorithm which is also based on exponent decomposition. The main idea is as follows: Since

Hence, the algorithm can use the inversion algorithm of Itoh and Tsujii (1988) by replacing m with m − h. This method reduces the number of multiplications by performing an exhaustive search for an optimal value of h. The time complexity, however, is O (log_{2} (m)).
EXPONENT GROUPING INVERSION ALGORITHMS
Grouping exponent terms is another approach which attracted some researchers. However, the resulting speed is O (m), which is worse than the O (log_{2} (m)) of the Itoh and Tsujii inverter. The inversion algorithms in this category are based on the idea of grouping exponent terms as follows:

Since a^{−1} = a^{2^{m}−2} = a^{2} · a^{2^{2}} · … · a^{2^{m−1}}, the exponent terms can be grouped in different ways. Fenn et al. (1996) proposed an inversion algorithm which requires m/2 multiplications. The basic idea is to divide the exponent terms into two equal groups. The first group contains a^{2}, a^{4},…, a^{k}, where k = 2^{(m−1)/2}. The other group contains the remaining terms up to a^{2^{m−1}}. By multiplying the terms of the first group and raising the result to the power 2^{(m−1)/2} by repeated squaring, the following formula can be applied to give the inverse within m/2 multiplications.
Example 12: Let A and A^{−1} ∈ GF (2^{5}), m = 5 is odd. The inverse of A is A^{30}. By applying Fenn et al.’s inversion algorithm we have:

Example 13: Let A and A^{−1} ∈ GF (2^{8}), m = 8 is even. The inverse of A is A^{254}. By applying Fenn et al.’s inversion algorithm we have:
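For odd m the two-group idea can be traced at the exponent level. This is a sketch under our own assumptions (the even-m case needs an extra adjustment step that we do not reproduce): G = a^2 · a^4 ⋯ a^{2^{(m−1)/2}}, then a^{−1} = G · G^{2^{(m−1)/2}}.

```python
def fenn_inverse_exponent(m):
    """Exponent-level sketch of the two-group inversion, odd m only."""
    assert m % 2 == 1
    mod = (1 << m) - 1
    half = (m - 1) // 2
    G = sum(1 << i for i in range(1, half + 1)) % mod  # exponent 2 + 4 + ... + 2^half
    return (G + G * pow(2, half, mod)) % mod           # G times G^(2^half)

# m = 5: G has exponent 6, and 6 + 6*4 = 30 = 2^5 - 2, matching Example 12.
```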
A further improvement, however, has been reported by Jimenez Calvo and Torres (1997). Their inversion algorithm uses a fixed seed a^{2} · a^{4} = a^{6}, following the same idea of grouping into two groups but in a different manner. This can be shown as:
Table 3: The inverters and their time complexities
Example 14: Let A and A^{−1} ∈ GF (2^{8}), m = 8 is even. The inverse of A is A^{254}. By applying the Jimenez Calvo and Torres inversion algorithm we have:
This method was generalized by choosing m’ and b such that m = bm’ + r, where r ∈ {1, 2}. This is basically the same idea, grouping more terms into the seed at the beginning. It requires around m/m’ multiplications but still with the same O (m) time complexity.
Example 15: Let A and A^{−1} ∈ GF (2^{16}), m = 16 is even. The inverse of A is A^{2^{16}−2} = A^{65534}. By applying the Jimenez Calvo and Torres generalized inversion algorithm we have:

Let b = 3, m’ = 5 and r = 1. First we calculate the seed as follows:

Then applying the generalized equation we get:
Yen (1997) proposed another approach for grouping terms which results in a reduced number of clock cycles. Yen realized that the fundamental idea behind the Fenn et al. inverter is to rearrange all terms into (m − 1)/2 groups and to extract the common term. Accordingly, Yen redefined the common term to contain p terms, so that a^{−1} can be found as:

where the common part X is precomputed as:

The best case requires k + (p − 1) multiplications, while the worst case requires k + (p − 1) + (p − 2) = k + 2p − 3 multiplications, with p chosen optimally for m = pk + 1. The time complexity, however, is still O (m). Table 3 summarizes the inversion algorithms and their time complexities.
CONCLUSIONS
In this study, we presented a survey of finite field arithmetic using normal basis over GF (2^{m}). Addition requires a simple XOR operation, while squaring requires only a cyclic shift. This is the most attractive property of normal basis, in contrast to polynomial basis, which is more suitable for software implementations.
We categorized normal basis multipliers into two main categories: (1) λ-matrix based multipliers and (2) conversion based multipliers. The multipliers in category (2) are more efficient since they do not require storing the λ-matrix. The comparison shows that the Type II Sunar-Koc multiplier is the best multiplier, with a hardware complexity of m^{2} AND gates + (3/2) m (m − 1) XOR gates and a time complexity of T_{A} + (1 + ⌈log_{2} m⌉) T_{X}.
We also categorized the inversion algorithms into three main categories: (1) Standard, (2) Exponent Decomposing and (3) Exponent Grouping. Inversion algorithms in category (3) performed better than the standard one, with time complexity O (m). The exponent decomposing inversion algorithm of Itoh and Tsujii, however, is found to be the best inverter, requiring only ⌊log_{2} (m − 1)⌋ + v (m − 1) − 1 multiplications.
ACKNOWLEDGEMENT
The authors would like to acknowledge the support of King Abdul Aziz City for Science and Technology under research grant No. AR2217. The authors would also like to acknowledge the support of King Fahd University of Petroleum and Minerals.