Introduction
Association rule mining finds interesting associations among items in a large dataset. With massive amounts of data continuously being collected and stored, many industries are becoming interested in mining association rules from their databases. The discovery of interesting association relationships among huge numbers of business transaction records can support many business decision-making processes, such as catalogue design, cross-marketing and loss-leader analysis.
The problem of mining frequent patterns first arose as a subproblem of mining association rules (Agrawal and Srikant, 1994). A frequent pattern is a set of binary attributes (items) whose support, i.e., the number of objects in the database containing it, is at least equal to a minimum support threshold minsup defined by the user. It was later discovered that frequent patterns are involved in a variety of problems (Han et al., 2000): mining sequential patterns (Agrawal and Srikant, 1995), episodes (Mannila et al., 1997), correlations (Brin et al., 1997; Silverstein et al., 1998), multidimensional patterns (Kamber et al., 1997; Lent et al., 1997), maximal patterns (Lin and Kedem, 1998; Bayardo, 1998; Zaki et al., 1997) and closed patterns (Pasquier et al., 1999a; Pei et al., 2000; Taouil et al., 2000).
Typical algorithms for finding the frequent pattern, i.e., the set of all frequent itemsets (Agrawal and Srikant, 1994; Mannila et al., 1994), operate in a bottom-up, breadth-first fashion. In other words, the computation starts from the frequent 1-itemsets (the minimal-length frequent itemsets at the bottom) and then extends one level up in every pass until all maximal (length) frequent itemsets are discovered. All frequent itemsets are explicitly examined and discovered by these algorithms. When all maximal frequent itemsets are small, these algorithms perform reasonably well. However, performance drastically decreases when any of the maximal frequent itemsets becomes larger. This is due to the fact that a maximal frequent itemset of size l implies the presence of 2^l − 2 nontrivial frequent itemsets (its nontrivial subsets) as well, each of which is explicitly examined by such algorithms.
Of course, the set of the maximal (length) frequent itemsets (called the maximum frequent pattern) uniquely defines the entire set of frequent itemsets: the frequent itemsets are precisely all the non-empty subsets of its elements. Thus, trivially, discovering the maximum frequent pattern implies immediate discovery of all the maximal frequent itemsets. In many situations, it suffices to know the supports of the maximal frequent itemsets and the supports of all subsets of the maximal frequent itemsets. For such situations this MAXFP (Maximum Frequent Pattern) algorithm provides an efficient solution.
In this study we propose the Fast MAXFP algorithm, which finds the maximal frequent itemsets as well as the supports of all their subsets. This study covers not only the process of finding the maximum frequent pattern but also the supports of all frequent itemsets that are subsets of the maximal frequent itemsets discovered by the MAXFP algorithm. This new approach also reduces the number of database passes compared to other approaches (Lin and Kedem, 1998).
The Fast MAXFP algorithm is based on Pincer-Search (Lin and Kedem, 1998) and the Counting Inference approach (Bastide et al., 2000). As in Pincer-Search (Lin and Kedem, 1998), our approach uses the MFP (maximum frequent pattern) and searches for the MFP from both the bottom-up and top-down directions. This Fast MAXFP algorithm performs well even when the maximal frequent itemsets are long.
In the Fast MAXFP algorithm, the bottom-up search follows the pattern counting inference method presented in (Bastide et al., 2000), where the PascalGen algorithm was introduced; it minimizes as much as possible the number of pattern support counts performed when extracting frequent patterns. This method relies on the concept of key patterns, where a key pattern is a minimal (with respect to set inclusion) pattern of an equivalence class gathering all patterns common to the same objects of the database relation. Hence, all patterns in an equivalence class have the same support and the supports of the non-key patterns of an equivalence class can be determined from the supports of the key patterns of this class. With pattern counting inference, only the supports of the frequent key patterns (and some infrequent ones) are determined from the database, while the supports of the frequent non-key patterns are derived from those of the frequent key patterns. In this approach, as in Apriori (Agrawal and Srikant, 1994), frequent patterns are extracted in a level-wise manner: during each iteration, candidate patterns of size k are created by joining the frequent patterns of size k−1, their supports are determined and infrequent ones are discarded. Using counting inference, if a candidate pattern of size k is a non-key pattern, then its support is equal to the minimal support among the patterns of size k−1 that are its subsets. This reduces the number of patterns considered during each database pass while counting supports and, even more important, reduces the total number of passes. This optimization is valid since key patterns have a property that is compatible with the pruning of Apriori: all subsets of a key pattern are key patterns and all supersets of a non-key pattern are non-key patterns.
The top-down search is implemented efficiently by introducing a set that we call the maximum-frequent-candidate-pattern or MFCP. This set consists of the minimum number of itemsets such that the union of all their subsets contains all the known (discovered so far) frequent itemsets but none of the known infrequent itemsets. The known frequent and infrequent itemsets are those determined by the supports of the itemsets discovered up to the most recent pass. Thus, at any point of the algorithm the MFCP is obviously a superset of the MFP; when the algorithm terminates, the MFCP and the MFP are equal. Unlike the bottom-up search, which goes up one level in each pass, the MFCP can “move down” many levels in one pass.
In the present study, we use the MFCP idea of (Lin and Kedem, 1998) along with counting inference (Bastide et al., 2000) to discover frequent itemsets in market basket data. In most cases, our algorithm not only reduces the number of passes over the database but also reduces the number of candidates (whose supports are counted). In such cases, both I/O time and CPU time are reduced by: i) eliminating the candidates that are subsets of maximal frequent itemsets found in the MFCP; ii) using PascalGen (Bastide et al., 2000) to determine as many support counts as possible without accessing the database, from information gathered in previous passes.
The Problem of Mining Frequent Patterns
Definition 1
Let P be a finite set of items, O a finite set of objects (e.g., transaction ids) and R ⊆ O × P a binary relation (where (o, p) ∈ R may be read as: item p is included in transaction o). The triple D = (O, P, R) is called a dataset.
Each subset P of P is called a pattern. We say that a pattern P is included in an object o ∈ O if (o, p) ∈ R for all p ∈ P. Let f be the function which assigns to each pattern P ⊆ P the set of all objects that include this pattern: f(P) = {o ∈ O | P is included in o}.
The support of a pattern P is given by supp(P) = card(f(P))/card(O). For a given threshold minsup ∈ (0, 1), a pattern P is called a frequent pattern (or frequent itemset) if supp(P) ≥ minsup.
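As an illustration of Definition 1, the functions f and supp can be sketched directly from these formulas. The toy dataset, the item names and the minsup value below are our own illustrative choices, not taken from the paper's figures:

```python
from itertools import combinations

# Toy dataset D = (O, P, R): each object (transaction id) is mapped to the
# set of items it contains. Values are illustrative, not the paper's Fig. 4.
objects = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "b"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}

def f(pattern):
    """f(P): the set of all objects that include pattern P."""
    return {o for o, items in objects.items() if set(pattern) <= items}

def supp(pattern):
    """supp(P) = card(f(P)) / card(O)."""
    return len(f(pattern)) / len(objects)

# All frequent patterns for minsup = 0.6.
minsup = 0.6
all_items = sorted({i for items in objects.values() for i in items})
frequent = [set(p) for k in range(1, len(all_items) + 1)
            for p in combinations(all_items, k) if supp(p) >= minsup]
```

For instance, with this dataset supp({a, b}) = 3/5 = 0.6, so {a, b} is frequent for minsup = 0.6.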
The problem of association rule mining consists of two stages (Agrawal and Srikant, 1994): i) the discovery of frequent itemsets, followed by ii) the generation of association rules.
The maximum frequent pattern (MFP) is the set of all the maximal frequent itemsets. (An itemset is a maximal frequent itemset if it is frequent and no proper superset of it is frequent.) Obviously, an itemset is frequent if and only if it is a subset of a maximal frequent itemset. Thus, it is necessary to discover only the maximum frequent pattern during the first stage. Of course, an algorithm for that stage may explicitly discover and store some other frequent itemsets as a necessary part of its execution, but minimizing such effort may increase efficiency. If the maximum frequent pattern is known, one can easily generate the required subsets and count their supports by reading the database once, which is quite straightforward.
It follows that, in the vast majority of cases, the discovery of the maximum frequent pattern dominates the performance of the whole process. Therefore, we first find the maximum frequent pattern and then calculate the supports of all frequent patterns.
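The second stage described above can be sketched as follows: once the maximal frequent itemsets are known, all frequent itemsets are enumerated as their non-empty subsets and their supports are counted in a single database pass. The MFP and the transactions below are hypothetical examples, not the paper's data:

```python
from itertools import combinations

# Hypothetical maximal frequent itemsets (the MFP) and a small transaction
# database; both are illustrative, not the paper's running example.
mfp = [frozenset({1, 2, 3}), frozenset({2, 4})]
transactions = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {2, 4}, {1, 2, 3, 4}]

# The frequent itemsets are exactly the non-empty subsets of MFP elements.
candidates = {frozenset(s)
              for m in mfp
              for k in range(1, len(m) + 1)
              for s in combinations(m, k)}

# One pass over the database suffices to count all their supports.
support = {c: 0 for c in candidates}
for t in transactions:
    for c in candidates:
        if c <= t:
            support[c] += 1
```

With two maximal itemsets of sizes 3 and 2 this yields 9 distinct candidates, each counted in the single pass.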
Key Features of the Counting Inference Approach
Here, we give the theoretical basis of the Counting Inference approach as in (Bastide et al., 2000). Like Apriori, Counting Inference traverses the powerset of P level-wise: at the k-th iteration, the algorithm first generates all candidate k-patterns.
Definition 2
A k-pattern P is a subset of P with card(P) = k. A candidate k-pattern is a k-pattern all of whose proper subpatterns are frequent.
For the candidate k-patterns, one database pass is used to determine their supports. Then the infrequent patterns are pruned.
Key Patterns
The Counting Inference approach is based on the observation that patterns can be considered equivalent if they are included in exactly the same objects. We describe this fact by the following equivalence relation θ on patterns.
Definition 3
For patterns P, Q ⊆ P, we let P θ Q if and only if f(P) = f(Q). The set of patterns which are equivalent to a pattern P is given by [P] = {Q ⊆ P | P θ Q}.
In the case of patterns P and Q with P θ Q, both patterns obviously have the same support:
Lemma 1
Let P and Q be two patterns.
(i) P θ Q ⇒ supp(P) = supp(Q)
(ii) P ⊆ Q ∧ supp(P) = supp(Q) ⇒ P θ Q
Proof
(i) P θ Q ⇔ f(P) = f(Q) ⇒ supp(P) = card(f(P))/card(O) = card(f(Q))/card(O) = supp(Q).
(ii) Since P ⊆ Q and f is monotonously decreasing, we have f(P) ⊇ f(Q). supp(P) = supp(Q) is equivalent to card(f(P)) = card(f(Q)), which together with the former implies f(P) = f(Q) and thus P θ Q.
Hence, if we knew the relation θ in advance, we would need to count the support of only one pattern in each equivalence class. Of course we do not know the relation in advance, but we can construct it step by step. Thus we will (in general) need to determine the supports of more than one pattern in each class, but not of all of them. If we have already determined the support of a pattern P in the database and later encounter a pattern Q ∈ [P], then we need not access the database for it, because we know that supp(Q) = supp(P). The first patterns of an equivalence class that we reach using a level-wise approach are exactly the minimal patterns in the class:
Definition 4
A pattern P is a key pattern if P ∈ min[P]. A candidate key pattern is a pattern all of whose proper subpatterns are frequent key patterns.
Observe that all candidate key patterns are also candidate patterns.
Pattern Counting Inference
In the algorithm, we apply the pruning strategy to both candidate patterns and candidate key patterns. This is justified by the following theorem, as in (Bastide et al., 2000).
Theorem 1
(i) If Q is a key pattern and P ⊆ Q, then P is also a key pattern.
(ii) If P is not a key pattern and P ⊆ Q, then Q is not a key pattern either.
Proof
Let P ⊆ Q and suppose P is not a key pattern. Then ∃P′ ∈ min[P] with P′ ⊂ P. From f(P′) = f(P) it follows that f(Q) = f(Q\(P\P′)). Hence Q is not minimal in [Q] and thus by definition not a key pattern. (i) is a direct logical consequence of (ii).
The algorithm determines, at each iteration, the key patterns among the candidate key patterns by using (ii) of the following theorem:
Theorem 2
Let P be a pattern.
(i) Let p ∈ P. Then P ∈ [P\{p}] if and only if supp(P) = supp(P\{p}).
(ii) P is a key pattern if and only if supp(P) ≠ min_{p∈P} supp(P\{p}).
Proof
(i) The if part follows from Lemma 1(ii). The only if part is obvious.
(ii) From (i) we deduce that P is a key pattern if and only if supp(P) ≠ supp(P\{p}) for all p ∈ P. Since supp is a monotonously decreasing function, this is equivalent to (ii).
Since all candidate key patterns are also candidate patterns, when generating all candidate patterns for the next level we can at the same time determine the candidate key patterns among them. If we reach a candidate k-pattern which is not a candidate key pattern, then we already encountered at least one of the key patterns in its equivalence class in an earlier iteration. Hence we already know its support. Using the following theorem, we determine this support without accessing the database:
Theorem 3
If P is a non-key pattern, then supp(P) = min_{p∈P} supp(P\{p}).
Proof
“≤” follows from the fact that supp is a monotonously decreasing function. “≥”: if P is not a key pattern, then there exists p ∈ P with P θ P\{p}. Hence supp(P) = supp(P\{p}) ≥ min_{q∈P} supp(P\{q}).
Thus the database pass needs to count the supports of the candidate key patterns only.
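A minimal sketch of this inference step (Theorem 3): the support of a non-key pattern is taken as the minimum support of its immediate subsets, with no database access. The support values below are illustrative only, not taken from the paper's dataset:

```python
# Illustrative supports of the 2-subsets of {a, b, c}; these values are
# made up for the example, not taken from the paper's dataset.
supports = {
    frozenset({"a", "b"}): 3,
    frozenset({"a", "c"}): 2,
    frozenset({"b", "c"}): 3,
}

def inferred_support(pattern, subset_supports):
    """Theorem 3: supp(P) = min over p in P of supp(P \\ {p}) for non-key P."""
    return min(subset_supports[pattern - {p}] for p in pattern)

# supp({a,b,c}) is derived as min(3, 2, 3) = 2 without any database access.
p = frozenset({"a", "b", "c"})
```

Only the candidate key patterns ever reach the counting phase of a database pass; every non-key candidate is resolved this way.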
A Fast MAXFP Algorithm for Discovering the Maximum Frequent Pattern and Support Count
We propose a fast multi-strategy algorithm which combines top-down and bottom-up search. It relies on a new data structure maintained during its execution, the maximum-frequent-candidate-pattern (MFCP), which is defined below.
Definition 5
Consider some point during the execution of an algorithm for finding the MFP. Some itemsets are frequent, some infrequent and some unclassified. The maximum-frequent-candidate-pattern (MFCP) is a minimum cardinality set of itemsets such that the union of all the subsets of its elements contains all the frequent itemsets but does not contain any infrequent itemset; that is, it is a minimum cardinality set satisfying the conditions

for every F ∈ FREQUENT there is an X ∈ MFCP with F ⊆ X, and for every I ∈ INFREQUENT there is no X ∈ MFCP with I ⊆ X,

where FREQUENT and INFREQUENT stand, respectively, for the sets of all known frequent and infrequent itemsets.
Thus, at any point of the algorithm the MFCP is obviously a superset of the MFP; when the algorithm terminates, the two are equal. The computation of our algorithm follows the bottom-up search approach. In each pass, in addition to counting the supports of the candidates in the bottom-up direction by PascalGen (Bastide et al., 2000), the algorithm also counts the supports of the itemsets in the MFCP: this set is adapted for the top-down search. This helps in pruning candidates, but also requires changes in candidate generation.
Consider now some pass k, during which itemsets of size k are to be classified. If some itemset that is an element of the MFCP, say X of cardinality greater than k, is found to be frequent in this pass, then all its subsets must be frequent. Therefore, all of its subsets of cardinality k can be pruned from the set of candidates considered in the bottom-up direction in this pass. These itemsets and their supersets will never be candidates throughout the rest of the execution, potentially improving performance. But of course, as the maximum frequent pattern is finally computed, they will not be forgotten.
Similarly, when a new infrequent itemset is found in the bottom-up direction, the algorithm uses it to update the MFCP so that no element of the MFCP contains this infrequent itemset as a subset. By using the MFCP, we are able to discover some maximal frequent itemsets in early passes. This is especially significant when the maximal frequent itemsets discovered in the early passes are long.
MFCP Updating Algorithm
Consider some itemset Y that has just been classified as infrequent and assume that it is a subset of some itemset X that is an element of the MFCP. To update the MFCP, we replace X by card(Y) itemsets, each obtained by removing from X a single item (element) of Y. We do this for each newly discovered infrequent itemset and each of its supersets that is an element of the MFCP. Formally, we have the following MFCPgen algorithm (shown here for pass k).
Algorithm
MFCPgen (from Lin and Kedem, 1998)
Input
Old MFCP and the infrequent set S_{k} discovered in pass k
Output
New MFCP
Example
Suppose {{1,2,3,4,5,6}} is the current (old) value of the MFCP and two new infrequent itemsets {1,6} and {3,6} are discovered. Consider first the infrequent itemset {1,6}. Since the itemset {1,2,3,4,5,6} (an element of the MFCP) contains items 1 and 6, one of its subsets is {1,6}. By removing item 1 from itemset {1,2,3,4,5,6} we get {2,3,4,5,6} and by removing item 6 from itemset {1,2,3,4,5,6} we get {1,2,3,4,5}. After considering itemset {1,6}, the MFCP becomes {{1,2,3,4,5}, {2,3,4,5,6}}. Itemset {3,6} is then used to update this MFCP. Since {3,6} is a subset of {2,3,4,5,6}, two itemsets {2,4,5,6} and {2,3,4,5} are generated to replace {2,3,4,5,6}. The itemset {2,3,4,5} is a subset of the itemset {1,2,3,4,5} in the new MFCP, so it is removed from the MFCP. Therefore, the MFCP becomes {{1,2,3,4,5}, {2,4,5,6}}. The top-down arrows in Fig. 1 show the updates of the MFCP.
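The update rule of MFCPgen can be sketched as follows; the function name mfcp_gen and the final maximality filter are our own phrasing of the procedure described above. Running it on the example reproduces the result {{1,2,3,4,5}, {2,4,5,6}}:

```python
def mfcp_gen(mfcp, infrequent):
    """For each new infrequent itemset Y, replace every superset X in the MFCP
    by the itemsets X \\ {y}, one per item y of Y, then keep only the maximal
    elements (a non-maximal element carries no extra information)."""
    mfcp = set(mfcp)
    for y in infrequent:
        for x in [x for x in mfcp if y <= x]:
            mfcp.discard(x)
            mfcp.update(x - {item} for item in y)
    return {x for x in mfcp if not any(x < z for z in mfcp)}

# The running example above: old MFCP = {{1,2,3,4,5,6}}, new infrequent
# itemsets {1,6} and {3,6}.
result = mfcp_gen({frozenset({1, 2, 3, 4, 5, 6})},
                  [frozenset({1, 6}), frozenset({3, 6})])
# result == {frozenset({1,2,3,4,5}), frozenset({2,4,5,6})}
```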

Fig. 1: 
Two-way search with the Counting Inference approach. Bold style represents candidates that are frequent; underline style represents candidates that are infrequent; italic style represents candidates pruned using the MFCP; strikethrough itemsets represent candidates whose supports are counted but that are not used to generate new candidates; itemsets in ellipses are the maximal frequent itemsets; itemsets within [ ] are key patterns 
A New Candidate Generation Method
Our method follows the Counting Inference approach in the bottom-up direction for generating candidate itemsets; for this purpose it uses the PascalGen algorithm.
Algorithm
PascalGen (from Bastide et al., 2000), the join procedure of the counting inference algorithm.
Input
L_{k}, the set of frequent k-patterns p with their supports p.supp and their p.key flags.
Output
C_{k+1}, the set of candidate (k+1)-patterns c, each with the flag c.key, the value c.pred_supp and, if c is not a key pattern, the support c.supp.
The way PascalGen operates is basically known from the generator function AprioriGen, which was introduced in (Agrawal and Srikant, 1994). In addition to AprioriGen’s join and prune steps, PascalGen makes the new candidates inherit the property of being or not being a candidate key pattern (step 16) by using Theorem 1 and at the same time determines the supports of all non-key candidate patterns (step 19) by using Theorem 3.
The working of PascalGen is shown by an example. For our example we use the dataset in Fig. 4. The algorithm first performs one database pass to count the supports of the 1-patterns (Fig. 2). As {5} has the same support as the empty set, {5} is marked as a non-key pattern and PascalGen is called to generate the candidate patterns.
At the next iteration, all candidate 2-patterns are created and stored in C_{2} (Fig. 3). At the same time, the supports of all patterns containing {5} as a subpattern are computed. Then a database pass is performed to determine the supports of the remaining ten candidate patterns.
Similarly, at the third iteration it turns out in PascalGen that almost every newly generated candidate pattern contains at least one subpattern which is not a key pattern, so their supports are determined directly in PascalGen; the database is then accessed for only one candidate pattern. In the fourth and fifth iterations (if needed), all supports are determined directly in PascalGen.
In our algorithm, in parallel with PascalGen, the MFCP is updated in each pass and, after a maximal frequent itemset is added to the MFP, all of its subsets in the frequent set (as computed so far) are removed. We show by example that if the PascalGen join procedure alone is applied, some of the needed itemsets can be missing from the candidate set. Consider Fig. 1. Suppose the original frequent set L_{3} is {1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {2,4,6}, {2,5,6}, {3,4,5}, {4,5,6}. Assume the itemset {1,2,3,4,5} in the MFCP is determined to be frequent. Then all 3-itemsets of the original frequent set L_{3} will be removed from it by our algorithm, except for {2,4,6}, {2,5,6} and {4,5,6}. Since the join procedure uses a (k−1)-prefix test on the frequent set to generate new candidates and no two itemsets in the current frequent set {2,4,6}, {2,5,6}, {4,5,6} share a 2-prefix, no candidate will be generated by applying the join procedure to this frequent set. However, the correct candidate set should be {{2,4,5,6}}.
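The failure of the plain (k−1)-prefix join on this example can be checked with a short sketch (prefix_join is our own illustrative name; the join itself is the standard AprioriGen/PascalGen join step):

```python
from itertools import combinations

def prefix_join(frequent_k):
    """Apriori/Pascal-style join: two sorted k-itemsets sharing the same
    (k-1)-prefix are merged into one (k+1)-candidate."""
    frequent = sorted(tuple(sorted(s)) for s in frequent_k)
    return {tuple(sorted(set(a) | set(b)))
            for a, b in combinations(frequent, 2)
            if a[:-1] == b[:-1]}

# The remaining frequent 3-itemsets of the example: no two share a 2-prefix,
# so the join alone yields no candidate, although {2,4,5,6} should be one.
remaining = [{2, 4, 6}, {2, 5, 6}, {4, 5, 6}]
# prefix_join(remaining) == set()
```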
The full set of required candidates can be obtained by restoring some itemsets to the current frequent set. They are extracted from the MFCP, which implicitly maintains all frequent itemsets discovered so far. The itemsets that need to be restored are precisely those k-itemsets that have the same (k−1)-prefix as an itemset in the current frequent set.
Consider then, in pass k, an itemset X in the MFCP and an itemset Y in the current frequent set such that card(X) > k. Suppose that the first k−1 items of Y are in X and that the (k−1)-st item of Y equals the j-th item of X. We determine the k-subsets of X that have the same (k−1)-prefix as Y by taking one item of X that has an index greater than j and combining it with the first k−1 items of Y. After these k-itemsets are found, we generate candidates by combining them with the itemset Y. For the recovered itemsets we then calculate their pred_supp, supp and key values. This is specified by the following recovery procedure.

Fig. 2: 
Frequent 1-itemsets 

Fig. 3: 
Candidate patterns C_{2} and frequent 2-itemsets L_{2} 

Fig. 4: 
Transaction table 
Algorithm
The recovery procedure
Input
C_{k+1}, L_{k} and the current MFP containing the maximal frequent itemsets discovered until pass k
Output
A complete candidate set C_{k+1}
1. forall itemsets l in L_{k}
2. forall itemsets m in MFP
3. if the first k−1 items in l are also in m
4. /* suppose m.item_{j} = l.item_{k−1} */
5. for i from j + 1 to card(m)
6. C_{k+1} := C_{k+1} ∪ {{l.item_{1}, l.item_{2}, ..., l.item_{k}, m.item_{i}}}
7. forall newly discovered itemsets c ∈ C_{k+1} do begin
8. c.key := true; c.pred_supp := +∞
9. forall k-subsets s of c do begin
10. c.pred_supp := min(c.pred_supp, s.supp)
11. if not s.key then c.key := false
12. end
13. if not c.key then c.supp := c.pred_supp
14. end
15. return C_{k+1}
Example
Fig. 1 shows that MFCP is {{1,2,3,4,5}} and the current
frequent set is {{2,4,6}, {2,5,6},{4,5,6}}. The only 3subset of {{1,2,3,4,5}}
that needs to be restored for the itemset {2,4,6} to generate a new candidate
is {2,4,5}. This is because it is the only subset of {{1,2,3,4,5}} that has
the same length and the same 2prefix as the itemset {2,4,6}. By combining {2,4,5}
and {2,4,6}, we recover the missing candidate {2,4,5,6}. No itemset needs to
be restored for itemsets {2,5,6} and {4,5,6}. As stated earlier, our algorithm
will not consider the subsets of a maximal frequent itemset as candidates. Therefore,
the prune procedure in our new candidate generation algorithm will remove those
subsets.
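The recovery step of this example can be sketched as follows, assuming k ≥ 2; recover_candidates is our own illustrative name and this is a simplified version that returns only the recovered candidate itemsets, without the pred_supp/supp/key bookkeeping of the listing above:

```python
def recover_candidates(frequent_k, mfp):
    """Restore the (k+1)-candidates that the plain prefix join misses after
    subsets of maximal frequent itemsets were removed from the frequent set."""
    frequent = [tuple(sorted(s)) for s in frequent_k]
    k = len(frequent[0])
    candidates = set()
    for y in frequent:
        prefix = y[:k - 1]               # the (k-1)-prefix of y
        for m in mfp:
            x = tuple(sorted(m))
            if set(prefix) <= set(x):
                j = x.index(prefix[-1])  # position in x of y's (k-1)-st item
                for item in x[j + 1:]:   # items of x with a greater index
                    if item not in y:
                        # join the restored k-subset prefix+{item} with y
                        candidates.add(tuple(sorted(set(y) | {item})))
    return candidates

# The example above: MFCP element {1,2,3,4,5}, current frequent set
# {{2,4,6}, {2,5,6}, {4,5,6}}; only {2,4,5,6} is recovered.
recovered = recover_candidates([{2, 4, 6}, {2, 5, 6}, {4, 5, 6}],
                               [{1, 2, 3, 4, 5}])
# recovered == {(2, 4, 5, 6)}
```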
Algorithm
The new prune procedure
Input
Current MFP and C_{k+1} generated from PascalGen and recovery
procedures above
Output
Final candidate set C_{k+1}
1. forall itemsets c in C_{k+1}
2. if c is a subset of any itemset in the current MFP
3. delete c from C_{k+1}
We eliminate the candidates that are subsets of elements of the current MFP (line 3).
In summary, our candidate generation process contains the following three steps:
1. call the PascalGen join procedure
2. call the recovery procedure if necessary
3. call the new prune procedure
The Fast MAXFP Algorithm
We now present our complete algorithm; it relies on the combined approach
for determining the maximum frequent pattern.
Algorithm
The MAXFP Algorithm
Input
A database and a user-defined minimum support
Output
MFP which contains all maximal frequent itemsets
1. ∅.supp := 1; ∅.key := true
2. L_{0} := {∅}; k := 1
3. C_{1} := {{i} | i ∈ I}
4. MFCP := {{1, 2, ..., n}}
5. MFP := ∅
6. while C_{k} ≠ ∅ do begin
7. if k = 1 then begin
8. read the database and count the supports of the itemsets in C_{k} and in the MFCP
9. MFP := MFP ∪ {frequent itemsets in MFCP}
10. L_{k} := {frequent itemsets in C_{k}} \ {subsets of itemsets in MFP}
11. forall l ∈ L_{1} do l.pred_supp := 1; l.key := (l.supp ≠ 1)
12. end
13. else begin read the database and count the supports of the itemsets in C_{k} (call count_support_procedure) and in the MFCP
14. MFP := MFP ∪ {frequent itemsets in MFCP}
15. L_{k} := {frequent itemsets in C_{k} (call frequent_pattern_procedure)} \ {subsets of itemsets in MFP}
16. end
17. S_{k} := {infrequent itemsets in C_{k}}
18. call PascalGen to generate C_{k+1}
19. if any frequent itemset in C_{k} is a subset of an itemset in MFP
20. call the recovery procedure to recover candidates to C_{k+1}
21. call the new prune procedure to prune candidates in C_{k+1}
22. call the MFCPgen algorithm when S_{k} is not empty
23. k := k + 1
24. end
25. return MFP
The algorithm starts with the empty set, which always has a support of 1 and which is (by definition) a key pattern (lines 1 and 2). In line 8, the frequent 1-patterns are determined; they are marked as key patterns unless their support is 1 (line 11). PascalGen is called to compute the candidate patterns (line 18). The MFCP is initialized to contain one itemset, which consists of all the database items. The MFCP is updated whenever new infrequent itemsets are found (line 22). If an itemset in the MFCP is found to be frequent, then its subsets do not participate in the subsequent support counting and candidate generation steps: lines 10, 15 and 21 exclude those itemsets that are subsets of any itemset in the current MFP set, which contains the frequent itemsets found in the MFCP. If some itemsets in L_{k} are removed, the algorithm calls the recovery procedure to recover the missing candidates (line 20). Line 13 calls count_support_procedure to count the supports of the candidate itemsets (when k > 1) and frequent_pattern_procedure is called (when k > 1) to find the frequent patterns among the newly generated candidate itemsets (line 15).
Algorithm
The count_support_procedure
Input
C_{k}, the candidate set generated from the PascalGen and recovery procedures above, each candidate c having the flag c.key.
Output
Candidate set C_{k}, in which the support c.supp of each key candidate c has been counted.
1. if ∃c ∈ C_{k} | c.key then
2. forall o ∈ D do begin
3. C_{o} := subset(C_{k}, o)
4. forall c ∈ C_{o} | c.key do
5. c.supp++
6. end
Algorithm
The frequent_pattern_procedure
Input
C_{k}, the set of candidate k-patterns c, each with the flag c.key, the value c.pred_supp and the support c.supp.
Output
L_{k}, the set of frequent k-patterns p with their supports p.supp and their p.key flags.
1. forall c ∈ C_{k} do
2. if c.supp ≥ minsup then begin
3. if c.key and c.supp = c.pred_supp then
4. c.key := false
5. L_{k} := L_{k} ∪ {c}
6. end
7. end
Counting the Supports of All Frequent Itemsets
The MAXFP algorithm extracts all maximal frequent patterns from the database. The non-empty subsets of these maximal frequent patterns give all frequent patterns, but for those cases in which the supports of all frequent patterns are needed, previous work provides no efficient algorithm for counting them.
Most previous algorithms (Lin and Kedem, 1998) require an extra database pass for counting the supports of all frequent itemsets generated from the maximum frequent pattern. They build a hash tree for the maximum frequent pattern, which stores all the subsets of the maximal frequent itemsets. For counting the supports of all frequent patterns they use the Apriori algorithm with a minor change, adding a field for storing the supports of internal nodes. The support in every node is incremented whenever the node is visited in the forward direction.
Here we introduce a new concept for counting the supports of all frequent patterns. In our MAXFP approach, the PascalGen algorithm uses the concept of key patterns; each itemset has three fields: key, pred_supp and supp. The supp field stores the support of the itemset, so there is no need to add an extra field as in Lin and Kedem (1998). In some cases no further support counting is needed, because all frequent itemsets were already generated while extracting the maximum frequent pattern. Otherwise, we use PascalGen with minor changes to count the supports of those patterns that did not occur during the extraction of the maximum frequent pattern. It obtains the support of any frequent itemset from the record of frequent itemsets already extracted by MAXFP, or from the supports of its subsets (by the key pattern property). Using the key pattern concept, the modified PascalGen counts the supports of all frequent patterns without additional database access.
Performance Evaluation
According to both (Agrawal and Srikant, 1994) and our experiments, a large fraction of the 2-itemsets will usually be infrequent. These infrequent itemsets cause the MFCP to go down the levels very fast, allowing it to reach some maximal frequent itemsets after only a few passes. Indeed, in our experiments we found that, in most cases, many of the maximal frequent itemsets are found in the MFCP in very early passes.
Running Example
We illustrate the new algorithm on the dataset shown in Fig. 4, for minsup = 2/5.
For finding the maximum frequent pattern in the above dataset, our new MAXFP algorithm needs three database passes, in which it counts the supports of 7 + 11 + 2 = 20 patterns. Pincer-Search (Lin and Kedem, 1998) would have needed three database passes to count the supports of 7 + 16 + 15 = 38 patterns for the same dataset. The algorithm performs well when the bottom-up search is resolved entirely within PascalGen, since then no database access is needed to count the supports of candidate patterns. If at any stage the MFCP is not updated, the algorithm reduces the number of database passes. On the other hand, when the bottom-up search is resolved within PascalGen, no additional database scan is required for support counting: all frequent patterns are generated while extracting the maximum frequent pattern, so there is no need for a separate pass to count the supports of all frequent patterns.
Conclusions
The maximum frequent pattern provides a unique representation of all the frequent itemsets. In many situations, it suffices to discover the maximum frequent pattern and once it is known, all the required frequent patterns can be easily generated.
In the present study, the Fast MAXFP algorithm discovers the maximum frequent pattern and counts the supports of all frequent patterns very efficiently, without extra database access. This new algorithm reduces both the number of database passes and the number of candidates considered. By using the MFCP, we are able to discover some maximal frequent itemsets in early passes. This early discovery of the maximal frequent itemsets reduces the number of candidates and the number of passes over the database, which in turn reduces CPU time and I/O time. This is especially significant when the maximal frequent itemsets discovered in the early passes are long. In the bottom-up direction we use the pattern counting inference approach, which is based on the notion of key patterns of equivalence classes of patterns. It allows counting from the dataset the supports of only some frequent patterns, the frequent key patterns, rather than the supports of all frequent patterns.
For counting the supports of all frequent patterns, the approach has the following advantages:
• 
There is no need for an extra process to add a new field to each itemset. 
• 
It does not require an additional database pass. 
• 
The algorithm can be extended with minor modifications to perform incremental mining of association rules. 