
Information Technology Journal

Year: 2007 | Volume: 6 | Issue: 2 | Page No.: 251-254
DOI: 10.3923/itj.2007.251.254
Study on Mutual Information Based Clustering Algorithm
Hongfang Zhou, Boqin Feng, Lintao Lv and Hui Yue

Abstract: Traditional clustering algorithms are designed for isolated datasets, but in most cases relationships exist among different datasets, so clustering must be considered from a cooperative perspective. A new collaborative model is proposed, and based on this model a new cooperative clustering algorithm is presented. The algorithm is proved in theory to converge to a local minimum. Finally, experimental results demonstrate that the clustering structures obtained by the new algorithm differ from those of conventional algorithms because collaboration is taken into account, and that under cooperative circumstances these collaborative clustering algorithms can perform much better than traditional isolated algorithms.


How to cite this article
Hongfang Zhou, Boqin Feng, Lintao Lv and Hui Yue, 2007. Study on Mutual Information Based Clustering Algorithm. Information Technology Journal, 6: 251-254.

Keywords: fuzzy algorithm, cooperative model, clustering, mutual information, global analysis and local analysis

INTRODUCTION

Clustering is used in many different research domains (Han and Kamber, 2001), but no single algorithm suits every circumstance: existing algorithms either cope only with isolated, independent datasets or require prohibitive run time (Kanungo et al., 2002; Topchy et al., 2004). With web information growing explosively, these limitations have become evident, so new clustering algorithms are urgently needed. Based on an analysis of a large volume of conventional research data, we present a mutual information based approach to measure the cooperative relations among several datasets and discuss a new cooperative model. Based on this model, we then introduce the corresponding cooperative clustering algorithm, MICCA. Experimental results show that it is feasible and effective.

Fuzzy clustering algorithm: The most famous fuzzy clustering algorithm is FCM (Fuzzy C-Means) (Hoppner, 1999), and its objective function J_FCM is given by:

$$J_{FCM} = \sum_{k=1}^{K}\sum_{j=1}^{C}\mu_{kj}^{m}\,d^{2}(x_{k},\beta_{j}) \qquad (1)$$

In this formula, K is the number of samples; C is the number of clusters; m ∈ (1, ∞) is a weighting exponent called the fuzzifier; and d²(x_k, β_j) is the squared distance from feature point x_k to prototype β_j. Minimization of (1) is usually achieved by a Picard iteration technique, which updates memberships and prototypes in an alternating fashion until convergence. The K×C matrix U = [μ_kj] is called a constrained fuzzy C-partition matrix if it satisfies

$$\mu_{kj}\in[0,1],\qquad \sum_{j=1}^{C}\mu_{kj}=1\;\;(\forall k),\qquad 0<\sum_{k=1}^{K}\mu_{kj}<K\;\;(\forall j) \qquad (2)$$

In general, d²(x_k, v_j) is the squared Euclidean distance. From these principles, FCM's membership update equation for this formulation is:

$$\mu_{kj}=\left[\sum_{i=1}^{C}\left(\frac{d^{2}(x_{k},v_{j})}{d^{2}(x_{k},v_{i})}\right)^{\frac{1}{m-1}}\right]^{-1} \qquad (3)$$

And the center update equation is:

$$v_{j}=\frac{\sum_{k=1}^{K}\mu_{kj}^{m}\,x_{k}}{\sum_{k=1}^{K}\mu_{kj}^{m}} \qquad (4)$$
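Since formulas (1)-(4) define the standard FCM iteration, a minimal NumPy sketch of the Picard loop may be helpful; the function name fcm and all parameter defaults are ours, not the paper's.

```python
import numpy as np

def fcm(X, C, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Standard FCM: Picard iteration alternating formulas (3) and (4)."""
    rng = np.random.default_rng(seed)
    K = X.shape[0]
    # Random constrained fuzzy C-partition matrix U (K x C); rows sum to 1, cf. (2).
    U = rng.random((K, C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Prototype update, formula (4): membership-weighted mean of the samples.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distances d^2(x_k, v_j).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # guard against a zero distance
        # Membership update, formula (3).
        U_new = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:  # alternate until convergence
            U = U_new
            break
        U = U_new
    return V, U
```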

Probability clustering algorithm: Given a dataset Γ = {x_1, x_2, ..., x_K} whose corresponding cluster centroids are v_j (j = 1, 2, ..., C), the probability clustering objective function J is defined as follows:

(5)

In this formula, mfs(x_k, v_j) = {i | f_i(x_k, v_j) ≠ 0, 1 ≤ i ≤ n}. Therefore, in order to compute p(v_j | x_k), we need to find all of the possible clusters v_j containing x_k, and mfs(x_k) = ∪_j mfs(x_k, v_j) corresponds to the collection of clusters containing x_k.
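Since formula (5) is not reproduced above, the following sketch only illustrates one common way to obtain cluster-membership probabilities p(v_j | x_k); the Gaussian responsibility form and the bandwidth sigma are our assumptions, not the paper's definition.

```python
import numpy as np

def cluster_posteriors(X, V, sigma=1.0):
    """Hypothetical p(v_j | x_k): normalized Gaussian responsibilities.
    An illustrative stand-in for formula (5), not the paper's definition."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # d^2(x_k, v_j)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # unnormalized cluster likelihoods
    return w / w.sum(axis=1, keepdims=True)   # each row sums to 1 over clusters
```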

ALGORITHM

Mutual information: In this section, we use a new information-theory-based mutual information measure to quantify the cooperative relations among datasets (Shen et al., 2005). For simplicity and generality, we assume two datasets, T and S. As described above, for every sample x_k there exists a probability p_kj denoting the possibility that x_k belongs to cluster c_j.

Lemma 1: If a measurement judges only a sample's uncertainty information with respect to some specific cluster, then it depends only on p_kj, the probability that sample x_k belongs to cluster c_j. That is to say, there exists a function f defined on the interval [0, 1] that satisfies

(6)

In formula (6), p_kj^T and p_kj^S correspond to the probabilities of sample x_k belonging to cluster c_j in the target dataset T and the source dataset S, respectively. As described in Lemma 1, if dataset T is expressed by the expected distribution φ = {φ_kj | k = 1, 2, ..., K; j = 1, 2, ..., C}, then the function f(·) measures every sample's uncertainty in T constrained by the expected distribution φ. Suppose P = {p_kj | k = 1, 2, ..., K; j = 1, 2, ..., C} is the actual distribution of T; then Lemma 2 can be inferred.

Lemma 2: If the real probability distribution P of T and the discrete random variables f(φ_kj) are used to compute the expectation E_p(φ), then E_p(φ) corresponds to T's uncertainty information under the condition φ = {φ_kj}:

(7)

Here, E_p is the mathematical expectation taken under P.
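Formulas (6) and (7) are not reproduced above, so the sketch below is only a hedged illustration of Lemma 2: it computes E_p[f(φ)] under the common Shannon-style choice f = -log, which yields a cross-entropy between the actual distribution P and the expected distribution φ. The choice of f is ours.

```python
import numpy as np

def expected_uncertainty(P, Phi, eps=1e-12):
    """E_p[f(phi)] with the assumed Shannon-style choice f = -log, i.e. a
    cross-entropy between the actual distribution P = [p_kj] and the expected
    distribution Phi = [phi_kj] (both K x C arrays). The choice of f is an
    assumption; formulas (6)-(7) are not reproduced in this version."""
    P, Phi = np.asarray(P, float), np.asarray(Phi, float)
    return float(-(P * np.log(Phi + eps)).sum())
```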

Def. 1: Suppose the real probability distribution of dataset T is φ = {φ_kj} and the real probability distribution of dataset S is P = {p_kj}. Then S's mutual information obtained from T is:

(8)

and we call ε(P, φ) S's mutual information measurement obtained from T.

Generalizing the deduction above, suppose dataset S is affected by N datasets D_1, D_2, ..., D_N at the same time. In this case, formula (8) can be transformed into:

(9)
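The exact forms of (8) and (9) are likewise not reproduced above; the sketch below shows one plausible divergence-style realization of ε(P, φ) and of its N-dataset generalization. Both forms are assumptions, not the authors' definitions.

```python
import numpy as np

def eps_measure(P, Phi, eps=1e-12):
    """Hypothetical epsilon(P, phi): a KL-style divergence between the actual
    membership distribution P of S and the distribution Phi induced by T.
    A stand-in for formula (8), whose exact form is not reproduced here."""
    P, Phi = np.asarray(P, float), np.asarray(Phi, float)
    return float((P * np.log((P + eps) / (Phi + eps))).sum())

def eps_measure_multi(P, Phis):
    """Stand-in for formula (9): S affected by N datasets at once, taken here
    as the sum of the pairwise measures (again an assumed form)."""
    return sum(eps_measure(P, Phi) for Phi in Phis)
```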

Cooperative clustering algorithm (MICCA) based on mutual information: As derived in the previous sections, the mutual information that dataset S obtains from the other datasets D_i in the cooperative environment is ε(N, φ). Accordingly, under these circumstances, S's clustering objective function is:

(10)

Since the distance values d²(x_k, v_j) corresponding to outliers are large, the idea is to design the objective function so that its global minimum is achieved when large distances are discounted or ignored. In formula (10), ρ(·) is a loss function that reduces the effect of outliers; such a loss function is typically linear for small distances and then saturates for larger ones.
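As an illustration of such a ρ(·), the following toy loss is roughly linear for small arguments and saturates for large ones, so its derivative tends to 0 there; the concrete form c·tanh(t/c) is our choice, not the paper's.

```python
import numpy as np

def rho(t, c=1.0):
    """A toy saturating loss: roughly linear for small t, flattening to the
    constant c for large t (derivative -> 0). The form c*tanh(t/c) is our
    assumption, standing in for the paper's unspecified rho(.)."""
    return c * np.tanh(np.asarray(t, float) / c)
```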

Because ρ(·) is a loss function whose purpose is to reduce the effect of noise, its value grows with its argument only until the argument reaches a certain level. Clearly, once ρ(·) approaches a constant, its derivative is close to 0 and can reasonably be ignored. That is to say, when ρ(·)'s value is large enough, the clustering objective function becomes

(11)

When formula (11) is minimized, formula (12) is satisfied.

(12)

From formula (12), we can get:

(13)

When d²(x_k, v_j) is not large, its derivative is much larger than the corresponding derivative of ε(N, φ). In this case we ignore the second term, and the corresponding objective function turns out to be

(14)

For convenience, we can approximate ρ(·) by a linear function. Suppose its derivative is the constant Cons; then the objective function must satisfy:

(15)

From formula (15), we can derive:

(16)

Formulas (10), (13) and (16) together compose the cooperative clustering algorithm MICCA, which can be summarized as follows:

Algorithm MICCA

Input: the initial cluster centers and the clustering objective function threshold CPthreshold

Output: the cluster centers c_1, c_2, ..., c_C

1. Set the cluster centers to the given initial values and set the iteration number inum = 0;
2. According to formulas (10) and (16), calculate the clustering objective function CP;
3. According to formula (13), calculate the centroids;
4. Using the cluster centers calculated in step 3 together with formulas (10) and (16), recalculate CP;
5. If the change in CP falls below CPthreshold, the algorithm stops and the current centers are the final cluster centers of dataset T; otherwise, set inum to inum + 1 and go to step 3.
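A runnable skeleton of this loop is sketched below. Only the control flow of steps 1-5 comes from the listing above; since formulas (10), (13) and (16) are not reproduced in this version, objective() and update_centers() are hypothetical stubs (a k-means-style objective and update) that merely stand in for them.

```python
import numpy as np

def objective(X, V):
    # Hypothetical stub for CP computed via formulas (10) and (16):
    # here simply the total squared distance to the nearest center.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def update_centers(X, V):
    # Hypothetical stub for formula (13): a hard k-means style update.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return np.array([X[labels == j].mean(axis=0) if (labels == j).any() else V[j]
                     for j in range(len(V))])

def micca(X, V0, cp_threshold=1e-6, max_iter=100):
    """Control flow of the MICCA listing (steps 1-5 above)."""
    V = np.asarray(V0, dtype=float)          # step 1: initial centers, inum = 0
    cp_old = objective(X, V)                 # step 2: CP via (10) and (16)
    for inum in range(max_iter):
        V = update_centers(X, V)             # step 3: centroids via (13)
        cp_new = objective(X, V)             # step 4: recompute CP
        if abs(cp_old - cp_new) < cp_threshold:
            break                            # step 5: stop when CP stabilizes
        cp_old = cp_new                      # otherwise inum + 1, back to step 3
    return V
```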

RESULTS

In this section, we test the performance of the MICCA algorithm under cooperative circumstances. Experimental results show that the track of MICCA's cluster centers differs from that of conventional independent clustering algorithms and, moreover, that the final cluster centers are more compact and harmonious.

We now examine further how the mutual information measurement defined in formula (11) varies during the clustering process. From Fig. 1 we can see that as clustering proceeds, the cooperative relations among the datasets become closer and closer. In more detail, this phenomenon can be explained as follows: at the beginning of clustering, the large volume of data in the datasets is in confused order, without any ordered structure, so cooperation among the datasets is weak. As clustering proceeds, more and more of the order hidden in the datasets' structure emerges, and the cooperative relations among the different datasets are correspondingly enhanced. It is worth noting that in Fig. 1 the cooperative relations are enhanced faster in the left area than in the right area; this is because the structure is more stable in the left area than in the right.

Fig. 1: Cooperative strength in clustering procedures

In fact, all samples in a dataset are affected by MICCA's consideration of the cooperative relations hidden among the many datasets. Selecting some typical points in the dataset, we compare their dependence degrees when cooperative effects are and are not considered.

CONCLUSIONS

In the real world, a dataset may be independent of other datasets but can sometimes cooperate with others. Conventional clustering algorithms ignore this kind of cooperative relation. In this study, a novel collaborative model is discussed, and proper new methods such as mutual information are proposed to quantitatively measure such collaboration between datasets. The corresponding collaborative clustering algorithm is presented accordingly, and theoretical analysis shows that the new cooperative clustering algorithm converges to a local minimum. Experimental results demonstrate that the clustering structures obtained by the new cooperative algorithm differ from those of conventional algorithms because collaboration is taken into account, and that under cooperative circumstances these collaborative clustering algorithms can perform much better than conventional isolated clustering algorithms.

ACKNOWLEDGEMENT

This work was supported by the National High Technology Research and Development Program of China (2001AA113182).

REFERENCES

  • Han, J. and M. Kamber, 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco, CA.


  • Hoppner, F., 1999. Fuzzy Cluster Analysis. John Wiley and Sons Ltd., New York.


  • Shen, H.B., J. Yang and S.T. Wang, 2005. Study on new information theory based cooperative clustering algorithm. Chinese J. Comput., 28: 1287-1294.


  • Kanungo, T., D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman and A.Y. Wu, 2002. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell., 24: 881-892.


  • Topchy, A., A.K. Jain and W. Punch, 2004. A mixture model of clustering ensembles. Proceedings of the SIAM International Conference on Data Mining, April 22-24, 2004, Lake Buena Vista, Florida, pp: 225-232.
