Science Alert
Information Technology Journal
Year: 2011 | Volume: 10 | Issue: 6 | Page No.: 1092-1105
DOI: 10.3923/itj.2011.1092.1105

Efficient Clustering for High Dimensional Data: Subspace Based Clustering and Density Based Clustering

Singh Vijendra

Finding clusters in a high-dimensional data space is challenging because such a space can have hundreds of attributes and hundreds of data tuples, and the average density of data points is very low. The distance functions used by many conventional algorithms fail in this scenario. Because clustering relies on computing the distance between objects, the complexity of the similarity model strongly affects the efficiency of a clustering algorithm. For density-based clustering in particular, range queries must be supported efficiently to reduce the runtime of clustering. Density-based clustering is also affected by the density divergence problem, which reduces clustering accuracy. Even if no clusters exist in the original high-dimensional data space, clusters may still exist in some subspaces of it. Subspace clustering algorithms identify such subspace clusters: they localize the search for relevant dimensions, which allows them to find clusters that exist in multiple, possibly overlapping subspaces. When clustering is based on relative region densities within subspaces, density-based subspace clustering algorithms are applied; these regard clusters as regions whose densities are high compared with the other region densities in the subspace. This study reviews various subspace-based and density-based clustering algorithms, together with their efficiency on different data sets.
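The abstract's point that density-based clustering depends on efficient range queries can be made concrete with a minimal sketch of DBSCAN, a standard density-based algorithm. This is an illustration under stated assumptions, not code from the surveyed study: the names `region_query`, `dbscan`, `eps` and `min_pts` are hypothetical, distances are Euclidean, and every range query is a brute-force linear scan. Because each point triggers at most one expanding range query, clustering n points this way costs on the order of n² distance computations, which is why practical implementations back the range query with a spatial index.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Brute-force range query (hypothetical helper): indices of all points
    within Euclidean distance eps of points[i], including i itself."""
    p = points[i]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN sketch: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbors)
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

For the sample points, the two tight groups form clusters 0 and 1 and the isolated point (50, 50) is labeled noise. Swapping the linear scan in `region_query` for an index-backed lookup leaves the clustering logic unchanged and only reduces the cost of each range query.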
  •    An Intelligent Mining Framework based on Rough Sets for Clustering Gene Expression Data
  •    A Fast Evolutionary Algorithm for Automatic Evolution of Clusters
  •    A Complete Survey of Duplicate Record Detection Using Data Mining Techniques
  •    A Framework for Classifying Uncertain and Evolving Data Streams
  •    Variations of k-mean Algorithm: A Study for High-Dimensional Large Data Sets
  •    Clustering Methods for Statistical Analysis of Genome Databases
  •    A Robust Algorithm for Subspace Clustering of High-Dimensional Data*
  •    Integrated Approach of Reduct and Clustering for Mining Patterns from Clusters
How to cite this article:

Singh Vijendra , 2011. Efficient Clustering for High Dimensional Data: Subspace Based Clustering and Density Based Clustering. Information Technology Journal, 10: 1092-1105.

DOI: 10.3923/itj.2011.1092.1105