Information Technology Journal

Year: 2013 | Volume: 12 | Issue: 23 | Page No.: 7280-7288
DOI: 10.3923/itj.2013.7280.7288
Trend Based Sketching for Massive Uncertain Time Series Clustering
Jingyu Chen, Ping Chen and Xian'gang Sheng

Abstract: Because of inaccuracy and noise, uncertainty is inherent in time series data and increases the complexity of clustering. For massive data sets, efficient data storage is a crucial task. Based on the Hilbert space-filling curve (SFC), a trend sketch is constructed to store the trends of uncertain time series. A sketch-based similarity is then defined from a divergence and a sketch metric, and a clustering algorithm is proposed to improve the quality of clustering. Finally, experimental results are presented.
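The abstract does not say how the Hilbert SFC is applied to the trend data, so the following is only a minimal sketch of the general idea behind a space-filling curve: mapping a cell of a two-dimensional grid to a single position along the curve so that neighbouring positions stay spatially close. The function name `hilbert_index`, the grid size, and the reading of `(x, y)` as a grid cell are assumptions made for illustration; this is the standard textbook mapping, not the authors' trend-sketch construction.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a Hilbert curve. Illustrative helper only."""
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current sub-square holds the cell?
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next level sees a canonically oriented curve.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


if __name__ == "__main__":
    # Print the curve positions of a 4x4 grid, one row per line.
    for y in range(4):
        print([hilbert_index(4, x, y) for x in range(4)])
```

Cells that are consecutive along the curve are always adjacent in the grid, which is the locality property that makes such a curve attractive for laying out multi-dimensional data in a one-dimensional structure.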

How to cite this article
Jingyu Chen, Ping Chen and Xian'gang Sheng, 2013. Trend Based Sketching for Massive Uncertain Time Series Clustering. Information Technology Journal, 12: 7280-7288.

Keywords: Uncertainty, Hilbert SFC, sketch, divergence and clustering
