Trend Based Sketching for Massive Uncertain Time Series Clustering

Information Technology Journal

Year: 2013 | Volume: 12 | Issue: 23 | Page No.: 7280-7288
DOI: 10.3923/itj.2013.7280.7288

Jingyu Chen, Ping Chen and Xian'gang Sheng

Abstract: Because of inaccuracy and noise, uncertainty is inherent in time series data and increases the complexity of clustering. Given the massive size of such data, efficient storage is a crucial task. Based on the Hilbert space-filling curve (SFC), a trend sketch is constructed to store the trends of uncertain time series. Based on a divergence and a sketch metric, a sketch-based similarity measure is then defined. Finally, a clustering algorithm is proposed to improve the quality of clustering. Experimental results are presented in the final section.
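The abstract does not say how trends are encoded on the Hilbert curve. As a minimal sketch, assuming each trend is represented by a pair of consecutive values (x_t, x_{t+1}) quantized onto an n × n grid, the standard 2-D Hilbert index computation below turns each pair into a single integer key that a sketch could then count. The names `xy2d`, `trend_keys`, `_quantize` and the grid parameters `n`, `lo`, `hi` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Reorient a quadrant so its sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            # Flip both coordinates (a 180-degree rotation of the grid).
            x = n - 1 - x
            y = n - 1 - y
        # Swap x and y (a transpose); after the flip this is an anti-diagonal reflection.
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level holds s*s cells; (3*rx) ^ ry gives its visit order.
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def _quantize(v: float, n: int, lo: float, hi: float) -> int:
    """HYPOTHETICAL scheme: clamp v to [lo, hi] and map it to a coordinate in 0..n-1."""
    v = min(max(v, lo), hi)
    return min(n - 1, int((v - lo) / (hi - lo) * n))


def trend_keys(series: Sequence[float], n: int, lo: float, hi: float) -> List[int]:
    """HYPOTHETICAL trend encoding: one Hilbert key per consecutive pair (x_t, x_{t+1})."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [
        xy2d(n, _quantize(a, n, lo, hi), _quantize(b, n, lo, hi))
        for a, b in zip(series, series[1:])
    ]
```

For example, `trend_keys([0.0, 1.0], 2, 0.0, 1.0)` quantizes the single pair to cell (0, 1), whose Hilbert index on a 2 × 2 grid is 1. Because consecutive Hilbert indices always correspond to adjacent grid cells, similar trends tend to receive nearby keys, which is the usual reason for choosing a Hilbert SFC over a simple row-major ordering.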

How to cite this article

Jingyu Chen, Ping Chen and Xian'gang Sheng, 2013. Trend Based Sketching for Massive Uncertain Time Series Clustering. Information Technology Journal, 12: 7280-7288.

Keywords: Uncertainty, Hilbert SFC, sketch, divergence and clustering
