Trend Based Sketching for Massive Uncertain Time Series Clustering
Abstract:
Due to inaccuracy and noise, uncertainty is inherent in time series data and increases the complexity of clustering. Because the data volume is massive, efficient data storage is a crucial task. Based on the Hilbert space-filling curve (SFC), a trend sketch is constructed to store the trends of uncertain time series. Based on divergence and the sketch metric, a sketch-based similarity measure is defined. A clustering algorithm is then proposed to improve the quality of clustering. Experimental results are presented in the final section.
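The abstract does not specify how the trend sketch or the divergence-based similarity are built. The following is only a minimal sketch of the general idea under several loud assumptions, not the authors' method. It assumes: each uncertain series is given as a few sampled realizations; a "trend" is the first difference between consecutive values; each (time position, slope) pair is quantized onto a 2^order × 2^order grid and mapped to a Hilbert index with the standard xy-to-d algorithm; the sketch is a normalized histogram over contiguous Hilbert-index ranges, so nearby grid cells tend to share a bucket; and the divergence is the Jensen-Shannon divergence, swapped in here as an illustrative choice. The names `hilbert_index`, `trend_sketch`, `js_divergence`, `sketch_similarity` and the parameters `order`, `buckets`, `max_slope` are hypothetical. The clustering step is not shown; `1 - sketch_similarity` could serve as a distance for any standard clustering algorithm.

```python
import math
from typing import List, Sequence, Tuple


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so the Hilbert curve keeps its orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def trend_sketch(realizations: Sequence[Sequence[float]], order: int = 4,
                 buckets: int = 16, max_slope: float = 1.0) -> List[float]:
    """Hypothetical trend sketch: normalized histogram of quantized
    (time, slope) pairs, bucketed by contiguous Hilbert-index ranges."""
    n = 1 << order                          # grid side length
    counts = [0.0] * buckets
    total = 0.0
    for values in realizations:             # each realization = one possible world
        slopes = [b - a for a, b in zip(values, values[1:])]
        m = len(slopes)
        for i, s in enumerate(slopes):
            x = i * n // m                  # time position -> grid column
            c = max(-max_slope, min(max_slope, s))
            y = min(n - 1, int((c + max_slope) / (2 * max_slope) * n))  # slope -> row
            h = hilbert_index(n, x, y)
            counts[h * buckets // (n * n)] += 1.0
            total += 1.0
    return [c / total for c in counts] if total else counts


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical inputs, at most 1."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def sketch_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Similarity in [0, 1] derived from the divergence between two sketches."""
    return 1.0 - js_divergence(p, q)


if __name__ == "__main__":
    # Two uncertain series, each given as two noisy realizations.
    rising = [[0.0, 0.2, 0.4, 0.6, 0.8], [0.0, 0.25, 0.45, 0.6, 0.85]]
    falling = [[0.8, 0.6, 0.4, 0.2, 0.0], [0.85, 0.6, 0.4, 0.25, 0.0]]
    up, down = trend_sketch(rising), trend_sketch(falling)
    print(sketch_similarity(up, up))    # identical sketches give 1.0
    print(sketch_similarity(up, down))  # opposite trends score much lower
```

Bucketing by contiguous Hilbert ranges, rather than by a hash as in a count-min sketch, is chosen here because the Hilbert curve keeps spatially close cells close in index order. Similar trends at similar times therefore land in the same or neighboring buckets.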
How to cite this article
Jingyu Chen, Ping Chen and Xian'gang Sheng, 2013. Trend Based Sketching for Massive Uncertain Time Series Clustering. Information Technology Journal, 12: 7280-7288.
© Science Alert. All Rights Reserved