Huang Bin
Department of Computer, Huaihua University, 418008, Huaihua, China
Peng Yuxing
National Laboratory of Parallel and Distributed Processing, National University of Defense Technology, 410073, Changsha, China
ABSTRACT
Owing to the perception of unlimited resources and infinite scalability, cloud computing has emerged as a pervasive paradigm for hosting data-centric applications in large computing infrastructures. The data produced by these applications are typically sparse and wide and may change schema frequently, so the traditional relational data model is inappropriate for managing them. A new data model, called the Sparse Wide Table, was introduced for this task. Unfortunately, building a secondary index for a Sparse Wide Table in the cloud poses many challenges, because the storage is distributed and column-oriented, a layout that eliminates a large number of NULLs. In this study, we present a three-level index scheme for efficient data processing in the cloud. Our approach can be summarized as follows. First, we build an index for each column, from which records can be rebuilt easily. Second, we build a bitmap index for each storage node that indexes only the data residing on that node. Third, we organize the storage nodes as a structured overlay in which each node maintains a portion of the global index over all distinct data values. The global index is a bitmap index that indicates the node on which each data value resides. Finally, based on the three-level index scheme, we implement several query algorithms. We conduct extensive experiments on a LAN, and the results demonstrate that our indexing scheme is dynamic, efficient and scalable.
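The three index levels summarized in the abstract can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the authors' implementation: the class names `StorageNode` and `Cluster`, the use of Python integers as bitmaps, and the single in-memory dictionary standing in for the global index are all hypothetical. In particular, the abstract says the global index is partitioned across nodes in a structured overlay, which this sketch does not model.

```python
from collections import defaultdict


class StorageNode:
    """One storage node holding a slice of the sparse wide table (hypothetical)."""

    def __init__(self):
        self.num_rows = 0
        # Level 1: per-column index {column: {local_row: value}}.
        # NULL cells are simply never stored (column-oriented layout).
        self.columns = defaultdict(dict)
        # Level 2: local bitmap index {(column, value): bitmap over local rows}.
        self.bitmaps = defaultdict(int)

    def insert(self, record):
        """Store the non-NULL cells of a record and update the local bitmaps."""
        row = self.num_rows
        self.num_rows += 1
        for column, value in record.items():
            self.columns[column][row] = value
            self.bitmaps[(column, value)] |= 1 << row
        return row

    def rebuild(self, row):
        """Reassemble a record from the per-column indexes."""
        return {c: cells[row] for c, cells in self.columns.items() if row in cells}

    def lookup(self, column, value):
        """Return local records whose `column` equals `value`."""
        bitmap = self.bitmaps.get((column, value), 0)
        return [self.rebuild(r) for r in range(self.num_rows) if (bitmap >> r) & 1]


class Cluster:
    """Assumption: one dict stands in for the overlay-partitioned global index."""

    def __init__(self, num_nodes):
        self.nodes = [StorageNode() for _ in range(num_nodes)]
        # Level 3: global bitmap index {(column, value): bitmap over node ids}.
        self.global_index = defaultdict(int)

    def insert(self, node_id, record):
        self.nodes[node_id].insert(record)
        for column, value in record.items():
            self.global_index[(column, value)] |= 1 << node_id

    def query(self, column, value):
        """Consult the global bitmap first, then only the nodes it points to."""
        node_bits = self.global_index.get((column, value), 0)
        results = []
        for node_id, node in enumerate(self.nodes):
            if (node_bits >> node_id) & 1:
                results.extend(node.lookup(column, value))
        return results


cluster = Cluster(3)
cluster.insert(0, {"name": "a", "color": "red"})
cluster.insert(1, {"name": "b", "price": 5})
cluster.insert(2, {"name": "c", "color": "red", "price": 7})
matches = cluster.query("color", "red")
```

In this sketch, a query for `color = "red"` reads the global bitmap, skips node 1 entirely, and rebuilds the matching records on nodes 0 and 2 from their column indexes.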
How to cite this article
Huang Bin and Peng Yuxing, 2013. Efficiently Indexing Sparse Wide Tables in Cloud Computing. Information Technology Journal, 12: 5415-5423.
DOI: 10.3923/itj.2013.5415.5423
URL: https://scialert.net/abstract/?doi=itj.2013.5415.5423