• [email protected]
  • +971 507 888 742
Submit Manuscript
SciAlert
  • Home
  • Journals
  • Information
    • For Authors
    • For Referees
    • For Librarian
    • For Societies
  • Contact
  1. Information Technology Journal
  2. Vol 9 (8), 2010
  3. 1706-1712
  • Online First
  • Current Issue
  • Previous Issues
  • More Information
    Aims and Scope Editorial Board Guide to Authors Article Processing Charges
    Submit a Manuscript

Information Technology Journal

Year: 2010 | Volume: 9 | Issue: 8 | Page No.: 1706-1712
DOI: 10.3923/itj.2010.1706.1712
crossmark

Facebook Twitter Digg Reddit Linkedin StumbleUpon E-mail
Research Article

Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Jianwei Wang and Zhongxiao Hao

ABSTRACT


As a kind of special XML data, probabilistic XML has been also presented as the de facto standard for probability data exchange on the web. And there are a mass of probabilistic relational data. In order to realize the conversion of probabilistic relational data to probabilistic XML data, the representation methods of probability data in the probabilistic relational data and in the probabilistic XML data are firstly analyzed separately. And the probabilistic XML data tree is a classical probabilistic data model with some distribution nodes. The conversion process of probabilistic relational data to probabilistic XML data is composed of two stages what are schema conversion and data conversion. Algorithm PRS2PDTD converts the probabilistic relational schema to the probabilistic DTD, while preserving semantic constraints of the original probabilistic relational schema. And algorithm PRD2PT converts probabilistic relational data to the probabilistic XML data tree. The instance analysis results show that the algorithms are effective methods for conversion of the existent probabilistic relational data to the probabilistic XML data.
PDF Abstract XML References Citation
Received: June 30, 2010;   Accepted: August 13, 2010;   Published: September 17, 2010

How to cite this article

Jianwei Wang and Zhongxiao Hao, 2010. Converting Probabilistic Relational Data to Probabilistic XML Data Tree. Information Technology Journal, 9: 1706-1712.

DOI: 10.3923/itj.2010.1706.1712

URL: https://scialert.net/abstract/?doi=itj.2010.1706.1712

Search


INTRODUCTION


Because XML emerged as the de facto standard for data formats on the web (XML-TR, http://www.w3.org/ TR/, 2007; Nath and Batanov, 2005; Haw and Lee, 2007), the probabilistic XML data has been also presented as a special style of XML data from 2001. It can represent the distribute distribution data in XML element in the form of probabilistic data. The researchers mainly focuses the fields what are the probabilistic XML data models (Nierman and Jagadish, 2002; Van Maurice et al., 2005; De Keijzer and van Keulen, 2007) the probabilistic XML algebra (Zhao et al., 2005; Magnani and Montesi, 2008; Dekhtyar et al., 2006) query methods (Benny and Sagiv, 2007; Kimelfeld et al., 2008) and prototype system design (Nierman and Jagadish, 2002; Hung and Subrahmanian, 2005; Li et al., 2006).

The probabilistic relational data is a kind of relational data model that represents the triple uncertain relation data in the form of the probability data (Zhongxiao, 1996; Yuan, 2005). Cavallo and Pittarelli (1987) and Barbar et al. (1992) proposed a kind of extended relation data model to represent probability data by adding the attribute of probability data and defined a function that can convert the relational data to the probabilistic relational data. Lee (1992) and Yuan (2005) applied the Dempster-Shafer theory and probability theory as the basis of theory to research the probabilistic relation data and defined the correlative algebra operation (Dey and Sarkar, 1996).

Since, the probabilistic XML data is the standard of probabilistic data format, the existent probabilistic relation data need be converted to probabilistic XML data. From 1998, the researchers have represented a series of methods in order to convert the relation data to the XML data. Jacinto and Marta (2002) studied the algorithm for bidirectional conversion between XML documents and relational databases. Fong et al. (2001) studied the algorithm for converting relational database into XML document. Teng et al. (2005) analyzed the conversion relational schema to XML Schema with constraints. Anthony et al. (2006) and Sengupta (2010) designed the tool of visual relational to XML conversion. The methods provided some ways for converting the probabilistic relation data to probabilistic XML data.

So, the algorithms for converting the probabilistic relation data to probabilistic XML data are proposed. At first the paper analyzes the representation of probability data in relational data schema and XML data schema. The two triple form <attribute,probability> represents the attribute value and its probability in probabilistic relational data. Because the probabilistic XML data tree is a classical probabilistic XML data model, so the paper applies the tree model (Nierman and Jagadish, 2002; Li et al., 2006).

The contributions of this study include:

• We argue that the distribution node form represents the element nodes and their probability data in the probabilistic XML data tree
• We analyze the path mapping rules from the probabilistic relational data to probabilistic XML data tree based on analyzing the path types in the probabilistic XML data tree
• Algorithm PRS2PDTD is proposed to convert probabilistic relational schema to probabilistic DTD
• Algorithm PRS2PT is proposed to convert probabilistic relational data to probabilistic XML data tree
• The instance analysis verifies the efficiency of the two algorithms

METHODS OF PROBABILITY DATA REPRESENTATION

Probabilistic relation schema PRS is a kind of non-classical relation model that represent the uncertainty of probabilistic relation triple in the form of the two triple <attribute,probability> (Yuan, 2005). Probabilistic relation PR is a finite set of triple in PRS and one triple in PR isnot equal with the other triple. The definition of probabilistic relation is proposed which is shown as follows:

Definition 1: Probabilistic relation PR is one of 1-NF probabilistic relation in the probabilistic relational schema PRS, the attribute set is C = {ci,...,cm}, there is one and only one attribute ci≤a, ps > i = 2,...,m, where, a is the attribute name of probabilistic relation, ps is the probability value with the restriction condition:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Probabilistic XML Data Tree (Nierman and Jagadish, 2002; Li et al., 2006) is a kind of classical data model of probabilistic XML data that the schema document is probabilistic DTD (Nierman and Jagadish, 2002). The probabilistic XML data tree and probabilistic DTD document are defined as follows.

Definition 2: Probabilistic XML data tree is six tuple PT = (N, E, R, F, V, P), where:

• N is the finite set of node N = Nord,∪Ndist∪Nmax∪Nposs

where, Nord is the set of general node, Ndist is the set of distribution node, Nmax is the set of the node that has the child node that is exclusive form one another, Nposs is the set of possibility node.

• E⊆NxN is the set of edges
• R is the root of PT
• F: Nord→L is the label function of the general node, where, L is the name of element; poss, F(n) = poss
• v: Ni→D is the data of leaf node, where, Ni are the leaf nodes Nl⊂Nord∪Nposs, D is the data set
• P: Nord∪Nposs→p is the probability value of the given node, 0≤p≤1, the default value is 1 without assigning the probability value

Definition 3: Probabilistic DTD is a pair (f, N) that maps f to regular expressions over N, and R∈N is the start symbol. PT = (N, E, R, F, V, P) satisfies probabilistic DTD, if its root is labeled by r, for each function F: Nord→L, and for every node Nord with label L, the sequence L of labels of its children matches the regular expression:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

There are different type paths in a probabilistic XML data tree. The path definitions are proposed which are as follows.

Definition 4: Absolute path in a probabilistic XML data tree PT is defined a series of nodes from the root R to the leaf node V(Ni) (V(Nl)∈D). APATHPT is the set of absolute path in a probabilistic XML data tree PT, |APATHPT| is the number of the element of absolute path set.

Definition 5: Extended path in a probabilistic XML data tree PT is absolute path set without leaf nodes. EPATHPT is the set of extended path in the probabilistic XML data tree PT, |EPATHPT| is the number of the element of extended path set.

Definition 6: Basic absolute path in a probabilistic XML data tree PT is absolute path set without leaf nodes and distribution nodes. PATHPT is the set of basic absolute path in a probabilistic XML data tree PT, |PATHPT| is the number of the element of basic absolute path set.

Definition 7: Candidate absolute path in a probabilistic XML data tree PT is extended path without any distributi- on node. CPATHPT is the set of candidate absolute path in a probabilistic XML data tree PT, |CPATHPT| is the number of the element of candidate absolute path set.

Definition 8: The extended absolute path epath is equal to the basic absolute pathpath if the distribution nodes in the extended absolute path are deleted. The relation between epath and path is called path equivalence. PT, path = delete(epath.(dist,mux,poss) and path ∈PATHPT, epath is equal with path which is denoted by the equation epath Ñ path.

Let, EPATHPT be extended path set and PATHPT be basic absolute path set in the probabilistic XML tree PT. Then,

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(1)

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(2)

Theorem 1: Let Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree, Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree, Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree. Then, Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree. So, Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree can be partitioned to some sets of the equivalent path.

Proof: From definition 5 and 6, the expression:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(3)

comes into existence.

According to definition 8,

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

then:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(4)

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(5)

Definition 9: The absolute pathapath is equal to the extended absolute pathepath if the leaf nodes in the extended absolute path are deleted. The relation between apath and epath is called path consistent. Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Treethere is one and only one absolute path Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree, when Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree, so epath is consist with apath which is denoted by the equation:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Theorem 2: Let, Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree, then |PATHPT|, so, DPATHPT can be partitioned to some sets of the consistent paths.

Proof: From definition 5 and 6,the inequation Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Treecomes into existence.

From definition 9 Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree,

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(6)

if epathImage for - Converting Probabilistic Relational Data to Probabilistic XML Data Treeapath, then

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(7)

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(8)

ALGORITHM FOR CONVERSION PROBABILISTIC RELATIONAL SCHEMA TO PROBABILISTIC XML SCHEMA

The common method of converting probabilistic relation schema to probabilistic DTD is directly mapping. So the mapping rules are proposed which are listed as follows.

Rule 1: The basic absolute path set pathPR can be com- puted by the attribute set of probabilistic relation PR.

Let, C = {ci,...,cm} be the attribute set of probabilistic relation PR, then the absolute path set of probabilistic XML data tree PT is pathPR = {PR/c1,...,PR/cm}. And the result of union operation of absolute path set from PRS is:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(9)

Rule 2: The extended absolute path set epathPR can be computed by the attribute set of probabilistic relation PR.

Let, the attribute set C = {ci,...,cm} of probabilistic relation PR, then the extended absolute path set of probabilistic XML data tree PT is shown as follows:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(10)

And the result of union operation of extended path set from PRSis shown as follows:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Rule 3: The candidate absolute path set cpathPR can be computed by the attribute set of probabilistic relation PR.

Let, basic absolute path set be PATHPRS and the can absolute path set be EPATHPRS, then the candidate absolute path set CPATHPT is shown as follows:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(11)

According to the mapping rules, the algorithm process of conversion probabilistic relation schema to probabilis-tic XML schema is divided three steps. The root node r is firstly obtained from probabilistic relation schema PRS as the start tag of PDTD. The child node of r is secondly obtainned from probabilistic relation PR. At last the absolute path set is obtained from the attribute set of probabilistic relation PR. So, the algorithm is shown as follows.

Algorithm 1: PRS2PDTD (PRS)
Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Theorem 3: Algorithm PRS2PDTD () for converting the probabilistic relation schema to probabilistic XML data is correct and complete.

Proof: Definition 1, 3, 5, 6, 7, 8 and the mapping rules offer methods to compute PRS2PDTD (PRS) and prove themselves when PRS is a set of probabilistic relational schema in the form of 1-NF. The algorithm fulfills the methods offered by these definition and these theorems. Therefore, the algorithm can correctly compute PRS2PDTD (PRS) for any PR in PRS. Thus the algorithm is correct and complete.

ALGORITHM OF CONVERSION PROBABILISTIC RELATION DATA TO PROBABILISTIC XML DATA TREE

The method of conversion probabilistic relation data to probabilistic XML data is to convert the triples of probabilistic relation data to the absolute path. So the mapping rules are proposed which are listed as follows.

Rule: The absolute path set apathPR can be computed by the attribute set of probabilistic relation PR.

Let, the sattribute set C = {c1,...,cm} of probabilistic relation PR, then the absolute path set of probabilistic XML data tree is shown as follows:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(12)

And the result of union operation of absolute path set of probabilistic XML data tree apathPRS is:

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
(13)

According to the mapping rule, the algorithm process of conversion probabilistic relation data to probabilistic XML data is proposed which is shown as follows.

The root node r is firstly obtained from probabilistic relation data PRS as the root of PT. The absolute path setapathPR is secondly obtained from probabilistic relation PR. At last the absolute path set is created from the attribute set of probabilistic relation PR.

So, the algorithm is shown as follows.

Algorithm 2: PRD2PT(PRS)
Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Theorem 4: Algorithm PRD2PT () for converting the probabilistic relational data to probabilistic XML data tree is correct and complete.

Proof: Definition 1, 2, 3 and the mapping rule offer methods to compute PRD2PT (PRS) and prove themselves when PRS is a set of probabilistic relational data in the form of 1-NF. The algorithm fulfills the methods offered by these definitions and the rule. Therefore, the algorithm can correctly compute PRD2PT (PRS) for any PR in PRS. Thus, the algorithm is correct and complete.

INSTANCE ANALYSIS

The purpose of this section is to confirm that algorithm PRS2PDTD and PRD2PT are correct by comparing the results of the algorithms with the teacher probabilistic relation.

Example 1: The teacher probabilistic data about rank, department and course are represented in the probabilistic relational data table. The teacher probabilistic relational schema is composed of Table 1-4 where no is the key attribute in Table 1. The <rank,ps> attribute in Table 2 represents the rank and its distribution data, the <dept,ps> attribute in Table 3 represents the department and its distribution data, and the <course,ps> attribute in Table 4 represents the teaching course and its distribution data.

From the tables, the probabilistic relational schema is shown as follows.

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

The attribute set of teacher is shown as follows.

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

So, the basic absolute path set is computed which is shown as follows.

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

The elements of APATH is also shown in Fig. 1.

The extended absolute path set is computed which is shown as follows.

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

The elements of EPATH is also shown in Fig. 2.

The candidate absolute path set is computed which is shown as follows.

CPATH={ teacher/no,teacher/name}

The elements of CPATH is also shown in Fig. 3.

Table 1: Teacher-name
Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Table 2: Teacher-rank
Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Table 3: Teacher-department
Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Table 4: Teacher-course
Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
Fig. 1: Basic absolute path set, (a) teacher/no, (b) teacher/name, (c) teacher/rank, (d) teacher/dept and (e) teacher/course

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
Fig. 2: Extended absolute path set, (a) teacher/no, (b) teacher/name, (c) teacher/rank/dist/mux/poss, (d) teacher/dept/dist/mux/poss and (e) teacher/course/dist/mux/poss

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
Fig. 3: Candidate absolute path set, (a) teacher/no and (b) teacher/name

Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree
Fig. 4: Teacher probabilistic XML data tree PT

Based on the analysis of path type, the PDTD document is computed by algorithm PRS2PDTD. PDTD document of teacher is shown as follows:

PDTD document of teacher:
Image for - Converting Probabilistic Relational Data to Probabilistic XML Data Tree

The probabilistic XML data tree is computed by algo- rithm PRD2PT which is shown in Fig. 4.

CONCLUSIONS


In this study we present algorithms to solve the problem of conversion probabilistic relational schema to probabilistic DTD and conversion probabilistic relational data to probabilistic XML data. The contribution is original and useful in itself, additionally, it will be helpful to convert the existing probabilistic relational data to probabilistic XML data and probabilistic XML data become the probabilistic data exchange standard.

ACKNOWLEDGMENTS


Authors would like to thank National Natural Science Foundation of China and Hei longjiang National Science Grant No. 60673136 and No. F200702, for funding this research. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

REFERENCES


  1. Haw, S.C. and C.S. Lee, 2007. Structural query optimization in native XML databases: A hybrid approach. J. Applied Sci., 7: 2934-2946.
    CrossRefDirect Link

  2. Nath, U.K.D. and D.N. Batanov, 2005. Comparative analysis of three promising XML query languages and some recommendations. Inform. Technol. J., 4: 439-444.
    CrossRefDirect Link

  3. Nierman and H.V. Jagadish, 2002. ProTDB: Probabilistic data in XML. Proceedings of the 28th VLDB Conference, Aug. 20-23, Hong Kong, China, pp: 446-457.

  4. Van Maurice, K., A. de Keijzer and W. Alink, 2005. A probabilistic XML approach to data integration. Proceedings of the 21st International Conference on Data Engineering, April 5-8 Tokyo, Japan, pp: 459-470.

  5. De Keijzer, A. and M. van Keulen, 2007. User feedback in probabilistic integration. Proceedings of the 18th International Workshop on Database and Expert Systems Applications,Proceedings of the 18th International Workshop on Database and Expert Systems Applications, Sept. 3-7, Regensburg, Germany, pp: 377-381.

  6. Zhao, Z., A. Dekhtyar and J. Goldsmith, 2005. A framework for management of semistructured probabilistic data. J. Intell. Inform. Syst., 25: 293-332.
    CrossRef

  7. Magnani, M. and D. Montesi, 2008. Management of interval probabilistic data. Acta Inform., 45: 93-130.
    CrossRef

  8. Dekhtyar, A., K.K. Mathias and P. Gutti, 2006. Structured queries for semistructured probabilistic data. Proceedings of 2nd Twente Data Management Workshop on Uncertainty in Databases, June 6, Netherlands, pp: 11-18.

  9. Kimelfeld, B., Y. Kosharovsky and Y. Sagiv, 2008. Query efficiency in probabilistic XML models. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, June 10-12, Vancouver, Canada, pp: 701-714.

  10. Benny, K. and Y. Sagiv, 2007. Matching twigs in probabilistic XML. Proceedings of the 33rd International Conference on Very Large Data Bases, Sept. 23-27, Vienna, Austria, 27-38.

  11. Hung, E. and V.S. Subrahmanian, 2005. Managinguncertaintyand ontologies in databases. Ph.D. Thesis, University of Maryland

  12. Li, T., Q. Shao and Y. Chen, 2006. PEPX: A query-friendly probabilistic XML database. Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Nov. 6-11, ACM Press, New York, pp: 848-849.
    CrossRef

  13. Zhongxiao, H., 1996. Database Guide Under the Null Values Circumstances[M]. Machine Press, Beijing, China, pp: 3-131.

  14. Yuan, M., 2005. Non_Classical Relational Database Theory. Tsinghua University Press, Beijing, China, pp: 203-226.

  15. Cavallo, R. and M. Pittarelli, 1987. The theory of probabilistic databases. Proceedings of the 13th International Conference on Very Large Data Bases, Sept. 1-4, Brighton, England, pp: 71-81.

  16. Barbar, D., H.G. Molina and D. Porter, 1992. The management of probabilistic data. IEEE Trans. Knowl. Data Eng., 4: 487-502.
    Direct Link

  17. Lee, S.K., 1992. An extended relational database model for uncertain and imprecise information. Proceedings of the 18th International Conference on Very Large Data Bases, Aug. 23-27, Vancouver, Canada, pp: 211-220.

  18. Dey, D. and S. Sarkar, 1996. A probabilistic relational model and algebra. ACM Trans. Database Syst., 21: 339-369.

  19. Jacinto and H. Marta, 2002. Bidirectional conversion between XML documents and relational data bases. Proceedings of the International Conference on CSCW in Design, Sept. 25-27, Brazil, pp: 4-9.

  20. Fong, J., F. Pang and C. Bloor, 2001. Converting relational database into XML document. Proceedings of the 12th International Workshop on Database and Expert Systems Applications, IEEE Computer Society. Sept. 3-7, Washington, DC, USA., pp: 61-65.

  21. Teng, L., P. Yan and Q. Huang, 2005. Relational to XML Schema Conversion with Constraints. Vol. 3818, Springer, Berlin, Heidelberg, ISBN: 978-3-540-30767-9, pp: 278-279.

  22. Anthony, L., R. Alhajj and K. Barke, 2006. VIREX: Visual relational to XML conversion tool. J. Visual Languages Comput., 17: 25-45.
    CrossRef

  23. Sengupta, G.J., 2010. Regression testing method based on XML schema for GUI components. J. Software Eng., 4: 137-146.
    CrossRefDirect Link

Search


Related Articles

Structural Query Optimization in Native XML Databases: A Hybrid Approach
Comparative Analysis of Three Promising XML Query Languages and Some Recommendations
Regression Testing Method Based on XML Schema for GUI Components

Leave a Comment


Your email address will not be published. Required fields are marked *

Useful Links

  • Journals
  • For Authors
  • For Referees
  • For Librarian
  • For Socities

Contact Us

Office Number 1128,
Tamani Arts Building,
Business Bay,
Deira, Dubai, UAE

Phone: +971 507 888 742
Email: [email protected]

About Science Alert

Science Alert is a technology platform and service provider for scholarly publishers, helping them to publish and distribute their content online. We provide a range of services, including hosting, design, and digital marketing, as well as analytics and other tools to help publishers understand their audience and optimize their content. Science Alert works with a wide variety of publishers, including academic societies, universities, and commercial publishers.

Follow Us
© Copyright Science Alert. All Rights Reserved