ABSTRACT
As a kind of special XML data, probabilistic XML has been also presented as the de facto standard for probability data exchange on the web. And there are a mass of probabilistic relational data. In order to realize the conversion of probabilistic relational data to probabilistic XML data, the representation methods of probability data in the probabilistic relational data and in the probabilistic XML data are firstly analyzed separately. And the probabilistic XML data tree is a classical probabilistic data model with some distribution nodes. The conversion process of probabilistic relational data to probabilistic XML data is composed of two stages what are schema conversion and data conversion. Algorithm PRS2PDTD converts the probabilistic relational schema to the probabilistic DTD, while preserving semantic constraints of the original probabilistic relational schema. And algorithm PRD2PT converts probabilistic relational data to the probabilistic XML data tree. The instance analysis results show that the algorithms are effective methods for conversion of the existent probabilistic relational data to the probabilistic XML data.
PDF Abstract XML References Citation
How to cite this article
DOI: 10.3923/itj.2010.1706.1712
URL: https://scialert.net/abstract/?doi=itj.2010.1706.1712
INTRODUCTION
Because XML emerged as the de facto standard for data formats on the web (XML-TR, http://www.w3.org/ TR/, 2007; Nath and Batanov, 2005; Haw and Lee, 2007), the probabilistic XML data has been also presented as a special style of XML data from 2001. It can represent the distribute distribution data in XML element in the form of probabilistic data. The researchers mainly focuses the fields what are the probabilistic XML data models (Nierman and Jagadish, 2002; Van Maurice et al., 2005; De Keijzer and van Keulen, 2007) the probabilistic XML algebra (Zhao et al., 2005; Magnani and Montesi, 2008; Dekhtyar et al., 2006) query methods (Benny and Sagiv, 2007; Kimelfeld et al., 2008) and prototype system design (Nierman and Jagadish, 2002; Hung and Subrahmanian, 2005; Li et al., 2006).
The probabilistic relational data is a kind of relational data model that represents the triple uncertain relation data in the form of the probability data (Zhongxiao, 1996; Yuan, 2005). Cavallo and Pittarelli (1987) and Barbar et al. (1992) proposed a kind of extended relation data model to represent probability data by adding the attribute of probability data and defined a function that can convert the relational data to the probabilistic relational data. Lee (1992) and Yuan (2005) applied the Dempster-Shafer theory and probability theory as the basis of theory to research the probabilistic relation data and defined the correlative algebra operation (Dey and Sarkar, 1996).
Since, the probabilistic XML data is the standard of probabilistic data format, the existent probabilistic relation data need be converted to probabilistic XML data. From 1998, the researchers have represented a series of methods in order to convert the relation data to the XML data. Jacinto and Marta (2002) studied the algorithm for bidirectional conversion between XML documents and relational databases. Fong et al. (2001) studied the algorithm for converting relational database into XML document. Teng et al. (2005) analyzed the conversion relational schema to XML Schema with constraints. Anthony et al. (2006) and Sengupta (2010) designed the tool of visual relational to XML conversion. The methods provided some ways for converting the probabilistic relation data to probabilistic XML data.
So, the algorithms for converting the probabilistic relation data to probabilistic XML data are proposed. At first the paper analyzes the representation of probability data in relational data schema and XML data schema. The two triple form <attribute,probability> represents the attribute value and its probability in probabilistic relational data. Because the probabilistic XML data tree is a classical probabilistic XML data model, so the paper applies the tree model (Nierman and Jagadish, 2002; Li et al., 2006).
The contributions of this study include:
• | We argue that the distribution node form represents the element nodes and their probability data in the probabilistic XML data tree |
• | We analyze the path mapping rules from the probabilistic relational data to probabilistic XML data tree based on analyzing the path types in the probabilistic XML data tree |
• | Algorithm PRS2PDTD is proposed to convert probabilistic relational schema to probabilistic DTD |
• | Algorithm PRS2PT is proposed to convert probabilistic relational data to probabilistic XML data tree |
• | The instance analysis verifies the efficiency of the two algorithms |
METHODS OF PROBABILITY DATA REPRESENTATION
Probabilistic relation schema PRS is a kind of non-classical relation model that represent the uncertainty of probabilistic relation triple in the form of the two triple <attribute,probability> (Yuan, 2005). Probabilistic relation PR is a finite set of triple in PRS and one triple in PR isnot equal with the other triple. The definition of probabilistic relation is proposed which is shown as follows:
Definition 1: Probabilistic relation PR is one of 1-NF probabilistic relation in the probabilistic relational schema PRS, the attribute set is C = {ci,...,cm}, there is one and only one attribute ci≤a, ps > i = 2,...,m, where, a is the attribute name of probabilistic relation, ps is the probability value with the restriction condition:
![]() |
Probabilistic XML Data Tree (Nierman and Jagadish, 2002; Li et al., 2006) is a kind of classical data model of probabilistic XML data that the schema document is probabilistic DTD (Nierman and Jagadish, 2002). The probabilistic XML data tree and probabilistic DTD document are defined as follows.
Definition 2: Probabilistic XML data tree is six tuple PT = (N, E, R, F, V, P), where:
• | N is the finite set of node N = Nord,∪Ndist∪Nmax∪Nposs |
where, Nord is the set of general node, Ndist is the set of distribution node, Nmax is the set of the node that has the child node that is exclusive form one another, Nposs is the set of possibility node.
• | E⊆NxN is the set of edges |
• | R is the root of PT |
• | F: Nord→L is the label function of the general node, where, L is the name of element; poss, F(n) = poss |
• | v: Ni→D is the data of leaf node, where, Ni are the leaf nodes Nl⊂Nord∪Nposs, D is the data set |
• | P: Nord∪Nposs→p is the probability value of the given node, 0≤p≤1, the default value is 1 without assigning the probability value |
Definition 3: Probabilistic DTD is a pair (f, N) that maps f to regular expressions over N, and R∈N is the start symbol. PT = (N, E, R, F, V, P) satisfies probabilistic DTD, if its root is labeled by r, for each function F: Nord→L, and for every node Nord with label L, the sequence L of labels of its children matches the regular expression:
![]() |
There are different type paths in a probabilistic XML data tree. The path definitions are proposed which are as follows.
Definition 4: Absolute path in a probabilistic XML data tree PT is defined a series of nodes from the root R to the leaf node V(Ni) (V(Nl)∈D). APATHPT is the set of absolute path in a probabilistic XML data tree PT, |APATHPT| is the number of the element of absolute path set.
Definition 5: Extended path in a probabilistic XML data tree PT is absolute path set without leaf nodes. EPATHPT is the set of extended path in the probabilistic XML data tree PT, |EPATHPT| is the number of the element of extended path set.
Definition 6: Basic absolute path in a probabilistic XML data tree PT is absolute path set without leaf nodes and distribution nodes. PATHPT is the set of basic absolute path in a probabilistic XML data tree PT, |PATHPT| is the number of the element of basic absolute path set.
Definition 7: Candidate absolute path in a probabilistic XML data tree PT is extended path without any distributi- on node. CPATHPT is the set of candidate absolute path in a probabilistic XML data tree PT, |CPATHPT| is the number of the element of candidate absolute path set.
Definition 8: The extended absolute path epath is equal to the basic absolute pathpath if the distribution nodes in the extended absolute path are deleted. The relation between epath and path is called path equivalence. PT, path = delete(epath.(dist,mux,poss) and path ∈PATHPT, epath is equal with path which is denoted by the equation epath Ñ path.
Let, EPATHPT be extended path set and PATHPT be basic absolute path set in the probabilistic XML tree PT. Then,
![]() | (1) |
![]() | (2) |
Theorem 1: Let ,
,
. Then,
. So,
can be partitioned to some sets of the equivalent path.
Proof: From definition 5 and 6, the expression:
![]() | (3) |
comes into existence.
According to definition 8,
![]() |
then:
![]() | (4) |
![]() | (5) |
Definition 9: The absolute pathapath is equal to the extended absolute pathepath if the leaf nodes in the extended absolute path are deleted. The relation between apath and epath is called path consistent. there is one and only one absolute path
, when
, so epath is consist with apath which is denoted by the equation:
![]() |
Theorem 2: Let,
, then |PATHPT|, so, DPATHPT can be partitioned to some sets of the consistent paths.
Proof: From definition 5 and 6,the inequation comes into existence.
From definition 9 ,
![]() | (6) |
if epathapath, then
![]() | (7) |
![]() | (8) |
ALGORITHM FOR CONVERSION PROBABILISTIC RELATIONAL SCHEMA TO PROBABILISTIC XML SCHEMA
The common method of converting probabilistic relation schema to probabilistic DTD is directly mapping. So the mapping rules are proposed which are listed as follows.
Rule 1: The basic absolute path set pathPR can be com- puted by the attribute set of probabilistic relation PR.
Let, C = {ci,...,cm} be the attribute set of probabilistic relation PR, then the absolute path set of probabilistic XML data tree PT is pathPR = {PR/c1,...,PR/cm}. And the result of union operation of absolute path set from PRS is:
![]() | (9) |
Rule 2: The extended absolute path set epathPR can be computed by the attribute set of probabilistic relation PR.
Let, the attribute set C = {ci,...,cm} of probabilistic relation PR, then the extended absolute path set of probabilistic XML data tree PT is shown as follows:
![]() | (10) |
And the result of union operation of extended path set from PRSis shown as follows:
![]() |
Rule 3: The candidate absolute path set cpathPR can be computed by the attribute set of probabilistic relation PR.
Let, basic absolute path set be PATHPRS and the can absolute path set be EPATHPRS, then the candidate absolute path set CPATHPT is shown as follows:
![]() | (11) |
According to the mapping rules, the algorithm process of conversion probabilistic relation schema to probabilis-tic XML schema is divided three steps. The root node r is firstly obtained from probabilistic relation schema PRS as the start tag of PDTD. The child node of r is secondly obtainned from probabilistic relation PR. At last the absolute path set is obtained from the attribute set of probabilistic relation PR. So, the algorithm is shown as follows.
Algorithm 1: PRS2PDTD (PRS)![]() |
Theorem 3: Algorithm PRS2PDTD () for converting the probabilistic relation schema to probabilistic XML data is correct and complete.
Proof: Definition 1, 3, 5, 6, 7, 8 and the mapping rules offer methods to compute PRS2PDTD (PRS) and prove themselves when PRS is a set of probabilistic relational schema in the form of 1-NF. The algorithm fulfills the methods offered by these definition and these theorems. Therefore, the algorithm can correctly compute PRS2PDTD (PRS) for any PR in PRS. Thus the algorithm is correct and complete.
ALGORITHM OF CONVERSION PROBABILISTIC RELATION DATA TO PROBABILISTIC XML DATA TREE
The method of conversion probabilistic relation data to probabilistic XML data is to convert the triples of probabilistic relation data to the absolute path. So the mapping rules are proposed which are listed as follows.
Rule: The absolute path set apathPR can be computed by the attribute set of probabilistic relation PR.
Let, the sattribute set C = {c1,...,cm} of probabilistic relation PR, then the absolute path set of probabilistic XML data tree is shown as follows:
![]() | (12) |
And the result of union operation of absolute path set of probabilistic XML data tree apathPRS is:
![]() | (13) |
According to the mapping rule, the algorithm process of conversion probabilistic relation data to probabilistic XML data is proposed which is shown as follows.
The root node r is firstly obtained from probabilistic relation data PRS as the root of PT. The absolute path setapathPR is secondly obtained from probabilistic relation PR. At last the absolute path set is created from the attribute set of probabilistic relation PR.
So, the algorithm is shown as follows.
Algorithm 2: PRD2PT(PRS)![]() |
Theorem 4: Algorithm PRD2PT () for converting the probabilistic relational data to probabilistic XML data tree is correct and complete.
Proof: Definition 1, 2, 3 and the mapping rule offer methods to compute PRD2PT (PRS) and prove themselves when PRS is a set of probabilistic relational data in the form of 1-NF. The algorithm fulfills the methods offered by these definitions and the rule. Therefore, the algorithm can correctly compute PRD2PT (PRS) for any PR in PRS. Thus, the algorithm is correct and complete.
INSTANCE ANALYSIS
The purpose of this section is to confirm that algorithm PRS2PDTD and PRD2PT are correct by comparing the results of the algorithms with the teacher probabilistic relation.
Example 1: The teacher probabilistic data about rank, department and course are represented in the probabilistic relational data table. The teacher probabilistic relational schema is composed of Table 1-4 where no is the key attribute in Table 1. The <rank,ps> attribute in Table 2 represents the rank and its distribution data, the <dept,ps> attribute in Table 3 represents the department and its distribution data, and the <course,ps> attribute in Table 4 represents the teaching course and its distribution data.
From the tables, the probabilistic relational schema is shown as follows.
![]() |
The attribute set of teacher is shown as follows.
![]() |
So, the basic absolute path set is computed which is shown as follows.
![]() |
The elements of APATH is also shown in Fig. 1.
The extended absolute path set is computed which is shown as follows.
![]() |
The elements of EPATH is also shown in Fig. 2.
The candidate absolute path set is computed which is shown as follows.
CPATH={ teacher/no,teacher/name} |
The elements of CPATH is also shown in Fig. 3.
Table 1: | Teacher-name |
![]() |
Table 2: | Teacher-rank |
![]() |
Table 3: | Teacher-department |
![]() |
Table 4: | Teacher-course |
![]() |
![]() | |
Fig. 1: | Basic absolute path set, (a) teacher/no, (b) teacher/name, (c) teacher/rank, (d) teacher/dept and (e) teacher/course |
![]() | |
Fig. 2: | Extended absolute path set, (a) teacher/no, (b) teacher/name, (c) teacher/rank/dist/mux/poss, (d) teacher/dept/dist/mux/poss and (e) teacher/course/dist/mux/poss |
![]() | |
Fig. 3: | Candidate absolute path set, (a) teacher/no and (b) teacher/name |
![]() | |
Fig. 4: | Teacher probabilistic XML data tree PT |
Based on the analysis of path type, the PDTD document is computed by algorithm PRS2PDTD. PDTD document of teacher is shown as follows:
PDTD document of teacher:![]() |
The probabilistic XML data tree is computed by algo- rithm PRD2PT which is shown in Fig. 4.
CONCLUSIONS
In this study we present algorithms to solve the problem of conversion probabilistic relational schema to probabilistic DTD and conversion probabilistic relational data to probabilistic XML data. The contribution is original and useful in itself, additionally, it will be helpful to convert the existing probabilistic relational data to probabilistic XML data and probabilistic XML data become the probabilistic data exchange standard.
ACKNOWLEDGMENTS
Authors would like to thank National Natural Science Foundation of China and Hei longjiang National Science Grant No. 60673136 and No. F200702, for funding this research. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
REFERENCES
- Haw, S.C. and C.S. Lee, 2007. Structural query optimization in native XML databases: A hybrid approach. J. Applied Sci., 7: 2934-2946.
CrossRefDirect Link - Nath, U.K.D. and D.N. Batanov, 2005. Comparative analysis of three promising XML query languages and some recommendations. Inform. Technol. J., 4: 439-444.
CrossRefDirect Link - De Keijzer, A. and M. van Keulen, 2007. User feedback in probabilistic integration. Proceedings of the 18th International Workshop on Database and Expert Systems Applications,Proceedings of the 18th International Workshop on Database and Expert Systems Applications, Sept. 3-7, Regensburg, Germany, pp: 377-381.
- Zhao, Z., A. Dekhtyar and J. Goldsmith, 2005. A framework for management of semistructured probabilistic data. J. Intell. Inform. Syst., 25: 293-332.
CrossRef - Magnani, M. and D. Montesi, 2008. Management of interval probabilistic data. Acta Inform., 45: 93-130.
CrossRef - Li, T., Q. Shao and Y. Chen, 2006. PEPX: A query-friendly probabilistic XML database. Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Nov. 6-11, ACM Press, New York, pp: 848-849.
CrossRef - Barbar, D., H.G. Molina and D. Porter, 1992. The management of probabilistic data. IEEE Trans. Knowl. Data Eng., 4: 487-502.
Direct Link - Anthony, L., R. Alhajj and K. Barke, 2006. VIREX: Visual relational to XML conversion tool. J. Visual Languages Comput., 17: 25-45.
CrossRef - Sengupta, G.J., 2010. Regression testing method based on XML schema for GUI components. J. Software Eng., 4: 137-146.
CrossRefDirect Link