Abstract: In a grid computing environment, a global ontology can be created over multiple local ontologies to integrate geographically distributed data sources. However, data sources are likely to join and quit the grid environment dynamically, so building a dynamic mapping relation between the global ontology and the local ontologies is essential for semantic data access. In this study, we present joining and quitting mapping algorithms for data sources in a grid system. When a new data source joins the grid system, element mappings between the global ontology and its local ontology are established rapidly by subtree interception, and the corresponding global elements are added to the active element set. When a data source quits, the element mappings between the global ontology and the quitting ontology are canceled and the corresponding global elements are moved to the suspended element set. For semantic data operations, we stipulate that only mappings generated from the active element set can be used for semantic queries. Experimental results show that the proposed algorithms effectively maintain dynamic mappings between the global ontology and local ontologies and improve the accuracy of user semantic queries in a grid environment.
INTRODUCTION
Grid computing aims to gather geographically distributed resources such as computational power, data, software and equipment into a virtual organization that provides users with integrated information and application services (Foster and Kesselman, 1999). Data resources in one grid system, especially within a virtual organization, are usually closely correlated. Integrating related data resources to provide services usually yields more systematic and more comprehensive data sets and can even improve the reliability and performance of data services. However, these related data sources may be heterogeneous in data format and storage structure. In addition, there may be man-made semantic confusion among them in the form of "synonyms" and "homonyms". Therefore, to better access and manage related data sources, ontology technology (Rubiolo et al., 2012) can be used to solve the semantic issues in a grid computing environment. A global ontology and local ontologies usually need to be built for an application grid that deals with massive data. This paper assumes that the global ontology and local ontologies are established with the LAV (local-as-view) scheme (McBrien and Poulovassilis, 2003). The global ontology is abstracted and synthesized from the data sources by integrating the domain information they contain, and it describes the global view of the system. It therefore acts as a shared vocabulary that builds a knowledge model for the domain to be integrated and provides a common semantic description for data integration (Qasim and Khan, 2009). A local ontology is defined relative to the global ontology. Each data source has an ontology (called its local ontology) that describes its data and establishes a mapping relation with it. A local ontology is in turn usually related to the global ontology through some form of mapping.
With these mapping relations, queries on the global ontology can be decomposed into subqueries for the corresponding data sources (Zuofu et al., 2011). Because grid computing is inherently distributed and dynamic, data sources can freely join and quit the grid system. Maintaining the mappings between the global ontology and local ontologies as data sources come and go is therefore a key problem for data integration in a grid system. This study presents a dynamic mapping strategy between the global ontology and local ontologies that uses a joining mapping algorithm and a quitting mapping algorithm to achieve effective mappings between global elements and local elements.
DESCRIPTION OF ONTOLOGY AND MAPPING
In this study, an ontology is defined as a 6-tuple O := <C, P, HC, HP, A, I> (Pirro and Talia, 2010), where C is the set of concepts, P is the set of properties, HC is the concept hierarchy that associates each concept with its sub-concepts, HP is the property hierarchy that associates each property with its sub-properties, A is the set of axioms and I is the set of all instances of concepts and properties. Accordingly, the global ontology is represented as GO = <GC, GP, HGC, HGP, GA, GI> and a local ontology is denoted LOi = <Ci, Pi, HCi, HPi, Ai, Ii>, i = 1, 2, …. The mapping between the global ontology GO and a local ontology LOi is represented as Mi = <Mi(c), Mi(p)>, where Mi(c) is the set of concept mappings, Mi(c) = {<cG, cL> | cG ∈ GC, cL ∈ Ci}, and Mi(p) is the set of property mappings, Mi(p) = {<pG, pL> | pG ∈ GP, pL ∈ Pi}.
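The 6-tuple and the mapping pairs above can be held in simple containers. The following is only a minimal sketch: the class names, field names and the membership check are illustrative assumptions, not structures taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Ontology:
    """Illustrative container for O := <C, P, HC, HP, A, I>."""
    concepts: Set[str] = field(default_factory=set)                      # C
    properties: Set[str] = field(default_factory=set)                    # P
    concept_hierarchy: Dict[str, Set[str]] = field(default_factory=dict)   # HC: concept -> sub-concepts
    property_hierarchy: Dict[str, Set[str]] = field(default_factory=dict)  # HP: property -> sub-properties
    axioms: Set[str] = field(default_factory=set)                        # A
    instances: Set[str] = field(default_factory=set)                     # I


@dataclass
class Mapping:
    """Illustrative container for M_i = <M_i(c), M_i(p)>."""
    concept_pairs: Set[Tuple[str, str]] = field(default_factory=set)   # <cG, cL>
    property_pairs: Set[Tuple[str, str]] = field(default_factory=set)  # <pG, pL>

    def add_concept_pair(self, go: Ontology, lo: Ontology, c_g: str, c_l: str) -> None:
        # Enforce cG in GC and cL in Ci, as required by the definition of M_i(c).
        if c_g not in go.concepts or c_l not in lo.concepts:
            raise ValueError("both concepts must belong to their ontologies")
        self.concept_pairs.add((c_g, c_l))
```

Keeping the pairs as sets of tuples mirrors the set notation of Mi(c) and Mi(p) and makes membership tests cheap.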
SIMILARITY AGGREGATING
Mappings between elements of different ontologies are usually established based on their similarity. However, a single kind of similarity can hardly describe the degree of similarity between elements accurately, so several similarities need to be integrated. In this paper, the name similarity, profile similarity, structure similarity and semantic similarity between elements (Mao et al., 2010) are denoted Sim1(ei, ej), Sim2(ei, ej), Sim3(ei, ej) and Sim4(ei, ej), respectively. The comprehensive similarities C1_Sim and C2_Sim between elements are then computed using the following expressions:
(1) |
(2) |
where α denotes the weight assigned to a given similarity and satisfies Σαi = 1. α is computed as follows:
(3) |
In Eq. 1, sig is the sigmoid function (Kros et al., 2006), defined as follows:
(4) |
where a is the central point of the sigmoid function. In Eq. 2, β likewise denotes the weight assigned to a given similarity and satisfies Σβi = 1.
After computing C1_Sim and C2_Sim with Eq. 1 and 2, respectively, we combine them using the F-Measure method (Chen et al., 2004). The integration formula is as follows:
(5) |
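Because the exact forms of Eq. 1 to 5 are not shown here, the following Python sketch only illustrates one plausible reading of the aggregation described in this section. Everything in it is an assumption: C1_Sim is taken as an α-weighted sum of sigmoid-transformed similarities, C2_Sim as a β-weighted sum of the raw similarities, the sigmoid slope of 5 is arbitrary, the weights are passed in rather than derived from Eq. 3, and the F-Measure combination is taken to be the harmonic mean of the two values.

```python
import math
from typing import Sequence


def sigmoid(x: float, a: float = 0.5, slope: float = 5.0) -> float:
    """Sigmoid centred at a; the slope value is an assumption."""
    return 1.0 / (1.0 + math.exp(-slope * (x - a)))


def _check_weights(weights: Sequence[float], sims: Sequence[float]) -> None:
    # The text requires the weights to sum to 1.
    if len(weights) != len(sims) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per similarity, summing to 1")


def c1_sim(sims: Sequence[float], alphas: Sequence[float], a: float = 0.5) -> float:
    """Assumed form of Eq. 1: weighted sum of sigmoid-transformed similarities."""
    _check_weights(alphas, sims)
    return sum(w * sigmoid(s, a) for w, s in zip(alphas, sims))


def c2_sim(sims: Sequence[float], betas: Sequence[float]) -> float:
    """Assumed form of Eq. 2: plain weighted sum of the similarities."""
    _check_weights(betas, sims)
    return sum(w * s for w, s in zip(betas, sims))


def f_measure(c1: float, c2: float) -> float:
    """Assumed form of Eq. 5: harmonic mean of C1_Sim and C2_Sim."""
    return 0.0 if c1 + c2 == 0 else 2.0 * c1 * c2 / (c1 + c2)
```

Here `sims` would hold the name, profile, structure and semantic similarities of one element pair, for example `f_measure(c1_sim(sims, alphas), c2_sim(sims, betas))`.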
DYNAMIC MAPPING
In a data integration environment based on grid computing, data sources are likely to join and quit the application grid system dynamically. To solve the problem of mapping between the global ontology and local ontologies in the grid effectively, this paper proposes a dynamic mapping strategy whose steps (Fig. 1) are as follows:
Step 1: Data sources must register with the resources registry whether they join or quit the grid system. The resources registry then passes the joining or quitting information to the ontology manager.
Step 2: After the ontology manager receives the change information, it acts according to the kind of change. If a data source is quitting the grid system, the ontology manager obtains the mapping information between the global ontology and the corresponding local ontology from the mapping storage center and delivers it to the quitting mapping module of the dynamic mapper, which carries out the quitting processing. If a data source is joining the grid system, the ontology manager builds the corresponding local ontology for it and delivers the global ontology and the new local ontology to the joining mapping module of the dynamic mapper, which creates the mapping relation between the global ontology and the joining local ontology.
Step 3: When the dynamic mapper has completed all mapping tasks, the new mapping relations between the global ontology and local ontologies are stored in the mapping storage center.
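The three steps above can be sketched as a small event handler. This is a hypothetical illustration only: the class name, the `"join"`/`"quit"` event strings and the injected callables are assumptions, and the mapping storage center is reduced to an in-memory dictionary from which a quitting source's mapping is removed.

```python
from typing import Any, Callable, Dict


class OntologyManager:
    """Hypothetical ontology manager reacting to registry notifications."""

    def __init__(
        self,
        global_ontology: Any,
        build_local: Callable[[str], Any],
        join_mapper: Callable[[Any, Any], Any],
        quit_mapper: Callable[[Any], None],
    ) -> None:
        self.global_ontology = global_ontology
        self.build_local = build_local      # builds a local ontology for a source
        self.join_mapper = join_mapper      # joining mapping module
        self.quit_mapper = quit_mapper      # quitting mapping module
        self.mapping_store: Dict[str, Any] = {}  # stand-in for the mapping storage center

    def on_registry_event(self, source_id: str, event: str) -> None:
        # Step 1 is assumed to have happened: the registry forwards the event.
        if event == "join":
            # Step 2 (join): build the local ontology, then map it to the global one.
            local = self.build_local(source_id)
            mapping = self.join_mapper(self.global_ontology, local)
            # Step 3: persist the new mapping relation.
            self.mapping_store[source_id] = mapping
        elif event == "quit":
            # Step 2 (quit): fetch the stored mapping and hand it to the quitting module.
            mapping = self.mapping_store.pop(source_id)
            self.quit_mapper(mapping)
        else:
            raise ValueError(f"unknown event: {event}")
```

Passing the mapping modules in as callables keeps the manager independent of how joining and quitting are actually implemented.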
Fig. 1: Processing of dynamic mapping
Fig. 2(a-b): Subtree intercepting for joining algorithm, (a) Global ontology tree and (b) Intercepted subtree
Joining mapping: In this study, the global ontology tree is represented as GOT = <GR, GV, GE>, where GR is the root node of the global ontology tree, GV is the set of vertices of GOT and GE is the set of edges of GOT. A local ontology tree is denoted LOTi = <LRi, LVi, LEi>, where i = 1, 2, 3, ….
Definition 1 (Subtree Interception): With a specified node SR ∈ GC as root node, the subtree intercepted from the global ontology tree is represented as ST = <SR, SV, SE>, where SR is the root node of ST, SV (SV ⊆ GV) is the vertex set of ST and SE (SE ⊆ GE) is the edge set of ST. For example, Fig. 2a shows a global ontology tree of Disease, and the subtree intercepted from it with Respiratory Tract Infection as root node is shown in Fig. 2b.
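Subtree interception as defined above amounts to collecting every vertex and edge reachable from the chosen root. The sketch below assumes the tree is stored as a dictionary from each node to its list of children; the function name and this representation are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def intercept_subtree(
    children: Dict[str, List[str]], root: str
) -> Tuple[str, Set[str], Set[Tuple[str, str]]]:
    """Return ST = <SR, SV, SE> for the subtree of the tree rooted at `root`.

    `children` maps each node of GOT to its child nodes (the edge set GE).
    """
    known = set(children) | {c for kids in children.values() for c in kids}
    if root not in known:
        raise KeyError(root)
    vertices: Set[str] = {root}
    edges: Set[Tuple[str, str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            edges.add((node, child))   # every collected edge is also in GE
            vertices.add(child)        # every collected vertex is also in GV
            stack.append(child)
    return root, vertices, edges
```

For a Disease tree, calling `intercept_subtree(tree, "Respiratory Tract Infection")` would return that node together with all of its descendants and the edges between them, leaving the rest of the tree untouched.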
Definition 2 (active element set): If elements in the global ontology can serve the current semantic data processing tasks in the grid, the set of these elements is denoted AES (Active Element Set), namely AES = {AC, AP}, where AC (AC ⊆ GC) is the set of active concepts and AP (AP ⊆ GP) is the set of active properties. With SC and SP denoting the suspended concepts and properties of Definition 3, SC ∩ AC = Ø and SP ∩ AP = Ø.
Definition 3 (suspended element set): If elements in the global ontology cannot serve the current semantic data processing tasks in the grid because data sources have quit the grid system dynamically, the set of these elements is denoted SES (Suspended Element Set), namely SES = {SC, SP}, where SC (SC ⊆ GC) is the set of suspended concepts and SP (SP ⊆ GP) is the set of suspended properties, with SC ∩ AC = Ø and SP ∩ AP = Ø.
When a new data source joins the grid system and builds its local ontology LOj, the corresponding ontology tree LOTj is first constructed for LOj. Then, we compute the similarity between LRj and each concept in GC and find the concept ci that maximizes sim(LRj, ci), i = 1, 2, …. Next, we intercept the subtree ST (ST = <ci, SV, SE>) from GOT with ci as root node and build mapping pairs between the elements of ST and the elements of LOTj by similarity computing and similarity integrating. If some local elements in LOTj cannot form a mapping with global elements in ST, we attempt the mapping again between Rest (the set of remaining elements of LOTj) and SES. The detailed algorithm of joining mapping is as follows:
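The original algorithm listing is not shown here, so the Python sketch below is only a reading of the procedure described in the paragraph above, not the authors' listing. Several details are assumptions: the similarity function `sim` is passed in, each local element is matched greedily to its most similar global element, a pair is accepted only above a hypothetical `threshold`, and mapped global elements are moved into the active set at the end.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def descendants(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Vertex set of the subtree of GOT rooted at `root`."""
    seen, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def join_mapping(
    got_children: Dict[str, List[str]],
    global_concepts: Iterable[str],
    lot_root: str,
    lot_elements: Iterable[str],
    sim: Callable[[str, str], float],
    active: Set[str],
    suspended: Set[str],
    threshold: float = 0.8,  # assumed acceptance threshold
) -> Set[Tuple[str, str]]:
    # 1. Find the global concept most similar to the local root LRj.
    sr = max(global_concepts, key=lambda c: sim(lot_root, c))
    # 2. Intercept the subtree ST rooted at that concept.
    subtree = sorted(descendants(got_children, sr))
    mapping: Set[Tuple[str, str]] = set()
    rest: List[str] = []
    # 3. Match local elements against ST only, not the whole global ontology.
    for le in lot_elements:
        best = max(subtree, key=lambda ge: sim(le, ge))
        if sim(le, best) >= threshold:
            mapping.add((best, le))
        else:
            rest.append(le)
    # 4. Retry the remaining local elements (Rest) against SES.
    candidates = sorted(suspended)
    for le in rest:
        if not candidates:
            break
        best = max(candidates, key=lambda ge: sim(le, ge))
        if sim(le, best) >= threshold:
            mapping.add((best, le))
    # 5. Global elements that now have a mapping become active.
    for ge, _ in mapping:
        suspended.discard(ge)
        active.add(ge)
    return mapping
```

Restricting step 3 to the intercepted subtree is what keeps the number of similarity computations small compared with matching against every global element.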
Quitting mapping: With the aforementioned definitions, the elements of the global ontology can be divided into two categories, AES and SES. When a data source whose local ontology is LOi quits the grid system, all mapping element pairs <gej, lej> (j = 1, 2, …) between GO and LOi need to be examined. If there exists an element let (let ∈ LOk, k ≠ i) for which the mapping relation <gej, let> has been established, gej still belongs to AES; otherwise gej is moved from AES to SES. The detailed algorithm of quitting mapping is as follows:
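The original algorithm listing is not shown here; the sketch below follows the rule stated in the paragraph above under a few assumptions. The mappings of all data sources are assumed to be kept in one dictionary keyed by a source identifier, and the function and parameter names are illustrative.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # <global element, local element>


def quit_mapping(
    mappings: Dict[str, Set[Pair]],
    quitting: str,
    active: Set[str],
    suspended: Set[str],
) -> Set[Pair]:
    """Cancel the mappings of the quitting source and update AES and SES."""
    # Cancel every pair <ge_j, le_j> between GO and the quitting local ontology.
    cancelled = mappings.pop(quitting, set())
    # Global elements still mapped to some other local ontology LO_k, k != i.
    still_used = {ge for pairs in mappings.values() for ge, _ in pairs}
    for ge, _ in cancelled:
        if ge not in still_used:
            # No remaining data source supports this element: AES -> SES.
            active.discard(ge)
            suspended.add(ge)
    return cancelled
```

After the call, a semantic query that only consults elements in `active` can no longer be routed to the data source that has quit.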
EXPERIMENTAL RESULTS AND ANALYSIS
To verify the effectiveness of the dynamic mapping strategy in this study, we carried out dynamic mapping experiments between the global ontology and local ontologies in the simulated grid environment GridSim. We evaluated two aspects of these experiments: (1) semantic query accuracy and (2) the elapsed time and accuracy of dynamic mapping. The experimental data were obtained from six data sources of the OAEI Conference track: Edas, OpenConf, Ekaw, Confious, Iasted and Linklings. We first built the global ontology (i.e., conference) and the local ontologies for these data sources with the Protégé tool (Table 1).
Because of the inherently dynamic nature of the grid environment, a data source can quit the grid system at any time. To ensure the accuracy of semantic queries, this paper proposed a quitting mapping algorithm for data sources in a grid system. With some data sources quitting the grid system, we conducted a total of 6 semantic queries.
Fig. 3: Accuracies comparison of semantic query related to quitting data sources
Fig. 4: Accuracies comparison of semantic query not related to quitting data sources
Table 1: Experimental ontologies description
The accuracies of semantic queries related to quitting data sources are shown in Fig. 3. Figure 3 shows that, without quitting mapping, the accuracies of semantic queries that refer to quitting data sources are greatly reduced, whereas with quitting mapping these queries obtain very high accuracy. The accuracies of semantic queries not related to quitting data sources are shown in Fig. 4. Figure 4 shows that the accuracies of these queries are almost the same whether quitting mapping is carried out or not. The accuracy of a semantic query therefore relies heavily on whether the grid system still includes the data sources the query refers to.
When a new data source joins the grid system, a mapping relation between the global ontology and its local ontology should be built so that semantic data processing is not affected. This paper therefore proposed a joining mapping algorithm for data sources in a grid environment.
Fig. 5(a-b): Comparison of elapsed time and mapping accuracy for ASMOV, GLUE and Join mapping method
Joining mapping experiments based on the 6 data sources were conducted with GLUE (Park and Kim, 2007), ASMOV (Jean-Mary et al., 2009) and the joining mapping method presented in this paper. The experimental results on elapsed time and mapping accuracy are shown in Fig. 5. Figure 5 indicates that the mapping accuracy of the joining mapping method is slightly lower than that of GLUE and ASMOV, but its elapsed time is far lower. Because task execution time is an important factor in evaluating performance in a grid system, the proposed method is effective.
CONCLUSION AND FUTURE WORK
In a data integration environment based on grid computing, data resources are likely to join and leave the grid system dynamically. This paper proposed a dynamic mapping strategy to establish mapping relations for ontologies in a grid system in a timely manner. The strategy handles dynamic mappings between the global ontology and local ontologies with joining and quitting algorithms. The experimental results show that the algorithms presented in this paper are helpful for semantic data processing, although the query accuracy and mapping accuracy they obtain remain to be improved. In the future, we will study the expansion and optimization of user semantic queries based on the work in this study.
ACKNOWLEDGMENT
This study was supported by the National Natural Science Foundation of China (Grant no. 51277015) and the Hunan Province Science and Technology Planning Project of China (Grant no. 2011GK3114).