Research Article
 

Distributed Knowledge Integration Based on Intelligent Topic Map



Huimin Lu and Boqin Feng
 
ABSTRACT

We propose a novel concept, the Intelligent Topic Map, which extends the conventional topic map in structure and enhances its reasoning functions. With the Intelligent Topic Map as infrastructure, we design a mechanism for distributed knowledge integration. The structure is divided into three layers: the local Intelligent Topic Map layer, the similarity measure layer and the global Intelligent Topic Map layer. It provides a uniform query interface to a multitude of knowledge sources and lays the foundation for high-quality knowledge services. Moreover, we propose a new similarity measure algorithm based on comprehensive information theory, together with merging rules for knowledge integration. The experimental results show that our method is feasible and that it offers a useful reference and a basis for further study of distributed knowledge integration.


 
  How to cite this article:

Huimin Lu and Boqin Feng, 2010. Distributed Knowledge Integration Based on Intelligent Topic Map. Information Technology Journal, 9: 132-138.

DOI: 10.3923/itj.2010.132.138

URL: https://scialert.net/abstract/?doi=itj.2010.132.138
 

INTRODUCTION

With the rise of the knowledge economy, massive amounts of knowledge, often geographically distributed and owned by different organizations, are being mined (Zhang et al., 2008). Solutions are needed that integrate knowledge from different sources and make it available for application queries. Knowledge Integration (KI) gives a common representation to the different information sources it handles and offers users a global view of the information sources that can be accessed (Seng and Kong, 2009). KI is a complicated task because it requires creating a common data model, finding semantic correspondences between entities, satisfying the merge requirements and generating duplicate-free entities. However, previous work concentrates on finding the semantic correspondences between entities rather than on merging them; it neither defines the merge problems nor provides solutions to them.

The topic map is an ISO standard (ISO/IEC 13250) (ISO/IEC JTC 1/SC34 N323, 2002; ISO/IEC, 2008) for describing knowledge structures and associating them with information resources. It absorbs ideas from the semantic web and implements the semantic organization of, and the links between, physical resource entities and abstract concepts.

Previously, many methods used an object model to deal with the integration of distributed information sources (Tomasic et al., 1998; Carey et al., 1995). Such object models are represented in different forms. After the standardization of XML, many researchers chose XML as the underlying data model (Baru et al., 1999). XML is the W3C standard document format for exchanging information on the Web and is the lowest common denominator for integration tasks. However, while XML can indeed establish interoperability between different information sources on the Web, its main limitation is that it copes only with structural heterogeneity and can barely handle semantic heterogeneity (Seng and Kong, 2009). Ontologies are therefore employed to tackle not only structural but also semantic interoperability in information integration. An ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes, attributes and relationships. Ontologies play a key role in providing a shared terminology and in supporting the semantic representation and integration process. For example, FCA-Merge (Stumme and Madche, 2001) uses Formal Concept Analysis (FCA) to merge ontologies that share a set of instances. Work on ontology reconciliation techniques such as merging, alignment and integration has focused on providing a formal description (Silva et al., 2007).

The method based on topic maps uses XML as the common data model and adds ontology to enable information integration. Merging combines two topic maps to create a new ontology based on their semantic correspondences. The XTM (XML Topic Maps) 1.0 specification (Pepper and Moore, 2001) describes how to merge entities of topic maps to produce an integrated entity. However, the merging method proposed by the topic maps standards community integrates only equivalent entities; it cannot merge entities that have different structures but are semantically related. To overcome this shortcoming, several studies have proposed merging approaches that find correspondences between ontologies based on the syntactic or semantic characteristics and constraints of topic maps (Lu et al., 2008; Korthaus et al., 2009).

In this study, we propose a novel concept, the Intelligent Topic Map (ITM) (Lu and Feng, 2009), and construct a KI framework based on it. We define a detailed process for ITM merging. First, local ITMs are generated for the local knowledge resources. Next, the similarities between local ITMs are computed, and the topic pairs and knowledge element pairs with high similarity are identified. Finally, the global ITM is generated by merging the local ITMs according to specific rules.

INTELLIGENT TOPIC MAP

A conventional topic map is composed of Topics, Associations and Occurrences (TAO) (Pepper, 2001), as shown in Fig. 1.

Topics define the concepts. Associations define the relationships between topics and can represent an arbitrary number of roles among an arbitrary number of topics. Occurrences link information resources (e.g., documents) with topics. Topic maps are dubbed the GPS (Global Positioning System) of the information universe and are destined to provide powerful new ways of navigating large and interconnected corpora. However, conventional topic maps cannot describe the relationships between knowledge elements. Moreover, as knowledge resources grow massive, the topic level alone makes it difficult to locate knowledge points and cannot provide users with efficient knowledge navigation. A conventional topic map is a graphical index that lacks knowledge reasoning abilities, so implicit knowledge cannot be acquired from it.

Extended topic map in structure: In our framework of the ITM, we define a clustering level above the topic level. Furthermore, a knowledge element level is inserted above the resource level. The structure of ITM is shown in Fig. 2.

The ITM establishes a novel multi-resource knowledge organization which depicts the hierarchical relationship cluster-topic-knowledge element-occurrence.

Fig. 1: The structure of conventional topic map

Fig. 2: The structure of extended topic map

ITM organizes knowledge at four levels: cluster level, topic level, knowledge element level and resource level. It constructs a multi-granularity knowledge representation architecture that includes clusters, topics, knowledge elements, associations and occurrences. Knowledge elements give users access to more detailed knowledge and support navigation among knowledge elements. Each cluster contains several closely related topics. Clusters, obtained by applying clustering analysis to the topics, give users an effective navigation and browsing mechanism. Clustering analysis is the assignment of a set of topics into subsets (called clusters) so that topics in the same cluster are similar in some sense. In this way, the expression of the multi-level, multi-granularity and internally related characteristics of knowledge is improved.
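The cluster-topic-knowledge element-occurrence hierarchy can be pictured as nested containers. The following Java sketch is an illustration only: the class names (ItmSketch, Cluster, Topic, KnowledgeElement), the fields and the sample data are assumptions made for this example, and associations are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative four-level container: cluster -> topic -> knowledge element -> occurrence. */
final class ItmSketch {

    /** Knowledge element level; its occurrences stand for the resource level. */
    static final class KnowledgeElement {
        final String name;
        final List<String> occurrences = new ArrayList<>(); // resource identifiers (hypothetical)
        KnowledgeElement(String name) { this.name = name; }
    }

    /** Topic level: a concept that groups knowledge elements. */
    static final class Topic {
        final String name;
        final List<KnowledgeElement> elements = new ArrayList<>();
        Topic(String name) { this.name = name; }
    }

    /** Cluster level: a set of closely related topics. */
    static final class Cluster {
        final String label;
        final List<Topic> topics = new ArrayList<>();
        Cluster(String label) { this.label = label; }
    }

    /** Counts every occurrence reachable from a cluster by walking the hierarchy top-down. */
    static int countOccurrences(Cluster cluster) {
        int total = 0;
        for (Topic topic : cluster.topics) {
            for (KnowledgeElement element : topic.elements) {
                total += element.occurrences.size();
            }
        }
        return total;
    }

    /** Builds a tiny example cluster from made-up data. */
    static Cluster example() {
        KnowledgeElement handshake = new KnowledgeElement("three-way handshake");
        handshake.occurrences.add("doc://tcp-intro");
        handshake.occurrences.add("doc://lecture-notes");
        KnowledgeElement window = new KnowledgeElement("sliding window");
        window.occurrences.add("doc://flow-control");
        Topic tcp = new Topic("TCP");
        tcp.elements.add(handshake);
        tcp.elements.add(window);
        Cluster transport = new Cluster("transport layer");
        transport.topics.add(tcp);
        return transport;
    }
}
```

Navigating from a cluster down to its resources is then a plain nested traversal, as in countOccurrences.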

Knowledge reasoning: Implicit knowledge is acquired by reasoning based on custom rules or internal rules. Knowledge reasoning mainly includes Relationship Type Reasoning, Association Reasoning, Knowledge Architecture Reasoning and Order Reasoning. In this study, we discuss knowledge architecture reasoning, which is related to knowledge visualization. Knowledge architecture reasoning obtains the level and class structure of the knowledge. Its main function is as follows: given a knowledge node, it returns the cluster and all the topics and knowledge elements associated with the node within a certain knowledge radius. In the ITM, if there is a concept sequence Cp, C1, C2,..., Cm, Cq and there are associations association-between (Cp, C1), (C1, C2),..., (Cm, Cq), then we say that there exists a knowledge path between concepts Cp and Cq. Association-between (Ci, Cj) denotes that concept Ci is directly related to concept Cj. The knowledge radius is the number of concepts traversed in a knowledge path, i.e., the length of the path. The reasoning results are shown in Fig. 3.
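One way to picture retrieval within a knowledge radius is a breadth-first search over the association graph. The sketch below is a minimal illustration under stated assumptions: concepts are plain strings, associations are stored as an adjacency map, and the radius is counted in association hops from the start node. The class name KnowledgeRadius and the sample graph are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KnowledgeRadius {

    /** Returns all concepts reachable from start within maxRadius hops (start itself excluded). */
    static Set<String> within(Map<String, List<String>> associations, String start, int maxRadius) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> result = new LinkedHashSet<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int d = distance.get(current);
            if (d == maxRadius) {
                continue; // nodes at the radius boundary are not expanded further
            }
            for (String next : associations.getOrDefault(current, Collections.<String>emptyList())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, d + 1);
                    result.add(next);
                    queue.add(next);
                }
            }
        }
        return result;
    }

    /** Hypothetical chain of directly related concepts: TCP - IP - Routing - OSPF. */
    static Map<String, List<String>> example() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("TCP", Arrays.asList("IP"));
        graph.put("IP", Arrays.asList("TCP", "Routing"));
        graph.put("Routing", Arrays.asList("IP", "OSPF"));
        graph.put("OSPF", Arrays.asList("Routing"));
        return graph;
    }
}
```

With the sample graph, a radius of 2 around TCP yields IP and Routing, while OSPF lies three hops away and is left out.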

Fig. 3: Topic as the user interest node

ITM extends the conventional topic map in structure and enhances the reasoning functions.

DISTRIBUTED KNOWLEDGE INTEGRATION

Distributed knowledge integration based on ITM is realized by merging local ITMs. Local ITM merging is divided into three layers: local ITM layer, similarity measure layer and global ITM layer. The structure is shown in Fig. 4.

Local ITM layer: The distributed knowledge resources are managed by a logical knowledge organization model based on ITMs. Local ITMs are generated for the local knowledge resources. We extract the elements of the ITM to obtain the topics, the knowledge elements, the relationships between topics and the relationships between knowledge elements. Extracting topics and knowledge elements falls within the scope of information extraction, whereas the relationships between topics and between knowledge elements are acquired through semantic understanding. We then cluster the topics to obtain the clusters. Once the local ITM elements have been extracted, the local ITM is generated.

Similarity measure layer: Similarity computation is the prerequisite and basis for merging ITMs, and considerable work has been done in this area. The Subject Identity Measure (SIM) (Maicher and Witschel, 2004) measures the similarity between topics based on their name similarity and occurrence similarity. TM-MAP (Kim et al., 2007) is a multi-strategy matching technique that measures four facets of similarity: name-based, property-based, hierarchy-based and association-based similarity. Topic and Occurrence-oriented Merging (TOM) (Wu et al., 2006) merges two topics after establishing the similarity of their topic names and of their occurrence data or resources.

Fig. 4: The structure of knowledge integration based on ITM

However, most of these methods operate only at the syntactic level. They are statistical methods based purely on the character composition of two words; semantic similarity and pragmatic relevance are not considered. We propose a similarity measure based on comprehensive information theory. The similarity algorithm consists of syntactic matching, semantic matching and pragmatic matching. The algorithm is summarized as follows:


Syntactic matching: Syntactic matching is used to compute the syntactic similarity by analyzing the character composition of topics or knowledge elements.

When linking a pair of topics (or knowledge elements), the syntactic similarity SIMsyntactic (w1, w2) is defined as follows:

(1)

Here, c denotes the number of characters in the longest common substring of the two words, and |w1| and |w2| denote the numbers of characters in the two topics (or knowledge elements).
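As an illustration of syntactic matching, the Java sketch below computes c, the length of the longest common substring, with dynamic programming. The class and method names are hypothetical, and the normalization 2c / (|w1| + |w2|) is an assumption chosen for this example; it should be checked against Eq. 1 before being relied on.

```java
final class SyntacticSimilarity {

    /** Length c of the longest common substring of a and b, in O(|a| * |b|) time. */
    static int longestCommonSubstring(String a, String b) {
        int best = 0;
        int[] prev = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // extend the common run that ended at the previous characters
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    /** ASSUMED normalization 2c / (|w1| + |w2|); the exact form of Eq. 1 may differ. */
    static double similarity(String w1, String w2) {
        if (w1.isEmpty() && w2.isEmpty()) {
            return 1.0; // two empty names are treated as identical
        }
        int c = longestCommonSubstring(w1, w2);
        return 2.0 * c / (w1.length() + w2.length());
    }
}
```

Under this assumed normalization, identical names score 1 and names without any common character score 0.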

Semantic matching: Semantic matching analyses the static semantic similarity with respect to synonyms. Given a pair of topics (or knowledge elements), it is assumed that they are words and that ES is the set of sense similarity values, ES = {sv1, sv2,..., svm×n}. ES is divided into four intervals: A: [0.0, 0.1), B: [0.1, 0.2), C: [0.2, 0.8) and D: [0.8, 1.0). We analyze the contribution of these four intervals to the cognitive ambiguity and certainty of word similarity. Semantic similarity is defined as follows:

(2)

Sense similarity is defined as follows:

(3)

where β1 is a weight, SIMMP is the main sememe similarity, SIMOP is the other sememe similarity, SIMRP is the relative sememe similarity and SIMSP is the symbol sememe similarity.

Pragmatic matching: Pragmatic matching computes dynamic semantic similarity, which resolves the problem of polysemy. It considers the pragmatic relevance in linguistic context. When linking a pair of topics, the pragmatic similarity SIMpragmatic (Ta, Tb) is defined as follows:

(4)

where, SIMpt (CTa, CTb) is the similarity between set CTa and CTb. CTa is the set of all topics which are directly related to the topic Ta. CTb is the set of all topics which are directly related to the topic Tb. SIMpt (CTa, CTb) is defined as follows:

(5)

where, SIMpk (CKa, CKb) is the similarity between set CKa and CKb. CKa is the set of all knowledge elements which are directly related to the topic Ta. CKb is the set of all knowledge elements which are directly related to the topic Tb. SIMpk (CKa, CKb) can be calculated in the same way as Eq. 5.
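To make the idea of comparing directly related neighbors concrete, the following sketch scores two neighbor sets, such as CTa and CTb, by their Jaccard overlap. Jaccard is a stand-in chosen here purely for illustration and is not claimed to be the formula of Eq. 5; the class name NeighborOverlap and the sample topics are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class NeighborOverlap {

    /** Jaccard coefficient: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // no context on either side: treated as no evidence (a choice of this sketch)
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}
```

For example, two topics whose neighbor sets are {IP, Port, Socket} and {IP, Port, Datagram} share two of four distinct neighbors, so the overlap is 0.5. Two topics with the same name but disjoint neighborhoods would score 0, which is the kind of signal pragmatic matching uses to separate the senses of a polysemous word.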

The measure based on comprehensive information considers not only the character composition of two words but also their meaning and their relevance in the linguistic context. It addresses the problems of synonymy and polysemy and improves the accuracy of similarity measurement.

Global ITM layer: This layer implements knowledge integration by merging local ITMs. Topic map merging is the process of integrating two topic maps into a new topic map. The merging operation ITMM (Intelligent Topic Map Merging) is defined by the following expression:

(6)

We propose a method of ITM merging based on a rule engine. The merging rules are written in a rule description language based on topic maps. The rule descriptions are saved in rule documents, which are loaded and parsed by the rule engine. The topic-merging rule and the association-merging rule are defined as follows:

Rule 1: Topic-merging rule. If topic T1 in ITMa has high similarity with topic T2 in ITMb, the two topics must be merged into a single topic (T1 or T2) in the merged intelligent topic map ITMc. The same is true for knowledge elements.
Rule 2: Association-merging rule. Suppose that in ITMa a relationship Ra (Ta1, Ta2) exists between topics Ta1 and Ta2, and that in ITMb a relationship Rb (Tb1, Tb2) exists between topics Tb1 and Tb2. Then the topic Tc1 obtained by merging Ta1 and Tb1 has the two relationships Ra (Tc1, Ta2) and Rb (Tc1, Tb2). The same is true for relationships between knowledge elements. The rule is depicted in Fig. 5.

Fig. 5: The rule of association merging
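The effect of the two rules can be sketched without a rule engine. In the following Java illustration, which is a simplification and not the rule-engine implementation, each association is encoded as a {type, topic1, topic2} triple and the high-similarity pairs are given as a map from ITMb topic names to the matching ITMa topic names. The class name ItmMergeSketch and this encoding are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ItmMergeSketch {

    /** Rule 1: a topic that has a high-similarity partner is replaced by that single merged topic. */
    static String canonical(String topic, Map<String, String> matchedPairs) {
        return matchedPairs.getOrDefault(topic, topic);
    }

    /**
     * Rule 2: every association keeps its type, but its end points are rewritten to the
     * merged topics, so a merged topic carries the relationships of both source maps.
     */
    static Set<List<String>> merge(List<String[]> itmA, List<String[]> itmB,
                                   Map<String, String> matchedPairs) {
        Set<List<String>> merged = new LinkedHashSet<>(); // the set drops duplicate associations
        List<String[]> all = new ArrayList<>(itmA);
        all.addAll(itmB);
        for (String[] assoc : all) {
            merged.add(Arrays.asList(
                    assoc[0],
                    canonical(assoc[1], matchedPairs),
                    canonical(assoc[2], matchedPairs)));
        }
        return merged;
    }
}
```

If ITMa holds Ra (TCP, IP), ITMb holds Rb (Transmission Control Protocol, Port) and the two topic names are matched, the merged map contains Ra (TCP, IP) and Rb (TCP, Port), mirroring Rule 2.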

The process of ITM merging based on rule engine is divided into four steps:

Step 1: Add the merging rules
// Obtain the rule base and parse the rule document
RuleBase rb = RuleBaseFactory.getRuleBase();
TopicMap ruleName = null;
String ruleNamePath = "ruleName.xml";
ruleName = Transformer.File2Obj(ruleNamePath);
rb.addEtm(ruleNamePath);

Step 2: Load the high-similarity element pairs into the working memory
WorkingMemory workingMemory = new WorkingMemory();
TopicMap tm = merger.getEtm();
workingMemory.insert(tm);
rb.setWorkingMemory(workingMemory);

Step 3: Configure the execution engine
// Map each rule name to the executor that carries out its action
Agenda agenda = new Agenda();
HashMap<String, ExecEngine> execMap = new HashMap<String, ExecEngine>();
execMap.put("ruleName", new ruleNameExecutor());
agenda.setExecMap(execMap);
rb.setAgenda(agenda);

Step 4: Execute the rules in the agenda until all of them have finished
// Match the rules against the working memory, then run the activated rules
rb.matchAllRules();
rb.getAgenda().execute();

SYSTEM FRAMEWORK

The system framework of distributed knowledge integration based on ITM mainly includes two modules: background processing and foreground processing, as shown in Fig. 6.

Background processing: Background processing module includes the following functions:

Fig. 6: The system framework of distributed knowledge integration based on ITM

Knowledge storage: Since knowledge resources are massive, distributed knowledge storage is built. Local ITMs are generated for the local knowledge resources and are managed autonomously at each site.
Knowledge processing: Knowledge discovery extracts and defines new patterns from the original data set; it includes data preparation, data mining, and the interpretation and evaluation of outcomes, and it realizes knowledge reasoning.
Knowledge organization and management: This function implements the centralized management of local ITM merging, the global ITM, metadata, indexing and the users' interest models. It also optimizes user queries.

Foreground processing: The main function of foreground processing is interactive learning with the user. The portal is the interface between end users and the web service; here, it represents the new knowledge service for web access. It provides discovery of popular authoritative resources, knowledge navigation, knowledge recommendation, knowledge visualization and so on.

The system framework of distributed knowledge integration based on ITM implements distributed storage, centralized management and dynamic data integration of knowledge resources. It constructs the global views during the information integration and provides good knowledge services for users.

EXPERIMENTAL RESULTS

We evaluate the effectiveness of the proposed ITM-based distributed knowledge integration model through its similarity algorithm, because the similarity algorithm is the foundation of ITM merging and its performance directly determines the quality of merging. The evaluation is applied to part of the computer network knowledge domain (Table 1).

In this study, we use the F-measure (F), a performance measure from information retrieval. We obtain the true-positive set (TP), which contains correctly identified matches; the false-positive set (FP), which contains false matches; and the false-negative set (FN), which contains missed matches. The match quality of the automatic matching process can be measured by evaluating Eq. 7.

Table 1: The experimental data of merging
TN: Topic No., KeN: Knowledge element No., ATN: Association No. between topics, AKeN: Association No. between knowledge elements, ATKeN: Association No. between topics and knowledge elements

Fig. 7: F-measures of SIM, TM-MAP, ETMSC similarity algorithms

(7)
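For reference, the usual information-retrieval definitions computed from the sizes of TP, FP and FN can be sketched as follows. The sketch assumes that Eq. 7 is the balanced F-measure (the harmonic mean of precision and recall); the class name MatchQuality and the zero-denominator conventions are choices made for this example.

```java
final class MatchQuality {

    /** Precision = |TP| / (|TP| + |FP|); defined as 0 when nothing was proposed. */
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = |TP| / (|TP| + |FN|); defined as 0 when there is nothing to find. */
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Balanced F-measure = 2 * P * R / (P + R). */
    static double fMeasure(int tp, int fp, int fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

For example, with 6 correct matches, 2 false matches and 4 missed matches, precision is 0.75, recall is 0.6 and the F-measure is about 0.667.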

We compare our method, named ETMSC, with two other topic map similarity algorithms, SIM and TM-MAP (TOM is similar to SIM in that it measures topic similarity and occurrence similarity at the syntactic level). Figure 7 compares the F-measures of the three similarity algorithms.

The experimental results indicate that ETMSC has a higher F-measure than SIM and TM-MAP when the threshold is above 0.55. SIM measures the similarity between topics based on their name similarity and occurrence similarity, so it does not consider the external structures of topic maps, such as hierarchy and association. TM-MAP measures four facets of similarity: name-based, property-based, hierarchy-based and association-based similarity. In contrast, our similarity algorithm ETMSC comprehensively considers syntactic matching, semantic matching and pragmatic matching. Compared with the traditional algorithms, which are based purely on syntactic or semantic similarity, ETMSC improves the F-measure by 9.2-11.1%.

Although the accuracy of ETMSC is improved, it depends strongly on threshold selection. The appropriate threshold depends on how much weight is placed on similarity and on the level of fusion required. Threshold selection should balance the desired objectives and outcomes against the time and effort involved. Before the threshold is fixed, tests must be carried out on a certain amount of target data, and the threshold should be set to the value that yields the best average F-value.
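Threshold selection on test data can be sketched as a simple sweep: for each candidate threshold, accept every pair whose similarity reaches it, compute the F-measure against a manually verified set of matches, and keep the best candidate. The class name ThresholdSweep, the pair identifiers and the scores below are hypothetical, and the balanced F-measure is assumed.

```java
import java.util.Map;
import java.util.Set;

final class ThresholdSweep {

    /** F-measure obtained when every pair scoring at least threshold is accepted as a match. */
    static double fAt(Map<String, Double> scores, Set<String> gold, double threshold) {
        int tp = 0;
        int fp = 0;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() >= threshold) {
                if (gold.contains(entry.getKey())) {
                    tp++;
                } else {
                    fp++;
                }
            }
        }
        if (tp == 0) {
            return 0.0;
        }
        int fn = gold.size() - tp; // verified matches that were not accepted
        double p = (double) tp / (tp + fp);
        double r = (double) tp / (tp + fn);
        return 2.0 * p * r / (p + r);
    }

    /** Returns the candidate threshold with the highest F-measure (the first one wins ties). */
    static double best(Map<String, Double> scores, Set<String> gold, double[] candidates) {
        double bestThreshold = candidates[0];
        double bestF = -1.0;
        for (double t : candidates) {
            double f = fAt(scores, gold, t);
            if (f > bestF) {
                bestF = f;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }
}
```

In practice the sweep would be repeated over several test sets and the F-values averaged per threshold before choosing, in line with the recommendation above.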

CONCLUSION AND FUTURE WORK

This study has presented a mechanism for distributed knowledge integration based on the new concept of ITM. Using ITM in the field of distributed knowledge integration is a novel direction and offers a new approach to distributed knowledge management. The main advantages of the proposed framework are as follows: (1) ITM not only expresses the multi-level, multi-granularity nature of knowledge but also fully reflects the associations between the knowledge and the related information resources, so it contains rich knowledge and information; (2) distributed knowledge integration is easy to realize by merging local ITMs into a global ITM; and (3) the graphic display based on ITM is more perceivable and provides visual knowledge navigation.

The dynamic updating mechanism, automatic adaptation and real system deployment of ITMs are essential future work. We hope that standards can be built on this initial framework and that real systems will be widely deployed in the future.

ACKNOWLEDGMENTS

This research is sponsored by the National High-Tech Research and Development Plan of China under Grant No. 2008AA01Z131 and the National Natural Science Foundation of China under Grant No. 60803162. This study is also partially supported by the National High-Tech Research and Development Plan of China under Grant No. 2008AA01Z136.

REFERENCES

1:  Baru, C., A. Gupta, B. Ludascher, R. Marciano, Y. Papakonstantinou, P. Velikhov and V. Chu, 1999. XML-based information mediation with MIX. ACM SIGMOD Record, 28: 597-599.

2:  Carey, M.J., L.M. Haas, P.M. Schwarz, M. Arya and W.F. Cody et al., 1995. Towards heterogeneous multimedia information systems: The garlic approach. Proceedings of the 5th International Workshop on Research Issues in Data Engineering-Distributed Object Management, March 6-7, Taipei, Taiwan, pp: 124-131

3:  ISO/IEC, 2008. Information technology topic maps part 2: Data model. http://www.isotopicmaps.org/sam/sam-model/data-model.pdf.

4:  ISO/IEC JTC 1/SC34 N323, 2002. Guide to the topic map standards. International Organization for Standardization. http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0323.htm.

5:  Kim, J.M., H. Shin and H.J. Kim, 2007. Schema and constraints-based matching and merging of topic maps. Inform. Process. Manage., 43: 930-945.

6:  Korthaus, A., M. Aleksy and S. Henke, 2009. A distributed knowledge management infrastructure based on a Topic Map grid. Int. J. High Performance Comput. Network., 6: 66-80.

7:  Lu, H., B. Feng, Y. Zhao, Q. Zheng and J. Liu, 2008. A new model for distributed knowledge organization management. Proceedings of the 7th International Conference on Grid and Cooperative Computing, Oct. 24-26, Shenzhen, China, pp: 261-265

8:  Maicher, L. and H.F. Witschel, 2004. Merging of distributed topic maps based on the subject identity measure (SIM) approach. Proceedings of the LIT 2004, Sept. 29-Oct. 1, Leipzig, Germany, pp: 1-11

9:  Pepper, S., 2001. The TAO of topic maps. http://www.gca.org/papers/xmleurope2000/.

10:  Pepper, S. and G. Moore, 2001. XML topic maps (XTM) 1.0. TopicMaps Org.

11:  Seng, J.L. and I.L. Kong, 2009. A schema and ontology-aided intelligent information integration. Expert Syst. Appl., 36: 10538-10550.

12:  Silva, P.A., C.M.F.A. Ribeiro and U. Schiel, 2007. Formalizing ontology reconciliation techniques as a basis for meaningful mediation in service related tasks. Proceedings of the ACM 1st Ph.D. Workshop in Conference on Information and Knowledge Management, Nov. 9, Lisbon, Portugal, pp: 147-154

13:  Stumme, G. and A. Madche, 2001. FCA-merge: Bottom up merging of ontologies. Proceedings of the 17th International Joint Conference on Artificial Intelligence, Aug. 4-10, Seattle, Washington, USA., pp: 225-234

14:  Tomasic, A., L. Raschid and P. Valduriez, 1998. Scaling access to distributed heterogeneous data sources with DISCO. IEEE Trans. Knowledge Data Eng., 10: 808-823.

15:  Wu, X., L. Zhou, L. Zhang and Q. Ding, 2006. TOM algorithm in distributed topic maps merging. Eng. J. Wuhan Univ., 39: 131-136.

16:  Zhang, C., X. Yang and S. Du, 2008. A distributed knowledge model for knowledge management system. Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing, Oct. 12-14, Dalian, China, pp: 1-4

17:  Lu, H. and B. Feng, 2009. An intelligent topic map-based approach to detecting and resolving conflicts for multi-resource knowledge fusion. Inform. Technol. J., 8: 1242-1248.
