
Information Technology Journal

Year: 2007 | Volume: 6 | Issue: 1 | Page No.: 37-47
DOI: 10.3923/itj.2007.37.47
FROut: A Novel Approach to Linking Large Databases
Mingzhen Wei, Andrew H. Sung and Martha E. Cather

Abstract: This study addresses record linkage problems that arise when multiple data sources differ in size, data format, or conventions, as is common in practice. A systematic solution, FROut (Filtering Relevant Out), is proposed. The study argues that a high-quality filtering, or searching, strategy is the key to successful record linkage. Domain knowledge and data quality are emphasized as the basis for selecting the most reliable and important identifying attribute domains when designing the filtering strategy. By generating a different, dynamic filtering criterion for each record processed, the new searching algorithm produces relevant record sets of varying size and ensures that every selected record is relevant in some respect to the target record. Because the filtering criteria consider only reliable data in the identifying attribute domains, the approach avoids a large number of wasteful comparisons in the later stages of record linkage and thus improves linkage efficiency significantly. A linear relationship between computational cost and the size of the incoming data set is observed, which is useful for estimating the workload for incoming data sets of different sizes. The proposed approach was tested on real petroleum production data sets. Empirical results show that the new approach not only resolves problems that SNM-based methods cannot, but also provides high computational efficiency and high linkage accuracy.
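The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' FROut implementation: the attribute names (`operator`, `lease`, `county`), the rule that blank values count as unreliable, and the helper names `build_index` and `filter_relevant` are all assumptions made for illustration. The sketch builds a per-record filter from only the reliable identifying attributes, so different target records yield relevant sets of different sizes.

```python
from collections import defaultdict

# Hypothetical identifying attributes; the abstract does not list the real ones.
KEY_ATTRS = ("operator", "lease", "county")

def is_reliable(value):
    """Assumed rule: missing or blank values are unreliable."""
    return value is not None and str(value).strip() != ""

def normalize(value):
    """Assumed normalization: trim whitespace and ignore case."""
    return str(value).strip().lower()

def build_index(records):
    """Map (attribute, normalized value) -> set of record positions."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for attr in KEY_ATTRS:
            if is_reliable(rec.get(attr)):
                index[(attr, normalize(rec[attr]))].add(pos)
    return index

def filter_relevant(target, records, index):
    """Select records agreeing with the target on each of its reliable attributes."""
    # The criterion is built per target, so it changes from record to record.
    criteria = [(a, normalize(target[a])) for a in KEY_ATTRS
                if is_reliable(target.get(a))]
    if not criteria:
        return []  # nothing reliable to filter on
    positions = set.intersection(*(index.get(c, set()) for c in criteria))
    return [records[p] for p in sorted(positions)]

# Made-up example data, for illustration only.
master = [
    {"operator": "Acme Oil", "lease": "State A", "county": "Eddy"},
    {"operator": "Acme Oil", "lease": "State B", "county": "Eddy"},
    {"operator": "Bravo Energy", "lease": "State A", "county": "Lea"},
]
incoming = [
    {"operator": "acme oil ", "lease": "", "county": "Eddy"},          # lease unreliable
    {"operator": "Bravo Energy", "lease": "State A", "county": None},  # county unreliable
]

index = build_index(master)
candidate_sets = [filter_relevant(t, master, index) for t in incoming]
```

Only the records in each candidate set would be passed to the detailed comparison stage. Because each target is resolved through index lookups rather than a scan of all pairs, the work in this sketch grows roughly with the number of incoming records, which is in the spirit of the linear relationship the abstract reports.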


How to cite this article
Mingzhen Wei, Andrew H. Sung and Martha E. Cather, 2007. FROut: A Novel Approach to Linking Large Databases. Information Technology Journal, 6: 37-47.
