Abstract: In real-time applications, identifying records that represent the same real-world entity is a major challenge. Such records are termed duplicate records. Duplicate record detection is an important step in data integration. This paper presents a thorough analysis of the literature on duplicate record detection and discusses the data deduplication problem in detail. It covers most of the metrics commonly used to detect similar entries, along with a set of duplicate detection algorithms.