Duplicate Detection and Resolution

Duplicate Detection and Resolution (DDR) software is now in full operation. A run of the full WorldCat database (beginning with OCLC #1) began on 2 February 2010 and was completed on 30 September 2010. A total of 166,422,941 records were processed and 5,126,132 duplicate records were eliminated.

In addition, a separate process that examines selected new and replaced records from each day's journal files began running on 26 January 2010. This processing will continue.

History

Beginning in 1991, OCLC used its Duplicate Detection and Resolution (DDR) software to match WorldCat bibliographic records in the books format against one another to find and merge duplicates.

By mid-2005, when WorldCat migrated to its new platform, sixteen runs through WorldCat had been completed, eliminating a total of 1.6 million duplicate records.

In 2005, a project was started to reinvent the DDR software for the new environment and to expand its capabilities to handle all types of bibliographic records. This large multi-year project is now bearing fruit. Major improvements to our matching software, a key component of the new DDR, have regularly been incorporated into the batchloading process. As a result, the DDR and batchloading processes are more closely aligned than ever before in dealing with duplicate records in WorldCat.

In May 2009, the new software was put into production following rigorous planning, development, and testing. In addition to books, the new DDR can handle continuing resources, scores, sound recordings, visual materials, maps, and electronic resources, and it is much more sophisticated than its predecessor at distinguishing legitimate matches from incorrect ones. It also has the flexibility to target selected categories of bibliographic records for deduplication. Processing of small subsets of WorldCat against the live database also began. A full pass through the WorldCat database began in February 2010 and ended in September 2010.

With the new DDR software in production, a larger number of bibliographic records are being merged, and libraries will notice fewer duplicate records in WorldCat. The reduction should be particularly visible for printed music, sound recordings, and AV materials, since the previous DDR software did not address duplicates in these formats. Regular removal of duplicates provides a better WorldCat for all its users.

DDR statistics

Between May 2009 and 30 June 2013:

  • 342,080,141 records have been processed through DDR
  • 11,294,384 duplicate records have been removed
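To put the figures on this page in proportion, the short Python sketch below expresses duplicates removed as a percentage of records processed, both for the 2010 full pass and for the cumulative May 2009 to 30 June 2013 totals. The helper name duplicate_rate is hypothetical, and the resulting percentages are simple ratios computed here for illustration, not statistics published by OCLC.

```python
# Illustrative ratios derived from the DDR figures quoted on this page.
# duplicate_rate() is a hypothetical helper; "rate" simply means
# duplicates removed divided by records processed, as a percentage.

def duplicate_rate(removed: int, processed: int) -> float:
    """Return removed/processed expressed as a percentage."""
    if processed <= 0:
        raise ValueError("processed must be positive")
    return 100.0 * removed / processed

# Full pass of WorldCat, 2 February to 30 September 2010
full_pass = duplicate_rate(5_126_132, 166_422_941)

# Cumulative totals, May 2009 to 30 June 2013
cumulative = duplicate_rate(11_294_384, 342_080_141)

print(f"Full pass:  {full_pass:.2f}%")   # Full pass:  3.08%
print(f"Cumulative: {cumulative:.2f}%")  # Cumulative: 3.30%
```

In both cases roughly three records in every hundred processed were removed as duplicates. The cumulative total of records processed is about twice the size of the full pass, so it presumably also counts the daily journal-file and subset processing described above; the two ratios therefore measure slightly different things.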

Wondering about a merge?

Every effort has been made to prevent inappropriate merges. However, because DDR is an automated process, an occasional inappropriate merge may occur. If you notice a record that appears to result from an inappropriate merge, please report it to bibchange@oclc.org. OCLC staff will examine the records in question and, where possible, reverse the merge if it proves to be inappropriate.