FRBR Work-Set Algorithm
OCLC Research is pleased to provide you with an algorithm for converting MARC21 bibliographic databases to the "FRBR" model.
Background, Context and Goals
FRBR (Functional Requirements for Bibliographic Records) is a 1998 recommendation of the International Federation of Library Associations and Institutions (IFLA) to restructure catalog databases to reflect the conceptual structure of information resources. This project is one of several FRBR-related OCLC Research projects, which examine issues associated with converting a set of bibliographic records to conform to FRBR requirements (a process referred to as "FRBRization").
The OCLC Research FRBR project page provides additional background information about FRBR and an overview of OCLC's research efforts in this area.
The FRBR model brings together bibliographic records that are intellectually related as "works". Having resources brought together under the "works" umbrella enables users to sift through the myriad resources available digitally. It will help them acquire the work, or content, that they are looking for, irrespective of the specific "container" or item the content is carried in. An example of a FRBRized database is available.
In large databases, such as WorldCat, collocation is indispensable for discovery and navigation. OCLC plans to "FRBRize" WorldCat as it implements WorldCat's new database technology.
Chief scientist Thom Hickey led the development of computer algorithms to explore automating FRBR conversions. Hickey and his associates also clustered the entire 48-million-record WorldCat database at the 'work' level and created a number of subsets, including records representing works of fiction.
OCLC's research activities are an important benefit of OCLC membership and one of the many ways that member fees are used to benefit the larger community. There is no fee for downloading the algorithm.
Research overview
Researchers made a copy of WorldCat that included holdings data and NACO authorities, and created a personal author file.
Records were processed in MARC Communications format after being converted to Unicode. Much of the early research investigated how best to divide a particular 'work' into its component 'expressions'. Unfortunately, this and other FRBR research has shown that the information in existing bibliographic records is, in general, insufficient to reliably divide a work into expressions, so this line of investigation has been abandoned for now.
Our research then focused on the seemingly simpler problem of collecting bibliographic records into groups corresponding to different works (such as Shakespeare's Hamlet). An algorithm was developed, based primarily on the authors and titles found in bibliographic records, to find works in the WorldCat database with a high degree of reliability. One major finding is that looking up authors and author/title combinations in the authority file has a significant positive impact on the matching of works.
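As a rough illustration of author/title grouping with an authority lookup, the sketch below builds a key for each record and collects record IDs under that key. It is not the published algorithm: the record fields (id, author, title, uniform_title), the authority mapping, and the function names are all hypothetical, and the normalization is a deliberately crude stand-in for the NACO rules.

```python
from collections import defaultdict


def norm(text):
    # Crude stand-in for real normalization: uppercase, drop commas and
    # periods, and collapse runs of whitespace.
    return " ".join(text.upper().replace(",", " ").replace(".", " ").split())


def work_key(record, authority):
    # Normalize the author, then replace a variant form with the
    # established form if the (hypothetical) authority mapping knows it.
    author = norm(record.get("author", ""))
    author = authority.get(author, author)
    # Prefer a uniform title when the record has one.
    title = norm(record.get("uniform_title") or record.get("title", ""))
    return f"{author}/{title}"


def group_into_works(records, authority):
    # Collect record IDs under their author/title key.
    works = defaultdict(list)
    for record in records:
        works[work_key(record, authority)].append(record["id"])
    return dict(works)


# Invented sample data, for illustration only.
records = [
    {"id": 1, "author": "Shakespeare, William, 1564-1616", "title": "Hamlet"},
    {"id": 2, "author": "Shakspere, William", "title": "Hamlet."},
    {"id": 3, "author": "Shakespeare, William, 1564-1616",
     "title": "Tragedy of Hamlet", "uniform_title": "Hamlet"},
    {"id": 4, "author": "Shakespeare, William, 1564-1616", "title": "Macbeth"},
]
authority = {"SHAKSPERE WILLIAM": "SHAKESPEARE WILLIAM 1564-1616"}

works = group_into_works(records, authority)
```

In this toy data, record 2 joins the Hamlet group only because the authority mapping resolves the variant spelling of the author's name, which is the kind of effect the finding above describes.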
Because the NACO authority normalization rules were used to simplify names and titles before matching, researchers investigated existing implementations of the rules. Discrepancies found between implementations led to the establishment of a public NACO normalization test-bed, which lets others compare and verify their normalization routines against the one developed in this project.
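To give a sense of what normalizing names and titles before matching involves, here is a minimal sketch in the spirit of the NACO rules. It is an assumption-laden simplification, not a conforming implementation: the function name simple_normalize is hypothetical, and the full rules treat individual characters and subfields in more detail than is shown here.

```python
import re
import unicodedata


def simple_normalize(text):
    # Decompose accented characters and drop the combining marks,
    # so that "ë" becomes "e".
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Delete apostrophes outright so a name like O'Neill stays one token.
    stripped = stripped.replace("'", "").replace("\u2019", "")
    # Replace every other run of non-alphanumeric characters with a space.
    spaced = re.sub(r"[\W_]+", " ", stripped)
    # Uppercase and collapse whitespace.
    return " ".join(spaced.upper().split())


samples = [
    "Hamlet, Prince of Denmark.",
    "Brontë, Charlotte",
    "O'Neill,  Edward T.",
]
normalized = [simple_normalize(s) for s in samples]
```

Small choices such as whether an apostrophe is deleted or turned into a space change the resulting key, which is one way two implementations of the same rules can disagree.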
Some of the more difficult records to group properly into works are those without authors or uniform titles. Many of these records match on title but actually represent different works. Work continues on exploring and exploiting information in bibliographic records to help establish reliable matches without bringing together unrelated records.
Publication
- Hickey, Thomas B., Edward T. O'Neill, and Jenny Toves. 2002. "Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR)." D-Lib Magazine 8, 9 (September).
License and Distribution
The OCLC Research FRBR Work-set Algorithm, version 2.0 (August 2009), is available as a PDF file (.pdf: 77.8K/9 pp.). This version of the algorithm is available for use in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported license: http://creativecommons.org/licenses/by-nc-nd/3.0/. (See algorithm document for full license statement.)
We encourage you to download the OCLC Research FRBR Work-set Algorithm, version 2.0 (August 2009) for use in converting your own bibliographic databases.
Earlier version
The original version of the algorithm (April 2005) is still available (.pdf: 124K/7 pp.). It may be used without charge in accordance with the terms of the OCLC Research Public License, an Open Source Initiative-approved license. A PDF version of the license is also available (PDF: 130K/3 pp.).
We appreciate comments and suggestions for improving the algorithm, and we would like to learn about your experience implementing FRBR. Please email frbr@oclc.org with your suggestions.
Research team
- Thom Hickey
- Jenny Toves