Data Science & Metadata Research
To be discoverable by today’s online users, traditional library data must be transformed. OCLC Research analyzes bibliographic data to derive new meaning, insights, and services for use by libraries and information seekers. This work includes special projects in metadata enrichment, authorities & identities, linked data, subjects & classification, and data analysis.
Highlighted Data Science & Metadata Research projects, publications, and presentations
Application
ArchiveGrid
ArchiveGrid is a collection of millions of archival material descriptions, including MARC records from WorldCat and finding aids harvested from the web.
ArchiveGrid provides access to detailed descriptions of archival collections, including documents, personal papers, family histories, and other archival materials held by thousands of libraries, museums, historical societies, and archives. It also provides contact information for the institutions where these collections are kept.
Application
FAST (Faceted Application of Subject Terminology)
FAST is a vocabulary of controlled terms that can be used to describe the subject content of any kind of intellectual or creative work. The terms used by FAST are derived from the Library of Congress Subject Headings (LCSH).
FAST has several exploratory interfaces:
- searchFAST: A full-featured search interface to the FAST database.
- FAST Converter: A web interface for converting LCSH headings to FAST headings.
- assignFAST: A web service that automates the manual selection of FAST subjects using autosuggest technology.
- FAST Linked Data: FAST as a Linked Data service for interacting with the Semantic Web.
- importFAST [beta]: Imports Library of Congress personal or corporate names into the FAST Authorities and immediately assigns each a FAST number. Topical heading and subdivision combinations can also be assigned.
Project
Stewarding the Collective Collection: Exploratory Data Analysis
A collaborative project between OCLC and The Partnership for Shared Book Collections has examined the nature of retention commitments currently registered in WorldCat. The project builds on earlier work by OCLC and the Center for Research Libraries, funded by the Andrew W. Mellon Foundation, to support shared print.
Publication
Responsible Operations: Data Science, Machine Learning, and AI in Libraries
by Thomas Padilla
Responsible Operations is intended to help chart library community engagement with data science, machine learning, and artificial intelligence (AI). It was developed in partnership with an advisory group and a landscape group composed of more than 70 librarians and professionals from universities, libraries, museums, archives, and other organizations.
Hanging Together: the OCLC Research blog
For information and insights on the topics and challenges faced by the library, archive, and museum communities, check out the OCLC Research blog, Hanging Together.