Data Science & Metadata Research

To be discoverable by today’s online users, traditional library data must be transformed. OCLC Research analyzes bibliographic data to derive new meaning, insights, and services for use by library and information seekers. This work includes special projects in metadata enrichment, authorities & identities, linked data, subjects & classification, and data analysis.

Completed Projects

The listed research projects have concluded. The project pages are available for historical purposes only. Some portion of this content may be out-of-date and include broken links.

Ariadne's Thread: Interactive Context Explorer

The Ariadne's Thread: Interactive Context Explorer was designed to visualize the networks of entities associated with bibliographic records.

Article Exchange

A document-sharing site that provides a single, secure location where lending libraries can place requested PDF and TIFF articles and library users can retrieve articles or book chapters obtained for them via interlibrary loan. Article Exchange adds convenience, security and enhanced copyright compliance to online article sharing through interlibrary loan.

Bookmarklets

Got a book title from an online bookseller? Check whether your local library has it.

CatVis: Visual Analytics for the World's Library Data

The purpose of the project was to develop a cutting-edge visual analytics toolkit, to answer both the pressing needs of humanities researchers and concrete demands of the library industry.

Classify

Classify was a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.

Cookbook Finder

Cookbook Finder was an experimental, works-based application that provided access to thousands of cookbooks and other works about food and nutrition described in library records.

Europeana Innovation Pilots

OCLC Research and Europeana conducted innovation pilots from May 2012 through December 2013. This collaborative initiative sought to pilot the use of existing and newly developed OCLC methods and techniques for cleansing and enriching large aggregations of metadata.

FictionFinder: A FRBR-based Prototype for Fiction in WorldCat

FictionFinder was a FRBR-based prototype that provided access to over 2.9 million bibliographic records for fiction books, eBooks, and audio materials described in OCLC WorldCat.

Getting Found: SEO for Digital Repositories

This activity was part of the IMLS-funded project "Getting Found: Search Engine Optimization for Digital Repositories" which looked to develop strategies for improving the visibility of library digital repositories in Internet search engines.

Kindred Works

OCLC Research developed an experimental service that provided a list of items similar to an item of interest. The prototype service used various characteristics of a sample work, such as classification numbers, subject headings, and genre terms, to retrieve related resources from WorldCat. This approach was called content-based recommendation.
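
As a rough illustration of content-based recommendation, the sketch below ranks works by how many descriptive features they share with a sample work. It is not the Kindred Works implementation: the Jaccard similarity measure, the function names, the feature strings, and the tiny catalog are all assumptions made for this example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of features shared by two works (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_works(sample: set[str], catalog: dict[str, set[str]],
                  limit: int = 5) -> list[tuple[str, float]]:
    """Rank catalog entries by feature overlap with the sample work."""
    scored = [(title, jaccard(sample, features))
              for title, features in catalog.items()]
    # Drop works with nothing in common, then sort best match first.
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:limit]


# Hypothetical catalog: each work is described by classification,
# subject, and genre features.
CATALOG = {
    "Work A": {"class:641.5", "subject:Cooking", "genre:Cookbooks"},
    "Work B": {"class:641.5", "subject:Baking", "genre:Cookbooks"},
    "Work C": {"class:813.54", "subject:Detectives", "genre:Fiction"},
}

sample_features = {"class:641.5", "subject:Cooking", "genre:Cookbooks"}
recommendations = similar_works(sample_features, CATALOG)
```

Because the ranking relies only on descriptive metadata, it needs no usage or circulation data, which is the distinguishing trait of a content-based approach.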

MARC Usage in WorldCat

This project studied utilization rates of MARC tags and subfields in WorldCat and produced tools and reports to that end. This provided an evidence base for testing assertions about the value of capturing various attributes by demonstrating whether the cataloging community has made the effort to populate specific tags, not just to define them in anticipation of use.
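
A utilization rate of this kind can be sketched as the share of records in which a tag appears at least once. The example below is a minimal illustration, not the project's tooling; it assumes each record has already been parsed into a dictionary keyed by MARC tag, and the sample records are invented.

```python
from collections import Counter


def tag_usage(records: list[dict[str, list[str]]]) -> dict[str, float]:
    """For each MARC tag, return the fraction of records that contain it."""
    total = len(records)
    if total == 0:
        return {}
    counts: Counter[str] = Counter()
    for record in records:
        # Iterating a dict yields its keys, so each tag counts once per record.
        counts.update(set(record))
    return {tag: counts[tag] / total for tag in sorted(counts)}


# Hypothetical parsed records: tag -> list of field values.
records = [
    {"245": ["Title one"], "100": ["Author one"]},
    {"245": ["Title two"], "650": ["Subject a", "Subject b"]},
    {"245": ["Title three"], "100": ["Author two"], "650": ["Subject c"]},
    {"245": ["Title four"]},
]

usage = tag_usage(records)
```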

Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories

This work was part of the IMLS-funded grant "Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories" which aimed to improve data collection and information sharing for institutional repositories and digitized collections.

Missing Materials Beta Procedure

In order to centralize information about stolen and missing rare books and special collections, this working group developed a procedure to “tag” records in WorldCat.org. The tagged records are then automatically fed to a blog, missingmaterials.org. Simultaneously, holdings are set in WorldCat, in order to alert prospective buyers and sellers.

Multilingual Bibliographic Structure

The Multilingual Bibliographic Structure activity was designed to leverage the multilingual content of WorldCat, so that bibliographic information can be presented in the preferred language and script of the user.

NACO Normalization Service

This service was used to prepare text strings for machine comparison and sorting, according to the NACO normalization rules.
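
The general idea of preparing strings for comparison can be illustrated with a simplified normalizer. The sketch below is an approximation written for this page and does not implement the NACO normalization rules, which specify the treatment of individual characters and subfields in far more detail; the function name and the chosen steps are assumptions.

```python
import unicodedata


def normalize_for_comparison(text: str) -> str:
    """Simplified normalization: strip diacritics, fold case, drop punctuation."""
    # Decompose characters so that diacritics become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace anything that is not a letter or digit with a space.
    spaced = "".join(
        ch if ch.isalnum() else " " for ch in without_marks.upper()
    )
    # Collapse runs of whitespace and trim the ends.
    return " ".join(spaced.split())


same = (
    normalize_for_comparison("Brontë, Charlotte")
    == normalize_for_comparison("  bronte,charlotte ")
)
```

Two strings that normalize to the same value can then be treated as matching for comparison and sorting purposes.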

Name Extraction

The problem of automatically recognizing, extracting, and disambiguating named entities (e.g., the names of people, places, and organizations) from digitized text has received considerable attention in research produced by the library, computer science, and linguistics communities in the past five years. Name identification and extraction tools, particularly when integrated with an authority file, can enhance reliable subject access for a document collection, improving its discoverability by end users.

OAICat

The OAICat Open Source Software (OSS) project was a Java Servlet web application providing a repository framework that conformed to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) v2.0.

OCLC ResearchWorks IIIF Explorer

With the IIIF Explorer application, OCLC ResearchWorks has created an index of all of the images in the CONTENTdm digital content management systems hosted by OCLC.

PURL

PURLs (Persistent Uniform Resource Locators) are Web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. Instead of resolving directly to Web resources, PURLs provide a level of indirection that allows the underlying Web addresses of resources to change over time without negatively affecting systems that depend on them. This capability provides continuity of references to network resources that may migrate from machine to machine for business, social or technical reasons.
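
The level of indirection described above can be sketched as a lookup table that maps a stable identifier to whatever the current address happens to be. This is a toy model, not the PURL server software; the class name, method names, and example.org addresses are hypothetical.

```python
class PurlResolver:
    """Toy resolver: persistent paths map to changeable target URLs."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, purl_path: str, target_url: str) -> None:
        """Create a persistent path or update where it currently points."""
        self._targets[purl_path] = target_url

    def resolve(self, purl_path: str) -> str:
        """Return the current target; a real service would redirect to it."""
        try:
            return self._targets[purl_path]
        except KeyError:
            raise LookupError(f"unknown PURL: {purl_path}") from None


resolver = PurlResolver()
resolver.register("/docs/report", "https://old-host.example.org/report.html")
first = resolver.resolve("/docs/report")

# The resource moves; only the table entry changes, not the PURL itself.
resolver.register("/docs/report", "https://new-host.example.org/reports/1")
second = resolver.resolve("/docs/report")
```

Systems that stored only `/docs/report` keep working after the move, because they never recorded the underlying address.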

SRW/U

The SRW (Search & Retrieve Web Service) initiative was part of an international collaborative effort to develop a standard web-based text-searching interface.

Scholars’ Contributions to VIAF

This activity explored the potential benefits of collaborating with scholars to enrich the Virtual International Authority File (VIAF) with new names and additional script forms for names already represented.

Sharing and Aggregating Social Metadata

One of the activities related to metadata management and support, this activity focused on the user contributions that would enrich the descriptive metadata created by libraries, archives, and museums.

The NDLTD Union Catalog

The NDLTD Union Catalog project focused on exchanging theses metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH is a lightweight protocol for moving or sharing metadata that allows synchronization of loosely coupled databases and mandates XML Dublin Core as the default metadata format.

VIAF (The Virtual International Authority File)

VIAF explores virtually combining the name authority files of participating institutions into a single name authority service.

Work Records in WorldCat

This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Records are clustered into works using the OCLC FRBR Work-Set Algorithm.
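
Clustering records into works can be pictured as grouping them under a shared key. The sketch below groups records by a normalized author/title key; it is a simplified stand-in for illustration only, not the OCLC FRBR Work-Set Algorithm, and the record layout, function names, and sample data are assumptions.

```python
import re
from collections import defaultdict


def work_key(author: str, title: str) -> str:
    """Build a comparison key from a normalized author and title."""
    def norm(text: str) -> str:
        # Lowercase and reduce punctuation or spacing to single spaces.
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return f"{norm(author)}/{norm(title)}"


def cluster_into_works(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group record identifiers under their shared work key."""
    works: dict[str, list[str]] = defaultdict(list)
    for record in records:
        works[work_key(record["author"], record["title"])].append(record["id"])
    return dict(works)


# Hypothetical manifestation-level records.
records = [
    {"id": "r1", "author": "Melville, Herman", "title": "Moby Dick"},
    {"id": "r2", "author": "Melville, Herman.", "title": "Moby-Dick"},
    {"id": "r3", "author": "Austen, Jane", "title": "Emma"},
]

works = cluster_into_works(records)
```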

WorldCat Genres

WorldCat Genre Profiles allowed users to browse genre terms for hundreds of titles, authors, subjects, characters, places, and more, ranked by popularity in WorldCat.

WorldCat Identities

The OCLC Research Identities work provided valuable insight into how to mine bibliographic data for information about the people and organizations that create, and serve as subjects of, library materials.

info URI Registry

The info URI scheme was developed within the library and publishing communities (specifically, in conjunction with the development of the NISO OpenURL standard) because of the need for URIs as pure identifiers, that is, to identify (not retrieve, dereference, locate, name, or any of those other things that URIs do). The most pressing need was to find a way to use URIs to reference information assets that had identifiers in public namespaces but no representation within the URI allocation, for example, LCCNs.
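
An info URI has the general shape `info:<namespace>/<identifier>`, so an LCCN can be expressed under the `lccn` namespace. The sketch below builds and splits such URIs; the helper names are hypothetical, the LCCN value is used purely as an illustration, and percent-encoding and namespace-specific normalization are deliberately left out.

```python
def make_info_uri(namespace: str, identifier: str) -> str:
    """Compose an info URI from a registered namespace and an identifier."""
    return f"info:{namespace}/{identifier}"


def parse_info_uri(uri: str) -> tuple[str, str]:
    """Split an info URI into (namespace, identifier)."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "info":
        raise ValueError(f"not an info URI: {uri!r}")
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        raise ValueError(f"malformed info URI: {uri!r}")
    return namespace, identifier


lccn_uri = make_info_uri("lccn", "2002022641")
parts = parse_info_uri(lccn_uri)
```

Nothing in these helpers fetches anything: the URI only names the asset, which matches the scheme's purpose as a pure identifier.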

OCLC Research Archive

OCLC Research continually evolves what we investigate, research, and report on as the field's needs change. For historical project information, explore the OCLC Research Archive, which holds a wealth of information about the work OCLC has produced over the decades.

Access the OCLC Research Archive >