Using OpenRefine and the WorldCat Metadata API to get current OCLC numbers
One issue that a lot of catalogers and metadata librarians struggle with is getting the current OCLC number for a given record into their local system. This is because records and OCLC numbers are merged over time. When a library pulls a record, they get the record and OCLC number at that moment in time. However, most libraries want to keep their OCLC numbers current. So, what’s the best way to do this?
One could write a script in Python, Ruby, Perl, or another programming language to look for merged OCLC numbers, but that requires coding skills. Non-coders can use OpenRefine in combination with the WorldCat Metadata API to accomplish this task.
Using OCLC APIs that are protected with Access Tokens
Most OCLC APIs are protected with a stronger form of authentication and authorization called Access Tokens. You might think that this means you need to know how to write Jython in order to interact with them via OpenRefine. However, that isn’t true. A valid Access Token can be inserted into a “Fetch by URL” operation.
How do you obtain a valid Access Token if you don’t know how to write code? The easiest way is by using OCLC’s API Explorer.
- Go to the DevNet page for OAuth, ClientCredentialGrant GetToken.
- Choose “Use my credentials.”
- Enter your WSKey, secret, principal ID, and principal ID namespace (IDNS).
(You can find out more about how to get a principal ID and principal IDNS by reading the User Level Authentication and Authorization page.)
- For the prompted URL, fill in the following:
- authenticatingInstitutionID: your library’s registry ID
- contextInstitutionID: your library’s registry ID
- scopes: WorldCatMetadataAPI (the ID(s) for the service you want to access)
- Click the “Send the Request” button.
- Look at the JSON response, and find the value labeled “access_token.” This will look something like: tk_ruQqgWU2PZsBdkMTq2Po73PCvYxWybJgzlIR.
Now you’ve got a valid Access Token that is good for 20 minutes. You can use this token in OpenRefine by putting it in the “Authorization” field of your “Fetch by URL” operation.
Note the field labeled “Accept” with the value */* in it. If the API you are using supports JSON, change this to “application/json” to get JSON back.
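For readers who want to see the same idea outside OpenRefine, here is a minimal Python sketch of turning the token response into request headers. It assumes the token is sent in the form "Bearer <token>", which this post does not state, and the function name build_headers and the trimmed-down sample response are hypothetical.

```python
import json


def build_headers(token_response_text, accept="application/json"):
    """Build request headers from the JSON text returned by the token request."""
    # The token request returns a JSON object; the token is in "access_token".
    token = json.loads(token_response_text)["access_token"]
    # Assumption: the Authorization value takes the form "Bearer <token>".
    return {"Authorization": "Bearer " + token, "Accept": accept}


# Hypothetical response, trimmed to the one field used here.
sample_response = '{"access_token": "tk_ruQqgWU2PZsBdkMTq2Po73PCvYxWybJgzlIR"}'
headers = build_headers(sample_response)
```

The two keys in the resulting dictionary correspond to the “Authorization” and “Accept” fields described above.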
Making the request to the WorldCat Metadata API
Now that you have an Access Token, you need to make a request to the WorldCat Metadata API in order to get the current OCLC numbers. This is done by choosing “Add a column by fetching URLs” on the column that holds your OCLC numbers and using the following expression to build each request:
"https://worldcat.org/bib/checkcontrolnumbers?oclcNumbers=" + value
This will return a response that looks like the following:
{
  "entry": [
    {
      "requestedOclcNumber": "2416076",
      "currentOclcNumber": "24991049",
      "httpStatusCode": "HTTP 200 OK",
      "id": "http://worldcat.org/oclc/24991049",
      "institution": "OCPSB"
    }
  ],
  "startIndex": 1,
  "totalResults": 1,
  "itemsPerPage": 1
}
Now we want to add a new column named "Current OCLC Number". The contents of this column will be based on parsing the data in the fetched response. The following snippet parses the data.
value.parseJson().entry[0].currentOclcNumber
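For comparison, the same request-building and parsing logic might look like this in Python. This is only a sketch: the function names build_url and current_oclc_number are hypothetical, the sample is the response shown above, and no network request is made.

```python
import json
from urllib.parse import quote

BASE_URL = "https://worldcat.org/bib/checkcontrolnumbers?oclcNumbers="


def build_url(oclc_number):
    """Mirror the expression used in OpenRefine: base URL plus the cell value."""
    return BASE_URL + quote(str(oclc_number))


def current_oclc_number(response_text):
    """Return currentOclcNumber from the first entry, or None if there is none."""
    entries = json.loads(response_text).get("entry", [])
    return entries[0]["currentOclcNumber"] if entries else None


sample_response = """
{ "entry": [ { "requestedOclcNumber": "2416076",
               "currentOclcNumber": "24991049",
               "httpStatusCode": "HTTP 200 OK",
               "id": "http://worldcat.org/oclc/24991049",
               "institution": "OCPSB" } ],
  "startIndex": 1, "totalResults": 1, "itemsPerPage": 1 }
"""

url = build_url("2416076")
current = current_oclc_number(sample_response)
```

Returning None for a response with no entries is a design choice made for this sketch; in OpenRefine, the equivalent case would need to be handled separately, for example by faceting on cells where the parse fails.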
Checking if holdings are set
This same technique can be used to make an API call to see if the library has holdings set on a particular record. To do this, “Add a column by fetching URLs” and use the following snippet to create the requests:
"https://worldcat.org/ih/checkholdings?oclcNumber=" + value
This will return a response that looks like the following:
{
  "requestedOclcNumber": "2416076",
  "currentOclcNumber": "24991049",
  "isHoldingSet": false,
  "id": "http://worldcat.org/oclc/24991049",
  "institution": "OCPSB"
}
Now add a new column, named "Holding set?", based on parsing the data in the fetched response. The response shown here is a single object with no "entry" list, so the following snippet reads the field directly.
value.parseJson().isHoldingSet
This single API request actually returns both
- the current OCLC number in the field currentOclcNumber, and
- whether or not the library has holdings set in the field isHoldingSet.
Based on this data, we can create a spreadsheet of current OCLC numbers and whether or not holdings are set.
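As a rough illustration of that final spreadsheet, the Python sketch below turns a list of check-holdings responses into CSV text. It assumes each response has the flat shape of the sample shown earlier in this section; the function names and column headings are hypothetical.

```python
import csv
import io
import json


def holdings_rows(response_texts):
    """Turn each check-holdings response into one spreadsheet row."""
    rows = []
    for text in response_texts:
        data = json.loads(text)
        rows.append([
            data["requestedOclcNumber"],
            data["currentOclcNumber"],
            data["isHoldingSet"],
        ])
    return rows


def to_csv(rows):
    """Write the rows, with a header line, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Requested OCLC Number", "Current OCLC Number", "Holding set?"])
    writer.writerows(rows)
    return buffer.getvalue()


sample_response = (
    '{ "requestedOclcNumber": "2416076", "currentOclcNumber": "24991049", '
    '"isHoldingSet": false, "id": "http://worldcat.org/oclc/24991049", '
    '"institution": "OCPSB" }'
)
csv_text = to_csv(holdings_rows([sample_response]))
```

In OpenRefine itself, no code is needed for this step: exporting the project with the two new columns produces the same kind of spreadsheet.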
Next steps
These examples demonstrate how to make authenticated API calls using an Access Token. However, they don’t help with API calls that require a different HTTP method, such as POST, PUT, or DELETE. We’ll cover how to use custom Jython code to make those types of requests within OpenRefine in the next post.
Karen Coombs
Senior Product Analyst