Getting Found: SEO for Digital Repositories
This activity was part of the IMLS-funded project "Getting Found: Search Engine Optimization for Digital Repositories," which sought to develop strategies for improving the visibility of library digital repositories in Internet search engines.
The goals of this activity are to:
- Develop an RDF model based on Schema.org for the people, places, organizations, and objects associated with an institutional repository and its contents. Schema.org is a shared vocabulary, sponsored by major search engines including Google, Bing, Yahoo, and Yandex, for marking up web documents with structured data that the search engines can index.
- Apply the model to the contents of two or three existing institutional repositories.
- Develop a prototype that demonstrates how data conforming to the new model can be browsed and searched.
- Collaborate with Semantic Web experts at Wright State University to promote this work to a formal logical model, which would enable automated reasoners to make semantic inferences about the data.
- Explore methods derived from text and data mining algorithms for increasing the population of objects in the repository. Evaluate the results.
- Work with the rest of the Getting Found team to develop best-practices recommendations for institutional repository managers.
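As a rough illustration of the first goal, the sketch below describes a single thesis with Schema.org terms and serializes it as JSON-LD, one of the RDF syntaxes that Schema.org supports. It is not the project's model. The function name thesis_to_jsonld, the sample values, and the repository URL are hypothetical, and the choice of properties (name, author, inSupportOf, publisher, datePublished) is an assumption about what a minimal description might contain.

```python
import json


def thesis_to_jsonld(title, author_name, degree, university, year, url):
    """Return a Schema.org description of a thesis as a JSON-LD dictionary.

    Hypothetical helper for illustration; the property selection is an
    assumption, not the model developed by the project.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Thesis",                # Schema.org type for theses
        "@id": url,                       # identifier of the repository item
        "name": title,
        "datePublished": str(year),
        "inSupportOf": degree,            # degree the thesis was written for
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "CollegeOrUniversity", "name": university},
    }


# Made-up sample record; none of these values come from a real repository.
record = thesis_to_jsonld(
    title="An Example Thesis Title",
    author_name="Jane Doe",
    degree="Master of Science",
    university="Example State University",
    year=2014,
    url="https://repository.example.edu/handle/1/1234",
)

print(json.dumps(record, indent=2))
```

Describing the author and the university as separate typed entities, rather than as plain strings, is what would let records from different repositories link to the same people and organizations.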
Outputs
A prototype and several presentations are planned as outputs.
Presentations
Jeff Mixter, Patrick OBrien (Montana State University), Kenning Arlitsch (Montana State University)
Describing Theses and Dissertations Using Schema.org
DCMI International Conference on Dublin Core and Metadata Applications, 8-11 October 2014, Austin, Texas (USA)
Download the presentation (.pptx: 2.3MB/15 slides)
Background
Libraries are making heavy investments in digital repositories to preserve the scholarly record of their host institutions and demonstrate their relevance in the age of electronic publishing. Unfortunately, institutional repositories are not easily discoverable on the Internet and cannot be found unless the user conducts a known-item search using the name of the repository. Once a repository has been found, its contents are difficult to browse.
Impact
This work will:
- improve institutional repositories' Web presence,
- allow institutional repositories maintained at different universities to more easily link with one another, and
- provide universities with a powerful new tool for showcasing the intellectual capital of their institutions.
Details
- Two sets of best-practice guidelines are being developed by the Getting Found project team: one that offers advice about metadata schemas and vocabularies for data published online, and one about page design, i.e., where the metadata needs to be located within the HTML so that Google can see and index the page.
- To better understand who is accessing and using the materials in the institutional repository, a custom Google Analytics dashboard was developed by the project principals. It can be deployed by institutional repository managers and used to gather valuable usage data.
- To project the institutional data into the Semantic Web, the OCLC team is leading the development of an ontology that can be used to describe materials typically found in an institutional repository. The ontology has been successfully applied to Montana State University’s institutional repository and a second data sample is currently being evaluated and converted.
- The OCLC team is leading the development of a method for mining faculty CVs for citations that can be included in the repository. This information can help improve the coverage and scope of the institutional repository by tracking the areas of greatest interest. More importantly, it can be used to collect and record data that help demonstrate the added value of the institutional repository. Sample data from both the University of Utah and Montana State University were tested in this initial phase of the project.
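To give a sense of what mining a CV for citations can involve, here is a deliberately naive sketch. It is not the method under development by the OCLC team. It assumes that each citation sits on one line and follows an "Authors (Year). Title." pattern, which real CVs often do not. The pattern, the function name extract_citations, and the sample CV text are all hypothetical.

```python
import re

# Assumed citation shape: "Authors (YYYY). Title. Rest of the citation"
# This is a toy heuristic for illustration only.
CITATION = re.compile(
    r"^(?P<authors>[^()]+?)\s*"          # everything before the year
    r"\((?P<year>(?:19|20)\d{2})\)\.\s*"  # four-digit year in parentheses
    r"(?P<title>[^.]+)\."                # title up to the next period
)


def extract_citations(cv_text):
    """Return a list of dicts for lines that look like citations."""
    found = []
    for line in cv_text.splitlines():
        match = CITATION.match(line.strip())
        if match:
            found.append({
                "authors": match.group("authors").strip(),
                "year": int(match.group("year")),
                "title": match.group("title").strip(),
            })
    return found


# Made-up CV text; the people and publications do not exist.
sample_cv = """\
Jane Doe
Department of Example Studies
Publications
Doe, J., & Roe, R. (2012). A hypothetical study of repositories. Journal of Examples, 3(1), 1-10.
Doe, J. (2010). Another made-up article. Example Quarterly, 7(2), 20-31.
Teaching: Introduction to Examples, 2011 to present
"""

citations = extract_citations(sample_cv)
for citation in citations:
    print(citation["year"], citation["title"])
```

Candidate citations extracted this way would still need to be checked against the repository's holdings to see which works are missing.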
Lead
Jean Godby
Jeff Mixter