Please note: This experimental research project has concluded.
The research prototype application is no longer supported or maintained by OCLC services, and information on this page is provided for historical purposes only. Some portion of this content may be out-of-date and include broken links. Please visit the OCLC Research website to learn more about our current research.

Scorpion documentation

Data flow overview

The name of an input document is specified on the command line as a Java system property. The contents of this document are used as a query against a Pears database. The Pears library code returns the list of records that match the query, and the Gwen routines rank those results according to a particular ranking scheme. Scorpion then passes the ranked record set to the handler class named in its initialization file. The handler class returns a String object, whose contents are written to an output file that is also specified on the command line.
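As an illustration of passing file names through Java system properties, the sketch below only prints a command line; it does not run Scorpion. The property keys (example.input, example.output), the main class (ExampleMain), and the classpath are hypothetical placeholders, since this page does not give the real names. The only part that is standard Java is the -Dname=value form itself.

```shell
#!/bin/bash
# Sketch only. The property keys (example.input, example.output), the main
# class (ExampleMain), and the classpath are HYPOTHETICAL placeholders, not
# Scorpion's real names; check the distribution's scripts for the actual ones.

# Print (without running) a java command that passes file names as -D properties.
build_java_cmd() {
    local input=$1 output=$2
    local classpath="scorpion/lib/pears.jar:scorpion/lib/gwen.jar:scorpion/lib/Dbutils.jar"
    printf 'java -Dexample.input=%s -Dexample.output=%s -cp %s ExampleMain\n' \
        "$input" "$output" "$classpath"
}

# Show what such a command might look like for the demo files.
build_java_cmd scorpion/demo/scorpion.input scorpion/demo/scorpion.output.html
```

A Java program started this way can read each value with System.getProperty, for example System.getProperty("example.input") under the placeholder key used here.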

Scorpion dependencies

  • The Pears Open Source Package. This code builds and indexes a Pears database.
  • The Gwen Search Engine, which retrieves and ranks records from a Pears database.
  • The Dbutils package, which offers utilities to support database programming.

Getting the sample application working

  1. Make sure the shebang lines of the following scripts point to bash on your system. Running 'which bash' will tell you where it is. The defaults are shown in parentheses.
    scorpion/setup.sh (#!/bin/bash)
    scorpion/PDB/LCC/test.sh (#!/bin/bash)
    scorpion/PDB/LCC/buildPDB.sh (#!/bin/bash)
    scorpion/PDB/LCC/correlatePDB.sh (#!/bin/bash)
    scorpion/PDB/LCC/makeScorpionPDB.sh (#!/bin/bash)
  2. Run setup.sh. This script changes some pathnames in the configuration files to be correct for where you've installed Scorpion. It also creates a file with some common shell variables set. This file will be used by some of the other scripts.
  3. Either copy pears.jar, gwen.jar and Dbutils.jar to the scorpion/lib directory, or create links to those jars.
  4. Change into the scorpion/PDB/LCC directory and run './makeScorpionPDB.sh lccSample'. This is a fairly CPU-intensive program, so you may want to shut down other large applications first.
  5. Run test.sh to test the demo database. It classifies the file scorpion/demo/scorpion.input and places an HTML fragment with the classification results in scorpion/demo/scorpion.output.html.
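The check in step 1 can be scripted. The following is a minimal sketch, assuming it is run from the directory that contains scorpion/; the helper names shebang_of and check_shebang are made up for this example and are not part of the Scorpion distribution. It compares each script's first line with the bash found on your PATH and reports the result without changing any file.

```shell
#!/bin/bash
# Sketch: report whether each Scorpion script's shebang matches the local bash.
# shebang_of and check_shebang are illustrative helpers, not part of Scorpion.

# Print the interpreter named on a script's first line (empty if there is none).
shebang_of() {
    local first=""
    IFS= read -r first < "$1" || true
    case "$first" in
        '#!'*) first=${first#'#!'} ;;
        *)     first="" ;;
    esac
    # Trim leading spaces so "#! /bin/bash" and "#!/bin/bash" compare equal.
    first=${first#"${first%%[! ]*}"}
    printf '%s\n' "$first"
}

# Compare one script's shebang with the expected interpreter path.
check_shebang() {
    local script=$1 expected=$2 actual
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 0
    fi
    actual=$(shebang_of "$script")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $script"
    else
        echo "mismatch: $script (has '$actual', expected '$expected')"
    fi
}

bash_path=$(command -v bash)
for script in scorpion/setup.sh \
              scorpion/PDB/LCC/test.sh \
              scorpion/PDB/LCC/buildPDB.sh \
              scorpion/PDB/LCC/correlatePDB.sh \
              scorpion/PDB/LCC/makeScorpionPDB.sh; do
    check_shebang "$script" "$bash_path"
done
```

A "mismatch" line is advisory only: if bash is installed in more than one location, the path in the script may still work even though it differs from the one found on your PATH.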

Making Scorpion work with your database

  1. Design and create your database as an SGML file.
  2. Transform the SGML file into a Pears database.
  3. Create a Pears initialization file and a Gwen properties file.
  4. Write a Java class implementing the ORG.oclc.scorpion.RecordSetHandler interface.
  5. Create a Scorpion initialization file.
  6. If you want a web interface, modify the Perl script to meet your needs.

Scorpion

Database

Record handler