Scorpion documentation
Data flow overview
The name of an input document is specified on the command line as a Java system property. The contents of this document are used as a query against a Pears database, and the Pears library code returns a list of the records that matched the query. The Gwen routines then rank these results according to a ranking scheme. Scorpion passes the ranked record set to a handler class named in its initialization file. The handler class returns a String, whose contents are written to an output file that is also specified on the command line.
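The flow above can be sketched in Java as a short pipeline: read the input document, query, rank, hand the ranked records to a handler chosen by class name, and write the handler's String to the output file. This is a minimal sketch, not Scorpion's code. ScoredRecord, RecordSource, Handler, IdListHandler, loadHandler and the system property names scorpion.input and scorpion.output are all hypothetical stand-ins; the real Pears, Gwen and RecordSetHandler APIs are not reproduced, and the sort by descending score only stands in for Gwen's ranking scheme.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of the data flow: input document -> query -> rank -> handler -> output file.
 * Every type, method and property name here is hypothetical.
 */
public class DataFlowSketch {

    /** Hypothetical matched record with a relevance score. */
    public record ScoredRecord(String id, double score) {}

    /** Hypothetical stand-in for the Pears query step. */
    public interface RecordSource {
        List<ScoredRecord> query(String documentText);
    }

    /** Hypothetical stand-in for a record-set handler: ranked records in, String out. */
    public interface Handler {
        String handle(List<ScoredRecord> ranked);
    }

    /** Example handler: one record id per line, in ranked order. */
    public static class IdListHandler implements Handler {
        public String handle(List<ScoredRecord> ranked) {
            return ranked.stream().map(ScoredRecord::id).collect(Collectors.joining("\n"));
        }
    }

    /** Stand-in for Gwen ranking: sorts by descending score only. */
    static List<ScoredRecord> rank(List<ScoredRecord> records) {
        List<ScoredRecord> ranked = new ArrayList<>(records);
        ranked.sort(Comparator.comparingDouble(ScoredRecord::score).reversed());
        return ranked;
    }

    /** Instantiates a handler from its class name (in Scorpion, the name comes from the initialization file). */
    static Handler loadHandler(String className) throws ReflectiveOperationException {
        return (Handler) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    /** Reads the input document, queries, ranks, and writes the handler's String to the output file. */
    static void run(Path input, Path output, RecordSource source, Handler handler) throws IOException {
        String document = Files.readString(input, StandardCharsets.UTF_8);
        List<ScoredRecord> matches = source.query(document);
        String result = handler.handle(rank(matches));
        Files.writeString(output, result, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical property names, e.g.:
        // java -Dscorpion.input=doc.txt -Dscorpion.output=out.txt DataFlowSketch
        Path input = Path.of(System.getProperty("scorpion.input", "scorpion.input"));
        Path output = Path.of(System.getProperty("scorpion.output", "scorpion.output"));
        // Toy query step that ignores the text and returns fixed matches.
        RecordSource toySource = text -> List.of(
                new ScoredRecord("rec-1", 0.4),
                new ScoredRecord("rec-2", 0.9));
        Handler handler = loadHandler(IdListHandler.class.getName());
        run(input, output, toySource, handler);
    }
}
```

Loading the handler by class name is what lets the same pipeline produce different output formats without recompiling; only the name in the configuration changes.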
Scorpion dependencies
- The Pears Open Source Package. This code builds and indexes a Pears database.
- The Gwen Search Engine, which retrieves and ranks records from a Pears database.
- The Dbutils package, which offers utilities to support database programming.
Getting the sample application working
- Make sure the following scripts' sh-bang lines point to bash on your system. Executing 'which bash' will tell you where it is. The defaults are shown in parentheses.
  scorpion/setup.sh (#!/bin/bash)
  scorpion/PDB/LCC/test.sh (#!/bin/bash)
  scorpion/PDB/LCC/buildPDB.sh (#!/bin/bash)
  scorpion/PDB/LCC/correlatePDB.sh (#!/bin/bash)
  scorpion/PDB/LCC/makeScorpionPDB.sh (#!/bin/bash)
- Run setup.sh. This script changes some pathnames in the configuration files to be correct for where you've installed Scorpion. It also creates a file with some common shell variables set. This file will be used by some of the other scripts.
- Either copy pears.jar, gwen.jar and Dbutils.jar to the scorpion/lib directory, or create links to those jars.
- cd into the scorpion/PDB/LCC directory and run './makeScorpionPDB.sh lccSample'. This is a fairly CPU-intensive program, so you may want to shut down other large applications first.
- Run test.sh to test the demo database. It will classify the file scorpion/demo/scorpion.input. An HTML fragment with the results of the classification will be placed in scorpion/demo/scorpion.output.html.
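The first and third setup steps can be checked before running anything. The sketch below is a hypothetical helper, not part of Scorpion: SetupCheck, interpreter and check are illustrative names, and it assumes it is run from the directory that contains scorpion/. It reads the first line of each script listed above, confirms that the sh-bang interpreter exists and is executable, and confirms that pears.jar, gwen.jar and Dbutils.jar are present in scorpion/lib.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the sample application setup. */
public class SetupCheck {

    static final List<String> SCRIPTS = List.of(
            "scorpion/setup.sh",
            "scorpion/PDB/LCC/test.sh",
            "scorpion/PDB/LCC/buildPDB.sh",
            "scorpion/PDB/LCC/correlatePDB.sh",
            "scorpion/PDB/LCC/makeScorpionPDB.sh");

    static final List<String> JARS = List.of("pears.jar", "gwen.jar", "Dbutils.jar");

    /** Returns the interpreter path from a sh-bang line, or null if the line is not one. */
    static String interpreter(String firstLine) {
        if (firstLine == null || !firstLine.startsWith("#!")) {
            return null;
        }
        String rest = firstLine.substring(2).trim();
        if (rest.isEmpty()) {
            return null;
        }
        // Drop any interpreter arguments, e.g. "#!/bin/bash -e" -> "/bin/bash".
        int space = rest.indexOf(' ');
        return space < 0 ? rest : rest.substring(0, space);
    }

    /** Collects human-readable problems found under the given root directory. */
    static List<String> check(Path root) throws IOException {
        List<String> problems = new ArrayList<>();
        for (String script : SCRIPTS) {
            Path path = root.resolve(script);
            if (!Files.isRegularFile(path)) {
                problems.add("missing script: " + script);
                continue;
            }
            String first;
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                first = reader.readLine();
            }
            String interp = interpreter(first);
            if (interp == null) {
                problems.add("no sh-bang line: " + script);
            } else if (!Files.isExecutable(Path.of(interp))) {
                problems.add(script + ": interpreter not found or not executable: " + interp);
            }
        }
        Path lib = root.resolve("scorpion/lib");
        for (String jar : JARS) {
            // Files.exists follows symbolic links, so linked jars also pass.
            if (!Files.exists(lib.resolve(jar))) {
                problems.add("missing jar: scorpion/lib/" + jar);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        List<String> problems = check(root);
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "setup looks OK" : problems.size() + " problem(s) found");
    }
}
```

Because Files.exists follows symbolic links, either option in the jar step (copying or linking) passes the check, but a dangling link is reported as missing.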
Making Scorpion work with your database
- Design and create your database as an SGML file
- Transform the SGML file into a Pears database
- Create a Pears initialization file and a Gwen properties file.
- Write a Java class implementing the ORG.oclc.scorpion.RecordSetHandler interface
- Create a Scorpion initialization file
- If you want a web interface, modify the Perl script to meet your needs
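For the handler step, a class that turns ranked records into an HTML fragment (like the demo's scorpion.output.html) might look like the sketch below. The actual method signature of ORG.oclc.scorpion.RecordSetHandler is not given here, so RecordSetHandlerSketch, Result (with its classification-number and caption fields) and HtmlListHandler are hypothetical; the sketch only follows the contract described in the data flow overview, where a ranked record set goes in and a String comes out. Check the real interface before adapting it.

```java
import java.util.List;

/** Hypothetical stand-in for one ranked record (e.g. a class number and its caption). */
record Result(String classNumber, String caption, double score) {}

/** Hypothetical handler shape: ranked records in, String out. */
interface RecordSetHandlerSketch {
    String handle(List<Result> rankedRecords);
}

/** Renders the top-ranked records as an HTML ordered-list fragment. */
public class HtmlListHandler implements RecordSetHandlerSketch {

    private final int maxResults;

    public HtmlListHandler(int maxResults) {
        this.maxResults = maxResults;
    }

    @Override
    public String handle(List<Result> rankedRecords) {
        StringBuilder html = new StringBuilder("<ol>\n");
        // Records are assumed to arrive already ranked, best first.
        int n = Math.min(maxResults, rankedRecords.size());
        for (Result r : rankedRecords.subList(0, n)) {
            html.append("  <li>")
                .append(escape(r.classNumber()))
                .append(": ")
                .append(escape(r.caption()))
                .append("</li>\n");
        }
        return html.append("</ol>\n").toString();
    }

    /** Minimal HTML escaping so record text cannot break the fragment. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        List<Result> ranked = List.of(
                new Result("QA76", "Computer software", 0.9),
                new Result("Z699", "Information retrieval", 0.7));
        System.out.print(new HtmlListHandler(10).handle(ranked));
    }
}
```

Emitting a fragment rather than a full page keeps the handler's output easy to embed, for example by the web interface script.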