SNAC Saxon Extensions

Saxon HE Extensions for use with SNAC

Saxon HE extensions for use with the SNAC project. The package includes Java parsers for human-written dates and geographical names. It also serves as a sample for creating other Saxon extensions without the built-in Saxon libraries.

Documentation

JavaDoc documentation is available here.

Installation

Java Dependencies

The following JAR files are required to build and run the date and GeoNames parser extensions. Using the links below, download and unzip (as necessary) the JAR files into a common place:

Installing

To install the Saxon extensions, you need only download and compile our source code. We provide an Apache Ant script to aid in compilation. The steps are as follows:

  1. Download the SNAC Saxon Extensions (link at the top of the page) and unzip
  2. Download and unzip the Saxon 9.5+ HE JAR files
  3. Download and unzip the Apache Commons JAR files
  4. Download and install the latest JDK and Apache Ant. On an Ubuntu Linux system, both can be installed with the following command: sudo apt-get install default-jdk ant.
  5. Compile the extensions into an executable JAR file. The easiest way to do this is with our Apache Ant script. Copy the Saxon and Commons JAR files to the java/lib folder in SNAC-Saxon-Extensions, then run ant as follows:

    > cd SNAC-Saxon-Extensions/java
    > cp /path/to/saxon9he.jar lib/saxon9he.jar
    > cp /path/to/commons-lang3-3.x.jar lib/commons-lang3.jar
    > ant
    
  6. Once the JAR file is built, run your transformations with the new JAR file instead of the original saxon9he.jar file. For example: java -jar snacTransform.jar /path/to/XMLfile /path/to/XSLTfile
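
Before running ant in step 5, it can help to confirm that both JAR files are where the build expects them. The following is a minimal sketch, not part of the project: the check_libs function name is hypothetical, and the file names saxon9he.jar and commons-lang3.jar are taken from the cp commands above.

```shell
#!/bin/sh

# Hypothetical helper: returns 0 only if both JAR files named in the
# cp commands above are present in the given lib directory.
check_libs() {
    lib_dir="$1"
    missing=0
    for jar in saxon9he.jar commons-lang3.jar; do
        if [ ! -f "$lib_dir/$jar" ]; then
            echo "missing: $lib_dir/$jar" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended to be run from the SNAC-Saxon-Extensions/java directory.
if check_libs lib; then
    ant
else
    echo "copy the Saxon and Commons JAR files into lib/ before running ant" >&2
fi
```

If either file is missing, the script reports which one and skips the build instead of letting ant fail partway through.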

Using the new Saxon extensions

Date Parser

See the xslt/date.xsl file for a sample. The XSLT wrappers for the Java methods are available in xslt/lib/.

GeoNames Parser

See xslt/place.xsl for a sample of how to use the parsing library. The XSLT wrappers for the Java methods are available in xslt/lib/.

Installing Cheshire

Cheshire, which serves the GeoNames indexes used by the Saxon extension, takes more work to install and set up as a service on your machine. The instructions below are provided for Linux users.

  1. Download and compile Cheshire. The source can be found at the Cheshire homepage. You will need a C compiler.
  2. Download the GeoNames data from the GeoNames website. Access to the Premium Data requires a paid subscription. Unzip the XML files into a directory, such as GeoNames/data.
  3. Build indexes for the GeoNames data.
    1. First, set up a config file. A sample can be found here. In particular, note the first few settings:
      • <DBENV> /full/root/path/to/GeoNames/dbenv </DBENV> defines the location of the database environment. This directory, dbenv, must be world-readable AND world-writable.
      • <defaultpath> /full/root/path/to/GeoNames </defaultpath> defines the full path to the Cheshire base directory. This directory should contain the dbenv directory, config, data, etc.
      • <FILENAME> data </FILENAME> defines the directory (or file) that holds all the GeoNames data. In this example, the data would be stored in /full/root/path/to/GeoNames/data.
      • <ASSOCFIL> assoc/geonames_all </ASSOCFIL> defines the associator file, which must be built before indexing. Note that here it is inside the assoc directory.
    2. Build the associator file.
      • > buildassoc -r data assoc/geonames_all row builds the associator for the data directory, using the GeoNames entry divider, <row>.
    3. Build the indexes for Cheshire.
      • > index_cheshire -b -L logfile.txt geonames_config.txt builds the indexes. This step takes a long time. If you use the sample config file provided, it will produce all the indexes needed for the Saxon extension.
  4. Set up the Cheshire server, jserver.
    1. Set up the jserver config file, which specifies the port number, the Cheshire index config file, the location of the data, etc. A sample is provided here. The only options that need to be changed from their defaults are:
      • DATABASE_NAMES "data": the location of the data files.
      • DATABASE_DIRECTORIES "/full/path/to/basedir/GeoNames": the full path to the directory where the data files (and other config information) are stored. This should be the same base directory as in the Cheshire config file above.
      • DATABASE_CONFIGFILES "/full/path/to/config/geonames_config.txt": the Cheshire config file from above.
    2. Install xinetd from your repository. For Ubuntu, you can use the command sudo apt-get install xinetd.
    3. Add Jserver to the xinetd config by saving the xinetd config file here to the /etc/xinetd.d/ directory and removing the .txt extension. Change the following lines in the config file to match your configuration:
      • server = /usr/bin/jserver: the full path of the jserver binary.
      • server_args = -c /full/path/to/jserver/config/server_config.txt: the full path of the jserver config file (note the -c command option).
    4. Restart xinetd. On Ubuntu, the command is service xinetd restart.
    5. Test your system by telnetting to port 7010. You should be prompted by jserver.
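
If telnet is not installed, the port test in the last step can also be scripted. This is a minimal sketch, assuming bash and the coreutils timeout command are available; check_port is a hypothetical helper name, and 7010 is the port used in the step above. It only confirms that something accepts TCP connections on the port and does not verify the jserver prompt itself.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp redirection; timeout guards against a hanging connect.
check_port() {
    host="$1"
    port="$2"
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

if check_port localhost 7010; then
    echo "port 7010 is accepting connections"
else
    echo "nothing is listening on port 7010; check the xinetd and jserver configuration"
fi
```

A successful check means xinetd is listening on the port. Confirming that jserver itself responds still requires an interactive session as described in the step above.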