Saxon HE Extensions for use with SNAC
Saxon HE extensions for use with the SNAC project, including Java parsers for human-written dates and geographical names. The project also serves as a sample for creating other Saxon extensions without the built-in Saxon libraries.
JavaDoc documentation is available here.
The following JAR files are required to build and run the date and GeoNames parser extensions. Using the links below, download and unzip (as necessary) the JAR files into a common place:
To install the Saxon extensions, you only need to download and compile our source code. We provide an Apache Ant script to aid compilation. Follow these steps:
Install a Java JDK and Apache Ant (on Debian or Ubuntu):
> sudo apt-get install default-jdk ant
Compile the extensions into an executable JAR file. The easiest way is to use our Apache Ant script: copy the Saxon and Commons JAR files to the java/lib folder in SNAC-Saxon-Extensions, then run ant as follows:
> cd SNAC-Saxon-Extensions/java
> cp /path/to/saxon9he.jar lib/saxon9he.jar
> cp /path/to/commons-lang3-3.x.jar lib/commons-lang3.jar
> ant
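Before running ant, it can help to confirm that lib/ really holds the two JARs copied above. The following is a minimal Java sketch, not part of this repository: LibCheck is a hypothetical name, and the class entries it probes for (net/sf/saxon/Transform.class in the Saxon HE 9 JAR, org/apache/commons/lang3/StringUtils.class in Commons Lang 3) are assumptions about what those JARs contain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical pre-build check (not part of SNAC-Saxon-Extensions).
// File names follow the cp commands above; the probed class entries are assumptions.
class LibCheck {
    /** Returns true if the JAR exists and contains the given entry, e.g. "a/b/C.class". */
    static boolean hasEntry(Path jar, String entryName) {
        if (!Files.isRegularFile(jar)) {
            return false; // file missing or not a regular file
        }
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.getEntry(entryName) != null;
        } catch (IOException e) {
            return false; // not a readable JAR/ZIP archive
        }
    }

    public static void main(String[] args) {
        // Defaults to ./lib, i.e. run from SNAC-Saxon-Extensions/java.
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        boolean ok = true;
        if (!hasEntry(lib.resolve("saxon9he.jar"), "net/sf/saxon/Transform.class")) {
            System.err.println("Missing or invalid " + lib.resolve("saxon9he.jar"));
            ok = false;
        }
        if (!hasEntry(lib.resolve("commons-lang3.jar"), "org/apache/commons/lang3/StringUtils.class")) {
            System.err.println("Missing or invalid " + lib.resolve("commons-lang3.jar"));
            ok = false;
        }
        System.out.println(ok ? "lib/ looks ready for ant" : "fix lib/ before running ant");
    }
}
```

With Java 11 or later it can be run without a separate compile step, e.g. java LibCheck.java lib from the java directory.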
Once built, run a transformation with the extensions using snacTransform.jar:
> java -jar snacTransform.jar /path/to/XMLfile /path/to/XSLTfile
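For readers who want to start a transformation from their own Java code, here is a sketch using the standard JAXP API (javax.xml.transform). It is an illustration under assumptions, not how snacTransform.jar is implemented, and TransformSketch is a hypothetical name. TransformerFactory.newInstance() uses whichever XSLT engine JAXP discovers: the JDK's built-in XSLT 1.0 processor unless Saxon is found on the classpath or requested explicitly (for Saxon 9, for example, TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl", null)). The SNAC extension functions are only callable when Saxon runs the stylesheet with those functions registered, presumably what snacTransform.jar takes care of.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT file to an XML file through JAXP.
// The engine actually used depends on the classpath/JAXP configuration.
class TransformSketch {
    static String transform(File xml, File xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(xslt));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(xml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 2) {
            System.err.println("usage: TransformSketch <XMLfile> <XSLTfile>");
            return;
        }
        System.out.println(transform(new File(args[0]), new File(args[1])));
    }
}
```

The argument order mirrors the snacTransform.jar command above: the XML input first, then the stylesheet.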
For the date parser, see xslt/date.xsl for a sample. The XSLT wrappers for the Java methods are available in xslt/lib/.
For geographical names, see xslt/place.xsl for a sample of how to use the parsing library. The XSLT wrappers for the Java methods are also available in xslt/lib/.
Cheshire is more complicated to install and set up as a service on your machine. The instructions below are provided for Linux users.
GeoNames/data.
<DBENV> /full/root/path/to/GeoNames/dbenv </DBENV>
defines the location of the database environment. This directory, dbenv, must be world readable AND writable.
<defaultpath> /full/root/path/to/GeoNames </defaultpath>
defines the full path to the base directory of Cheshire. This should include the dbenv directory, config, data, etc.
<FILENAME> data </FILENAME>
defines the directory (or file) where all the GeoNames data lives. In this example, the data would be stored in /full/root/path/to/GeoNames/data.
<ASSOCFIL> assoc/geonames_all </ASSOCFIL>
defines the associator file. This file is built early on, before indexing. Note that here it is inside the assoc directory.
Build the associator for the data directory, using the GeoNames entry divider <row>:
> buildassoc -r data assoc/geonames_all row
Then build the indexes with the following command, which will take a long time:
> index_cheshire -b -L logfile.txt geonames_config.txt
If you use the sample file provided, it will produce all the indexes needed for the Saxon extension.
jserver.
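The directory layout above can be checked with a small Java sketch (Linux/POSIX file systems only). It resolves dbenv and data against the base directory the same way the <defaultpath> and <FILENAME> examples do, and tests whether dbenv grants read and write access to others. CheshireLayoutCheck and its methods are hypothetical, not part of the project; makeWorldReadWrite applies rwxrwxrwx (like chmod 777) because a directory also needs the execute bit before its contents can be accessed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical layout check for the Cheshire settings (POSIX only).
// base plays the role of <defaultpath>; "dbenv" and "data" mirror the examples above.
class CheshireLayoutCheck {
    /** True if the directory grants read AND write permission to "others". */
    static boolean isWorldReadWrite(Path dir) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
        return perms.contains(PosixFilePermission.OTHERS_READ)
                && perms.contains(PosixFilePermission.OTHERS_WRITE);
    }

    /** Sets rwx for owner, group and others (equivalent to chmod 777). */
    static void makeWorldReadWrite(Path dir) throws IOException {
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxrwxrwx"));
    }

    public static void main(String[] args) throws IOException {
        Path base = Paths.get(args.length > 0 ? args[0] : "/full/root/path/to/GeoNames");
        Path dbenv = base.resolve("dbenv");
        Path data = base.resolve("data"); // <FILENAME> data, relative to <defaultpath>
        System.out.println("data location: " + data);
        if (!Files.isDirectory(dbenv)) {
            System.err.println("dbenv directory not found: " + dbenv);
        } else if (!isWorldReadWrite(dbenv)) {
            System.err.println("dbenv is not world readable and writable: " + dbenv);
        } else {
            System.out.println("dbenv permissions look OK");
        }
    }
}
```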
DATABASE_NAMES "data"
the location of the data files.
DATABASE_DIRECTORIES "/full/path/to/basedir/GeoNames"
the full path to where the data files (and other config information) are stored. This should be the same as above.
DATABASE_CONFIGFILES "/full/path/to/config/geonames_config.txt"
the config file from above.
Install xinetd:
> sudo apt-get install xinetd
Copy the xinetd config file into the /etc/xinetd.d/ directory, removing the .txt extension. Change the following lines in the config file to match your configuration:
server = /usr/bin/jserver
full path of the jserver binary.
server_args = -c /full/path/to/jserver/config/server_config.txt
full path of the jserver config file (note the -c command option).
Restart xinetd to apply the changes:
> service xinetd restart
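After restarting xinetd, a quick way to see whether the service is accepting connections is to open a TCP socket to it. The sketch below is a hypothetical helper (JserverPortCheck is not part of the project); pass the host and the port from your xinetd service file, since neither is specified here. A successful connection only shows that xinetd is listening on that port, not that jserver can answer queries.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical post-restart check: can we open a TCP connection to host:port?
// Host and port are user-supplied; use the values from your xinetd config.
class JserverPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, unknown host, or timed out
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: JserverPortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(canConnect(host, port, 2000)
                ? "something is listening on " + host + ":" + port
                : "no listener on " + host + ":" + port + "; check the xinetd config and logs");
    }
}
```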