Harvester

ANDS-Harvester

The Harvester Service provides a service-oriented framework to support the processing and routing of content and metadata from a source Data Provider to a target application. It potentially makes management of distributed harvests simpler by providing a single harvest application which can service many clients wishing to perform harvesting without the overhead of writing their own embedded harvester.

Harvester Service Installation Guide

Download the latest ANDS-Harvester and extract it on your server into a directory, referred to below as $HARVESTER_SOURCE:

wget https://github.com/au-research/ANDS-Harvester/archive/master.zip

Pre-requisites
The following software must be available in order to install and run the Harvester Service:

  • Sun’s Java 1.5 SDK
  • MySQL
  • Apache Ant
  • Apache Tomcat 5.x or later

Harvester Service Install process

Having installed a MySQL server, set up a new database and initialise the tables:

Create a new database called ‘dbs_harvester’:

mysql> CREATE DATABASE `dbs_harvester`;

Create a database user (else use an existing user and ignore this step):

mysql> CREATE USER '<db_username>' IDENTIFIED BY '<db_password>';

Create the tables and indices. Run this from the operating system shell, not from the mysql prompt:

mysql -u root -p dbs_harvester < $HARVESTER_SOURCE/etc/db/mysql/database.sql

Create a directory $HARVESTER_DIR where the Harvester log files will be written. This location should not be web accessible and must be readable and writable by the Tomcat user.

Ensure servlet-api.jar is in the CLASSPATH. It should be found in one of the common library directories of the Tomcat install.

Build the distribution:

cd $HARVESTER_SOURCE
ant -Dinstall_dir=$HARVESTER_DIR install

Configure the default Tomcat datasource (connection pooling) by adding the following entry to the server.xml file’s Host element:

<Context path="/harvester" docBase="harvester" crossContext="true" reloadable="true" debug="1">
<Resource name="jdbc/mysql" validationQuery="SELECT 1" testOnBorrow="true" auth="Container" type="javax.sql.DataSource" driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://db_host:3306/db_name" username="db_username" password="db_password"/>
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="comma separated IP list" />
</Context>

Ensure the IP and port number in the url attribute reflect the server setup and ensure the harvester database user is correctly configured.
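One way to check the url attribute before restarting Tomcat is to split it into its parts and try a plain TCP connection to the database server. The following is a minimal Python sketch, not part of the Harvester distribution; the helper names parse_jdbc_mysql_url and can_connect are hypothetical, and the fallback to port 3306 assumes the MySQL default.

```python
import socket
from urllib.parse import urlsplit


def parse_jdbc_mysql_url(jdbc_url):
    """Split a jdbc:mysql:// URL into (host, port, database)."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix + "mysql://"):
        raise ValueError("not a jdbc:mysql:// URL: " + jdbc_url)
    # Drop the "jdbc:" prefix so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    host = parts.hostname
    port = parts.port or 3306  # assumption: fall back to the MySQL default port
    database = parts.path.lstrip("/")
    if not host or not database:
        raise ValueError("URL must name a host and a database: " + jdbc_url)
    return host, port, database


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, parse_jdbc_mysql_url("jdbc:mysql://db_host:3306/db_name") yields the host, port and database name, and can_connect(host, port) then reports whether the server is reachable from the Tomcat machine. A successful connection only shows that the port is open; it does not verify the database user or password.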

Download the MySQL Connector/J JDBC driver (https://dev.mysql.com/downloads/connector/j/) and copy the appropriate jar file to the Tomcat lib directory, e.g. /usr/share/java/tomcat6/mysql-connector-java-5.1.22-bin.jar

Deploy harvester.war to the Tomcat webapps directory.

Testing the Harvester

NB: The harvester is a web application which could be embedded or integrated with other applications and as such does not provide a security framework or user interface.

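Once installed, check that the harvester application is running by accessing http://localhost:8080/harvester/getHarvestStatus

To test a harvest end to end, save a script such as the following as test.php on a PHP-enabled web server. It appends the fields posted back by the harvester to a local file:
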
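<?php
// Fields posted back by the harvester; default to empty strings if absent.
$harvestid = $_POST['harvestid'] ?? '';
$content = $_POST['content'] ?? '';
$done = $_POST['done'] ?? '';
$nextrun = $_POST['date'] ?? '';
$str = "content=".$content."\nharvest=".$harvestid."\ndate=".$nextrun."\ndone=".$done."\n";
// Append to a log file; the web server user needs write access to this path.
$fh = fopen("/usr/local/harvest/test.txt", "a");
if ($fh !== false) {
    fwrite($fh, $str);
    fclose($fh);
}
?>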

Then issue a harvest request from a browser, for example:

http://my-tomcat.edu/harvester/requestHarvest?responsetargeturl=http://my-apache.edu/test.php&harvestid=test&sourceurl=http://any-data-provider.edu/oai-pmh&method=PMH
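The request above embeds two URLs as parameter values without encoding them, which can break if either contains characters such as & or ?. The following Python sketch builds the same request with each value percent-encoded. The function name build_harvest_request is hypothetical; the parameter names are the ones used in the request above, and it is assumed that the harvester decodes query parameters in the usual servlet manner.

```python
from urllib.parse import urlencode


def build_harvest_request(harvester_base, response_target_url, harvest_id,
                          source_url, method="PMH"):
    """Return a requestHarvest URL with every parameter percent-encoded."""
    params = {
        "responsetargeturl": response_target_url,
        "harvestid": harvest_id,
        "sourceurl": source_url,
        "method": method,
    }
    # urlencode escapes ':', '/', '&' and similar characters in each value.
    return harvester_base.rstrip("/") + "/requestHarvest?" + urlencode(params)


print(build_harvest_request(
    "http://my-tomcat.edu/harvester",
    "http://my-apache.edu/test.php",
    "test",
    "http://any-data-provider.edu/oai-pmh",
))
```

The printed URL can be pasted into a browser or passed to any HTTP client in place of the hand-written one.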

Refer to the javadocs (run ant javadoc to generate these) for more information on the services available.

Javadocs
The javadocs can be built by running ant javadoc

Comments 1


  1. Keir Vaughan-Taylor

    The step

    mysql > -u root -p dbs_harvester < $HARVESTER_SOURCE/etc/db/mysql/database.sql

    There is no database.sql in the download or master.zip, and the syntax doesn't seem correct for a command at the mysql prompt.
