com.googlecode.fascinator.harvester.oaipmh
Class OaiPmhHarvester

java.lang.Object
  extended by com.googlecode.fascinator.common.harvester.impl.GenericHarvester
      extended by com.googlecode.fascinator.harvester.oaipmh.OaiPmhHarvester
All Implemented Interfaces:
Harvester, Plugin

public class OaiPmhHarvester
extends GenericHarvester

This plugin harvests metadata records from a repository using the OAI-PMH protocol. If the repository responds with HTTP 503 (Service Unavailable), the harvester checks the response headers for a Retry-After value and waits before retrying, so as not to hammer the server.
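The Retry-After behaviour described above can be sketched with the JDK's `HttpURLConnection`. This is a minimal illustration, not the harvester's actual code: the class and method names (`RetryAfterFetch`, `retryAfterSeconds`, `fetch`), the attempt limit and the fallback delay are all hypothetical. Only the delta-seconds form of Retry-After is handled; an HTTP-date value falls back to the default delay.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

public class RetryAfterFetch {

    /** Parses Retry-After as delta-seconds; returns defaultSeconds if absent, negative or not a number. */
    static long retryAfterSeconds(String header, long defaultSeconds) {
        if (header == null) {
            return defaultSeconds;
        }
        try {
            long seconds = Long.parseLong(header.trim());
            return seconds >= 0 ? seconds : defaultSeconds;
        } catch (NumberFormatException e) {
            // Retry-After may also be an HTTP-date; this sketch does not parse that form.
            return defaultSeconds;
        }
    }

    /** GETs url, sleeping for the Retry-After delay and retrying on HTTP 503, up to maxAttempts requests. */
    static byte[] fetch(String url, int maxAttempts, long defaultSeconds)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_UNAVAILABLE && attempt < maxAttempts) {
                long waitSeconds = retryAfterSeconds(conn.getHeaderField("Retry-After"), defaultSeconds);
                conn.disconnect();
                Thread.sleep(waitSeconds * 1000L);
                continue;
            }
            if (status != HttpURLConnection.HTTP_OK) {
                conn.disconnect();
                throw new IOException("HTTP " + status + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes();
            } finally {
                conn.disconnect();
            }
        }
    }
}
```

For example, `fetch("http://eprints.usq.edu.au/cgi/oai2?verb=Identify", 3, 30)` would retry at most twice on 503 responses before throwing an `IOException`.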

Configuration

Sample configuration file for the OAI-PMH harvester: usq.json

Option         | Description                                                                                 | Required | Default
---------------|---------------------------------------------------------------------------------------------|----------|--------
url            | Base URL of the OAI-PMH repository to harvest                                               | Yes      | None
maxRequests    | Maximum number of HTTP requests to make; set to -1 to retrieve all records                  | No       | -1
metadataPrefix | Metadata format(s) to harvest; the first prefix in the list is stored as the source payload | No       | oai_dc
setSpec        | OAI-PMH set to harvest                                                                      | No       | None
from           | Harvest only records from this date onwards                                                 | No       | None
until          | Harvest only records up to this date                                                        | No       | None
recordID       | OAI identifier of a single record to harvest                                                | No       | None
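The options above map onto OAI-PMH request arguments. The sketch below assumes they are passed straight through (`setSpec` as the OAI-PMH `set` argument, `from`/`until` unchanged) and that the `recordID` option used in example 2 selects a `GetRecord` request instead of `ListRecords`. The `OaiRequestBuilder` class is hypothetical, and it builds only the first request: follow-up pages in OAI-PMH are requested with a `resumptionToken` instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class OaiRequestBuilder {

    static final String DEFAULT_PREFIX = "oai_dc"; // documented default metadataPrefix

    /**
     * Builds an OAI-PMH request URL from harvester-style options (hypothetical mapping).
     * A null argument means "option not configured". A non-null recordId selects GetRecord.
     */
    static String buildUrl(String baseUrl, String metadataPrefix, String setSpec,
                           String from, String until, String recordId) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps argument order stable
        String prefix = metadataPrefix != null ? metadataPrefix : DEFAULT_PREFIX;
        if (recordId != null) {
            params.put("verb", "GetRecord");
            params.put("identifier", recordId);
            params.put("metadataPrefix", prefix);
        } else {
            params.put("verb", "ListRecords");
            params.put("metadataPrefix", prefix);
            if (setSpec != null) params.put("set", setSpec);
            if (from != null) params.put("from", from);
            if (until != null) params.put("until", until);
        }
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Percent-encode values such as "oai:eprints.usq.edu.au:5" (':' becomes %3A).
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }
}
```

With only `url` set to the USQ EPrints endpoint, this yields `http://eprints.usq.edu.au/cgi/oai2?verb=ListRecords&metadataPrefix=oai_dc`.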

Examples

  1. Get the first page of records from USQ EPrints
     "harvester": {
         "type": "oai-pmh",
         "oai-pmh": {
             "url": "http://eprints.usq.edu.au/cgi/oai2",
             "maxRequests": 1
         }
     }
     
  2. Get a specific record from USQ EPrints
     "harvester": {
         "type": "oai-pmh",
         "oai-pmh": {
             "url": "http://eprints.usq.edu.au/cgi/oai2",
             "recordID": "oai:eprints.usq.edu.au:5"
         }
     }
     
  3. Get only records from January 2009 from USQ EPrints
     "harvester": {
         "type": "oai-pmh",
         "oai-pmh": {
             "url": "http://eprints.usq.edu.au/cgi/oai2",
             "from": "2009-01-01T00:00:00Z",
             "until": "2009-01-31T23:59:59Z"
         }
     }
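Example 3 limits the harvest to one calendar month. OAI-PMH treats `until` as inclusive, so to cover the whole month the upper bound has to be the last second of the month rather than midnight at the start of its last day. A minimal sketch using `java.time`; the `MonthRange` helper is hypothetical.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

public class MonthRange {

    /** Returns {from, until} in UTC covering every second of the given month. */
    static String[] monthBounds(int year, int month) {
        YearMonth ym = YearMonth.of(year, month);
        // First second of the month, e.g. 2009-01-01T00:00:00Z
        Instant from = ym.atDay(1).atStartOfDay(ZoneOffset.UTC).toInstant();
        // Last second of the month, e.g. 2009-01-31T23:59:59Z (until is inclusive)
        Instant until = ym.atEndOfMonth().atTime(23, 59, 59).toInstant(ZoneOffset.UTC);
        // Instant.toString() prints ISO-8601 and omits fractional seconds when they are zero
        return new String[] { from.toString(), until.toString() };
    }
}
```

If the repository's `Identify` response reports day granularity (`YYYY-MM-DD`), plain dates such as `2009-01-01` and `2009-01-31` should be used instead.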
     

Rule file

Sample rule file for the OAI-PMH harvester: usq.py

Wiki Link

None

Author:
Oliver Lucido

Field Summary
static String DATE_FORMAT
          Date format
static String DATETIME_FORMAT
          Date and time format
static String DEFAULT_METADATA_PREFIX
          Default metadataPrefix (Dublin Core)
 
Constructor Summary
OaiPmhHarvester()
          Basic constructor.
 
Method Summary
 Set<String> getObjectIdList()
          Gets a set of digital object IDs.
 boolean hasMoreObjects()
          Tests whether there are more objects to retrieve.
 void init()
          Basic init() function.
 
Methods inherited from class com.googlecode.fascinator.common.harvester.impl.GenericHarvester
getDeletedObjectIdList, getId, getJsonConfig, getName, getObjectId, getPluginDetails, getStorage, hasMoreDeletedObjects, init, init, setStorage, shutdown
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DATE_FORMAT

public static final String DATE_FORMAT
Date format

See Also:
Constant Field Values

DATETIME_FORMAT

public static final String DATETIME_FORMAT
Date and time format

See Also:
Constant Field Values

DEFAULT_METADATA_PREFIX

public static final String DEFAULT_METADATA_PREFIX
Default metadataPrefix (Dublin Core)

See Also:
Constant Field Values
Constructor Detail

OaiPmhHarvester

public OaiPmhHarvester()
Basic constructor.

Method Detail

init

public void init()
          throws HarvesterException
Basic init() function. Note that it takes no parameters: it is not part of the Plugin API but comes from the GenericHarvester implementation. It is called after the constructor, once configuration has been verified as available.

Specified by:
init in class GenericHarvester
Throws:
HarvesterException - if there are problems during initialization

getObjectIdList

public Set<String> getObjectIdList()
                            throws HarvesterException
Gets a set of digital object IDs. If there are no objects, this method should return an empty set, not null.

Returns:
a set of object IDs, possibly empty
Throws:
HarvesterException - if there was an error retrieving the objects

hasMoreObjects

public boolean hasMoreObjects()
Tests whether there are more objects to retrieve. This method should return true if called before the first call to getObjectIdList().

Returns:
true if there are more objects to retrieve, false otherwise
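The contract of hasMoreObjects() and getObjectIdList() implies a simple caller loop. The sketch below is not Fascinator's Harvester interface: it uses a hypothetical stand-in interface (with the checked HarvesterException omitted) and a hypothetical in-memory harvester that simulates one OAI-PMH response page per request, stopping when no resumption token remains or when maxRequests is reached (-1 meaning no limit).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HarvestLoop {

    /** Hypothetical stand-in for the two methods documented on this page (checked exception omitted). */
    interface ObjectHarvester {
        boolean hasMoreObjects();
        Set<String> getObjectIdList();
    }

    /**
     * Hypothetical in-memory harvester: each getObjectIdList() call simulates one response page.
     * It stops when no resumption token remains or maxRequests is reached (-1 means no limit).
     */
    static class PagedHarvester implements ObjectHarvester {
        private final List<List<String>> pages;
        private final int maxRequests;
        private int requests = 0;
        private boolean started = false;
        private Integer resumptionToken = null; // index of the next page; null once exhausted

        PagedHarvester(List<List<String>> pages, int maxRequests) {
            this.pages = pages;
            this.maxRequests = maxRequests;
        }

        @Override
        public boolean hasMoreObjects() {
            if (!started) {
                return true; // nothing requested yet, so there may be objects
            }
            boolean underLimit = maxRequests == -1 || requests < maxRequests;
            return underLimit && resumptionToken != null;
        }

        @Override
        public Set<String> getObjectIdList() {
            Set<String> ids = new LinkedHashSet<>(); // empty set, never null
            if (!hasMoreObjects()) {
                return ids;
            }
            int page = started ? resumptionToken : 0;
            started = true;
            requests++;
            if (page < pages.size()) {
                ids.addAll(pages.get(page));
            }
            resumptionToken = (page + 1 < pages.size()) ? Integer.valueOf(page + 1) : null;
            return ids;
        }
    }

    /** Caller loop: keep requesting IDs while the harvester reports more. */
    static List<String> harvestAll(ObjectHarvester harvester) {
        List<String> ids = new ArrayList<>();
        while (harvester.hasMoreObjects()) {
            ids.addAll(harvester.getObjectIdList());
        }
        return ids;
    }
}
```

With three simulated pages and maxRequests set to 1, harvestAll returns only the IDs from the first page, which mirrors the intent of example 1.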


Copyright © 2009-2012. All Rights Reserved.