Class OAIHarvester

java.lang.Object
org.dspace.harvest.OAIHarvester

public class OAIHarvester extends Object
This class handles OAI harvesting of externally located records into this repository.
Author:
Alexey Maslov
  • Field Details

  • Constructor Details

  • Method Details

    • getORENamespace

      public static org.jdom2.Namespace getORENamespace()
      Search the configuration options for the ORE serialization string.
      Returns:
      Namespace of the supported ORE format. Returns null if not found.
    • getDMDNamespace

      public static org.jdom2.Namespace getDMDNamespace(String metadataKey)
      Cycle through the options and find the metadata namespace matching the provided key.
      Parameters:
      metadataKey - the key identifying the desired metadata format
      Returns:
      Namespace of the designated metadata format, or null if not found.
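      The "cycle through the options" lookup can be pictured with a minimal sketch. It assumes, hypothetically, that each metadata format is configured as a property named "harvester.metadataformats.<key>" whose value reads "<namespaceURI>, <optional display name>"; the real property names may differ. The real method returns an org.jdom2.Namespace, while this sketch returns the bare URI string to stay dependency-free.

```java
import java.util.Map;

final class NamespaceLookupSketch {
    // Hypothetical property prefix; the actual configuration key names may differ.
    static final String PREFIX = "harvester.metadataformats.";

    /**
     * Cycles through the configuration entries and returns the namespace URI
     * configured for metadataKey, or null if no entry matches (mirroring the
     * null-on-miss contract of getDMDNamespace).
     */
    static String namespaceFor(Map<String, String> config, String metadataKey) {
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (!e.getKey().startsWith(PREFIX)) {
                continue; // not a metadata-format entry
            }
            String key = e.getKey().substring(PREFIX.length());
            if (key.equals(metadataKey)) {
                // Keep only the part before the first comma: the namespace URI.
                return e.getValue().split(",", 2)[0].trim();
            }
        }
        return null;
    }
}
```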
    • runHarvest

      public void runHarvest() throws SQLException, IOException, AuthorizeException
      Performs a harvest cycle on this collection: queries the remote OAI-PMH provider for records updated since the last harvest and ingests the returned items.
      Throws:
      IOException - A general class of exceptions produced by failed or interrupted I/O operations.
      SQLException - An exception that provides information on a database access error or other errors.
      AuthorizeException - Exception indicating the current user of the context does not have permission to perform a particular action.
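      At the protocol level, one harvest cycle is a ListRecords request (optionally restricted with "from" to the last harvest date) followed by further requests for as long as the provider returns a non-empty resumptionToken. The sketch below shows only that paging loop, using the JDK DOM parser; the fetch function is a hypothetical stand-in for the HTTP GET, and ingesting each record is reduced to collecting its OAI identifier.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HarvestCycleSketch {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /**
     * Pages through ListRecords responses and returns the identifiers of all
     * harvested records. "fetch" maps a request URL to the response body (hypothetical).
     */
    static List<String> harvest(String baseUrl, String metadataPrefix, String from,
                                Function<String, String> fetch) throws Exception {
        // First request: selective harvest by prefix and, if known, the last harvest date.
        String url = baseUrl + "?verb=ListRecords&metadataPrefix=" + enc(metadataPrefix);
        if (from != null) {
            url += "&from=" + enc(from);
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        List<String> ids = new ArrayList<>();
        while (url != null) {
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fetch.apply(url))));
            NodeList headers = doc.getElementsByTagNameNS(OAI_NS, "header");
            for (int i = 0; i < headers.getLength(); i++) {
                Element header = (Element) headers.item(i);
                ids.add(header.getElementsByTagNameNS(OAI_NS, "identifier")
                        .item(0).getTextContent().trim());
            }
            // An empty or absent resumptionToken marks the last page; follow-up
            // requests carry only the verb and the token.
            NodeList tokens = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
            String token = tokens.getLength() > 0 ? tokens.item(0).getTextContent().trim() : "";
            url = token.isEmpty() ? null
                    : baseUrl + "?verb=ListRecords&resumptionToken=" + enc(token);
        }
        return ids;
    }
}
```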
    • processRecord

      protected void processRecord(org.jdom2.Element record, String OREPrefix, long currentRecord, long totalListSize) throws SQLException, AuthorizeException, IOException, CrosswalkException, HarvestingException, ParserConfigurationException, SAXException, XPathExpressionException
      Process an individual PMH record, making (or updating) a corresponding DSpace Item.
      Parameters:
      record - a JDOM Element containing the actual PMH record with descriptive metadata.
      OREPrefix - the metadataPrefix value the remote PMH server uses to disseminate ORE. Only used for collections set up to harvest content.
      currentRecord - the current record number, used for logging
      totalListSize - the total number of records in this harvest
      Throws:
      SQLException - An exception that provides information on a database access error or other errors.
      AuthorizeException - Exception indicating the current user of the context does not have permission to perform a particular action.
      IOException - A general class of exceptions produced by failed or interrupted I/O operations.
      CrosswalkException - if a crosswalk error occurs
      HarvestingException - if a harvesting error occurs
      ParserConfigurationException - if the XML parser cannot be configured
      SAXException - if an XML processing error occurs
      XPathExpressionException - if an XPath expression cannot be evaluated
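      The create-or-update decision can be sketched as follows, assuming a hypothetical map from OAI identifiers to local item ids stands in for the lookup of previously harvested items. The sketch uses the JDK DOM API rather than JDOM to stay dependency-free and logs progress with currentRecord and totalListSize. It also recognizes the OAI-PMH status="deleted" header flag; treating that as a deletion is an assumption of this sketch, not something the description above states.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class RecordDispatchSketch {
    enum Action { CREATE, UPDATE, DELETE, SKIP }

    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /**
     * Decides what to do with one OAI-PMH record. knownItems (hypothetical) maps
     * OAI identifiers of already harvested records to local item ids.
     */
    static Action dispatch(String recordXml, Map<String, String> knownItems,
                           long currentRecord, long totalListSize) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Element recordEl = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(recordXml))).getDocumentElement();
        Element header = (Element) recordEl.getElementsByTagNameNS(OAI_NS, "header").item(0);
        String oaiId = header.getElementsByTagNameNS(OAI_NS, "identifier")
                .item(0).getTextContent().trim();
        System.out.printf("Processing record %d of %d: %s%n", currentRecord, totalListSize, oaiId);

        boolean known = knownItems.containsKey(oaiId);
        // OAI-PMH marks withdrawn records with status="deleted" on the header;
        // getAttribute returns "" when the attribute is absent.
        if ("deleted".equals(header.getAttribute("status"))) {
            return known ? Action.DELETE : Action.SKIP;
        }
        return known ? Action.UPDATE : Action.CREATE;
    }
}
```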
    • extractHandle

      protected String extractHandle(Item item)
      Scan an item's metadata for values in the identifier.* fields. If a value meets the criteria for a valid handle set in dspace.cfg (harvester.acceptedHandleServer and harvester.rejectedHandlePrefix), use that handle instead of minting a new one.
      Parameters:
      item - a newly created, but not yet installed, DSpace Item
      Returns:
      the handle to be used, or null if no acceptable handle is found.
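      The handle check can be illustrated with a minimal sketch that takes the identifier.* values as plain strings and the two settings as lists. It assumes, hypothetically, that a usable value looks like "http(s)://<server>/<prefix>/<suffix>"; the parameter names echo harvester.acceptedHandleServer and harvester.rejectedHandlePrefix, but the parsing rules here are illustrative, not the real method's.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HandleExtractionSketch {
    // Matches "http(s)://<server>/<handle>", e.g. "http://hdl.handle.net/10673/5".
    private static final Pattern HANDLE_URL = Pattern.compile("^https?://([^/]+)/(.+)$");

    /**
     * Returns the first identifier value that points at an accepted handle server and
     * whose handle prefix is not rejected, or null so that a new handle is minted.
     */
    static String extractHandle(List<String> identifierValues,
                                List<String> acceptedServers,
                                List<String> rejectedPrefixes) {
        for (String value : identifierValues) {
            Matcher m = HANDLE_URL.matcher(value.trim());
            if (!m.matches() || !acceptedServers.contains(m.group(1))) {
                continue; // not a URL on an accepted handle server
            }
            String handle = m.group(2);
            int slash = handle.indexOf('/');
            // A handle has the form "<prefix>/<suffix>"; skip malformed or rejected prefixes.
            if (slash <= 0 || rejectedPrefixes.contains(handle.substring(0, slash))) {
                continue;
            }
            return handle;
        }
        return null;
    }
}
```

      The server names and prefixes in any call are illustrative; in DSpace they would come from the two dspace.cfg settings named above.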
    • oaiResolveNamespaceToPrefix

      public static String oaiResolveNamespaceToPrefix(String oaiSource, String MDNamespace) throws IOException, ParserConfigurationException, SAXException, XPathExpressionException, ConnectException
      Query the OAI-PMH server for the metadataPrefix it has mapped to the supplied namespace. For example, on a typical OAI-PMH server, querying the namespace "http://www.openarchives.org/OAI/2.0/oai_dc/" returns "oai_dc".
      Parameters:
      oaiSource - the address of the OAI-PMH provider
      MDNamespace - the namespace that we are trying to resolve to the metadataPrefix
      Returns:
      the metadataPrefix the OAI-PMH provider has assigned to the supplied namespace
      Throws:
      IOException - A general class of exceptions produced by failed or interrupted I/O operations.
      ParserConfigurationException - if the XML parser cannot be configured
      SAXException - if an XML processing error occurs
      XPathExpressionException - if an XPath expression cannot be evaluated
      ConnectException - if the OAI server cannot be reached
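      In OAI-PMH terms, this mapping comes from the provider's ListMetadataFormats response, where each metadataFormat element pairs a metadataPrefix with a metadataNamespace. The sketch below scans such a response with the JDK DOM parser; fetching "<oaiSource>?verb=ListMetadataFormats" over HTTP is left out, and returning null for an unadvertised namespace is an assumption of the sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PrefixResolverSketch {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /** Text of the first OAI-namespaced descendant with the given local name, or null. */
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagNameNS(OAI_NS, name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /**
     * Returns the metadataPrefix that a ListMetadataFormats response maps to
     * mdNamespace, or null if the provider does not advertise that namespace.
     */
    static String resolvePrefix(String listMetadataFormatsXml, String mdNamespace) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(listMetadataFormatsXml)));
        NodeList formats = doc.getElementsByTagNameNS(OAI_NS, "metadataFormat");
        for (int i = 0; i < formats.getLength(); i++) {
            Element format = (Element) formats.item(i);
            if (mdNamespace.equals(childText(format, "metadataNamespace"))) {
                return childText(format, "metadataPrefix");
            }
        }
        return null;
    }
}
```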
    • alertAdmin

      protected void alertAdmin(int status, Exception ex)
      Generate and send an email to the administrator. Called when errors are encountered during harvesting.
      Parameters:
      status - the current status of the collection, usually HarvestedCollection.STATUS_OAI_ERROR or HarvestedCollection.STATUS_UNKNOWN_ERROR
      ex - the Exception that prompted this action
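      A sketch of the message-building half of this method, assuming the email should carry the collection status and the exception's stack trace. The collectionName parameter and the body layout are hypothetical, and actually sending the message is omitted because it depends on the repository's mail setup. Status values such as HarvestedCollection.STATUS_OAI_ERROR are passed through as plain ints.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class AdminAlertSketch {
    /**
     * Builds the body of a harvest error notification from the collection's
     * status code and the exception that triggered the alert.
     */
    static String buildAlertBody(String collectionName, int status, Exception ex) {
        // Capture the full stack trace as text so the administrator can diagnose it.
        StringWriter trace = new StringWriter();
        ex.printStackTrace(new PrintWriter(trace));
        return "Harvest of collection \"" + collectionName + "\" failed.\n"
                + "Status code: " + status + "\n"
                + "Error: " + ex + "\n\n"
                + trace;
    }
}
```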
    • getMDrecord

      protected List<org.jdom2.Element> getMDrecord(String oaiSource, String itemOaiId, String metadataPrefix) throws IOException, ParserConfigurationException, SAXException, XPathExpressionException, HarvestingException
      Query the OAI-PMH provider for a specific metadata record.
      Parameters:
      oaiSource - the address of the OAI-PMH provider
      itemOaiId - the OAI identifier of the target item
      metadataPrefix - the OAI metadataPrefix of the desired metadata
      Returns:
      list of JDOM elements corresponding to the metadata entries in the located record.
      Throws:
      IOException - A general class of exceptions produced by failed or interrupted I/O operations.
      ParserConfigurationException - if the XML parser cannot be configured
      SAXException - if an XML processing error occurs
      XPathExpressionException - if an XPath expression cannot be evaluated
      HarvestingException - if a harvesting error occurs
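      Fetching a single record corresponds to the OAI-PMH GetRecord verb. The sketch below builds the request URL and, given the response body, returns the element children of the metadata element, throwing when the provider answers with an OAI-PMH error element (for example code="idDoesNotExist"). IllegalStateException is a stand-in for HarvestingException, and the HTTP call itself is omitted.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class GetRecordSketch {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /** Builds the GetRecord request URL for one item in one metadata format. */
    static String requestUrl(String oaiSource, String itemOaiId, String metadataPrefix) {
        return oaiSource + "?verb=GetRecord"
                + "&identifier=" + URLEncoder.encode(itemOaiId, StandardCharsets.UTF_8)
                + "&metadataPrefix=" + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
    }

    /** Returns the element children of <metadata>; throws if the provider reported an error. */
    static List<Element> metadataElements(String responseXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(responseXml)));
        NodeList errors = doc.getElementsByTagNameNS(OAI_NS, "error");
        if (errors.getLength() > 0) {
            Element err = (Element) errors.item(0);
            // Stand-in for HarvestingException, which this sketch does not depend on.
            throw new IllegalStateException(err.getAttribute("code") + ": " + err.getTextContent().trim());
        }
        List<Element> result = new ArrayList<>();
        NodeList metadata = doc.getElementsByTagNameNS(OAI_NS, "metadata");
        if (metadata.getLength() > 0) {
            for (Node n = metadata.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    result.add((Element) n); // skip whitespace and comment nodes
                }
            }
        }
        return result;
    }
}
```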
    • verifyOAIharvester

      public List<String> verifyOAIharvester()
      Verify the OAI settings for the current collection.
      Returns:
      list of errors encountered during verification. Empty list indicates a "success" condition.
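      The "empty list means success" contract can be sketched as a series of independent checks that each append a message. The specific checks below (an http(s) source URL, an advertised metadataPrefix, an advertised set) are illustrative assumptions; the advertised prefixes and sets would come from the provider's ListMetadataFormats and ListSets responses.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class HarvestSettingsCheckSketch {
    /**
     * Collects human-readable problems with a collection's harvest settings.
     * An empty result indicates that every check passed.
     */
    static List<String> verify(String oaiSource, String metadataPrefix, String setSpec,
                               Set<String> advertisedPrefixes, Set<String> advertisedSets) {
        List<String> errors = new ArrayList<>();
        try {
            String scheme = URI.create(oaiSource).getScheme();
            if (!"http".equals(scheme) && !"https".equals(scheme)) {
                errors.add("OAI source must be an http(s) URL: " + oaiSource);
            }
        } catch (IllegalArgumentException e) {
            errors.add("OAI source is not a valid URI: " + oaiSource);
        }
        if (!advertisedPrefixes.contains(metadataPrefix)) {
            errors.add("Provider does not support metadataPrefix " + metadataPrefix);
        }
        // A null setSpec means "harvest the whole repository", so there is nothing to check.
        if (setSpec != null && !advertisedSets.contains(setSpec)) {
            errors.add("Provider does not advertise set " + setSpec);
        }
        return errors;
    }
}
```

      Collecting every problem instead of stopping at the first lets an administrator fix all settings in one pass.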
    • getAvailableMetadataFormats

      public static List<Map<String,String>> getAvailableMetadataFormats()
      Return all available metadata formats.
      Returns:
      a list containing a map for each supported metadata format