Class OCRDocumentService

java.lang.Object
org.imixs.archive.documents.OCRDocumentService

public class OCRDocumentService extends Object
The OCRDocumentService extracts the textual information from document attachments. The CDI bean observes the ProcessingEvent BEFORE_PROCESS and sends each newly attached document to an Apache Tika Server instance to obtain the file content.

The service expects a valid REST API endpoint defined by the environment parameter 'TIKA_SERVICE_ENDPONT'. If TIKA_SERVICE_ENDPONT is not set, the service is skipped.

The environment parameter 'TIKA_SERVICE_MODE' must be set to 'auto' to enable the service.
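
The two conditions above can be sketched as a single check. This is a minimal illustration, not the service's actual code: the class name TikaConfigSketch and the method isEnabled are hypothetical, and only the two parameter names and the value 'auto' are taken from the description.

```java
import java.util.Map;

public class TikaConfigSketch {

    // Parameter names copied verbatim from the class description.
    public static final String ENV_ENDPOINT = "TIKA_SERVICE_ENDPONT";
    public static final String ENV_MODE = "TIKA_SERVICE_MODE";

    /**
     * Hypothetical helper: returns true only if an endpoint is set
     * and the mode is 'auto'.
     */
    public static boolean isEnabled(Map<String, String> env) {
        String endpoint = env.get(ENV_ENDPOINT);
        if (endpoint == null || endpoint.isBlank()) {
            return false; // no endpoint defined: the service is skipped
        }
        return "auto".equals(env.get(ENV_MODE));
    }

    public static void main(String[] args) {
        // Evaluate the check against the real process environment.
        System.out.println("Text extraction enabled: " + isEnabled(System.getenv()));
    }
}
```

Passing the environment in as a Map keeps the check testable without changing real environment variables.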

See also the project: https://github.com/imixs/imixs-docker/tree/master/tika

Version:
1.1
Author:
rsoika
  • Constructor Details

    • OCRDocumentService

      public OCRDocumentService()
  • Method Details

    • onBeforeProcess

      public void onBeforeProcess(@Observes org.imixs.workflow.engine.ProcessingEvent processingEvent) throws org.imixs.workflow.exceptions.PluginException
      Reacts to the ProcessingEvent. This method sends the document content to the Tika server and updates the DMS information.
      Throws:
      org.imixs.workflow.exceptions.PluginException
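
For illustration, sending a document to a Tika server might look like the sketch below. It assumes the endpoint points at Tika Server's /tika resource, which accepts the raw file via HTTP PUT and returns plain text when the Accept header is text/plain. The class TikaRequestSketch, its methods, and the use of java.net.http (Java 11+) are assumptions for this example and may differ from how the service itself performs the call.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class TikaRequestSketch {

    /** Builds a PUT request that carries the raw file bytes and asks for plain text. */
    public static HttpRequest buildRequest(String endpoint, byte[] fileContent) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(fileContent))
                .build();
    }

    /** Sends the request and returns the extracted text; needs a reachable Tika server. */
    public static String extractText(String endpoint, byte[] fileContent) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(endpoint, fileContent),
                HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Tika server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A caller would pass the value of TIKA_SERVICE_ENDPONT as the endpoint argument and store the returned text alongside the attachment's DMS information.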