Class OCRDocumentAdapter

  • All Implemented Interfaces:
    org.imixs.workflow.Adapter, org.imixs.workflow.SignalAdapter

    public class OCRDocumentAdapter
    extends Object
    implements org.imixs.workflow.SignalAdapter
    The OCRDocumentAdapter reacts to a ProcessingEvent to automatically extract the text content.

    The adapter expects the environment setting TIKA_SERVICE_MODE to be "MODEL". You can set additional options to be passed to the Tika service:

     
            <tika name="options">X-Tika-PDFocrStrategy=OCR_ONLY</tika>
            <tika name="options">X-Tika-PDFOcrImageType=RGB</tika>
            <tika name="options">X-Tika-PDFOcrDPI=400</tika>
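
How the adapter forwards these options is not documented on this page. As a minimal sketch, assuming each option is a name=value pair that ends up as an HTTP request header sent to the Tika server, the splitting step could look like the following. The class TikaOptionSketch and the method toHeaders are hypothetical names used for illustration only; they are not part of the Imixs API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Imixs API.
class TikaOptionSketch {

    // Splits each "name=value" option at the first '=' and collects the pairs
    // in insertion order. Entries without '=' or with an empty name are skipped.
    static Map<String, String> toHeaders(List<String> options) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String option : options) {
            int pos = option.indexOf('=');
            if (pos <= 0) {
                continue; // no separator or empty name: ignore this entry
            }
            String name = option.substring(0, pos).trim();
            String value = option.substring(pos + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        // The three option values shown in the class description above.
        Map<String, String> headers = toHeaders(List.of(
                "X-Tika-PDFocrStrategy=OCR_ONLY",
                "X-Tika-PDFOcrImageType=RGB",
                "X-Tika-PDFOcrDPI=400"));
        headers.forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Splitting at the first '=' only keeps any later '=' characters as part of the value, which is a deliberate choice in this sketch rather than documented adapter behavior.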
       
     
    Version:
    1.0
    Author:
    rsoika
    See Also:
    OCRDocumentService
    • Constructor Detail

      • OCRDocumentAdapter

        public OCRDocumentAdapter()
    • Method Detail

      • execute

        public org.imixs.workflow.ItemCollection execute(org.imixs.workflow.ItemCollection document,
                                                         org.imixs.workflow.ItemCollection event)
                                                  throws org.imixs.workflow.exceptions.AdapterException
        This method posts the text from an attachment to the Imixs-ML Analyse service endpoint.
        Specified by:
        execute in interface org.imixs.workflow.Adapter
        Throws:
        org.imixs.workflow.exceptions.AdapterException