Class PDFXMLExtractorPlugin

java.lang.Object
org.imixs.workflow.engine.plugins.AbstractPlugin
org.imixs.archive.documents.PDFXMLExtractorPlugin
All Implemented Interfaces:
org.imixs.workflow.Plugin

public class PDFXMLExtractorPlugin extends org.imixs.workflow.engine.plugins.AbstractPlugin
The PDFXMLExtractorPlugin extracts embedded XML files from a PDF document and transforms the content into an Imixs XMLDocument. This data can be added to the current workitem for further processing.

The plugin is based on the Apache PDFBox project. The following Maven dependency needs to be added to the project:

 
     <dependency>
       <groupId>org.apache.pdfbox</groupId>
       <artifactId>pdfbox</artifactId>
       <scope>compile</scope>
     </dependency>
   
 
To activate the plugin, the BPMN event must contain the following item definition:
 
     <item name="PDFXMLExtractor">
       <filename>*.xml</filename>
       <report>myReport</report>
     </item>
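
To make the structure of this item definition concrete, the following is a minimal sketch that reads the two child elements with the JDK's built-in DOM parser. It is an illustration only: the class `ExtractorItemSketch` and its members are hypothetical names, and the Imixs engine evaluates event configuration through its own mechanism rather than through this code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of the Imixs API.
final class ExtractorItemSketch {
    final String filename; // content of <filename>, e.g. "*.xml"
    final String report;   // content of <report>, e.g. "myReport"

    private ExtractorItemSketch(String filename, String report) {
        this.filename = filename;
        this.report = report;
    }

    // Parses a single <item> element and collects its two child values.
    static ExtractorItemSketch parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element item = doc.getDocumentElement();
            return new ExtractorItemSketch(childText(item, "filename"), childText(item, "report"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid item definition", e);
        }
    }

    // Returns the trimmed text of the first child with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Calling `ExtractorItemSketch.parse(...)` on the definition shown above would yield the file pattern in `filename` and the report name in `report`.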
   
 
Version:
1.0
Author:
rsoika
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
     
    static final String
     
    static final String
     
    static final String
     
    static final String
     
    static final String
     

    Fields inherited from class org.imixs.workflow.engine.plugins.AbstractPlugin

    INVALID_ITEMVALUE_FORMAT, INVALID_PROPERTYVALUE_FORMAT
  • Constructor Summary

    Constructors
    Constructor
    Description
    PDFXMLExtractorPlugin()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static byte[]
    getXMLFile(org.imixs.workflow.ItemCollection document, String file_pattern)
    This method searches attached PDF files of a workitem and extracts an embedded XML file.
    org.imixs.workflow.ItemCollection
    run(org.imixs.workflow.ItemCollection document, org.imixs.workflow.ItemCollection event)
    This method parses the content of an attached PDF file and extracts an embedded XML file.
    static byte[]
    streamToByteArray(InputStream ins)
    This method converts an InputStream into a byte array.

    Methods inherited from class org.imixs.workflow.engine.plugins.AbstractPlugin

    close, getCtx, getWorkflowService, init, mergeFieldList, uniqueList

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • PDFXMLExtractorPlugin

      public PDFXMLExtractorPlugin()
  • Method Details

    • run

      public org.imixs.workflow.ItemCollection run(org.imixs.workflow.ItemCollection document, org.imixs.workflow.ItemCollection event) throws org.imixs.workflow.exceptions.PluginException
      This method parses the content of an attached PDF file and extracts an embedded XML file. This XML file is then transformed by a given report definition into an Imixs XMLDocument. The content of the XMLDocument is then merged into the current document.
      Throws:
      org.imixs.workflow.exceptions.PluginException
    • getXMLFile

      public static byte[] getXMLFile(org.imixs.workflow.ItemCollection document, String file_pattern) throws org.imixs.workflow.exceptions.PluginException
      This method searches attached PDF files of a workitem and extracts an embedded XML file.

      The method returns only the first embedded XML file and does not support multiple XML files embedded in one PDF file.

      Parameters:
      document -
      file_pattern -
      Returns:
      Throws:
      org.imixs.workflow.exceptions.PluginException
    • streamToByteArray

      public static byte[] streamToByteArray(InputStream ins) throws IOException
      This method converts an InputStream into a byte array.
      Parameters:
      ins -
      Returns:
      Throws:
      IOException
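
As an illustration of the stream-to-byte-array conversion described for streamToByteArray, here is a minimal sketch using only the JDK. It mirrors the documented signature, but the class name `StreamToByteArraySketch`, the buffer size, and the loop itself are assumptions; the plugin's actual implementation may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for illustration; not the plugin's actual source.
final class StreamToByteArraySketch {

    // Reads the stream to its end and returns everything that was read.
    static byte[] streamToByteArray(InputStream ins) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096]; // buffer size chosen arbitrarily
        int read;
        // read() returns -1 once the end of the stream is reached
        while ((read = ins.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }
}
```

On Java 9 and later, `InputStream.readAllBytes()` achieves the same result in a single call; the explicit loop is shown here because it also works on older runtimes.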