Class PDFXMLExtractorPlugin

  • All Implemented Interfaces:
    org.imixs.workflow.Plugin

    public class PDFXMLExtractorPlugin
    extends org.imixs.workflow.engine.plugins.AbstractPlugin
    The PDFXMLExtractorPlugin extracts embedded XML files from a PDF document and transforms the content into an Imixs XMLDocument. This data can be added to the current workitem for further processing.

    The plugin is based on the Apache PDFBox project. The following Maven dependency needs to be added to the project:

     
         <dependency>
           <groupId>org.apache.pdfbox</groupId>
           <artifactId>pdfbox</artifactId>
           <scope>compile</scope>
         </dependency>
       
     
    To activate the plugin, the BPMN event must contain the following item definition:
     
         <item name="PDFXMLExtractor">
           <filename>*.xml</filename>
           <report>myReport</report>
         </item>
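
    The structure of this item definition can be illustrated with a minimal sketch that reads the name attribute and the two child elements using the JDK's DOM parser. The class ItemDefinitionSketch and its helper methods are hypothetical and exist only for this illustration; how the plugin itself evaluates the definition is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the plugin.
class ItemDefinitionSketch {

    // Parses the XML fragment and returns its root element (<item>).
    static Element parseItem(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Returns the trimmed text of the first descendant element with the
    // given tag name, or null if no such element exists.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item name=\"PDFXMLExtractor\">"
                + "<filename>*.xml</filename>"
                + "<report>myReport</report>"
                + "</item>";
        Element item = parseItem(xml);
        System.out.println("name     = " + item.getAttribute("name"));
        System.out.println("filename = " + childText(item, "filename"));
        System.out.println("report   = " + childText(item, "report"));
    }
}
```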
       
     
    Version:
    1.0
    Author:
    rsoika
    • Constructor Detail

      • PDFXMLExtractorPlugin

        public PDFXMLExtractorPlugin()
    • Method Detail

      • run

        public org.imixs.workflow.ItemCollection run(org.imixs.workflow.ItemCollection document,
                                                     org.imixs.workflow.ItemCollection event)
                                              throws org.imixs.workflow.exceptions.PluginException
        This method parses the content of an attached PDF file and extracts an embedded XML file. This XML file is then transformed by a given report definition into an Imixs XMLDocument. The content of the XMLDocument is then merged into the current document.
        Throws:
        org.imixs.workflow.exceptions.PluginException
      • getXMLFile

        public static byte[] getXMLFile(org.imixs.workflow.ItemCollection document,
                                        String file_pattern)
                                 throws org.imixs.workflow.exceptions.PluginException
        This method searches the PDF files attached to a workitem and extracts an embedded XML file.

        The method returns only the first embedded XML file and does not support multiple XML files embedded in one PDF file.

        Parameters:
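        document - the workitem whose attached PDF files are searched
        file_pattern - the file name pattern to match
        Returns:
        the content of the first embedded XML file as a byte array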
        Throws:
        org.imixs.workflow.exceptions.PluginException
      • streamToByteArray

        public static byte[] streamToByteArray(InputStream ins)
                                        throws IOException
        This method converts an InputStream into a byte array.
        Parameters:
        ins - the input stream to read
        Returns:
        the content of the stream as a byte array
        Throws:
        IOException
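
        A conversion of this kind is commonly written as a buffered copy loop. The following is a minimal sketch under that assumption, not necessarily the plugin's actual implementation; the class name StreamToByteArraySketch and the 4096-byte buffer size are chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class used only for this illustration.
class StreamToByteArraySketch {

    // Reads the stream to its end and returns every byte that was read.
    // The caller remains responsible for closing the stream.
    static byte[] streamToByteArray(InputStream ins) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096]; // assumed buffer size
        int n;
        while ((n = ins.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "<invoice/>".getBytes(StandardCharsets.UTF_8);
        byte[] copy = streamToByteArray(new ByteArrayInputStream(input));
        System.out.println(new String(copy, StandardCharsets.UTF_8));
    }
}
```

        On Java 9 and later, InputStream.readAllBytes() provides the same result without a hand-written loop.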