Package org.imixs.archive.documents
Class PDFXMLExtractorPlugin
java.lang.Object
org.imixs.workflow.engine.plugins.AbstractPlugin
org.imixs.archive.documents.PDFXMLExtractorPlugin
- All Implemented Interfaces:
org.imixs.workflow.Plugin
public class PDFXMLExtractorPlugin
extends org.imixs.workflow.engine.plugins.AbstractPlugin
The PDFXMLExtractorPlugin extracts embedded XML files from a PDF
document and transforms the content into an Imixs XMLDocument. This data can
be added to the current workitem for further processing.
The plugin is based on the Apache PDFBox project. The following Maven dependency needs to be added to the project:
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<scope>compile</scope>
</dependency>
To activate the plugin, the BPMN event must contain the following item
definition:
<item name="PDFXMLExtractor">
<filename>*.xml</filename>
<report>myReport</report>
</item>
- Version:
- 1.0
- Author:
- rsoika
-
Field Summary
Fields
Modifier and Type | Field | Description
static final String | PDFXMLEXTRACTOR |
static final String | PARSING_EXCEPTION |
static final String | PLUGIN_ERROR |
static final String | REPORT_ERROR |
static final String | FILE_PATTERN_PDF |
static final String | FILE_PATTERN_XML |
Fields inherited from class org.imixs.workflow.engine.plugins.AbstractPlugin
INVALID_ITEMVALUE_FORMAT, INVALID_PROPERTYVALUE_FORMAT
-
Constructor Summary
Constructors
PDFXMLExtractorPlugin()
-
Method Summary
Modifier and Type | Method | Description
static byte[] | getXMLFile(org.imixs.workflow.ItemCollection document, String file_pattern) | This method searches attached PDF files of a workitem and extracts an embedded XML file.
org.imixs.workflow.ItemCollection | run(org.imixs.workflow.ItemCollection document, org.imixs.workflow.ItemCollection event) | This method parses the content of an attached PDF file and extracts an embedded XML file.
static byte[] | streamToByteArray | This method converts an InputStream into a byte array.
Methods inherited from class org.imixs.workflow.engine.plugins.AbstractPlugin
close, getCtx, getWorkflowService, init, mergeFieldList, uniqueList
-
Field Details
-
static final String PDFXMLEXTRACTOR
-
static final String PARSING_EXCEPTION
-
static final String PLUGIN_ERROR
-
static final String REPORT_ERROR
-
static final String FILE_PATTERN_PDF
-
static final String FILE_PATTERN_XML
-
Constructor Details
-
PDFXMLExtractorPlugin
public PDFXMLExtractorPlugin()
-
-
Method Details
-
run
public org.imixs.workflow.ItemCollection run(org.imixs.workflow.ItemCollection document, org.imixs.workflow.ItemCollection event) throws org.imixs.workflow.exceptions.PluginException
This method parses the content of an attached PDF file and extracts an embedded XML file. This XML file is then transformed by a given report definition into an Imixs XMLDocument. The content of the XMLDocument is then merged into the current document.
- Throws:
org.imixs.workflow.exceptions.PluginException
-
getXMLFile
public static byte[] getXMLFile(org.imixs.workflow.ItemCollection document, String file_pattern) throws org.imixs.workflow.exceptions.PluginException
This method searches the attached PDF files of a workitem and extracts an embedded XML file. The method returns only the first embedded XML file and does not support multiple XML files embedded in one PDF file.
- Parameters:
document -
file_pattern -
- Returns:
- Throws:
org.imixs.workflow.exceptions.PluginException
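The "first match only" behaviour described for getXMLFile can be illustrated with a minimal sketch that uses only the JDK. It is not the plugin's implementation: the class FirstEmbeddedXmlSketch and the helper firstMatch are hypothetical names, the embedded files are modelled as a plain map of file name to content, and the sketch assumes that the file pattern is a regular expression applied to the embedded file names and that null is returned when nothing matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; not part of the Imixs API.
class FirstEmbeddedXmlSketch {

    // Hypothetical helper: returns the content of the first entry whose name
    // matches the given pattern, or null if no entry matches.
    // Assumption: filePattern is a regular expression.
    static byte[] firstMatch(Map<String, byte[]> embeddedFiles, String filePattern) {
        Pattern pattern = Pattern.compile(filePattern);
        for (Map.Entry<String, byte[]> entry : embeddedFiles.entrySet()) {
            if (pattern.matcher(entry.getKey()).find()) {
                // Stop at the first hit; later matching files are ignored.
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so "first" is well defined here.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("logo.png", new byte[] { 1 });
        files.put("invoice.xml", new byte[] { 2 });
        files.put("extra.xml", new byte[] { 3 });

        byte[] xml = firstMatch(files, ".+\\.xml$");
        System.out.println(xml == null ? "no match" : "matched " + xml.length + " byte(s)");
    }
}
```

A caller that needs every embedded XML file would have to collect all matching entries instead of returning on the first hit, which the documented method does not do.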
-
streamToByteArray
This method converts an InputStream into a byte array.
- Parameters:
ins -
- Returns:
- Throws:
IOException
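A common way to convert an InputStream into a byte array is to copy it through a ByteArrayOutputStream. The following is a minimal sketch of that technique, not the plugin's actual source: the class name StreamToByteArraySketch and the buffer size are assumptions, and only the method name, the parameter name ins, and the IOException are taken from the documentation above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the real method lives in PDFXMLExtractorPlugin.
class StreamToByteArraySketch {

    // Reads the stream until end-of-stream and returns all bytes read.
    static byte[] streamToByteArray(InputStream ins) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096]; // buffer size chosen arbitrarily
        int read;
        while ((read = ins.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<invoice/>".getBytes(StandardCharsets.UTF_8);
        byte[] copy = streamToByteArray(new ByteArrayInputStream(xml));
        System.out.println(copy.length + " bytes read");
    }
}
```

On Java 9 and later, InputStream.readAllBytes() provides the same result without a manual copy loop.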
-