A B C E F G L M O P R S T X

A

addUnitOnValues(List<String>, String) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Adds the unit to each value of the list.
annotate(Document, Map<String, List<String>>) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Annotates cu with the predicates and literals contained in toAnnot.
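
The addUnitOnValues entry above has only a one-line summary. As a rough illustration of appending a unit to every value of a list, here is a minimal sketch using only the standard library. The class name UnitSketch, the method name appendUnit, and the choice to concatenate the unit with no separator are all assumptions made for illustration; the real method in TikaExtractorService may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the TikaExtractorService implementation.
public class UnitSketch {

    // Returns a new list in which the unit is appended to each value.
    // Assumption: the unit is concatenated directly, with no separator.
    public static List<String> appendUnit(List<String> values, String unit) {
        List<String> result = new ArrayList<>(values.size());
        for (String value : values) {
            result.add(value + unit);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> widths = Arrays.asList("640", "480");
        System.out.println(appendUnit(widths, "px"));
    }
}
```

The sketch returns a new list rather than changing its argument; whether the real method modifies the list in place is not stated in this index.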

B

BASE_URI_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

C

characters(char[], int, int) - Method in class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
 
checkArgs(ProcessArgs) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 
cleanMap(Map<String, List<String>>) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Modifies the Map passed as a parameter.
CONFIG_FILE - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Properties file
contentManager - Variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
The BinaryFolderContentManager to use
convertToISO8601Date(String) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 
CustomBoilerpipeHtmlParser - Class in org.apache.tika.parser.html
Defines an HTML document parser using the boilerpipe/Tika HtmlParser. The extractor used is defined in a properties file.
CustomBoilerpipeHtmlParser() - Constructor for class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
 
CustomOfficeParser - Class in org.apache.tika.parser.microsoft
Defines a Microsoft document content extractor.
CustomOfficeParser() - Constructor for class org.apache.tika.parser.microsoft.CustomOfficeParser
 
CustomOutlookExtractor - Class in org.apache.tika.parser.microsoft
Outlook Message Parser.
CustomOutlookExtractor(POIFSFileSystem, ParseContext) - Constructor for class org.apache.tika.parser.microsoft.CustomOutlookExtractor
 
CustomPDFParser - Class in org.apache.tika.parser.pdf
PDF parser.
CustomPDFParser() - Constructor for class org.apache.tika.parser.pdf.CustomPDFParser
 

E

EmlParser - Class in org.apache.tika.parser.microsoft
Defines an EML document content extractor.
EmlParser() - Constructor for class org.apache.tika.parser.microsoft.EmlParser
 
endElement(String, String, String) - Method in class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
 
extractTextAndMetadata(Document, File, Map<String, List<String>>, boolean) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

F

fillMapWithMetadata(Map<String, List<String>>, Metadata) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
The method converts the metadata extracted by Tika into a Map of predicates with their values that can be annotated.
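
To make the conversion described for fillMapWithMetadata more concrete, the sketch below copies multi-valued metadata into a Map<String, List<String>>. To stay self-contained it uses a plain Map<String, String[]> as a stand-in for Tika's Metadata object. The class MetadataMapSketch, the method fillMap, the use of the raw metadata name as the predicate key, and the rule of skipping blank values are all assumptions, not the service's documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the TikaExtractorService implementation.
public class MetadataMapSketch {

    // Copies every non-blank metadata value into the target map.
    // Assumption: the metadata name is used directly as the predicate key.
    public static void fillMap(Map<String, List<String>> target,
                               Map<String, String[]> metadata) {
        for (Map.Entry<String, String[]> entry : metadata.entrySet()) {
            for (String value : entry.getValue()) {
                if (value == null || value.trim().isEmpty()) {
                    continue; // skip values that carry no information
                }
                target.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                      .add(value.trim());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String[]> metadata = new LinkedHashMap<>();
        metadata.put("dc:creator", new String[] {"Alice", " Bob "});
        metadata.put("dc:title", new String[] {""});

        Map<String, List<String>> predicates = new LinkedHashMap<>();
        fillMap(predicates, metadata);
        System.out.println(predicates);
    }
}
```

With a real Tika Metadata instance, the same loop could iterate over names() and getValues(name) instead of the stand-in map.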

G

getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.CustomOfficeParser
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.EmlParser
 
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.pdf.CustomPDFParser
 
getTikaConfig() - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

L

loadTikaServiceProps() - Method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

M

MediaUnitContentHandler - Class in org.ow2.weblab.services.normaliser.tika
 
MediaUnitContentHandler(ContentHandler, Document) - Constructor for class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
 

O

org.apache.tika.parser.html - package org.apache.tika.parser.html
 
org.apache.tika.parser.microsoft - package org.apache.tika.parser.microsoft
 
org.apache.tika.parser.pdf - package org.apache.tika.parser.pdf
 
org.ow2.weblab.services.normaliser.tika - package org.ow2.weblab.services.normaliser.tika
 
OVERRIDE_METADATA_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

P

parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
 
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
 
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.CustomOfficeParser
Extracts properties and text from an MS Document input stream
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.microsoft.CustomOfficeParser
Deprecated. This method will be removed in Apache Tika 1.0.
parse(XHTMLContentHandler, Metadata) - Method in class org.apache.tika.parser.microsoft.CustomOutlookExtractor
 
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.EmlParser
Extracts properties and text from an EML Document input stream
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.microsoft.EmlParser
Deprecated. This method will be removed in Apache Tika 1.0.
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.pdf.CustomPDFParser
 
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.pdf.CustomPDFParser
Deprecated. This method will be removed in Apache Tika 1.0.
PASSWORD - Static variable in class org.apache.tika.parser.pdf.CustomPDFParser
Metadata key for giving the document password to the parser.
process(ProcessArgs) - Method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

R

REMOVE_COTNENT_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

S

startElement(String, String, String, Attributes) - Method in class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
 

T

TikaExtractorService - Class in org.ow2.weblab.services.normaliser.tika
The Tika extractor is quite simple, since it does not handle the structure of documents (sheets in Excel, paragraphs in Word, etc.). The structure could have been represented as various MediaUnits.
TikaExtractorService() - Constructor for class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
The default and only constructor.

X

XHTML_FOLDER_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 
XHTML_SAVE - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
 

Copyright © 2004-2011. All Rights Reserved.