A B C E F G L M O P R S T X
A
addUnitOnValues(List<String>, String) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Adds the unit to each value of the list.
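The index gives only the signature and a one-line summary for addUnitOnValues, so the following is a minimal sketch of what "adding a unit to each value" could look like, not the service's actual code. The class name UnitSuffixSketch, the method name withUnit, the space separator, and the choice to return a new list rather than modify the argument are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnitSuffixSketch {

    // Hypothetical stand-in for addUnitOnValues(List<String>, String).
    // Assumption: the unit is appended after a single space; the real
    // method may use another separator or modify the list in place.
    public static List<String> withUnit(List<String> values, String unit) {
        List<String> result = new ArrayList<>(values.size());
        for (String value : values) {
            result.add(value + " " + unit);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example values are made up for the demonstration.
        System.out.println(withUnit(Arrays.asList("12", "300"), "px"));
    }
}
```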
annotate(Document, Map<String, List<String>>) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Annotates cu with the predicates and literals contained in toAnnot.
B
BASE_URI_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
C
characters(char[], int, int) - Method in class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
checkArgs(ProcessArgs) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
cleanMap(Map<String, List<String>>) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Modifies the Map passed as a parameter.
CONFIG_FILE - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Properties file
contentManager - Variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
The BinaryFolderContentManager to use
convertToISO8601Date(String) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
CustomBoilerpipeHtmlParser - Class in org.apache.tika.parser.html
Defines an HTML document parser using the boilerpipe/Tika HtmlParser. The extractor used is defined in a properties file.
CustomBoilerpipeHtmlParser() - Constructor for class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
CustomOfficeParser - Class in org.apache.tika.parser.microsoft
Defines a Microsoft document content extractor.
CustomOfficeParser() - Constructor for class org.apache.tika.parser.microsoft.CustomOfficeParser
CustomOutlookExtractor - Class in org.apache.tika.parser.microsoft
Outlook Message Parser.
CustomOutlookExtractor(POIFSFileSystem, ParseContext) - Constructor for class org.apache.tika.parser.microsoft.CustomOutlookExtractor
CustomPDFParser - Class in org.apache.tika.parser.pdf
PDF parser.
CustomPDFParser() - Constructor for class org.apache.tika.parser.pdf.CustomPDFParser
E
EmlParser - Class in org.apache.tika.parser.microsoft
Defines an EML document content extractor.
EmlParser() - Constructor for class org.apache.tika.parser.microsoft.EmlParser
endElement(String, String, String) - Method in class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
extractTextAndMetadata(Document, File, Map<String, List<String>>, boolean) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
F
fillMapWithMetadata(Map<String, List<String>>, Metadata) - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
The method converts the metadata extracted by Tika into a Map of predicates with their values that can be annotated.
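As a rough illustration of the conversion described for fillMapWithMetadata, the sketch below copies multi-valued metadata into a Map<String, List<String>>. To stay self-contained it uses a plain Map<String, String[]> as a stand-in for Tika's Metadata object (which exposes names() and getValues(String)). The class name MetadataToMapSketch, the method name fill, the trimming of values, and the skipping of empty ones are assumptions. How the real service maps Tika property names to predicates is not shown in this index, so the sketch keeps each name unchanged as the key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataToMapSketch {

    // Hypothetical stand-in for fillMapWithMetadata(Map, Metadata).
    // 'extracted' plays the role of the Tika Metadata object: one entry
    // per metadata name, each with zero or more values.
    public static void fill(Map<String, List<String>> target,
                            Map<String, String[]> extracted) {
        for (Map.Entry<String, String[]> entry : extracted.entrySet()) {
            for (String value : entry.getValue()) {
                // Assumption: blank values carry no information and are skipped.
                if (value == null || value.trim().isEmpty()) {
                    continue;
                }
                // Assumption: the metadata name is used unchanged as the key.
                target.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                      .add(value.trim());
            }
        }
    }

    public static void main(String[] args) {
        // Example metadata names and values are made up for the demonstration.
        Map<String, String[]> extracted = new LinkedHashMap<>();
        extracted.put("title", new String[] {" Annual report "});
        extracted.put("creator", new String[] {"A. Author", "", "B. Author"});

        Map<String, List<String>> toAnnot = new LinkedHashMap<>();
        fill(toAnnot, extracted);
        System.out.println(toAnnot);
    }
}
```

The target map is passed in and filled rather than returned, which mirrors the signature listed above.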
G
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.CustomOfficeParser
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.EmlParser
getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.pdf.CustomPDFParser
getTikaConfig() - Static method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
L
loadTikaServiceProps() - Method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
M
MediaUnitContentHandler - Class in org.ow2.weblab.services.normaliser.tika
MediaUnitContentHandler(ContentHandler, Document) - Constructor for class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
O
org.apache.tika.parser.html - package org.apache.tika.parser.html
org.apache.tika.parser.microsoft - package org.apache.tika.parser.microsoft
org.apache.tika.parser.pdf - package org.apache.tika.parser.pdf
org.ow2.weblab.services.normaliser.tika - package org.ow2.weblab.services.normaliser.tika
OVERRIDE_METADATA_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
P
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.CustomOfficeParser
Extracts properties and text from an MS Document input stream
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.microsoft.CustomOfficeParser
Deprecated. This method will be removed in Apache Tika 1.0.
parse(XHTMLContentHandler, Metadata) - Method in class org.apache.tika.parser.microsoft.CustomOutlookExtractor
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.EmlParser
Extracts properties and text from an EML Document input stream
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.microsoft.EmlParser
Deprecated. This method will be removed in Apache Tika 1.0.
parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.pdf.CustomPDFParser
parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.pdf.CustomPDFParser
Deprecated. This method will be removed in Apache Tika 1.0.
PASSWORD - Static variable in class org.apache.tika.parser.pdf.CustomPDFParser
Metadata key for giving the document password to the parser.
process(ProcessArgs) - Method in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
R
REMOVE_COTNENT_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
S
startElement(String, String, String, Attributes) - Method in class org.ow2.weblab.services.normaliser.tika.MediaUnitContentHandler
T
TikaExtractorService - Class in org.ow2.weblab.services.normaliser.tika
The Tika extractor is quite simple, since it does not handle the structure of documents (sheets in Excel, paragraphs in Word, etc.). The structure could have been represented as various MediaUnits.
TikaExtractorService() - Constructor for class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
The default and only constructor.
X
XHTML_FOLDER_PROPERTY_NAME - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
XHTML_SAVE - Static variable in class org.ow2.weblab.services.normaliser.tika.TikaExtractorService
Copyright © 2004-2011. All Rights Reserved.