org.apache.tika.parser.html
Class CustomBoilerpipeHtmlParser
java.lang.Object
org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
- All Implemented Interfaces:
- java.io.Serializable, org.apache.tika.parser.Parser
public class CustomBoilerpipeHtmlParser
- extends java.lang.Object
- implements org.apache.tika.parser.Parser
Defines an HTML document parser built on boilerpipe and the Tika HtmlParser.
The boilerpipe extractor to use is specified in a properties file.
- Author:
- lkhelif
- See Also:
- Serialized Form
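The class description says the extractor is chosen through a properties file, but this page does not document the file name, the property key, or the accepted values. The following is only a minimal sketch of how such a lookup could work with java.util.Properties. The class name ExtractorConfigSketch, the key boilerpipe.extractor, and the fallback value ArticleExtractor are all assumptions made for illustration and are not taken from this parser's actual configuration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper, not part of CustomBoilerpipeHtmlParser. */
final class ExtractorConfigSketch {
    // Assumed property key; the real key is not documented on this page.
    static final String KEY = "boilerpipe.extractor";
    // Assumed fallback used when the key is missing or blank.
    static final String DEFAULT_EXTRACTOR = "ArticleExtractor";

    private ExtractorConfigSketch() {
    }

    /** Reads properties from the given source and returns the configured extractor name. */
    static String extractorName(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        // Properties.load keeps trailing whitespace in values, so trim it here.
        String value = props.getProperty(KEY, DEFAULT_EXTRACTOR).trim();
        return value.isEmpty() ? DEFAULT_EXTRACTOR : value;
    }

    /** Convenience overload that parses properties held in a string. */
    static String extractorName(String propertiesText) {
        try {
            return extractorName(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader does not fail in practice; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, ExtractorConfigSketch.extractorName("boilerpipe.extractor=DefaultExtractor") yields the configured name, while an empty or blank entry falls back to the assumed default. A real implementation would then map that name to a boilerpipe extractor instance.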
Method Summary
Return type | Method
java.util.Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext context)
void | parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, org.apache.tika.metadata.Metadata metadata)
void | parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CustomBoilerpipeHtmlParser
public CustomBoilerpipeHtmlParser()
parse
public void parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
org.apache.tika.metadata.Metadata metadata,
org.apache.tika.parser.ParseContext context)
throws java.io.IOException,
org.xml.sax.SAXException,
org.apache.tika.exception.TikaException
- Specified by:
parse in interface org.apache.tika.parser.Parser
- Throws:
java.io.IOException
org.xml.sax.SAXException
org.apache.tika.exception.TikaException
parse
public void parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
org.apache.tika.metadata.Metadata metadata)
throws java.io.IOException,
org.xml.sax.SAXException,
org.apache.tika.exception.TikaException
- Specified by:
parse in interface org.apache.tika.parser.Parser
- Throws:
java.io.IOException
org.xml.sax.SAXException
org.apache.tika.exception.TikaException
getSupportedTypes
public java.util.Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
- Specified by:
getSupportedTypes in interface org.apache.tika.parser.Parser
Copyright © 2004-2011. All Rights Reserved.