org.apache.tika.parser.html
Class CustomBoilerpipeHtmlParser

java.lang.Object
  extended by org.apache.tika.parser.html.CustomBoilerpipeHtmlParser
All Implemented Interfaces:
java.io.Serializable, org.apache.tika.parser.Parser

public class CustomBoilerpipeHtmlParser
extends java.lang.Object
implements org.apache.tika.parser.Parser

Defines an HTML document parser built on boilerpipe and the Tika HtmlParser. The boilerpipe extractor to use is defined in a properties file.
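
This page does not document the properties file's name or keys, so the following is only a minimal sketch of how an extractor name might be read from properties text using the JDK alone. The helper class `ExtractorConfig`, the key `boilerpipe.extractor`, and the fallback value `ArticleExtractor` are all assumptions made for illustration; they are not part of this class's documented API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper, for illustration only; not part of CustomBoilerpipeHtmlParser. */
final class ExtractorConfig {
    // Assumed property key and fallback value; the real names are not documented on this page.
    static final String KEY = "boilerpipe.extractor";
    static final String DEFAULT_EXTRACTOR = "ArticleExtractor";

    private ExtractorConfig() {
    }

    /** Parses properties-format text (key = value lines) into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the configured extractor name, or the assumed default when the key is absent. */
    static String extractorName(Properties props) {
        // Properties.load keeps trailing whitespace in values, so trim before use.
        return props.getProperty(KEY, DEFAULT_EXTRACTOR).trim();
    }
}
```

Under these assumptions, `ExtractorConfig.extractorName(ExtractorConfig.parse("boilerpipe.extractor = DefaultExtractor"))` yields `DefaultExtractor`, and an empty properties text falls back to `ArticleExtractor`. How the parser maps such a name to a boilerpipe extractor instance is not described here.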

Author:
lkhelif
See Also:
Serialized Form

Constructor Summary
CustomBoilerpipeHtmlParser()
           
 
Method Summary
 java.util.Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
           
 void parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, org.apache.tika.metadata.Metadata metadata)
           
 void parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CustomBoilerpipeHtmlParser

public CustomBoilerpipeHtmlParser()

Method Detail

parse

public void parse(java.io.InputStream stream,
                  org.xml.sax.ContentHandler handler,
                  org.apache.tika.metadata.Metadata metadata,
                  org.apache.tika.parser.ParseContext context)
           throws java.io.IOException,
                  org.xml.sax.SAXException,
                  org.apache.tika.exception.TikaException
Specified by:
parse in interface org.apache.tika.parser.Parser
Throws:
java.io.IOException
org.xml.sax.SAXException
org.apache.tika.exception.TikaException

parse

public void parse(java.io.InputStream stream,
                  org.xml.sax.ContentHandler handler,
                  org.apache.tika.metadata.Metadata metadata)
           throws java.io.IOException,
                  org.xml.sax.SAXException,
                  org.apache.tika.exception.TikaException
Specified by:
parse in interface org.apache.tika.parser.Parser
Throws:
java.io.IOException
org.xml.sax.SAXException
org.apache.tika.exception.TikaException

getSupportedTypes

public java.util.Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
Specified by:
getSupportedTypes in interface org.apache.tika.parser.Parser


Copyright © 2004-2011. All Rights Reserved.