Class PoiWordFilter

  • All Implemented Interfaces:
    FormatFilter

    public class PoiWordFilter
    extends MediaFilter
    Extracts plain text from Microsoft Word documents (.doc, .docx).
    • Constructor Detail

      • PoiWordFilter

        public PoiWordFilter()
    • Method Detail

      • getFilteredName

        public String getFilteredName(String oldFilename)
        Description copied from interface: FormatFilter
        Get a filename for a newly created filtered bitstream
        Parameters:
        oldFilename - name of source bitstream
        Returns:
        filename generated by the filter - for example, document.pdf becomes document.pdf.txt
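The naming convention described above can be sketched as follows. This is a minimal illustration that assumes the filter simply appends ".txt" to the source name, as the documented example suggests; `FilteredNameSketch` and `filteredName` are hypothetical names, not part of the DSpace API.

```java
public class FilteredNameSketch {

    // Hypothetical helper mirroring the documented convention:
    // the filtered bitstream keeps the source name and gains a ".txt" suffix.
    static String filteredName(String oldFilename) {
        return oldFilename + ".txt";
    }

    public static void main(String[] args) {
        // prints "report.docx.txt"
        System.out.println(filteredName("report.docx"));
    }
}
```

Keeping the original extension in the generated name makes it easy to trace a filtered bitstream back to its source.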
      • getBundleName

        public String getBundleName()
        Returns:
        name of the bundle in which this filter will place its generated Bitstreams
      • getFormatString

        public String getFormatString()
        Returns:
        name of the bitstream format (for example "HTML" or "Microsoft Word") returned by this filter. Look in the bitstream format registry or mediafilter.cfg for valid format strings.
      • getDescription

        public String getDescription()
        Returns:
        string describing the newly generated Bitstream; stating how it was produced is a good idea
      • getDestinationStream

        public InputStream getDestinationStream(Item currentItem,
                                                InputStream source,
                                                boolean verbose)
                                         throws Exception
        Description copied from interface: FormatFilter
        Read the source stream and produce the filtered content.
        Parameters:
        currentItem - the Item being processed
        source - input stream of the source bitstream
        verbose - verbosity flag
        Returns:
        result of filter's transformation as a byte stream.
        Throws:
        Exception - if an error occurs while filtering
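The read-then-return-a-byte-stream contract can be sketched without any DSpace or POI dependency. In this rough illustration, `DestinationStreamSketch`, `TextExtractor`, and `toDestinationStream` are hypothetical stand-ins; the real filter would delegate to a Word text extractor. The UTF-8 encoding and the behaviour of `verbose` (echoing the extracted text) are assumptions, not confirmed by this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class DestinationStreamSketch {

    /** Hypothetical stand-in for the component that pulls text out of a Word file. */
    interface TextExtractor {
        String extract(InputStream source) throws IOException;
    }

    // Reads the source stream through the extractor and returns the
    // extracted text as a byte stream (UTF-8 is an assumption here).
    static InputStream toDestinationStream(InputStream source,
                                           TextExtractor extractor,
                                           boolean verbose) throws IOException {
        String text = extractor.extract(source);
        if (verbose) {
            // Assumed behaviour: echo the extracted text when verbose is set.
            System.out.println(text);
        }
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        InputStream fakeSource =
            new ByteArrayInputStream("Hello from a document".getBytes(StandardCharsets.UTF_8));
        // Pass-through extractor used only for the demo; it treats the input as plain text.
        TextExtractor passThrough =
            in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);

        InputStream result = toDestinationStream(fakeSource, passThrough, false);
        // prints "Hello from a document"
        System.out.println(new String(result.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Returning an `InputStream` rather than a `String` lets the caller store the result as a new bitstream without caring how the text was produced.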