org.dspace.app.mediafilter
Class XPDF2Text
java.lang.Object
org.dspace.app.mediafilter.MediaFilter
org.dspace.app.mediafilter.XPDF2Text
- All Implemented Interfaces:
- FormatFilter
public class XPDF2Text
- extends MediaFilter
Text MediaFilter for PDF sources
This filter produces extracted text suitable for building an index,
but not for display to end users.
It forks a process running the "pdftotext" program from the
XPdf suite -- see http://www.foolabs.com/xpdf/
XPdf is a suite of open-source PDF tools that has been widely ported
to Unix platforms; the tools used here (pdftoppm, pdftotext) also
run on Win32.
This was written for the FACADE project but it is not directly connected
to any of the other FACADE-specific software. The FACADE UI expects
to find thumbnail images for 3D PDFs generated by this filter.
Requires the following DSpace configuration property:
xpdf.path.pdftotext -- path to "pdftotext" executable (required!)
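The forking approach described above can be sketched roughly as follows. This is a minimal illustration and not the DSpace source: the class name PdfToTextSketch and the helpers buildCommand and extractText are hypothetical, the executable path is passed in directly instead of being read from xpdf.path.pdftotext, and the choice of the "-q" and "-enc UTF-8" options is an assumption about how pdftotext might be invoked.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: run pdftotext on a PDF file and return the extracted text.
public class PdfToTextSketch {

    // Build the argument list: pdftotext [options] PDF-file text-file.
    // "-q" suppresses messages; "-enc UTF-8" selects the output encoding (assumed choice).
    static List<String> buildCommand(String pdftotextPath, File pdf, File txt) {
        return Arrays.asList(pdftotextPath, "-q", "-enc", "UTF-8",
                             pdf.getPath(), txt.getPath());
    }

    // Fork pdftotext, wait for it to finish, and read back the text file it wrote.
    static String extractText(String pdftotextPath, File pdf)
            throws IOException, InterruptedException {
        File txt = File.createTempFile("pdftotext", ".txt");
        try {
            Process p = new ProcessBuilder(buildCommand(pdftotextPath, pdf, txt))
                    .redirectErrorStream(true)
                    .start();
            // Drain the child's output so it cannot block on a full pipe.
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // output is discarded
                }
            }
            int status = p.waitFor();
            if (status != 0) {
                throw new IOException("pdftotext exited with status " + status);
            }
            return new String(Files.readAllBytes(txt.toPath()), StandardCharsets.UTF_8);
        } finally {
            txt.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdfToTextSketch <path-to-pdftotext> <file.pdf>");
            return;
        }
        System.out.print(extractText(args[0], new File(args[1])));
    }
}
```

Writing to a temporary text file keeps the sketch simple; a real filter would also need to spool its source InputStream to a file first, since pdftotext reads the PDF from a file path.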
- Author:
- Larry Stone
- See Also:
MediaFilter
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
XPDF2Text
public XPDF2Text()
getFilteredName
public String getFilteredName(String oldFilename)
- Description copied from interface:
FormatFilter
- Get a filename for a newly created filtered bitstream
- Parameters:
oldFilename - name of source bitstream
- Returns:
- filename generated by the filter - for example, document.pdf
becomes document.pdf.txt
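The naming rule in the example above (document.pdf becomes document.pdf.txt) amounts to appending a ".txt" suffix. A minimal sketch, assuming that is all the method does; the class FilteredNameSketch is hypothetical and is not the DSpace implementation:

```java
// Hypothetical sketch of the naming rule described for getFilteredName.
public class FilteredNameSketch {

    // Append ".txt" to the source bitstream's name, keeping the original extension.
    static String getFilteredName(String oldFilename) {
        return oldFilename + ".txt";
    }

    public static void main(String[] args) {
        System.out.println(getFilteredName("document.pdf")); // prints document.pdf.txt
    }
}
```

Keeping the original extension in the derived name makes it easy to see which source bitstream a text bitstream was generated from.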
getBundleName
public String getBundleName()
- Returns:
- name of the bundle into which this filter will place its generated
Bitstreams
getFormatString
public String getFormatString()
- Returns:
- name of the bitstream format (say "HTML" or "Microsoft Word")
returned by this filter; look in the bitstream format registry or
mediafilter.cfg for valid format strings.
getDescription
public String getDescription()
- Returns:
- string describing the newly generated Bitstream; stating how it was
produced is a good idea
getDestinationStream
public InputStream getDestinationStream(InputStream sourceStream)
throws Exception
- Parameters:
sourceStream - input stream
- Returns:
- result of filter's transformation, written out to a bitstream
- Throws:
Exception
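A caller that wants the filter result as a String has to drain the returned InputStream. The sketch below shows one way to do that; it is illustrative only. The class DrainStreamSketch and the helper readAll are hypothetical, a ByteArrayInputStream stands in for the stream that getDestinationStream would return, and decoding as UTF-8 is an assumption about the encoding of the extracted text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read a filter's result stream fully into a String.
public class DrainStreamSketch {

    // Copy the stream into memory and decode it as UTF-8 (assumed encoding).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the InputStream returned by getDestinationStream(sourceStream).
        try (InputStream filtered = new ByteArrayInputStream(
                "extracted text".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(readAll(filtered));
        }
    }
}
```

Because the method is declared to throw Exception, real calling code would also need to catch or propagate that broader type.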
Copyright © 2010 The DSpace Foundation. All Rights Reserved.