cz.muni.pdfjbim
Class PdfImageExtractor

java.lang.Object
  extended by cz.muni.pdfjbim.PdfImageExtractor

public class PdfImageExtractor
extends Object

Class that extracts images from a PDF document.

Author:
Radim Hatlapatka (hata.radim@gmail.com)

Constructor Summary
PdfImageExtractor()
           
 
Method Summary
 void extractImages(File pdfFile, String password, Set<Integer> pagesToProcess, Boolean binarize)
          Extracts images from a PDF file.
 void extractImages(InputStream is, String password, Set<Integer> pagesToProcess, Boolean binarize)
          Extracts images by going through all COSObjects referenced from the xref table.
static void extractImages(String filename)
          Parses a PDF and extracts all the images.
 void extractImages(String pdfFile, String password, Set<Integer> pagesToProcess, Boolean binarize)
          Extracts images from a PDF file given by name.
 void extractImagesUsingPdfObjectAccess(String pdfFile, String prefix, String password, Set<Integer> pagesToProcess, Boolean binarize)
          Deprecated. Do not use; it does not work properly yet. Extracts images by going through the PDF tree structure.
 void extractImagesUsingPdfParser(InputStream is, String prefix, String password, Set<Integer> pagesToProcess, Boolean binarize)
          Extracts images by going through all COSObjects referenced from the xref table.
 void extractJbig2Images(InputStream is)
          Deprecated.  
 List<String> getNamesOfImages()
           
 List<PdfImageInformation> getOriginalImageInformations()
           
 String getUniqueFileName(String prefix, String suffix)
          Gets a file name that is not currently in use.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PdfImageExtractor

public PdfImageExtractor()
Method Detail

getNamesOfImages

public List<String> getNamesOfImages()
Returns:
list of image names

getOriginalImageInformations

public List<PdfImageInformation> getOriginalImageInformations()
Returns:
list of information about the images

extractImages

public void extractImages(File pdfFile,
                          String password,
                          Set<Integer> pagesToProcess,
                          Boolean binarize)
                   throws PdfRecompressionException
Extracts images from a PDF file.

Parameters:
pdfFile - input PDF file
password - password for access to PDF if needed
pagesToProcess - set of pages to process; if null, all pages are processed. Not working yet.
binarize - enables processing of non-bitonal images as well (LZW is still not processed because the output has inverted colors)
Throws:
PdfRecompressionException - if the images cannot be extracted from the PDF
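
The pagesToProcess argument is a Set<Integer>. As a minimal sketch of how a caller might build such a set from a page specification like "1-3,5", the helper below may be useful. PageSetSketch and parsePages are hypothetical names that are not part of cz.muni.pdfjbim, and the spec syntax is an assumption made for illustration. The commented-out call at the end follows the signature documented above; passing null as the password for an unprotected PDF is also an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of cz.muni.pdfjbim. */
final class PageSetSketch {

    /** Parses a spec such as "1-3,5" into a sorted set of page numbers. */
    static Set<Integer> parsePages(String spec) {
        Set<Integer> pages = new TreeSet<>();
        for (String part : spec.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue; // tolerate empty input and stray commas
            }
            int dash = token.indexOf('-');
            if (dash < 0) {
                // Single page number
                pages.add(Integer.parseInt(token));
            } else {
                // Inclusive range "first-last"
                int first = Integer.parseInt(token.substring(0, dash).trim());
                int last = Integer.parseInt(token.substring(dash + 1).trim());
                if (first > last) {
                    throw new IllegalArgumentException("Descending range: " + token);
                }
                for (int page = first; page <= last; page++) {
                    pages.add(page);
                }
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        Set<Integer> pagesToProcess = parsePages("1-3,5");
        System.out.println(pagesToProcess); // prints [1, 2, 3, 5]

        // With the library on the classpath, the set would be passed as the
        // third argument (the call declares PdfRecompressionException):
        // new PdfImageExtractor().extractImages(new File("input.pdf"), null, pagesToProcess, false);
    }
}
```

Because page selection is documented as not working yet, passing null (all pages) is the safer choice until that is resolved.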

extractImages

public void extractImages(String pdfFile,
                          String password,
                          Set<Integer> pagesToProcess,
                          Boolean binarize)
                   throws PdfRecompressionException
Extracts images from a PDF file given by name.

Parameters:
pdfFile - name of the input PDF file
password - password for access to PDF if needed
pagesToProcess - set of pages to process; if null, all pages are processed. Not working yet.
binarize - enables processing of non-bitonal images as well (LZW is still not processed because the output has inverted colors)
Throws:
PdfRecompressionException - if the images cannot be extracted from the PDF

extractImages

public void extractImages(InputStream is,
                          String password,
                          Set<Integer> pagesToProcess,
                          Boolean binarize)
                   throws PdfRecompressionException
Extracts images by going through all COSObjects referenced from the xref table.

Parameters:
is - input stream containing the input PDF file
password - password for access to PDF if needed
pagesToProcess - set of pages to process; if null, all pages are processed. Not working yet.
binarize - enables processing of non-bitonal images as well (LZW is still not processed because the output has inverted colors)
Throws:
PdfRecompressionException - if the images cannot be extracted from the PDF

extractImages

public static void extractImages(String filename)
                          throws IOException,
                                 com.itextpdf.text.DocumentException
Parses a PDF and extracts all the images.

Parameters:
filename - name of the PDF file to parse
Throws:
IOException
com.itextpdf.text.DocumentException

extractJbig2Images

public void extractJbig2Images(InputStream is)
                        throws PdfRecompressionException
Deprecated. 

Extracts JBIG2 images from the input stream, even if they are stored together with a global dictionary in a separate PDF object. Does not work yet; it is still in development.

Parameters:
is - input stream to extract the JBIG2 images from
Throws:
PdfRecompressionException

extractImagesUsingPdfParser

public void extractImagesUsingPdfParser(InputStream is,
                                        String prefix,
                                        String password,
                                        Set<Integer> pagesToProcess,
                                        Boolean binarize)
                                 throws PdfRecompressionException
Extracts images by going through all COSObjects referenced from the xref table.

Parameters:
is - input stream containing the PDF file
prefix - output basename for images
password - password for access to PDF if needed
pagesToProcess - set of pages to process; if null, all pages are processed. Not working yet.
binarize - enables processing of non-bitonal images as well (LZW is still not processed because the output has inverted colors)
Throws:
PdfRecompressionException - if the images cannot be extracted from the PDF

extractImagesUsingPdfObjectAccess

public void extractImagesUsingPdfObjectAccess(String pdfFile,
                                              String prefix,
                                              String password,
                                              Set<Integer> pagesToProcess,
                                              Boolean binarize)
                                       throws PdfRecompressionException
Deprecated. Do not use; it does not work properly yet. Extracts images by going through the PDF tree structure.

Parameters:
pdfFile - name of the input PDF file
prefix - output basename for images
password - password for access to PDF if needed
pagesToProcess - set of pages to process; if null, all pages are processed. Not working yet.
binarize - enables processing of non-bitonal images as well (LZW is still not processed because the output has inverted colors)
Throws:
PdfRecompressionException - if the images cannot be extracted from the PDF

getUniqueFileName

public String getUniqueFileName(String prefix,
                                String suffix)
Gets a file name that is not currently in use.

Parameters:
prefix - prefix of the file name
suffix - suffix of the file name
Returns:
a file name that is not currently in use
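
This page does not say how the unused name is found. One common approach, shown here only as a minimal sketch and not as the PdfImageExtractor implementation, is to insert a counter between the prefix and the suffix and increase it until no file with that name exists. UniqueNameSketch, uniqueFileName and the explicit dir parameter are hypothetical and were introduced for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for illustration; not the PdfImageExtractor implementation. */
final class UniqueNameSketch {

    /**
     * Returns prefix + counter + suffix for the smallest counter (starting at 0)
     * for which no file of that name exists in dir.
     */
    static String uniqueFileName(Path dir, String prefix, String suffix) {
        int counter = 0;
        String name = prefix + counter + suffix;
        while (Files.exists(dir.resolve(name))) {
            counter++;
            name = prefix + counter + suffix;
        }
        return name;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("unique-name-sketch");
        String first = uniqueFileName(dir, "image-", ".jb2");
        Files.createFile(dir.resolve(first)); // occupy the first name
        String second = uniqueFileName(dir, "image-", ".jb2");
        System.out.println(first + " " + second); // prints image-0.jb2 image-1.jb2
    }
}
```

A check-then-use scheme like this is not atomic: another process can create the file between the check and the write, so the returned name is only "not used right now", as the method description says.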


Copyright © 2012 Faculty of Informatics, Masaryk University, Brno. All Rights Reserved.