org.ow2.weblab.crawler
Class FolderCrawler

java.lang.Object
  extended by org.ow2.weblab.crawler.FolderCrawler

public class FolderCrawler
extends java.lang.Object

Use this component to crawl a folder. This is a basic component: no threads, no complex timings, no data comparison. A real crawler could use multiple instances of this component.

Author:
EADS DS

Field Summary
protected  int bufferSize
           
protected  org.ow2.weblab.content.ContentManager contentManager
           
protected static java.lang.String CRAWLER_CONTENT_ID
           
protected static java.lang.String CRAWLER_ID
           
protected static java.text.SimpleDateFormat DATE_FORMAT
           
protected  java.io.FileFilter fileFilter
           
protected  java.io.File folder
           
protected  java.io.FileFilter folderFilter
           
protected  boolean recursiveMode
           
 
Constructor Summary
FolderCrawler(org.ow2.weblab.content.ContentManager contentManager, java.io.File folder, java.io.FileFilter fileFilter, boolean recursiveMode)
          Constructors
FolderCrawler(org.ow2.weblab.content.ContentManager contentManager, java.io.File folder, java.io.FileFilter fileFilter, boolean recursiveMode, java.io.FileFilter folderFilter)
          Constructors
FolderCrawler(java.lang.String folderToCrawl)
          Constructors
FolderCrawler(java.lang.String folderToCrawl, java.io.FileFilter fileFilter)
          Constructors
FolderCrawler(java.lang.String folderToCrawl, java.io.FileFilter fileFilter, boolean recursiveMode)
          Constructors
FolderCrawler(java.lang.String folderToCrawl, java.io.FileFilter fileFilter, boolean recursiveMode, java.io.FileFilter folderFilter)
          Constructors
 
Method Summary
 org.ow2.weblab.core.model.ComposedResource getCrawledDocuments(int offset, int limit)
           
 int getNbFiles()
           
protected  void listAndAddFiles(java.io.File newFolder)
           
 void startCrawl()
          Crawls the folder using the file filter and fills the crawled files list.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

contentManager

protected final org.ow2.weblab.content.ContentManager contentManager

folder

protected final java.io.File folder

fileFilter

protected final java.io.FileFilter fileFilter

folderFilter

protected final java.io.FileFilter folderFilter

bufferSize

protected final int bufferSize
See Also:
Constant Field Values

recursiveMode

protected final boolean recursiveMode

CRAWLER_ID

protected static final java.lang.String CRAWLER_ID
See Also:
Constant Field Values

CRAWLER_CONTENT_ID

protected static final java.lang.String CRAWLER_CONTENT_ID
See Also:
Constant Field Values

DATE_FORMAT

protected static final java.text.SimpleDateFormat DATE_FORMAT
Constructor Detail

FolderCrawler

public FolderCrawler(org.ow2.weblab.content.ContentManager contentManager,
                     java.io.File folder,
                     java.io.FileFilter fileFilter,
                     boolean recursiveMode,
                     java.io.FileFilter folderFilter)
              throws org.ow2.weblab.core.extended.exception.WebLabCheckedException
Constructors

Parameters:
contentManager - The content manager
folder - The folder to crawl
fileFilter - The file filter to be used
recursiveMode - Whether or not to crawl contained folders
folderFilter - The folder filter to be used
Throws:
org.ow2.weblab.core.extended.exception.WebLabCheckedException - If one of the parameters is not correct or if the creation of mimeinfo throws an exception.
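The fileFilter and folderFilter parameters are plain java.io.FileFilter instances. As a minimal sketch (the CrawlerFilters class and its methods are hypothetical, not part of the WebLab API), the filters below accept regular files with a given extension and skip hidden folders. Whether FolderCrawler applies folderFilter exactly this way is an assumption.

```java
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical helper: builds filters that could be passed as the
// fileFilter and folderFilter arguments of the FolderCrawler constructors.
public final class CrawlerFilters {

    private CrawlerFilters() {
    }

    // Accepts regular files whose name ends with the given extension (case-insensitive).
    public static FileFilter byExtension(String extension) {
        final String suffix = "." + extension.toLowerCase(Locale.ROOT);
        return file -> file.isFile() && file.getName().toLowerCase(Locale.ROOT).endsWith(suffix);
    }

    // Accepts directories whose name does not start with '.' (Unix-style hidden folders).
    public static FileFilter visibleFolders() {
        return folder -> folder.isDirectory() && !folder.getName().startsWith(".");
    }
}
```

For example, new FolderCrawler(contentManager, folder, CrawlerFilters.byExtension("txt"), true, CrawlerFilters.visibleFolders()) would be expected to crawl only .txt files outside hidden folders, assuming the constructor behaves as documented above.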

FolderCrawler

public FolderCrawler(org.ow2.weblab.content.ContentManager contentManager,
                     java.io.File folder,
                     java.io.FileFilter fileFilter,
                     boolean recursiveMode)
              throws org.ow2.weblab.core.extended.exception.WebLabCheckedException
Constructors

Parameters:
contentManager - The content manager
folder - The folder to crawl
fileFilter - The file filter to be used
recursiveMode - Whether or not to crawl contained folders
Throws:
org.ow2.weblab.core.extended.exception.WebLabCheckedException - If one of the parameters is not correct or if the creation of mimeinfo throws an exception.

FolderCrawler

public FolderCrawler(java.lang.String folderToCrawl,
                     java.io.FileFilter fileFilter)
              throws org.ow2.weblab.core.extended.exception.WebLabCheckedException
Constructors

Parameters:
folderToCrawl - The folder to crawl
fileFilter - The file filter to be used
Throws:
org.ow2.weblab.core.extended.exception.WebLabCheckedException - If one of the parameters is not correct or if the creation of mimeinfo throws an exception.

FolderCrawler

public FolderCrawler(java.lang.String folderToCrawl,
                     java.io.FileFilter fileFilter,
                     boolean recursiveMode)
              throws org.ow2.weblab.core.extended.exception.WebLabCheckedException
Constructors

Parameters:
folderToCrawl - The folder to crawl
fileFilter - The file filter to be used
recursiveMode - Whether or not to crawl contained folders
Throws:
org.ow2.weblab.core.extended.exception.WebLabCheckedException - If one of the parameters is not correct or if the creation of mimeinfo throws an exception.

FolderCrawler

public FolderCrawler(java.lang.String folderToCrawl,
                     java.io.FileFilter fileFilter,
                     boolean recursiveMode,
                     java.io.FileFilter folderFilter)
              throws org.ow2.weblab.core.extended.exception.WebLabCheckedException
Constructors

Parameters:
folderToCrawl - The folder to crawl
fileFilter - The file filter to be used
recursiveMode - Whether or not to crawl contained folders
folderFilter - The folder filter to be used
Throws:
org.ow2.weblab.core.extended.exception.WebLabCheckedException - If one of the parameters is not correct or if the creation of mimeinfo throws an exception.

FolderCrawler

public FolderCrawler(java.lang.String folderToCrawl)
              throws org.ow2.weblab.core.extended.exception.WebLabCheckedException
Constructors

Parameters:
folderToCrawl - The folder to crawl
Throws:
org.ow2.weblab.core.extended.exception.WebLabCheckedException
Method Detail

getNbFiles

public int getNbFiles()
Returns:
The number of files crawled.

startCrawl

public void startCrawl()
Crawls the folder using the file filter and fills the crawled files list.


listAndAddFiles

protected void listAndAddFiles(java.io.File newFolder)
Parameters:
newFolder - The folder to be crawled
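The recursion performed by listAndAddFiles can be sketched with java.io.File alone. This is an assumption about the behavior, not the WebLab implementation: the hypothetical FolderWalker class keeps plain File objects instead of creating resources through the ContentManager, and treats a null folderFilter as accepting every sub-folder.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the crawling logic: it only collects File objects.
public class FolderWalker {

    private final FileFilter fileFilter;
    private final FileFilter folderFilter; // null means every sub-folder is accepted
    private final boolean recursiveMode;
    private final List<File> crawledFiles = new ArrayList<>();

    public FolderWalker(FileFilter fileFilter, boolean recursiveMode, FileFilter folderFilter) {
        this.fileFilter = fileFilter;
        this.recursiveMode = recursiveMode;
        this.folderFilter = folderFilter;
    }

    // Adds the regular files of newFolder that pass fileFilter, then descends
    // into accepted sub-folders when recursiveMode is true.
    protected void listAndAddFiles(File newFolder) {
        File[] files = newFolder.listFiles(fileFilter); // null filter accepts everything
        if (files != null) { // listFiles returns null if newFolder is not a readable directory
            for (File f : files) {
                if (f.isFile()) {
                    crawledFiles.add(f);
                }
            }
        }
        if (!recursiveMode) {
            return;
        }
        File[] children = newFolder.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory() && (folderFilter == null || folderFilter.accept(child))) {
                listAndAddFiles(child);
            }
        }
    }

    public List<File> getCrawledFiles() {
        return crawledFiles;
    }

    public int getNbFiles() {
        return crawledFiles.size();
    }
}
```

Calling listAndAddFiles once on the root folder mirrors what startCrawl is documented to do: fill the crawled files list using the file filter.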

getCrawledDocuments

public org.ow2.weblab.core.model.ComposedResource getCrawledDocuments(int offset,
                                                                      int limit)
Parameters:
offset - The starting point in the collection. If negative, 0 is used.
limit - If negative or zero, Integer.MAX_VALUE is used.
Returns:
A resource collection
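The offset and limit rules of getCrawledDocuments can be illustrated on a plain list. In this sketch the hypothetical Paging.page helper returns a sub-list rather than a ComposedResource, and assumes a zero limit is handled like a negative one (an int parameter cannot be null).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper applying the documented offset/limit rules to a list.
public final class Paging {

    private Paging() {
    }

    public static <T> List<T> page(List<T> items, int offset, int limit) {
        int start = Math.max(offset, 0);                    // negative offset -> 0
        int max = (limit <= 0) ? Integer.MAX_VALUE : limit; // negative or zero limit -> no limit
        if (start >= items.size()) {
            return Collections.emptyList();
        }
        // long arithmetic avoids int overflow when max == Integer.MAX_VALUE
        int end = (int) Math.min((long) start + max, items.size());
        return new ArrayList<>(items.subList(start, end));
    }
}
```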

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object


Copyright © 2004-2011. All Rights Reserved.