Class CmdLineCrawl

java.lang.Object
org.lockss.laaws.crawler.impl.pluggable.PluggableCrawl
org.lockss.laaws.crawler.impl.pluggable.CmdLineCrawl

public class CmdLineCrawl extends PluggableCrawl
Wraps a single command-line crawl.
  • Field Details

    • crawler

      protected CmdLineCrawler crawler
    • threadName

      protected String threadName
    • command

      protected List<String> command
    • tmpDir

      protected File tmpDir
      The temporary directory used to store files for this crawl.
    • outputLogLevel

      protected String outputLogLevel
    • errorLogLevel

      protected String errorLogLevel
    • successPattern

      protected static Pattern successPattern
    • errorPattern

      protected static Pattern errorPattern
    • urlPattern

      protected static Pattern urlPattern
    • bytesPattern

      protected static Pattern bytesPattern
  • Constructor Details

    • CmdLineCrawl

      public CmdLineCrawl(CmdLineCrawler crawler, org.lockss.plugin.ArchivalUnit au, org.lockss.util.rest.crawler.CrawlJob crawlJob)
      Creates a new command-line crawl.
      Parameters:
      crawler - the crawler for this crawl
      au - the archival unit for this crawl
      crawlJob - the job for this crawl
  • Method Details

    • startCrawl

      public org.lockss.crawler.CrawlerStatus startCrawl()
      Description copied from class: PluggableCrawl
      Enqueue a crawl request.
      Specified by:
      startCrawl in class PluggableCrawl
      Returns:
      the crawler status
    • stopCrawl

      public org.lockss.crawler.CrawlerStatus stopCrawl()
      Description copied from class: PluggableCrawl
      Stops the crawl and returns the crawler status.
      Specified by:
      stopCrawl in class PluggableCrawl
      Returns:
      the crawler status
    • getTmpDir

      public File getTmpDir()
      Gets the temporary directory.
      Returns:
      the temporary directory
    • getWarcFiles

      public Collection<File> getWarcFiles(List<String> exts)
    • getCommand

      public List<String> getCommand()
      Gets the command.
      Returns:
      the command
    • getReqUrls

      protected List<String> getReqUrls()
    • getStems

      protected List<String> getStems()
    • getRunnable

      public org.lockss.daemon.LockssRunnable getRunnable()
    • parseLine

      public void parseLine(String pre, String line)
    • extractUrls

      public static List<String> extractUrls(String text)
      Returns a list of all links contained in the input text.
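
The behavior described for extractUrls can be illustrated with a minimal sketch. This is not the CmdLineCrawl implementation: the class name UrlExtractSketch and the regular expression are assumptions made for illustration, and the actual urlPattern field is not shown on this page and may match differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of the LOCKSS code base.
class UrlExtractSketch {

  // Assumed pattern: an http or https scheme followed by characters that are
  // not whitespace, quotes, or angle brackets. The real urlPattern may differ.
  private static final Pattern URL_PATTERN =
      Pattern.compile("https?://[^\\s\"'<>]+");

  // Returns every match of URL_PATTERN in the input, in order of appearance.
  static List<String> extractUrls(String text) {
    List<String> urls = new ArrayList<>();
    if (text == null) {
      return urls; // treat a null input as containing no links
    }
    Matcher matcher = URL_PATTERN.matcher(text);
    while (matcher.find()) {
      urls.add(matcher.group());
    }
    return urls;
  }

  public static void main(String[] args) {
    String line = "Saving http://example.org/a.html then https://example.org/b";
    // prints [http://example.org/a.html, https://example.org/b]
    System.out.println(extractUrls(line));
  }
}
```

With this simple pattern, punctuation that directly follows a URL (for example a trailing comma or period) is captured as part of the match, so a caller that parses free-form crawler output may want to trim such characters.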
    • extractBytes

      public static long extractBytes(String str)
    • toString

      public String toString()
      Overrides:
      toString in class Object