public class CmdLineCrawl extends PluggableCrawl

Nested classes inherited from class PluggableCrawl: PluggableCrawl.PluggableCrawlerStatus

| Modifier and Type | Field and Description |
|---|---|
| protected static Pattern | bytesPattern |
| protected List<String> | command |
| protected CmdLineCrawler | crawler |
| protected String | errorLogLevel |
| protected static Pattern | errorPattern |
| protected String | outputLogLevel |
| protected static Pattern | successPattern |
| protected String | threadName |
| protected File | tmpDir: The temp directory used to store any files. |
| protected static Pattern | urlPattern |

Fields inherited from class PluggableCrawl: au, crawlDesc, crawlerConfig, crawlerStatus, crawlJob

| Constructor and Description |
|---|
| CmdLineCrawl(CmdLineCrawler crawler, org.lockss.plugin.ArchivalUnit au, org.lockss.util.rest.crawler.CrawlJob crawlJob): Instantiates a new CmdLineCrawl. |

| Modifier and Type | Method and Description |
|---|---|
| static long | extractBytes(String str) |
| static List<String> | extractUrls(String text): Returns a list with all links contained in the input. |
| List<String> | getCommand(): Gets the command. |
| protected List<String> | getReqUrls() |
| org.lockss.daemon.LockssRunnable | getRunnable() |
| protected List<String> | getStems() |
| File | getTmpDir(): Gets the temp directory. |
| Collection<File> | getWarcFiles(List<String> exts) |
| void | parseLine(String pre, String line) |
| org.lockss.crawler.CrawlerStatus | startCrawl(): Enqueues a crawl request. |
| org.lockss.crawler.CrawlerStatus | stopCrawl(): Stops the crawl and returns the crawler status. |
| String | toString() |

Methods inherited from class PluggableCrawl: generateKey, getAu, getAuId, getCrawlDesc, getCrawlerConfig, getCrawlerId, getCrawlerStatus, getCrawlKey, getCrawlKind, getCrawlStatus, getJobStatus, setCrawlerStatus

protected CmdLineCrawler crawler
protected String threadName
protected File tmpDir
protected String outputLogLevel
protected String errorLogLevel
protected static Pattern successPattern
protected static Pattern errorPattern
protected static Pattern urlPattern
protected static Pattern bytesPattern
public CmdLineCrawl(CmdLineCrawler crawler, org.lockss.plugin.ArchivalUnit au, org.lockss.util.rest.crawler.CrawlJob crawlJob)
Parameters:
crawler - the crawler for this crawl
crawlJob - the job for this crawl

public org.lockss.crawler.CrawlerStatus startCrawl()
startCrawl in class PluggableCrawl

public org.lockss.crawler.CrawlerStatus stopCrawl()
stopCrawl in class PluggableCrawl

public File getTmpDir()
public Collection<File> getWarcFiles(List<String> exts)
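
The page does not describe how getWarcFiles selects files, so the following is only a minimal sketch of one plausible reading: list the regular files directly inside the temp directory and keep those whose names end with one of the given extensions. The class name WarcFilesSketch, the explicit tmpDir parameter, the non-recursive listing, and the case-insensitive match are all assumptions made for illustration, not the LOCKSS implementation.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the LOCKSS code. */
final class WarcFilesSketch {

  /** Returns the regular files directly under tmpDir whose names end with "." + one of exts. */
  static Collection<File> getWarcFiles(File tmpDir, List<String> exts) {
    List<File> matches = new ArrayList<>();
    File[] entries = tmpDir.listFiles(); // null if tmpDir is not a readable directory
    if (entries == null) {
      return matches;
    }
    for (File f : entries) {
      if (!f.isFile()) {
        continue; // assumption: subdirectories are not searched
      }
      String name = f.getName().toLowerCase(Locale.ROOT);
      for (String ext : exts) {
        if (name.endsWith("." + ext.toLowerCase(Locale.ROOT))) {
          matches.add(f);
          break; // avoid adding the same file twice
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("crawl").toFile();
    new File(dir, "out.warc.gz").createNewFile();
    new File(dir, "crawl.log").createNewFile();
    // Only out.warc.gz matches the requested extensions.
    System.out.println(getWarcFiles(dir, List.of("warc", "warc.gz")).size()); // prints 1
  }
}
```

In the real class the directory would presumably come from getTmpDir() rather than a parameter.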
public org.lockss.daemon.LockssRunnable getRunnable()
public static List<String> extractUrls(String text)
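
The summary describes extractUrls as returning a list with all links contained in the input. As a rough illustration of that behaviour, the sketch below scans a string with a regular expression and collects every match. The class name UrlExtractSketch and the pattern itself are assumptions; the actual urlPattern used by CmdLineCrawl is not shown on this page and may well differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not the LOCKSS code. */
final class UrlExtractSketch {

  // Assumed pattern: an http(s) URL running up to the next whitespace, quote, or angle bracket.
  private static final Pattern URL_PATTERN =
      Pattern.compile("https?://[^\\s\"'<>]+", Pattern.CASE_INSENSITIVE);

  /** Returns every substring of text that matches URL_PATTERN, in order of appearance. */
  static List<String> extractUrls(String text) {
    List<String> urls = new ArrayList<>();
    if (text == null) {
      return urls;
    }
    Matcher m = URL_PATTERN.matcher(text);
    while (m.find()) {
      urls.add(m.group());
    }
    return urls;
  }

  public static void main(String[] args) {
    String line = "Saved https://example.org/a.html and http://example.org/b";
    System.out.println(extractUrls(line));
    // prints [https://example.org/a.html, http://example.org/b]
  }
}
```

A simple pattern like this one keeps trailing punctuation (for example a sentence-ending period) as part of the URL, so a production pattern would likely need to be stricter.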
public static long extractBytes(String str)
Copyright © 2000–2023 LOCKSS Program. All rights reserved.