Package org.lockss.laaws.crawler.wget
Class WgetCmdLineCrawler
java.lang.Object
org.lockss.laaws.crawler.impl.pluggable.CmdLineCrawler
org.lockss.laaws.crawler.wget.WgetCmdLineCrawler
All Implemented Interfaces:
PluggableCrawler
The type Wget cmd line crawler.
Nested Class Summary
Nested classes/interfaces inherited from class org.lockss.laaws.crawler.impl.pluggable.CmdLineCrawler
CmdLineCrawler.CommandLineBuilder, CmdLineCrawler.RunnableCrawlJob
Field Summary
Fields
Modifier and Type      Field                Description
static final String    ATTR_OUTPUT_LEVEL    The constant ATTR_OUTPUT_LEVEL.
static final String    ATTR_SUCCESS_CODE    The constant ATTR_SUCCESS_CODE.
Fields inherited from class org.lockss.laaws.crawler.impl.pluggable.CmdLineCrawler
ATTR_COMPRESS_WARC, ATTR_COMPRESSED_WARC_FILE_EXTENSION, ATTR_CRAWL_EXECUTOR_SPEC, ATTR_ERROR_LOG_LEVEL, ATTR_EXCLUDE_STATUS_PATTERN, ATTR_JOIN_OUTPUT_STREAMS, ATTR_OUTPUT_LOG_LEVEL, ATTR_PROC_EXIT_WAIT, ATTR_UNCOMPRESSED_WARC_FILE_EXTENSION, ATTR_UNSUPPORTED_PARAMS, cmdLineBuilder, compressWarc, config, crawlMap, DEFAULT_CMDLINE_CRAWL_EXECUTOR_SPEC, DEFAULT_COMPRESS_WARC, DEFAULT_COMPRESSED_WARC_FILE_EXTENSION, DEFAULT_ERROR_LOG_LEVEL, DEFAULT_EXCLUDE_STATUS_PATTERN, DEFAULT_JOIN_OUTPUT_STREAMS, DEFAULT_OUTPUT_LOG_LEVEL, DEFAULT_PROC_EXIT_WAIT, DEFAULT_UNCOMPRESSED_WARC_FILE_EXTENSION, errorLogLevel, excludeStatusPattern, outputLogLevel, pcManager, PREFIX, procExitWait, START_URL_KEY, unsupportedParams, URL_STEMS_KEY, warcFileFilter
Constructor Summary
Constructors
Constructor             Description
WgetCmdLineCrawler()
Method Summary
Modifier and Type    Method                                              Description
protected boolean    didCrawlSucceed(int exitCode)
                     getConfigOptions
double               getConnectTimeout()
double               getFetchDelay()
long                 getMaxRetries()
double               getReadTimeout()
double               getRetryDelay()
void                 updateCrawlerConfig(CrawlerConfig crawlerConfig)    Sets the configuration parameters for this crawler.
Methods inherited from class org.lockss.laaws.crawler.impl.pluggable.CmdLineCrawler
deleteAllCrawls, disable, getCmdLineBuilder, getCompressedWarcExtension, getConfig, getCrawl, getCrawlerConfig, getCrawlerId, getErrorLogLevel, getOutputLogLevel, getPluggableCrawlManager, getProcExitWait, getUncompressedWarcExtension, getUnsupportedParams, getWarcFileFilter, initCrawlScheduler, isCrawlerEnabled, isElgibleForCrawl, isJoinOutputStreams, requestCrawl, setCmdLineBuilder, setConfig, setCrawlManager, setNamespace, setPluggableCrawlManager, setV2Repo, shutdown, shutdownWithWait, stopCrawl, storeInRepository, updateAuConfig, useCompressWarc
Field Details
ATTR_SUCCESS_CODE
The constant ATTR_SUCCESS_CODE.
ATTR_OUTPUT_LEVEL
The constant ATTR_OUTPUT_LEVEL.
Constructor Details
WgetCmdLineCrawler
public WgetCmdLineCrawler()
Method Details
updateCrawlerConfig
public void updateCrawlerConfig(CrawlerConfig crawlerConfig)
Description copied from interface: PluggableCrawler
Sets the configuration parameters for this crawler.
Specified by:
updateCrawlerConfig in interface PluggableCrawler
Overrides:
updateCrawlerConfig in class CmdLineCrawler
Parameters:
crawlerConfig - the configuration parameters to use
getMaxRetries
public long getMaxRetries()
getRetryDelay
public double getRetryDelay()
getConnectTimeout
public double getConnectTimeout()
getReadTimeout
public double getReadTimeout()
getFetchDelay
public double getFetchDelay()
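The five getters above carry no descriptions on this page. As a rough illustration of how such values could feed a wget command line, the sketch below maps them onto GNU wget's documented `--tries`, `--waitretry`, `--connect-timeout`, `--read-timeout` and `--wait` options. The class `WgetTimingOptionsSketch` and its `buildArgs` method are hypothetical and not part of the LOCKSS API. The mapping from getters to flags, and the assumption that the delays and timeouts are in seconds, are guesses that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of org.lockss.laaws.crawler.wget. */
public class WgetTimingOptionsSketch {

    /**
     * Builds wget arguments from values shaped like the getters' return types.
     * Assumption: delays and timeouts are expressed in seconds.
     */
    public static List<String> buildArgs(long maxRetries, double retryDelay,
                                         double connectTimeout, double readTimeout,
                                         double fetchDelay) {
        List<String> args = new ArrayList<>();
        args.add("--tries=" + maxRetries);               // cf. getMaxRetries()
        args.add("--waitretry=" + retryDelay);           // cf. getRetryDelay()
        args.add("--connect-timeout=" + connectTimeout); // cf. getConnectTimeout()
        args.add("--read-timeout=" + readTimeout);       // cf. getReadTimeout()
        args.add("--wait=" + fetchDelay);                // cf. getFetchDelay()
        return args;
    }

    public static void main(String[] unused) {
        // Print the assembled options on one line.
        System.out.println(String.join(" ", buildArgs(3L, 10.0, 60.0, 30.0, 1.5)));
    }
}
```

The real crawler assembles its command line through the inherited CmdLineCrawler.CommandLineBuilder, whose interface is not shown on this page, so the sketch returns a plain list of strings instead.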
didCrawlSucceed
protected boolean didCrawlSucceed(int exitCode)
Overrides:
didCrawlSucceed in class CmdLineCrawler
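This page does not say how didCrawlSucceed decides that a crawl succeeded. Given the ATTR_SUCCESS_CODE constant, one plausible reading is that the process exit code is compared against a configured set of acceptable codes. The sketch below illustrates that idea only. The class `ExitCodeCheckSketch`, the comma-separated configuration format, and the fallback to exit code 0 are all assumptions, not the documented behavior of WgetCmdLineCrawler.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of an exit-code check; not the LOCKSS implementation. */
public class ExitCodeCheckSketch {

    private final Set<Integer> successCodes = new TreeSet<>();

    /**
     * @param codes comma-separated exit codes to treat as success, e.g. "0,8"
     *              (assumed format; falls back to 0 when nothing is configured)
     */
    public ExitCodeCheckSketch(String codes) {
        for (String part : codes.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                successCodes.add(Integer.parseInt(trimmed));
            }
        }
        if (successCodes.isEmpty()) {
            successCodes.add(0); // wget exits with 0 when no problems occurred
        }
    }

    /** Returns true when the process exit code is one of the accepted codes. */
    public boolean didCrawlSucceed(int exitCode) {
        return successCodes.contains(exitCode);
    }
}
```

For example, an instance built with "0, 8" would accept wget's exit code 8 (the server issued an error response) as well as 0, which can be useful when a crawl is still considered usable despite some 404 responses.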
getConfigOptions