public abstract class BaseSeimiCrawler extends Object implements SeimiCrawler
| Modifier and Type | Field and Description |
|---|---|
| protected org.apache.http.client.CookieStore | cookieStore |
| protected String | crawlerName |
| protected org.slf4j.Logger | logger |
| protected SeimiQueue | queue |
| Constructor and Description |
|---|
| BaseSeimiCrawler() |
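| Modifier and Type | Method and Description |
|---|---|
| String[] | allowRules(): Sets the URL matching rules for requests that are allowed. |
| String[] | denyRules(): Sets the URL matching rules for requests that should be dropped instead of visited. |
| org.apache.http.client.CookieStore | getCookieStore(): If cookies are enabled, returns the CookieStore in use. |
| String | getCrawlerName() |
| String | getUserAgent() |
| void | handleErrorRequest(Request request): Called to record a failed request once its number of failed processing attempts exceeds the maximum retry count, whether that maximum was set by the developer or is the default. |
| String | proxy(): Can be overridden to return a randomly chosen proxy. |
| protected void | push(Request request) |
| String | seimiAgentHost(): Sets the SeimiAgent host address, e.g. seimi.wanghaomiao.cn or 10.10.15.211. |
| int | seimiAgentPort(): The port SeimiAgent listens on. |
| void | setCrawlerName(String crawlerName) |
| void | setQueue(SeimiQueue queue) |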
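
As a rough illustration of how the arrays returned by allowRules() and denyRules() might be combined, the following self-contained sketch filters URLs against two sets of regular expressions. The class name UrlRuleFilterSketch, the method isPermitted, the use of full-string regex matching, and the precedence (a deny match wins, an empty allow list permits everything) are all assumptions made for this example; this page does not specify how SeimiCrawler itself evaluates the rules.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not part of SeimiCrawler.
final class UrlRuleFilterSketch {
    private final String[] allowRules;
    private final String[] denyRules;

    UrlRuleFilterSketch(String[] allowRules, String[] denyRules) {
        // Treat a null array as "no rules of this kind".
        this.allowRules = allowRules == null ? new String[0] : allowRules;
        this.denyRules = denyRules == null ? new String[0] : denyRules;
    }

    // True if the whole URL matches at least one of the given regexes.
    private static boolean matchesAny(String[] rules, String url) {
        for (String rule : rules) {
            if (Pattern.matches(rule, url)) {
                return true;
            }
        }
        return false;
    }

    // Assumed precedence: deny rules are checked first; with no allow
    // rules, every URL that is not denied is permitted.
    boolean isPermitted(String url) {
        if (matchesAny(denyRules, url)) {
            return false;
        }
        return allowRules.length == 0 || matchesAny(allowRules, url);
    }

    public static void main(String[] args) {
        UrlRuleFilterSketch filter = new UrlRuleFilterSketch(
                new String[] {"https?://example\\.com/.*"},
                new String[] {".*\\.pdf"});
        System.out.println(filter.isPermitted("http://example.com/list"));
        System.out.println(filter.isPermitted("http://example.com/file.pdf"));
    }
}
```

In a BaseSeimiCrawler subclass, the two arrays would come from overriding allowRules() and denyRules(); the filtering step itself is presumably handled by the framework.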
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface SeimiCrawler: start, startUrls

protected SeimiQueue queue

protected org.apache.http.client.CookieStore cookieStore

protected org.slf4j.Logger logger

protected String crawlerName
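protected void push(Request request)

public String getUserAgent()
Specified by: getUserAgent in interface SeimiCrawler

public org.apache.http.client.CookieStore getCookieStore()
Description copied from interface SeimiCrawler: If cookies are enabled, returns the CookieStore in use.
Specified by: getCookieStore in interface SeimiCrawler

public String[] allowRules()
Description copied from interface SeimiCrawler: Sets the URL matching rules for requests that are allowed.
Specified by: allowRules in interface SeimiCrawler

public String[] denyRules()
Description copied from interface SeimiCrawler: Sets the URL matching rules for requests that should be dropped instead of visited.
Specified by: denyRules in interface SeimiCrawler

public String proxy()
Description copied from interface SeimiCrawler: Can be overridden to return a randomly chosen proxy.
Specified by: proxy in interface SeimiCrawler

public void handleErrorRequest(Request request)
Description copied from interface SeimiCrawler: Called to record a failed request once its number of failed processing attempts exceeds the maximum retry count, whether that maximum was set by the developer or is the default.
Specified by: handleErrorRequest in interface SeimiCrawler

public String seimiAgentHost()
Description copied from interface SeimiCrawler: Sets the SeimiAgent host address, e.g. seimi.wanghaomiao.cn or 10.10.15.211.
Specified by: seimiAgentHost in interface SeimiCrawler

public int seimiAgentPort()
Description copied from interface SeimiCrawler: The port SeimiAgent listens on.
Specified by: seimiAgentPort in interface SeimiCrawler

public void setQueue(SeimiQueue queue)

public void setCrawlerName(String crawlerName)

public String getCrawlerName()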
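
The proxy() hook is described on this page only as a place to return a random proxy. A minimal sketch of one way to pick one is shown here, kept independent of SeimiCrawler so it can be compiled on its own. The class ProxyPoolSketch, the method pick(), and the proxy addresses with their scheme://host:port format are hypothetical placeholders; check the SeimiCrawler documentation for the proxy string format it actually expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not part of SeimiCrawler.
final class ProxyPoolSketch {
    private final List<String> proxies;

    ProxyPoolSketch(List<String> proxies) {
        if (proxies == null || proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.proxies = Collections.unmodifiableList(new ArrayList<>(proxies));
    }

    // Returns one entry chosen uniformly at random from the pool.
    String pick() {
        int index = ThreadLocalRandom.current().nextInt(proxies.size());
        return proxies.get(index);
    }

    public static void main(String[] args) {
        // Placeholder addresses; the string format is an assumption.
        ProxyPoolSketch pool = new ProxyPoolSketch(
                Arrays.asList("http://10.0.0.1:8080", "http://10.0.0.2:8080"));
        System.out.println(pool.pick());
    }
}
```

A BaseSeimiCrawler subclass could then override proxy() and return the result of pick(), so that each call may yield a different proxy.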
Copyright © 2016. All Rights Reserved.