public abstract class BaseSeimiCrawler extends Object implements SeimiCrawler
| Modifier and Type | Field and Description |
|---|---|
| protected org.apache.http.client.CookieStore | cookieStore |
| protected String | crawlerName |
| protected org.slf4j.Logger | logger |
| protected SeimiQueue | queue |
| Constructor and Description |
|---|
| BaseSeimiCrawler() |
| Modifier and Type | Method and Description |
|---|---|
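| String[] | allowRules(): Defines the URL-matching rules for requests that are allowed. |
| String[] | denyRules(): Defines the URL-matching rules for requests that should be dropped. |
| org.apache.http.client.CookieStore | getCookieStore(): Returns the cookie store, if cookies are enabled. |
| String | getCrawlerName() |
| String | getUserAgent() |
| String | proxy(): Can be overridden to return a custom proxy, for example a randomly chosen one. |
| protected void | push(Request request) |
| void | setCrawlerName(String crawlerName) |
| void | setQueue(SeimiQueue queue) |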
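The allow and deny rules in the method summary can be illustrated with a small, self-contained sketch. It does not use the SeimiCrawler library: the class `UrlRuleSketch`, its `accept` helper, the sample patterns, and the policy itself (rules treated as full-match regular expressions, deny rules taking precedence, an empty allow list accepting everything) are all assumptions made for illustration, not the framework's documented behavior.

```java
import java.util.regex.Pattern;

public class UrlRuleSketch {
    // Hypothetical stand-ins for the arrays a crawler might return
    // from allowRules() and denyRules().
    public static final String[] ALLOW = {"https?://example\\.com/.*"};
    public static final String[] DENY = {".*\\.(jpg|png)$"};

    // True if the whole URL matches at least one rule.
    public static boolean matchesAny(String url, String[] rules) {
        if (rules == null) {
            return false;
        }
        for (String rule : rules) {
            if (Pattern.matches(rule, url)) {
                return true;
            }
        }
        return false;
    }

    // Assumed policy: a deny match always rejects; otherwise the URL must
    // match an allow rule, unless no allow rules are configured at all.
    public static boolean accept(String url, String[] allow, String[] deny) {
        if (matchesAny(url, deny)) {
            return false;
        }
        return allow == null || allow.length == 0 || matchesAny(url, allow);
    }

    public static void main(String[] args) {
        System.out.println(accept("http://example.com/page", ALLOW, DENY));
        System.out.println(accept("http://example.com/logo.png", ALLOW, DENY));
        System.out.println(accept("http://other.org/", ALLOW, DENY));
    }
}
```

In a real crawler the two arrays would come from overriding `allowRules()` and `denyRules()` in a `BaseSeimiCrawler` subclass; how the framework combines them should be checked against its own source.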
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface SeimiCrawler: start, startUrls
protected SeimiQueue queue
protected org.apache.http.client.CookieStore cookieStore
protected org.slf4j.Logger logger
protected String crawlerName
protected void push(Request request)
public String getUserAgent()
Specified by: getUserAgent in interface SeimiCrawler
public org.apache.http.client.CookieStore getCookieStore()
Specified by: getCookieStore in interface SeimiCrawler
public String[] allowRules()
Specified by: allowRules in interface SeimiCrawler
public String[] denyRules()
Specified by: denyRules in interface SeimiCrawler
public String proxy()
Specified by: proxy in interface SeimiCrawler
public void setQueue(SeimiQueue queue)
public void setCrawlerName(String crawlerName)
public String getCrawlerName()
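The note on `proxy()` says it can return a custom, randomly chosen proxy. A minimal sketch of that idea follows. The class `RandomProxySketch`, the proxy address strings, and the choice to return `null` when no proxy is configured are hypothetical; the format the framework expects for the returned string is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomProxySketch {
    private final List<String> proxies;
    private final Random random;

    public RandomProxySketch(List<String> proxies, Random random) {
        // Copy the list so later changes by the caller do not affect us.
        this.proxies = new ArrayList<>(proxies);
        this.random = random;
    }

    // Same shape as proxy(): takes no arguments and returns a String.
    // Returning null for "no proxy" is an assumption of this sketch.
    public String proxy() {
        if (proxies.isEmpty()) {
            return null;
        }
        return proxies.get(random.nextInt(proxies.size()));
    }

    public static void main(String[] args) {
        List<String> pool = new ArrayList<>();
        pool.add("http://10.0.0.1:8080"); // hypothetical proxy addresses
        pool.add("http://10.0.0.2:8080");
        RandomProxySketch sketch = new RandomProxySketch(pool, new Random());
        System.out.println(sketch.proxy());
    }
}
```

A subclass of `BaseSeimiCrawler` could hold such a pool as a field and delegate to it from its own `proxy()` override.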
Copyright © 2015. All Rights Reserved.