Package org.archive.wayback.surt
Class SURTTokenizer
java.lang.Object
org.archive.wayback.surt.SURTTokenizer
Provides iterative URL reduction for prefix matching, to find ever
coarser-grained URL-specific configuration. Assumes that a prefix binary
search is attempted for each returned value. The first value is the entire
SURT URL string with a TAB appended. The second removes the CGI arguments.
Then each subsequent path segment ('/'-separated) is removed. Then the
login:password, if present, is removed. Then the port is removed, if the
initial URL specified one other than :80. Then each subsequent authority
segment ('.'-separated) is removed. Finally, the nextSearch() method returns
null when no broader searches can be attempted on the URL.
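The reduction order above can be sketched in Java as follows. This is a hypothetical illustration, not the Wayback implementation: the class name SurtReductionSketch and its reductions() method are invented here. The input is assumed to be an already canonicalized, simplified SURT of the form "(org,archive,www,:8080@user:pass)/path?query" with the scheme omitted, and a default :80 port is assumed to have been dropped already, so any port still present is stripped.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the documented reduction order (not Wayback code).
 * Assumes a simplified SURT "(authority)path?query" with no scheme and with
 * a default :80 port already removed during canonicalization.
 */
public class SurtReductionSketch {

    /** Returns every search string in order, from most to least specific. */
    public static List<String> reductions(String surt) {
        int authEnd = surt.indexOf(')');
        if (!surt.startsWith("(") || authEnd < 0) {
            throw new IllegalArgumentException("expected \"(authority)path\" form: " + surt);
        }
        List<String> out = new ArrayList<>();
        out.add(surt + "\t");                          // 1. whole SURT, TAB appended

        int q = surt.indexOf('?', authEnd);
        String current = q >= 0 ? surt.substring(0, q) : surt;
        out.add(current);                              // 2. CGI arguments removed

        // 3. Drop one path segment at a time, keeping the trailing '/'.
        while (current.length() > authEnd + 2) {       // more than just ")/"
            String trimmed = current.endsWith("/")
                    ? current.substring(0, current.length() - 1) : current;
            int slash = trimmed.lastIndexOf('/');
            if (slash <= authEnd) break;
            current = trimmed.substring(0, slash + 1);
            out.add(current);
        }

        String authority = surt.substring(1, authEnd);
        int at = authority.indexOf('@');
        String hostPort = at >= 0 ? authority.substring(0, at) : authority;
        if (at >= 0) out.add("(" + hostPort);          // 4. login:password removed

        int colon = hostPort.indexOf(':');
        String hosts = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
        addIfNew(out, "(" + hosts);                    // 5. port removed (if any)

        // 6. Drop one authority segment at a time: "org,archive,www," -> "org,archive," -> "org,".
        while (true) {
            String trimmed = hosts.endsWith(",") ? hosts.substring(0, hosts.length() - 1) : hosts;
            int comma = trimmed.lastIndexOf(',');
            if (comma < 0) break;
            hosts = trimmed.substring(0, comma + 1);
            out.add("(" + hosts);
        }
        return out;
    }

    /** Skips a step that would repeat the previous search string. */
    private static void addIfNew(List<String> out, String s) {
        if (out.isEmpty() || !out.get(out.size() - 1).equals(s)) out.add(s);
    }

    public static void main(String[] args) {
        for (String s : reductions("(org,archive,www,:8080@user:pass)/a/b/c.html?x=1")) {
            System.out.println(s.replace("\t", "<TAB>"));
        }
    }
}
```

For the sample input in main() the sketch yields nine strings, from the full SURT with TAB appended down to "(org,". In this sketch each string is a prefix of the one before it (ignoring the TAB), which is what makes one binary-search probe per step into a sorted index practical. The TAB presumably keeps the first probe from matching longer keys in tab-delimited data, but that rationale is an assumption.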
- Version:
$Date$, $Revision$
- Author:
brad
Constructor Summary
- SURTTokenizer(String url)
Method Summary
- String nextSearch()
- String exactKey(String url)
- String prefixKey(String url)
Constructor Details
SURTTokenizer
Constructs a tokenizer for the given URL.
- Parameters:
url - String URL
- Throws:
org.apache.commons.httpclient.URIException
Method Details
nextSearch
Updates internal state and returns the next, coarser search string for the URL.
- Returns:
string to look up for a prefix match for relevant information, or null when no broader searches can be attempted
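One plausible calling pattern is to keep calling nextSearch() until it returns null, stopping at the first hit so that the finest-grained configuration wins. The sketch below shows that loop. KeyCursor, findConfig() and the sample keys are hypothetical: the cursor simply replays a precomputed list of illustrative search strings instead of reducing a real URL, and an exact Map lookup stands in for the prefix binary search the class documentation assumes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of consuming a nextSearch()-style cursor (not Wayback code). */
public class NextSearchLoopSketch {

    /** Replays search strings, most specific first; null once exhausted. */
    static final class KeyCursor {
        private final Iterator<String> it;

        KeyCursor(List<String> keysMostSpecificFirst) {
            this.it = keysMostSpecificFirst.iterator();
        }

        /** Next coarser search string, or null when no broader search remains. */
        String nextSearch() {
            return it.hasNext() ? it.next() : null;
        }
    }

    /**
     * Returns the value for the most specific key found, or null if none match.
     * Exact map lookup is a stand-in for a prefix binary search.
     */
    static String findConfig(KeyCursor cursor, Map<String, String> config) {
        for (String key = cursor.nextSearch(); key != null; key = cursor.nextSearch()) {
            String value = config.get(key);
            if (value != null) return value;   // first hit is the finest-grained match
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of(
                "(org,archive,", "site-wide rule",
                "(org,archive,www,)/a/", "rule for /a/");
        KeyCursor cursor = new KeyCursor(List.of(
                "(org,archive,www,)/a/b.html\t", "(org,archive,www,)/a/b.html",
                "(org,archive,www,)/a/", "(org,archive,www,)/",
                "(org,archive,www,", "(org,archive,", "(org,"));
        System.out.println(findConfig(cursor, config));   // prints "rule for /a/"
    }
}
```

Checking the most specific key first means a path-level rule overrides a site-wide one without any explicit priority field.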
exactKey
- Parameters:
url - String URL
- Returns:
String SURT which will exactly match the argument url
- Throws:
org.apache.commons.httpclient.URIException
prefixKey
- Parameters:
url - String URL
- Returns:
String SURT which will match URLs prefixed with the argument url
- Throws:
org.apache.commons.httpclient.URIException
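The difference between an exact key and a prefix key shows up when both are looked up in a sorted index. In the hypothetical sketch below, the index entries and keys are hand-written SURT-like strings rather than real exactKey()/prefixKey() output, and SurtKeyMatchSketch with its methods is invented for illustration. Exact matching is set membership; prefix matching uses a binary search (TreeSet.ceiling) followed by a scan over the contiguous run of entries that share the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch: exact vs prefix matching over a sorted SURT index. */
public class SurtKeyMatchSketch {

    /** Exact match: the index must contain the key itself. */
    static boolean matchesExactly(NavigableSet<String> index, String exactKey) {
        return index.contains(exactKey);
    }

    /**
     * Prefix match: ceiling() binary-searches for the first entry >= prefixKey;
     * since the set is sorted, every entry sharing the prefix follows it contiguously.
     */
    static List<String> matchesPrefix(NavigableSet<String> index, String prefixKey) {
        List<String> hits = new ArrayList<>();
        for (String s = index.ceiling(prefixKey);
                s != null && s.startsWith(prefixKey);
                s = index.higher(s)) {
            hits.add(s);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>(List.of(
                "(org,archive,)/", "(org,archive,www,)/", "(org,archive,www,)/about",
                "(org,archive,www,)/about/team", "(org,example,)/"));
        System.out.println(matchesExactly(index, "(org,archive,www,)/about"));  // prints "true"
        // prints "[(org,archive,www,)/about, (org,archive,www,)/about/team]"
        System.out.println(matchesPrefix(index, "(org,archive,www,)/about"));
    }
}
```

Because the match is purely on strings, a prefix ending in "/about" would also cover "/aboutus"; whether the real prefixKey() guards against this is not stated here.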