Class SURTTokenizer

java.lang.Object
org.archive.wayback.surt.SURTTokenizer

public class SURTTokenizer extends Object
Provides iterative URL reduction for prefix matching, to find ever coarser-grained URL-specific configuration. Assumes that a prefix binary search is attempted for each returned value. The first value is the entire SURT URL String, with a TAB appended. The second removes the CGI arguments. Then each subsequent path segment ('/'-separated) is removed. Then the login:password, if present, is removed. Then the port is removed, if it is not :80 and was not omitted from the initial URL. Then each subsequent authority segment ('.'-separated) is removed. Finally, nextSearch() returns null when no broader searches can be attempted on the URL.
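The reduction order described above can be illustrated with a minimal, self-contained sketch. It is not the Wayback implementation: the class name PrefixReductionSketch, its next() and allSearches() methods, and the simplified key form "org,archive,www)/foo/bar?x=1" (reversed, comma-separated host labels, then ')' and the path) are assumptions made for illustration, and the login:password and port steps are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration of iterative prefix reduction; NOT the Wayback code. */
final class PrefixReductionSketch {
    private String remainder;      // current, progressively shortened key
    private boolean first = true;  // true until the full key has been returned

    PrefixReductionSketch(String surtLikeKey) {
        this.remainder = surtLikeKey;
    }

    /** Returns the next broader search string, or null when none is left. */
    String next() {
        if (remainder == null) {
            return null;
        }
        if (first) {
            first = false;
            return remainder + "\t";                    // entire key, TAB appended
        }
        int query = remainder.indexOf('?');
        if (query >= 0) {
            remainder = remainder.substring(0, query);  // drop the CGI arguments
            return remainder;
        }
        int paren = remainder.indexOf(')');
        if (paren >= 0) {
            // Drop the last path segment; once only the root path is left,
            // drop the path entirely and keep the authority.
            int lastSlash = remainder.lastIndexOf('/');
            int cut = (lastSlash > paren + 1) ? lastSlash : paren;
            remainder = remainder.substring(0, cut);
            return remainder;
        }
        int lastComma = remainder.lastIndexOf(',');
        if (lastComma >= 0) {
            remainder = remainder.substring(0, lastComma);  // drop one host label
            return remainder;
        }
        remainder = null;                               // nothing broader to try
        return null;
    }

    /** Collects every search string produced for the given key, in order. */
    static List<String> allSearches(String key) {
        PrefixReductionSketch sketch = new PrefixReductionSketch(key);
        List<String> out = new ArrayList<>();
        for (String s = sketch.next(); s != null; s = sketch.next()) {
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : allSearches("org,archive,www)/foo/bar?x=1")) {
            System.out.println(s.replace("\t", "\\t"));  // make the TAB visible
        }
    }
}
```

Each call yields a strictly shorter string than the previous one, so a caller can loop until null and stop at the first search string that finds a match.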
Version:
$Date$, $Revision$
Author:
brad
  • Constructor Details

    • SURTTokenizer

      public SURTTokenizer(String url, boolean isSurt) throws org.apache.commons.httpclient.URIException
      Constructs a tokenizer for the given URL.
      Parameters:
      url - String URL
      Throws:
      org.apache.commons.httpclient.URIException
  • Method Details

    • nextSearch

      public String nextSearch()
      Updates internal state and returns the next, shorter search string for the URL.
      Returns:
      String to look up for a prefix match, or null when no broader search can be attempted.
    • exactKey

      public static String exactKey(String url) throws org.apache.commons.httpclient.URIException
      Parameters:
      url - String URL
      Returns:
      SURT String that matches exactly the argument url
      Throws:
      org.apache.commons.httpclient.URIException
    • prefixKey

      public static String prefixKey(String url) throws org.apache.commons.httpclient.URIException
      Parameters:
      url - String URL
      Returns:
      SURT String that matches URLs prefixed with the argument url
      Throws:
      org.apache.commons.httpclient.URIException
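
The class description assumes that each value returned by nextSearch() is used for a prefix binary search. The following is a minimal sketch of such a lookup under stated assumptions: the rules are held as sorted, tab-delimited "key TAB value" lines, the candidate strings are hard-coded stand-ins rather than actual output of this class, and PrefixLookupSketch and firstMatch() are hypothetical names. A sorted set's ceiling() stands in for the binary search.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Illustrative prefix lookup over sorted lines; NOT part of the Wayback API. */
final class PrefixLookupSketch {

    /**
     * Tries each candidate in order (most specific first) and returns the
     * first line that starts with a candidate, or null if none does.
     */
    static String firstMatch(NavigableSet<String> sortedLines, List<String> candidates) {
        for (String candidate : candidates) {
            // All lines starting with the candidate sort contiguously, beginning
            // at the smallest line >= candidate, so checking that line suffices.
            String line = sortedLines.ceiling(candidate);
            if (line != null && line.startsWith(candidate)) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NavigableSet<String> rules = new TreeSet<>(Arrays.asList(
                "com,example\tblock",
                "org,archive,www)/foo\tallow"));
        // Hard-coded, illustrative candidates, ordered from narrow to broad.
        List<String> candidates = Arrays.asList(
                "org,archive,www)/foo/bar\t",
                "org,archive,www)/foo/bar",
                "org,archive,www)/foo",
                "org,archive,www");
        System.out.println(firstMatch(rules, candidates));
    }
}
```

In this sketch the trailing TAB on the first candidate means it can only match a line whose key equals the full string, while the later candidates match any line that begins with them. Whether that is the intended purpose of the appended TAB is an assumption.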