Class SitemapsOrgGenerator


  • public class SitemapsOrgGenerator
    extends AbstractGenerator
    Class for generating Sitemaps to improve search engine coverage of the DSpace site and limit the server load caused by crawlers.
    Author:
    Robert Tansley, Stuart Lewis
    • Field Detail

      • indexURLStem

        protected String indexURLStem
        Stem of the URLs at which the sitemaps will eventually appear
      • indexURLTail

        protected String indexURLTail
        Tail of the URLs at which the sitemaps will eventually appear
      • w3dtfFormat

        protected DateFormat w3dtfFormat
        Date format for timestamps written to sitemaps (W3C Datetime, as required by the sitemaps.org protocol)
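The field name points to the W3C Datetime (W3DTF) profile of ISO 8601, which the sitemaps.org protocol uses for `<lastmod>` values. Below is a minimal sketch of such a formatter. It assumes a full timestamp with seconds, rendered in UTC with a literal `Z` suffix. `W3dtf` is a hypothetical helper and does not reproduce how DSpace initialises the field.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: formats dates in W3C Datetime form, e.g. 1970-01-01T00:00:00Z.
final class W3dtf {
    static DateFormat newFormat() {
        DateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        // The literal 'Z' claims UTC, so the formatter must actually use the UTC zone.
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    static String format(Date date) {
        // SimpleDateFormat is not thread-safe; this sketch creates a fresh one per call.
        return newFormat().format(date);
    }
}
```

Because `SimpleDateFormat` keeps mutable state, a shared instance (such as a protected field) should not be used from several threads at once. `java.time.format.DateTimeFormatter.ISO_INSTANT` is an immutable alternative.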
    • Constructor Detail

      • SitemapsOrgGenerator

        public SitemapsOrgGenerator(File outputDirIn,
                                    String urlStem,
                                    String urlTail)
        Construct a sitemaps.org protocol sitemap generator that writes files to the given directory. The sitemaps will eventually be exposed at URLs that start with the given URL stem and end with the given tail.
        Parameters:
        outputDirIn - Directory to write sitemap files to
        urlStem - start of URL that sitemap files will appear at, e.g. http://dspace.myu.edu/sitemap?sitemap=
        urlTail - end of URL that sitemap files will appear at, e.g. .html or null
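The example stem (`...?sitemap=`) suggests that a sitemap's public URL is formed by concatenating the stem, the sitemap's index number, and the tail, with a `null` tail treated as empty. The sketch below illustrates that assumption. `SitemapUrls` and `sitemapUrl` are hypothetical names and are not part of the DSpace API.

```java
// Hypothetical illustration of composing a sitemap URL from stem, index and tail.
final class SitemapUrls {
    private final String stem;
    private final String tail;

    SitemapUrls(String urlStem, String urlTail) {
        this.stem = urlStem;
        // The constructor docs allow a null tail; treat it as an empty string.
        this.tail = (urlTail == null) ? "" : urlTail;
    }

    // Assumption: the zero-based sitemap number sits between the stem and the tail.
    String sitemapUrl(int number) {
        return stem + number + tail;
    }
}
```

For example, with the stem `http://dspace.myu.edu/sitemap?sitemap=` and a `null` tail, sitemap 0 would appear at `http://dspace.myu.edu/sitemap?sitemap=0` under this assumption.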
    • Method Detail

      • getFilename

        public String getFilename(int number)
        Description copied from class: AbstractGenerator
        Return the filename a sitemap at the given index should be stored at.
        Specified by:
        getFilename in class AbstractGenerator
        Parameters:
        number - index of the sitemap file (zero is first).
        Returns:
        the filename to write the sitemap to.
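Here is a minimal sketch of a `getFilename`-style mapping from index to file name. It assumes a `sitemap<N>.xml` pattern with `.gz` appended when compression is on. The actual names this class produces are not documented here, so both the pattern and the `SitemapFilenames` helper are hypothetical.

```java
// Hypothetical filename scheme: sitemap0.xml, sitemap1.xml, ... (".gz" appended when compressed).
final class SitemapFilenames {
    static String filename(int number, boolean compressed) {
        // Index zero is the first sitemap, so negative numbers are rejected.
        if (number < 0) {
            throw new IllegalArgumentException("sitemap index must be >= 0: " + number);
        }
        return "sitemap" + number + (compressed ? ".xml.gz" : ".xml");
    }
}
```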
      • getMaxSize

        public int getMaxSize()
        Description copied from class: AbstractGenerator
        Return the maximum size in bytes that an individual sitemap file should be.
        Specified by:
        getMaxSize in class AbstractGenerator
        Returns:
        the size in bytes.
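The current sitemaps.org protocol caps an uncompressed sitemap file at 50 MB (52,428,800 bytes). The value this class actually returns is not stated here. A generator has to measure entries in encoded bytes, not characters, to respect such a limit. The sketch below assumes UTF-8 output and reserves space for the fixed header and closing tag. `ByteBudget` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical byte-budget tracker for one sitemap file.
final class ByteBudget {
    private final int maxBytes;
    private long usedBytes;

    ByteBudget(int maxBytes, String header, String footer) {
        this.maxBytes = maxBytes;
        // Reserve room for the fixed XML header and closing tag up front.
        this.usedBytes = utf8Length(header) + utf8Length(footer);
    }

    // Size in bytes once encoded, which differs from length() for non-ASCII text.
    static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Records the entry and returns true if it fits; false means a new file is needed.
    boolean tryAdd(String entry) {
        long size = utf8Length(entry);
        if (usedBytes + size > maxBytes) {
            return false;
        }
        usedBytes += size;
        return true;
    }

    long used() {
        return usedBytes;
    }
}
```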
      • getMaxURLs

        public int getMaxURLs()
        Description copied from class: AbstractGenerator
        Return the maximum number of URLs that an individual sitemap file should contain.
        Specified by:
        getMaxURLs in class AbstractGenerator
        Returns:
        the maximum number of URLs.
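The sitemaps.org protocol allows at most 50,000 URLs per sitemap file. When a site has more URLs than that, they must be spread across several sitemap files that an index then lists. The sketch below shows that split. `UrlChunker` is a hypothetical helper, and `maxUrls` stands in for whatever `getMaxURLs()` returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper splitting URLs into chunks of at most maxUrls each.
final class UrlChunker {
    static <T> List<List<T>> chunk(List<T> urls, int maxUrls) {
        if (maxUrls <= 0) {
            throw new IllegalArgumentException("maxUrls must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += maxUrls) {
            int end = Math.min(start + maxUrls, urls.size());
            // Copy the subList view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(urls.subList(start, end)));
        }
        return chunks;
    }

    // Ceiling division (in long to avoid overflow): sitemap files needed for urlCount URLs.
    static int filesNeeded(int urlCount, int maxUrls) {
        if (maxUrls <= 0) {
            throw new IllegalArgumentException("maxUrls must be positive");
        }
        return (int) (((long) urlCount + maxUrls - 1) / maxUrls);
    }
}
```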
      • getURLText

        public String getURLText(String url,
                                 Date lastMod)
        Description copied from class: AbstractGenerator
        Return marked-up text to be included in a sitemap about a given URL.
        Specified by:
        getURLText in class AbstractGenerator
        Parameters:
        url - URL to add information about
        lastMod - date URL was last modified, or null if unknown or not applicable
        Returns:
        the mark-up to include
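In the sitemaps.org format, each URL is described by a `<url>` element with a required `<loc>` and an optional `<lastmod>`. The protocol also requires `&`, `<`, `>`, `"` and `'` in URLs to be entity-escaped. The sketch below builds such an entry and omits `<lastmod>` when the date is `null`. It is an illustration under those assumptions, not this class's actual output. `UrlEntry` is a hypothetical name.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical builder for one sitemaps.org <url> entry.
final class UrlEntry {
    // Entity-escape the five characters the sitemaps.org protocol lists.
    static String xmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String urlText(String url, Date lastMod) {
        StringBuilder sb = new StringBuilder("<url><loc>").append(xmlEscape(url)).append("</loc>");
        // A null lastMod means unknown or not applicable, so <lastmod> is left out.
        if (lastMod != null) {
            SimpleDateFormat w3dtf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            w3dtf.setTimeZone(TimeZone.getTimeZone("UTC"));
            sb.append("<lastmod>").append(w3dtf.format(lastMod)).append("</lastmod>");
        }
        return sb.append("</url>\n").toString();
    }
}
```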
      • useCompression

        public boolean useCompression()
        Description copied from class: AbstractGenerator
        Return whether the written sitemap files and index should be GZIP-compressed.
        Specified by:
        useCompression in class AbstractGenerator
        Returns:
        true if GZIP compression should be used, false otherwise.
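The sitemaps.org protocol permits gzip-compressed sitemap and index files. In Java, that usually means wrapping the destination stream in `java.util.zip.GZIPOutputStream` before writing. The sketch below does this conditionally and returns a `PrintStream`, matching the type `writeIndex` receives. `MaybeGzip` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical: wrap the destination in GZIP only when compression is requested.
final class MaybeGzip {
    static PrintStream open(OutputStream raw, boolean useCompression) throws IOException {
        OutputStream out = useCompression ? new GZIPOutputStream(raw) : raw;
        // Explicit UTF-8 so the bytes match an encoding="UTF-8" XML declaration.
        return new PrintStream(out, false, "UTF-8");
    }
}
```

Closing the returned `PrintStream` also closes the `GZIPOutputStream`, which writes the gzip trailer. If the stream is never closed, the compressed file will be truncated.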
      • writeIndex

        public void writeIndex(PrintStream output,
                               int sitemapCount)
                        throws IOException
        Description copied from class: AbstractGenerator
        Write the index file.
        Specified by:
        writeIndex in class AbstractGenerator
        Parameters:
        output - stream to write the index to
        sitemapCount - number of sitemaps that were generated
        Throws:
        IOException - if an IO error occurs
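A sitemaps.org index file is a `<sitemapindex>` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. It contains one `<sitemap>` entry, with `<loc>` and optional `<lastmod>`, per sitemap file. Below is a minimal sketch of writing one. It assumes sitemap *i* is published at stem + *i* + tail. Unlike the documented method, it takes the stem, tail and timestamp as explicit parameters so that it stands alone. `SitemapIndexWriter` is a hypothetical name.

```java
import java.io.PrintStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical index writer following the sitemaps.org <sitemapindex> format.
final class SitemapIndexWriter {
    static void writeIndex(PrintStream output, int sitemapCount,
                           String urlStem, String urlTail, Date generated) {
        SimpleDateFormat w3dtf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        w3dtf.setTimeZone(TimeZone.getTimeZone("UTC"));
        String now = w3dtf.format(generated);
        String tail = (urlTail == null) ? "" : urlTail;

        output.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        output.println("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
        for (int i = 0; i < sitemapCount; i++) {
            // Assumption: sitemap i lives at stem + i + tail, and stem/tail are already XML-safe.
            output.println("<sitemap><loc>" + urlStem + i + tail + "</loc>"
                    + "<lastmod>" + now + "</lastmod></sitemap>");
        }
        output.println("</sitemapindex>");
    }
}
```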