org.jasig.portlet.athletics.dao
Class ScreenScrapingAthleticsDaoImpl

java.lang.Object
  extended by org.jasig.portlet.athletics.dao.ScreenScrapingAthleticsDaoImpl
All Implemented Interfaces:
IAthleticsDao
Direct Known Subclasses:
UChicagoNewsAthleticsDaoImpl, UChicagoScoresAthleticsDaoImpl

public class ScreenScrapingAthleticsDaoImpl
extends Object
implements IAthleticsDao

ScreenScrapingAthleticsDaoImpl provides a reusable athletics DAO implementation intended for collecting information from HTML content. Whenever possible, results should instead be retrieved from a better-structured web service or other high-quality data source. This implementation uses OWASP AntiSamy to clean and validate external HTML pages, then uses an XSLT to transform each HTML page into the portlet's default athletics feed XML structure, at which point the data can be deserialized.

Author:
Jen Bourey, jennifer.bourey@gmail.com

Field Summary
protected  org.apache.commons.logging.Log log
           
 
Constructor Summary
ScreenScrapingAthleticsDaoImpl()
           
 
Method Summary
protected  String getCleanedHtmlContent(String html)
          Clean and validate raw HTML, returning valid XML.
 AthleticsFeed getFeed()
          Return an athletics feed representing all current news stories and competitions for all known sports.
protected  String getHtmlContent(String url)
          Get the raw HTML content for a specified URL.
 Sport getSport(String sportKey)
          Return details, news stories, and competitions for an individual sport.
protected  Sport getSportForXml(String xml)
          Deserialize athletics feed XML into a Sport object.
 Map<String,String> getSportUrls()
          Get the mapping of URLs by sport.
protected  String getXml(String cleanHtml)
          Transform clean and valid HTML into the portlet's default athletics format XML feed.
protected  void postProcessSport(Sport sport)
          Optional post-processing hook that allows subclasses to add custom cleanup logic after deserialization.
 void setPolicy(org.springframework.core.io.Resource config)
          Set the AntiSamy policy file to be used to clean and validate external HTML.
 void setSportUrls(Map<String,String> urlMap)
          Set the mapping of URLs for each sport.
 void setXslt(org.springframework.core.io.Resource xslt)
          Set the XSLT to be used to transform the cleaned and validated HTML to the portlet's default XML structure.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected org.apache.commons.logging.Log log
Constructor Detail

ScreenScrapingAthleticsDaoImpl

public ScreenScrapingAthleticsDaoImpl()
Method Detail

setXslt

public void setXslt(org.springframework.core.io.Resource xslt)
Set the XSLT to be used to transform the cleaned and validated HTML to the portlet's default XML structure.

Parameters:
xslt - the XSLT stylesheet resource

setPolicy

public void setPolicy(org.springframework.core.io.Resource config)
               throws org.owasp.validator.html.PolicyException,
                      IOException
Set the AntiSamy policy file to be used to clean and validate external HTML.

Parameters:
config - the AntiSamy policy file resource
Throws:
org.owasp.validator.html.PolicyException
IOException

setSportUrls

public void setSportUrls(Map<String,String> urlMap)
Set the mapping of URLs for each sport. This implementation assumes that each sport is represented by its own HTML page.

Parameters:
urlMap - the mapping of URLs for each sport

getSportUrls

public Map<String,String> getSportUrls()
Get the mapping of URLs by sport.

Returns:
the mapping of URLs by sport

getFeed

public AthleticsFeed getFeed()
Description copied from interface: IAthleticsDao
Return an athletics feed representing all current news stories and competitions for all known sports.

Specified by:
getFeed in interface IAthleticsDao
Returns:
an athletics feed of all current news stories and competitions for all known sports

getSport

public Sport getSport(String sportKey)
Description copied from interface: IAthleticsDao
Return details, news stories, and competitions for an individual sport.

Specified by:
getSport in interface IAthleticsDao
Returns:
details, news stories, and competitions for the requested sport

getHtmlContent

protected String getHtmlContent(String url)
                         throws org.apache.http.client.ClientProtocolException,
                                IOException
Get the raw HTML content for a specified URL.

Parameters:
url - the URL to retrieve
Returns:
the raw HTML content retrieved from the URL
Throws:
org.apache.http.client.ClientProtocolException
IOException

getCleanedHtmlContent

protected String getCleanedHtmlContent(String html)
                                throws org.owasp.validator.html.ScanException,
                                       org.owasp.validator.html.PolicyException
Clean and validate raw HTML, returning valid XML.

Parameters:
html - the raw HTML to clean and validate
Returns:
the cleaned and validated content as valid XML
Throws:
org.owasp.validator.html.ScanException
org.owasp.validator.html.PolicyException

getXml

protected String getXml(String cleanHtml)
                 throws TransformerException,
                        IOException
Transform clean and valid HTML into the portlet's default athletics format XML feed.

Parameters:
cleanHtml - the cleaned and validated HTML
Returns:
the athletics feed XML in the portlet's default format
Throws:
TransformerException
IOException
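
The transformation step can be illustrated with the JDK's built-in XSLT support. The following is a minimal sketch, not this class's actual implementation: the class name XsltTransformSketch, the stylesheet, and the sport and story element names are all hypothetical, since the portlet's real feed structure is not shown on this page. For brevity the sketch wraps TransformerException in an unchecked exception, whereas the documented method declares it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltTransformSketch {

    // Hypothetical stylesheet: copies the text of each <li class="story">
    // into a <story> element. The element names are assumptions made for
    // this example, not the portlet's real feed format.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\">"
        + "<sport>"
        + "<xsl:for-each select=\"//li[@class='story']\">"
        + "<story><xsl:value-of select=\"normalize-space(.)\"/></story>"
        + "</xsl:for-each>"
        + "</sport>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given stylesheet to well-formed (already cleaned) HTML
    // and returns the transformed XML as a string.
    static String transform(String cleanHtml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(cleanHtml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String cleanHtml =
              "<html><body><ul>"
            + "<li class=\"story\">Maroons win opener</li>"
            + "<li class=\"ad\">Buy tickets</li>"
            + "</ul></body></html>";
        System.out.println(transform(cleanHtml, XSLT));
    }
}
```

The input must already be well-formed XML for the transformer to parse it, which is why the cleaning step runs first.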

getSportForXml

protected Sport getSportForXml(String xml)
                        throws JAXBException
Deserialize athletics feed XML into a Sport object.

Parameters:
xml - the athletics feed XML to deserialize
Returns:
the deserialized Sport object
Throws:
JAXBException

postProcessSport

protected void postProcessSport(Sport sport)
Optional post-processing hook that allows subclasses to add custom cleanup logic after deserialization. The default implementation does nothing.

Parameters:
sport - the deserialized Sport object to post-process
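
The hook pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the portlet's code: Sport, BaseDao, and TrimmingDao here are hypothetical stand-in classes, the headlines field is invented for the example, and the sketch assumes the base class invokes the hook once after it has built the Sport object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PostProcessSketch {

    // Stand-in for the portlet's Sport class; the real class's
    // properties are not shown on this page.
    static class Sport {
        String name;
        List<String> headlines = new ArrayList<>();
    }

    // Stand-in for the base DAO: builds a Sport, then calls the hook.
    static class BaseDao {
        Sport load(String name, List<String> rawHeadlines) {
            Sport sport = new Sport();
            sport.name = name;
            sport.headlines.addAll(rawHeadlines);
            postProcessSport(sport); // hook runs after the object is populated
            return sport;
        }

        // Default implementation does nothing, as in the documented method.
        protected void postProcessSport(Sport sport) {
        }
    }

    // Hypothetical subclass that tidies up scraped text.
    static class TrimmingDao extends BaseDao {
        @Override
        protected void postProcessSport(Sport sport) {
            sport.headlines.replaceAll(String::trim); // strip stray whitespace
            sport.headlines.removeIf(String::isEmpty); // drop blank entries
        }
    }

    public static void main(String[] args) {
        Sport sport = new TrimmingDao()
                .load("Soccer", Arrays.asList("  Maroons win  ", "   "));
        System.out.println(sport.headlines);
    }
}
```

Keeping the default a no-op means a subclass overrides the hook only when its source pages need site-specific cleanup.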


Copyright © 2011 Jasig. All Rights Reserved.