org.ow2.weblab.services.duplicates
Class DuplicatesDetectorService

java.lang.Object
  extended by org.ow2.weblab.services.duplicates.DuplicatesDetectorService
All Implemented Interfaces:
org.weblab_project.services.analyser.Analyser

public class DuplicatesDetectorService
extends java.lang.Object
implements org.weblab_project.services.analyser.Analyser


Field Summary
protected static java.lang.String CONFIG_FILE
          Configuration properties file for this service.
protected static java.util.Map<java.lang.String,java.lang.String> props
           
protected static java.lang.String SIMILARITY_LIMIT_PROPERTY
           
protected  javax.xml.ws.WebServiceContext wsContext
           
 
Constructor Summary
DuplicatesDetectorService()
           
 
Method Summary
 javax.xml.ws.WebServiceContext getWsContext()
           
 org.weblab_project.services.analyser.types.ProcessReturn process(org.weblab_project.services.analyser.types.ProcessArgs args)
          The process method consists of two steps.
 void setWsContext(javax.xml.ws.WebServiceContext wsContext)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

wsContext

protected javax.xml.ws.WebServiceContext wsContext

CONFIG_FILE

protected static final java.lang.String CONFIG_FILE
Configuration properties file for this service.

See Also:
Constant Field Values

SIMILARITY_LIMIT_PROPERTY

protected static final java.lang.String SIMILARITY_LIMIT_PROPERTY
See Also:
Constant Field Values

props

protected static java.util.Map<java.lang.String,java.lang.String> props
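
The following is a minimal sketch of how a properties file could be read into a Map<String,String> such as props and how a similarity limit could be looked up in it. It is not the service's actual code: the class ConfigSketch, its methods, the key "similarity.limit" and the fallback behaviour are all hypothetical, and the real values of CONFIG_FILE and SIMILARITY_LIMIT_PROPERTY are not shown on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only; not part of DuplicatesDetectorService.
final class ConfigSketch {

    // Hypothetical key; the real value of SIMILARITY_LIMIT_PROPERTY is not documented here.
    static final String SIMILARITY_LIMIT_KEY = "similarity.limit";

    /** Reads key=value pairs in java.util.Properties format into a plain map. */
    static Map<String, String> load(Reader reader) {
        Properties properties = new Properties();
        try {
            properties.load(reader);
        } catch (IOException e) {
            throw new IllegalStateException("Unable to read configuration", e);
        }
        Map<String, String> map = new HashMap<String, String>();
        for (String name : properties.stringPropertyNames()) {
            map.put(name, properties.getProperty(name));
        }
        return map;
    }

    /** Returns the configured limit as a percentage, or the fallback if absent or malformed. */
    static double similarityLimit(Map<String, String> props, double fallback) {
        String raw = props.get(SIMILARITY_LIMIT_KEY);
        if (raw == null) {
            return fallback;
        }
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

In this sketch a caller would pass a Reader opened on the configuration file to load and then call similarityLimit with a default of its choosing.
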
Constructor Detail

DuplicatesDetectorService

public DuplicatesDetectorService()
Method Detail

process

public org.weblab_project.services.analyser.types.ProcessReturn process(org.weblab_project.services.analyser.types.ProcessArgs args)
                                                                 throws org.weblab_project.services.analyser.ProcessException
The process method consists of two steps. First, the source property values are extracted from the resource and looked up in the duplicate index. If results are found, their texts are compared with the text of the input resource using the Levenshtein distance. If no resource is found through the source property, a "more like this" Solr query is tried instead, and the texts of its results are likewise compared using the Levenshtein distance.
The similarity limit, which is configurable in the "duplicates-detector.config" file, determines whether a document is a duplicate and is expressed as a percentage. This percentage is the ratio of unmodified characters to the total number of characters.

Specified by:
process in interface org.weblab_project.services.analyser.Analyser
Throws:
org.weblab_project.services.analyser.ProcessException
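
The similarity measure described for process can be illustrated with the sketch below. It is an assumption-laden example, not the service's implementation: the class SimilaritySketch and its method names are hypothetical, the "total number of characters" is taken here to be the length of the longer text, and a document is treated as a duplicate when its similarity is greater than or equal to the limit. The service may make different choices on both points.

```java
// Illustrative sketch only; not part of DuplicatesDetectorService.
final class SimilaritySketch {

    /** Levenshtein distance computed by dynamic programming with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Percentage of unmodified characters.
     * Assumption: the total is the length of the longer text.
     */
    static double similarityPercent(String a, String b) {
        int total = Math.max(a.length(), b.length());
        if (total == 0) {
            return 100.0; // two empty texts are considered identical
        }
        return 100.0 * (total - levenshtein(a, b)) / total;
    }

    /** Assumption: a similarity equal to the limit already counts as a duplicate. */
    static boolean isDuplicate(String a, String b, double limitPercent) {
        return similarityPercent(a, b) >= limitPercent;
    }
}
```

Under these assumptions, two four-character texts that differ in one character have a similarity of 75 percent, so they count as duplicates with a limit of 70 but not with a limit of 80.
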

getWsContext

public javax.xml.ws.WebServiceContext getWsContext()
Returns:
the wsContext

setWsContext

public void setWsContext(javax.xml.ws.WebServiceContext wsContext)
Parameters:
wsContext - the wsContext to set


Copyright © 2004-2010. All Rights Reserved.