org.ow2.weblab.services.duplicates
Class DuplicatesDetectorService
java.lang.Object
org.ow2.weblab.services.duplicates.DuplicatesDetectorService
- All Implemented Interfaces:
- org.weblab_project.services.analyser.Analyser
public class DuplicatesDetectorService
- extends java.lang.Object
- implements org.weblab_project.services.analyser.Analyser
Field Summary
Modifier and Type | Field | Description
protected static java.lang.String | CONFIG_FILE | Config file properties
protected static java.util.Map<java.lang.String,java.lang.String> | props
protected static java.lang.String | SIMILARITY_LIMIT_PROPERTY
protected javax.xml.ws.WebServiceContext | wsContext
Method Summary
Modifier and Type | Method | Description
javax.xml.ws.WebServiceContext | getWsContext()
org.weblab_project.services.analyser.types.ProcessReturn | process(org.weblab_project.services.analyser.types.ProcessArgs args) | The process method consists of two steps.
void | setWsContext(javax.xml.ws.WebServiceContext wsContext)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
wsContext
protected javax.xml.ws.WebServiceContext wsContext
CONFIG_FILE
protected static final java.lang.String CONFIG_FILE
- Config file properties
- See Also:
- Constant Field Values
SIMILARITY_LIMIT_PROPERTY
protected static final java.lang.String SIMILARITY_LIMIT_PROPERTY
- See Also:
- Constant Field Values
props
protected static java.util.Map<java.lang.String,java.lang.String> props
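The static fields above suggest that the service reads its settings from a configuration file into the `props` map. The following is a minimal sketch of that loading step, not the service's actual code. It assumes the file uses `java.util.Properties` key=value syntax; the key name `similarity.limit`, the default percentage, and the helper names `loadProps` and `similarityLimit` are hypothetical, because the constant values of `CONFIG_FILE` and `SIMILARITY_LIMIT_PROPERTY` are not listed on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfigLoaderSketch {

    // Hypothetical key: the real value of SIMILARITY_LIMIT_PROPERTY is not documented here.
    static final String SIMILARITY_LIMIT_PROPERTY = "similarity.limit";

    /** Reads key=value pairs (java.util.Properties syntax, assumed) into a String map. */
    static Map<String, String> loadProps(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        Map<String, String> map = new HashMap<>();
        for (String key : p.stringPropertyNames()) {
            map.put(key, p.getProperty(key));
        }
        return map;
    }

    /** Parses the similarity limit (a percentage), falling back to a default when absent. */
    static double similarityLimit(Map<String, String> props, double defaultPercent) {
        String raw = props.get(SIMILARITY_LIMIT_PROPERTY);
        return raw == null ? defaultPercent : Double.parseDouble(raw.trim());
    }

    public static void main(String[] args) throws IOException {
        // In a real service the reader would wrap the "duplicates-detector.config" resource,
        // e.g. obtained via getResourceAsStream (assumed location on the classpath).
        Map<String, String> props = loadProps(new StringReader("similarity.limit = 90\n"));
        System.out.println(similarityLimit(props, 80.0)); // prints 90.0
    }
}
```

Keeping the values in a plain `Map<String, String>` mirrors the declared type of `props`; parsing to a number is deferred to the point of use.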
DuplicatesDetectorService
public DuplicatesDetectorService()
process
public org.weblab_project.services.analyser.types.ProcessReturn process(org.weblab_project.services.analyser.types.ProcessArgs args)
throws org.weblab_project.services.analyser.ProcessException
- The process method consists of two steps. First, it extracts the source property values from the resource and searches for them in the
duplicate index. If results are found, their texts are compared with the text of the input resource using the Levenshtein
distance. If no resources are found using the source property, it falls back to a Solr "more like this" query and compares
the returned texts with the Levenshtein distance in the same way.
The similarity limit, which is configurable in the "duplicates-detector.config" file, determines whether a
document is a duplicate and is expressed as a percentage: the number of unmodified characters relative to
the total number of characters.
- Specified by:
process in interface org.weblab_project.services.analyser.Analyser
- Throws:
org.weblab_project.services.analyser.ProcessException
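The two-step strategy and the percentage-based limit described for `process` can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the index lookups are replaced by hypothetical `searchBySource` and `moreLikeThis` functions standing in for the Solr duplicate index, the "total number of characters" is assumed to be the length of the longer text (so unmodified characters = that length minus the Levenshtein distance), and a candidate is assumed to be a duplicate when its similarity is greater than or equal to the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class DuplicateCheckSketch {

    /** Two-row dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last computed row is now in prev
        }
        return prev[b.length()];
    }

    /** Assumed reading: (longer length - distance) / longer length, as a percentage. */
    static double similarityPercent(String a, String b) {
        int total = Math.max(a.length(), b.length());
        if (total == 0) return 100.0;
        return 100.0 * (total - levenshtein(a, b)) / total;
    }

    /** Step 1: candidates by source; step 2 (fallback): "more like this"; then filter by limit. */
    static List<String> findDuplicates(String text, String source,
                                       Function<String, List<String>> searchBySource, // hypothetical
                                       Function<String, List<String>> moreLikeThis,   // hypothetical
                                       double limitPercent) {
        List<String> candidates = searchBySource.apply(source);
        if (candidates.isEmpty()) {
            candidates = moreLikeThis.apply(text);
        }
        List<String> duplicates = new ArrayList<>();
        for (String candidate : candidates) {
            if (similarityPercent(text, candidate) >= limitPercent) {
                duplicates.add(candidate);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        List<String> dups = findDuplicates("the quick brown fox", "http://example.org/a",
                src -> List.of(), // no match on the source property: fall back
                txt -> List.of("the quick brown fox!", "an unrelated text"),
                90.0);
        System.out.println(dups); // prints [the quick brown fox!]
    }
}
```

In the usage example, the first candidate differs by one appended character, giving 19 unmodified characters out of 20, i.e. 95%, which passes a 90% limit; the unrelated text falls well below it.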
getWsContext
public javax.xml.ws.WebServiceContext getWsContext()
- Returns:
- the wsContext
setWsContext
public void setWsContext(javax.xml.ws.WebServiceContext wsContext)
- Parameters:
wsContext - the wsContext to set
Copyright © 2004-2010. All Rights Reserved.