org.ow2.weblab.service.language
Class LanguageExtraction

java.lang.Object
  extended by org.ow2.weblab.service.language.LanguageExtraction
All Implemented Interfaces:
org.weblab_project.services.analyser.Analyser

public class LanguageExtraction
extends java.lang.Object
implements org.weblab_project.services.analyser.Analyser

This class is a WebLab Web service for identifying the language of a Text.
It wraps the NGramJ project (http://ngramj.sourceforge.net/) and uses its CNGram system, which can process character strings instead of raw text files.
For each input text, the algorithm returns a score for every previously learned language profile (.ngp files). Each score is a double between 0 and 1: 1 means the text is certainly written in that language, and 0 means it is not. The scores sum to 1.
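To make the score semantics concrete, here is a minimal sketch of how candidate languages could be picked from such a score map. It is not taken from the WebLab or NGramJ sources: the class name LanguageScores, the method candidates, and the minScore and maxLanguages parameters are all hypothetical and only illustrate the idea of keeping the best-scoring profiles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative helper only; not part of the WebLab or NGramJ API. */
final class LanguageScores {

    private LanguageScores() {
    }

    /**
     * Returns the language codes ordered by descending score, keeping only
     * those whose score is at least minScore and at most maxLanguages entries.
     * Both limits are assumptions made for this sketch.
     */
    static List<String> candidates(Map<String, Double> scores, double minScore, int maxLanguages) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Highest score first.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            // Entries are sorted, so the first score below the threshold ends the scan.
            if (result.size() >= maxLanguages || entry.getValue() < minScore) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

With scores of 0.7 for "en", 0.2 for "fr" and 0.1 for "de", a minimum score of 0.15 keeps "en" and "fr" in that order.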
The wrapper annotates every Text section of a ComposedUnit given as input (or the Text itself if the input is a Text). It fails if the input is anything else. For each Text it uses CNGram to determine which language profiles are the best candidates, and annotates the Text with them using the DC:language property. The service can be configured through a property file named ngram.properties, which supports 6 properties.

These 6 properties are optional, and each has a default value.
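Optional properties with fallback defaults can be read in plain Java roughly as sketched below. This is an assumption-laden illustration, not the service's actual loading code: the class OptionalSettings, its methods, and the key example.threshold are hypothetical placeholders, and the real property names of ngram.properties are not reproduced here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative helper only; names and keys are hypothetical. */
final class OptionalSettings {

    private OptionalSettings() {
    }

    /** Parses property-file text (key=value lines) into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the value of key as a double, or fallback when the key is absent or blank. */
    static double doubleOrDefault(Properties props, String key, double fallback) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        return Double.parseDouble(raw.trim());
    }
}
```

Because every lookup carries its own fallback, an empty or missing file still yields a complete configuration, which matches the idea that all properties are optional.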

Author:
EADS IPCC Team
Date:
2009-11-05

Constructor Summary
LanguageExtraction()
           
 
Method Summary
 void init()
          Reads the property file to get field values.
 org.weblab_project.services.analyser.types.ProcessReturn process(org.weblab_project.services.analyser.types.ProcessArgs processArgs)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LanguageExtraction

public LanguageExtraction()
Method Detail

init

@PostConstruct
public void init()
          throws LanguageExtractionException
Reads the property file to get field values.

Throws:
LanguageExtractionException

process

public org.weblab_project.services.analyser.types.ProcessReturn process(org.weblab_project.services.analyser.types.ProcessArgs processArgs)
                                                                 throws org.weblab_project.services.analyser.ProcessException
Specified by:
process in interface org.weblab_project.services.analyser.Analyser
Throws:
org.weblab_project.services.analyser.ProcessException


Copyright © 2004-2010. All Rights Reserved.