org.ow2.weblab.service.language
Class LanguageExtraction

java.lang.Object
  extended by org.ow2.weblab.service.language.LanguageExtraction
All Implemented Interfaces:
org.ow2.weblab.core.services.Analyser

public class LanguageExtraction
extends java.lang.Object
implements org.ow2.weblab.core.services.Analyser

This class is a WebLab Web service for identifying the language of a Text.
It wraps the NGramJ project (http://ngramj.sourceforge.net/) and uses its CNGram system, which can process character strings rather than raw text files.
For each input text, the algorithm returns a score for every language profile previously learned (.ngp files). Each score is a double between 0 and 1: 1 means the text is certainly written in that language, 0 means it is not. The scores sum to 1.
The wrapper annotates every Text section of a ComposedUnit given as input (or the Text itself if the input is a Text). It fails if the input is anything else. On each Text it uses CNGram to determine which language profiles are the best candidates, and annotates the Text with them using the DC:language property. It can be configured through a property file named ngram.properties, which supports 7 properties.
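The scoring contract described above (one score per language profile, each between 0 and 1, all summing to 1) can be illustrated with a minimal sketch. It does not use the NGramJ API: the class LanguageScoreSketch, its methods, and the raw values are hypothetical, chosen only to show how a normalised score map behaves and how the best candidate could be read from it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not NGramJ or WebLab code.
public class LanguageScoreSketch {

    /** Rescales non-negative raw scores so that they sum to 1. */
    public static Map<String, Double> normalise(Map<String, Double> rawScores) {
        double total = 0.0;
        for (double score : rawScores.values()) {
            total += score;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("No positive score to normalise");
        }
        Map<String, Double> normalised = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : rawScores.entrySet()) {
            normalised.put(entry.getKey(), entry.getValue() / total);
        }
        return normalised;
    }

    /** Returns the language code with the highest score, or null if the map is empty. */
    public static String best(Map<String, Double> scores) {
        String bestLanguage = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                bestScore = entry.getValue();
                bestLanguage = entry.getKey();
            }
        }
        return bestLanguage;
    }

    public static void main(String[] args) {
        // Made-up raw values standing in for per-profile scores.
        Map<String, Double> raw = new LinkedHashMap<>();
        raw.put("en", 6.0);
        raw.put("fr", 3.0);
        raw.put("de", 1.0);

        Map<String, Double> scores = normalise(raw);
        System.out.println(scores + " best candidate: " + best(scores));
    }
}
```

Because the scores sum to 1, a value close to 1 for one profile implies that all other profiles score close to 0.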

Those 7 properties are optional. Default values are:

Author:
EADS IPCC Team
Date:
2009-11-05

Constructor Summary
LanguageExtraction()
           
 
Method Summary
 void init()
          Reads the property file to get field values.
 org.ow2.weblab.core.services.analyser.ProcessReturn process(org.ow2.weblab.core.services.analyser.ProcessArgs processArgs)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LanguageExtraction

public LanguageExtraction()
Method Detail

init

@PostConstruct
public void init()
          throws LanguageExtractionException
Reads the property file to get field values.

Throws:
LanguageExtractionException
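
As a rough illustration of what reading an optional property file with fallback defaults can look like, here is a minimal sketch based on java.util.Properties. It is not the actual init() implementation: the class NgramPropertiesSketch, the key "exampleKey" and the default "exampleDefault" are hypothetical, since the real property names and defaults are not listed on this page. Only the file name ngram.properties comes from the class description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration only; not the LanguageExtraction implementation.
public class NgramPropertiesSketch {

    /**
     * Loads a properties resource from the classpath.
     * Returns an empty Properties object when the resource is absent,
     * so that every lookup falls back to its default value.
     */
    public static Properties load(String resourceName) {
        Properties props = new Properties();
        ClassLoader loader = NgramPropertiesSketch.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + resourceName, e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load("ngram.properties");
        // "exampleKey" and "exampleDefault" are placeholders, not real property names.
        String value = props.getProperty("exampleKey", "exampleDefault");
        System.out.println("exampleKey = " + value);
    }
}
```

Using getProperty(key, defaultValue) keeps every property optional: a missing file or a missing key simply yields the default.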

process

public org.ow2.weblab.core.services.analyser.ProcessReturn process(org.ow2.weblab.core.services.analyser.ProcessArgs processArgs)
                                                            throws org.ow2.weblab.core.services.AccessDeniedException,
                                                                   org.ow2.weblab.core.services.ContentNotAvailableException,
                                                                   org.ow2.weblab.core.services.InsufficientResourcesException,
                                                                   org.ow2.weblab.core.services.InvalidParameterException,
                                                                   org.ow2.weblab.core.services.ServiceNotConfiguredException,
                                                                   org.ow2.weblab.core.services.UnexpectedException,
                                                                   org.ow2.weblab.core.services.UnsupportedRequestException
Specified by:
process in interface org.ow2.weblab.core.services.Analyser
Throws:
org.ow2.weblab.core.services.AccessDeniedException
org.ow2.weblab.core.services.ContentNotAvailableException
org.ow2.weblab.core.services.InsufficientResourcesException
org.ow2.weblab.core.services.InvalidParameterException
org.ow2.weblab.core.services.ServiceNotConfiguredException
org.ow2.weblab.core.services.UnexpectedException
org.ow2.weblab.core.services.UnsupportedRequestException


Copyright © 2004-2012. All Rights Reserved.