A B C D E F G I L M N O P Q R S T U W Y 

A

acronym - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
General acronyms.
AlphaAposAlpha - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Alphabetic apostrophe and alphabetic.
alphaAposNonAlpha - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Alphabetic apostrophe and non alpha.
alphabetic - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Any alphabetic character.
alphaNumParaLowerNum - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Alphanumeric, maybe a space, paragraph mark, maybe a space, and lowercase letter or digit.
Annotate - Class in eus.ixa.ixa.pipe.tok
This class provides the annotation functions to output the tokenized text as: a list of WF elements inside a NAF document (default); running tokenized and segmented text; or CoNLL format, namely one token per line and two newlines for each sentence.
Annotate(BufferedReader, Properties) - Constructor for class eus.ixa.ixa.pipe.tok.Annotate
 
annotate(InputStream, OutputStream) - Method in class eus.ixa.ixa.pipe.tok.CLI
 
apostrophe - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
asciiHex - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Non printable control characters.
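
The CoNLL output mentioned in the Annotate entry above (one token per line, two newlines at each sentence end) can be illustrated with a minimal sketch. The class `ConllSketch` and the method `toConll` are hypothetical names used only for illustration and are not part of ixa-pipe-tok; the sketch assumes the sentences have already been tokenized into lists of strings.

```java
import java.util.Arrays;
import java.util.List;

public class ConllSketch {

    // Hypothetical helper: writes one token per line and appends an extra
    // newline after each sentence, so sentences are separated by an empty line.
    public static String toConll(List<List<String>> sentences) {
        StringBuilder sb = new StringBuilder();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                sb.append(token).append("\n");
            }
            sb.append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("Hello", "world", "."),
            Arrays.asList("Bye", "."));
        System.out.print(toConll(sentences));
    }
}
```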

B

beginLink - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Re-tokenize beginning of link.
buildText(String) - Static method in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
 

C

CLI - Class in eus.ixa.ixa.pipe.tok
ixa-pipe-tok provides several configuration parameters, including lang: choose the language used to create the lang attribute in the KAF header.
CLI() - Constructor for class eus.ixa.ixa.pipe.tok.CLI
 
commaNoDigit - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Comma and no digit.
conventionalPara - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
End of sentence marker, one or more paragraph marks, maybe some starting punctuation, uppercase.
convertNonCanonicalStrings(List<Token>, String) - Static method in class eus.ixa.ixa.pipe.tok.Normalizer
Converts non-Unicode and other strings into their Unicode counterparts.
createDisjunctRegexFromList(List<String>) - Static method in class eus.ixa.ixa.pipe.tok.StringUtils
 
createToken(String, int, int) - Method in class eus.ixa.ixa.pipe.tok.TokenFactory
Constructs a Token as a String with corresponding offsets and length from which to calculate start and end position of the Token.
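
The createToken entry above describes a token as a string plus an offset and a length from which start and end positions are calculated. A minimal sketch of that arithmetic follows; `TokenSpanSketch` and its methods are hypothetical stand-ins and do not reproduce the Token or TokenFactory classes. The end position is assumed to be exclusive.

```java
public class TokenSpanSketch {

    // Hypothetical value holder: the token string, where it starts in the
    // original text, and how many characters it covers.
    private final String value;
    private final int startOffset;
    private final int length;

    public TokenSpanSketch(String value, int startOffset, int length) {
        this.value = value;
        this.startOffset = startOffset;
        this.length = length;
    }

    public String value() {
        return value;
    }

    public int start() {
        return startOffset;
    }

    // Exclusive end position, derived from the start offset and the length.
    public int end() {
        return startOffset + length;
    }

    public static void main(String[] args) {
        String text = "Hello world.";
        TokenSpanSketch tok = new TokenSpanSketch("world", 6, 5);
        // Recover the token from the original text using its span.
        System.out.println(text.substring(tok.start(), tok.end()));
    }
}
```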

D

DEBUG - Static variable in class eus.ixa.ixa.pipe.tok.TokenizerEvaluator
 
deSegmentAcronyms(String) - Static method in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Removes wrongly introduced SECTION marks in acronyms.
detokenParagraphs - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
De-tokenize paragraph marks.
digitCommaNoDigit - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Digit comma and non digit.
dotmultiDot - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Multi dot pattern and extra dot.
dotmultiDotAny - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Dot multi pattern followed by anything.
doubleAsciiQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
doubleAsciiQuoteAlphaNumeric - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
doubleBar - Static variable in class eus.ixa.ixa.pipe.tok.StringUtils
Pattern to remove double bars from disjunct regex.
doubleLineBreak - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Two lines.
doubleSpaces - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
 

E

ellipsis - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
endInsideQuotesPara - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
End of sentence marker, maybe a space, punctuation (quotes, brackets), space, maybe some more punctuation, maybe some space and uppercase.
endInsideQuotesSpace - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
End of sentence marker, maybe a space, punctuation (quotes, brackets), space, maybe some more punctuation, maybe some space and uppercase.
endLink - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
 
endOfSentenceApos - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Tokenize apostrophes occurring at the end of the string.
endPunctLinkPara - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
End of sentence markers, paragraph mark and link.
endPunctLinkSpace - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
End of sentence punctuation, maybe spaces and link.
englishApos - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Split English apostrophes.
eus.ixa.ixa.pipe.seg - package eus.ixa.ixa.pipe.seg
 
eus.ixa.ixa.pipe.tok - package eus.ixa.ixa.pipe.tok
 
evaluate(List<Token>, List<Token>) - Method in class eus.ixa.ixa.pipe.tok.TokenizerEvaluator
Evaluates the given reference Token list with respect to the predicted Token list.
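
Comparing predicted tokens against reference tokens is commonly summarized with precision, recall and their harmonic mean. The sketch below works under that assumption. The class and method names are hypothetical, and the matching rule (exact string match, each reference token consumed at most once) is an illustrative choice that may differ from what TokenizerEvaluator does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FScoreSketch {

    // Counts predicted items that also occur in the reference,
    // consuming each reference item at most once.
    public static int countMatches(List<String> reference, List<String> predicted) {
        List<String> remaining = new ArrayList<>(reference);
        int matches = 0;
        for (String p : predicted) {
            if (remaining.remove(p)) {
                matches++;
            }
        }
        return matches;
    }

    // Fraction of predicted items that are correct.
    public static double precision(List<String> reference, List<String> predicted) {
        if (predicted.isEmpty()) {
            return 0.0;
        }
        return (double) countMatches(reference, predicted) / predicted.size();
    }

    // Fraction of reference items that were found.
    public static double recall(List<String> reference, List<String> predicted) {
        if (reference.isEmpty()) {
            return 0.0;
        }
        return (double) countMatches(reference, predicted) / reference.size();
    }

    // Harmonic mean of precision and recall.
    public static double fMeasure(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList("New", "York", ".");
        List<String> predicted = Arrays.asList("New", "York.");
        double p = precision(reference, predicted);
        double r = recall(reference, predicted);
        System.out.println("P=" + p + " R=" + r + " F=" + fMeasure(p, r));
    }
}
```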

F

FINAL_PUNCT - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Final punctuation in Unicode.
FMeasure - Class in eus.ixa.ixa.pipe.tok
Evaluation results are the arithmetic mean of the precision scores calculated for each reference sample and the arithmetic mean of the recall scores calculated for each reference sample.
FMeasure() - Constructor for class eus.ixa.ixa.pipe.tok.FMeasure
 

G

getFMeasure() - Method in class eus.ixa.ixa.pipe.tok.FMeasure
Retrieves the f-measure score.
getFMeasure() - Method in class eus.ixa.ixa.pipe.tok.TokenizerEvaluator
 
getPrecisionScore() - Method in class eus.ixa.ixa.pipe.tok.FMeasure
Retrieves the arithmetic mean of the precision scores calculated for each evaluated sample.
getRecallScore() - Method in class eus.ixa.ixa.pipe.tok.FMeasure
Retrieves the arithmetic mean of the recall scores calculated for each evaluated sample.
getStringFromTokens(String[]) - Static method in class eus.ixa.ixa.pipe.tok.StringUtils
Get an array of Strings and convert it into a string separated by spaces.
getStringFromTokens(List<Token>) - Static method in class eus.ixa.ixa.pipe.tok.StringUtils
Get a List of Tokens and convert it into a string separated by spaces.
getTokenValue() - Method in class eus.ixa.ixa.pipe.tok.Token
 

I

INITIAL_PUNCT - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Initial punctuation in Unicode.
invertSingleAsciiQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 

L

leftDoubleQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
leftSingleQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
LINE_BREAK - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
The constant representing every line break in the original input text.
lineBreak - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Line break pattern.
longDash - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 

M

main(String[]) - Static method in class eus.ixa.ixa.pipe.tok.CLI
 
mergeInto(FMeasure) - Method in class eus.ixa.ixa.pipe.tok.FMeasure
 
multiDots - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Multidots.
multiDotsParaStarters - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Multi-dots, paragraph mark, sentence starters and uppercase.
multiDotsSpaceStarters - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Multi-dots, space, sentence starters and uppercase.

N

noAlphaAposNoAlpha - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Non alphabetic, apostrophe and non alphabetic.
noAlphaDigitAposAlpha - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Non alpha, digit, apostrophe and alpha.
noDigitComma - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
No digit comma.
noDigitCommaDigit - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Non digit comma and digit.
NON_BREAKER_DIGITS - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Do not split dot after these words if followed by number.
nonBreakerDigits - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Re-attach segmented dots after non breaker digits.
NonPeriodBreaker - Class in eus.ixa.ixa.pipe.tok
This class implements exceptions for periods as sentence breakers and tokens.
NonPeriodBreaker(Properties) - Constructor for class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
This constructor reads the non-breaking prefix files in resources to create exceptions for segmentation and tokenization.
noPeriodSpaceEnd - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Non-period end of sentence markers (?!), one or more spaces, sentence starters.
normalizeDoubleQuotes(List<Token>, String) - Static method in class eus.ixa.ixa.pipe.tok.Normalizer
Normalizes double and ambiguous quotes according to language and corpus.
normalizeQuotes(List<Token>, String) - Static method in class eus.ixa.ixa.pipe.tok.Normalizer
Normalizes non-ambiguous quotes according to language and corpus.
Normalizer - Class in eus.ixa.ixa.pipe.tok
Normalizer class for converting punctuation mostly following various corpora conventions such as Penn TreeBank, Ancora, Tutpenn, Tiger and CTAG.
normalizeTokens(List<List<Token>>, String) - Static method in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Set as value of the token its normalized counterpart.
numbers - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Do not segment numbers like 11.1.
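
The idea behind the numbers entry above, keeping a period attached when it sits inside a number such as 11.1, can be sketched with regular expression lookarounds. The pattern below is an illustrative assumption and not the `numbers` pattern defined in NonPeriodBreaker; `NumberPeriodSketch` and `tokenize` are hypothetical names.

```java
import java.util.Arrays;

public class NumberPeriodSketch {

    // Illustrative pattern (an assumption, not the library's): a period is
    // split off unless it has a digit on both sides, as in 11.1.
    private static final String SPLITTABLE_PERIOD = "(?<!\\d)\\.|\\.(?!\\d)";

    public static String[] tokenize(String text) {
        // Put a space before every splittable period, then split on whitespace.
        String spaced = text.replaceAll(SPLITTABLE_PERIOD, " .");
        return spaced.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // prints [It, costs, 11.1, euros, .]
        System.out.println(Arrays.toString(tokenize("It costs 11.1 euros.")));
    }
}
```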

O

oneFourth - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
oneHalf - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
oneThird - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 

P

PARAGRAPH - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Constant representing a paragraph (a doubleLine) in the original input text.
paragraph - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Paragraph pattern.
parseCLI(String[]) - Method in class eus.ixa.ixa.pipe.tok.CLI
Parse the command line interface parameters with the argParser.
precision(List<List<String>>, List<List<String>>) - Static method in class eus.ixa.ixa.pipe.tok.FMeasure
Calculates the precision score for the given reference and predicted spans.
punctSpaceUpper - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
End of sentence marker, sentence starter punctuation and upper case.

Q

qexc - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Question and exclamation marks (do not separate if multiple).
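
The behavior described for qexc above, separating question and exclamation marks from words while keeping a run of them together, could look roughly like the following. This is a sketch under assumptions: the regular expression is illustrative rather than the pattern used in RuleBasedTokenizer, and `QexcSketch` and `separate` are hypothetical names.

```java
public class QexcSketch {

    // Surrounds each run of ? and ! with spaces, treating the run as one unit,
    // then collapses repeated whitespace.
    public static String separate(String text) {
        return text.replaceAll("([?!]+)", " $1 ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) {
        // prints: Really ?!! Yes !
        System.out.println(separate("Really?!! Yes!"));
    }
}
```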

R

readText(BufferedReader) - Static method in class eus.ixa.ixa.pipe.tok.StringUtils
Reads standard input text from the BufferedReader and adds a line break mark for every line.
recall(List<List<String>>, List<List<String>>) - Static method in class eus.ixa.ixa.pipe.tok.FMeasure
Calculates the recall score for the given reference and predicted spans.
replacement - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
 
rightDoubleQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
rightSingleQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
RuleBasedSegmenter - Class in eus.ixa.ixa.pipe.seg
Rule based SentenceSegmenter.
RuleBasedSegmenter(String, Properties) - Constructor for class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
Construct a RuleBasedSegmenter from a String and the properties.
RuleBasedTokenizer - Class in eus.ixa.ixa.pipe.tok
This class provides a multilingual rule based tokenizer.
RuleBasedTokenizer(String, Properties) - Constructor for class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Construct a rule based tokenizer.

S

SECTION - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
 
section - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
 
segmentAll - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Segment everything not segmented in the SentenceSegmenter.
SegmenterNonBreaker(String) - Method in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
This function implements exceptions for periods as sentence breakers.
segmentSentence() - Method in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
 
segmentSentence() - Method in interface eus.ixa.ixa.pipe.seg.SentenceSegmenter
 
SentenceSegmenter - Interface in eus.ixa.ixa.pipe.seg
 
setStartOffset(int) - Method in class eus.ixa.ixa.pipe.tok.Token
Set the token offset.
setTokenLength(int) - Method in class eus.ixa.ixa.pipe.tok.Token
Set the length of the token.
setTokenValue(String) - Method in class eus.ixa.ixa.pipe.tok.Token
Set the value for the token.
singleAsciiQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
spaceDashSpace - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Dashes or slashes preceded or followed by space.
specials - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Tokenize everything but these characters.
spuriousParagraph - Static variable in class eus.ixa.ixa.pipe.seg.RuleBasedSegmenter
If paragraph mark, maybe some space and lowercase or punctuation (not start of sentence markers) then it is a spurious paragraph.
startDigit - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Starts with a digit.
startLower - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Starts with a lowercase.
startOffset() - Method in class eus.ixa.ixa.pipe.tok.Token
Get the token starting offset.
startPunct - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Starts with punctuation that is not beginning of sentence marker.
StringUtils - Class in eus.ixa.ixa.pipe.tok
Several string utils.

T

THREE_DOTS - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
threeQuarters - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
TLD - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Top level domains for stopping the wrongLink pattern below.
TO_ASCII_SINGLE_QUOTE - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
toAsciiDoubleQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
toAsciiSingleQuote - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 
Token - Class in eus.ixa.ixa.pipe.tok
A Token object contains a single String, a startOffset and the length of the String.
Token() - Constructor for class eus.ixa.ixa.pipe.tok.Token
Create a new token with a null content.
Token(String) - Constructor for class eus.ixa.ixa.pipe.tok.Token
 
Token(String, int, int) - Constructor for class eus.ixa.ixa.pipe.tok.Token
Creates a new Token with the given content.
TokenFactory - Class in eus.ixa.ixa.pipe.tok
 
TokenFactory() - Constructor for class eus.ixa.ixa.pipe.tok.TokenFactory
Constructor for a new token factory which will add in the word and the begin/end position annotations.
TokenFactory(boolean) - Constructor for class eus.ixa.ixa.pipe.tok.TokenFactory
Constructor that allows one to choose if index annotation indicating begin/end position will be included in the token.
tokenize(String[]) - Method in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
 
tokenize(String[]) - Method in interface eus.ixa.ixa.pipe.tok.Tokenizer
 
Tokenizer - Interface in eus.ixa.ixa.pipe.tok
 
TokenizerEvaluator - Class in eus.ixa.ixa.pipe.tok
The TokenizerEvaluator measures the performance of a tokenizer with respect to some reference Tokens.
TokenizerEvaluator() - Constructor for class eus.ixa.ixa.pipe.tok.TokenizerEvaluator
 
TokenizerNonBreaker(String) - Method in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
It decides when periods do not need to be tokenized.
tokenizeToCoNLL() - Method in class eus.ixa.ixa.pipe.tok.Annotate
Tokenizes and segments input text.
tokenizeToCoNLLOffsets() - Method in class eus.ixa.ixa.pipe.tok.Annotate
Tokenizes and segments input text.
tokenizeToKAF(KAFDocument) - Method in class eus.ixa.ixa.pipe.tok.Annotate
 
tokenizeToText() - Method in class eus.ixa.ixa.pipe.tok.Annotate
Tokenizes and segments input text.
tokenLength() - Method in class eus.ixa.ixa.pipe.tok.Token
Get the token length.
tokensToKAF(Reader, KAFDocument) - Static method in class eus.ixa.ixa.pipe.tok.Annotate
 
toString() - Method in class eus.ixa.ixa.pipe.tok.FMeasure
Creates a human-readable String representation.
toString() - Method in class eus.ixa.ixa.pipe.tok.Token
 
twoThirds - Static variable in class eus.ixa.ixa.pipe.tok.Normalizer
 

U

updateScores(List<List<String>>, List<List<String>>) - Method in class eus.ixa.ixa.pipe.tok.FMeasure
 

W

wordDot - Static variable in class eus.ixa.ixa.pipe.tok.NonPeriodBreaker
Any non-whitespace character followed by a period.
wrongLink - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Detect wrongly tokenized links.

Y

yearApos - Static variable in class eus.ixa.ixa.pipe.tok.RuleBasedTokenizer
Digit apostrophe and s (for 1990's).
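
A pattern of the kind described in the yearApos entry above (digit, apostrophe, s) can be sketched as follows. Both the expression and the replacement, which splits `1990's` into `1990` and `'s`, are assumptions made for illustration; the index does not state what RuleBasedTokenizer does with a match. `YearAposSketch` and `split` are hypothetical names.

```java
import java.util.regex.Pattern;

public class YearAposSketch {

    // Illustrative pattern: a digit, an apostrophe, then a lowercase s.
    private static final Pattern YEAR_APOS = Pattern.compile("(\\d)'(s)");

    // Assumed treatment: insert a space between the digit and the apostrophe.
    public static String split(String text) {
        return YEAR_APOS.matcher(text).replaceAll("$1 '$2");
    }

    public static void main(String[] args) {
        // prints: the 1990 's music
        System.out.println(split("the 1990's music"));
    }
}
```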

Copyright © 2015 IXA pipes. All rights reserved.