public class NonPeriodBreaker extends Object

| Modifier and Type | Field and Description |
|---|---|
| static Pattern | acronym: General acronyms. |
| static Pattern | alphabetic: Any alphabetic character. |
| static String | NON_BREAKER_DIGITS: Do not split the dot after these words if it is followed by a number. |
| static Pattern | nonBreakerDigits: Re-attach segmented dots after non-breaker digits. |
| static Pattern | numbers: Do not segment numbers like 11.1. |
| static Pattern | section |
| static String | SECTION |
| static Pattern | segmentAll: Segment everything not segmented in the SentenceSegmenter. |
| static Pattern | startDigit: Starts with a digit. |
| static Pattern | startLower: Starts with a lowercase letter. |
| static Pattern | startPunct: Starts with punctuation that is not a beginning-of-sentence marker. |
| static Pattern | wordDot: Any non-whitespace text followed by a period. |

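
The Javadoc does not show the regular expressions behind these fields. As a rough illustration only, the sketch below gives hypothetical Java patterns that match what several descriptions state (`alphabetic`, `wordDot`, `numbers`, `startLower`, `startDigit`, `startPunct`). The class `PatternSketch`, its helper `wordBeforeDot`, and every expression (including the assumed punctuation set for `startPunct`) are assumptions, not the IXA pipes definitions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical approximations of some NonPeriodBreaker patterns; not the IXA pipes expressions. */
final class PatternSketch {
    // alphabetic: any Unicode letter.
    static final Pattern ALPHABETIC = Pattern.compile("\\p{L}");
    // wordDot: non-whitespace text ending in a period; group 1 is the text before the dot.
    static final Pattern WORD_DOT = Pattern.compile("(\\S+)\\.$");
    // numbers: digits, a dot, digits (e.g. 11.1), which should stay one unit.
    static final Pattern NUMBERS = Pattern.compile("\\d+\\.\\d+");
    // startLower: begins with a lowercase letter.
    static final Pattern START_LOWER = Pattern.compile("^\\p{Ll}");
    // startDigit: begins with a decimal digit.
    static final Pattern START_DIGIT = Pattern.compile("^\\p{Nd}");
    // startPunct: begins with punctuation that cannot open a sentence
    // (assumed set: , . ; : ! ? and closing brackets).
    static final Pattern START_PUNCT = Pattern.compile("^[,.;:!?)\\]}]");

    private PatternSketch() {}

    /** Returns the text before a final period, or null if the token does not end in one. */
    static String wordBeforeDot(String token) {
        Matcher m = WORD_DOT.matcher(token);
        return m.find() ? m.group(1) : null;
    }
}
```

For example, `wordBeforeDot("Dr.")` yields `"Dr"`, which is the candidate a segmenter would then look up in a non-breaking prefix list.
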

| Constructor and Description |
|---|
| NonPeriodBreaker(Properties properties): Reads the non-breaking prefix files in the resources to create segmentation and tokenization exceptions. |

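
The format of the prefix files is not documented here. The following is a minimal sketch of how such a list might be parsed, assuming a Moses-style format: one prefix per line, lines starting with `#` are comments, and a trailing `#NUMERIC_ONLY#` marks prefixes that are non-breaking only before a number (the case `NON_BREAKER_DIGITS` describes). The class `PrefixLoaderSketch` and its methods are hypothetical and are not the constructor's actual logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical loader for a Moses-style non-breaking prefix list.
 * Assumed format: one prefix per line, '#' starts a comment line, and a trailing
 * "#NUMERIC_ONLY#" marks prefixes that are non-breaking only before a number.
 */
final class PrefixLoaderSketch {
    static final String NUMERIC_ONLY = "#NUMERIC_ONLY#";

    final Set<String> prefixes = new LinkedHashSet<>();    // a period after these never breaks
    final Set<String> numericOnly = new LinkedHashSet<>(); // non-breaking only before a digit

    PrefixLoaderSketch(Iterable<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.contains(NUMERIC_ONLY)) {
                numericOnly.add(line.replace(NUMERIC_ONLY, "").trim());
            } else {
                prefixes.add(line);
            }
        }
    }

    /** Regex alternation over the numeric-only prefixes, e.g. "(No|Art)" with each entry quoted. */
    String numericOnlyAlternation() {
        List<String> quoted = new ArrayList<>();
        for (String p : numericOnly) {
            quoted.add(Pattern.quote(p)); // escape any regex metacharacters
        }
        return "(" + String.join("|", quoted) + ")";
    }

    /** True if a period after {@code prefix} should not break, given the next token. */
    boolean isNonBreaking(String prefix, String nextToken) {
        if (prefixes.contains(prefix)) {
            return true;
        }
        return numericOnly.contains(prefix)
                && !nextToken.isEmpty()
                && Character.isDigit(nextToken.charAt(0));
    }
}
```

The constructor takes already-read lines (for instance from `Files.readAllLines(path, StandardCharsets.UTF_8)` or a classpath resource) so the parsing stays separate from I/O.
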

| Modifier and Type | Method and Description |
|---|---|
| static String | deSegmentAcronyms(String line): Removes wrongly introduced SECTION marks in acronyms. |
| String | SegmenterNonBreaker(String line): Implements exceptions to periods acting as sentence breakers. |
| String | TokenizerNonBreaker(String line): Decides when periods do not need to be tokenized. |

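
To illustrate the kind of decision `TokenizerNonBreaker` describes, here is a sketch modeled on the period rules of the Moses `tokenizer.perl` script: keep the period attached for acronyms with inner dots, for known non-breaking prefixes, for words followed by a lowercase word, and for numeric-only prefixes followed by a digit; otherwise split it off as its own token. Whether IXA pipes applies exactly these rules is an assumption, and `PeriodTokenizerSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical Moses-style period handling; not the IXA pipes implementation. */
final class PeriodTokenizerSketch {
    private final Set<String> prefixes;    // always non-breaking, e.g. "Dr"
    private final Set<String> numericOnly; // non-breaking only before a digit, e.g. "No"

    PeriodTokenizerSketch(Set<String> prefixes, Set<String> numericOnly) {
        this.prefixes = prefixes;
        this.numericOnly = numericOnly;
    }

    /** Splits a final period off whitespace-separated tokens unless an exception applies. */
    String tokenize(String line) {
        String[] words = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            String next = i + 1 < words.length ? words[i + 1] : "";
            if (w.length() > 1 && w.endsWith(".") && !keepPeriod(w.substring(0, w.length() - 1), next)) {
                out.add(w.substring(0, w.length() - 1));
                out.add(".");
            } else {
                out.add(w);
            }
        }
        return String.join(" ", out);
    }

    private boolean keepPeriod(String pre, String next) {
        // Acronyms such as "U.S." contain an inner dot and at least one letter.
        if (pre.contains(".") && pre.chars().anyMatch(Character::isLetter)) {
            return true;
        }
        if (prefixes.contains(pre)) {
            return true;
        }
        if (next.isEmpty()) {
            return false; // last token: treat the period as sentence-final
        }
        // A following lowercase word suggests the sentence continues.
        if (Character.isLowerCase(next.charAt(0))) {
            return true;
        }
        return numericOnly.contains(pre) && Character.isDigit(next.charAt(0));
    }
}
```

With `Dr` as a prefix and `No` as numeric-only, `"See No. 5 now."` becomes `"See No. 5 now ."`, while `"No. More."` becomes `"No . More ."`.
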
public static String SECTION
public static Pattern section
public static Pattern segmentAll
public static String NON_BREAKER_DIGITS
public static Pattern nonBreakerDigits
public static Pattern acronym
public static Pattern numbers
public static Pattern wordDot
public static Pattern alphabetic
public static Pattern startLower
public static Pattern startPunct
public static Pattern startDigit
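
The field descriptions suggest a marker-based workflow: `segmentAll` inserts the `SECTION` string at candidate boundaries, `nonBreakerDigits` re-attaches dots after numeric-only prefixes, and `deSegmentAcronyms` removes marks that land inside acronyms. The sketch below shows that idea under stated assumptions: the marker `<SEC>`, the prefix list `(No|Art|pp)`, every regular expression, and the class `SectionMarkerSketch` are hypothetical, since the actual values are not given in this documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical marker-based segmentation; marker, prefixes and regexes are assumptions. */
final class SectionMarkerSketch {
    // Hypothetical boundary marker standing in for SECTION.
    static final String SECTION = "<SEC>";
    private static final String Q = Pattern.quote(SECTION);
    static final Pattern SECTION_PATTERN = Pattern.compile(Q);

    // segmentAll-like: a period, whitespace, then an uppercase letter or a digit.
    static final Pattern SEGMENT_ALL = Pattern.compile("\\.\\s+(?=[\\p{Lu}\\p{Nd}])");

    // Hypothetical numeric-only prefixes (a NON_BREAKER_DIGITS-like string).
    static final String NON_BREAKER_DIGITS = "(No|Art|pp)";
    // nonBreakerDigits-like: prefix, period, marker, then a digit.
    static final Pattern NON_BREAKER_DIGITS_MARK =
            Pattern.compile("\\b" + NON_BREAKER_DIGITS + "\\." + Q + "(?=\\p{Nd})");

    // Acronyms made of single capitals and dots (e.g. U.S.) followed by a marker.
    static final Pattern ACRONYM_MARK = Pattern.compile("\\b((?:\\p{Lu}\\.)+)" + Q);

    private SectionMarkerSketch() {}

    // Replaces the whitespace after each candidate boundary period with the marker.
    static String segmentAll(String text) {
        return SEGMENT_ALL.matcher(text).replaceAll(Matcher.quoteReplacement("." + SECTION));
    }

    // Turns a marked "No." followed by a digit back into "No. " plus the digit.
    static String reattachNonBreakerDigits(String text) {
        return NON_BREAKER_DIGITS_MARK.matcher(text).replaceAll("$1. ");
    }

    // Replaces a marker right after an acronym such as "U.S." with a space.
    static String removeAcronymMarks(String text) {
        return ACRONYM_MARK.matcher(text).replaceAll("$1 ");
    }

    // Runs the three steps, then splits on the markers that remain.
    static String[] sentences(String text) {
        String marked = removeAcronymMarks(reattachNonBreakerDigits(segmentAll(text)));
        return SECTION_PATTERN.split(marked);
    }
}
```

Here `"See No. 5 in the U.S. Army report. It is long."` splits into two sentences, with the breaks after `No.` and `U.S.` undone. One trade-off of this acronym rule is that a sentence ending in a single capital letter, as in "vitamin A. Then", is also rejoined.
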
public NonPeriodBreaker(Properties properties)
Parameters: properties - the options
public String SegmenterNonBreaker(String line)
Parameters: line - the text to be processed
public static String deSegmentAcronyms(String line)
Parameters: line - the text
Copyright © 2015 IXA pipes. All rights reserved.