java.lang.Object
org.miaixz.bus.extra.nlp.provider.analysis.AnalysisWord
- All Implemented Interfaces:
Serializable, NLPWord
Wrapper class for a single word (Attribute) produced by Lucene analysis word segmentation. This class adapts the Lucene
Attribute (specifically CharTermAttribute and OffsetAttribute) to the common NLPWord
interface, providing a unified way to access segmented word information.
- Since:
- Java 17+
- Author:
- Kimi Liu
Constructor Summary
Constructors
- AnalysisWord(org.apache.lucene.analysis.tokenattributes.CharTermAttribute word)
  Constructs an AnalysisWord instance by wrapping a Lucene CharTermAttribute.
-
Method Summary
- int getEndOffset()
  Retrieves the ending character offset of this word within the original text.
- int getStartOffset()
  Retrieves the starting character offset of this word within the original text.
- getText()
  Retrieves the text of the word from the wrapped Lucene Attribute.
- toString()
  Returns the textual representation of this word, which is the same as getText().
-
Constructor Details
-
AnalysisWord
public AnalysisWord(org.apache.lucene.analysis.tokenattributes.CharTermAttribute word)
Constructs an AnalysisWord instance by wrapping a Lucene CharTermAttribute.
- Parameters:
word - The CharTermAttribute object from Lucene analysis.
-
-
Method Details
-
getText
Retrieves the text of the word from the wrapped Lucene Attribute.
-
getStartOffset
public int getStartOffset()
Retrieves the starting character offset of this word within the original text. This method checks whether the underlying attribute is an instance of OffsetAttribute and returns the start offset if it is.
- Specified by:
getStartOffset in interface NLPWord
- Returns:
- The starting position (inclusive) of the word, or -1 if not available.
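The availability check described above can be illustrated with a minimal sketch. It is not the AnalysisWord source: TermAttr, OffsetAttr, TermOnly, TermWithOffset and Word are hypothetical stand-ins for the Lucene interfaces and the wrapper, declared here only so the example compiles without Lucene on the classpath.

```java
public class OffsetFallbackSketch {

    // Hypothetical stand-ins for Lucene's CharTermAttribute and OffsetAttribute,
    // reduced to the calls this sketch needs.
    interface TermAttr { String text(); }
    interface OffsetAttr { int startOffset(); }

    // An attribute that carries only the term text.
    static final class TermOnly implements TermAttr {
        private final String text;
        TermOnly(String text) { this.text = text; }
        @Override public String text() { return text; }
    }

    // An attribute that carries the term text and a start offset.
    static final class TermWithOffset implements TermAttr, OffsetAttr {
        private final String text;
        private final int start;
        TermWithOffset(String text, int start) { this.text = text; this.start = start; }
        @Override public String text() { return text; }
        @Override public int startOffset() { return start; }
    }

    // Hypothetical wrapper that follows the documented contract:
    // return the offset when the attribute exposes one, otherwise -1.
    static final class Word {
        private final TermAttr word;
        Word(TermAttr word) { this.word = word; }
        int getStartOffset() {
            if (word instanceof OffsetAttr) {
                return ((OffsetAttr) word).startOffset();
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Word(new TermWithOffset("lucene", 7)).getStartOffset()); // 7
        System.out.println(new Word(new TermOnly("lucene")).getStartOffset());          // -1
    }
}
```

Callers should therefore treat -1 as "offset unknown" and not as a valid position.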
-
getEndOffset
public int getEndOffset()
Retrieves the ending character offset of this word within the original text. This method checks whether the underlying attribute is an instance of OffsetAttribute and returns the end offset if it is.
- Specified by:
getEndOffset in interface NLPWord
- Returns:
- The ending position (exclusive) of the word, or -1 if not available.
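Because the start offset is inclusive and the end offset is exclusive, the pair maps directly onto String.substring. The sketch below shows one way a caller might recover the source span of a word. The slice helper and the sample text are assumptions made for illustration and are not part of this API.

```java
public class OffsetSliceSketch {

    // Returns the part of the original text covered by [start, end),
    // or null when either offset is reported as unavailable (-1).
    static String slice(String text, int start, int end) {
        if (start < 0 || end < 0) {
            return null;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Apache Lucene";
        System.out.println(slice(text, 7, 13));  // Lucene
        System.out.println(slice(text, -1, -1)); // null
    }
}
```

The slice comes from the original text, so it may differ from getText() if the analysis chain rewrote the token, for example by lowercasing it.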
-
toString
Returns the textual representation of this word, which is the same as getText().
-