Module bus.extra

Class AnalysisWord

java.lang.Object
org.miaixz.bus.extra.nlp.provider.analysis.AnalysisWord
All Implemented Interfaces:
Serializable, NLPWord

public class AnalysisWord extends Object implements NLPWord
Wrapper for a single word (a Lucene Attribute) produced by Lucene-analysis word segmentation. This class adapts Lucene attributes (specifically CharTermAttribute and, when available, OffsetAttribute) to the common NLPWord interface, so segmented-word information can be accessed in a uniform way.
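The adapter idea can be sketched without a Lucene dependency. In the minimal example below, Word, TermAttr, OffsetAttr, Token, and AttrWord are hypothetical stand-ins, not the real NLPWord, CharTermAttribute, or OffsetAttribute types. The sketch assumes only what this page states: the text comes from the wrapped attribute, and the offsets come from an instanceof check against an offset attribute, with -1 as the fallback.

```java
public class AdapterSketch {
    // Hypothetical stand-in for the NLPWord interface (accessor names taken from this page).
    interface Word {
        String getText();
        int getStartOffset();
        int getEndOffset();
    }

    // Hypothetical stand-in for Lucene's CharTermAttribute.
    interface TermAttr {
        String term();
    }

    // Hypothetical stand-in for Lucene's OffsetAttribute.
    interface OffsetAttr {
        int startOffset();
        int endOffset();
    }

    // One object carrying both term text and offsets; the record accessors
    // term(), startOffset() and endOffset() implement both interfaces.
    record Token(String term, int startOffset, int endOffset) implements TermAttr, OffsetAttr {}

    // Adapter: exposes any TermAttr through the Word interface.
    static final class AttrWord implements Word {
        private final TermAttr attr;

        AttrWord(TermAttr attr) {
            this.attr = attr;
        }

        @Override
        public String getText() {
            return attr.term();
        }

        // Offsets exist only if the wrapped object also implements OffsetAttr; otherwise -1.
        @Override
        public int getStartOffset() {
            return attr instanceof OffsetAttr o ? o.startOffset() : -1;
        }

        @Override
        public int getEndOffset() {
            return attr instanceof OffsetAttr o ? o.endOffset() : -1;
        }

        // Same value as getText(), mirroring the documented toString() behavior.
        @Override
        public String toString() {
            return getText();
        }
    }
}
```

The instanceof check is meaningful because, with Lucene's default attribute factory, CharTermAttribute and OffsetAttribute are typically backed by the same PackedTokenAttributeImpl object, so a wrapped term attribute will usually also expose offsets.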
Since:
Java 17+
Author:
Kimi Liu
See Also:
  • Constructor Summary

    Constructors
    Constructor                                                                       Description
    AnalysisWord(org.apache.lucene.analysis.tokenattributes.CharTermAttribute word)   Constructs an AnalysisWord instance by wrapping a Lucene CharTermAttribute.
  • Method Summary

    Modifier and Type   Method             Description
    int                 getEndOffset()     Retrieves the ending character offset of this word within the original text.
    int                 getStartOffset()   Retrieves the starting character offset of this word within the original text.
    String              getText()          Retrieves the text of the word from the wrapped Lucene Attribute.
    String              toString()         Returns the textual representation of this word, which is the same as getText().

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • AnalysisWord

      public AnalysisWord(org.apache.lucene.analysis.tokenattributes.CharTermAttribute word)
      Constructs an AnalysisWord instance by wrapping a Lucene CharTermAttribute.
      Parameters:
      word - The CharTermAttribute object from Lucene analysis.
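This page does not say whether AnalysisWord copies the term text at construction or reads it from the wrapped attribute on each getText() call; "wrapping" suggests the latter. Lucene token streams reuse a single attribute instance across incrementToken() calls, so if the wrapper reads lazily, words collected into a list and read afterwards would all report the last token. The sketch below illustrates that pitfall under this assumption, using a hypothetical MutableTerm in place of CharTermAttribute and a hypothetical consume() method in place of a real token-stream loop.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    // Hypothetical stand-in for a Lucene term attribute: one mutable object
    // that the token stream overwrites on every incrementToken() call.
    static final class MutableTerm {
        private String term = "";

        void set(String t) {
            term = t;
        }

        String get() {
            return term;
        }
    }

    // Wrapper that reads the attribute lazily on each call (assumed behavior).
    record LiveWord(MutableTerm attr) {
        String text() {
            return attr.get();
        }
    }

    // Simulates consuming a stream; copyText=true returns text snapshotted per token,
    // copyText=false returns text read from the live wrappers after the loop ends.
    static List<String> consume(String[] tokens, boolean copyText) {
        MutableTerm attr = new MutableTerm(); // single shared instance, reused per token
        List<LiveWord> live = new ArrayList<>();
        List<String> copied = new ArrayList<>();
        for (String t : tokens) {
            attr.set(t); // stands in for incrementToken() updating the attribute
            live.add(new LiveWord(attr));
            copied.add(attr.get()); // snapshot of the current term text
        }
        if (copyText) {
            return copied;
        }
        List<String> readLater = new ArrayList<>();
        for (LiveWord w : live) {
            readLater.add(w.text()); // every wrapper now sees the last token
        }
        return readLater;
    }
}
```

If words must outlive the current token, reading getText() and the offsets inside the loop (or otherwise copying them) avoids depending on how the wrapper is implemented.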
  • Method Details

    • getText

      public String getText()
      Retrieves the text of the word from the wrapped Lucene Attribute.
      Specified by:
      getText in interface NLPWord
      Returns:
      The text of the word as a String.
    • getStartOffset

      public int getStartOffset()
      Retrieves the starting character offset of this word within the original text. This method checks whether the wrapped attribute is also an instance of OffsetAttribute; if so, it returns that attribute's start offset, otherwise it returns -1.
      Specified by:
      getStartOffset in interface NLPWord
      Returns:
      The starting position (inclusive) of the word, or -1 if not available.
    • getEndOffset

      public int getEndOffset()
      Retrieves the ending character offset of this word within the original text. This method checks whether the wrapped attribute is also an instance of OffsetAttribute; if so, it returns that attribute's end offset, otherwise it returns -1.
      Specified by:
      getEndOffset in interface NLPWord
      Returns:
      The ending position (exclusive) of the word, or -1 if not available.
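Because the start offset is inclusive and the end offset exclusive, the pair can be passed directly to String.substring to recover the word's original spelling. This matters when the analyzer normalizes terms (Lucene's StandardAnalyzer, for example, lowercases them). The sketch below assumes a hypothetical SimpleWord record and a toy whitespace tokenizer in place of a Lucene analyzer; surface() guards against the -1 value returned when offsets are unavailable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class OffsetSketch {
    // Hypothetical holder for a word's normalized text and its [start, end) offsets.
    record SimpleWord(String text, int start, int end) {}

    // Toy tokenizer standing in for a Lucene analyzer: splits on whitespace,
    // lowercases each term, and records start (inclusive) / end (exclusive) offsets.
    static List<SimpleWord> tokenize(String text) {
        List<SimpleWord> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                words.add(new SimpleWord(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
            }
        }
        return words;
    }

    // Recovers the original surface form; returns null when offsets are unavailable (-1).
    static String surface(String original, SimpleWord w) {
        if (w.start() < 0 || w.end() < 0) {
            return null;
        }
        return original.substring(w.start(), w.end()); // start inclusive, end exclusive
    }
}
```

For the input "  Hello  world", the first word's text is "hello" with offsets 2 and 7, and surface() returns the original "Hello".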
    • toString

      public String toString()
      Returns the textual representation of this word, which is the same as getText().
      Overrides:
      toString in class Object
      Returns:
      The text of the word.