morfologik.fsa
Class FSA5

java.lang.Object
  extended by morfologik.fsa.FSA
      extended by morfologik.fsa.FSA5
All Implemented Interfaces:
Iterable<ByteBuffer>

public final class FSA5
extends FSA

FSA binary format implementation for version 5.

Version 5 indicates the dictionary was built with these flags: FSAFlags.FLEXIBLE, FSAFlags.STOPBIT and FSAFlags.NEXTBIT. The internal representation of the FSA must therefore follow the layout below. Note that this layout describes only a single transition (arc), preceded by the node header where present, not the entire dictionary file.

 ---- this node header is present only if the automaton was compiled with the NUMBERS option.
 Byte
        +-+-+-+-+-+-+-+-+\
      0 | | | | | | | | | \  LSB
        +-+-+-+-+-+-+-+-+  +
      1 | | | | | | | | |  |      number of strings recognized
        +-+-+-+-+-+-+-+-+  +----- by the automaton starting
        : : : : : : : : :  |      from this node.
        +-+-+-+-+-+-+-+-+  +
  ctl-1 | | | | | | | | | /  MSB
        +-+-+-+-+-+-+-+-+/
        
 ---- remaining part of the node
 
 Byte
       +-+-+-+-+-+-+-+-+\
     0 | | | | | | | | | +------ label
       +-+-+-+-+-+-+-+-+/
 
                  +------------- node pointed to is next
                  | +----------- the last arc of the node
                  | | +--------- the arc is final
                  | | |
             +-----------+
             |    | | |  |
         ___+___  | | |  |
        /       \ | | |  |
       MSB           LSB |
        7 6 5 4 3 2 1 0  |
       +-+-+-+-+-+-+-+-+ |
     1 | | | | | | | | | \ \
       +-+-+-+-+-+-+-+-+  \ \  LSB
       +-+-+-+-+-+-+-+-+     +
     2 | | | | | | | | |     |
       +-+-+-+-+-+-+-+-+     |
     3 | | | | | | | | |     +----- target node address (in bytes)
       +-+-+-+-+-+-+-+-+     |      (not present except for the byte
       : : : : : : : : :     |       with flags if the node pointed to
       +-+-+-+-+-+-+-+-+     +       is next)
   gtl | | | | | | | | |    /  MSB
       +-+-+-+-+-+-+-+-+   /
 gtl+1                           (gtl = gotoLength)
 
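As a worked illustration of the arc layout above, the following sketch packs one arc into bytes. It is not part of the morfologik API: the class Fsa5ArcEncoder and its methods are hypothetical, and the flag values (final = bit 0, last = bit 1, next = bit 2 of byte 1) are assumptions read off the diagram rather than the published constant values.

```java
/**
 * Hypothetical helper that lays out a single FSA5 arc following the diagram.
 * Assumed flag bits: final = bit 0, last = bit 1, next = bit 2 of byte 1.
 */
final class Fsa5ArcEncoder {
    static final int FINAL = 1 << 0; // assumed BIT_FINAL_ARC
    static final int LAST = 1 << 1;  // assumed BIT_LAST_ARC
    static final int NEXT = 1 << 2;  // assumed BIT_TARGET_NEXT

    /** Arc with an explicit goto field: label byte followed by gtl address/flag bytes. */
    static byte[] encodeWithAddress(byte label, int flags, int address, int gtl) {
        if (gtl < 1 || gtl > 7) {
            throw new IllegalArgumentException("gtl out of range: " + gtl);
        }
        // The address sits above the three flag bits in one little-endian field.
        long field = ((long) address << 3) | (flags & (FINAL | LAST));
        if ((field >>> (8 * gtl)) != 0) {
            throw new IllegalArgumentException("address does not fit in " + gtl + " bytes");
        }
        byte[] out = new byte[1 + gtl];
        out[0] = label; // byte 0: label
        for (int i = 0; i < gtl; i++) {
            out[1 + i] = (byte) (field >>> (8 * i)); // least significant byte first
        }
        return out;
    }

    /** Arc whose target node follows it directly: label byte plus a flags-only byte. */
    static byte[] encodeNext(byte label, int flags) {
        return new byte[] { label, (byte) ((flags & (FINAL | LAST)) | NEXT) };
    }
}
```

For example, an arc labeled 'a' that is both final and last and points to address 100 with gtl = 2 becomes the bytes 0x61, 0x23, 0x03: (100 << 3) | 3 = 0x323, stored least significant byte first.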


Field Summary
static int ADDRESS_OFFSET
          The offset in the arc structure at which the address and flags field begins.
 byte annotation
          Annotation character.
 byte[] arcs
          An array of bytes with the internal representation of the automaton.
static int BIT_FINAL_ARC
          Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
static int BIT_LAST_ARC
          Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
static int BIT_TARGET_NEXT
          Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
 byte filler
          Filler character.
 int gtl
          Number of bytes each address takes in full, expanded form (goto length).
 int nodeDataLength
          The length of the node header structure (if the automaton was compiled with NUMBERS option).
static byte VERSION
          Automaton version as in the file header.
 
Constructor Summary
FSA5(InputStream fsaStream)
          Read and wrap a binary automaton in FSA version 5.
 
Method Summary
 int getArc(int node, byte label)
          Returns the identifier of an arc leaving node and labeled with label.
 byte getArcLabel(int arc)
          Return the label associated with a given arc.
protected  int getDestinationNodeOffset(int arc)
          Returns the address of the node pointed to by this arc.
 int getEndNode(int arc)
          Return the end node pointed to by a given arc.
 int getFirstArc(int node)
          Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.
 Set<FSAFlags> getFlags()
          Returns a set of flags for this FSA instance.
 int getNextArc(int arc)
          Returns the identifier of the arc that follows arc among the arcs leaving its node.
 int getRootNode()
          Returns the start node of this automaton.
 boolean isArcFinal(int arc)
          Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
 boolean isArcLast(int arc)
          Returns true if this arc has LAST bit set.
 boolean isArcTerminal(int arc)
          Returns true if this arc does not have a terminating node (FSA.getEndNode(int) will throw an exception).
 boolean isNextSet(int arc)
          Returns true if this arc has the BIT_TARGET_NEXT bit set.
 
Methods inherited from class morfologik.fsa.FSA
getInstance, getTraversalHelper, iterator
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION

public static final byte VERSION
Automaton version as in the file header.

See Also:
Constant Field Values

BIT_FINAL_ARC

public static final int BIT_FINAL_ARC
Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.

See Also:
Constant Field Values

BIT_LAST_ARC

public static final int BIT_LAST_ARC
Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.

See Also:
Constant Field Values

BIT_TARGET_NEXT

public static final int BIT_TARGET_NEXT
Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).

See Also:
Constant Field Values

ADDRESS_OFFSET

public static final int ADDRESS_OFFSET
The offset in the arc structure at which the address and flags field begins. In version 5 automata this value is constant: 1, which skips the label byte.

See Also:
Constant Field Values
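A minimal sketch of how the three flag bits can be tested once ADDRESS_OFFSET has skipped the label byte. The bit values used here (final = 1, last = 2, next = 4) are assumptions taken from the layout diagram, not the published constants; the class Fsa5FlagTests is hypothetical and works on a plain byte array instead of an FSA5 instance.

```java
/**
 * Sketch of the flag tests behind isArcFinal, isArcLast and isNextSet.
 * Bit values are assumptions taken from the diagram, not the published constants.
 */
final class Fsa5FlagTests {
    static final int ADDRESS_OFFSET = 1;       // flags live in the byte after the label
    static final int BIT_FINAL_ARC = 1 << 0;   // assumed
    static final int BIT_LAST_ARC = 1 << 1;    // assumed
    static final int BIT_TARGET_NEXT = 1 << 2; // assumed

    static boolean isArcFinal(byte[] arcs, int arc) {
        return (arcs[arc + ADDRESS_OFFSET] & BIT_FINAL_ARC) != 0;
    }

    static boolean isArcLast(byte[] arcs, int arc) {
        return (arcs[arc + ADDRESS_OFFSET] & BIT_LAST_ARC) != 0;
    }

    static boolean isNextSet(byte[] arcs, int arc) {
        return (arcs[arc + ADDRESS_OFFSET] & BIT_TARGET_NEXT) != 0;
    }
}
```

Because the flags share a byte with the low bits of the target address, each test masks a single bit and ignores the rest of that byte.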

arcs

public final byte[] arcs
An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.


nodeDataLength

public final int nodeDataLength
The length of the node header structure if the automaton was compiled with the NUMBERS option; zero otherwise.
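The node header in the diagram at the top of this page holds the number of strings recognized from a node, marked LSB first and MSB last. The sketch below decodes that count under the assumption that it spans exactly nodeDataLength bytes in little-endian order and that the node's first arc starts right after it. The class Fsa5NodeHeader is hypothetical.

```java
/**
 * Sketch: decoding the optional node header (NUMBERS option). Assumes the
 * header holds the string count least significant byte first, and that the
 * node's arcs start immediately after it.
 */
final class Fsa5NodeHeader {
    /** Number of strings recognized from this node; 0 when there is no header. */
    static int stringCount(byte[] arcs, int node, int nodeDataLength) {
        int count = 0;
        // Start at the most significant byte (highest offset) and shift down to the LSB.
        for (int i = nodeDataLength - 1; i >= 0; i--) {
            count = (count << 8) | (arcs[node + i] & 0xff);
        }
        return count;
    }

    /** Offset of the node's first arc, just past its header. */
    static int firstArcOffset(int node, int nodeDataLength) {
        return node + nodeDataLength;
    }
}
```

With a two-byte header 0x2c, 0x01 the node reports 0x012c = 300 strings, and its first arc begins two bytes after the node offset.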


gtl

public final int gtl
Number of bytes each address takes in full, expanded form (goto length).
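To make the role of gtl concrete, this sketch reads the gtl-byte goto field that follows an arc's label and splits it into flags and target address. It assumes, following the diagram, that the field is little-endian, that its three lowest bits are the arc flags and that the remaining bits hold the address. The class Fsa5GotoField is hypothetical.

```java
/**
 * Sketch: reading the gtl-byte goto field that follows an arc's label. Assumes
 * the field is little-endian and that its low three bits are the arc flags,
 * with the target address in the remaining bits, as the diagram shows.
 */
final class Fsa5GotoField {
    static long readField(byte[] arcs, int arc, int gtl) {
        long field = 0;
        for (int i = gtl - 1; i >= 0; i--) {
            field = (field << 8) | (arcs[arc + 1 + i] & 0xff); // byte arc+1 is the LSB
        }
        return field;
    }

    static int address(byte[] arcs, int arc, int gtl) {
        return (int) (readField(arcs, arc, gtl) >>> 3); // drop the flag bits
    }

    static int flags(byte[] arcs, int arc, int gtl) {
        return (int) (readField(arcs, arc, gtl) & 0x7);
    }
}
```

For the bytes 0x61, 0x23, 0x03 with gtl = 2, the field is 0x323, which yields address 100 and flags 3.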


filler

public final byte filler
Filler character.


annotation

public final byte annotation
Annotation character.

Constructor Detail

FSA5

public FSA5(InputStream fsaStream)
     throws IOException
Read and wrap a binary automaton in FSA version 5.

Throws:
IOException

Method Detail

getRootNode

public int getRootNode()
Returns the start node of this automaton.

Specified by:
getRootNode in class FSA

getFirstArc

public final int getFirstArc(int node)
Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.

Specified by:
getFirstArc in class FSA

getNextArc

public final int getNextArc(int arc)
Returns the identifier of the arc that follows arc among the arcs leaving its node. Zero is returned if no more arcs are available for the node.

Specified by:
getNextArc in class FSA

getArc

public int getArc(int node,
                  byte label)
Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.

Specified by:
getArc in class FSA
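The contracts of getFirstArc, getNextArc and getArc can be illustrated with a small walker over a raw byte array. This is a sketch under stated assumptions, not the library's code: BIT_LAST_ARC = 2 and BIT_TARGET_NEXT = 4 in the flags byte, an arc occupies 2 bytes when its target is next and 1 + gtl bytes otherwise, and, for simplicity, every stored node has at least one arc (so getFirstArc never returns 0 here). The class Fsa5ArcWalker and its labels helper are hypothetical.

```java
/**
 * Sketch of getFirstArc, getNextArc and getArc over a raw FSA5 byte array.
 * Assumed: BIT_LAST_ARC = 2 and BIT_TARGET_NEXT = 4 in the flags byte; an arc
 * is 2 bytes long when its target is next and 1 + gtl bytes otherwise.
 */
final class Fsa5ArcWalker {
    static final int BIT_LAST_ARC = 1 << 1;    // assumed
    static final int BIT_TARGET_NEXT = 1 << 2; // assumed

    private final byte[] arcs;
    private final int gtl;
    private final int nodeDataLength;

    Fsa5ArcWalker(byte[] arcs, int gtl, int nodeDataLength) {
        this.arcs = arcs;
        this.gtl = gtl;
        this.nodeDataLength = nodeDataLength;
    }

    int getFirstArc(int node) {
        return node + nodeDataLength; // skip the optional node header
    }

    int getNextArc(int arc) {
        int flags = arcs[arc + 1];
        if ((flags & BIT_LAST_ARC) != 0) {
            return 0; // this was the node's last arc
        }
        // Skip label + flags byte, or label + the full gtl-byte goto field.
        return arc + ((flags & BIT_TARGET_NEXT) != 0 ? 2 : 1 + gtl);
    }

    int getArc(int node, byte label) {
        for (int arc = getFirstArc(node); arc != 0; arc = getNextArc(arc)) {
            if (arcs[arc] == label) {
                return arc;
            }
        }
        return 0; // no outgoing arc with this label
    }

    /** Concatenates the labels of all arcs leaving a node, in storage order. */
    String labels(int node) {
        StringBuilder sb = new StringBuilder();
        for (int arc = getFirstArc(node); arc != 0; arc = getNextArc(arc)) {
            sb.append((char) arcs[arc]);
        }
        return sb.toString();
    }
}
```

With gtl = 1, the array {0, 'a', 0x01, 'b', 0x04, 'c', 0x03} holds a node at offset 1 with arcs labeled a, b and c; new Fsa5ArcWalker(bytes, 1, 0).labels(1) yields "abc", and getArc returns 0 for any other label. Since arcs are stored as a list, the lookup is a linear scan over the node's arcs.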

getEndNode

public int getEndNode(int arc)
Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.

Specified by:
getEndNode in class FSA

getArcLabel

public byte getArcLabel(int arc)
Return the label associated with a given arc.

Specified by:
getArcLabel in class FSA

isArcFinal

public boolean isArcFinal(int arc)
Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.

Specified by:
isArcFinal in class FSA

isArcTerminal

public boolean isArcTerminal(int arc)
Returns true if this arc does not have a terminating node (FSA.getEndNode(int) will throw an exception). Implies FSA.isArcFinal(int).

Specified by:
isArcTerminal in class FSA

getFlags

public Set<FSAFlags> getFlags()
Returns a set of flags for this FSA instance.

For this automaton version, an additional FSAFlags.NUMBERS flag may be set to indicate that each node carries a header with the number of strings recognized from that node.

Specified by:
getFlags in class FSA
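A sketch of the flag set described above. To stay self-contained it uses a hypothetical enum Fsa5Flag in place of morfologik's FSAFlags, limited to the constants named on this page, and it assumes that NUMBERS is reported exactly when nodeDataLength is nonzero.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical stand-in for FSAFlags, limited to the constants named on this page. */
enum Fsa5Flag { FLEXIBLE, STOPBIT, NEXTBIT, NUMBERS }

final class Fsa5FlagSet {
    /** Flags a version 5 automaton reports; NUMBERS only when nodes carry headers. */
    static Set<Fsa5Flag> flagsFor(int nodeDataLength) {
        EnumSet<Fsa5Flag> flags = EnumSet.of(Fsa5Flag.FLEXIBLE, Fsa5Flag.STOPBIT, Fsa5Flag.NEXTBIT);
        if (nodeDataLength > 0) {
            flags.add(Fsa5Flag.NUMBERS); // node headers store per-node string counts
        }
        return Collections.unmodifiableSet(flags); // callers cannot alter the reported flags
    }
}
```

An EnumSet keeps the check cheap (a bit test) while the unmodifiable wrapper prevents callers from changing what the automaton reports.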

isArcLast

public boolean isArcLast(int arc)
Returns true if this arc has the BIT_LAST_ARC bit set, i.e., it is the last arc leaving its node.

See Also:
BIT_LAST_ARC

isNextSet

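public boolean isNextSet(int arc)
Returns true if this arc has the BIT_TARGET_NEXT bit set, i.e., its target node immediately follows it in the compressed automaton structure and the arc has no goto field.

See Also:
BIT_TARGET_NEXT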

getDestinationNodeOffset

protected final int getDestinationNodeOffset(int arc)
Returns the address of the node pointed to by this arc.
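The following sketch resolves a destination offset from the two cases in the layout diagram: when BIT_TARGET_NEXT is set, the target node begins right after the arc's label and flags bytes; otherwise the address is read from the gtl-byte goto field. It also treats a stored address of 0 as the marker of a terminal arc, which is an assumption about how isArcTerminal is decided rather than something stated on this page. The class Fsa5Destination and the bit value 4 are likewise assumptions.

```java
/**
 * Sketch: resolving an arc's destination offset. Assumes BIT_TARGET_NEXT = 4,
 * a little-endian goto field with the address above three flag bits, and
 * address 0 as the marker of a terminal arc.
 */
final class Fsa5Destination {
    static final int BIT_TARGET_NEXT = 1 << 2; // assumed value

    /** Offset of the node this arc points to; 0 is taken to mark a terminal arc. */
    static int destinationNodeOffset(byte[] arcs, int arc, int gtl) {
        if ((arcs[arc + 1] & BIT_TARGET_NEXT) != 0) {
            return arc + 2; // target node starts right after label + flags byte
        }
        long field = 0;
        for (int i = gtl - 1; i >= 0; i--) {
            field = (field << 8) | (arcs[arc + 1 + i] & 0xff); // little-endian goto field
        }
        return (int) (field >>> 3); // strip the three flag bits
    }

    static boolean isArcTerminal(byte[] arcs, int arc, int gtl) {
        return destinationNodeOffset(arcs, arc, gtl) == 0;
    }
}
```

With gtl = 1 and the bytes {0, 'a', 0x01, 'b', 0x04, 'c', 0x1b}, the arc at offset 1 resolves to 0 (terminal), the arc at offset 3 has its target next at offset 5, and the arc at offset 5 points to address 27 >>> 3 = 3.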



Copyright © 2010. All Rights Reserved.