public interface SDocumentGraph extends SGraph

The document structure is built from the nodes SSequentialDS, STextualDS, SMedialDS, SToken, SSpan and SStructure and the relations STextualRelation, SMedialRelation, SSpanningRelation, SDominanceRelation, SPointingRelation and SOrderRelation, which are discussed below. All these nodes and relations are contained in a graph, the SDocumentGraph, which is the model element representing the document structure.

Nested classes inherited from SGraph: SGraph.GRAPH_TRAVERSE_TYPE

| Modifier and Type | Method and Description |
|---|---|
| `SRelation` | `addNode(SNode source, SNode target, SALT_TYPE relationType)`<br>Creates and returns an SRelation of the type given by relationType between source and target. |
| `SRelation` | `createRelation(SNode source, SNode target, SALT_TYPE relationType, String annotations)`<br>Creates an SRelation object and sets its source and target to the passed ones. |
| `SSpan` | `createSpan(List<SToken> sourceTokens)`<br>Creates a new SSpan object, adds it to the graph and returns the new object. |
| `SSpan` | `createSpan(SToken... sourceToken)`<br>Creates a new SSpan object, adds it to the graph and returns the new object. |
| `SStructure` | `createStructure(List<SStructuredNode> structures)`<br>Creates a new SStructure object, adds it to the graph and returns the new object. |
| `SStructure` | `createStructure(SStructuredNode... structure)`<br>Creates a new SStructure object, adds it to the graph and returns the new object. |
| `STextualDS` | `createTextualDS(String text)`<br>Creates a new STextualDS node object, sets its text to the passed one and adds it to the graph. |
| `STimeline` | `createTimeline()`<br>Creates an STimeline object contained in this SDocumentGraph object. |
| `SToken` | `createToken(DataSourceSequence sequence)`<br>Creates a new SToken object and adds it to the graph. |
| `SToken` | `createToken(List<DataSourceSequence> sequences)`<br>Creates a new SToken object and adds it to the graph. |
| `SToken` | `createToken(SSequentialDS sequentialDS, Integer start, Integer end)`<br>Creates a new SToken object and adds it to the graph. |
| `Tokenizer` | `createTokenizer()`<br>Creates a new Tokenizer object to tokenize STextualDS objects contained in this SDocumentGraph object. |
| `Set<Difference>` | `findDiffs(SDocumentGraph other)`<br>Compares the passed graph with the current one and returns the set of differences found. |
| `Set<Difference>` | `findDiffs(SDocumentGraph other, DiffOptions options)`<br>Compares the passed graph with the current one and returns the set of differences found. |
| `List<SNode>` | `getChildren(SNode parent, SALT_TYPE relationType)`<br>Returns the children of the passed parent node that are reachable via relations of the passed type. |
| `SDocument` | `getDocument()`<br>Returns the SDocument object containing this graph object. |
| `List<SDominanceRelation>` | `getDominanceRelations()` |
| `List<SMedialDS>` | `getMedialDSs()` |
| `List<SMedialRelation>` | `getMedialRelations()` |
| `List<SNode>` | `getNodes(Class<?> clazz)`<br>Returns all nodes of the passed class. |
| `List<SNode>` | `getNodes(SALT_TYPE type)`<br>Returns all nodes of the passed type. |
| `List<SNode>` | `getNodesBySequence(DataSourceSequence sequence)`<br>Returns all SNode objects which refer to the passed DataSourceSequence object. |
| `List<SOrderRelation>` | `getOrderRelations()` |
| `List<DataSourceSequence>` | `getOverlappedDataSourceSequence(List<SNode> nodes, SALT_TYPE... relationTypes)`<br>Returns the sequences as DataSourceSequence objects which are overlapped by the given SNode nodes. |
| `List<DataSourceSequence>` | `getOverlappedDataSourceSequence(SNode node, SALT_TYPE... relationTypes)`<br>Returns the sequences as DataSourceSequence objects which are overlapped by the given SNode node. |
| `List<SToken>` | `getOverlappedTokens(SNode overlappingNode)`<br>Returns all tokens in the graph which are overlapped by the passed node and are reachable via relations of type SALT_TYPE.STEXT_OVERLAPPING_RELATION. |
| `List<SToken>` | `getOverlappedTokens(SNode overlappingNode, SALT_TYPE... overlappingRelationTypes)`<br>Returns all tokens in the graph which are overlapped by the passed node and are reachable via relations having at least one of the passed types. |
| `List<SPointingRelation>` | `getPointingRelations()` |
| `List<SRelation>` | `getRelations(Class<?> clazz)`<br>Returns all relations of the passed class. |
| `List<SRelation>` | `getRelations(SALT_TYPE type)`<br>Returns all relations of the passed type. |
| `List<SNode>` | `getRootsByRelation(SALT_TYPE... type)`<br>Returns all SNode objects which are roots for the given types of SRelation. |
| `com.google.common.collect.Multimap<String,SNode>` | `getRootsByRelationType(SALT_TYPE type)`<br>Returns all nodes which are roots for the given relation class with respect to the given SType of the traversed relation. |
| `List<SNode>` | `getSharedParent(List<SNode> children, SALT_TYPE nodeType)`<br>Returns a list of nodes that are the parents of every node in the given base list. |
| `List<SToken>` | `getSortedTokenByText()`<br>Returns all SToken objects contained in the list getTokens(), sorted by the SSequentialRelation.getStart() value of each SToken object. |
| `List<SToken>` | `getSortedTokenByText(List<SToken> tokens2sort)`<br>Returns all SToken objects contained in the given list, sorted by the SSequentialRelation.getStart() value of each SToken object. |
| `List<SSpanningRelation>` | `getSpanningRelations()` |
| `List<SSpan>` | `getSpans()` |
| `List<SSpan>` | `getSpansBySequence(DataSourceSequence sequence)`<br>Returns all SSpan objects which refer to the passed DataSourceSequence object. |
| `List<SStructure>` | `getStructures()` |
| `List<SStructure>` | `getStructuresBySequence(DataSourceSequence sequence)`<br>Returns all SStructure objects which refer to the passed DataSourceSequence object. |
| `String` | `getText(SNode sNode)`<br>Returns the exact text overlapped in the STextualDS by the given SNode. |
| `List<STextualDS>` | `getTextualDSs()`<br>Returns all primary texts contained in this document structure. |
| `List<STextualRelation>` | `getTextualRelations()`<br>Returns all relations which connect a token with a primary text contained in this document structure. |
| `STimeline` | `getTimeline()`<br>Returns the timeline of this document graph. |
| `List<STimelineRelation>` | `getTimelineRelations()` |
| `List<SToken>` | `getTokens()`<br>Returns all tokens contained in this document structure. |
| `List<SToken>` | `getTokensBySequence(DataSourceSequence sequence)`<br>Returns all SToken objects which refer to the passed DataSourceSequence object. |
| `SToken` | `insertTokenAt(STextualDS textualDS, Integer posInText, String text, Boolean insertSpace)`<br>Inserts a token into the graph starting at position posInText and relates it to the given STextualDS object. |
| `List<SToken>` | `insertTokensAt(STextualDS textualDS, Integer posInText, List<String> texts, Boolean insertSpace)`<br>Inserts n tokens (where n is the size of the given list texts) into the graph starting at position posInText and relates them to the given STextualDS object. |
| `boolean` | `isContinuousByText(List<SNode> subNodeList)`<br>Returns true if the given list of nodes subNodeList is continuous with respect to the overlapped text. |
| `boolean` | `isContinuousByText(List<SNode> subNodeList, List<SNode> fullNodeList)`<br>Returns true if the given list of nodes subNodeList is continuous with respect to the overlapped text. |
| `boolean` | `isIsomorph(SDocumentGraph other)`<br>Compares the passed graph with the current one and returns whether they are isomorphic or not. |
| `boolean` | `isIsomorph(SDocumentGraph other, DiffOptions options)`<br>Compares the passed graph with the current one and returns whether they are isomorphic or not. |
| `void` | `setDocument(SDocument document)`<br>Sets the SDocument object as a container for this graph. |
| `void` | `setTimeline(STimeline value)`<br>Sets the value of the 'STimeline' reference. |
| `void` | `sortTokenByText()`<br>Sorts all SToken and STextualRelation objects contained in the lists getTokens() and getTextualRelations() by the SSequentialRelation.getStart() value of each SToken and STextualRelation object. |
| `List<SToken>` | `tokenize()`<br>Tokenizes all STextualDS objects contained in this SDocumentGraph object. |

Methods inherited from supertypes:
getLayerByName, getLeafs, getNodesByName, getRelationsByName, getRoots, traverse, traverse
addLayer, addNode, addRelation, containsLayer, containsNode, containsRelation, getIndexMgr, getInRelations, getLayer, getLayers, getNode, getNodes, getOutRelations, getRelation, getRelations, getRelations, removeLayer, removeNode, removeRelation, removeRelations
getId, getIdentifier, setId, setIdentifier
addLabel, containsLabel, getLabel, getLabel, getLabels, getLabelsByNamespace, removeAll, removeLabel, removeLabel, sizeLabels
addAnnotation, addFeature, addMetaAnnotation, addProcessingAnnotation, createAnnotation, createAnnotations, createFeature, createFeatures, createMetaAnnotation, createMetaAnnotations, createProcessingAnnotation, createProcessingAnnotations, getAnnotation, getAnnotation, getAnnotations, getFeature, getFeature, getFeatures, getMetaAnnotation, getMetaAnnotations, getProcessingAnnotation, getProcessingAnnotations, iterator_SAnnotation, iterator_SFeature, iterator_SMetaAnnotation, iterator_SProcessingAnnotation
getName, setName
getPath

SDocument getDocument()
Returns the SDocument object containing this graph object. The SDocument object is linked via an SFeature object identified by a namespace and a name.

void setDocument(SDocument document)
Sets the SDocument object as a container for this graph. The given SDocument object is linked via an SFeature object identified by a namespace and a name.
document - the new value of the 'SDocument' reference

List<STextualDS> getTextualDSs()
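Returns all primary texts contained in this document structure.

List<STextualRelation> getTextualRelations()
Returns all relations which connect a token with a primary text contained in this document structure.

List<SToken> getTokens()
Returns all tokens contained in this document structure.

STimeline getTimeline()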
Returns the timeline of this document graph. Tokens are related to the timeline via STimelineRelations. A timeline is necessary to set tokens in correspondence when they belong to different STextualDSs, for instance to model dialogue data.

void setTimeline(STimeline value)
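Sets the value of the 'STimeline' reference.
value - the new value of the 'STimeline' reference
See also: getTimeline()

List<SRelation> getRelations(SALT_TYPE type)
Returns all relations of the passed type.
type - type of relations

List<SRelation> getRelations(Class<?> clazz)
Returns all relations of the passed class.
clazz - class of relations

List<SNode> getNodes(SALT_TYPE type)
Returns all nodes of the passed type.
type - type of nodes

List<SNode> getNodes(Class<?> clazz)
Returns all nodes of the passed class.
clazz - class of nodes

List<STimelineRelation> getTimelineRelations()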
List<SSpanningRelation> getSpanningRelations()
List<SStructure> getStructures()
List<SDominanceRelation> getDominanceRelations()
List<SPointingRelation> getPointingRelations()
List<SMedialRelation> getMedialRelations()
List<SOrderRelation> getOrderRelations()
SRelation addNode(SNode source, SNode target, SALT_TYPE relationType)
Creates and returns an SRelation of the type given by relationType between source and target. Supported relation types are SALT_TYPE.STEXTUAL_RELATION, SALT_TYPE.SSPANNING_RELATION, SALT_TYPE.SDOMINANCE_RELATION and SALT_TYPE.SPOINTING_RELATION.
source - source node
target - target node
relationType - type of the relation to be created between source and target

STextualDS createTextualDS(String text)
Creates a new STextualDS node object, sets its text to the passed one and adds it to the graph.
text - the text which shall be the primary text and be added to the created STextualDS node
Returns: the STextualDS node which has been created

SToken createToken(List<DataSourceSequence> sequences)
Creates a new SToken object and adds it to the graph. The SToken object will be connected with the SSequentialDS objects given in the passed DataSourceSequence objects. The created relations get the borders given in the DataSourceSequence objects.
sequences - list of sequences which shall be overlapped by the created token

SToken createToken(DataSourceSequence sequence)
Creates a new SToken object and adds it to the graph. The SToken object will be connected with the SSequentialDS object given in the passed DataSourceSequence object. The created relation gets the borders given in the DataSourceSequence object.
sequence - the sequence which shall be overlapped by the created token

SSpan createSpan(SToken... sourceToken)
Creates a new SSpan object, adds it to the graph and returns the new object. Further, this method creates an SSpanningRelation object for each passed token and sets its source to the created SSpan object and its target to that token.
sourceToken - the SToken nodes to which the SSpanningRelation objects point
Returns: the created SSpan node

SSpan createSpan(List<SToken> sourceTokens)
Creates a new SSpan object, adds it to the graph and returns the new object. Further, this method creates SSpanningRelation objects and sets their source to the created SSpan object and their targets to the given tokens.
sourceTokens - the SToken nodes to which the SSpanningRelation objects point
Returns: the created SSpan node

SStructure createStructure(SStructuredNode... structure)
Creates a new SStructure object, adds it to the graph and returns the new object. Further, this method creates an SDominanceRelation object for each passed node and sets its source to the created SStructure object and its target to that node.
structure - the SStructuredNode nodes to which the SDominanceRelation objects point
Returns: the created SStructure node

SStructure createStructure(List<SStructuredNode> structures)
Creates a new SStructure object, adds it to the graph and returns the new object. Further, this method creates SDominanceRelation objects and sets their source to the created SStructure object and their targets to the given nodes.
structures - list of SStructuredNode nodes to which the SDominanceRelation objects point
Returns: the created SStructure node

STimeline createTimeline()
Creates an STimeline object contained in this SDocumentGraph object. The new STimeline object is filled with points of time computed from the STextualRelation objects contained in this SDocumentGraph object: for each STextualDS object, one point of time is created for each STextualRelation object. If this object already contains a non-empty STimeline object, the existing one is returned.
Returns: a new STimeline object if none exists yet

List<SToken> getTokensBySequence(DataSourceSequence sequence)
Returns all SToken objects which refer to the passed DataSourceSequence object. The passed object determines the borders of the sequence by the values sStart and sEnd and the type of data source by the instance sSequentialDS.

List<SSpan> getSpansBySequence(DataSourceSequence sequence)
Returns all SSpan objects which refer to the passed DataSourceSequence object. The passed object determines the borders of the sequence by the values sStart and sEnd and the type of data source by the instance sSequentialDS.

List<SStructure> getStructuresBySequence(DataSourceSequence sequence)
Returns all SStructure objects which refer to the passed DataSourceSequence object. The passed object determines the borders of the sequence by the values sStart and sEnd and the type of data source by the instance sSequentialDS.
sequence - an object determining the sequence to which the returned SStructure objects refer
Returns: SStructure objects which refer to or overlap the passed sequence

List<SNode> getNodesBySequence(DataSourceSequence sequence)
Returns all SNode objects which refer to the passed DataSourceSequence object. The passed object determines the borders of the sequence by the values sStart and sEnd and the type of data source by the instance sSequentialDS.

List<DataSourceSequence> getOverlappedDataSourceSequence(SNode node, SALT_TYPE... relationTypes)
Returns the sequences as DataSourceSequence objects which are overlapped by the given SNode node. Overlapped means that, starting from the given SNode node, the SSequentialDS can be reached by traversing relations of one of the types contained in the given list of SALT_TYPE.
node - node to start from
relationTypes - a list of relation types which are traversed
Returns: the overlapped DataSourceSequence objects

List<DataSourceSequence> getOverlappedDataSourceSequence(List<SNode> nodes, SALT_TYPE... relationTypes)
Returns the sequences as DataSourceSequence objects which are overlapped by the given SNode nodes. Overlapped means that, starting from the given SNode nodes, the SSequentialDS can be reached by traversing relations of one of the types contained in the given list of SALT_TYPE.
nodes - a list of nodes to start from
relationTypes - a list of relation types which are traversed
Returns: the overlapped DataSourceSequence objects

boolean isContinuousByText(List<SNode> subNodeList, List<SNode> fullNodeList)
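Returns true if the given list of nodes subNodeList is continuous with respect to the overlapped text.
subNodeList - list to check against fullNodeList
fullNodeList - list which contains all nodes in correct order

boolean isContinuousByText(List<SNode> subNodeList)
Returns true if the given list of nodes subNodeList is continuous with respect to the overlapped text.
subNodeList - the list of nodes to check

List<SToken> getSortedTokenByText(List<SToken> tokens2sort)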
Returns all SToken objects contained in the given list, sorted by the SSequentialRelation.getStart() value of each SToken object.
tokens2sort - the list of SToken objects to sort

List<SToken> getSortedTokenByText()
Returns all SToken objects contained in the list getTokens(), sorted by the SSequentialRelation.getStart() value of each SToken object.

void sortTokenByText()
Sorts all SToken and STextualRelation objects contained in the lists getTokens() and getTextualRelations() by the SSequentialRelation.getStart() value of each SToken and STextualRelation object.

List<SNode> getRootsByRelation(SALT_TYPE... type)
Returns all SNode objects which are roots for the given types of SRelation. That is, all SNodes which have no incoming relations of the given types are returned as roots. For instance, imagine the following structure and assume that the passed SALT_TYPEs are SALT_TYPE.SDOMINANCE_RELATION and SALT_TYPE.SSPANNING_RELATION:

           struct1
          //     ||
      span1      ||    span2
      /   \      ||      |
    tok1  tok2  tok3    tok4

The nodes struct1 and span2 are returned, even if a pointing relation connects struct1 and span2.
type - the relation types for which root nodes are computed
Returns: the SNodes which are roots

com.google.common.collect.Multimap<String,SNode> getRootsByRelationType(SALT_TYPE type)
Returns all nodes which are roots for the given relation class with respect to the given SType of the traversed relation. For instance, imagine the following structure:

    node1 ->t1-> node2, node2 ->t2-> node3

Also imagine that -> is a relation of the same class with sType=t1 and sType=t2 respectively. The returned roots will be node1 and node2, because node1 is the root of the subgraph for relation.sType=t1 and node2 is the root of the subgraph for relation.sType=t2, whereas getRootsByRelation(SALT_TYPE) returns only node1. [...] SaltUtil.SALT_NULL_VALUE.
type - type which shall be used for computing roots

SToken createToken(SSequentialDS sequentialDS, Integer start, Integer end)
Creates a new SToken object and adds it to the graph. The SToken object will be connected with the given SSequentialDS object. The created relation gets the passed positions.
sequentialDS - the data source to which the created token should be connected
start - the offset in the data source where the created token starts
end - the offset in the data source where the created token ends

List<SToken> tokenize()
Tokenizes all STextualDS objects contained in this SDocumentGraph object. The tokenization is similar to the tokenization made by the TreeTagger tokenizer. This method calls the method createTokenizer() and initializes the tokenizer with automatically detected values. The language is detected automatically for each STextualDS object by use of the TextCategorizer (see: http://textcat.sourceforge.net/doc/org/knallgrau/utils/textcat/TextCategorizer.html). If the language is one of English, French, Italian and German, abbreviations also taken from the TreeTagger are used. To customize these settings, use the method createTokenizer().

The tokenizer used is a reimplementation in Java, with permission, of the original TreeTagger tokenizer in Perl by Helmut Schmid (see: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/). For each token detected in the text given by sTextualDS.getSText(), an SToken object is created and linked with the STextualDS object via a new STextualRelation object containing the textual offset.

Tokenizer createTokenizer()
Creates a new Tokenizer object to tokenize STextualDS objects contained in this SDocumentGraph object. To customize the tokenization, take a look at the properties of the returned Tokenizer object. This method is used by the method tokenize(). The tokenizer used is a reimplementation in Java, with permission, of the original TreeTagger tokenizer in Perl by Helmut Schmid (see: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/). For each token detected in the text given by sTextualDS.getSText(), an SToken object is created and linked with the STextualDS object via a new STextualRelation object containing the textual offset. If the STextualDS object that has been set is empty, or if it does not belong to this SDocumentGraph object, an exception is thrown.

SToken insertTokenAt(STextualDS textualDS, Integer posInText, String text, Boolean insertSpace)
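Inserts a token into the graph starting at position posInText and relates it to the given STextualDS object. For instance, given the following tokens and text:

    tok1 tok2 tok3 tok4
    This is   a    text.

the call insertTokenAt(textualDS, 5, "additional", true); results in:

    tok1 tok5       tok2 tok3 tok4
    This additional is   a    text.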
textualDS - the STextualDS object to which the new token should be related. Make sure that textualDS is already contained in the SDocumentGraph.
posInText - textual position where to add the new token
text - text value the new token should cover
insertSpace - if true, a blank is inserted after the new text

List<SToken> insertTokensAt(STextualDS textualDS, Integer posInText, List<String> texts, Boolean insertSpace)
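Inserts n tokens (where n is the size of the given list texts) into the graph starting at position posInText and relates them to the given STextualDS object. For instance, given the following tokens and text:

    tok1 tok2 tok3 tok4
    This is   a    text.

the call insertTokensAt(textualDS, 5, texts, true); with texts containing "additional" and "text" results in:

    tok1 tok5       tok6 tok2 tok3 tok4
    This additional text is   a    text.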
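
The offset bookkeeping behind such an insertion can be illustrated with plain Java. This is only a sketch under assumptions: `InsertTokensSketch` and its `Token` type are hypothetical stand-ins that keep [start, end) offsets into a `StringBuilder`, whereas Salt stores offsets on STextualRelation objects, and the real method may treat edge cases differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in sketch of offset bookkeeping; these are not the Salt classes. */
final class InsertTokensSketch {

    /** Hypothetical token: a name plus [start, end) offsets into the text. */
    static final class Token {
        final String name;
        int start;
        int end;

        Token(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    final StringBuilder text;
    final List<Token> tokens = new ArrayList<>();

    InsertTokensSketch(String text) {
        this.text = new StringBuilder(text);
    }

    /** Returns the text currently covered by the given token. */
    String covered(Token token) {
        return text.substring(token.start, token.end);
    }

    /** Inserts each value at the running position and shifts the offsets behind it. */
    List<Token> insertTokensAt(int posInText, List<String> texts, boolean insertSpace) {
        List<Token> created = new ArrayList<>();
        int pos = posInText;
        for (String value : texts) {
            String inserted = insertSpace ? value + " " : value;
            text.insert(pos, inserted);
            // Shift every existing offset that lies at or behind the insertion point.
            for (Token token : tokens) {
                if (token.start >= pos) {
                    token.start += inserted.length();
                }
                if (token.end > pos) {
                    token.end += inserted.length();
                }
            }
            // The new token covers only the value, not the trailing blank.
            Token token = new Token("tok" + (tokens.size() + 1), pos, pos + value.length());
            tokens.add(token);
            created.add(token);
            pos += inserted.length();
        }
        return created;
    }

    /** Builds "This is a text." with tok1..tok4 as in the example. */
    static InsertTokensSketch sample() {
        InsertTokensSketch graph = new InsertTokensSketch("This is a text.");
        graph.tokens.add(new Token("tok1", 0, 4));
        graph.tokens.add(new Token("tok2", 5, 7));
        graph.tokens.add(new Token("tok3", 8, 9));
        graph.tokens.add(new Token("tok4", 10, 15));
        return graph;
    }

    public static void main(String[] args) {
        InsertTokensSketch graph = sample();
        graph.insertTokensAt(5, Arrays.asList("additional", "text"), true);
        System.out.println(graph.text); // prints This additional text is a text.
    }
}
```

The new tokens receive the next free names (tok5, tok6), while tok2 to tok4 keep their text but move to later offsets.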
textualDS - the STextualDS object to which the new tokens should be related. Make sure that textualDS is already contained in the SDocumentGraph.
posInText - textual position where to add the new tokens
texts - text values the new tokens should cover
insertSpace - if true, a blank is inserted after each new text

SRelation createRelation(SNode source, SNode target, SALT_TYPE relationType, String annotations)
Creates an SRelation object and sets its source and target to the passed ones. The created SRelation is of the passed type. If annotations is not empty, SAnnotation objects are created as well. The syntax for passing annotations is given after the parameter list.
source - source node
target - target node
relationType - type of the relation
annotations - annotations to be added to the created relation
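
The annotations string follows the pattern (SNS::)?SNAME(=SVALUE)?(;SNS::SNAME=SVALUE)+, that is, semicolon-separated entries with an optional namespace and an optional value. A minimal sketch of splitting such a string is shown here; `AnnotationStringSketch` and `Anno` are hypothetical helpers, not part of Salt, and the parser is deliberately lenient (it accepts a missing namespace or value in every entry and does not handle values that contain a semicolon).

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in sketch of parsing an annotation string; not the Salt parser. */
final class AnnotationStringSketch {

    /** Hypothetical parsed entry; namespace and value may be null. */
    static final class Anno {
        final String namespace;
        final String name;
        final String value;

        Anno(String namespace, String name, String value) {
            this.namespace = namespace;
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            String qualified = namespace == null ? name : namespace + "::" + name;
            return value == null ? qualified : qualified + "=" + value;
        }
    }

    /** Splits the string at ';' and each entry into namespace, name and value. */
    static List<Anno> parse(String annotations) {
        List<Anno> result = new ArrayList<>();
        if (annotations == null || annotations.trim().isEmpty()) {
            return result;
        }
        for (String part : annotations.split(";")) {
            String item = part.trim();
            if (item.isEmpty()) {
                continue;
            }
            // Split off the value first, so that "::" inside a value is left alone.
            String value = null;
            int eq = item.indexOf('=');
            if (eq >= 0) {
                value = item.substring(eq + 1);
                item = item.substring(0, eq);
            }
            String namespace = null;
            int sep = item.indexOf("::");
            if (sep >= 0) {
                namespace = item.substring(0, sep);
                item = item.substring(sep + 2);
            }
            result.add(new Anno(namespace, item, value));
        }
        return result;
    }

    public static void main(String[] args) {
        // The namespace "salt" and the names are illustrative values only.
        System.out.println(parse("salt::pos=NN;lemma=tree;flag"));
    }
}
```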
Annotation syntax: (SNS::)?SNAME(=SVALUE)?(;SNS::SNAME=SVALUE)+

List<SToken> getOverlappedTokens(SNode overlappingNode, SALT_TYPE... overlappingRelationTypes)
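Returns all tokens in the graph which are overlapped by the passed node and are reachable via relations having at least one of the passed types.
overlappingNode - anchor node which overlaps the returned tokens
overlappingRelationTypes - relation types

List<SToken> getOverlappedTokens(SNode overlappingNode)
Returns all tokens in the graph which are overlapped by the passed node and are reachable via relations of type SALT_TYPE.STEXT_OVERLAPPING_RELATION.
overlappingNode - anchor node which overlaps the returned tokens

String getText(SNode sNode)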
Returns the exact text overlapped in the STextualDS by the given SNode. Imagine the following graph:

          n1
         /  \
    t1   t2 t3 t4     t5
    |    |  |  |      |
    This is a  sample text.
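
One way to picture this lookup is to collect the offsets of all tokens reachable from a node and take the substring from the smallest start to the largest end. The sketch below makes several assumptions: `OverlappedTextSketch` is a hypothetical stand-in, not a Salt class; n1 is taken to cover t2 and t3; t5 is given offsets that exclude the final period; and the graph is assumed to be acyclic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in sketch of resolving overlapped text; these are not the Salt classes. */
final class OverlappedTextSketch {

    final String primaryText;
    /** Hypothetical token table: token name to [start, end) offsets. */
    final Map<String, int[]> tokenOffsets = new HashMap<>();
    /** Hypothetical parent-to-children edges for non-token nodes. */
    final Map<String, List<String>> children = new HashMap<>();

    OverlappedTextSketch(String primaryText) {
        this.primaryText = primaryText;
    }

    /** Returns the text between the smallest start and largest end reachable from the node. */
    String textOf(String node) {
        int[] range = {Integer.MAX_VALUE, Integer.MIN_VALUE};
        collect(node, range);
        if (range[0] == Integer.MAX_VALUE) {
            return ""; // no token is reachable from this node
        }
        return primaryText.substring(range[0], range[1]);
    }

    /** Walks down to the tokens and widens the range with their offsets. */
    private void collect(String node, int[] range) {
        int[] offsets = tokenOffsets.get(node);
        if (offsets != null) {
            range[0] = Math.min(range[0], offsets[0]);
            range[1] = Math.max(range[1], offsets[1]);
            return;
        }
        List<String> kids = children.get(node);
        if (kids != null) {
            for (String kid : kids) {
                collect(kid, range);
            }
        }
    }

    /** Builds the sample graph over "This is a sample text." */
    static OverlappedTextSketch sample() {
        OverlappedTextSketch graph = new OverlappedTextSketch("This is a sample text.");
        graph.tokenOffsets.put("t1", new int[] {0, 4});
        graph.tokenOffsets.put("t2", new int[] {5, 7});
        graph.tokenOffsets.put("t3", new int[] {8, 9});
        graph.tokenOffsets.put("t4", new int[] {10, 16});
        graph.tokenOffsets.put("t5", new int[] {17, 21});
        graph.children.put("n1", Arrays.asList("t2", "t3"));
        return graph;
    }

    public static void main(String[] args) {
        OverlappedTextSketch graph = sample();
        System.out.println(graph.textOf("n1")); // prints is a
        System.out.println(graph.textOf("t5")); // prints text
    }
}
```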
The method will return the text 'text' for token t5 and 'is a' for node n1.
sNode - the node for which the text should be retrieved

boolean isIsomorph(SDocumentGraph other)
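Compares the passed graph with the current one and returns whether they are isomorphic or not. See also findDiffs().
other - the graph to be compared with this one

boolean isIsomorph(SDocumentGraph other, DiffOptions options)
Compares the passed graph with the current one and returns whether they are isomorphic or not. See also findDiffs().
other - the graph to be compared with this one
options - an option map to customize the isomorphism comparison; for more information about how to customize the comparison, see the Javadoc of DiffOptions

Set<Difference> findDiffs(SDocumentGraph other)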
Compares the passed graph with the current one and returns the set of differences found. See also isIsomorph().
other - the graph to be compared with this one

Set<Difference> findDiffs(SDocumentGraph other, DiffOptions options)
Compares the passed graph with the current one and returns the set of differences found. See also isIsomorph().
other - the graph to be compared with this one
options - an option map to customize the isomorphism comparison; for more information about how to customize the comparison, see the Javadoc of DiffOptions

List<SNode> getChildren(SNode parent, SALT_TYPE relationType)
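Returns the children of the passed parent node that are reachable via relations of the passed SALT_TYPE.
parent - node for which the children are retrieved
relationType - type of relations to be traversed

List<SNode> getSharedParent(List<SNode> children, SALT_TYPE nodeType)
Returns a list of nodes that are the parents of every node in the given base list. Only relations of the given SALT_TYPE will be considered.
children - list of nodes whose parents are looked for
nodeType - regarded types of relations

Copyright © 2009–2015 Humboldt-Universität zu Berlin, INRIA. All rights reserved.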