public class SampleGenerator extends Object
SDocumentGraph and SCorpusGraph instances.

| Modifier and Type | Field and Description |
|---|---|
| static String | LANG_DE - ISO 639-1 language code for German |
| static String | LANG_EN - ISO 639-1 language code for English |
| static String | MORPHOLOGY_LAYER - The name of the morphological layer containing the tokens. |
| static String | PRIMARY_TEXT_DE - The primary text, which is used for the samples. |
| static String | PRIMARY_TEXT_EN - The primary text, which is used for the samples. |
| static String | PRIMARY_TEXT_EN_SPK1 - Primary text of speaker 1 |
| static String | PRIMARY_TEXT_EN_SPK2 - Primary text of speaker 2 |
| static String | SYNTAX_LAYER |

| Constructor and Description |
|---|
| SampleGenerator() |

| Modifier and Type | Method and Description |
|---|---|
| static void | createAnaphoricAnnotations(SDocument document) |
| static SCorpusGraph | createCorpusStructure_simple() - Creates the following structure: rootCorpus with the single document doc1. |
| static SCorpusGraph | createCorpusStructure() - Creates the following structure: rootCorpus with subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4). |
| static SCorpusGraph | createCorpusStructure(SaltProject saltProject) - Creates the corpus structure rootCorpus with subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4) and adds it to the given Salt project. |
| static SCorpusGraph | createCorpusStructure(SCorpusGraph corpGraph1) - Creates the following structure: rootCorpus with subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4). |
| static void | createDependencies(SDocument document) - Creates the sample's dependency annotation. |
| static void | createDialogue(SDocument document) - Creates an SDocumentGraph containing two texts of two different speakers, which are aligned via the STimeline related to the SToken objects. |
| static void | createDocumentStructure(SDocument document) - Creates a document structure containing: primary text, tokenization, morphological annotations, information structure annotation, syntactic annotation, anaphoric annotation. |
| static void | createInformationStructureAnnotations(SDocument document) - Annotates the SSpan objects above the tokenization with information-structural annotations. |
| static void | createInformationStructureSpan(SDocument document) - Creates SSpan objects above the tokenization. |
| static void | createMorphologyAnnotations(SDocument document) - Creates morphological annotations (pos and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation or SLemmaAnnotation object. |
| static void | createParallelData(SDocument document) |
| static void | createParallelData(SDocument document, boolean setTypeForPointRel) - Creates a small parallel corpus containing an English and a German text. |
| static STextualDS | createPrimaryData(SDocument document) - Creates an STextualDS object containing the primary text PRIMARY_TEXT_EN and adds the object to the SDocumentGraph contained in the given SDocument object. |
| static STextualDS | createPrimaryData(SDocument document, String language) - Creates an STextualDS object containing the primary text, which is either the English text PRIMARY_TEXT_EN or its German translation PRIMARY_TEXT_DE, and adds the object to the SDocumentGraph contained in the given SDocument object. |
| static SaltProject | createSaltProject() - Creates a complete SaltProject object having the complex structure rootCorpus with subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4). |
| static void | createSyntaxAnnotations(SDocument document) - Creates the categorical annotations for the nodes of the sample syntax tree created in createSyntaxStructure(SDocument). |
| static void | createSyntaxStructure(SDocument document) - Creates a syntax structure for the given SDocument object. |
| static SToken | createToken(int start, int end, STextualDS textualDS, SDocument document, SLayer layer) - Creates an SToken covering the passed position and returns it. |
| static void | createTokens(SDocument document) - Creates a set of SToken objects tokenizing the primary text PRIMARY_TEXT_EN into the following tokens: Is, this, example, more, complicated, than, it, appears, to, be, ? |
| static List<SToken> | createTokens(SDocument document, STextualDS textualDS) - Creates a set of SToken objects tokenizing the primary text PRIMARY_TEXT_EN or PRIMARY_TEXT_DE, depending on the given STextualDS object, into the following tokens: Is, this, example, more, complicated, than, it, appears, to, be, ? |
| static void | createUntypedParallelData(SDocument document) - Creates a small parallel corpus containing an English and a German text. |
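
The token list given for createTokens(SDocument) can be pictured as character spans over the primary text, which is the kind of (start, end) pair that createToken(int start, int end, ...) takes. The following is a minimal, library-free sketch and does not call Salt. The class TokenOffsetSketch and its method are hypothetical, the sentence is assumed to be the English sample text that the listed tokens spell out, and the 0-based, end-exclusive offset convention is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free illustration; none of these names belong to Salt.
class TokenOffsetSketch {
    // Assumed sample sentence, spelled out by the token list in the summary.
    static final String TEXT = "Is this example more complicated than it appears to be?";
    static final String[] TOKENS = {"Is", "this", "example", "more", "complicated",
            "than", "it", "appears", "to", "be", "?"};

    /** Returns {start, end} offsets (end exclusive) for each token, scanning left to right. */
    static List<int[]> offsets(String text, String[] tokens) {
        List<int[]> spans = new ArrayList<>();
        int cursor = 0;
        for (String token : tokens) {
            int start = text.indexOf(token, cursor);
            if (start < 0) {
                throw new IllegalArgumentException("token not found: " + token);
            }
            int end = start + token.length();
            spans.add(new int[] {start, end});
            cursor = end; // the next token must begin after this one
        }
        return spans;
    }

    public static void main(String[] args) {
        List<int[]> spans = offsets(TEXT, TOKENS);
        for (int[] span : spans) {
            System.out.println(span[0] + "-" + span[1] + "\t" + TEXT.substring(span[0], span[1]));
        }
    }
}
```

In this sketch the first token "Is" covers offsets 0 to 2 and the final "?" covers 54 to 55; whether Salt counts offsets the same way should be checked against the library itself.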
public static final String PRIMARY_TEXT_EN
public static final String PRIMARY_TEXT_EN_SPK1
public static final String PRIMARY_TEXT_EN_SPK2
public static final String PRIMARY_TEXT_DE
public static final String MORPHOLOGY_LAYER
public static final String LANG_EN
public static final String LANG_DE
public static final String SYNTAX_LAYER
public static SCorpusGraph createCorpusStructure(SaltProject saltProject)
Creates the following corpus structure and adds it to the given Salt project.

                rootCorpus
               /          \
       subCorpus1        subCorpus2
        /      \          /      \
     doc1     doc2     doc3     doc4

Throws:
IOException
SAXException

public static SaltProject createSaltProject()
Creates a complete SaltProject object having the complex structure

                rootCorpus
               /          \
       subCorpus1        subCorpus2
        /      \          /      \
     doc1     doc2     doc3     doc4

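
To make the shape of this sample corpus structure concrete, here is a small library-free sketch that builds the same tree and lists its documents. It deliberately avoids the Salt API: CorpusTreeSketch, Node, build and documents are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free illustration of the tree above; these are not Salt classes.
class CorpusTreeSketch {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this; // returning this allows chained calls
        }
    }

    /** Builds rootCorpus with subCorpus1 (doc1, doc2) and subCorpus2 (doc3, doc4). */
    static Node build() {
        return new Node("rootCorpus")
                .add(new Node("subCorpus1").add(new Node("doc1")).add(new Node("doc2")))
                .add(new Node("subCorpus2").add(new Node("doc3")).add(new Node("doc4")));
    }

    /** Collects the names of all leaves (the documents) in depth-first order. */
    static List<String> documents(Node node) {
        List<String> names = new ArrayList<>();
        if (node.children.isEmpty()) {
            names.add(node.name);
        }
        for (Node child : node.children) {
            names.addAll(documents(child));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(documents(build())); // prints [doc1, doc2, doc3, doc4]
    }
}
```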
public static SCorpusGraph createCorpusStructure()
Creates the following structure:

                rootCorpus
               /          \
       subCorpus1        subCorpus2
        /      \          /      \
     doc1     doc2     doc3     doc4

Throws:
IOException
SAXException

public static SCorpusGraph createCorpusStructure(SCorpusGraph corpGraph1)
Creates the following structure:

                rootCorpus
               /          \
       subCorpus1        subCorpus2
        /      \          /      \
     doc1     doc2     doc3     doc4

Throws:
IOException
SAXException

public static SCorpusGraph createCorpusStructure_simple()
Creates the following structure:

     rootCorpus
         |
       doc1

Throws:
IOException
SAXException

public static void createDialogue(SDocument document)
Creates an SDocumentGraph containing two texts of two different speakers, which are aligned via the STimeline related to the SToken objects. The texts are "Is this example more complicated than it appears to be?" and "Uhm oh yes!", which are tokenized by words. The words 'to' and 'oh' have been said simultaneously and overlap via the timeline.
Parameters:
document - document to be filled

public static STextualDS createPrimaryData(SDocument document)
Creates an STextualDS object containing the primary text PRIMARY_TEXT_EN and adds the object to the SDocumentGraph contained in the given SDocument object. TODO: What was supposed to be shown here, the original text or the link to the String object?
Parameters:
document - the document to which the created STextualDS object will be added (TODO: missing in SampleGenerator)

public static STextualDS createPrimaryData(SDocument document, String language)
Creates an STextualDS object containing the primary text, which is either the English text PRIMARY_TEXT_EN or its German translation PRIMARY_TEXT_DE, and adds the object to the SDocumentGraph contained in the given SDocument object.
Parameters:
document - the document to which the created STextualDS object will be added
language - the language of the resource to be created, LANG_EN for English, LANG_DE for German
Returns:
STextualDS object

public static void createTokens(SDocument document)
Creates a set of SToken objects tokenizing the primary text PRIMARY_TEXT_EN into the following tokens: Is, this, example, more, complicated, than, it, appears, to, be, ?
The SToken objects and corresponding STextualRelation objects are added to the given SDocument object.
Parameters:
document - the document to which the created SToken objects will be added

public static List<SToken> createTokens(SDocument document, STextualDS textualDS)
Creates a set of SToken objects tokenizing the primary text PRIMARY_TEXT_EN or PRIMARY_TEXT_DE, depending on the given STextualDS object, into the following tokens: Is, this, example, more, complicated, than, it, appears, to, be, ?
The SToken objects and corresponding STextualRelation objects are added to the given SDocument object.

public static SToken createToken(int start, int end, STextualDS textualDS, SDocument document, SLayer layer)
Creates an SToken covering the passed position and returns it.
Parameters:
start -
end -
textualDS -
document -
layer -
Returns:
SToken object

public static void createParallelData(SDocument document)

public static void createParallelData(SDocument document, boolean setTypeForPointRel)
Creates a small parallel corpus containing an English and a German text.
Parameters:
document - the document containing the STextualDS objects

public static void createUntypedParallelData(SDocument document)
Creates a small parallel corpus containing an English and a German text.
Parameters:
document - the document containing the STextualDS objects

public static void createMorphologyAnnotations(SDocument document)
Creates morphological annotations (pos and lemma) for the tokenized sample and adds them to each SToken object as SPOSAnnotation or SLemmaAnnotation object.

| token | pos | lemma |
|---|---|---|
| Is | VBZ | be |
| this | DT | this |
| example | NN | example |
| more | ABR | more |
| complicated | JJ | complicated |
| than | IN | than |
| it | PRP | it |
| appears | VBZ | appear |
| to | TO | to |
| be | VB | be |
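
The table above pairs every token with one pos value and one lemma value. As a rough, library-free sketch of that pairing (not the Salt annotation API; MorphologySketch and Annotated are hypothetical names, and the values are copied from the table as printed, including the tag "ABR"):

```java
import java.util.ArrayList;
import java.util.List;

// Library-free illustration of the table above; not the Salt annotation API.
class MorphologySketch {
    static final String[] TOKENS = {"Is", "this", "example", "more", "complicated",
            "than", "it", "appears", "to", "be"};
    static final String[] POS = {"VBZ", "DT", "NN", "ABR", "JJ",
            "IN", "PRP", "VBZ", "TO", "VB"};
    static final String[] LEMMAS = {"be", "this", "example", "more", "complicated",
            "than", "it", "appear", "to", "be"};

    /** One annotated token: the surface form plus its two annotation values. */
    static final class Annotated {
        final String token;
        final String pos;
        final String lemma;

        Annotated(String token, String pos, String lemma) {
            this.token = token;
            this.pos = pos;
            this.lemma = lemma;
        }

        @Override
        public String toString() {
            return token + "/" + pos + "/" + lemma;
        }
    }

    /** Zips the three parallel arrays into one row per token. */
    static List<Annotated> annotate() {
        List<Annotated> rows = new ArrayList<>();
        for (int i = 0; i < TOKENS.length; i++) {
            rows.add(new Annotated(TOKENS[i], POS[i], LEMMAS[i]));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Annotated row : annotate()) {
            System.out.println(row);
        }
    }
}
```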
Parameters:
document - the document containing the SToken and STextualDS objects

public static void createInformationStructureSpan(SDocument document)
Creates SSpan objects above the tokenization.

| span | covered tokens |
|---|---|
| contrast-focus | Is |
| topic | this example more complicated than it appears to be |
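
A span groups a contiguous run of tokens under one label. The sketch below is library-free and does not use Salt's SSpan. It assumes the table above is read as "contrast-focus" covering only the first token and "topic" covering the remaining nine; InfoStructureSketch and Span are hypothetical names used only here.

```java
import java.util.Arrays;

// Library-free illustration; Span is a hypothetical stand-in, not Salt's SSpan.
class InfoStructureSketch {
    static final String[] TOKENS = {"Is", "this", "example", "more", "complicated",
            "than", "it", "appears", "to", "be"};

    /** A labelled range of token indices, both ends inclusive. */
    static final class Span {
        final String label;
        final int first;
        final int last;

        Span(String label, int first, int last) {
            this.label = label;
            this.first = first;
            this.last = last;
        }

        /** Joins the covered tokens with single spaces. */
        String coveredText(String[] tokens) {
            return String.join(" ", Arrays.copyOfRange(tokens, first, last + 1));
        }
    }

    /** Assumed reading of the table: one span over token 0, one over tokens 1 to 9. */
    static Span[] spans() {
        return new Span[] {
            new Span("contrast-focus", 0, 0),
            new Span("topic", 1, TOKENS.length - 1)
        };
    }

    public static void main(String[] args) {
        for (Span span : spans()) {
            System.out.println(span.label + ": " + span.coveredText(TOKENS));
        }
    }
}
```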
Parameters:
document -

public static void createInformationStructureAnnotations(SDocument document)
Annotates the SSpan objects above the tokenization with information-structural annotations.
Parameters:
document -

public static void createSyntaxStructure(SDocument document)
Creates a syntax structure for the given SDocument object. If it does not already contain a primary text and a tokenization, this method calls createPrimaryData(SDocument) and createTokens(SDocument).
Parameters:
document -

public static void createSyntaxAnnotations(SDocument document)
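Creates the categorical annotations for the nodes of the sample syntax tree created in createSyntaxStructure(SDocument).
Parameters:
document -

public static void createDependencies(SDocument document)
Creates the sample's dependency annotation.
Parameters:
document -

public static void createAnaphoricAnnotations(SDocument document)
Parameters:
document -

public static void createDocumentStructure(SDocument document)
Creates a document structure containing: primary text, tokenization, morphological annotations, information structure annotation, syntactic annotation, anaphoric annotation.
Parameters:
document -

Copyright © 2009–2019 Humboldt-Universität zu Berlin, INRIA. All rights reserved.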