public class CorpusStorageManager extends Object
| Modifier and Type | Class and Description |
|---|---|
| `static class` | `CorpusStorageManager.CountResult` Contains the extended results of the count query. |
| `static class` | `CorpusStorageManager.ImportFormat` An enum of all supported input formats of graphANNIS. |
| `static class` | `CorpusStorageManager.QueryLanguage` An enum over all supported query languages of graphANNIS. |
| `static class` | `CorpusStorageManager.ResultOrder` Defines the order of results of a "find" query. |
| Constructor and Description |
|---|
| `CorpusStorageManager(String dbDir)` Create a new instance with an automatically determined size of the internal corpus cache. |
| `CorpusStorageManager(String dbDir, String logfile, LogLevel level, boolean useParallel)` Create a new instance with an automatically determined size of the internal corpus cache. |
| `CorpusStorageManager(String dbDir, String logfile, LogLevel level, boolean useParallel, long maxCacheSize)` Create a new instance with a maximum size for the internal corpus cache. |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `applyUpdate(String corpusName, GraphUpdate update)` Apply a sequence of updates to this graph for a corpus. |
| `Graph` | `corpusGraph(String corpusName)` Return the copy of the graph of the corpus structure given by its name. |
| `Graph` | `corpusGraphForQuery(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage)` Return the copy of the graph of the corpus structure which includes all nodes matched by the given query. |
| `long` | `count(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage)` Count the number of results for a query. |
| `CorpusStorageManager.CountResult` | `countExtra(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage)` Count the number of results for a query and return both the total number of matches and the number of documents in the result set. |
| `boolean` | `deleteCorpus(String corpusName)` Delete a corpus from this corpus storage. |
| `String[]` | `find(String corpusName, CorpusStorageManager.QueryLanguage queryLanguage, String query, long offset, long limit)` Find all results for a `query` and return the match ID for each result in default order. |
| `String[]` | `find(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage, long offset, long limit, CorpusStorageManager.ResultOrder order)` Find all results for a `query` and return the match ID for each result. |
| `List<FrequencyTableEntry<String>>` | `frequency(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage, String frequencyQueryDefinition)` Execute a frequency query. |
| `List<Component>` | `getAllComponentsByType(String corpusName, ComponentType componentType)` Returns a list of all components of a corpus given by its name and a given component type. |
| `List<NodeDesc>` | `getNodeDescriptions(String query, CorpusStorageManager.QueryLanguage queryLanguage)` |
| `void` | `importFromFileSystem(String path, CorpusStorageManager.ImportFormat format, String corpusName)` Import a corpus from an external location on the file system into this corpus storage. |
| `String[]` | `list()` List all available corpora in the corpus storage. |
| `List<Annotation>` | `listEdgeAnnotations(String corpusName, ComponentType componentType, String componentName, String componentLayer, boolean listValues, boolean onlyMostFrequentValues)` Returns a list of all edge annotations of a corpus given by its name and a given component. |
| `List<Annotation>` | `listNodeAnnotations(String corpusName, boolean listValues, boolean onlyMostFrequentValues)` Returns a list of all node annotations of a corpus given its name. |
| `Graph` | `subcorpusGraph(String corpusName, List<String> documentIDs)` Return the copy of a subgraph which includes all nodes that belong to any of the given list of sub-corpus/document identifiers. |
| `Graph` | `subgraph(String corpusName, List<String> nodeIDs, long ctxLeft, long ctxRight, Optional<String> segmentation)` Return the copy of a subgraph which includes the given list of node annotation identifiers, the nodes that cover the same token as the given nodes and all nodes that cover the token which are part of the defined context. |
| `Graph` | `subGraphForQuery(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage)` Return the copy of a subgraph which includes all nodes matched by the given query. |
| `void` | `unloadCorpus(String corpusName)` Unloads a corpus from the cache. |
| `boolean` | `validateQuery(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage)` Parses a query and checks if it is valid. |
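
The `offset` and `limit` parameters of `find` lend themselves to fetching match IDs page by page. The following is a minimal sketch of that pattern, not part of graphANNIS itself: `PagedFind` is a hypothetical stand-in for a `find(...)` call whose corpus name, query and query language are already fixed, and it leaves out the checked `GraphANNISException` that the real method declares. The loop assumes that a page shorter than the requested limit means there are no further results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindPaging {

    /**
     * Hypothetical stand-in for find(...) with corpus name, query and query
     * language already bound; only offset and limit vary between calls.
     */
    @FunctionalInterface
    public interface PagedFind {
        String[] find(long offset, long limit);
    }

    /** Collect all match IDs by requesting one page after another. */
    public static List<String> findAll(PagedFind finder, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> all = new ArrayList<>();
        long offset = 0;
        while (true) {
            String[] page = finder.find(offset, pageSize);
            all.addAll(Arrays.asList(page));
            if (page.length < pageSize) {
                break; // a short page is taken to mean there is nothing left
            }
            offset += pageSize;
        }
        return all;
    }

    public static void main(String[] args) {
        // In-memory fake: skip "offset" entries, return at most "limit" entries.
        String[] matches = {"m1", "m2", "m3", "m4", "m5"};
        PagedFind fake = (offset, limit) -> {
            int from = (int) Math.min(offset, matches.length);
            int to = (int) Math.min(offset + limit, matches.length);
            return Arrays.copyOfRange(matches, from, to);
        };
        System.out.println(findAll(fake, 2)); // prints [m1, m2, m3, m4, m5]
    }
}
```

With a real `CorpusStorageManager`, the lambda would delegate to one of the documented `find` overloads and handle `GraphANNISException` as appropriate for the caller.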
public CorpusStorageManager(String dbDir) throws GraphANNISException
Parameters:
dbDir - The path on the filesystem where the corpus storage content is located. Must be an existing directory.
Throws:
GraphANNISException

public CorpusStorageManager(String dbDir, String logfile, LogLevel level, boolean useParallel) throws GraphANNISException
Parameters:
dbDir - The path on the filesystem where the corpus storage content is located. Must be an existing directory.
logfile - Path to where a logfile should be written.
level - Log level for the logfile.
useParallel - If "true", parallel joins are used by the system, using all available cores.
Throws:
GraphANNISException

public CorpusStorageManager(String dbDir, String logfile, LogLevel level, boolean useParallel, long maxCacheSize) throws GraphANNISException
Parameters:
dbDir - The path on the filesystem where the corpus storage content is located. Must be an existing directory.
logfile - Path to where a logfile should be written.
level - Log level for the logfile.
useParallel - If "true", parallel joins are used by the system, using all available cores.
maxCacheSize - Fixed maximum size of the cache in bytes.
Throws:
GraphANNISException

public String[] list() throws GraphANNISException
Throws:
GraphANNISException

public List<Annotation> listNodeAnnotations(String corpusName, boolean listValues, boolean onlyMostFrequentValues)
Parameters:
corpusName - The name of the corpus.
listValues - If true, include the possible values in the result.
onlyMostFrequentValues - If both this argument and "listValues" are true, only return the most frequent value for each annotation name.

public List<Annotation> listEdgeAnnotations(String corpusName, ComponentType componentType, String componentName, String componentLayer, boolean listValues, boolean onlyMostFrequentValues)
Parameters:
corpusName - The name of the corpus.
componentType - Type of the component.
componentName - Name of the component.
componentLayer - A layer name which allows grouping different components into the same layer. Can be empty.
listValues - If true, include the possible values in the result.
onlyMostFrequentValues - If both this argument and "listValues" are true, only return the most frequent value for each annotation name.

public List<Component> getAllComponentsByType(String corpusName, ComponentType componentType)
Parameters:
corpusName - The name of the corpus.
componentType - Type of the component to be returned.

public boolean validateQuery(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage) throws GraphANNISException
Parameters:
corpusName - The name of the corpus the query would be executed on (needed because missing annotation names can be a semantic parser error).
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
Throws:
GraphANNISException

public List<NodeDesc> getNodeDescriptions(String query, CorpusStorageManager.QueryLanguage queryLanguage) throws GraphANNISException
Throws:
GraphANNISException

public long count(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to execute the query on.
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
Throws:
GraphANNISException

public CorpusStorageManager.CountResult countExtra(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to execute the query on.
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
Throws:
GraphANNISException

public String[] find(String corpusName, CorpusStorageManager.QueryLanguage queryLanguage, String query, long offset, long limit) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to execute the query on.
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
offset - Skip the first `n` results, where `n` is the offset.
limit - Return at most `n` matches, where `n` is the limit.
Throws:
GraphANNISException

public String[] find(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage, long offset, long limit, CorpusStorageManager.ResultOrder order) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to execute the query on.
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
offset - Skip the first `n` results, where `n` is the offset.
limit - Return at most `n` matches, where `n` is the limit.
order - Specify the order of the matches.
Throws:
GraphANNISException

public Graph subgraph(String corpusName, List<String> nodeIDs, long ctxLeft, long ctxRight, Optional<String> segmentation) throws GraphANNISException
Parameters:
corpusName - The name of the corpus for which the subgraph should be generated.
nodeIDs - A set of node annotation identifiers describing the subgraph.
ctxLeft - Left context in token distance to be included in the subgraph.
ctxRight - Right context in token distance to be included in the subgraph.
segmentation - The name of the segmentation which should be used as the base for the context. Use Optional.empty() to define the context in the default token layer.
Throws:
GraphANNISException

public Graph subcorpusGraph(String corpusName, List<String> documentIDs) throws GraphANNISException
Parameters:
corpusName - The name of the corpus for which the subgraph should be generated.
documentIDs - A set of sub-corpus/document identifiers describing the subgraph.
Throws:
GraphANNISException

public Graph corpusGraph(String corpusName) throws GraphANNISException
Parameters:
corpusName - The name of the corpus.
Throws:
GraphANNISException

public Graph corpusGraphForQuery(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage) throws GraphANNISException
Parameters:
corpusName - The name of the corpus.
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
Throws:
GraphANNISException

public Graph subGraphForQuery(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage) throws GraphANNISException
Parameters:
corpusName - The name of the corpus.
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
Throws:
GraphANNISException

public List<FrequencyTableEntry<String>> frequency(String corpusName, String query, CorpusStorageManager.QueryLanguage queryLanguage, String frequencyQueryDefinition) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to execute the query on.
query - The query as string.
queryLanguage - The query language of the query (e.g. AQL).
frequencyQueryDefinition - A comma-separated list of single frequency definition items as string. Each frequency definition must consist of two parts separated by ":": the name of the referenced node and the (possibly qualified) annotation name or "tok". E.g. a frequency definition like 1:tok,3:pos,4:tiger::pos would extract the token value for the node #1, the pos annotation for node #3 and the pos annotation in the tiger namespace for node #4.
Throws:
GraphANNISException

public void importFromFileSystem(String path, CorpusStorageManager.ImportFormat format, String corpusName) throws GraphANNISException
Parameters:
path - The location on the file system where the corpus data is located.
format - The format in which this corpus data is stored.
corpusName - If not "null", override the name of the new corpus for file formats that already provide a corpus name.
Throws:
GraphANNISException

public boolean deleteCorpus(String corpusName) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to delete.
Throws:
GraphANNISException

public void unloadCorpus(String corpusName) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to unload.
Throws:
GraphANNISException

public void applyUpdate(String corpusName, GraphUpdate update) throws GraphANNISException
Parameters:
corpusName - The name of the corpus to apply the updates on.
update - The sequence of updates.
Throws:
GraphANNISException

Copyright © 2019 Thomas Krause. All rights reserved.
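
The string layout of `frequencyQueryDefinition`, as described for `frequency(...)`, can be illustrated by splitting a definition into its parts. This is only a sketch of the documented format, not the parser that graphANNIS uses: the class `FrequencyDefinitionSketch` and its `Item` type are hypothetical names introduced here, and the sketch assumes the node reference is everything before the first ":" and that a namespace, if present, is separated from the annotation name by "::".

```java
import java.util.ArrayList;
import java.util.List;

public class FrequencyDefinitionSketch {

    /** One item of a frequency definition (hypothetical helper type). */
    public static final class Item {
        public final String nodeRef;
        public final String namespace; // null if the annotation name is unqualified
        public final String name;

        Item(String nodeRef, String namespace, String name) {
            this.nodeRef = nodeRef;
            this.namespace = namespace;
            this.name = name;
        }

        @Override
        public String toString() {
            return nodeRef + " -> " + (namespace == null ? name : namespace + "::" + name);
        }
    }

    /** Split a definition such as "1:tok,3:pos,4:tiger::pos" into its items. */
    public static List<Item> parse(String definition) {
        List<Item> items = new ArrayList<>();
        for (String part : definition.split(",")) {
            // The first ':' separates the node reference from the annotation.
            int sep = part.indexOf(':');
            if (sep <= 0 || sep == part.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected <node>:<annotation> but got: " + part);
            }
            String nodeRef = part.substring(0, sep);
            String anno = part.substring(sep + 1);
            // A qualified annotation name has the form namespace::name.
            int ns = anno.indexOf("::");
            if (ns >= 0) {
                items.add(new Item(nodeRef, anno.substring(0, ns), anno.substring(ns + 2)));
            } else {
                items.add(new Item(nodeRef, null, anno));
            }
        }
        return items;
    }

    public static void main(String[] args) {
        for (Item item : parse("1:tok,3:pos,4:tiger::pos")) {
            System.out.println(item);
        }
    }
}
```

For the documented example, the third item has node reference `4`, namespace `tiger` and annotation name `pos`, which matches the description of "the pos annotation in the tiger namespace for node #4".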