| Class Summary | |
|---|---|
| ScriptParser | |
Provides classes and tools that help create schemas and load data. These utilities read plain text files containing commands, called DEX scripts. The commands describe how to create DEX schemas and load data into DEX.
Package edu.upc.dama.dex.script can be used either from the command line
or through function calls.
It is designed to interpret plain text files. These files contain a
set of commands that construct a DEX database and load data into
it.
The scripting utilities require .des files in order to define a
DEX graph schema.
The following example defines a schema to analyze phone
calls:
// calls-schema.des file
create dbgraph PhoneCalls into 'calls.dex'
create node 'person' (
'name' string unique,
'phone' string indexed
)
create edge 'calls' from 'person' to 'person' (
'date' timestamp,
'duration' long
)
This schema defines one node type, person, and one edge type, calls.
Person has two attributes: name and phone. Calls is a materialized,
directed, restricted edge type from person to person with two attributes:
date and duration.
The previous .des file defines a schema, but it contains no
data. DEX scripts (.des) also allow data to be loaded into a
DEX database. The data must be in
CSV format.
An extra .des file is required to load an existing
CSV:
// calls-load.des file
use dbgraph PhoneCalls into 'calls.dex'
load nodes 'persons.csv'
columns 'name', 'phone'
into 'person'
fields terminated ';'
from 1
mode rows
load edges 'calls.csv'
columns 'caller', 'called', 'date', 'duration'
into 'calls'
ignore 'caller', 'called'
where tail 'caller' = 'person'.'phone' head 'called' = 'person'.'phone'
fields terminated ';'
from 1
mode rows
This .des file loads information from
persons.csv and from
calls.csv. The former contains two columns:
one with values for the name attribute and another for the phone
attribute. The latter contains the data required to build
person-to-person calls relationships (two columns: caller and
called) and the attribute values (two more columns: date and duration).
CSV files are read using the edu.upc.dama.dex.io package.
They are simple comma-separated files, like the ones generated
by most database managers and applications.
The persons.csv file loaded by
calls-load.des is:
name;phone
John Bronson;555312111
"William ""Will"" Thomson";555192939
Maria Garudo;555443322
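The second record above encloses its first field in double quotes and doubles the quotes that belong to the value. The following is a minimal sketch of how such a line could be split, assuming the conventional CSV quoting rule (a doubled quote inside an enclosed field stands for one literal quote). The class CsvLineSketch and its split method are hypothetical names used for illustration; this is not the edu.upc.dama.dex.io implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the DEX API.
class CsvLineSketch {
    /** Splits one line on 'separator', honouring fields enclosed in 'quote'. */
    static List<String> split(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    // A doubled quote inside an enclosed field is a literal quote.
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote);
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true; // opening quote
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing separator
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"William \"\"Will\"\" Thomson\";555192939";
        // Prints: [William "Will" Thomson, 555192939]
        System.out.println(split(line, ';', '"'));
    }
}
```

The separator and quote characters are parameters here to mirror the terminated and enclosed options of the fields parameter.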
The calls.csv file loaded by
calls-load.des is:
caller;called;date;duration
555192939;555443322;2001-05-23 13:37:27;12
555312111;555192939;2001-05-24 21:32:24;3
555443322;555192939;2001-05-24 01:49:48;54
555192939;555443322;2001-05-25 15:21:12;14
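The date column above holds values such as 2001-05-23 13:37:27, that is, hours on a 24-hour clock. The sketch below parses such a value with java.text.SimpleDateFormat. It is an assumption that DEX interprets timestamp patterns the way SimpleDateFormat does; the document does not say so. In SimpleDateFormat itself, HH denotes the hour of day (0-23) while hh denotes the hour on a 12-hour clock (1-12), which matters when writing a pattern for values like these. TimestampSketch is a hypothetical class name.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of the DEX API.
class TimestampSketch {
    // 24-hour pattern; with "hh" the hour field would be read on a 12-hour clock.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    /** Parses a value from the date column, rejecting malformed input. */
    static Date parse(String value) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone keeps the sketch deterministic
        format.setLenient(false);
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a timestamp: " + value, e);
        }
    }

    /** Returns the hour of day (0-23) of a parsed value, in UTC. */
    static int hourOfDay(Date date) {
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        calendar.setTime(date);
        return calendar.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        System.out.println(hourOfDay(parse("2001-05-23 13:37:27")));
    }
}
```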
These files can be loaded into a DEX database from the command
line. The commands to run the previous .des files are:
java edu.upc.dama.dex.script.ScriptParser calls-schema.des
java edu.upc.dama.dex.script.ScriptParser calls-load.des
The former launches the script that creates the schema; the
latter launches the script that loads data into that schema.
The CSV files must be in the same directory in order to
load their values. A relative or absolute path can also be used as a filename.
The "jdex.jar" file must be in the Java class path.
These files can also be executed from Java classes through the API.
An application that runs the previous .des files is:
import edu.upc.dama.dex.script.ScriptParser;
class Example {
public static void main(String[] args)
{
ScriptParser.main(new String[] {"calls-schema.des"});
ScriptParser.main(new String[] {"calls-load.des"});
}
}
This class uses the static ScriptParser main method to
execute both DEX scripts.
A DEX Script file (.des) is a file that contains an
ordered list of commands.
DEX executes the commands of a script file in order.
Commands create schemas, define nodes and edges, and load
data into a previously defined DEX schema.
There are six main commands, described in the sections that follow.
The schema command defines a graph schema and creates it in a DEX
database.
A schema definition consists of the alias (the name given to the database),
the filename in which to store the database, and multiple node and edge
definitions.
A node or edge definition consists of the type name and a list of
attribute names and types. Optionally, an attribute can
be indexed or unique (the default is basic).
An attribute can also be indexed later (see Schema update), once the
data has been loaded.
An edge definition is also parametrized to support multiple kinds of
edges (directed, undirected, restricted, ...) and optionally
to specify the connected node types (restricted) and attributes.
Edges can also materialize neighbors or not.
The defined schema is created in a new DEX database and used by the
following load commands.
The schema command syntax is as follows:
CREATE DBGRAPH alias INTO filename
CREATE NODE node_type_name "("
[attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC], ...]
")"
CREATE [UNDIRECTED] EDGE edge_type_name
[FROM node_type_name TO node_type_name] "("
[attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT), ...]
")" [MATERIALIZE NEIGHBORS]
Value:
create dbgraph WIKIPEDIA into 'wikipedia.dex'
create node TITLES (
ID int unique,
'TEXT' string ,
NLC string,
TITLE string indexed
)
create node IMAGES (
ID int unique,
NLC string,
filename string indexed
)
create edge REFS (
NLC string,
'TEXT' string,
TYPE string
)
create edge IMGS ( )
create dbgraph FAMILY into 'family.dex'
create node PERSON (NAME string indexed, ID int unique, YEAR int)
create node DOG (NAME string indexed, YEAR int)
create edge CHILD from PERSON to PERSON (YEAR int)
create edge HUSBAND from PERSON to PERSON (YEAR int) materialize neighbors
create edge WIFE from PERSON to PERSON (YEAR int) materialize neighbors
create edge PET from PERSON to DOG () materialize neighbors
create dbgraph CARMODEL into 'cars.dex'
create node PERSON (NAME string, ID int unique, YEAR int)
create node CAR (MODEL string, ID int, OWNER int indexed)
create edge CAROWNER from PERSON to CAR
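When a schema is produced by a program rather than written by hand, the script text can be assembled from a description of the node type. The sketch below builds a create node statement in the form used by the examples above, with every name written in inverted commas. SchemaScriptSketch and createNode are hypothetical names, and the sketch only produces text; it does not call DEX.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of the DEX API.
class SchemaScriptSketch {
    /**
     * Builds a "create node" statement. Each map entry is an attribute name
     * mapped to its type and optional index kind, e.g. "string unique".
     */
    static String createNode(String typeName, Map<String, String> attributes) {
        StringJoiner body = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            // Names are quoted so that words such as TEXT can be used as names.
            body.add("'" + attribute.getKey() + "' " + attribute.getValue());
        }
        return "create node '" + typeName + "' " + body;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>(); // keeps declaration order
        attributes.put("name", "string unique");
        attributes.put("phone", "string indexed");
        // Prints: create node 'person' ('name' string unique, 'phone' string indexed)
        System.out.println(createNode("person", attributes));
    }
}
```

The generated text could then be written to a .des file and run like any other script.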
The use command loads a schema from an existing DEX database and uses it in the
following load commands.
This command specifies an alias for a DEX database and loads the
DEX schema from the given filename. The selected DEX database is used
by the following load commands.
The use command syntax is the following:
USE DBGRAPH alias INTO filename
The load nodes command creates nodes from a CSV file and sets their
attribute values. For each CSV row, a new node is created in
the previously selected DEX database
(see dbgraph use).
This command selects the file to read and names the columns to read
(a column can be named * to ignore it). The type of the new
nodes is defined by the into parameter. Node attributes
are set from every column except those listed in the
ignore parameter or named *.
The default field delimiter is , but this can be
changed with the fields parameter. All CSV lines are
read unless the from or max parameters
are present. The former allows headers, such as field
definition rows, to be skipped. The latter allows loads to be tested by reading
a limited number of rows. The default load mode is by rows, but the
mode parameter can change it to optimize the load.
As mentioned in the Schema definition section, in some cases it is necessary to write the file_name, attribute_name, alias_name and node_type_name parameters in inverted commas. The examples after the syntax show this case.
The load nodes command syntax is the following:
LOAD NODES file_name
COLUMNS attribute_name [alias_name], ...
INTO node_type_name
[IGNORE (attribute_name|alias_name), ...]
[FIELDS
[TERMINATED char]
[ENCLOSED char]
[ALLOW_MULTILINE]]
[FROM num]
[MAX num]
[MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]
load nodes 'titles.csv'
columns ID, NLC, 'TEXT', TITLE
into TITLES
mode rows
load NODES 'images.csv'
columns ID, NLC, FILENAME
into IMAGES
from 1
max 10000
mode columns
load nodes 'people.csv'
columns *, DNI, NAME, AGE, *, ADDRESS
into PEOPLE
fields terminated ';' enclosed '"'
mode rows
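The effect of the columns, from and max parameters can be pictured with a small in-memory sketch. It assumes that from is the zero-based index of the first row to read (so that from 1 skips one header row, as in the phone calls example) and that max is the largest number of rows read; the document does not define these precisely. LoadNodesSketch and select are hypothetical names, and the sketch does not use DEX.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the DEX API.
class LoadNodesSketch {
    /**
     * Maps each selected row to attribute values.
     * Assumption: 'from' is the zero-based index of the first row read,
     * and 'max' is the maximum number of rows read.
     */
    static List<Map<String, String>> select(List<String[]> rows, List<String> columns,
                                            int from, int max) {
        List<Map<String, String>> nodes = new ArrayList<>();
        for (int r = from; r < rows.size() && nodes.size() < max; r++) {
            String[] row = rows.get(r);
            Map<String, String> attributes = new LinkedHashMap<>();
            for (int c = 0; c < columns.size() && c < row.length; c++) {
                // A column named * is read but not stored as an attribute.
                if (!"*".equals(columns.get(c))) {
                    attributes.put(columns.get(c), row[c]);
                }
            }
            nodes.add(attributes);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"name", "phone"},            // header row, skipped by from = 1
                new String[] {"John Bronson", "555312111"},
                new String[] {"Maria Garudo", "555443322"});
        System.out.println(select(rows, Arrays.asList("*", "phone"), 1, 1));
    }
}
```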
The load edges command creates edges from a CSV file and sets their
attribute values. For each CSV row, a new edge is created
in the previously selected DEX database
(see dbgraph use).
This command has the same parameters as load nodes, plus
edge-specific parameters.
Edges are always created as a link between two existing nodes;
the where parameter selects which ones.
This means that the nodes must be created first (see the load nodes
command).
Note that the node attributes referenced in the
where clause (both tail and head) must be indexed or unique attributes.
The tail node is defined by the tail property: for each row, the command looks for
the node of the given type whose given attribute has the same
value as the named file column.
The head node is defined by the head property; as with tail, it
identifies the node at which the edge ends.
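The tail and head lookup described above can be pictured as a search by attribute value, which suggests why the referenced attributes need to be indexed or unique. The sketch below uses an in-memory map from phone number to node identifier in place of a DEX index. EdgeResolutionSketch and its methods are hypothetical names, and the identifiers are simply list positions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the DEX API.
class EdgeResolutionSketch {
    /** Builds a lookup from attribute value to node identifier (here: list position). */
    static Map<String, Long> index(List<String> phones) {
        Map<String, Long> byPhone = new HashMap<>();
        for (int i = 0; i < phones.size(); i++) {
            byPhone.put(phones.get(i), (long) i);
        }
        return byPhone;
    }

    /** Resolves one CSV row to a {tail, head} pair of node identifiers. */
    static long[] resolve(Map<String, Long> byPhone, String caller, String called) {
        Long tail = byPhone.get(caller);
        Long head = byPhone.get(called);
        if (tail == null || head == null) {
            // Both end nodes must already exist before the edge can be created.
            throw new IllegalArgumentException("Unknown node for row " + caller + ";" + called);
        }
        return new long[] {tail, head};
    }

    public static void main(String[] args) {
        Map<String, Long> byPhone = index(Arrays.asList("555312111", "555192939", "555443322"));
        long[] edge = resolve(byPhone, "555192939", "555443322");
        System.out.println(edge[0] + " -> " + edge[1]);
    }
}
```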
As mentioned in the Schema definition section, in some cases it is necessary to write the file_name, attribute_name, alias_name and node_type_name parameters in inverted commas. The examples after the syntax show this case.
The load edges command syntax is the following:
LOAD EDGES file_name
COLUMNS attribute_name [alias_name], ...
INTO edge_type_name
[IGNORE (attribute_name|alias_name), ...]
WHERE
TAIL (attribute_name|alias_name) = node_type_name.attribute_name
HEAD (attribute_name|alias_name) = node_type_name.attribute_name
[FIELDS
[TERMINATED char]
[ENCLOSED char]
[ALLOW_MULTILINE]]
[FROM num]
[MAX num]
[MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]
load edges 'references.csv'
columns NLC, 'TEXT', TYPE, 'FROM' F, 'TO' T
into REFS
ignore F, T
where tail F = TITLES.ID head T = TITLES.ID
mode columns split partitions 3
load edges 'imagesReferences.csv.gz'
columns 'From', 'To'
into IMGS
ignore 'From', 'To'
where tail 'From' = TITLES.ID HEAD 'To' = IMAGES.ID
mode rows
In these examples,
Schema update commands update the schema of a graph pool. Currently it is possible to remove node types, edge types or attributes. The indexing of a node attribute can also be modified.
DROP (NODE|EDGE|ATTRIBUTE) name
INDEX node_type_name.attribute_name [INDEXED|UNIQUE|BASIC]
drop attribute IMAGES.ID
drop attribute IMAGES.NLC
drop attribute IMAGES.FILENAME
drop node 'IMAGES'
drop edge IMGS
index PEOPLE.NAME indexed
index TITLES.TITLE unique
There are some conventions that apply to all commands.
These conventions are the following:
CSV files are read using the
edu.upc.dama.dex.io package. They can optionally be
compressed into gz files; load
commands automatically decompress the content of these
files.
Use SET TIMESTAMP FORMAT "yyyy-MM-dd hh:mm:ss" to
change the default format of the timestamp data loaded into DEX.
After this command, all timestamp data is loaded with this format.
Created DEX Scripts (.des files) are executed from the command line or inside a Java application through the API.
The DEX Script API provides a class with an executable main method that
executes a DEX Script file.
This command line tool requires one argument: the path and
name of the .des file.
The command line syntax is the following:
java edu.upc.dama.dex.script.ScriptParser file.des
This tool requires the whole DEX environment, including the DEX classpath and the DEX JNI native libraries.
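One way to supply that environment explicitly is to build the full java command in code. The sketch below assembles an argument list using the standard JVM options -cp and -Djava.library.path. It is an assumption that the DEX native libraries are located through java.library.path, and the directory name lib is hypothetical; jdex.jar is the file named earlier in this document. ScriptCommandSketch is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the DEX API.
class ScriptCommandSketch {
    /** Builds the argument list for running one .des file with ScriptParser. */
    static List<String> command(String jarPath, String nativeLibDir, String scriptPath) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(jarPath);                               // DEX classes
        command.add("-Djava.library.path=" + nativeLibDir); // assumed JNI library location
        command.add("edu.upc.dama.dex.script.ScriptParser");
        command.add(scriptPath);                            // the single required argument
        return command;
    }

    public static void main(String[] args) {
        // Prints the command as one line; it is not executed here.
        System.out.println(String.join(" ", command("jdex.jar", "lib", "calls-schema.des")));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder to launch the script in a separate JVM.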
DEX provides an API to run DEX Scripts from Java applications.
The current package provides the
ScriptParser class, which executes
DEX Script files and streams.
There are two ways to execute DEX Scripts from the API: through main
or by constructing a parser. The former reuses the
ScriptParser.main(String[])
operation that is executed from the command line. It receives a string array of one
element, which is the file name.
An example is the following:
import edu.upc.dama.dex.script.ScriptParser;
class Example1 {
public static void main(String[] args)
{
ScriptParser.main(new String[] {"file.des"});
}
}
This solution is limited to executing files that exist on the user's
file system, and it uses a new DEX instance. The latter way is to use
an instance of ScriptParser. Its
constructor receives a DEX object and a Reader object that supplies a
DEX script. Once the object is constructed, a call to
ScriptParser.parse(boolean) parses the reader's content into the DEX object.
An example is the following:
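import java.io.StringReader;
// Assumed package for the DEX class; adjust it to your DEX distribution.
import edu.upc.dama.dex.core.DEX;
import edu.upc.dama.dex.script.ScriptParser;
class Example2 {
public static void main(String args[]) throws Exception {
DEX dex = new DEX();
// The script text follows the schema command syntax described earlier.
StringReader reader = new StringReader(
"create dbgraph People into 'people.dex' "+
"create node 'person' ( 'name' string )"
);
ScriptParser ps = new ScriptParser(dex, reader);
ps.parse(true);
dex.close();
}
}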
edu.upc.dama.dex.io