molgenis file format reference

This is documentation on the data exchange format for the 'molgenis' system.

To ease data exchange, this system comes with a simple 'tab separated values' file format. In such text files the data is formatted in tables with the columns separated by tabs, commas, or semicolons. The advantage is that these files can easily be created and parsed using common spreadsheet tools like Excel. An example of such a tab-delimited file is shown below:

name	description	date
Experiment1	This is my first experiment	2010-01-19
Experiment2	This is my second experiment	2010-01-20

This document describes which file types and columns are defined for the 'molgenis' system. Data in this format can be uploaded to the database via the user interface using the 'File' menu. Alternatively, a whole directory of such files can be loaded in batch using the CsvImport program.

The file types recognized by this program are described below, grouped by topic. For each file type the columns are detailed, and example data is shown where available.
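To illustrate the layout, the following is a minimal Python sketch (not the CsvImport program, and not part of molgenis) that reads one such file into a list of rows using only the standard `csv` module. The function name `read_table` is hypothetical. Lower-casing the header is an assumption of this sketch, prompted by this reference writing column names such as `unit_Identifier` but referring to them as `{unit_identifier}`.

```python
import csv
import io


def read_table(text, delimiter="\t"):
    """Parse one data file into a list of dicts keyed by column name.

    Assumes the first line is the header row and that `delimiter` is the
    column separator (tab by default). Header names are lower-cased, which
    is an assumption of this sketch, not documented molgenis behaviour.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{key.lower(): value for key, value in row.items()} for row in reader]


example = (
    "name\tdescription\tdate\n"
    "Experiment1\tThis is my first experiment\t2010-01-19\n"
    "Experiment2\tThis is my second experiment\t2010-01-20\n"
)
rows = read_table(example)
print(rows[0]["name"], rows[1]["date"])  # Experiment1 2010-01-20
```

For semicolon-separated files the same function can be called with `delimiter=";"`.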

org.molgenis.omx.auth file types

File: molgenisuser.txt

Contents:
Anyone who can log in.

Structure:
column name type required? auto/default description
username string YES  
password_ string   secret Password of the user. Note: this column has type string; a dedicated password type is still to be added.
activationcode string     Used as an alternative authentication mechanism to verify the user's email address and/or when the user has lost the password.
active bool   false Boolean indicating whether this account can be used to log in.
superuser bool   false
firstname string    
midinitials string    
lastname string    
title string     An academic title, e.g. Prof.dr, PhD.
affiliation string    
department string     Added from the old definition of MolgenisUser. Department of this contact.
role string     Indicate role of the contact, e.g. lab worker or PI.
address text     The address of the Contact.
phone string     The telephone number of the Contact including the suitable area codes.
email string     The email address of the Contact.
fax string     The fax number of the Contact.
tollfreephone string     A toll-free phone number for the Contact, including suitable area codes.
city string     Added from the old definition of MolgenisUser. City of this contact.
country string     Added from the old definition of MolgenisUser. Country of this contact.
Constraint: values in column username should be unique.
Constraint: values in column email should be unique.
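A uniqueness constraint like the two above can be checked before upload. The sketch below is a hypothetical pre-upload check, not something molgenis provides; rows are plain dicts keyed by column name and the user data is invented. Skipping empty cells is an assumption, made because email is an optional column.

```python
from collections import Counter


def duplicate_values(rows, column):
    """Return the values of `column` that occur in more than one row.

    Empty cells are skipped (an assumption: an optional column such as
    email may be left blank for several users).
    """
    counts = Counter(row[column] for row in rows if row.get(column))
    return sorted(value for value, n in counts.items() if n > 1)


users = [
    {"username": "admin", "email": "admin@example.org"},
    {"username": "jdoe", "email": "jdoe@example.org"},
    {"username": "jdoe2", "email": "jdoe@example.org"},
]
for column in ("username", "email"):
    dupes = duplicate_values(users, column)
    if dupes:
        print(f"column {column!r} is not unique: {dupes}")
# prints: column 'email' is not unique: ['jdoe@example.org']
```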

File: molgenisgroup.txt

Structure:
column name type required? auto/default description
name string YES  

File: molgenisgroupmember.txt

Structure:
column name type required? auto/default description
molgenisuser_username xref YES   This xref uses {molgenisuser_username} to find related elements in file molgenisUser.txt based on unique column {username}.
molgenisgroup_id xref YES   This xref uses {molgenisgroup_id} to find related elements in file molgenisGroup.txt based on unique column {id}.
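An xref column holds a value that must match the unique column of the referenced file. The sketch below resolves such a column, using molgenisuser_username against {username} as the example; the function name and the data are hypothetical, and rows are assumed to be dicts keyed by column name.

```python
def resolve_xrefs(rows, ref_column, target_rows, target_key):
    """Look up each value of `ref_column` in the unique column of another file.

    Returns (resolved, missing): `resolved` maps a row index to the referenced
    row, and `missing` lists the values that have no match in the target file.
    """
    index = {row[target_key]: row for row in target_rows}
    resolved, missing = {}, []
    for i, row in enumerate(rows):
        value = row[ref_column]
        if value in index:
            resolved[i] = index[value]
        else:
            missing.append(value)
    return resolved, missing


users = [{"username": "admin"}, {"username": "jdoe"}]
members = [{"molgenisuser_username": "jdoe"}, {"molgenisuser_username": "ghost"}]
resolved, missing = resolve_xrefs(members, "molgenisuser_username", users, "username")
print(missing)  # ['ghost']
```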

File: userauthority.txt

Structure:
column name type required? auto/default description
role string YES  
molgenisuser_username xref YES   This xref uses {molgenisuser_username} to find related elements in file molgenisUser.txt based on unique column {username}.

File: groupauthority.txt

Structure:
column name type required? auto/default description
role string YES  
molgenisgroup_id xref YES   This xref uses {molgenisgroup_id} to find related elements in file molgenisGroup.txt based on unique column {id}.

org.molgenis.omx.core file types

File: molgenisentity.txt

Contents:
Referenceable catalog of entity names, menus, forms and plugins.

Structure:
column name type required? auto/default description
name string YES   Name of the entity.
type_ string YES   Type of the entity.
classname string YES   Full name of the entity.
Constraint: values in column classname should be unique.
Constraint: values in the combined columns (name, type_) should be unique.
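A constraint on combined columns means that each combination of values may occur only once, while the individual columns may repeat. A minimal sketch of such a check, with a hypothetical function name and invented rows (the type_ values 'ENTITY' and 'FORM' are illustrative only):

```python
from collections import Counter


def duplicate_keys(rows, columns):
    """Return the combinations of `columns` that occur in more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


entities = [
    {"name": "Person", "type_": "ENTITY", "classname": "org.example.Person"},
    {"name": "Person", "type_": "FORM", "classname": "org.example.PersonForm"},
    {"name": "Person", "type_": "ENTITY", "classname": "org.example.Person2"},
]
print(duplicate_keys(entities, ("name", "type_")))  # [('Person', 'ENTITY')]
```

The same function checks the (ontology, termaccession) and (partofdataset, time) constraints elsewhere in this reference.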

File: runtimeproperty.txt

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
value text YES  
Constraint: values in column identifier should be unique.
Constraint: values in column name should be unique.

org.molgenis.omx.observ file types

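Observ-OM is a model to uniformly describe any phenotypic, genotypic or molecular observation. The four core concepts are ObservationTarget, ObservableFeature, Protocol and ObservedValue, each of which is described by a file type below.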

File: characteristic.txt

Contents:
Characteristics are yes/no statements about things in the world. They can be used as part of an observation, as a parameter of an ObservableFeature ('measuredCharacteristic'). For example, in 'What is the allele of [Marker]?' the [Marker] is a characteristic. Characteristics can also be used as the target of an observation; typical examples are 'Individual' and 'Panel'. A 'Marker' can also be a target, for instance when asking 'QTL p-value for [phenotype]': here both the target and the feature are characteristics, for example 'leaf count' (phenotype characteristic) and 'PVV4' (marker characteristic).

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
Constraint: values in column identifier should be unique.

File: observationtarget.txt

Contents:
ObservationTarget defines subjects of observation, such as Individual, Panel, Sample, etc. For instance: 'target 1' IS AN 'Individual'.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
Constraint: values in column identifier should be unique.

File: observablefeature.txt

Contents:
ObservableFeature defines anything that can be observed.

In other words, ObservableFeatures are the questions asked, e.g. 'What is Height?', 'What is Systolic blood pressure?', or 'Has blue eyes?'.

Some questions may be repeated for multiple characteristics. For example, 'Which allele is observed for [Marker]?' can be applied to all elements of a MarkerSet, and 'Is [medicine code] used?' can be applied to a set of medicine codes. This can be specified using the measuredCharacteristic field.

The identifier of an ObservableFeature is globally unique. It is recommended that each ObservableFeature be named according to a well-defined ontology term or database accession.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
unit_Identifier xref     (Optional) Reference to the well-defined measurement unit used to observe this feature (if the feature is that concrete), e.g. mmHg. This xref uses {unit_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
definitions_Identifier mref     The concept that is being measured in a specific way. This mref uses {definitions_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}. More than one reference can be added, separated by '|', for example: ref1|ref2|ref3.
datatype enum   string (Optional) Reference to the technical data type, e.g. 'int'.
temporal bool   false Whether this feature is time dependent and can have different values when measured at different times (e.g. weight, temporal=true) or is generally measured only once (e.g. birth date, temporal=false).
Constraint: values in column identifier should be unique.
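An mref cell holds several references separated by '|'. A minimal sketch of splitting such a cell; the function name is hypothetical, and stripping surrounding whitespace and dropping empty parts are assumptions of this sketch rather than documented import behaviour.

```python
def split_mref(cell, separator="|"):
    """Split an mref cell such as 'ref1|ref2|ref3' into its identifiers.

    Whitespace around each part is stripped and empty parts are dropped,
    so an empty cell yields an empty list.
    """
    return [part.strip() for part in cell.split(separator) if part.strip()]


print(split_mref("ref1|ref2|ref3"))  # ['ref1', 'ref2', 'ref3']
print(split_mref(""))                # []
```

Writing a cell is the inverse: `"|".join(["ref1", "ref2"])` gives `ref1|ref2`.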

File: category.txt

Contents:
Category is partOf ObservableFeature and defines the categories of an ObservableFeature, such as the categorical answer codes that are often used in questionnaires. For example, the ObservableFeature 'sex' has {code_string = 1, label=male} and {code_string = 2, label=female}. A Category can be linked to well-defined ontology terms via the ontologyReference. Category extends ObservationElement so that it can be referenced by ObservedValue.value. The Category class maps to METABASE::Category.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
observablefeature_Identifier xref YES   The Measurement these permitted values are part of. This xref uses {observablefeature_identifier} to find related elements in file observableFeature.txt based on unique column {identifier}.
valuecode string     The value used to store this category in ObservedValue, for example '1', '2'.
definition_Identifier xref     The category that is being measured in a specific way. This xref uses {definition_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
ismissing bool   false Whether this value should be treated as a missing value.
Constraint: values in column identifier should be unique.
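Following the 'sex' example above, the category rows of a feature can be turned into a lookup from value code to category name. A minimal sketch with a hypothetical function name and invented rows; keys are assumed to be lower-case column names.

```python
def category_labels(categories):
    """Map (observablefeature_identifier, valuecode) to the category name."""
    return {
        (c["observablefeature_identifier"], c["valuecode"]): c["name"]
        for c in categories
    }


categories = [
    {"identifier": "sex_1", "name": "male",
     "observablefeature_identifier": "sex", "valuecode": "1"},
    {"identifier": "sex_2", "name": "female",
     "observablefeature_identifier": "sex", "valuecode": "2"},
]
labels = category_labels(categories)
print(labels[("sex", "2")])  # female
```

Keying on the feature as well as the code keeps codes such as '1' from colliding between different features.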

File: protocol.txt

Contents:
The Protocol class defines parameterizable descriptions of (analysis) methods. Examples of protocols are questionnaires, SOPs, assay platforms, statistical analyses, etc. Each protocol has a unique identifier. Protocol has an association to OntologyTerm to represent the type of protocol.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
protocoltype_Identifier xref     Classification of the protocol. This xref uses {protocoltype_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
subprotocols_Identifier mref     Subprotocols of this protocol. This mref uses {subprotocols_identifier} to find related elements in file protocol.txt based on unique column {identifier}. More than one reference can be added, separated by '|', for example: ref1|ref2|ref3.
features_Identifier mref     Parameters (in/out) that are used or produced by this protocol. This mref uses {features_identifier} to find related elements in file observableFeature.txt based on unique column {identifier}. More than one reference can be added, separated by '|', for example: ref1|ref2|ref3.
requiredfeatures_Identifier mref     This mref uses {requiredfeatures_identifier} to find related elements in file observableFeature.txt based on unique column {identifier}. More than one reference can be added, separated by '|', for example: ref1|ref2|ref3.
root bool   false Indicates whether this protocol defines a workflow (e.g. is the first protocol of a workflow).
active bool   true Whether this protocol is considered active or inactive.
Constraint: values in column identifier should be unique.
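Because subprotocols_Identifier refers back to protocol.txt, protocols form a tree that can be expanded recursively. The sketch below is hypothetical: it takes a dict from protocol identifier to its list of subprotocol identifiers, and treating a protocol that contains itself as an error is an assumption, not a documented rule.

```python
def expand_protocol(protocols, root, _path=()):
    """Return `root` followed by all its nested subprotocols, depth first.

    `protocols` maps a protocol identifier to the identifiers in its
    subprotocols_Identifier column. Raises ValueError if a protocol
    (directly or indirectly) contains itself.
    """
    if root in _path:
        raise ValueError("cyclic subprotocols: " + " -> ".join(_path + (root,)))
    result = [root]
    for child in protocols.get(root, []):
        result.extend(expand_protocol(protocols, child, _path + (root,)))
    return result


protocols = {
    "Questionnaire": ["Anthropometry", "BloodPressure"],
    "Anthropometry": ["Height", "Weight"],
}
print(expand_protocol(protocols, "Questionnaire"))
# ['Questionnaire', 'Anthropometry', 'Height', 'Weight', 'BloodPressure']
```

Only the current path is checked, so a subprotocol shared by two branches is not mistaken for a cycle.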

File: dataset.txt

Contents:
Container for one or more observations that are measured using the same protocol and by the same performer(s). The data set may be a file (having the same identifier) but in most cases it is a data table consisting of rows (Observation). This entity replaces ProtocolApplication.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
protocolused_Identifier xref YES   Reference to the protocol that is being used (if available). This xref uses {protocolused_identifier} to find related elements in file protocol.txt based on unique column {identifier}.
starttime datetime   today Time when the protocol started.
endtime datetime   today (Optional) Time when the protocol ended.
Constraint: values in column identifier should be unique.

File: observationset.txt

Contents:
In practice, an observation (ObservationSet) is one row within a DataSet.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
partofdataset_Identifier xref YES   DataSet this ValueSet is part of. This xref uses {partofdataset_identifier} to find related elements in file dataSet.txt based on unique column {identifier}.
time datetime     Time of this observationSet.
Constraint: values in column identifier should be unique.
Constraint: values in the combined columns (partofdataset, time) should be unique.

File: observedvalue.txt

Contents:
Generic storage of values as part of one observation event. Values are atomic observations, e.g. length (feature) of individual 1 (valueset.target) = 179cm (value). Values can also be qualified by some characteristic, e.g. QTL p-value (feature) between phenotype 'leaf count' (characteristic) and marker 'PVV4' (valueset.target) = 0.1^10+3 (value).

Structure:
column name type required? auto/default description
observationset_Identifier xref YES   Reference to the observation, for example a particular patient visit, the application of a microarray or the calculation of a QTL model. This xref uses {observationset_identifier} to find related elements in file observationSet.txt based on unique column {identifier}.
feature_Identifier xref YES   References the ObservableFeature that this observation was made on, for example 'probe123'. This xref uses {feature_identifier} to find related elements in file observableFeature.txt based on unique column {identifier}.
value_id xref     The value observed. This xref uses {value_id} to find related elements in file value.txt based on unique column {id}.
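Since an ObservationSet is one row of a DataSet and each observed value fills one cell of that row, observed values can be pivoted back into a table. A minimal sketch under a simplifying assumption: each row already carries its resolved value in a hypothetical `value` key instead of the value_id reference described above.

```python
from collections import defaultdict


def pivot_observed_values(observed_values):
    """Group observed values into one dict (feature -> value) per observation set.

    Each input row has observationset_identifier, feature_identifier and
    `value`; the last key is a simplification standing in for value_id.
    """
    table = defaultdict(dict)
    for row in observed_values:
        table[row["observationset_identifier"]][row["feature_identifier"]] = row["value"]
    return dict(table)


observed = [
    {"observationset_identifier": "visit1", "feature_identifier": "height", "value": "179"},
    {"observationset_identifier": "visit1", "feature_identifier": "weight", "value": "72"},
    {"observationset_identifier": "visit2", "feature_identifier": "height", "value": "165"},
]
print(pivot_observed_values(observed))
# {'visit1': {'height': '179', 'weight': '72'}, 'visit2': {'height': '165'}}
```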

org.molgenis.omx.observ.target file types

File: species.txt

Contents:
Ontology terms for species. E.g. Arabidopsis thaliana. DISCUSSION: should we avoid subclasses of OntologyTerm and instead make a 'tag' filter on terms so we can make pulldowns context dependent (e.g. to only show particular subqueries of ontologies).

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
ontology_Identifier xref     (Optional) The source ontology or controlled vocabulary list that ontology terms have been obtained from. This xref uses {ontology_identifier} to find related elements in file ontology.txt based on unique column {identifier}.
termaccession string     (Optional) The accession number assigned to the ontology term in its source ontology. If empty it is assumed to be a locally defined term.
definition string     (Optional) The definition of the term.
Constraint: values in column identifier should be unique.
Constraint: values in the combined columns (ontology, termaccession) should be unique.

File: individual.txt

Contents:
The Individual class defines the subjects that are used as observation targets. The Individual class maps to XGAP:Individual and PaGE:Individual. Groups of individuals can be defined via Panel.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
mother_Identifier xref     Refers to the mother of the individual. This xref uses {mother_identifier} to find related elements in file individual.txt based on unique column {identifier}.
father_Identifier xref     Refers to the father of the individual. This xref uses {father_identifier} to find related elements in file individual.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.
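The mother_Identifier and father_Identifier columns point back into individual.txt, so a pedigree can be walked from any individual. A minimal sketch with a hypothetical function name and invented identifiers; an empty cell is assumed to mean the parent is unknown.

```python
def ancestors(individuals, identifier):
    """Collect the identifiers of all known ancestors of an individual.

    `individuals` maps an identifier to its row; the mother_identifier and
    father_identifier cells may be empty when a parent is unknown.
    """
    found = set()
    stack = [identifier]
    while stack:
        row = individuals.get(stack.pop(), {})
        for parent_column in ("mother_identifier", "father_identifier"):
            parent = row.get(parent_column)
            # The `found` check also stops the walk on inconsistent, cyclic data
            if parent and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


individuals = {
    "ind3": {"mother_identifier": "ind1", "father_identifier": "ind2"},
    "ind1": {"mother_identifier": "ind0", "father_identifier": ""},
    "ind2": {"mother_identifier": "", "father_identifier": ""},
}
print(sorted(ancestors(individuals, "ind3")))  # ['ind0', 'ind1', 'ind2']
```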

File: panel.txt

Contents:
The Panel class defines groups of individuals based on cohort design, case/controls, families, etc. For instance: 'LifeLines cohort', 'middle-aged men', 'recombinant mouse inbred line dba x b6' or 'Smith family'. A Panel can act as a single ObservationTarget. For example: the average height (Measurement) in the LifeLines cohort (Panel) is 174cm (ObservedValue). The Panel class maps to the XGAP:Strain and PaGE:Panel classes. In METABASE it is assumed that there is one panel per study.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
paneltype_Identifier xref     Indicates the type of Panel (examples: Sample panel, AssayedPanel, Natural=wild type, Parental=parents of a cross, F1=first generation of a cross, RCC=recombinant congenic, CSS=chromosome substitution). This xref uses {paneltype_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
numberofindividuals int YES  
species_Identifier xref     The species this panel is an instance of/part of/extracted from. This xref uses {species_identifier} to find related elements in file species.txt based on unique column {identifier}.
individuals_Identifier mref     The list of individuals in this panel. This mref uses {individuals_identifier} to find related elements in file individual.txt based on unique column {identifier}. More than one reference can be added, separated by '|', for example: ref1|ref2|ref3.
Constraint: values in column identifier should be unique.

File: panelsource.txt

Contents:
PanelSource is partOf Panel and defines how a panel relates to other panels, such as founder panels: overlap, selection criteria, deriving an assayed panel from a sample panel, etc.

Structure:
column name type required? auto/default description
currentpanel_Identifier xref YES   Panel for which these sources are defined. This xref uses {currentpanel_identifier} to find related elements in file panel.txt based on unique column {identifier}.
sourcepanel_Identifier xref YES   Source that contributed individuals to the current panel. This xref uses {sourcepanel_identifier} to find related elements in file panel.txt based on unique column {identifier}.
numberofindividuals int     Number of individuals lifted over from this source.
selectioncriteria text YES   Inclusion/exclusion criteria used to select these individuals from source into current panel.

File: ontology.txt

Contents:
Ontology defines a reference to an ontology or controlled vocabulary from which well-defined and stable (ontology) terms can be obtained. Each Ontology should have a unique identifier, for instance: Gene Ontology, Mammalian Phenotype, Human Phenotype Ontology, Unified Medical Language System, Medical Subject Headings, etc. An abbreviation is also required, for instance: GO, MP, HPO, UMLS, MeSH, etc. Use of existing ontologies/vocabularies is recommended to harmonize phenotypic feature and value descriptions, but one can also create a 'local' Ontology. The Ontology class maps to FuGE::Ontology and MAGE-TAB::TermSourceREF.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
ontologyaccession string     An accession that uniquely identifies the ontology (typically an acronym), e.g. GO, MeSH, HPO.
ontologyuri hyperlink     (Optional) A URI that references the location of the ontology.
Constraint: values in column identifier should be unique.

File: ontologyterm.txt

Contents:
OntologyTerm defines a single entry (term) from an ontology or a controlled vocabulary (defined by Ontology). The identifier of the ontology term is unique, e.g. 'NCI:Antigen Gene'. Other data entities can refer to this OntologyTerm to harmonize the naming of concepts. If no suitable ontology term exists, one can define new terms locally (in which case there is no formal accession for the term, which limits its use for cross-Investigation queries).

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
ontology_Identifier xref     (Optional) The source ontology or controlled vocabulary list that ontology terms have been obtained from. This xref uses {ontology_identifier} to find related elements in file ontology.txt based on unique column {identifier}.
termaccession string     (Optional) The accession number assigned to the ontology term in its source ontology. If empty it is assumed to be a locally defined term.
definition string     (Optional) The definition of the term.
Constraint: values in column identifier should be unique.
Constraint: values in the combined columns (ontology, termaccession) should be unique.

File: accession.txt

Contents:
An external identifier for an annotation. For example: name='R13H8.1', ontology='ensembl' or name='WBgene00000912', ontology='wormbase'.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
ontology_Identifier xref     (Optional) The source ontology or controlled vocabulary list that ontology terms have been obtained from. This xref uses {ontology_identifier} to find related elements in file ontology.txt based on unique column {identifier}.
termaccession string     (Optional) The accession number assigned to the ontology term in its source ontology. If empty it is assumed to be a locally defined term.
definition string     (Optional) The definition of the term.
Constraint: values in column identifier should be unique.
Constraint: values in the combined columns (ontology, termaccession) should be unique.

org.molgenis.omx.auth file types

File: institute.txt

Contents:
A contact is either a person or an organization. Copied from FuGE::Contact.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES  
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
address text     The address of the Contact.
phone string     The telephone number of the Contact including the suitable area codes.
email string     The email address of the Contact.
fax string     The fax number of the Contact.
tollfreephone string     A toll-free phone number for the Contact, including suitable area codes.
city string     Added from the old definition of MolgenisUser. City of this contact.
country string     Added from the old definition of MolgenisUser. Country of this contact.
Constraint: values in column identifier should be unique.
Constraint: values in column name should be unique.

File: person.txt

Contents:
Person represents one or more people involved with an Investigation. This may include authors on a paper, lab personnel or PIs. Person has a last name, first name, middle initial, address, contact details and email. A Person role is included to represent how a Person is involved with an investigation. For submission to a repository, an allowed value is 'submitter' (a term present in the MGED Ontology); an alternative use could be to represent the job title. An example from ArrayExpress is E-TABM-506 (ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/TABM/E-TABM-506/E-TABM-506.idf.txt).
The FuGE equivalent of Person is FuGE::Person.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
address text     The address of the Contact.
phone string     The telephone number of the Contact including the suitable area codes.
email string     The email address of the Contact.
fax string     The fax number of the Contact.
tollfreephone string     A toll-free phone number for the Contact, including suitable area codes.
city string     Added from the old definition of MolgenisUser. City of this contact.
country string     Added from the old definition of MolgenisUser. Country of this contact.
firstname string    
midinitials string    
lastname string    
title string     An academic title, e.g. Prof.dr, PhD.
affiliation_Name xref     This xref uses {affiliation_name} to find related elements in file institute.txt based on unique column {name}.
department string     Added from the old definition of MolgenisUser. Department of this contact.
roles_Identifier xref     Indicates the role of the contact, e.g. lab worker or PI. Changed from mref to xref in October 2011. This xref uses {roles_identifier} to find related elements in file personRole.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.
Constraint: values in column email should be unique.

File: personrole.txt

Contents:
Separate type of OntologyTerm to administer roles.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Use of ontology term references to establish unambiguous descriptions is recommended.
ontology_Identifier xref     (Optional) The source ontology or controlled vocabulary list that ontology terms have been obtained from. This xref uses {ontology_identifier} to find related elements in file ontology.txt based on unique column {identifier}.
termaccession string     (Optional) The accession number assigned to the ontology term in its source ontology. If empty it is assumed to be a locally defined term.
definition string     (Optional) The definition of the term.
Constraint: values in column identifier should be unique.
Constraint: values in the combined columns (ontology, termaccession) should be unique.

org.molgenis.omx.observ.value file types

File: value.txt

Structure:
column name type required? auto/default description

File: boolvalue.txt

Structure:
column name type required? auto/default description
value bool YES  

File: categoricalvalue.txt

Structure:
column name type required? auto/default description
value_Identifier xref YES   This xref uses {value_identifier} to find related elements in file category.txt based on unique column {identifier}.

File: datevalue.txt

Structure:
column name type required? auto/default description
value date YES  

File: datetimevalue.txt

Structure:
column name type required? auto/default description
value datetime YES  

File: decimalvalue.txt

Structure:
column name type required? auto/default description
value decimal YES  

File: emailvalue.txt

Structure:
column name type required? auto/default description
value email YES  

File: htmlvalue.txt

Structure:
column name type required? auto/default description
value text YES  

File: hyperlinkvalue.txt

Structure:
column name type required? auto/default description
value hyperlink YES  

File: intvalue.txt

Structure:
column name type required? auto/default description
value int YES  

File: longvalue.txt

Structure:
column name type required? auto/default description
value long YES  

File: mrefvalue.txt

Structure:
column name type required? auto/default description
value_Identifier mref YES   This mref uses {value_identifier} to find related elements in file characteristic.txt based on unique column {identifier}. More than one reference can be added, separated by '|', for example: ref1|ref2|ref3.

File: stringvalue.txt

Structure:
column name type required? auto/default description
value string YES  

File: textvalue.txt

Structure:
column name type required? auto/default description
value text YES  

File: xrefvalue.txt

Structure:
column name type required? auto/default description
value_Identifier xref YES   This xref uses {value_identifier} to find related elements in file characteristic.txt based on unique column {identifier}.
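The scalar value files above differ only in the type of their single `value` column. The sketch below maps those column types to Python conversions; it is hypothetical, and the accepted text formats (case-insensitive 'true'/'false' for bool, ISO dates such as 2010-01-19 as in the example at the top of this document) are assumptions. Reference-valued files (categoricalvalue, mrefvalue, xrefvalue) are not covered.

```python
from datetime import date, datetime


def parse_bool(text):
    """Accept 'true'/'false' in any case (an assumed format)."""
    lowered = text.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a bool: {text!r}")
    return lowered == "true"


# Type of the 'value' column -> function converting the raw cell text.
PARSERS = {
    "bool": parse_bool,
    "date": date.fromisoformat,          # e.g. 2010-01-19
    "datetime": datetime.fromisoformat,  # e.g. 2010-01-19 10:30:00
    "decimal": float,
    "int": int,
    "long": int,                         # Python ints are unbounded
    "string": str,
    "text": str,
    "email": str,
    "hyperlink": str,
}


def parse_value(column_type, text):
    """Convert one cell according to the column type of its value file."""
    return PARSERS[column_type](text)


print(parse_value("date", "2010-01-19"))  # 2010-01-19
print(parse_value("bool", "TRUE"))        # True
```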

org.molgenis.omx.study file types

Model extension to make it possible to store a DataSetFilter that is linked to a user.

File: studydatarequest.txt

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
requestform string YES   Request form filename.
features_Identifier mref YES   This mref uses {features_identifier} to find related elements in file observableFeature.txt based on unique column {identifier}. More than one reference can be added, separated by '|', for example: ref1|ref2|ref3.
protocol_Identifier xref YES   Protocol used to create the request. This xref uses {protocol_identifier} to find related elements in file protocol.txt based on unique column {identifier}.
molgenisuser_username xref YES   This xref uses {molgenisuser_username} to find related elements in file molgenisUser.txt based on unique column {username}.
requestdate datetime YES   Request date.
requeststatus enum YES  
Constraint: values in column identifier should be unique.

org.molgenis.omx.workflow file types

File: protocolflow.txt

Structure:
column name type required? auto/default description
inputfeature_Identifier xref YES   This xref uses {inputfeature_identifier} to find related elements in file observableFeature.txt based on unique column {identifier}.
outputfeature_Identifier xref YES   This xref uses {outputfeature_identifier} to find related elements in file observableFeature.txt based on unique column {identifier}.
source_Identifier xref YES   This xref uses {source_identifier} to find related elements in file protocol.txt based on unique column {identifier}.
destination_Identifier xref YES   This xref uses {destination_identifier} to find related elements in file protocol.txt based on unique column {identifier}.
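
Each protocolflow.txt row can be read as a directed edge from the `source_Identifier` protocol to the `destination_Identifier` protocol. Under that reading, the Python sketch below collects the rows into an adjacency map and lists every protocol reachable from a given one. The helper names are hypothetical, the column headers are assumed to be spelled as in the table above, and the input/output feature columns are ignored for brevity.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Mapping


def build_flow_graph(rows: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each source protocol identifier to the protocols it flows into.

    Each row is one line of protocolflow.txt as a dict keyed by column name,
    for example as produced by csv.DictReader with a tab delimiter.
    """
    graph: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        graph[row["source_Identifier"]].append(row["destination_Identifier"])
    return dict(graph)


def downstream(graph: Dict[str, List[str]], start: str) -> List[str]:
    """List every protocol reachable from start, in breadth-first order.

    Already visited protocols are skipped, so cycles do not loop forever.
    """
    order: List[str] = []
    visited = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


if __name__ == "__main__":
    # Hypothetical protocol identifiers.
    rows = [
        {"source_Identifier": "P1", "destination_Identifier": "P2"},
        {"source_Identifier": "P2", "destination_Identifier": "P3"},
        {"source_Identifier": "P1", "destination_Identifier": "P4"},
    ]
    graph = build_flow_graph(rows)
    print(downstream(graph, "P1"))  # ['P2', 'P4', 'P3']
```

The same approach would apply to observationsetflow.txt, whose source and destination columns point into observationSet.txt instead.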

File: observationsetflow.txt

Structure:
column name type required? auto/default description
source_Identifier xref YES   This xref uses {source_identifier} to find related elements in file observationSet.txt based on unique column {identifier}.
destination_Identifier xref YES   This xref uses {destination_identifier} to find related elements in file observationSet.txt based on unique column {identifier}.

XGAP module file types

XGAP, taken from https://raw.github.com/joerivandervelde/molgenis-sdk/omx3/src/main/resources/omx3/xgap.xml on 26 September 2013. Added: Variant.

File: track.txt

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
Constraint: values in column identifier should be unique.

File: variant.txt

Contents:
A SNP is a special kind of Marker, but can also be seen as a phenotype to map against in some cases. A single-nucleotide polymorphism is a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a biological species or paired chromosomes in an individual.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     todo.
mutationposition int    
cdnaposition int    
aaposition int    
variantlength int    
event string    
ntchange string    
codonchange string    
cdnanotation string    
gdnanotation string    
aanotation string    
exon string    
consequence string    
inheritance string    
reportedsnp bool    
effectonsplicing bool    
pathogenicity string    
gene string    
idmutation int    
detailsformutation hyperlink    
track_Identifier xref YES   This xref uses {track_identifier} to find related elements in file track.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.

File: chromosome.txt

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
ordernr int YES
isautosomal bool YES   Is 'yes' when the number of chromosomes is equal in male and female individuals, i.e., if it is not a sex chromosome.
bplength int     Length of the chromosome in base pairs.
species_Identifier xref     Reference to the species this chromosome belongs to. This xref uses {species_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.
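
Columns marked YES under required? must be present and filled in; for chromosome.txt those are identifier, name, ordernr and isautosomal. A Python sketch of a pre-upload check, assuming a tab-separated file whose header uses these exact lowercase column names (whether the molgenis importer matches header names case-insensitively is not stated here). `check_required` and `CHROMOSOME_REQUIRED` are hypothetical names, and the data rows in the demo are placeholders.

```python
import csv
import io
from typing import List, Sequence, TextIO

# Columns marked "YES" in the chromosome.txt structure table.
CHROMOSOME_REQUIRED = ("identifier", "name", "ordernr", "isautosomal")


def check_required(stream: TextIO, required: Sequence[str]) -> List[str]:
    """Return problems with required columns in a tab-separated file.

    Reports required columns missing from the header, and data rows (counted
    from line 2, the first line after the header) that leave one empty.
    """
    reader = csv.DictReader(stream, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in required if col not in header]
    present = [col for col in required if col in header]
    for line_no, row in enumerate(reader, start=2):
        for col in present:
            # Short rows yield None for absent cells; treat that as empty.
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
    return problems


if __name__ == "__main__":
    data = (
        "identifier\tname\tordernr\tisautosomal\n"
        "chr1\t1\t1\ttrue\n"
        "chrX\tX\t\tfalse\n"
    )
    print(check_required(io.StringIO(data), CHROMOSOME_REQUIRED))
    # ['line 3: empty ordernr']
```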

File: nmrbin.txt

Contents:
Shift of the NMR frequency due to the chemical environment.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
Constraint: values in column identifier should be unique.

File: clone.txt

Contents:
BAC clone fragment.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     todo.
Constraint: values in column identifier should be unique.

File: derivedtrait.txt

Contents:
Any meta trait, eg. false discovery rates, P-values, thresholds.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
Constraint: values in column identifier should be unique.

File: environmentalfactor.txt

Contents:
Experimental conditions, such as temperature differences, batch effects etc.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
Constraint: values in column identifier should be unique.

File: gene.txt

Contents:
Trait annotations specific for genes.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     Main symbol this gene is known by (not necessarily unique, in contrast to 'name').
orientation enum     Orientation of the gene on the genome (F=forward, R=reverse).
control bool     Indicates whether this is a 'housekeeping' gene that can be used as a control.
Constraint: values in column identifier should be unique.

File: transcript.txt

Contents:
Trait annotations specific for transcripts.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
gene_Identifier xref     The gene that produces this transcript. This xref uses {gene_identifier} to find related elements in file gene.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.

File: protein.txt

Contents:
Trait annotations specific for proteins.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
gene_Identifier xref     The gene that produces this protein. This xref uses {gene_identifier} to find related elements in file gene.txt based on unique column {identifier}.
transcript_Identifier xref     The transcript variant that produces this protein. This xref uses {transcript_identifier} to find related elements in file transcript.txt based on unique column {identifier}.
aminosequence text     The amino acid sequence.
mass decimal     The mass of this protein.
Constraint: values in column identifier should be unique.

File: metabolite.txt

Contents:
Trait annotations specific for metabolites.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
formula string     The chemical formula of a metabolite.
mass decimal     The mass of this metabolite.
structure text     The chemical structure of a metabolite (in SMILES representation).
Constraint: values in column identifier should be unique.

File: marker.txt

Contents:
Trait annotations specific for markers.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     todo.
reportsfor_Identifier mref     The marker(s) (or a subclass such as 'SNP') that this marker reports for. This mref uses {reportsfor_identifier} to find related elements in file marker.txt based on unique column {identifier}. More than one reference can be added separated by '|', for example: ref1|ref2|ref3.
Constraint: values in column identifier should be unique.

File: snp.txt

Contents:
A SNP is a special kind of Marker, but can also be seen as a phenotype to map against in some cases. A single-nucleotide polymorphism is a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a biological species or paired chromosomes in an individual.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     todo.
reportsfor_Identifier mref     The marker(s) (or a subclass such as 'SNP') that this marker reports for. This mref uses {reportsfor_identifier} to find related elements in file marker.txt based on unique column {identifier}. More than one reference can be added separated by '|', for example: ref1|ref2|ref3.
status string     The status of this SNP, e.g. 'confirmed'.
polymorphism_Identifier mref     The polymorphism(s) belonging to this SNP. This mref uses {polymorphism_identifier} to find related elements in file polymorphism.txt based on unique column {identifier}. More than one reference can be added separated by '|', for example: ref1|ref2|ref3.
Constraint: values in column identifier should be unique.

File: polymorphism.txt

Contents:
The difference of a single base discovered between two sequenced individuals.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
base enum YES   The affected DNA base. Note that you can select the reference base here.
value string     The strain/genotype for which this polymorphism was discovered, e.g. 'N2' or 'CB4856'.
Constraint: values in column identifier should be unique.

File: probe.txt

Contents:
A piece of sequence that reports for the expression of a gene, typically spotted onto a microarray.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     todo.
mismatch bool   false Indicates whether the probe is a match.
probeset_Identifier xref     (Optional) The probe set this probe belongs to (e.g., in Affymetrix assays). This xref uses {probeset_identifier} to find related elements in file probeSet.txt based on unique column {identifier}.
reportsfor_Identifier xref     The gene this probe reports for. This xref uses {reportsfor_identifier} to find related elements in file gene.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.

File: spot.txt

Contents:
This is the spot on a microarray.
Note: We don't distinguish between probes (the sequence) and spots (the sequence as spotted on the array).

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     todo.
mismatch bool   false Indicates whether the probe is a match.
probeset_Identifier xref     (Optional) The probe set this probe belongs to (e.g., in Affymetrix assays). This xref uses {probeset_identifier} to find related elements in file probeSet.txt based on unique column {identifier}.
reportsfor_Identifier xref     The gene this probe reports for. This xref uses {reportsfor_identifier} to find related elements in file gene.txt based on unique column {identifier}.
x int YES   Row.
y int YES   Column.
gridx int     Meta Row.
gridy int     Meta Column.
Constraint: values in column identifier should be unique.
Constraint: values in the combined columns (x, y, gridx, gridy) should be unique.
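
The second spot.txt constraint covers a combination of columns rather than a single one. A Python sketch that reports spots sharing the same (x, y, gridx, gridy) combination, assuming tab-separated input with a header line; `duplicate_positions` is a hypothetical helper. It compares raw cell text, so two rows with empty gridx/gridy and the same x and y count as duplicates here, whereas the database may treat empty (null) values differently.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

# Columns that together must be unique in spot.txt.
SPOT_KEY = ("x", "y", "gridx", "gridy")


def duplicate_positions(stream: TextIO) -> Dict[Tuple[str, ...], List[str]]:
    """Group spot identifiers by their (x, y, gridx, gridy) combination.

    Returns only combinations shared by more than one row. Cells are compared
    as stripped text; a missing or empty cell becomes "".
    """
    reader = csv.DictReader(stream, delimiter="\t")
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in reader:
        key = tuple((row.get(col) or "").strip() for col in SPOT_KEY)
        groups[key].append(row["identifier"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical spot.txt excerpt: S1 and S3 occupy the same position.
    data = (
        "identifier\tname\tx\ty\tgridx\tgridy\n"
        "S1\ts1\t1\t1\t1\t1\n"
        "S2\ts2\t1\t2\t1\t1\n"
        "S3\ts3\t1\t1\t1\t1\n"
    )
    print(duplicate_positions(io.StringIO(data)))
    # {('1', '1', '1', '1'): ['S1', 'S3']}
```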

File: probeset.txt

Contents:
A set of probes; for example, an Affymetrix probe set has multiple probes. It implements locus because it is sometimes useful to give the complete set of probes a range, for example to indicate that this set of probes spans base pairs 0 through 10,000,000 on chromosome 3. The same information could arguably also be queried from the probes themselves, but with 40k probes, retrieving it from the ProbeSet alone (if annotated) would be much faster.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
chromosome_Identifier xref     Reference to the chromosome this position belongs to. This xref uses {chromosome_identifier} to find related elements in file chromosome.txt based on unique column {identifier}.
cm decimal     genetic map position in centimorgans (cM).
bpstart long     numeric base-pair position (5') on the chromosome.
bpend long     numeric base-pair position (3') on the chromosome.
seq text     The FASTA text representation of the sequence.
symbol string     todo.
Constraint: values in column identifier should be unique.
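
The probe set description notes that a probe set's range could also be derived from its probes. A Python sketch of that derivation, assuming probe.txt rows have already been parsed into dicts keyed by the column names above (`probeset_Identifier`, `chromosome_Identifier`, `bpstart`, `bpend`). Two further choices are assumptions of this sketch, not rules from the format: probes without a probe set or position are skipped, and a probe set whose probes lie on different chromosomes is rejected. `probeset_spans` is a hypothetical name.

```python
from typing import Dict, Iterable, Mapping, Tuple


def probeset_spans(
    probes: Iterable[Mapping[str, str]],
) -> Dict[str, Tuple[str, int, int]]:
    """Return {probeset: (chromosome, min bpstart, max bpend)} from probe rows."""
    spans: Dict[str, Tuple[str, int, int]] = {}
    for probe in probes:
        pset = (probe.get("probeset_Identifier") or "").strip()
        start = (probe.get("bpstart") or "").strip()
        end = (probe.get("bpend") or "").strip()
        if not (pset and start and end):
            continue  # probe not in a set, or position unknown
        chrom = (probe.get("chromosome_Identifier") or "").strip()
        lo, hi = int(start), int(end)
        if pset in spans:
            prev_chrom, prev_lo, prev_hi = spans[pset]
            if prev_chrom != chrom:
                raise ValueError(f"probe set {pset} spans several chromosomes")
            lo, hi = min(lo, prev_lo), max(hi, prev_hi)
        spans[pset] = (chrom, lo, hi)
    return spans


if __name__ == "__main__":
    # Hypothetical probe rows.
    probes = [
        {"probeset_Identifier": "PS1", "chromosome_Identifier": "chr3",
         "bpstart": "100", "bpend": "200"},
        {"probeset_Identifier": "PS1", "chromosome_Identifier": "chr3",
         "bpstart": "50", "bpend": "150"},
        {"probeset_Identifier": "PS2", "chromosome_Identifier": "chr1",
         "bpstart": "10", "bpend": "20"},
        {"probeset_Identifier": "", "chromosome_Identifier": "chr1",
         "bpstart": "1", "bpend": "5"},
    ]
    print(probeset_spans(probes))
    # {'PS1': ('chr3', 50, 200), 'PS2': ('chr1', 10, 20)}
```

The resulting values could then be written into the chromosome_Identifier, bpstart and bpend columns of probeset.txt, so that range queries need not touch every probe.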

File: masspeak.txt

Contents:
A peak that has been selected within a mass spectrometry experiment.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
mz decimal     Mass-to-charge ratio (m/z) of this peak.
retentiontime decimal     The retention time of this peak in minutes.
Constraint: values in column identifier should be unique.

File: sample.txt

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
individual_Identifier xref     The individual from which this sample was taken. This xref uses {individual_identifier} to find related elements in file individual.txt based on unique column {identifier}.
tissue_Identifier xref     The tissue from which this sample was taken. This xref uses {tissue_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.

File: pairedsample.txt

Contents:
A pair of samples labeled for a two-color microarray experiment.

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
subject1_Identifier xref YES   The first subject. This xref uses {subject1_identifier} to find related elements in file individual.txt based on unique column {identifier}.
label1_Identifier xref     The channel or fluorescent label associated with the first subject. This xref uses {label1_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
subject2_Identifier xref YES   The second subject. This xref uses {subject2_identifier} to find related elements in file individual.txt based on unique column {identifier}.
label2_Identifier xref     The channel or fluorescent label associated with the second subject. This xref uses {label2_identifier} to find related elements in file ontologyTerm.txt based on unique column {identifier}.
Constraint: values in column identifier should be unique.

Patient file types

Temporary solution to link mutations to patients for the genome browser.

File: patient.txt

Structure:
column name type required? auto/default description
identifier string YES   user supplied or automatically assigned (using a decorator) unique and short identifier, e.g. MA1234.
name string YES   human-readable name, not necessarily unique.
description text     (Optional) Rudimentary metadata about the observable feature. Using ontology term references to establish unambiguous descriptions is recommended.
mother_Identifier xref     Refers to the mother of the individual. This xref uses {mother_identifier} to find related elements in file individual.txt based on unique column {identifier}.
father_Identifier xref     Refers to the father of the individual. This xref uses {father_identifier} to find related elements in file individual.txt based on unique column {identifier}.
allele1_Identifier xref     This xref uses {allele1_identifier} to find related elements in file variant.txt based on unique column {identifier}.
allele2_Identifier xref     This xref uses {allele2_identifier} to find related elements in file variant.txt based on unique column {identifier}.
pheno string
pubmedid string
reference string
Constraint: values in column identifier should be unique.

Appendix: documentation of the mref tables

molgenis file types