Package org.biopax.paxtools.io.gsea
Class GSEAConverter
java.lang.Object
org.biopax.paxtools.io.gsea.GSEAConverter
Converts a BioPAX model to the GMT format (used by GSEA software).
It creates GSEA entries from sequence entity reference xrefs
in the BioPAX model as follows:
Each entry (row) consists of three columns (tab separated):
name (we use the pathway URI),
description (e.g. "name: Apoptosis; datasource: reactome; organism: 9606; idtype: uniprot"),
and the list of identifiers (of the same type). For participants not associated with any pathway,
"other" is used as the pathway name and URI.
The list may contain one or more IDs of the same type per entity reference,
e.g., UniProt IDs or HGNC Symbols; entity references that do not have any xref of
the given db/id type are ignored. Optionally, if an entry has fewer than three protein
references, it is not printed.
Note: to effectively prevent cross-species violations, the 'organism' property
of both the participants and the pathways must be set
to a BioSource object that has a valid unification xref:
db="Taxonomy" and id=some valid taxonomy ID.
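The cross-species rule can be pictured as a comparison of taxonomy IDs. In this hypothetical sketch, ParticipantSketch and CrossSpeciesFilterSketch are invented names, and keeping participants whose organism is unknown is an assumption; it shows why an unset organism lets a violation slip through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant: an ID plus the taxonomy ID from its organism's
// unification xref (db="Taxonomy"), or null when no organism is set.
final class ParticipantSketch {
    final String id;
    final String taxonomyId;

    ParticipantSketch(String id, String taxonomyId) {
        this.id = id;
        this.taxonomyId = taxonomyId;
    }
}

// Hypothetical filter, not the Paxtools implementation.
final class CrossSpeciesFilterSketch {
    private CrossSpeciesFilterSketch() {}

    // Drops participants whose taxonomy ID differs from the pathway's.
    // Assumption: if either taxonomy ID is unknown, the check cannot be
    // applied, so the participant is kept.
    static List<String> filter(String pathwayTaxonomyId, List<ParticipantSketch> participants) {
        List<String> kept = new ArrayList<>();
        for (ParticipantSketch p : participants) {
            boolean unknown = pathwayTaxonomyId == null || p.taxonomyId == null;
            if (unknown || pathwayTaxonomyId.equals(p.taxonomyId)) {
                kept.add(p.id);
            }
        }
        return kept;
    }
}
```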
Note: this code assumes that the model has been successfully validated
and, ideally, normalized (using the BioPAX Validator and the Paxtools Normalizer).
A BioPAX Level 1 or Level 2 model is first converted to Level 3.
Constructor Summary
GSEAConverter()
    Constructor.
GSEAConverter(String idType, boolean crossSpeciesCheckEnabled)
    Constructor.
GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, boolean skipSubPathways)
    Constructor.
GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, Set<Provenance> skipSubPathwaysOf)
    Constructor.
Method Summary
Collection<org.biopax.paxtools.io.gsea.GMTEntry> convert(Model model)
    Creates GSEA entries from the pathways contained in the model.
int getMinNumOfGenesPerEntry()
    If this value is greater than 0 and the number of proteins/genes in a gene set is less than that value, the gene set is skipped (no GSEA entry is written).
boolean isSkipOutsidePathways()
    If true, only GSEA entries whose genes correspond to a Pathway are printed to the output.
void setAllowedOrganisms(Set<String> allowedOrganisms)
void setMinNumOfGenesPerEntry(int minNumOfGenesPerEntry)
void setSkipOutsidePathways(boolean skipOutsidePathways)
void writeToGSEA(Model model, OutputStream out)
    Converts model to GSEA (GMT) and writes it to out.
Constructor Details
GSEAConverter
public GSEAConverter()
Constructor.
GSEAConverter
public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled)
Constructor. See the class description for more information.
Parameters:
idType - identifier type: the name of the resource, either the xref.db value used by most EntityReference xrefs in the BioPAX data, e.g., "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt" or "UniProt knowledgebase", or the <namespace> part of normalized EntityReference URIs http://identifiers.org/<namespace>/<ID> (this depends on the actual data, so double-check it before passing it to this constructor).
crossSpeciesCheckEnabled - if true, cross-species participants are excluded from the output
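When idType is taken from normalized URIs, it is the <namespace> segment of http://identifiers.org/<namespace>/<ID>. The hypothetical helper below (not a Paxtools API) shows how that segment can be read, which may help when double-checking the data before choosing idType.

```java
import java.net.URI;

// Hypothetical helper, not a Paxtools API: splits a normalized
// EntityReference URI http://identifiers.org/<namespace>/<ID>.
final class IdentifiersOrgUriSketch {
    private IdentifiersOrgUriSketch() {}

    // Returns {namespace, id}, or null if the URI does not follow the pattern.
    // URI.create throws IllegalArgumentException for syntactically invalid URIs.
    static String[] split(String uri) {
        URI u = URI.create(uri);
        String path = u.getPath();                 // e.g. "/uniprot/P04637"
        if (!"identifiers.org".equals(u.getHost()) || path == null || !path.startsWith("/")) {
            return null;
        }
        int slash = path.indexOf('/', 1);          // end of the <namespace> segment
        if (slash <= 1 || slash == path.length() - 1) {
            return null;                           // empty namespace or empty ID
        }
        return new String[] {path.substring(1, slash), path.substring(slash + 1)};
    }
}
```

For http://identifiers.org/hgnc.symbol/TP53 the namespace part is "hgnc.symbol", which would be the idType for such data.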
GSEAConverter
public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, boolean skipSubPathways)
Constructor. See the class description for more information.
Parameters:
idType - identifier type: the name of the resource, either the xref.db value used by most EntityReference xrefs in the BioPAX data, e.g., "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt" or "UniProt knowledgebase", or the <namespace> part of normalized EntityReference URIs http://identifiers.org/<namespace>/<ID> (this depends on the actual data, so double-check it before passing it to this constructor).
crossSpeciesCheckEnabled - if true, cross-species participants are excluded from the output
skipSubPathways - if true, sub-pathways are not traversed when collecting entity references (useful when a model, such as KEGG data converted to BioPAX, has many sub-pathways or loops)
GSEAConverter
public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, Set<Provenance> skipSubPathwaysOf)
Constructor. See the class description for more information.
Parameters:
idType - identifier type: the name of the resource, either the xref.db value used by most EntityReference xrefs in the BioPAX data, e.g., "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt" or "UniProt knowledgebase", or the <namespace> part of normalized EntityReference URIs http://identifiers.org/<namespace>/<ID>, such as 'hgnc.symbol' or 'uniprot' (this depends on the actual data, so double-check it before passing it to this constructor).
crossSpeciesCheckEnabled - if true, cross-species participants are excluded from the output
skipSubPathwaysOf - sub-pathways of pathways from the given data sources are not traversed when collecting entity references (useful when a model, such as KEGG data converted to BioPAX, has many sub-pathways or loops)
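The effect of skipSubPathways and skipSubPathwaysOf on collecting entity references can be sketched as a recursive walk. Everything here is hypothetical: PathwayNode stands in for a BioPAX Pathway, data sources are plain strings rather than Provenance objects, and the visited set is one assumed way to cope with the loops mentioned above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified pathway (not the Paxtools Pathway class).
final class PathwayNode {
    final String uri;
    final String dataSource;
    final List<String> entityReferenceIds = new ArrayList<>();
    final List<PathwayNode> subPathways = new ArrayList<>();

    PathwayNode(String uri, String dataSource) {
        this.uri = uri;
        this.dataSource = dataSource;
    }
}

final class SubPathwayTraversalSketch {
    private SubPathwayTraversalSketch() {}

    // Collects entity reference IDs. Sub-pathways are visited unless
    // skipSubPathways is true or this pathway's data source is listed in
    // skipSubPathwaysOf (names of data sources, an assumption).
    static Set<String> collect(PathwayNode root, boolean skipSubPathways, Set<String> skipSubPathwaysOf) {
        Set<String> ids = new LinkedHashSet<>();
        collect(root, skipSubPathways, skipSubPathwaysOf, new HashSet<>(), ids);
        return ids;
    }

    private static void collect(PathwayNode p, boolean skipSubPathways, Set<String> skipSubPathwaysOf,
                                Set<String> visited, Set<String> ids) {
        if (!visited.add(p.uri)) {
            return; // already seen: a loop or a shared sub-pathway
        }
        ids.addAll(p.entityReferenceIds);
        boolean skipSource = p.dataSource != null && skipSubPathwaysOf.contains(p.dataSource);
        if (skipSubPathways || skipSource) {
            return;
        }
        for (PathwayNode sub : p.subPathways) {
            collect(sub, skipSubPathways, skipSubPathwaysOf, visited, ids);
        }
    }
}
```

With two pathways that list each other as sub-pathways, the visited set stops the recursion after each pathway has been seen once.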
Method Details
isSkipOutsidePathways
public boolean isSkipOutsidePathways()
If true, only GSEA entries whose genes correspond to a Pathway are printed to the output.
Returns:
true/false
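A rough sketch of how skipOutsidePathways could interact with the "other" entry from the class description. The class OutsidePathwaysSketch, the map-based input and the use of the literal key "other" are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Paxtools implementation: groups gene IDs into
// gene sets keyed by pathway URI. IDs that belong to no pathway go to an
// "other" set, which is omitted when skipOutsidePathways is true.
final class OutsidePathwaysSketch {
    private OutsidePathwaysSketch() {}

    static Map<String, Set<String>> group(Map<String, List<String>> pathwayToIds,
                                          List<String> allIds,
                                          boolean skipOutsidePathways) {
        Map<String, Set<String>> sets = new LinkedHashMap<>();
        Set<String> inSomePathway = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> e : pathwayToIds.entrySet()) {
            sets.put(e.getKey(), new LinkedHashSet<>(e.getValue()));
            inSomePathway.addAll(e.getValue());
        }
        if (!skipOutsidePathways) {
            Set<String> other = new LinkedHashSet<>(allIds);
            other.removeAll(inSomePathway); // IDs not covered by any pathway
            if (!other.isEmpty()) {
                sets.put("other", other);
            }
        }
        return sets;
    }
}
```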
setSkipOutsidePathways
public void setSkipOutsidePathways(boolean skipOutsidePathways)
getAllowedOrganisms
setAllowedOrganisms
public void setAllowedOrganisms(Set<String> allowedOrganisms)
getMinNumOfGenesPerEntry
public int getMinNumOfGenesPerEntry()
If this value is greater than 0 and the number of proteins/genes in a gene set is less than that value, the gene set is skipped (no GSEA entry is written).
Returns:
the minimum value
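The minimum gene-set size rule can be expressed as a simple filter. This sketch uses the hypothetical name MinGenesFilterSketch and treats values of 0 or less as "no filtering", following the "greater than 0" wording above; it is not the library's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the minimum-size rule: when minNumOfGenesPerEntry > 0,
// gene sets with fewer genes than that are skipped; otherwise all are kept.
final class MinGenesFilterSketch {
    private MinGenesFilterSketch() {}

    static Map<String, Set<String>> filter(Map<String, Set<String>> geneSets, int minNumOfGenesPerEntry) {
        Map<String, Set<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : geneSets.entrySet()) {
            if (minNumOfGenesPerEntry <= 0 || e.getValue().size() >= minNumOfGenesPerEntry) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```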
setMinNumOfGenesPerEntry
public void setMinNumOfGenesPerEntry(int minNumOfGenesPerEntry)
writeToGSEA
public void writeToGSEA(Model model, OutputStream out) throws IOException
Converts model to GSEA (GMT) and writes it to out. See the class description for more information.
Parameters:
model - Model
out - output stream to write the result to
Throws:
IOException - when there is an output stream error
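A minimal sketch of the writing step, assuming the rows are already formatted. GmtWriterSketch is hypothetical, and the UTF-8 encoding, "\n" line endings and leaving the caller's stream open are assumptions, not documented behavior of writeToGSEA.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: writes formatted GMT rows to an OutputStream, one row
// per line (UTF-8 and "\n" are assumptions). The writer is flushed but not
// closed, so the caller's stream stays open.
final class GmtWriterSketch {
    private GmtWriterSketch() {}

    static void write(List<String> gmtRows, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (String row : gmtRows) {
            writer.write(row);
            writer.write('\n');
        }
        writer.flush();
    }
}
```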
convert
public Collection<org.biopax.paxtools.io.gsea.GMTEntry> convert(Model model)
Creates GSEA entries from the pathways contained in the model.
Parameters:
model - Model
Returns:
a set of GSEA entries