Package org.biopax.paxtools.io.gsea
Class GSEAConverter
java.lang.Object
org.biopax.paxtools.io.gsea.GSEAConverter
Converts a BioPAX model to the GMT format (used by the GSEA software).
It creates GSEA entries from the xrefs of sequence entity references
in the BioPAX model, as follows:
Each entry (row) consists of three tab-separated columns:
name (we use the pathway URI),
description (e.g. "name: Apoptosis; datasource: reactome; organism: 9606; idtype: uniprot"),
and the list of identifiers (all of the same type). For participants not associated with any pathway,
"other" is used as the pathway name and URI.
The list may contain one or more IDs of the same type per entity reference,
e.g., UniProt IDs or HGNC Symbols; entity references that do not have any xref of
the given db/id type are ignored. Optionally, entries with fewer than three protein
references (or another minimum set via setMinNumOfGenesPerEntry) are not printed.
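The row layout described above can be illustrated with a small stand-alone sketch that does not use Paxtools. The class `GmtRowSketch` and its `describe`/`formatRow` methods are hypothetical helpers that only mimic the documented layout (entry name, description, then the identifiers, all tab-separated); the pathway URI and identifiers in `main` are placeholders.

```java
import java.util.List;

// Hypothetical, stand-alone illustration of the documented GMT row layout.
// This is NOT the Paxtools implementation; class and method names are made up.
class GmtRowSketch {

    // Builds the description column in the documented "key: value; ..." style.
    static String describe(String name, String datasource, String organism, String idType) {
        return "name: " + name + "; datasource: " + datasource
                + "; organism: " + organism + "; idtype: " + idType;
    }

    // Joins the entry name (pathway URI, or "other"), the description and the IDs with tabs.
    static String formatRow(String name, String description, List<String> ids) {
        StringBuilder row = new StringBuilder(name).append('\t').append(description);
        for (String id : ids) {
            row.append('\t').append(id);
        }
        return row.toString();
    }

    public static void main(String[] args) {
        String desc = describe("Apoptosis", "reactome", "9606", "uniprot");
        // Placeholder pathway URI and identifiers, not real data.
        System.out.println(formatRow("http://example.org/pathway/apoptosis", desc, List.of("ID1", "ID2", "ID3")));
    }
}
```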
Notes:
- to effectively enforce the cross-species checks, the 'organism' property of pathways and participants must be set
to a BioSource element with a valid ncbitaxon (taxonomy) unification xref;
- this assumes that the BioPAX Level 3 model has been validated and normalized (with the BioPAX Validator and Normalizer).
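To make the cross-species note concrete, here is a minimal, hypothetical filter (`CrossSpeciesSketch.sameSpecies` is not a Paxtools method). It assumes each organism has already been resolved to an NCBI taxonomy ID string, as the ncbitaxon unification xref would provide, and keeps a participant only if its taxon matches the pathway's taxon. Participants whose organism is not set are kept here, since the check cannot be applied to them; that policy is an assumption for illustration, not documented GSEAConverter behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a cross-species check; not the Paxtools implementation.
// Organisms are assumed to be pre-resolved to NCBI taxonomy IDs (e.g. "9606").
class CrossSpeciesSketch {

    // Returns the participants whose taxon matches the pathway taxon.
    // A null taxon means the organism is not set, so the check cannot be applied and
    // the participant is kept (one possible policy, chosen for illustration).
    static List<String> sameSpecies(String pathwayTaxon, Map<String, String> taxonByParticipant) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> e : taxonByParticipant.entrySet()) {
            String taxon = e.getValue();
            if (pathwayTaxon == null || taxon == null || taxon.equals(pathwayTaxon)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> participants = new LinkedHashMap<>();
        participants.put("TP53", "9606");   // human
        participants.put("Trp53", "10090"); // mouse
        participants.put("unknown", null);  // organism not set
        System.out.println(sameSpecies("9606", participants)); // [TP53, unknown]
    }
}
```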
Constructor Summary
Constructors (constructor: description):
- GSEAConverter(): Constructor.
- GSEAConverter(String idType, boolean crossSpeciesCheckEnabled): Constructor.
- GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, boolean skipSubPathways): Constructor.
- GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, Collection<Provenance> skipSubPathwaysOf): Constructor.
Method Summary
Methods (modifier and type, method: description):
- Collection<org.biopax.paxtools.io.gsea.GMTEntry> convert(Model model): Creates GSEA entries from the pathways contained in the model.
- int getMinNumOfGenesPerEntry(): If this value is greater than 0 and the number of proteins/genes in a gene set is less than that value, the gene set is skipped (no GSEA entry is written).
- boolean isSkipOutsidePathways(): If true, only GSEA entries whose genes correspond to a Pathway are printed to the output.
- void setAllowedOrganisms(Set<String> allowedOrganisms)
- void setIdType(String idType)
- void setMinNumOfGenesPerEntry(int minNumOfGenesPerEntry)
- void setSkipOutsidePathways(boolean skipOutsidePathways)
- void writeToGSEA(Model model, OutputStream out): Converts the model to GSEA (GMT) format and writes it to out.
Constructor Details
GSEAConverter
public GSEAConverter()
Constructor.
GSEAConverter
public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled)
Constructor. See the class declaration for more information.
Parameters:
idType - identifier collection/type name or prefix: either the xref.db value used by most EntityReference xrefs in the BioPAX data (e.g., "hgnc.symbol", "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt"), or the namespace/prefix part of the normalized EntityReference URIs, such as 'chebi' in identifiers.org/chebi:ID or bioregistry.io/chebi:ID (this depends on the actual data, so check it before using it here).
crossSpeciesCheckEnabled - if true, ensures that the output contains no cross-species participants.
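When idType is a URI prefix rather than an xref.db value, it refers to the `prefix` in normalized URIs of the form `.../prefix:ID`. The hypothetical helper below (`UriPrefixSketch.prefixOf`, not part of Paxtools) shows how such a prefix can be read off a URI, which may help in checking the data before choosing idType. It handles only the `prefix:ID` form given above; other URI styles are out of scope.

```java
// Hypothetical helper, not part of Paxtools: extracts the prefix (e.g. "chebi")
// from a normalized URI of the form ".../prefix:ID".
class UriPrefixSketch {

    // Returns the text before the first ':' of the last path segment, or null if absent.
    static String prefixOf(String uri) {
        String last = uri.substring(uri.lastIndexOf('/') + 1);
        int colon = last.indexOf(':');
        return colon > 0 ? last.substring(0, colon) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("http://bioregistry.io/chebi:15377"));     // chebi
        System.out.println(prefixOf("http://identifiers.org/uniprot:P04637")); // uniprot
    }
}
```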
GSEAConverter
public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, boolean skipSubPathways)
Constructor. See the class declaration for more information.
Parameters:
idType - identifier collection/type name or prefix: either the xref.db value used by most EntityReference xrefs in the BioPAX data (e.g., "hgnc.symbol", "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt"), or the namespace/prefix part of the normalized EntityReference URIs, such as 'chebi' in identifiers.org/chebi:ID or bioregistry.io/chebi:ID (this depends on the actual data, so check it before using it here).
crossSpeciesCheckEnabled - if true, ensures that the output contains no cross-species participants.
skipSubPathways - if true, does not traverse into any sub-pathways to collect entity references (useful when a model, such as KEGG data converted to BioPAX, has many sub-pathways or loops).
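The effect of skipSubPathways, and why loops matter, can be shown on a toy model. Everything below is hypothetical (the nested `Pathway` class is not the Paxtools `Pathway` interface): gene IDs are collected from a pathway and, unless sub-pathways are skipped, from its sub-pathways, with a visited set so that a cycle such as A -> B -> A does not recurse forever.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not the Paxtools API: a pathway with direct gene IDs and sub-pathways.
class SubPathwaySketch {

    static final class Pathway {
        final String name;
        final List<String> genes = new ArrayList<>();
        final List<Pathway> subPathways = new ArrayList<>();

        Pathway(String name) {
            this.name = name;
        }
    }

    // Collects gene IDs from a pathway; with skipSubPathways only its direct genes are used.
    static Set<String> collectGenes(Pathway root, boolean skipSubPathways) {
        Set<String> genes = new LinkedHashSet<>();
        collect(root, skipSubPathways, new HashSet<>(), genes);
        return genes;
    }

    private static void collect(Pathway p, boolean skip, Set<Pathway> visited, Set<String> out) {
        if (!visited.add(p)) {
            return; // already visited: a loop or a shared sub-pathway
        }
        out.addAll(p.genes);
        if (skip) {
            return; // do not descend into sub-pathways
        }
        for (Pathway sub : p.subPathways) {
            collect(sub, skip, visited, out);
        }
    }

    public static void main(String[] args) {
        Pathway a = new Pathway("A");
        Pathway b = new Pathway("B");
        a.genes.add("G1");
        b.genes.add("G2");
        a.subPathways.add(b);
        b.subPathways.add(a); // a loop: A -> B -> A
        System.out.println(collectGenes(a, false)); // [G1, G2]
        System.out.println(collectGenes(a, true));  // [G1]
    }
}
```

Skipping sub-pathways trades completeness for speed and for smaller, more focused gene sets; the visited set alone already prevents infinite recursion on loops.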
GSEAConverter
public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, Collection<Provenance> skipSubPathwaysOf)
Constructor. See the class declaration for more information.
Parameters:
idType - identifier collection/type name or prefix: either the xref.db value used by most EntityReference xrefs in the BioPAX data (e.g., "hgnc.symbol", "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt"), or the namespace/prefix part of the normalized EntityReference URIs, such as 'chebi' in identifiers.org/chebi:ID or bioregistry.io/chebi:ID (this depends on the actual data, so check it before using it here).
crossSpeciesCheckEnabled - if true, ensures that the output contains no cross-species participants.
skipSubPathwaysOf - does not look inside the sub-pathways of pathways from the given data sources when collecting entity references (useful when a model, such as KEGG data converted to BioPAX, has many sub-pathways or loops).
Method Details
getIdType
setIdType
isSkipOutsidePathways
public boolean isSkipOutsidePathways()
If true, only GSEA entries whose genes correspond to a Pathway are printed to the output.
Returns:
true if only pathway-based entries are printed; false otherwise
setSkipOutsidePathways
public void setSkipOutsidePathways(boolean skipOutsidePathways)
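The interaction between skipOutsidePathways and the "other" entry from the class description can be sketched as follows. `OutsidePathwaysSketch.group` is a hypothetical helper, not the Paxtools code, and it simplifies the data by assuming each gene belongs to at most one pathway (null meaning no pathway), which real BioPAX data need not satisfy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not the Paxtools implementation. Simplification: each gene maps to
// at most one pathway name, and null means the gene is not in any pathway.
class OutsidePathwaysSketch {

    // Groups genes by pathway; genes without a pathway go to the "other" entry,
    // which is dropped when skipOutsidePathways is true.
    static Map<String, Set<String>> group(Map<String, String> pathwayByGene, boolean skipOutsidePathways) {
        Map<String, Set<String>> entries = new TreeMap<>();
        for (Map.Entry<String, String> e : pathwayByGene.entrySet()) {
            String pathway = e.getValue() == null ? "other" : e.getValue();
            entries.computeIfAbsent(pathway, k -> new TreeSet<>()).add(e.getKey());
        }
        if (skipOutsidePathways) {
            entries.remove("other");
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> pathwayByGene = new HashMap<>();
        pathwayByGene.put("G1", "Apoptosis");
        pathwayByGene.put("G2", "Apoptosis");
        pathwayByGene.put("G3", null);
        System.out.println(group(pathwayByGene, false)); // {Apoptosis=[G1, G2], other=[G3]}
        System.out.println(group(pathwayByGene, true));  // {Apoptosis=[G1, G2]}
    }
}
```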
getAllowedOrganisms
setAllowedOrganisms
getMinNumOfGenesPerEntry
public int getMinNumOfGenesPerEntry()
If this value is greater than 0 and the number of proteins/genes in a gene set is less than that value, the gene set is skipped (no GSEA entry is written).
Returns:
the minimum number of genes per entry
setMinNumOfGenesPerEntry
public void setMinNumOfGenesPerEntry(int minNumOfGenesPerEntry)
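The skip rule documented for getMinNumOfGenesPerEntry() reduces to a single predicate. The sketch below restates it with a hypothetical `MinGenesSketch.shouldWrite` method (not a Paxtools API): a threshold of 0 or less disables the filter, and otherwise a gene set smaller than the threshold produces no entry.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of the documented minNumOfGenesPerEntry rule; not the Paxtools code.
class MinGenesSketch {

    // An entry is skipped only if the threshold is greater than 0 and the
    // gene set is smaller than the threshold; otherwise it is written.
    static boolean shouldWrite(Collection<String> genes, int minNumOfGenesPerEntry) {
        return minNumOfGenesPerEntry <= 0 || genes.size() >= minNumOfGenesPerEntry;
    }

    public static void main(String[] args) {
        System.out.println(shouldWrite(List.of("G1", "G2"), 3));       // false: 2 < 3
        System.out.println(shouldWrite(List.of("G1", "G2", "G3"), 3)); // true
        System.out.println(shouldWrite(List.of("G1"), 0));             // true: filter disabled
    }
}
```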
writeToGSEA
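public void writeToGSEA(Model model, OutputStream out) throws IOException
Converts the model to GSEA (GMT) format and writes it to out. See the class declaration for more information.
Parameters:
model - the BioPAX Model
out - the output stream to write the result to
Throws:
IOException - when there is an output stream error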
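Writing GMT text to an OutputStream can be sketched with standard java.io classes only. `GmtWriterSketch.write` is hypothetical and is not how writeToGSEA is implemented; the UTF-8 encoding, the "\n" line ending, and leaving the stream open for the caller to close are all assumptions made for this sketch.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of writing GMT rows to an OutputStream; not the Paxtools implementation.
// Assumptions: UTF-8 encoding, "\n" line endings, and the caller owns (and closes) the stream.
class GmtWriterSketch {

    // Each row is [name, description, id1, id2, ...]; fields are tab-separated.
    static void write(List<List<String>> rows, OutputStream out) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (List<String> row : rows) {
            w.write(String.join("\t", row));
            w.write('\n');
        }
        w.flush(); // flush buffered text but do not close, which would also close 'out'
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Placeholder pathway URI, description and identifiers.
        write(List.of(List.of("http://example.org/pathway/1",
                "name: Apoptosis; datasource: reactome; organism: 9606; idtype: uniprot", "ID1", "ID2")), buffer);
        System.out.print(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```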
convert
public Collection<org.biopax.paxtools.io.gsea.GMTEntry> convert(Model model)
Creates GSEA entries from the pathways contained in the model.
Parameters:
model - the BioPAX Model
Returns:
a set of GSEA entries