Class GSEAConverter

java.lang.Object
org.biopax.paxtools.io.gsea.GSEAConverter

public class GSEAConverter extends Object
Converts a BioPAX model to the GMT format used by the GSEA software. GSEA entries are created from the xrefs of sequence entity references in the BioPAX model as follows.

Each entry (row) consists of three tab-separated columns: a name (the pathway URI), a description (e.g. "name: Apoptosis; datasource: reactome; organism: 9606 idtype: uniprot"), and a list of identifiers of the same type. For participants not associated with any pathway, "other" is used as the pathway name and URI. The list may contain one or more IDs of the same type per entity reference, e.g. UniProt IDs or HGNC Symbols; entity references that have no xref of the given db/id type are ignored. Optionally, an entry with fewer than three protein references is not printed.

Note: for the cross-species check to be enforced effectively, the 'organism' property (on pathways as well) must be set to a BioSource object that has a valid unification xref with db="Taxonomy" and id set to a valid taxonomy ID.

Note: this code assumes that the model has been successfully validated and possibly normalized (using the BioPAX Validator and the Paxtools Normalizer). A BioPAX L1 or L2 model is first converted to L3.
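The three-column row layout described above can be illustrated with a minimal sketch in plain Java. This is not the Paxtools implementation: the class GmtRowSketch, the method formatRow, the example pathway URI, and the sample identifiers are assumptions made for illustration only.

```java
import java.util.List;

// Illustrative sketch only; GmtRowSketch and formatRow are hypothetical names,
// not part of Paxtools.
class GmtRowSketch {

    // Builds one GMT row: name, description, then every identifier,
    // all separated by tab characters.
    static String formatRow(String name, String description, List<String> ids) {
        StringBuilder row = new StringBuilder();
        row.append(name).append('\t').append(description);
        for (String id : ids) {
            row.append('\t').append(id);
        }
        return row.toString();
    }

    public static void main(String[] args) {
        String row = formatRow(
            "http://example.org/pathway1", // hypothetical pathway URI used as the name
            "name: Apoptosis; datasource: reactome; organism: 9606 idtype: uniprot",
            List.of("P04637", "P10415")); // sample identifiers of one type
        System.out.println(row);
    }
}
```

An entry for participants outside any pathway would, under the same assumptions, use "other" as the first column.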
  • Constructor Details

    • GSEAConverter

      public GSEAConverter()
      Constructor.
    • GSEAConverter

      public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled)
      Constructor. See class declaration for more information.
      Parameters:
      idType - identifier type (name of the resource): either the string value of most EntityReference xref.db properties in the BioPAX data, e.g., "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt" or "UniProt knowledgebase", or the <namespace> part of normalized EntityReference URIs of the form http://identifiers.org/<namespace>/<ID> (this depends on the actual data, so double-check before passing a value to this constructor).
      crossSpeciesCheckEnabled - if true, cross-species participants are excluded from the output
    • GSEAConverter

      public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, boolean skipSubPathways)
      Constructor. See class declaration for more information.
      Parameters:
      idType - identifier type (name of the resource): either the string value of most EntityReference xref.db properties in the BioPAX data, e.g., "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt" or "UniProt knowledgebase", or the <namespace> part of normalized EntityReference URIs of the form http://identifiers.org/<namespace>/<ID> (this depends on the actual data, so double-check before passing a value to this constructor).
      crossSpeciesCheckEnabled - if true, cross-species participants are excluded from the output
      skipSubPathways - if true, sub-pathways are not traversed when collecting entity references (useful when a model, such as KEGG data converted to BioPAX, has many sub-pathways or loops)
    • GSEAConverter

      public GSEAConverter(String idType, boolean crossSpeciesCheckEnabled, Set<Provenance> skipSubPathwaysOf)
      Constructor. See class declaration for more information.
      Parameters:
      idType - identifier type (name of the resource): either the string value of most EntityReference xref.db properties in the BioPAX data, e.g., "HGNC Symbol", "NCBI Gene", "RefSeq", "UniProt" or "UniProt knowledgebase", or the <namespace> part of normalized EntityReference URIs of the form http://identifiers.org/<namespace>/<ID>, such as 'hgnc.symbol' or 'uniprot' (this depends on the actual data, so double-check before passing a value to this constructor).
      crossSpeciesCheckEnabled - if true, cross-species participants are excluded from the output
      skipSubPathwaysOf - sub-pathways of pathways from the given data sources are not traversed when collecting entity references (useful when a model, such as KEGG data converted to BioPAX, has many sub-pathways or loops)
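Because idType may be the <namespace> part of a normalized URI, it can help to check which namespaces actually occur in the data. The following is a minimal sketch in plain Java of extracting that part from a URI string; the class IdentifiersOrgSketch and the method namespaceOf are hypothetical helpers, not part of Paxtools, and the sketch assumes the URI form http://identifiers.org/<namespace>/<ID> quoted above.

```java
// Illustrative sketch only; IdentifiersOrgSketch and namespaceOf are
// hypothetical names, not part of Paxtools.
class IdentifiersOrgSketch {

    private static final String PREFIX = "http://identifiers.org/";

    // Returns the <namespace> part of http://identifiers.org/<namespace>/<ID>,
    // or null when the string does not have that shape.
    static String namespaceOf(String uri) {
        if (uri == null || !uri.startsWith(PREFIX)) {
            return null;
        }
        String rest = uri.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        // Require a non-empty namespace and a non-empty ID after the slash.
        if (slash <= 0 || slash == rest.length() - 1) {
            return null;
        }
        return rest.substring(0, slash);
    }

    public static void main(String[] args) {
        System.out.println(namespaceOf("http://identifiers.org/hgnc.symbol/TP53"));
    }
}
```

A value obtained this way, such as 'hgnc.symbol' or 'uniprot', is the kind of string the idType parameter describes for normalized data.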
  • Method Details

    • isSkipOutsidePathways

      public boolean isSkipOutsidePathways()
      If true, only GSEA entries (gene sets) that correspond to a Pathway are written to the output.
      Returns:
      true/false
    • setSkipOutsidePathways

      public void setSkipOutsidePathways(boolean skipOutsidePathways)
    • getAllowedOrganisms

      public Set<String> getAllowedOrganisms()
    • setAllowedOrganisms

      public void setAllowedOrganisms(Set<String> allowedOrganisms)
    • getMinNumOfGenesPerEntry

      public int getMinNumOfGenesPerEntry()
      If this value is greater than 0 and the number of proteins/genes in a gene set is less than this value, that gene set is skipped (no GSEA entry is written).
      Returns:
      the min. value
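The minimum-size rule described for this property can be sketched in plain Java as follows. This illustrates the stated condition only and is not the Paxtools implementation; the class MinGenesFilterSketch, the method filter, and the map-based representation of gene sets are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; MinGenesFilterSketch and filter are hypothetical
// names, not part of Paxtools.
class MinGenesFilterSketch {

    // Keeps a gene set unless the threshold is positive and the set has
    // fewer identifiers than the threshold.
    static Map<String, Set<String>> filter(Map<String, Set<String>> geneSets,
                                           int minNumOfGenesPerEntry) {
        Map<String, Set<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : geneSets.entrySet()) {
            boolean tooSmall = minNumOfGenesPerEntry > 0
                && entry.getValue().size() < minNumOfGenesPerEntry;
            if (!tooSmall) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

With a threshold of 0 nothing is filtered out, which matches the "greater than 0" condition in the description.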
    • setMinNumOfGenesPerEntry

      public void setMinNumOfGenesPerEntry(int minNumOfGenesPerEntry)
    • writeToGSEA

      public void writeToGSEA(Model model, OutputStream out) throws IOException
      Converts model to GSEA (GMT) and writes to out. See class declaration for more information.
      Parameters:
      model - Model
      out - output stream to write the result to
      Throws:
      IOException - when there's an output stream error
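As a rough picture of what writing GMT output to a stream involves, the sketch below writes already formatted rows to an OutputStream, one per line. It is plain Java and does not call Paxtools; the class GmtStreamSketch, the method writeRows, the UTF-8 encoding, and the decision to flush without closing the stream are all assumptions, since this page does not state how writeToGSEA handles them.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative sketch only; GmtStreamSketch and writeRows are hypothetical
// names, not part of Paxtools.
class GmtStreamSketch {

    // Writes each tab-separated row followed by a newline.
    static void writeRows(List<String> rows, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (String row : rows) {
            writer.write(row);
            writer.write('\n');
        }
        // Flush but do not close: in this sketch the caller owns the stream.
        writer.flush();
    }
}
```

A caller would typically wrap a file stream in try-with-resources so that an IOException raised during writing still releases the file.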
    • convert

      public Collection<org.biopax.paxtools.io.gsea.GMTEntry> convert(Model model)
      Creates GSEA entries from the pathways contained in the model.
      Parameters:
      model - Model
      Returns:
      a collection of GSEA entries