Package org.bridgedb

Class DataSource


  • public final class DataSource
    extends Object
    Contains information about a certain DataSource. This includes:
    • Its full name ("Ensembl")
    • Its system code ("En")
    • Its main url ("http://www.ensembl.org")
    • Id-specific URLs ("http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=" + id)
    The DataSource class uses the extensible enum pattern. You cannot instantiate DataSources directly. Instead, use one of the constants from the org.bridgedb.bio module, such as BioDataSource.ENSEMBL, or the "getBySystemCode" or "getByFullName" methods. These methods return a predefined DataSource object if one exists. If no predefined DataSource exists for a requested system code, a new one is created automatically. This is useful when the user requests new, unknown data sources. If you call getBySystemCode twice with the same argument, you are guaranteed to get the same object back. However, there is no way to give a new DataSource a new full name unless you use the "register" method.

    This way, any number of predefined DataSources can be used, plugins can define new ones, and unknown data sources can be handled in the same way as predefined ones.

    Definitions for common DataSources can be found in org.bridgedb.bio.BioDataSource.
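
    The extensible enum pattern described above can be illustrated with a minimal, self-contained sketch. MiniDataSource is a hypothetical stand-in and not the BridgeDb implementation; the rule that an existing entry may not change its full name is an assumption based on the note under "register".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for DataSource; NOT the BridgeDb implementation. */
final class MiniDataSource {
    // Single shared registry, keyed by system code.
    private static final Map<String, MiniDataSource> BY_CODE = new HashMap<>();

    private final String systemCode;
    private final String fullName; // null when only the system code is known

    // Private constructor: instances can only come from the static methods below.
    private MiniDataSource(String systemCode, String fullName) {
        this.systemCode = systemCode;
        this.fullName = fullName;
    }

    /** Returns the registered instance, creating a name-less one on first request. */
    static synchronized MiniDataSource getBySystemCode(String systemCode) {
        return BY_CODE.computeIfAbsent(systemCode, code -> new MiniDataSource(code, null));
    }

    /** Registers a code with a full name; an existing entry may not change its name (assumed rule). */
    static synchronized MiniDataSource register(String systemCode, String fullName) {
        MiniDataSource existing = BY_CODE.get(systemCode);
        if (existing == null) {
            MiniDataSource created = new MiniDataSource(systemCode, fullName);
            BY_CODE.put(systemCode, created);
            return created;
        }
        if (!Objects.equals(existing.fullName, fullName)) {
            throw new IllegalArgumentException("System code already registered: " + systemCode);
        }
        return existing;
    }

    String getSystemCode() { return systemCode; }

    String getFullName() { return fullName; }

    public static void main(String[] args) {
        MiniDataSource ensembl = MiniDataSource.register("En", "Ensembl");
        // Same argument, same object.
        System.out.println(MiniDataSource.getBySystemCode("En") == ensembl);
        // Unknown code: created on demand, without a full name.
        System.out.println(MiniDataSource.getBySystemCode("Zz").getFullName());
    }
}
```

    Because every instance lives in one registry, callers can compare instances by identity, much as they would compare enum constants.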

    • Method Detail

      • getKnownUrl

        public String getKnownUrl(String id)
        Turns an id into a URL pointing to an info page on the web, e.g. "http://www.ensembl.org/get?id=ENSG...". Since version 2, this returns null if no pattern has been set.
        Parameters:
        id - identifier to use in url
        Returns:
        URL for the given id, or null if no pattern has been set
      • urlPatternKnown

        public boolean urlPatternKnown()
        Check if a URL pattern is known for this DataSource.
        Returns:
        True if and only if a URL pattern is known (has been registered)
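
        The interplay of getKnownUrl and urlPatternKnown can be sketched as follows. UrlPatternSketch is a hypothetical helper, not part of BridgeDb; the "$id" placeholder and the example identifier are assumptions made for illustration.

```java
/** Hypothetical helper; not part of BridgeDb. */
final class UrlPatternSketch {
    // Assumed placeholder marking where the identifier goes in the pattern.
    static final String PLACEHOLDER = "$id";

    /** A pattern counts as known when one has been set at all. */
    static boolean urlPatternKnown(String urlPattern) {
        return urlPattern != null;
    }

    /** Fills the id into the pattern, or returns null if no pattern has been set. */
    static String knownUrl(String urlPattern, String id) {
        if (!urlPatternKnown(urlPattern)) {
            return null;
        }
        // String.replace performs literal (non-regex) replacement.
        return urlPattern.replace(PLACEHOLDER, id);
    }

    public static void main(String[] args) {
        String pattern = "http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=" + PLACEHOLDER;
        System.out.println(knownUrl(pattern, "ENSG00000139618"));
        System.out.println(knownUrl(null, "ENSG00000139618"));
    }
}
```

        Checking urlPatternKnown first lets callers avoid handling a null return value.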
      • getFullName

        public String getFullName()
        Returns the full name of the DataSource, e.g. "Ensembl". May return null if only the system code is known. Also used as identifier in GPML.
        Returns:
        full name of DataSource
      • getSystemCode

        public String getSystemCode()
        Returns the GenMAPP system code, e.g. "En". May return null if only the full name is known. Also used as identifier in:
        1. Gdb databases
        2. Gex databases
        3. Imported data
        4. The Mapp format
        We should try not to use the system code anywhere outside these 4 uses.
        Returns:
        system code, a short unique code.
      • getMainUrl

        public String getMainUrl()
        Returns the main URL for this DataSource, which can be used to refer to the DataSource in general (e.g. http://www.ensembl.org/). May return null if the main URL is unknown.
        Returns:
        main url
      • getType

        public String getType()
        Returns:
        type of entity that this DataSource describes, for example "metabolite", "gene", "protein", "interaction" or "probe"
      • getMiriamURN

        public String getMiriamURN(String id)
        Creates a global identifier. It uses the MIRIAM data type list to create a MIRIAM URI like "urn:miriam:uniprot:P12345", or if this DataSource is not included in the MIRIAM data types list, a bridgedb URI.
        Parameters:
        id - Id to generate URN from.
        Returns:
        the URN.
        Since:
        Version 2.0.0
      • getCompactIdentifier

        public String getCompactIdentifier(String id)
        Creates a compact identifier. It uses the MIRIAM data type list to create a compact identifier like "uniprot:P12345".
        Parameters:
        id - identifier to generate compact identifier from.
        Returns:
        the compact identifier.
        Since:
        Version 3.0.0
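
        The two identifier shapes quoted above ("urn:miriam:uniprot:P12345" and "uniprot:P12345") differ only in their leading part. A minimal sketch, assuming the prefix is already known; IdentifierSketch is a hypothetical helper, and it omits both the fallback to a BridgeDb URI and any escaping of the id that the library may perform:

```java
/** Hypothetical helper; not part of BridgeDb. */
final class IdentifierSketch {
    /** Builds a MIRIAM URN such as "urn:miriam:uniprot:P12345". */
    static String miriamUrn(String prefix, String id) {
        return "urn:miriam:" + prefix + ":" + id;
    }

    /** Builds a compact identifier such as "uniprot:P12345". */
    static String compactIdentifier(String prefix, String id) {
        return prefix + ":" + id;
    }

    public static void main(String[] args) {
        System.out.println(miriamUrn("uniprot", "P12345"));
        System.out.println(compactIdentifier("uniprot", "P12345"));
    }
}
```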
      • getIdentifiersOrgUri

        public String getIdentifiersOrgUri(String id)
      • getAlternative

        public String getAlternative()
        Retrieves any saved alternative name.
        Returns:
        Saved alternative name or null if none is known
        Since:
        version 2.0.0
      • getDescription

        public String getDescription()
        Retrieves any saved description.
        Returns:
        Saved description or null if none is known
        Since:
        version 2.0.0
      • register

        public static DataSource.Builder register(String sysCode,
                                                  String fullName)
        Register a new DataSource with (optional) detailed information. This can be used by other modules to define new DataSources. Note: Since version 2 this method is stricter. It will no longer allow an existing DataSource to have either its full name or sysCode changed.
        Parameters:
        sysCode - short unique code of 1 to 4 letters, originally used by GenMAPP
        fullName - full name used in GPML.
        Returns:
        Builder that can be used for adding detailed information.
      • registerAlias

        public void registerAlias(String alias)
      • getExistingBySystemCode

        public static DataSource getExistingBySystemCode(String systemCode)
        Parameters:
        systemCode - short unique code to query for
        Returns:
        pre-existing DataSource object by system code if it exists
        Throws:
        IllegalArgumentException - if no DataSource is known with this systemCode
      • systemCodeExists

        public static boolean systemCodeExists(String systemCode)
        Check if a DataSource with this systemCode has been registered.
        Parameters:
        systemCode - to check
        Returns:
        True if and only if a DataSource has been registered with this systemCode.
      • getExistingByFullName

        public static DataSource getExistingByFullName(String fullName)
        Returns the pre-existing DataSource object with this full name.
        Parameters:
        fullName - full name to query for
        Returns:
        DataSource
        Throws:
        IllegalArgumentException - if no DataSource is known with this fullName
      • fullNameExists

        public static boolean fullNameExists(String fullName)
        Check if a DataSource with this fullName has been registered.
        Parameters:
        fullName - to check
        Returns:
        True if and only if a DataSource has been registered with this fullName.
      • getDataSources

        public static Set<DataSource> getDataSources()
        Gets all registered DataSources as a set.
        Returns:
        set of all registered DataSources
      • getFilteredSet

        public static Set<DataSource> getFilteredSet(Boolean primary,
                                                     Boolean metabolite,
                                                     Object o)
        Returns a filtered subset of available DataSources.
        Parameters:
        primary - Filter for specified primary-ness. If null, don't filter on primary-ness.
        metabolite - Filter for specified metabolite-ness. If null, don't filter on metabolite-ness.
        o - Filter for specified organism. If null, don't filter on organism.
        Returns:
        filtered set.
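
        The parameters above are tri-state: true, false, or null for "don't filter". A minimal sketch of that logic follows. FilterSketch and its Source type are hypothetical, and letting a source with no specific organism pass any organism filter is an assumption, not documented behaviour.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of null-means-no-filter logic; not the BridgeDb implementation. */
final class FilterSketch {
    /** Minimal stand-in for a data source. */
    static final class Source {
        final String name;
        final boolean primary;
        final boolean metabolite;
        final Object organism; // null when multiple / not applicable

        Source(String name, boolean primary, boolean metabolite, Object organism) {
            this.name = name;
            this.primary = primary;
            this.metabolite = metabolite;
            this.organism = organism;
        }
    }

    static Set<Source> filter(Collection<Source> all, Boolean primary, Boolean metabolite, Object organism) {
        Set<Source> result = new LinkedHashSet<>();
        for (Source s : all) {
            // A null criterion matches everything.
            boolean primaryOk = primary == null || s.primary == primary;
            boolean metaboliteOk = metabolite == null || s.metabolite == metabolite;
            // Assumption: sources without a specific organism pass any organism filter.
            boolean organismOk = organism == null || s.organism == null || organism.equals(s.organism);
            if (primaryOk && metaboliteOk && organismOk) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Source> all = Arrays.asList(
                new Source("gene", true, false, "human"),
                new Source("probe", false, false, null),
                new Source("chem", true, true, null));
        for (Source s : filter(all, true, null, null)) {
            System.out.println(s.name);
        }
    }
}
```

        Using the boxed Boolean type rather than boolean is what makes the third "don't care" state possible.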
      • getFullNames

        public static List<String> getFullNames()
        Get a list of all non-null full names.

        Warning: the ordering of this list is undefined. Two subsequent calls may give different results.

        Returns:
        List of full names
      • toString

        public String toString()
        The string representation of a DataSource is equal to its full name (e.g. "Ensembl").
        Overrides:
        toString in class Object
        Returns:
        String representation
      • getExample

        public Xref getExample()
        Returns:
        example Xref, mostly for testing purposes
      • isPrimary

        public boolean isPrimary()
        Returns:
        whether this is a primary DataSource. Primary DataSources are preferred when annotating models. A DataSource is primary if it is not of type probe, so that means e.g. Affymetrix or Agilent probes are not primary. All gene, protein and metabolite identifiers are primary.
      • isDeprecated

        public boolean isDeprecated()
        A DataSource is deprecated if it is replaced by another data source which should be used instead. A deprecated DataSource does not necessarily specify which DataSource replaces it.
        Returns:
        true if this DataSource is deprecated
      • isDeprecatedBy

        public DataSource isDeprecatedBy()
        Returns the DataSource that should be used instead if this DataSource is deprecated. This method may return null even if this DataSource is deprecated.
        Returns:
        if defined, the DataSource that should be used instead of this one
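
        Because isDeprecatedBy may return null even for a deprecated DataSource, callers need a fallback. A minimal sketch of one way to handle this; DeprecationSketch and its Source type are hypothetical, and the names in main are illustrative only:

```java
/** Hypothetical sketch of null-safe replacement lookup; not part of BridgeDb. */
final class DeprecationSketch {
    /** Minimal stand-in for a data source. */
    static final class Source {
        final String name;
        final boolean deprecated;
        final Source replacement; // may be null even when deprecated

        Source(String name, boolean deprecated, Source replacement) {
            this.name = name;
            this.deprecated = deprecated;
            this.replacement = replacement;
        }
    }

    /** Returns the replacement when one is defined, otherwise the source itself. */
    static Source preferred(Source source) {
        if (source.deprecated && source.replacement != null) {
            return source.replacement;
        }
        return source;
    }

    public static void main(String[] args) {
        Source current = new Source("current", false, null);
        Source oldWithReplacement = new Source("old", true, current);
        Source oldWithoutReplacement = new Source("orphan", true, null);
        System.out.println(preferred(oldWithReplacement).name);
        System.out.println(preferred(oldWithoutReplacement).name);
    }
}
```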
      • isMetabolite

        public boolean isMetabolite()
        Returns:
        whether this DataSource describes metabolites.
      • getOrganism

        public Object getOrganism()
        Returns:
        Organism that this DataSource describes, or null if multiple / not applicable.
      • getByCompactIdentifierPrefix

        public static DataSource getByCompactIdentifierPrefix(String prefix)
      • getByMiriamBase

        public static DataSource getByMiriamBase(String base)
        Since version 2.0 this method will return null if no DataSource is known.
        Parameters:
        base - the base urn, which must start with "urn:miriam:". If it doesn't, null is returned.
      • getByIdentiferOrgBase

        public static DataSource getByIdentiferOrgBase(String base)
        Since version 2.0 this method will return null if no DataSource is known.
        Parameters:
        base - the base URI, which must start with "http://identifiers.org/". If it doesn't, null is returned.
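
        Both base lookups follow the same shape: check the required start, extract what follows, and return null when either step fails. A minimal sketch under stated assumptions: BaseLookupSketch is hypothetical, it returns a name string instead of a DataSource, its one-entry registry is illustrative, and dropping a trailing slash is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of base-to-source lookup; not the BridgeDb implementation. */
final class BaseLookupSketch {
    static final String MIRIAM_START = "urn:miriam:";
    static final String IDENTIFIERS_ORG_START = "http://identifiers.org/";

    // Illustrative registry: prefix -> name of the matching source.
    private static final Map<String, String> NAME_BY_PREFIX = new HashMap<>();
    static {
        NAME_BY_PREFIX.put("uniprot", "UniProt (illustrative)");
    }

    static String byMiriamBase(String base) {
        return lookup(base, MIRIAM_START);
    }

    static String byIdentifiersOrgBase(String base) {
        return lookup(base, IDENTIFIERS_ORG_START);
    }

    private static String lookup(String base, String requiredStart) {
        // Wrong or missing start: null rather than an exception.
        if (base == null || !base.startsWith(requiredStart)) {
            return null;
        }
        String prefix = base.substring(requiredStart.length());
        // Assumption: tolerate one trailing slash, e.g. "http://identifiers.org/uniprot/".
        if (prefix.endsWith("/")) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        // Unknown prefix: Map.get returns null.
        return NAME_BY_PREFIX.get(prefix);
    }

    public static void main(String[] args) {
        System.out.println(byMiriamBase("urn:miriam:uniprot"));
        System.out.println(byIdentifiersOrgBase("http://identifiers.org/uniprot/"));
        System.out.println(byMiriamBase("uniprot"));
    }
}
```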
      • getCompactIdentifierPrefix

        public String getCompactIdentifierPrefix()
        Returns the compact identifier prefix (previously known as MIRIAM base).