This repository has been archived by the owner on Sep 24, 2019. It is now read-only.
Dataset Objects
Natalie Catlett edited this page Sep 18, 2015
·
15 revisions
This page describes the dataset objects defined in datasets.py. Some classes inherit from multiple parent classes (e.g., HGNCData is a NamespaceDataSet, an OrthologyDataSet, and a HistoryDataSet).
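As a minimal sketch of the multiple-inheritance pattern, the stand-in classes below reuse the class names from this page but are otherwise hypothetical: the constructor signature and the empty class bodies are assumptions, not the actual code in datasets.py.

```python
# Hypothetical stand-ins for the classes in datasets.py.
# The constructor arguments are assumed for illustration only.
class DataSet:
    def __init__(self, dictionary, prefix):
        self.dictionary = dictionary
        self.prefix = prefix

    def __str__(self):
        # Identifying string for the data object.
        return self.prefix


class NamespaceDataSet(DataSet):
    pass


class OrthologyDataSet(DataSet):
    pass


class HistoryDataSet(DataSet):
    pass


# One object that is a namespace, an orthology source, and a history source.
class HGNCData(NamespaceDataSet, OrthologyDataSet, HistoryDataSet):
    pass


hgnc = HGNCData(dictionary={}, prefix='hgnc')

# Method resolution order: parents are searched left to right,
# and the shared DataSet base comes after all of them.
parents = [cls.__name__ for cls in HGNCData.__mro__]
```

With this layout, a method defined on any one parent is available on the combined object, and `isinstance` checks succeed for each parent class.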
classes:
- DataSet
  - contains any data relevant to the BEL Namespace and Annotation resource generator pipeline
  - attributes:
    - prefix - prefix for data set
    - dictionary - dictionary containing data, built in the parsed module
  - methods:
    - get_values - returns all non-obsolete values used as keys in the data dictionary
    - __str__ - returns identifying string for data object
- NamespaceDataSet
  - contains data for BEL Namespaces and Annotations including ids, terms, synonyms, and equivalences
  - parent class - DataSet
  - attributes (in addition to parent class attributes):
    - name - name for namespace (can be same as prefix)
    - ids - flag to produce .belns file with ids, default=False
    - labels - flag to produce .belns file with preferred labels, default=True
    - domain - list containing domain(s) of the terms in the namespace (e.g., "chemical", "gene and gene product"), default=['other']
    - scheme_type - list containing 'ns' (namespace) and/or 'anno' (annotation) to indicate if data is used to build namespace and/or annotation files, default=['ns']
  - methods (in addition to parent class methods):
    - get_label(term_id) - returns the value to be used as the preferred label for an associated term_id
    - get_name(term_id) - returns the term name to use as a title (or None)
    - get_xrefs(term_id) - returns equivalences for the term_id to other namespaces, in the case where the data set object is the source information. Returned as a set of terms expressed as PREFIX:ID
    - get_species(term_id) - returns species associated with a term_id as NCBI tax ID, or None as applicable
    - get_encoding(term_id) - returns the encoding (allowed BEL functions) for the term_id
    - get_concept_type(term_id) - if from an annotation concept scheme, returns set of AnnotationConcept types associated with the term_id
    - get_alt_symbols(term_id) - returns set of synonym symbols associated with the term_id
    - get_alt_names(term_id) - returns set of name synonyms associated with the term_id
    - get_alt_ids(term_id) - returns set of alternative ids associated with the term_id
    - write_ns_values(dir) - writes .belns file(s) to specified dir (uses write_data)
    - write_data(data, dir, name) - writes .belns file
- HistoryDataSet
  - contains information about obsolete (withdrawn and/or replaced) ids
  - used for change_log and rdf output, not .belns and .beleq file generation
  - parent class - DataSet
  - methods:
    - get_id_update(term_id) - returns updated value for a given term, "withdrawn", or None (if no replacement information)
    - get_obsolete_ids - returns dictionary mapping each obsolete id to its current value
- OrthologyDataSet
  - contains orthology relationship data
  - parent class - DataSet
  - methods:
    - get_orthologs(term_id) - returns set of orthologs associated with term_id
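The method contracts listed above can be illustrated with a toy example. This is only a sketch under stated assumptions: the layout of the data dictionary (the `obsolete`, `replaced_by`, `withdrawn`, and `orthologs` keys), the constructor signature, the `ToyData` class, and the sample ids are all hypothetical and chosen for illustration; the real classes in datasets.py parse their own source formats.

```python
# Toy sketch of the documented method contracts.
# The dictionary layout and all sample ids are assumptions, not the real format.
class DataSet:
    def __init__(self, dictionary, prefix):
        self.dictionary = dictionary
        self.prefix = prefix

    def get_values(self):
        # All non-obsolete keys of the data dictionary.
        return {key for key, entry in self.dictionary.items()
                if not entry.get('obsolete')}

    def __str__(self):
        return self.prefix


class HistoryDataSet(DataSet):
    def get_id_update(self, term_id):
        # Replacement id, the string 'withdrawn', or None if nothing is known.
        entry = self.dictionary.get(term_id, {})
        if 'replaced_by' in entry:
            return entry['replaced_by']
        if entry.get('withdrawn'):
            return 'withdrawn'
        return None

    def get_obsolete_ids(self):
        # Map each obsolete id to its current value.
        return {key: self.get_id_update(key)
                for key, entry in self.dictionary.items()
                if entry.get('obsolete')}


class OrthologyDataSet(DataSet):
    def get_orthologs(self, term_id):
        # Set of orthologs for term_id (empty set if none recorded).
        return set(self.dictionary.get(term_id, {}).get('orthologs', ()))


class ToyData(HistoryDataSet, OrthologyDataSet):
    pass


toy = ToyData(
    dictionary={
        'G1': {'orthologs': {'OTHER:101'}},
        'G2': {},
        'G3': {'obsolete': True, 'replaced_by': 'G1'},
        'G4': {'obsolete': True, 'withdrawn': True},
    },
    prefix='toy',
)

current = sorted(toy.get_values())   # ['G1', 'G2']
obsolete = toy.get_obsolete_ids()    # {'G3': 'G1', 'G4': 'withdrawn'}
orthologs = toy.get_orthologs('G1')  # {'OTHER:101'}
```

Keeping the history and orthology lookups as separate parent classes lets a data source opt in to only the behavior it can support.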