Adding new Namespace datasets

File format

A 'standard' file format is available for adding new namespaces to the resource-generation pipeline. A file in this format is tab-delimited text with the following columns:

ID - a unique identifier for the namespace value (required)
ALTIDS - any alternative ids
LABEL - the preferred label for the namespace value (required)
SYNONYM - alternative labels, pipe-delimited
DESCRIPTION - documentation text
TYPE - the encoding for the namespace value, e.g., 'O' for pathology, 'C' for complex (required)
SPECIES - the species associated with the namespace value, if any
XREF - equivalent values from other BEL namespaces, pipe-delimited. Must include a recognized prefix to be used for generating equivalences
OBSOLETE - flag obsolete values with '1'
PARENTS - any parent terms, valid for ID isA PARENT
CHILDREN - any child terms, valid for CHILD isA ID

General information can be included at the top of the file, but must be preceded by a '#'.
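The column layout above can be illustrated with a minimal reading sketch. It is not the pipeline's own parser: the function name read_namespace_rows, the sample rows, and the 'PFX' cross-reference prefix are hypothetical, and the sketch assumes that data rows carry no header line and that columns appear in the order listed above.

```python
import csv
import io

# Column order as listed above. Assumption: real files follow this order
# and have no header row; adjust if your files differ.
COLUMNS = ["ID", "ALTIDS", "LABEL", "SYNONYM", "DESCRIPTION", "TYPE",
           "SPECIES", "XREF", "OBSOLETE", "PARENTS", "CHILDREN"]
REQUIRED = ("ID", "LABEL", "TYPE")
PIPE_DELIMITED = ("SYNONYM", "XREF")


def read_namespace_rows(handle):
    """Yield one dict per data row, skipping '#' lines and blank lines."""
    data_lines = (line for line in handle
                  if line.strip() and not line.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Pad short rows so trailing optional columns may be left off.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        row = dict(zip(COLUMNS, fields))
        missing = [col for col in REQUIRED if not row[col].strip()]
        if missing:
            raise ValueError(f"row {row['ID']!r} lacks required columns: {missing}")
        # Split the pipe-delimited columns into lists, dropping empty items.
        for col in PIPE_DELIMITED:
            row[col] = [value for value in row[col].split("|") if value]
        # OBSOLETE is flagged with '1'; anything else counts as current.
        row["OBSOLETE"] = row["OBSOLETE"].strip() == "1"
        yield row


# Hypothetical sample data, not taken from a real namespace file.
SAMPLE = (
    "# general information goes at the top, preceded by '#'\n"
    "X0001\t\tExample complex\tEx complex|Example cplx\tA made-up entry\tC\t\tPFX:123\t\t\t\n"
    "X0002\t\tOld example\t\t\tC\t\t\t1\t\t\n"
)

rows = list(read_namespace_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["ID"], row["LABEL"], row["SYNONYM"], row["OBSOLETE"])
```

A check like this can be useful before handing a new file to the pipeline, since a missing ID, LABEL, or TYPE is caught early.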

Example data

Examples of namespace data in this format can be found for the following namespaces:

SFAM - Selventa protein families
SCHEM - Selventa legacy chemical names
SCOMP - Selventa named complexes
SDIS - Selventa legacy diseases

Integration into resource-generator pipeline

To add a namespace dataset in this format to your resource-generator pipeline, the following steps are required:

In configuration.py:
Initialize the data object. (NOTE - the data object is expected to be named using the prefix for your namespace, followed by '_data', e.g., my_data for the prefix 'my'.)

my_data = StandardCustomData(name='my-namespace-name', prefix='my')

Configure the dataset by adding it to baseline_data. baseline_data is an ordered dictionary containing information for all of the data files used by gp_baseline.py. It maps each data file name to a tuple containing [1] the file location, [2] the file parser (in parsers.py), and [3] the data object that stores the parsed data.

baseline_data['my_file_name'] = ('file_location', parsers.NamespaceParser, my_data)
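The two configuration steps can be tried outside the pipeline with a small self-contained sketch. The StandardCustomData and NamespaceParser classes below are stand-ins defined only so the snippet runs on its own; they do not reproduce the real classes in the resource generator. The check_entries helper is hypothetical, and the '<prefix>_data' naming rule it checks is an assumption drawn from the my_data example.

```python
from collections import OrderedDict


class StandardCustomData:
    """Stand-in for the pipeline's data class; models only name and prefix."""

    def __init__(self, name, prefix):
        self.name = name
        self.prefix = prefix


class NamespaceParser:
    """Stand-in for parsers.NamespaceParser; the real parser is not modeled."""


# In configuration.py this dictionary already exists; it is created here
# only so the sketch is runnable by itself.
baseline_data = OrderedDict()

# Step 1: initialize the data object, named '<prefix>_data'.
my_data = StandardCustomData(name="my-namespace-name", prefix="my")

# Step 2: map the file name to (file location, parser, data object).
baseline_data["my_file_name"] = ("file_location", NamespaceParser, my_data)


def check_entries(entries, named_objects):
    """Hypothetical helper: report entries that break the conventions above."""
    problems = []
    for file_name, entry in entries.items():
        if len(entry) != 3:
            problems.append(f"{file_name}: expected a 3-item tuple")
            continue
        _location, _parser, data = entry
        expected_name = f"{data.prefix}_data"
        if named_objects.get(expected_name) is not data:
            problems.append(f"{file_name}: data object should be named {expected_name}")
    return problems


problems = check_entries(baseline_data, {"my_data": my_data})
print(problems or "configuration looks consistent")
```

Because baseline_data is ordered, a new entry is placed after the existing ones; whether that order matters for a given namespace depends on the pipeline and is not covered here.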
