This repository has been archived by the owner on Sep 24, 2019. It is now read-only.

Adding new Namespace datasets

ncatlett edited this page Feb 28, 2014 · 18 revisions

File format

A 'standard' format is available for adding new namespaces to the resource-generator pipeline. This format is a tab-delimited text file with the following columns:

  1. ID - a unique identifier for the namespace value (required)
  2. ALTIDS - any alternative IDs
  3. LABEL - the preferred label for the namespace value (required)
  4. SYNONYM - alternative labels, pipe-delimited
  5. DESCRIPTION - documentation text
  6. TYPE - the encoding for the namespace value, e.g., 'O' for pathology or 'C' for complex (required)
  7. SPECIES - the species associated with the namespace value, if any
  8. XREF - equivalent values from other BEL namespaces, pipe-delimited; each value must include a recognized prefix to be used for generating equivalences
  9. OBSOLETE - flag obsolete values with '1'
  10. PARENTS - any parent terms, such that ID isA PARENT is valid
  11. CHILDREN - any child terms, such that CHILD isA ID is valid

General information can be included at the top of the file, but each such line must be preceded by a '#'.
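To make the column layout concrete, here is a minimal Python sketch of reading a file in this format. It is not the resource-generator's own parser. It assumes one record per line, an optional header row whose first field is ID, that ALTIDS, PARENTS and CHILDREN are pipe-delimited like SYNONYM and XREF, and that XREF values look like PREFIX:value. The function names, the recognized-prefix set and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

COLUMNS = ["ID", "ALTIDS", "LABEL", "SYNONYM", "DESCRIPTION", "TYPE",
           "SPECIES", "XREF", "OBSOLETE", "PARENTS", "CHILDREN"]
REQUIRED = ("ID", "LABEL", "TYPE")
# SYNONYM and XREF are documented as pipe-delimited; treating ALTIDS,
# PARENTS and CHILDREN the same way is an assumption of this sketch.
MULTI_VALUED = ("ALTIDS", "SYNONYM", "XREF", "PARENTS", "CHILDREN")


def read_standard_namespace(lines: Iterable[str]) -> List[Dict]:
    """Hypothetical reader: one dict per namespace value."""
    records = []
    for line in lines:
        # Skip blank lines and '#' general-information lines.
        if not line.strip() or line.startswith("#"):
            continue
        row = line.rstrip("\r\n").split("\t")
        if row[0] == "ID":  # assumed optional header row
            continue
        row += [""] * (len(COLUMNS) - len(row))  # pad empty trailing columns
        rec = dict(zip(COLUMNS, row))
        missing = [c for c in REQUIRED if not rec[c].strip()]
        if missing:
            raise ValueError(f"{rec['ID'] or '?'}: missing required {missing}")
        for col in MULTI_VALUED:
            rec[col] = [v for v in rec[col].split("|") if v]
        rec["OBSOLETE"] = rec["OBSOLETE"].strip() == "1"
        records.append(rec)
    return records


def isa_edges(records: List[Dict]) -> List[Tuple[str, str]]:
    """(child, parent) pairs from 'ID isA PARENT' and 'CHILD isA ID'."""
    edges = set()
    for rec in records:
        edges.update((rec["ID"], p) for p in rec["PARENTS"])
        edges.update((c, rec["ID"]) for c in rec["CHILDREN"])
    return sorted(edges)


def equivalences(records: List[Dict],
                 recognized: Set[str]) -> List[Tuple[str, str, str]]:
    """Keep XREF values whose (assumed) 'PREFIX:' is in `recognized`."""
    out = []
    for rec in records:
        for xref in rec["XREF"]:
            prefix, sep, value = xref.partition(":")
            if sep and prefix in recognized:
                out.append((rec["ID"], prefix, value))
    return out


# Made-up sample data, for illustration only.
sample = (
    "# Hypothetical example namespace\n"
    "ID\tALTIDS\tLABEL\tSYNONYM\tDESCRIPTION\tTYPE\tSPECIES\tXREF\t"
    "OBSOLETE\tPARENTS\tCHILDREN\n"
    "X1\t\tExample Family\tFam A|Fam B\tA made-up family\tP\t\t"
    "HGNC:XYZ|FOO:9\t\t\tX2\n"
    "X2\t\tExample Member\t\t\tP\t9606\t\t1\tX1\t\n"
)

records = read_standard_namespace(sample.splitlines())
print(isa_edges(records))                # [('X2', 'X1')]
print(equivalences(records, {"HGNC"}))   # [('X1', 'HGNC', 'XYZ')]
```

Collecting isA edges in a set means a relationship stated from both sides (X1 lists X2 as a child, X2 lists X1 as a parent) is only emitted once, and XREF values with an unrecognized prefix (FOO:9 above) are simply left out of the equivalences.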

Example data

Examples of namespace data in this format can be found for the following namespaces:

  1. SFAM - Selventa protein families
  2. SCHEM - Selventa legacy chemical names
  3. SCOMP - Selventa named complexes
  4. SDIS - Selventa legacy diseases

Integration into resource-generator pipeline

To add a namespace dataset in this format to your resource-generator pipeline, the following steps are required:

  • In configuration.py:
    • initialize a data object: my_data = StandardCustomData(name='my-namespace-name', prefix='my-prefix')