pretzel-data

Data structure resources for plantinformatics/pretzel.

High level structure

Data is organised into datasets, which contain blocks. Think of a dataset as a container for a collection of data that divides naturally into subsets; each subset forms a block. Every block must have a scope, which differentiates it from the other blocks.

For example, we can define a simple physical genome as follows:

{
  "name": "myGenome",
  "meta": {
    "year": "2018"
  },
  "blocks": [
    {
      "scope": "1A",
      "featureType": "linear",
      "range": [
        1,
        500000000
      ]
    },
    {
      "scope": "1B",
      "featureType": "linear",
      "range": [
        1,
        450000000
      ]
    }
  ]
}

myGenome has two chromosomes, 1A and 1B, each represented here as a block. featureType indicates the type of feature contained in the block. At present all data is linear: a block defines a range, and its features are positions or sub-ranges within that range. Future data, such as genotypes, is planned to be of observational type.
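The structural rules described so far can be sketched as a small check in Python. This is only an illustration, not part of Pretzel: the helper name validate_dataset is hypothetical, and it assumes that scopes must be unique within a dataset and that a range, when present, is a [start, end] pair.

```python
import json


# Hypothetical helper, not part of Pretzel: checks the structural rules
# described above and returns a list of human-readable problems.
def validate_dataset(dataset):
    problems = []
    if not dataset.get("name"):
        problems.append("dataset has no name")
    seen_scopes = set()
    for index, block in enumerate(dataset.get("blocks", [])):
        scope = block.get("scope")
        if scope is None:
            problems.append(f"block {index} has no scope")
        elif scope in seen_scopes:
            # Assumption: scopes must be unique within one dataset.
            problems.append(f"block {index} repeats scope {scope!r}")
        else:
            seen_scopes.add(scope)
        block_range = block.get("range")
        # Assumption: when present, a range is a [start, end] pair.
        if block_range is not None and len(block_range) != 2:
            problems.append(f"block {index} range is not a [start, end] pair")
    return problems


my_genome = json.loads("""
{
  "name": "myGenome",
  "meta": {"year": "2018"},
  "blocks": [
    {"scope": "1A", "featureType": "linear", "range": [1, 500000000]},
    {"scope": "1B", "featureType": "linear", "range": [1, 450000000]}
  ]
}
""")

print(validate_dataset(my_genome))  # prints []
```

An empty list means no problems were found. Returning a list rather than raising on the first problem lets a caller report every issue in a file at once.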

The meta field contains a set of arbitrary key-value pairs. It can be empty. It can be used to record any associated metadata, such as publication DOI, year, source, details of the organism, variety, etc.

The above dataset defines a physical genome with two chromosomes of given sizes. This is like defining the "reference" in a genome viewer tool. Next, we want to define an annotation inside this space.

Defining features inside a dataset

We define features, such as genes, inside another dataset by using the field parent:

{
    "name": "myAnnotation",
    "parent": "myGenome",
    "namespace" : "myGenome:myAnnotation",
    "blocks": [
        {
            "scope": "1A",
            "featureType": "linear",
            "features": [
                {
                    "name": "my1AGene1",
                    "range": [
                        3000,
                        5150
                    ]
                },
                ...

By specifying the parent as myGenome (defined previously), we indicate that each scope in this dataset refers to the block with the same scope in the parent. Here we have defined a gene, my1AGene1, spanning positions 3000 to 5150 on chromosome (scope) 1A. A gene in negative orientation is defined by making the second value in its range smaller than the first.
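As a rough sketch of how these conventions could be interpreted in Python, the snippet below looks up a parent block by scope, derives a feature's orientation from the order of its range values, and checks that the feature lies within the parent block's range. All three function names are hypothetical and are not part of Pretzel. Treating equal start and end values as positive orientation is an assumption.

```python
# Hypothetical helpers, not part of Pretzel.

def find_parent_block(parent_dataset, scope):
    """Return the block of the parent dataset with the given scope, or None."""
    for block in parent_dataset.get("blocks", []):
        if block.get("scope") == scope:
            return block
    return None


def feature_orientation(feature_range):
    """A second value smaller than the first marks negative orientation."""
    start, end = feature_range
    # Assumption: a single-position feature (start == end) counts as positive.
    return "negative" if end < start else "positive"


def feature_within_block(feature_range, block_range):
    """Check that the feature, in either orientation, fits inside the block."""
    low, high = sorted(feature_range)
    block_low, block_high = sorted(block_range)
    return block_low <= low and high <= block_high


my_genome = {
    "name": "myGenome",
    "blocks": [
        {"scope": "1A", "featureType": "linear", "range": [1, 500000000]},
        {"scope": "1B", "featureType": "linear", "range": [1, 450000000]},
    ],
}
gene = {"name": "my1AGene1", "range": [3000, 5150]}

parent_block = find_parent_block(my_genome, "1A")
print(feature_orientation(gene["range"]))  # prints positive
print(feature_within_block(gene["range"], parent_block["range"]))  # prints True
```

Writing the same gene with the range [5150, 3000] would report negative orientation while still passing the containment check, because the bounds are sorted before they are compared.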
