+
+ .. image:: http://genomicsandhealth.org/files/logo_ga.png
+
==============================
GA4GH Reference Implementation
==============================

- A reference implementation of the APIs defined in the schemas repository.
-
- *************************
- Initial skeleton overview
- *************************
+ This is a prototype for the GA4GH reference client and
+ server applications. It is under heavy development, and many aspects of
+ the layout and APIs will change as requirements are better understood.
+ If you would like to help, please check out our list of
+ `issues <https://github.com/ga4gh/server/issues>`_!

- This is a proposed skeleton layout for the GA4GH reference client and
- server applications. As such, nothing is finalised and all aspects of
- the design and implementation are open for discussion and debate. The overall
- goals of the project are:
+ Our aims for this implementation are:

Simplicity/clarity
The main goal of this implementation is to provide an easy to understand
@@ -36,45 +36,15 @@ Ease of use
make installing the ``ga4gh`` reference code very easy across a range of
operating systems.

-
-
- *************
- Trying it out
- *************
-
- The project is designed to be published as a `PyPI <https://pypi.python.org/pypi>`_
- package, so ultimately installing the reference client and server programs
- should be as easy as::
-
-     $ pip install ga4gh
-
- However, the code is currently only a proposal, so it has not been uploaded to
- the Python package index. The best way to try out the code right now is to
- use `virtualenv <http://virtualenv.readthedocs.org/en/latest/>`_. After cloning
- the git repo, and changing to the project directory, do the following::
-
-     $ virtualenv testenv
-     $ source testenv/bin/activate
-     $ python setup.py install
-
- This should install the ``ga4gh_server`` and ``ga4gh_client`` scripts into the
- virtualenv and update your ``PATH`` so that they are available. When you have
- finished trying out the programs you can leave the virtualenv using::
-
-     $ deactivate
-
- The virtualenv can be restarted at any time, and can also be deleted
- when you no longer need it.
-
********************************
Serving variants from a VCF file
********************************

- Two implementations of the variants API is available that can serve data based
- on existing VCF files. This backends are based on tabix and `wormtable
- <http://www.biomedcentral.com/1471-2105/14/356>`_, which is a Python library
- to handle large scale tabular data. See `Wormtable backend`_ for instructions
- on serving VCF data from the GA4GH API.
+ Two implementations of the variants API are available that can serve data based
+ on existing VCF files. These backends are based on tabix and `wormtable
+ <http://www.biomedcentral.com/1471-2105/14/356>`_, which is a Python library to
+ handle large-scale tabular data. See `Wormtable backend`_ for instructions on
+ serving VCF data from the GA4GH API.

*****************
Wormtable backend
@@ -159,38 +129,41 @@ building and indexing such large tables.
Tabix backend
*****************

- The tabix backend allows us to serve variants from an arbitrary VCF file.
- The VCF file must first be indexed with `tabix <http://samtools.sourceforge.net/tabix.shtml>`_.
- Many projects, including the `1000 genomes project
- <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/>`_, release files with tabix
- indicies already precomputed. This backend can serve such datasets without any
- preprocessing via the command:
+ The tabix backend allows us to serve variants from an arbitrary VCF file. The
133
+ VCF file must first be indexed with `tabix
134
+ <http://samtools.sourceforge.net/tabix.shtml>`_. Many projects, including the
135
+ `1000 genomes project
136
+ <http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/>`_, release files
137
+ with tabix indicies already precomputed. This backend can serve such datasets
138
+ without any preprocessing via the command::
168
139
169
- $ python ga4gh/scripts/server.py tabix DATADIR
140
+ $ ga4gh_server tabix DATADIR
170
141
171
- where DATADIR is a directory that contains folders of tabix-indexed VCF file(s). There cannot
172
- be more than one VCF file in any subdirectory that has data for the same reference contig.
142
+ where DATADIR is a directory that contains subdirectories of tabix-indexed VCF
143
+ file(s). There cannot be more than one VCF file in any subdirectory that has
144
+ data for the same reference contig.
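
The DATADIR layout described above can be checked before starting the server.
The following is a minimal sketch, not the server's own discovery code: the
function name ``findIndexedVcfs`` is hypothetical, and it assumes the usual
tabix naming convention in which ``sample.vcf.gz`` is accompanied by
``sample.vcf.gz.tbi``. It only lists the indexed VCF files found in each
subdirectory; it does not inspect which reference contigs a file covers, so a
subdirectory with several entries still needs a manual check against the
one-file-per-contig rule.

```python
import os


def findIndexedVcfs(dataDir):
    """
    Returns a dictionary mapping each subdirectory name in dataDir to the
    sorted list of VCF files in it that have a tabix index alongside.

    Assumption: compressed VCFs end in ".vcf.gz" and their tabix index is
    the same path with ".tbi" appended.
    """
    found = {}
    for name in sorted(os.listdir(dataDir)):
        subDir = os.path.join(dataDir, name)
        if not os.path.isdir(subDir):
            # Plain files at the top level are ignored in this sketch.
            continue
        vcfFiles = []
        for fileName in sorted(os.listdir(subDir)):
            path = os.path.join(subDir, fileName)
            if fileName.endswith(".vcf.gz") and os.path.exists(path + ".tbi"):
                vcfFiles.append(fileName)
        found[name] = vcfFiles
    return found
```

Calling ``findIndexedVcfs("DATADIR")`` gives one entry per subdirectory; an
empty list suggests that the VCF files there have not yet been indexed with
tabix.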

******
Layout
******

- The code for the project is held in the ``ga4gh`` package, which corresponds
- to the ``ga4gh`` directory in the project root. Within this package,
- the functionality is split between the ``client``, ``server`` and
- ``protocol`` modules. There is also a subpackage called ``scripts``
- which holds the code defining the command line interfaces for the
+ The code for the project is held in the ``ga4gh`` package, which corresponds to
+ the ``ga4gh`` directory in the project root. Within this package, the
+ functionality is split between the ``client``, ``server``, ``protocol`` and
+ ``cli`` modules. The ``cli`` module contains the definitions for the
``ga4gh_client`` and ``ga4gh_server`` programs.

- For development purposes, it is useful to be able to run the command
- line programs directly without installing them. To do this, make hard links
- to the files in the scripts directory to the project root and run them
- from there; e.g::
+ For development purposes, it is useful to be able to run the command line
+ programs directly without installing them. To do this, use the
+ ``server_dev.py`` and ``client_dev.py`` scripts. (These are just shims to
+ facilitate development, and are not intended to be distributed. The
+ distributed versions of the programs are packaged using the setuptools
+ ``entry_points`` keyword; see ``setup.py`` for details.) For example, to run
+ the server command, simply run::

-     $ ln ga4gh/scripts/server.py .
-     $ python server.py
-     usage: server.py [-h] [--port PORT] [--verbose] {help,simulate} ...
-     server.py: error: too few arguments
+     $ python server_dev.py
+     usage: server_dev.py [-h] [--port PORT] [--verbose] {help,wormtable,tabix} ...
+     server_dev.py: error: too few arguments
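
The usage message above implies a parser with two global options and three
subcommands. A minimal sketch of such a parser using ``argparse`` follows. It
is an illustration only, not the project's ``cli`` module: the function name
``buildServerParser``, the default port of 8000 and the ``dataDir`` positional
argument are all assumptions.

```python
import argparse


def buildServerParser(prog="server_dev.py"):
    """
    Builds a parser with the options and subcommands shown in the usage
    message. All names other than the option and subcommand strings are
    hypothetical.
    """
    parser = argparse.ArgumentParser(prog=prog)
    # Assumed default; the real default port is not stated in this document.
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--verbose", action="store_true")
    subparsers = parser.add_subparsers(dest="backend")
    # Make the subcommand mandatory, as the error in the example suggests.
    subparsers.required = True
    subparsers.add_parser("help")
    for backend in ("wormtable", "tabix"):
        backendParser = subparsers.add_parser(backend)
        # Assumed positional argument naming the data directory.
        backendParser.add_argument("dataDir")
    return parser


if __name__ == "__main__":
    args = buildServerParser().parse_args()
    print(args)
```

With this sketch, ``buildServerParser().parse_args(["tabix", "DATADIR"])``
yields a namespace whose ``backend`` is ``tabix`` and whose ``dataDir`` is
``DATADIR``, while running it with no arguments exits with a usage error.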

++++++++++++
Coding style
@@ -199,6 +172,13 @@ Coding style
The code follows the guidelines of `PEP 8
<http://legacy.python.org/dev/peps/pep-0008>`_ in most cases. The only notable
difference is the use of camel case over underscore delimited identifiers; this
- is done for consistency with the GA4GH API. The code was checked for compliance
+ is done for consistency with the GA4GH API. Code should be checked for compliance
using the `pep8 <https://pypi.python.org/pypi/pep8>`_ tool.

+
+ **********
+ Deployment
+ **********
+
+ *TODO* Give simple instructions for deploying the server on common platforms
+ like Apache and Nginx.