Tutorial 2: Handling Genome Data
This tutorial guides you through importing genome data and linking it to metabolic and reaction information. We will be working on Bacillus subtilis (strain 168).
Before you begin the tutorial there are a couple of files to download and some optional configuration to complete. Some of these files are large and may take some time to download and configure. We will be integrating data on Bacillus subtilis and require several data files for the basic part of the tutorial.
- AL009126.xml: Genome and Protein - available from European Nucleotide Archive (ENA)
- gb-2009-10-6-r69-s1.xls: iBsu1103 reconstruction - available from Henry et al. 2009
Optional (required for linking reactions)
- uniprot_sprot.xml.gz (800MB): UniProt SwissProt XML used to index cross-references. Download the compressed file to your computer, then from the Resources menu configure the UniProt Cross-references loader to use this file. On an uncompressed file the index creation should take less than ten minutes, but this will depend on your machine.
The following outlines which values are imported from ENA and the XML attribute used for each. If possible, each gene product will be associated with its encoding gene.
- Gene
- id - automatically generated
- abbreviation - blank
- name - locus_tag
- start, end - loaded and points to the chromosome sequence (also loaded) to give the sequence of the gene
- Gene Product (tRNA, rRNA and proteins)
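To make the gene mapping above concrete, here is a minimal sketch of how those values could be read from an ENA-style XML entry. It is not Metingear's importer. The element layout (feature elements with a name and location attribute, and qualifier children holding a value) is an assumption about the file, the sample entry and its locus tags are invented placeholders, and only simple start..end and complement(start..end) locations are handled.

```python
import re
import xml.etree.ElementTree as ET

# Invented placeholder entry; the layout is an assumption about the ENA XML
# and the locus tags and coordinates are not real data.
SAMPLE = """<entry accession="EXAMPLE">
  <feature name="gene" location="10..40">
    <qualifier name="locus_tag"><value>LOCUS_0001</value></qualifier>
  </feature>
  <feature name="CDS" location="10..40">
    <qualifier name="locus_tag"><value>LOCUS_0001</value></qualifier>
  </feature>
  <feature name="gene" location="complement(50..90)">
    <qualifier name="locus_tag"><value>LOCUS_0002</value></qualifier>
  </feature>
</entry>"""

# Matches "start..end" with an optional "complement(...)" wrapper.
LOCATION = re.compile(r"(?P<strand>complement\()?(?P<start>\d+)\.\.(?P<end>\d+)\)?")


def qualifier(feature, name):
    """Return the text of the named qualifier, or None if it is absent."""
    for item in feature.findall("qualifier"):
        if item.get("name") == name:
            return (item.findtext("value") or "").strip()
    return None


def read_genes(xml_text):
    """Collect one record per gene feature, mirroring the fields listed above."""
    genes = []
    for feature in ET.fromstring(xml_text).iter("feature"):
        if feature.get("name") != "gene":
            continue
        match = LOCATION.fullmatch(feature.get("location", ""))
        if match is None:
            continue  # joins and partial locations are out of scope here
        genes.append({
            "id": f"gene-{len(genes) + 1}",   # automatically generated
            "abbreviation": "",               # left blank
            "name": qualifier(feature, "locus_tag"),
            "start": int(match.group("start")),
            "end": int(match.group("end")),
            "complement": match.group("strand") is not None,
        })
    return genes


genes = read_genes(SAMPLE)
for gene in genes:
    print(gene["id"], gene["name"], gene["start"], gene["end"])
```

The start and end values index into the chromosome sequence, so the gene sequence would be a slice of that sequence (reverse-complemented for complement locations).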
The Genomes Page lists completed genomes and links to their sequence pages. To import a genome from the Genomes Page - Bacteria, locate the sequence/html link in the table and navigate to that page.
On the page you can download the XML from this link.
With Metingear open, select the menu item File > New Reconstruction. In the organism name field start typing bacillus subtilis and click Bacillus subtilis (strain 168) from the drop-down menu.
You may change the reconstruction identifier or leave it as it is (see Creating a Reconstruction). Click create to add a new reconstruction to Metingear.
The reconstruction should appear in the side bar.
With the newly created reconstruction active, select the menu item File > Import ENA Genome.
The file will begin importing.
When the import is finished there will be 4457 genes and 4371 gene products loaded. There will also be several warnings relating to the aforementioned limitations, which can be closed by clicking the cross to the left of the error message.
If everything was successful, it is a good idea to save the current state of the reconstruction before we add the other data. Select File > Save As or File > Save (home directory by default) to save the reconstruction.
With the active reconstruction, select the menu item File > Import Excel.
Choose the location where you downloaded gb-2009-10-6-r69-s1.xls. With the location selected, a wizard dialog will prompt you for the sheets which contain the reactions and metabolites. If the reactions are spread across multiple sheets (for example, a separate table for exchange reactions) you can import these later by rerunning the wizard on a different sheet. Ensure the reactions and the metabolites selections are Table S2 and Table S1 respectively and press next.
We now need to configure the metabolites table to indicate the location of each required column. Configure the dialog settings:
- Data starts: 1
- Data end: 1140
- Identifier/Abbreviation: A
- Name: B
- Charge: G
- Molecular Formula: C
- KEGG cross-reference: D
You can read more about the configuration here.
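As a rough illustration of what this configuration expresses, the sketch below maps column letters to fields and pulls records out of a row range. It is not how Metingear reads the sheet. The function names are hypothetical, the two sample rows are invented rather than taken from Table S1, and treating the start and end rows as 1-based and inclusive is an assumption.

```python
def column_index(letter):
    """Convert a spreadsheet column letter ('A', 'B', ..., 'AA') to a 0-based index."""
    index = 0
    for char in letter.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


# Field-to-column mapping as configured in the dialog above.
METABOLITE_COLUMNS = {
    "identifier": "A",
    "name": "B",
    "formula": "C",
    "kegg": "D",
    "charge": "G",
}


def read_rows(rows, columns, start, end):
    """Extract the configured fields from rows start..end (1-based, inclusive)."""
    records = []
    for row in rows[start - 1:end]:
        records.append({field: row[column_index(letter)]
                        for field, letter in columns.items()})
    return records


# Invented rows standing in for the sheet; columns E and F are unused here.
rows = [
    ["m_0001", "water", "H2O", "C00001", "", "", 0],
    ["m_0002", "proton", "H", "C00080", "", "", 1],
]

records = read_rows(rows, METABOLITE_COLUMNS, start=1, end=2)
print(records[0])
```

Keeping the mapping in one dictionary means a sheet with a different layout only needs a different set of letters, which is the same idea as rerunning the wizard with other settings.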
When you have configured the dialog, go to the next page to configure the reaction import. The reaction fields should be set to:
- Start row: 2
- End row: 1437
- Identifier/Abbreviation: A
- Name: B
- Reaction Equation: D - we could also choose C, which would then use metabolite names to identify metabolites (also select the name as Identifier in the metabolite sheet), but these are more ambiguous and in this case will not import properly.
- Classification: E
- Subsystem/Reaction Type: G
With the metabolites and reactions configured we click next (twice), and then okay to begin the import. If you have ChEBI Names loaded as a local index (see Resources) then the metabolites will automatically be referenced to ChEBI (in addition to the existing KEGG annotations).
When the import is done you should have 1137 metabolites and 1436 reactions. In this particular case there will be some warnings when the import is complete that it could not find information about the metabolite cpd00498; this is due to an error in the input.
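A warning like the one for cpd00498 arises when a reaction equation refers to an identifier that has no row in the metabolite table. The sketch below shows one way such a check could be done outside the tool. The equation syntax, the reaction identifier and the set of known metabolites are invented for illustration, and the cpd plus five digits pattern is an assumption based on the identifier quoted above.

```python
import re

# Assumed identifier pattern, based on the "cpd00498" identifier in the warning.
COMPOUND_ID = re.compile(r"cpd\d{5}")


def unknown_compounds(equations, known_ids):
    """Map each compound id missing from known_ids to the reactions that use it."""
    missing = {}
    for reaction_id, equation in equations.items():
        # A set avoids reporting the same reaction twice for one compound.
        for compound in sorted(set(COMPOUND_ID.findall(equation))):
            if compound not in known_ids:
                missing.setdefault(compound, []).append(reaction_id)
    return missing


# Invented example data; the equation format is hypothetical.
known = {"cpd00001", "cpd00002"}
equations = {"rxn_example_1": "cpd00001 + cpd00002 <=> cpd00498"}

missing = unknown_compounds(equations, known)
print(missing)
```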
We now have data imported for genes, gene products, reactions and metabolites. However, although we can navigate between gene products and their encoding genes, and between metabolites and the reactions they participate in, the reactions are not linked to any gene products. There are several ways we can achieve this. One easy way is to link by the enzyme nomenclature EC number, but we can also link by Locus: if the reactions have locus tags we can link to locus tags in gene products or to the gene product id (if it is a locus id). Before we can link the reactions to gene products we first need to have such annotations present.
We can add EC numbers to gene products:
- manually - Edit > Add Annotation and select cross-reference
- sequence homology - Tools > Sequence Homology and Tools > Transfer Annotations (see Tutorial: Handling Protein Data)
- expanding cross-references - if we have some annotations on our gene products already we can expand out these references (i.e. transfer annotations)
You may have noticed we have UniProt annotations available on many gene products. We can expand out these annotations using the tool Tools > Annotation > Expand UniProt Annotations (see also Tools/Expand UniProt Annotations).
Select the gene product view in Metingear.
In this view press ctrl-A (⌘-A on OS X) to select all products.
With the selection active, choose Tools > Annotation > Expand UniProt Annotations. There are no configuration options. If the menu item is not available make sure you have the SwissProt cross-references loaded (see Resources).
Sorting by enzyme classification, we can see that around 600 products now have an EC number.
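Conceptually, expanding the annotations means looking up each UniProt accession in an index built from the SwissProt XML and copying across the cross-references found there. The following is a minimal sketch of that idea for EC numbers only, not the tool's implementation. It assumes EC numbers appear as dbReference elements with type "EC" in the UniProt XML namespace, and the accessions and values in the sample are placeholders.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"

# Placeholder document; accessions and cross-references are not real entries.
SAMPLE = b"""<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <dbReference type="EC" id="1.1.1.1"/>
    <dbReference type="GO" id="GO:0000000"/>
  </entry>
  <entry>
    <accession>X00002</accession>
  </entry>
</uniprot>"""


def index_ec_numbers(stream):
    """Stream through the XML and map each accession to its EC numbers."""
    index = {}
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag != NS + "entry":
            continue
        ec_numbers = sorted({ref.get("id")
                             for ref in element.iter(NS + "dbReference")
                             if ref.get("type") == "EC"})
        for accession in element.findall(NS + "accession"):
            index[accession.text] = ec_numbers
        element.clear()  # release the entry's children; matters for large files
    return index


def expand(products, index):
    """Add EC numbers to each product based on its UniProt accessions."""
    for product in products:
        found = set(product.get("ec", []))
        for accession in product["uniprot"]:
            found.update(index.get(accession, []))
        product["ec"] = sorted(found)
    return products


index = index_ec_numbers(io.BytesIO(SAMPLE))
products = expand([
    {"id": "p1", "uniprot": ["X00001"]},
    {"id": "p2", "uniprot": ["X00002", "X99999"]},
], index)
print(products)
```

For the real download, the same function could be given a stream from `gzip.open(path, "rb")` instead of the in-memory sample.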
With the EC numbers annotated we can now link these to reactions. Select the menu item Tools > Associate > Reactions to Gene Products (see also Tools/Associate, Reactions to Gene Products).
In the dialog, select E.C. for both reactions and gene products and press okay.
The products with EC numbers will now be updated and associated with reactions.
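The association can be pictured as a join on the shared EC number. The sketch below is an illustration under that assumption, with hypothetical identifiers and a hypothetical function name; it does not describe how Metingear stores or matches the annotations internally.

```python
from collections import defaultdict


def associate_by_ec(reactions, products):
    """Link each reaction to the gene products that share one of its EC numbers.

    Both arguments map an identifier to a list of EC numbers.
    """
    by_ec = defaultdict(list)
    for product_id, ec_numbers in products.items():
        for ec in ec_numbers:
            by_ec[ec].append(product_id)

    links = {}
    for reaction_id, ec_numbers in reactions.items():
        matched = []
        for ec in ec_numbers:
            for product_id in by_ec.get(ec, []):
                if product_id not in matched:  # keep each product once
                    matched.append(product_id)
        links[reaction_id] = matched
    return links


# Hypothetical identifiers for illustration only.
reactions = {"rxn_a": ["1.1.1.1"], "rxn_b": ["2.7.1.1", "1.1.1.1"], "rxn_c": []}
products = {"prod_1": ["1.1.1.1"], "prod_2": ["2.7.1.1"], "prod_3": []}

links = associate_by_ec(reactions, products)
print(links)
```

Reactions without an EC number, or whose EC number matches no product, end up with an empty list, which corresponds to reactions that remain unlinked after this step.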
Currently not available, see limitations.