Import
This page describes the import actions available under the File menu.
- File
- Import...
- Peptide
- KGML
- Cross-references (navigates to Tools)
- Import Excel
- Import model-SEED
- Import SBML
- Import ENA Genome
- Import...
- Importing BioPax (via BioPax2SBML)
- If no reconstruction is open create a new reconstruction (see Creating Metabolic Entities)
- Select the menu option
File > Import SBML
- Choose the file you wish to import and click
open
- Configure any import options
- Configure any unidentified compartments
Options
- merge entries
Unidentified Compartments
Metingear will attempt to resolve the compartment names to one of several defined compartments. If a compartment cannot be identified or is ambiguous, a popup will interrupt the loading of the file and ask you to select a compartment for the unidentified name.
Below is an example where a model has a compartment named cell which is too general for the current definitions. A list of possible compartments is displayed with appropriate GO Terms and in this case cytoplasm is selected.
Another common case would be in and out. For these simple cases it suffices to annotate the compartments as cytoplasm and extracellular respectively.
Currently Metingear does not define a boundary compartment.
imported data
The following details how basic data is converted.
- sbml:name is loaded as a metabolite/reaction name
- sbml:id is loaded as a metabolite/reaction id
- sbml:metaid is loaded as a metabolite/reaction abbreviation
- annotations with MIRIAM URNs (e.g. urn:miriam:kegg.compound:C00009) and identifiers.org URLs (e.g. http://identifiers.org/kegg.compound/C00009/) are loaded as cross-references of the relevant species
- annotations with InChIs from rdf.openmolecules.net (e.g. http://rdf.openmolecules.net?InChI=1/CH4/h1H4) are loaded as InChI annotations
- comments are loaded as Comment annotations
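To make the two cross-reference URI shapes above concrete, here is a minimal Python sketch that splits a MIRIAM URN or an identifiers.org URL into a resource and an accession. The function name `parse_xref` is hypothetical and this is not Metingear's own code; it only illustrates the patterns in the examples above.

```python
from urllib.parse import urlparse


def parse_xref(uri):
    """Split a MIRIAM URN or identifiers.org URL into (resource, accession).

    Hypothetical helper for illustration; returns None for unrecognised input.
    """
    if uri.startswith("urn:miriam:"):
        # Shape: urn:miriam:<resource>:<accession>
        rest = uri[len("urn:miriam:"):]
        resource, sep, accession = rest.partition(":")
        return (resource, accession) if sep and accession else None
    parsed = urlparse(uri)
    if parsed.netloc == "identifiers.org":
        # Shape: http://identifiers.org/<resource>/<accession>/
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) == 2:
            return (parts[0], parts[1])
    return None


print(parse_xref("urn:miriam:kegg.compound:C00009"))
print(parse_xref("http://identifiers.org/kegg.compound/C00009/"))
```

Both calls yield the same pair, `('kegg.compound', 'C00009')`, which is why either annotation style can end up as the same cross-reference.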
Models/Networks in a worksheet are unstructured and thus additional knowledge of the data type and location is required. Metingear provides a dialog wizard to guide the user through the import process.
- If no reconstruction is open create a new reconstruction (see Creating Metabolic Entities)
- Select the menu option
File > Import Excel
- Choose the file you wish to import and click open. Currently only .xls and not .xlsx files are supported.
- Select which sheets list the reactions/metabolites - a guess is attempted but some workbooks may contain multiple reaction sheets (e.g. internal and exchange reactions).
- Proceed to the next step
- Select a range covering a continuous block of data (i.e. no blank lines or separator rows)
- Select which columns in the reaction sheet refer to the predefined types
- Proceed to the next step
- Select a range covering a continuous block of data (i.e. no blank lines or separator rows)
- Select which columns in the metabolite sheet refer to the predefined types
- Proceed to the next step
- Click finish to import the reconstruction
selection dialog
When the range is selected, only data between the given rows is imported. Rows outside the range are shown in grey in the preview at the bottom of the dialog and will not be loaded.
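The effect of the range selection can be pictured with a short Python sketch. It treats a sheet as a list of rows, keeps only the rows inside the chosen range, and rejects a range that contains a blank row. The names `is_blank` and `select_block` and the sample rows are made up for illustration; this is not how Metingear itself reads the worksheet.

```python
def is_blank(row):
    """A row is blank when every cell is None or whitespace-only."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def select_block(rows, first, last):
    """Return rows first..last (1-based, inclusive), as in a sheet range.

    Raises ValueError if the range contains a blank row, since the
    selected block is expected to be continuous.
    """
    block = rows[first - 1:last]
    for offset, row in enumerate(block):
        if is_blank(row):
            raise ValueError(f"blank row at sheet row {first + offset}")
    return block


# Made-up sheet: row 1 is a header and row 4 a separator, both outside the range.
sheet = [
    ["abbreviation", "name", "equation"],
    ["R1", "reaction one", "a + b = c + d"],
    ["R2", "reaction two", "c = e"],
    ["", "", ""],
]
print(select_block(sheet, 2, 3))
```

Selecting rows 2 to 3 returns the two reaction rows, while extending the range to row 4 raises an error because of the separator row.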
There are several other actions within the dialog. The following diagram depicts each action (red) and, where there is one, its subtle result (green).
imported data
As the data is unstructured, one reconstruction may contain more information than another. At the core, all that is required is a well-formatted reaction equation (see Creating Metabolic Entities) and unique metabolite identifiers. The following lists all importable data; required fields are marked [required].
reaction table (basic)
- equation describes the participants of a reaction, referring to each by metabolite abbreviation or name [required]
- name loaded as the name of the reaction [optional]
- abbreviation/id loaded as a reaction abbreviation [optional]
- locus, subsystem, references loaded as direct annotation [optional]
- classification is parsed and matched to either an Enzyme or Transport Classification number
metabolite table
- abbreviation/id loaded as the metabolite abbreviation [required]
- name loaded as the metabolite name [optional]
- charge loaded as the formal charge annotation on the metabolite [optional]
- molecular formula loaded as the molecular formula annotation on the metabolite
- compartment sometimes the compartment is specified on the metabolite table - providing this option will override any compartments identified within the reaction equation [optional]
- KEGG, ChEBI and PubChem - loaded as cross-references to the specified resource
reaction table (advanced)
- gibbs free energy/error loaded as a single annotation of the Gibbs Free Energy with an error range [optional]
- direction indicates the direction of a reaction, only required if the direction is not specified in the reaction equation (e.g. a + b = c + d has no direction). If no direction can be identified the direction is loaded as unknown [optional]
- flux bounds loads the lower/upper flux bounds of a reaction to appropriate annotations [optional]
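The direction rule can be sketched in a few lines of Python. Only the undirected `=` case comes from the example above; the arrow tokens `=>`, `<=` and `<=>`, the direction labels, and the function name `infer_direction` are assumptions made for illustration. The equation syntax Metingear actually accepts is described in Creating Metabolic Entities.

```python
def infer_direction(equation, direction_column=None):
    """Guess a reaction direction from its equation, else from a column.

    Hypothetical helper. Arrow tokens other than "=" are assumed.
    """
    # Check the longest token first so "<=>" is not mistaken for "<=" or "=>".
    for token, direction in (("<=>", "bidirectional"),
                             ("=>", "forward"),
                             ("<=", "backward")):
        if token in equation:
            return direction
    # A bare "=" carries no direction, so fall back to the direction column.
    if direction_column:
        return direction_column
    return "unknown"


print(infer_direction("a + b = c + d"))
print(infer_direction("a + b = c + d", direction_column="forward"))
```

With no arrow in the equation and no direction column, the sketch falls back to unknown, mirroring the behaviour described in the list.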
Please note that gene and protein tables are not yet imported (planned), but if locus information on a reaction is provided then these links can be resolved after import.
- If no reconstruction is open create a new reconstruction (see Creating Metabolic Entities)
- Select the menu option
File > Import > KGML
- Choose the file you wish to import and click
open
imported data
The KGML file only provides the compound and reaction identifiers. As such, the compound identifier is loaded as the id, name, abbreviation and as a cross-reference. The cross-reference can then be used to transfer information from the compound entry. The reaction participants are preserved but, as with the compounds, no human-readable names are loaded.
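As an illustration of why only identifiers are available, the following Python sketch reads a small KGML fragment with the standard library. It assumes the usual KGML layout of entry elements with type="compound" and reaction elements with substrate and product children. The fragment and its identifiers (C99998, C99999, R99999) are placeholders rather than real KEGG entries, and `load_kgml` is a hypothetical name, not Metingear's importer.

```python
import xml.etree.ElementTree as ET

# Placeholder fragment; the identifiers are made up for illustration.
KGML = """<pathway name="path:rn00000" org="rn" number="00000">
  <entry id="1" name="cpd:C99998" type="compound"/>
  <entry id="2" name="cpd:C99999" type="compound"/>
  <reaction id="3" name="rn:R99999" type="irreversible">
    <substrate id="1" name="cpd:C99998"/>
    <product id="2" name="cpd:C99999"/>
  </reaction>
</pathway>"""


def strip_prefix(name):
    """Drop a 'cpd:' or 'rn:' style prefix from a KGML name."""
    return name.split(":", 1)[-1]


def load_kgml(text):
    """Return (compounds, reactions) built only from KGML identifiers."""
    root = ET.fromstring(text)
    compounds = {}
    for entry in root.iter("entry"):
        if entry.get("type") != "compound":
            continue
        for name in entry.get("name", "").split():
            acc = strip_prefix(name)
            # The identifier is all we have, so it fills every field.
            compounds[acc] = {"id": acc, "name": acc, "abbreviation": acc,
                              "xref": ("kegg.compound", acc)}
    reactions = []
    for rxn in root.iter("reaction"):
        reactions.append({
            "ids": [strip_prefix(n) for n in rxn.get("name", "").split()],
            "substrates": [strip_prefix(s.get("name", "")) for s in rxn.findall("substrate")],
            "products": [strip_prefix(p.get("name", "")) for p in rxn.findall("product")],
        })
    return compounds, reactions


compounds, reactions = load_kgml(KGML)
print(compounds["C99998"])
print(reactions[0])
```

The cross-reference stored on each compound is what would later allow names and other details to be fetched from the corresponding compound entry.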
Reconstructions from the popular model-SEED can be imported without selecting where the data is. With an active project (see Creating Metabolic Entities) the menu item File > Import model-SEED (xls)
will open a file chooser. Selecting the desired file will import all available data on metabolites and reactions.
Important: only complete genes are fully supported. If a genome is loaded which has genes split across multiple contigs, an error may occur when it is loaded.
Genome data can be imported from ENA .xml
files into an active project. These genomes can be downloaded from http://www.ebi.ac.uk/genomes/ by selecting the download option.
Metingear will import the genome sequence across the marked genes as well as the protein sequence. Any recognised cross-references (e.g. UniProt, InterPro) are converted to cross-references on the relevant genes and proteins.
FASTA formatted sequences are imported as empty proteins into an active project. The FASTA identifier (everything before the first space) is imported as the protein id and the rest is loaded as the name.
>id a longer description of the protein
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFIL
where id is imported as the id and a longer description of the protein is loaded as the protein name. Identifiers with resource prefixes are supported. For example, gb|AAD44166.1
will be loaded as a GenBank identifier.
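The header rule above can be sketched in Python as follows. The prefix table contains only the gb prefix mentioned in the text; any other prefixes Metingear may recognise are not listed here. The names `RESOURCE_PREFIXES` and `parse_fasta_header` are hypothetical and this is not Metingear's parser.

```python
# Assumed prefix table: only "gb" is named in the text above.
RESOURCE_PREFIXES = {"gb": "GenBank"}


def parse_fasta_header(line):
    """Return (protein_id, name, xref) for one FASTA header line.

    xref is (resource, accession) when the id carries a known prefix,
    otherwise None.
    """
    header = line.lstrip(">").strip()
    # Everything before the first space is the id; the rest is the name.
    protein_id, _, name = header.partition(" ")
    xref = None
    prefix, sep, rest = protein_id.partition("|")
    if sep and prefix in RESOURCE_PREFIXES:
        accession = rest.split("|")[0]
        xref = (RESOURCE_PREFIXES[prefix], accession)
    return protein_id, name.strip(), xref


print(parse_fasta_header(">id a longer description of the protein"))
print(parse_fasta_header(">gb|AAD44166.1 an example description"))
```

The first call has no prefix and so carries no cross-reference; the second keeps the full identifier as the protein id and additionally yields a GenBank cross-reference for AAD44166.1.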
Metingear does not currently support BioPax import but using the online tool BioPax2SBML we can convert the owl
file into something readable. Unfortunately the transition isn't seamless and there are a few hacks which need to be made.
This section demonstrates how to import from BioPax using a fragment from the Rhea annotated reaction database (RHEA fragment download). With the file downloaded, we unzip the owl
file and upload it to BioPax2SBML. Once the file has been uploaded, select the convert tool.
When the conversion has finished, download the file and uncompress it.
We now have to modify the XML slightly. SBML Level 3 does not support annotations (they need to be loaded by a module); unfortunately that module isn't complete yet, so we need to manually downgrade the SBML level. You can still import Level 3 XML but there will be no annotations. With the SBML file open in a text editor, change the <sbml
tag.
Head of XML
<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<!-- Created by BioPAX2SBML version 1.0 on 2013-04-11 17:39 with JSBML version 1.0-rc1. -->
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" qual:required="false" level="3" xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1" version="1">
<model id="SId_83805498" name="/rahome/webservices/galaxy/database/files/003/dataset_3078_files/10000-11219-rhea-biopax_lite.owl" metaid="meta_SId_83805498" timeUnits="time" substanceUnits="substance" volumeUnits="volume">
...
original tag
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" qual:required="false" level="3" xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1" version="1">
changed tag
<sbml xmlns="http://www.sbml.org/sbml/level2" qual:required="false" level="2" version="4">
the head of the file should now look like this
<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<!-- Created by BioPAX2SBML version 1.0 on 2013-04-11 17:39 with JSBML version 1.0-rc1. -->
<sbml xmlns="http://www.sbml.org/sbml/level2" qual:required="false" level="2" version="4">
<model id="SId_83805498" name="/rahome/webservices/galaxy/database/files/003/dataset_3078_files/10000-11219-rhea-biopax_lite.owl" metaid="meta_SId_83805498" timeUnits="time" substanceUnits="substance" volumeUnits="volume">
...
You can now import the file using Import SBML.
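If you have several converted files, the manual edit above can be scripted. The following is a minimal Python sketch that swaps the opening sbml tag for the Level 2 tag shown above; it performs exactly the same text substitution and nothing more. The function name `downgrade_sbml_tag` is hypothetical, and reading and writing the file is left to the caller.

```python
import re

# The replacement tag, copied from the "changed tag" shown above.
LEVEL2_TAG = ('<sbml xmlns="http://www.sbml.org/sbml/level2" '
              'qual:required="false" level="2" version="4">')


def downgrade_sbml_tag(xml_text):
    """Replace the first opening <sbml ...> tag with the Level 2 tag."""
    # Match "<sbml" followed by whitespace and attributes up to the first ">".
    updated, count = re.subn(r"<sbml\s[^>]*>", lambda m: LEVEL2_TAG,
                             xml_text, count=1)
    if count == 0:
        raise ValueError("no <sbml ...> opening tag found")
    return updated


head = """<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" qual:required="false" level="3" xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1" version="1">
<model id="SId_83805498">
"""
print(downgrade_sbml_tag(head))
```

To apply it to a file, read the text, pass it through the function and write the result to a new file, keeping the original in case the import needs to be retried.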