Resources

johnmay edited this page Apr 9, 2013 · 11 revisions

Metingear stores data sets locally to improve performance and to allow operations that would be either impossible or painfully slow with web services. Name referencing, for example, could require 100+ queries per compound. Each local data set has a loader interface which can be configured with one or more resources. Resources come in different flavours and the accepted data types are flexible:

  • Resource File: a single file, either on the local file system or on a remote ftp/http site. The file may be compressed with Zip or GZip.
  • Resource Directory: a local file system directory.
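
To make the two flavours concrete, here is a minimal sketch of how a resource location could be classified. The helper name `describe_resource` and its return format are hypothetical and are not part of Metingear; treating `https` as remote and detecting compression from the file extension are also assumptions.

```python
import os
from urllib.parse import urlparse


def describe_resource(location: str) -> dict:
    """Classify a resource location (hypothetical helper, not Metingear's API)."""
    scheme = urlparse(location).scheme.lower()
    # The page names ftp/http sites; https is included here as an assumption.
    remote = scheme in ("ftp", "http", "https")

    lower = location.lower()
    if lower.endswith(".zip"):
        compression = "zip"
    elif lower.endswith(".gz"):
        compression = "gzip"
    else:
        compression = None

    # Only local paths are treated as resource directories.
    is_directory = (not remote) and os.path.isdir(location)
    kind = "directory" if is_directory else "file"
    return {"remote": remote, "compression": compression, "kind": kind}
```

For example, `describe_resource("ftp://ftp.ebi.ac.uk/pub/databases/chebi/SDF/ChEBI_complete.sdf.gz")` reports a remote, gzip-compressed file.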

To load resources, select the menu item Edit > Preferences > Resources.

This will open the preferences dialog.

If this is the first time you are configuring Metingear, it is a good idea to check that the resource root exists, or to change it. The default root varies with the operating system:

  • OS X: ~/Library/Application Support/metingear/services
  • Windows: %APPDATA%/metingear/services (on XP, %APPDATA% would be C:\Documents and Settings\<user name>\Application Data)
  • Linux: ~/.appdata/metingear/services
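
If you want to check the default root from a script before opening Metingear, the list above can be sketched as follows. The function `default_resource_root` is a hypothetical helper written for this page; it only mirrors the three paths listed and is not how Metingear itself resolves the root.

```python
def default_resource_root(system: str, env: dict) -> str:
    """Return the default root listed above for a given OS name.

    `system` is expected to be a value such as platform.system() returns
    ("Darwin", "Windows", "Linux"); `env` is a mapping such as os.environ.
    """
    home = env.get("HOME", "~")
    if system == "Darwin":  # OS X
        return home + "/Library/Application Support/metingear/services"
    if system == "Windows":
        return env.get("APPDATA", "%APPDATA%") + "/metingear/services"
    # Everything else is treated as Linux (an assumption of this sketch).
    return home + "/.appdata/metingear/services"
```

A typical call would be `default_resource_root(platform.system(), os.environ)`, followed by `os.path.isdir(...)` on the result to see whether the root already exists.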

You can change the root to a custom location by editing the path or by browsing for the desired folder (recommended).

If you change the location of the root you will be prompted to restart Metingear. If you do not restart, resources you load afterwards (see below) may be saved to the wrong directory.

Loaders

Each local resource provides a loader that loads local or remote resources into an 'index'. Each loader permits several operations, which may or may not be available to you. An operation is available if its icon is bold.

  • Delete - removes the current index and its backup
  • Revert - reverts the current index to its previous state. This is useful if you update the index and a service no longer works as expected, which could happen if the load was interrupted or the provided file is not valid
  • Configure - change/add the locations of the required resources. Note: changes will not register until you close the configuration panel
  • Update - will use the resources provided in configuration to update the index
  • Cancel - interrupt the update - will automatically revert the index
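
The relationship between the index and its backup can be pictured with a toy model. This is only a sketch of the behaviour described in the list above: the class `IndexModel`, its method names and the rule used in `available()` are all hypothetical, and Metingear's real loaders may behave differently in detail.

```python
class IndexModel:
    """Toy model of one loader's index and its backup (hypothetical)."""

    def __init__(self):
        self.current = None  # the live index
        self.backup = None   # the previous index, kept so Revert can restore it

    def update(self, new_index):
        # Update: keep the old index as a backup, then install the new one.
        self.backup = self.current
        self.current = new_index

    def revert(self):
        # Revert: restore the previous state of the index.
        self.current = self.backup
        self.backup = None

    def cancel(self):
        # Cancel: interrupting an update automatically reverts the index.
        self.revert()

    def delete(self):
        # Delete: remove the current index and its backup.
        self.current = None
        self.backup = None

    def available(self):
        # Assumption: Delete needs an index and Revert needs a backup.
        operations = {"configure", "update"}
        if self.current is not None:
            operations.add("delete")
        if self.backup is not None:
            operations.add("revert")
        return operations
```

In this model, updating twice and then reverting leaves the first index in place with no backup, so Revert is no longer offered.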

An updating loader: Edit>Preferences>Resources

Configuring a resource

Edit>Preferences>Resources

ChEBI

Loading ChEBI data does not require any local files. ChEBI can provide Names, Chemical Structures, Chemical Data and Cross-references via four different loaders. The files required by the ChEBI loaders are quite small, and their locations can normally be left at the default values.

All ChEBI loaders require the ChEBI Compounds file. This file is used to normalise all ChEBI entities to their primary identifiers (those which are exposed on the public website). The compounds file is located at ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/compounds.tsv; each subsequent reference to compounds refers to this file.

  • ChEBI Names - requires the following flat files: compounds and ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/names.tsv
  • ChEBI Data - requires the following flat files: compounds and ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/chemical_data.tsv
  • ChEBI Structures - requires the following flat files: compounds and ftp://ftp.ebi.ac.uk/pub/databases/chebi/SDF/ChEBI_complete.sdf.gz
  • ChEBI Cross-references - requires the following flat files: compounds and ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/reference.tsv.zip
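
The normalisation to primary identifiers that the compounds file is used for can be sketched as below. This assumes the file is tab-delimited with a header row containing `ID` and `PARENT_ID` columns, and that `PARENT_ID` is `null` (or empty) for primary entries; these column names are an assumption about the file layout and the function `primary_ids` is hypothetical, not Metingear's loader code.

```python
import csv
import io


def primary_ids(tsv_text: str) -> dict:
    """Map every ChEBI ID in the text to its primary ID.

    Assumes 'ID' and 'PARENT_ID' header columns; an entry whose PARENT_ID
    is empty or 'null' is taken to be primary already.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    mapping = {}
    for row in reader:
        parent = row["PARENT_ID"]
        is_primary = parent in ("", "null")
        mapping[row["ID"]] = row["ID"] if is_primary else parent
    return mapping
```

With such a mapping, names or structures recorded against a secondary identifier can be filed under the primary identifier before being indexed.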

KEGG Compound

KEGG Compound data is distributed in a single flat file (kegg/ligand/compound), whilst the structures are located in the kegg/ligand/mol directory. This loader has only been tested on the last publicly available flat file (KEGG Compound v57).

  • KEGG Compound - requires the location of the compound file. The default location of this file will be <kegg data location>/ligand/compound.

  • KEGG Compound structure - requires the location of a local directory of KEGG compounds. By default this location will be <kegg data location>/ligand/mol/.
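
To check that a compound file looks as expected before pointing the loader at it, a small reader such as the following can help. It is a sketch that assumes the usual KEGG flat-file layout (a field keyword in the first 12 columns, continuation lines with a blank keyword, and `///` closing each record); the function names are hypothetical and this is not the parser Metingear uses.

```python
def parse_kegg_records(text: str) -> list:
    """Split KEGG flat-file text into records of {field: [values]}."""
    records, current, field = [], {}, None
    for line in text.splitlines():
        if line.startswith("///"):  # end of one record
            if current:
                records.append(current)
            current, field = {}, None
            continue
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:  # a new field starts; otherwise continue the last one
            field = keyword
        if field is not None and value:
            current.setdefault(field, []).append(value)
    return records


def entry_and_names(record: dict) -> tuple:
    """Return the compound accession and its names from one record."""
    accession = record["ENTRY"][0].split()[0]
    names = [name.rstrip(";") for name in record.get("NAME", [])]
    return accession, names
```

Reading the first few records this way is a quick sanity check that the path you configured really is the compound flat file and not, say, the mol directory.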

HMDB

  • HMDB XML - provides loading of names, chemical data and cross-references. This resource requires the directory of HMDB XML files (http://www.hmdb.ca/downloads), with one entry per file. Although it is possible to load the data directly from the remote location, it is recommended that you first download the files to your local filesystem and then set the path of the loader to point there.

  • HMDB Metabocards (legacy) - provides loading of names, chemical data and cross-references. The only required resource is the Metabocards file http://www.hmdb.ca/public/downloads/current/metabocards.zip. Although it is possible to load the data directly from the remote location, it is recommended that you first download the file to your local filesystem and then set the path of the loader to point there.

  • HMDB Structures - provides chemical structures for HMDB identifiers. Required file: http://www.hmdb.ca/public/downloads/current/mcard_sdf_all.txt.gz
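
The recommended "download first, then point the loader at the local copy" step can also be scripted. The helper below is a minimal sketch: `fetch_to_local` is a hypothetical name, the `opener` parameter exists only so the function can be exercised without network access, and the URLs on this page may have moved since it was written.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_to_local(url: str, target_dir, opener=urllib.request.urlopen) -> Path:
    """Download `url` into `target_dir` and return the local file path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last path segment of the URL.
    destination = target_dir / url.rsplit("/", 1)[-1]
    with opener(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    return destination
```

The returned path is what you would then enter in the loader's configuration panel.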

BioCyc

  • MetaCyc Compound - provides a service for searching and retrieving MetaCyc names, formulae and charges. This loader requires access to the MetaCyc PGDB flat files, which are freely accessible but require license registration (see BioCyc Flatfiles). The loader needs the compounds.dat file, which is located at ..../MetaCyc/16.1/data/compound.dat relative to your copy of the MetaCyc flat files.

  • MetaCyc Structure - provides a service for searching and retrieving MetaCyc compound structures. This loader requires access to the MetaCyc PGDB flat files, which are freely accessible but require license registration (see BioCyc Flatfiles). Similar to the KEGG Compound structure loader, it requires a folder of compound files. By default this folder is distributed as a tar.gz archive, which cannot be opened directly, so you must first manually unzip and untar it. Once complete, the loader can be pointed to a path relative to your MetaCyc distribution: .../MetaCyc/16.1/data/MetaCyc-MOLfiles/.
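
If you prefer to script the unzip-then-untar step, Python's standard `tarfile` module handles both in one pass. This is a sketch: `extract_molfiles` is a hypothetical helper, the archive name is whatever your MetaCyc distribution provides, and the assumption that structure files end in `.mol` is mine. Only extract archives you trust.

```python
import tarfile
from pathlib import Path


def extract_molfiles(archive, destination) -> list:
    """Unpack a .tar.gz archive and list the .mol files it contained."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    # "r:gz" decompresses the gzip layer and reads the tar layer together.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(destination)
    return sorted(destination.rglob("*.mol"))
```

After extraction, the directory that directly contains the listed files is the one to give to the MetaCyc Structure loader.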

LIPID MAPs Structure Database (LMSD) - lipidmaps.org

  • LIPID MAPs Name - provides a service for searching the compound names within the LMSD. As this resource is freely distributed, the index can be loaded remotely. This loader requires either a tsv file or a folder of tsv files, both of which may be zipped/gzipped. The single tsv option allows you to choose which section of the database to load (e.g. LMSD_20120412_FA.tsv for only Fatty Acids). The other option loads the names from the zipped directory http://www.lipidmaps.org/downloads/LMSD_20120412_tsv.zip and uses the LMSD_20120412_All.tsv table.

  • LIPID MAPs Structure - provides a service for searching and fetching compounds by chemical structure. This resource can be loaded remotely or locally. The default location is http://www.lipidmaps.org/downloads/LMSDFDownload23Apr12.zip.
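
To see which tables a downloaded LMSD tsv archive contains, and which one corresponds to the whole database or to a single section, something like the following can be used. It is a sketch that assumes table names end in `_<section>.tsv`, as in the `LMSD_20120412_All.tsv` and `LMSD_20120412_FA.tsv` examples above; `pick_table` is a hypothetical helper.

```python
import zipfile


def pick_table(zip_source, section: str = "All"):
    """Return the name of the table for `section` in the archive, or None.

    `zip_source` may be a path or a binary file-like object.
    """
    suffix = "_" + section + ".tsv"
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(suffix):
                return name
    return None
```

Calling `pick_table(path, "FA")` would locate the Fatty Acids table, which you could then extract and give to the loader as a single tsv file.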

UniProt

  • Taxonomy - provides organism name completion on creation of a new reconstruction via the curated species list (http://www.uniprot.org/docs/speclist). Attribution: The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 40: D71-D75 (2012).
  • Cross-reference - provides cross-references from a UniProt XML file. This resource is currently only required for sequence annotation transfer. You can choose whether to load the SwissProt XML ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz (currently ~800 MB compressed) or the TrEMBL XML ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.xml.gz (currently ~32 GB compressed). The loader can read these files compressed. To load this resource you should download the files separately and uncompress them. You can then configure the loader to point at the specified file. Currently the SwissProt XML takes about 10 minutes to load.
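
Because these XML files are large, any script that inspects them should stream rather than load the whole document. The sketch below reads a gzipped UniProt XML file entry by entry and yields the cross-references of each entry. It assumes the standard UniProt XML namespace and the `entry`, `accession` and `dbReference` elements; `iter_cross_references` is a hypothetical helper and is not how Metingear's loader is implemented.

```python
import gzip
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"  # UniProt XML namespace (assumed)


def iter_cross_references(path):
    """Yield (accession, database, identifier) for each entry-level dbReference."""
    with gzip.open(path, "rb") as handle:
        for _, element in ET.iterparse(handle, events=("end",)):
            if element.tag != NS + "entry":
                continue
            accession = element.findtext(NS + "accession")
            # Direct children only, so references nested elsewhere are skipped.
            for ref in element.findall(NS + "dbReference"):
                yield accession, ref.get("type"), ref.get("id")
            element.clear()  # release the finished entry to keep memory flat
```

Iterating over the first few results is a cheap way to confirm that a downloaded file is intact before starting a full load.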