Mapping Partial Structural Representations from Lipid Shorthand Nomenclature with Goslin #12
nilshoffmann
announced in
Hackathon proposals
Title
Mapping Partial Structural Representations from Lipid Shorthand Nomenclature with Goslin
Abstract
Goslin is the first grammar-based computational library for recognizing, parsing, and normalizing lipid names that follow the hierarchical lipid shorthand nomenclature. The new version, Goslin 2.0, implements the latest nomenclature and adds a grammar for systematic IUPAC-IUB fatty acyl names as stored, e.g., in the LIPID MAPS database, which makes it well suited for updating lipid names in the LIPID MAPS or HMDB databases to the latest nomenclature. Goslin 2.0 is available as a standalone web application with a REST API, implemented in Java, and as libraries for C++, C#, Java, Python 3, and R. It can be easily included in lipidomics tools and scripts, providing direct access to translation functions to support the following general tasks:
In this hackathon project, we want to extend Goslin's generation functions to output SMILES / SMARTS representations of a given lipid shorthand name, similar to the pre-computed SMILES available from SwissLipids. We want to explore whether other SMILES / RxSMILES flavours can represent more of the structural detail that is available from the shorthand name. Another goal is to use this new functionality to extend the Goslin Webapp with cross-links from lipid names to public studies in repositories such as MetaboLights, and to make the newly added (partial) structural information available to downstream cheminformatics / bioinformatics tools.
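To make the idea of deriving SMILES from a shorthand name concrete, here is a minimal sketch for the simplest case, a straight-chain fatty acid such as `FA 18:1(9Z)`. It does not use Goslin or any of its APIs: the regular expression, `parse_fatty_acid`, and `fatty_acid_smiles` are hypothetical helpers written only for this illustration. It also drops the E/Z flags and returns `None` when double bond positions are not given, which is exactly the kind of partial information that plain SMILES cannot express.

```python
import re

# Hypothetical pattern for names like "FA 16:0" or "FA 18:2(9Z,12Z)".
# This is an illustrative stand-in, not Goslin's grammar.
FA_PATTERN = re.compile(r"FA (\d+):(\d+)(?:\(([^)]*)\))?")


def parse_fatty_acid(name):
    """Return (carbons, double_bond_count, positions) for a fatty acid name."""
    match = FA_PATTERN.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a supported fatty acid name: {name!r}")
    carbons = int(match.group(1))
    n_double = int(match.group(2))
    positions = []
    if match.group(3):
        for token in match.group(3).split(","):
            digits = re.match(r"\d+", token.strip())
            if digits is None:
                raise ValueError(f"cannot read double bond position: {token!r}")
            positions.append(int(digits.group()))
    return carbons, n_double, positions


def fatty_acid_smiles(name):
    """Build a SMILES string without stereochemistry, or None if positions are unknown."""
    carbons, n_double, positions = parse_fatty_acid(name)
    if len(positions) != n_double:
        # Species-level name such as "FA 18:1": the double bond location is
        # not defined, so no single SMILES string describes it.
        return None
    if any(p < 2 or p >= carbons for p in positions):
        raise ValueError("double bond position outside the acyl chain")
    parts = ["OC(=O)"]  # carboxyl group, carbon 1
    for carbon in range(2, carbons + 1):
        # A double bond at position p links carbon p and carbon p + 1.
        bond = "=" if (carbon - 1) in positions else ""
        parts.append(bond + "C")
    return "".join(parts)


if __name__ == "__main__":
    for example in ("FA 16:0", "FA 18:1(9Z)", "FA 18:1"):
        print(example, "->", fatty_acid_smiles(example))
```

The `None` result for `FA 18:1` is the interesting case for this project: a pattern language such as SMARTS, or an extended SMILES flavour, would be needed to say "one double bond somewhere in the chain".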
Project Plan
We will organise our hackathon project around the following main tasks:
We will use the de.NBI Cloud to perform repository-scale analysis of MetaboLights and (if possible) Metabolomics Workbench studies to generate shorthand names, and will update the current Goslin Webapp UI to provide a more concise entry point for finding occurrences of individual lipids or running bulk queries.
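The lookup described above can be pictured as an inverted index from normalized lipid names to the studies that report them. The sketch below is only an assumption about how such an index might be organised: `normalize_name` is a toy stand-in for Goslin's normalization (it handles just the parenthesised legacy style), and the study identifiers and lipid lists are made-up placeholders, not data from MetaboLights or Metabolomics Workbench.

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Toy stand-in for name normalization: 'PC(16:0/18:1)' -> 'PC 16:0/18:1'."""
    name = name.strip()
    match = re.fullmatch(r"([A-Za-z][\w-]*)\((.+)\)", name)
    if match:
        return f"{match.group(1)} {match.group(2)}"
    # Otherwise only collapse repeated whitespace.
    return re.sub(r"\s+", " ", name)


def build_index(studies):
    """Map each normalized lipid name to the set of study ids that report it."""
    index = defaultdict(set)
    for study_id, names in studies.items():
        for name in names:
            index[normalize_name(name)].add(study_id)
    return index


def bulk_query(index, names):
    """Look up several names at once; unknown names map to an empty list."""
    return {name: sorted(index.get(normalize_name(name), ())) for name in names}


# Placeholder studies, invented for this example.
studies = {
    "STUDY_A": ["PC(16:0/18:1)", "Cer 18:1;O2/16:0"],
    "STUDY_B": ["PC 16:0/18:1", "TG 52:2"],
}

index = build_index(studies)
hits = bulk_query(index, ["PC 16:0/18:1", "TG 52:2", "PE 34:1"])

if __name__ == "__main__":
    for name, study_ids in hits.items():
        print(name, "->", study_ids)
```

Because both spellings of the phosphatidylcholine collapse to the same key, a single query finds it in both placeholder studies. This is the benefit of normalizing names before indexing.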
After the hackathon, we will continue the integration and evaluation activities, release updated versions of the different Goslin implementations, update the documentation, and start writing a manuscript that reports and summarises the updates.
Technical Details
Programming languages: Python, C++, R, Java
Will build on existing software: Goslin (different language implementations), RefMet, SwissLipids
These databases will be used: LIPID MAPS, SwissLipids, HMDB, ChEBI, ...
These datasets will be used: MetaboLights, Metabolomics Workbench (as use cases to test and optimise parsing / translation)
Contact information
Nils Hoffmann
Institute of Bio- and Geosciences, ELIXIR Germany, Forschungszentrum Jülich, Outstation Bielefeld
Dominik Kopczynski
Institute of Analytical Chemistry, University of Vienna, Vienna