Commit
Showing 6 changed files with 724 additions and 756 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,17 @@ | ||
__author__ = 'Yuri E. Corilo' | ||
__version__ = '2.0.0' | ||
__doc__ = ''' | ||
[![DOI](https://zenodo.org/badge/265072913.svg)](https://zenodo.org/badge/latestdoi/265072913) | ||
![CoreMS Logo](docs/CoreMS.COLOR_small.png) | ||
<div align="left"> | ||
<br> | ||
<br> | ||
<a href="https://doi.org/10.5281/zenodo.4641552"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.4641552.svg" alt="CoreMS DOI" style="width:150px;height:25px;"></a> | ||
<br> | ||
</div> | ||
*** | ||
# Table of Contents | ||
- Introduction | ||
- [CoreMS](#CoreMS) | ||
|
@@ -18,9 +27,9 @@ | |
- [Installation](#corems-installation) | ||
- [Thermo Raw File on Mac and Linux](#thermo-raw-file-access) | ||
- Execution: | ||
- [Jupyter Notebook and Docker containers](#molecular-database-and-jupyter-notebook-containers) | ||
- [Jupyter Notebook and Docker containers](#docker-stack) | ||
- [Simple Example](#simple-script-example) | ||
- [Python Examples](examples/examples) | ||
- [Python Examples](examples/scripts) | ||
- [Jupyter Notebook Examples](examples/notebooks) | ||
|
@@ -34,21 +43,21 @@ | |
Data handling and software development for modern mass spectrometry (MS) is an interdisciplinary endeavor requiring skills in computational science and a deep understanding of MS. To enable scientific software development to keep pace with fast improvements in MS technology, we have developed a Python software framework named CoreMS. The goal of the framework is to provide a fundamental, high-level basis for working with all mass spectrometry data types, allowing custom workflows for data signal processing, annotation, and curation. The data structures were designed with an intuitive, mass spectrometric hierarchical structure, thus allowing organized and easy access to the data and calculations. Moreover, CoreMS supports direct access for almost all vendors’ data formats, allowing for the centralization and automation of all data processing workflows from the raw signal to data annotation and curation. | ||
CoreMS aims to provide: | |
- reproducible pipeline | |
- logical mass spectrometric data structure | ||
- self-containing data and metadata storage | ||
- modern molecular formulae assignment algorithms | ||
- dynamic molecular search space database search and generator | ||
## Current Version | ||
### `1.5.1` | ||
`2.0` | ||
## Main Developers/Contact | ||
- [Yuri E. Corilo](mailto:[email protected]) | |
- [William Kew](mailto:[email protected]) | ||
## Data formats | ||
### Data input formats | ||
- Bruker Solarix (CompassXtract) | ||
|
@@ -61,10 +70,7 @@ | |
- CoreMS exported processed mass list files(excel, .csv, .txt, pandas dataframe as .pkl) | ||
- CoreMS self-containing Hierarchical Data Format (.hdf5) | ||
- Pandas Dataframe | ||
- Support for Could Storage using s3path.S3path | ||
see examples of usage here: | ||
- [S3 Support](tests/s3_test.py) | ||
- Support for cloud storage using s3path.S3path (see examples of usage here: [S3 Support](tests/s3_test.py)) | |
### Data output formats | ||
|
@@ -78,54 +84,32 @@ | |
- LC-MS | ||
- GC-MS | ||
- IMS-MS (`TODO`) | ||
- LC-IMS-MS (`TODO`) | ||
- Collections (`TODO`) | ||
- Transient | ||
- Mass Spectra | ||
- Mass Spectrum | ||
- Mass Spectral Peak | ||
- Molecular Formula | ||
- Molecular Structure (`TODO`) | ||
### In progress data structures | ||
- IMS-MS | ||
- LC-IMS-MS | ||
- Collections | ||
- Molecular Structure | ||
--- | ||
## Available features | ||
### FT-MS Signal Processing | ||
### FT-MS Signal Processing, Calibration, and Molecular Formula Search and Assignment | ||
- Apodization, Zerofilling, and Magnitude mode FT | ||
- Manual and automatic noise threshold calculation | ||
- Peak picking using apex quadratic fitting | ||
- Experimental resolving power calculation | ||
### GC-MS Signal Processing | ||
- Baseline detection, subtraction, smoothing | ||
- m/z based Chromatogram Peak Deconvolution, | ||
- Manual and automatic noise threshold calculation | ||
- First and second derivatives peak picking methods | ||
- Peak Area Calculation | ||
### GC-MS Calibration | ||
- Retention Index Calibration | ||
### GC-MS Compound Identification | ||
- Automatic local (SQLite) or external (MongoDB or PostgreSQL) database check, generation, and search | ||
- Automatic molecular match algorithm with all spectral similarity methods | ||
### FT-MS Calibration | ||
- Frequency and m/z domain calibration functions: | ||
- LedFord equation [ref] | ||
- LedFord equation | ||
- Linear equation | ||
- Quadratic equation | ||
- Automatic search for the most abundant **Ox** homologue series | |
- Step fit ("walking calibration") based on the LedFord equation [ref] | |
### FT-MS Molecular formulae search and assignment | ||
- Automatic local (SQLite) or external (PostgreSQL) database check, generation, and search | ||
- Automatic molecular formulae assignments algorithm for ESI(-) MS for natural organic matter analysis | ||
- Automatic fine isotopic structure calculation and search for all isotopes | ||
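The Ledford calibration listed above can be illustrated with a minimal sketch. It assumes the common two-term form m/z = A/f + B/f², where f is the measured ion cyclotron frequency. The function names and the coefficient values are hypothetical and chosen only for illustration; this is not the CoreMS API.

```python
import math


def ledford_mz(freq_hz: float, a_term: float, b_term: float) -> float:
    """Convert a cyclotron frequency to m/z using m/z = A/f + B/f**2."""
    return a_term / freq_hz + b_term / freq_hz ** 2


def ledford_frequency(mz: float, a_term: float, b_term: float) -> float:
    """Invert the Ledford equation.

    Rearranging gives mz*f**2 - A*f - B = 0; the positive root is returned.
    """
    discriminant = a_term ** 2 + 4.0 * mz * b_term
    return (a_term + math.sqrt(discriminant)) / (2.0 * mz)


# Illustrative (made-up) calibration coefficients, not from a real instrument.
A_TERM = 1.0e8
B_TERM = -1.0e5

example_mz = ledford_mz(2.0e5, A_TERM, B_TERM)
example_freq = ledford_frequency(example_mz, A_TERM, B_TERM)
```

In practice the A and B terms would be fitted against reference peaks of known m/z; the round trip above only shows that the two functions are mutually consistent.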
|
@@ -134,7 +118,18 @@ | |
- Kendrick classification | ||
- Heteroatoms classification and visualization | ||
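As a rough illustration of the Kendrick classification mentioned above, the sketch below rescales m/z so that CH2 weighs exactly 14 and then takes the mass defect. It assumes one common convention (defect = nominal Kendrick mass minus Kendrick mass, with the nominal mass obtained by rounding); other conventions flip the sign or use floor or ceiling. The function names are hypothetical and are not taken from CoreMS.

```python
CH2_EXACT_MASS = 14.01565  # monoisotopic mass of CH2, in Da
CH2_NOMINAL_MASS = 14.0


def kendrick_mass(mz: float) -> float:
    """Rescale an m/z value to the Kendrick (CH2 = 14.0000) mass scale."""
    return mz * CH2_NOMINAL_MASS / CH2_EXACT_MASS


def kendrick_mass_defect(mz: float) -> float:
    """Return nominal Kendrick mass minus Kendrick mass (one common convention)."""
    km = kendrick_mass(mz)
    return round(km) - km


# Two members of a CH2 homologous series share the same Kendrick mass defect.
kmd_a = kendrick_mass_defect(200.0)
kmd_b = kendrick_mass_defect(200.0 + CH2_EXACT_MASS)
```

Peaks with matching defects can then be grouped into candidate homologous series.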
### High Resolution Mass spectrum simulations | ||
### GC-MS Signal Processing, Calibration, and Compound Identification | ||
- Baseline detection, subtraction, smoothing | ||
- m/z based Chromatogram Peak Deconvolution | |
- Manual and automatic noise threshold calculation | ||
- First and second derivatives peak picking methods | ||
- Peak Area Calculation | ||
- Retention Index Calibration | ||
- Automatic local (SQLite) or external (MongoDB or PostgreSQL) database check, generation, and search | ||
- Automatic molecular match algorithm with all spectral similarity methods | ||
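To make the retention index calibration item above more concrete, here is a minimal sketch of a linear (van den Dool and Kratz style) retention index, assuming a reference series whose members have known retention times and integer index numbers. The function name and the reference values are hypothetical; the source does not state which formula or reference series CoreMS uses.

```python
from bisect import bisect_right
from typing import Sequence


def linear_retention_index(
    rt: float, ref_rts: Sequence[float], ref_numbers: Sequence[int]
) -> float:
    """Interpolate a retention index between the two bracketing references.

    ref_rts must be sorted in ascending order and aligned with ref_numbers.
    """
    if rt < ref_rts[0] or rt > ref_rts[-1]:
        raise ValueError("retention time lies outside the reference range")
    # Index of the reference eluting at or before rt, clamped so that a
    # neighbour on the right always exists.
    i = min(bisect_right(ref_rts, rt) - 1, len(ref_rts) - 2)
    t_lo, t_hi = ref_rts[i], ref_rts[i + 1]
    n_lo, n_hi = ref_numbers[i], ref_numbers[i + 1]
    return 100.0 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))


# Made-up reference retention times (minutes) and their index numbers.
REF_RTS = [5.0, 7.0, 10.0]
REF_NUMBERS = [10, 11, 12]
example_ri = linear_retention_index(6.0, REF_RTS, REF_NUMBERS)
```

Raising an error outside the calibrated range is a design choice here; extrapolating from the nearest pair would be another option.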
### High Resolution Mass Spectrum Simulations | ||
- Peak shape (Lorentz, Gaussian, Voigt, and pseudo-Voigt) | ||
- Peak fitting for peak shape definition | ||
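The peak shapes named above can be sketched with their textbook definitions. The functions below are unit-height profiles parameterized by the full width at half maximum (FWHM); the names and the `eta` mixing parameter are illustrative assumptions, not CoreMS identifiers.

```python
import math


def gaussian(x: float, center: float, fwhm: float) -> float:
    """Unit-height Gaussian profile with the given FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def lorentzian(x: float, center: float, fwhm: float) -> float:
    """Unit-height Lorentzian profile with the given FWHM."""
    half_width = fwhm / 2.0
    return half_width ** 2 / ((x - center) ** 2 + half_width ** 2)


def pseudo_voigt(x: float, center: float, fwhm: float, eta: float) -> float:
    """Linear mix of Lorentzian (weight eta) and Gaussian (weight 1 - eta)."""
    return eta * lorentzian(x, center, fwhm) + (1.0 - eta) * gaussian(x, center, fwhm)


# Sample a pseudo-Voigt peak centered at m/z 400 with an FWHM of 0.002.
profile = [pseudo_voigt(400.0 + 0.0005 * i, 400.0, 0.002, 0.5) for i in range(-4, 5)]
```

A true Voigt profile is a convolution of the two shapes; the pseudo-Voigt shown here is the usual cheaper approximation.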
|
@@ -143,7 +138,7 @@ | |
- Calculated ICR Resolving Power based on magnetic field (B), and transient time(T) | ||
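The dependence of ICR resolving power on B and T can be sketched with the widely quoted low-pressure-limit approximation m/Δm50% ≈ 1.274 × 10⁷ · z · B · T / (m/z), with B in tesla and T in seconds. The function name is hypothetical, and whether CoreMS uses exactly this expression is an assumption.

```python
def theoretical_resolving_power(
    mz: float, magnetic_field_t: float, transient_time_s: float, charge: int = 1
) -> float:
    """Approximate FT-ICR resolving power (m / delta-m at half height).

    Low-pressure-limit approximation: it scales linearly with the magnetic
    field and the transient duration, and inversely with m/z.
    """
    return 1.274e7 * charge * magnetic_field_t * transient_time_s / mz


# Example: m/z 400 on a 12 T magnet with a 1 s transient.
example_rp = theoretical_resolving_power(400.0, 12.0, 1.0)
```

Comparing this estimate with the experimentally measured resolving power is one way to judge how close an acquisition is to the theoretical limit.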
--- | ||
## CoreMS Installation | ||
## Installation | ||
```bash | ||
pip install corems | ||
|
@@ -157,16 +152,10 @@ | |
docker-compose up -d | ||
``` | ||
- Change the url_database on MSParameters.molecular_search.url_database to: | ||
- Change the url_database on MSParameters.molecular_search.url_database to: "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp" | ||
- Set the url_database env variable COREMS_DATABASE_URL to: "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp" | ||
"postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp" | ||
- Set the url_database env variable COREMS_DATABASE_URL to: | ||
"postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp" | ||
--- | ||
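The database settings described above can be applied from Python before running a search. This is a minimal sketch that only sets the `COREMS_DATABASE_URL` environment variable named in the text; the `urlsplit` check is an added sanity test, not something CoreMS requires.

```python
import os
from urllib.parse import urlsplit

# Connection string for the PostgreSQL container, as given in the text above.
DATABASE_URL = (
    "postgresql+psycopg2://coremsappdb:coremsapppnnl@localhost:5432/coremsapp"
)

# Option described in the text: expose the URL through the environment.
os.environ["COREMS_DATABASE_URL"] = DATABASE_URL

# Sanity check that the URL parses into the expected host, port, and database.
parsed = urlsplit(DATABASE_URL)
db_host, db_port, db_name = parsed.hostname, parsed.port, parsed.path.lstrip("/")

# The other option in the text is to assign the same string to
# MSParameters.molecular_search.url_database inside the framework.
```

Setting the variable this way affects only the current process and its children, so it should run before any database connection is opened.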
## Thermo Raw File Access: | ||
### Thermo Raw File Access: | ||
To open Thermo raw files, an installation of pythonnet is needed: | |
- Windows: | ||
|
@@ -181,11 +170,11 @@ | |
``` | ||
--- | ||
### Another option is to run the docker stack that will start the CoreMS containers: | ||
## Docker stack | ||
--- | ||
Another option to use CoreMS is to run the docker stack that will start the CoreMS containers | ||
## Molecular Database and Jupyter Notebook Containers | ||
### Molecular Database and Jupyter Notebook Docker Containers | ||
A docker container containing: | ||
- A custom Python distribution with all dependencies installed | |
|
@@ -238,7 +227,7 @@ | |
___ | ||
## Simple Script Example | ||
More examples can be found under the directory docs/example, docs/notebooks | ||
More examples can be found under the directories examples/scripts and examples/notebooks | |
- Basic functionality example | ||
|
@@ -250,75 +239,77 @@ | |
file_path= 'tests/tests_data/ftms/ESI_NEG_SRFA.d' | ||
#Bruker Solarix class reader | ||
# Instantiate the Bruker Solarix reader with the filepath | |
bruker_reader = ReadBrukerSolarix(file_path) | ||
#access the transient object | ||
# Use the reader to instantiate a transient object | |
bruker_transient_obj = bruker_reader.get_transient() | ||
#calculates the transient duration time | ||
# Calculate the transient duration time | ||
T = bruker_transient_obj.transient_time | ||
#access the mass spectrum object | ||
# Use the transient object to instantiate a mass spectrum object | |
mass_spectrum_obj = bruker_transient_obj.get_mass_spectrum(plot_result=False, auto_process=True) | ||
# - search monoisotopic molecular formulas for all mass spectral peaks | ||
# - calculate fine isotopic structure based on monoisotopic molecular formulas found and current dynamic range | ||
# - search molecular formulas of correspondent calculated isotopologues, | ||
# SearchMolecularFormulas, as called below, does the following: | |
# - searches monoisotopic molecular formulas for all mass spectral peaks | ||
# - calculates fine isotopic structure based on monoisotopic molecular formulas found and current dynamic range | ||
# - searches molecular formulas of correspondent calculated isotopologues | ||
# - settings are stored at SearchConfig.json and can be changed directly on the file or inside the framework class | ||
SearchMolecularFormulas(mass_spectrum_obj, first_hit=False).run_worker_mass_spectrum() | ||
# iterate over mass spectral peaks objs | ||
# Iterate over mass spectral peaks objs within the mass_spectrum_obj | ||
for mspeak in mass_spectrum_obj.sort_by_abundance(): | ||
# returns true if there is at least one molecular formula associated | ||
# with the mass spectral peak | ||
# same as mspeak.is_assigned -- > bool | ||
# If there is at least one molecular formula associated, mspeak evaluates to True | |
if mspeak: | ||
# get the molecular formula with the highest mass accuracy | ||
# Get the molecular formula with the highest mass accuracy | ||
molecular_formula = mspeak.molecular_formula_lowest_error | ||
# plot mz and peak height, use mass_spectrum_obj.mz_exp to access all mz | ||
# and mass_spectrum_obj.mz_exp_profile to access mz with all available datapoints | ||
# Plot mz and peak height | ||
pyplot.plot(mspeak.mz_exp, mspeak.abundance, 'o', c='g') | ||
# iterate over all molecular formulae associated with the ms peaks obj | ||
# Iterate over all molecular formulas associated with the ms peaks obj | ||
for molecular_formula in mspeak: | ||
#check if the molecular formula is an isotopologue | |
# Check if the molecular formula is an isotopologue | |
if molecular_formula.is_isotopologue: | ||
#access the molecular formula text representation | ||
# Access the molecular formula text representation and print | ||
print (molecular_formula.string) | ||
#get 13C atoms count | ||
# Get 13C atoms count | ||
print (molecular_formula['13C']) | ||
else: | ||
#get mz and peak height | ||
# Get mz and peak height | ||
print(mspeak.mz_exp,mspeak.abundance) | ||
#exporting data | ||
# Save data | ||
## to a csv file | ||
mass_spectrum_obj.to_csv("filename") | ||
mass_spectrum_obj.to_hdf("filename") | ||
# save pandas DataFrame to pickle | |
# to pandas DataFrame pickle | |
mass_spectrum_obj.to_pandas("filename") | ||
# get pandas Dataframe | ||
# Extract data as a pandas DataFrame | |
df = mass_spectrum_obj.to_dataframe() | ||
``` | ||
## UML Diagrams | ||
UML (unified modeling language) diagrams for Direct Infusion FT-MS and GC-MS classes can be found [here](docs/uml). | ||
## Citing CoreMS | ||
If you use CoreMS in your work, please use the following citation: | ||
Version [1.5.1 Release on GitHub](https://github.com/EMSL-Computing/CoreMS/releases/tag/1.5.1), archived on Zenodo: | ||
Version [2.0.0 Release on GitHub](https://github.com/EMSL-Computing/CoreMS/releases/tag/v2.0.0), archived on Zenodo: | ||
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4641553.svg)](https://doi.org/10.5281/zenodo.4641553) | ||
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4641552.svg)](https://doi.org/10.5281/zenodo.4641552) | ||
``` | ||
Yuri E. Corilo, William R. Kew, Lee Ann McCue. (2021, March 27). EMSL-Computing/CoreMS: CoreMS 1.5.1 (Version v1.5.1), as developed on Github. Zenodo. http://doi.org/10.5281/zenodo.4641553 | ||
Yuri E. Corilo, William R. Kew, Lee Ann McCue (2021, March 27). EMSL-Computing/CoreMS: CoreMS 2.0.0 (Version v2.0.0), as developed on Github. Zenodo. http://doi.org/10.5281/zenodo.4641552 | ||
``` | ||
## Disclaimer | ||
This material was prepared as an account of work sponsored by an agency of the | ||
United States Government. Neither the United States Government nor the United | ||
|