All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Updated the DOI of the MS2Query library. The new library fixes a bug that misrepresented the compound classes.
- Handle missing compound names. Previously, MS2Query would break when no compound name was available for an analog.
- Fix bug in creating sqlite files. Missing compound class annotations were stored incorrectly. The creation now works correctly, but the download link to the models still points to the sqlite files with mistakes. MS2Query can still be used, but the class annotations in the results file will be formatted incorrectly.
- Set max matchms version, since a breaking change was introduced (missing add_losses)
- Set max scipy version, since a breaking change was introduced (for Windows)
- MS2Query is now tested on Python 3.9 and 3.10 instead of 3.8 and 3.9
- MS2Query now uses MS2Deepscore 2.0. This is a breaking change: MS2Query no longer works with old models
- Updated model to use MS2Deepscore 2.0 and used newly available training data for all models.
- Made compatible with MS2Deepscore 0.5.0
- New models have to be downloaded, since this version is not compatible with the older models! Embeddings have to be stored as parquet.
- Embeddings are now stored as parquet instead of pickle
- Made MS2Query compatible with matchms 0.24.0
- environment.yml and CI_build test for building a conda env from this file
- Allow for using uppercase additional_metadata columns
- Removed pickled files from tests to pave the way for pandas 2.0 and new matchms
- Set version of matchmsextras to 0.4.0, to fix dependency issue
- Fix test with wrong sklearn version.
- Set default additional metadata from rtinseconds to retention_time
- Fix bug in downloading models from command line
- Made compatible with matchms >= 0.14
- Added separate workflow for integration tests
- Changed the branch used by the automatic Python publish workflow
- Option to return results table.
- Option to return dataframe with results.
Small bug fixes
- Allow for urllib errors when loading compound classes
- Update readme with zenodo files
- Added compound classes to sqlite file
- Compound classes are now automatically added in the library file creation pipeline.
- SMILES are added from the specific spectrum instead of from the inchikey.
- Compound classes cannot be added from csv file anymore. Download the newest version of the sqlite file to have compound classes again.
- Zenodo link is set to latest version instead of specific version.
- Unit tests underwent major reformatting
- Better use of global fixtures
- Remove pickled results table files
- Set MS2Deepscore <2.0.0
- Fix h5py dependency issue
- Downloading files is more modular.
- Loading only the models for training your own model is easier.
- The default settings for additional metadata are changed to match mgf files from feature based molecular networking
- Readme has been cleaned up
- Define the newest zenodo DOI in one location.
- Store random forest model in Onnx format
- Added --additional_metadata to command line usage
- Fix dependency issue on matchmsextras
- Make command line runnable
- Add explanation for running from command line to readme
- Check if the ionization mode of query spectra is the same as the library
- Added option to automatically split library on ionization mode.
- When the output file already exists, a new file is created with a name like output_file(1).csv
- Remove tensorflow warning
- Set tensorflow to <2.9 to prevent printing of progress bars for ms2deepscore
- Set scikit-learn to version 0.24.2 to prevent the risk of backwards compatibility issues.
- Finalize workflow for k-fold cross validation
- Add explanation for reproducing results to readme
- Set tensorflow version to <= 2.10.1
- Set tensorflow version to <= 2.4.1
- Set tensorflow version to <= 2.10.1
- Remove tensorflow warnings about feature names
- Remove rdkit dependency for running MS2Query
- Training models is now fully automatic (no need for notebooks)
- Functions for creating benchmarking results
- Functions for doing k_fold_cross_validation
- Functions for visualizing benchmarking results
- Method for creating new library files
- Cleaning spectra functions for running are now combined with cleaning spectra functions for training
- Do not store MS2Deepscores in results table, to prevent memory issues
- Changed calculation of tanimoto scores, for better memory efficiency
- Code structure changed, tanimoto scores are now calculated in create_sqlite_database, instead of library_files_creator.
- Option to use previously calculated tanimoto scores as input for creating the sqlite library
- Creating your own library files for ms2query is a lot easier (see readme)
- Downloading negative mode files is added
- Downloading from zenodo is more robust
- Use a smaller SQLite file. Tanimoto scores, peaks, and intensities are no longer stored, reducing the SQLite file size to 300 MB
- Generate spectrum id integers, instead of using spectrum id specified in the metadata of spectra.
- Updated notebooks for performance analysis
- Solved bug in downloading library
- Made handling different pandas versions more flexible
- Switch to random forest model
- Changed input features for random forest model #130
- retention_index and retention_time are cleaned by matchms in data filtering #127
- Switched to newer matchms (>=0.11.0) and spec2vec (>=0.6.0) versions #127
- Changed feature names for the random forest model
- Remove neural network model functionality
- Remove multiple chemical neighbourhood related scores
0.2.4 - 2021-04-11
- Solved bug in downloading Zenodo files
0.2.3 - 2021-02-11
- Added run_ms2query to make it more user friendly
- Refactored code to process spectrum per spectrum instead of list of spectra
0.2.2 - 2021-09-30
- Refactored code to use results table
- Made compatible with newer pandas and gensim versions
0.2.1 - 2021-06-14
- Changed release workflow, so pip package is also updated.
0.2.0 - 2021-06-14
- Move library parts to Sqlite #56
- Define spectrum processing functions #61
- Extend CI workflow and add Sonarcloud #62
- Average inchikey score and neighbourhood score #78
- Streamlit web app (will now be future development) #83
- Refactored library matching #65
- Split workflow into true matches and analog search #72
- Refactored library files creation #74
0.1.0 - 2021-01-01
- First ms2query prototype sketching the basic workflow and a streamlit web app.
- First test workflow and basic batches.
- Licence.