Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[2.0.0] - date...

Large-scale expansion, revision, and restructuring of MS2DeepScore.

Added

  • Models are now built using PyTorch.
  • Models have built-in GPU support (using PyTorch).
  • New EmbeddingEvaluatorModel (Inception Time CNN)
  • New LinearModel for absolute error estimates
  • New MS2DeepScoreEvaluated matchms-style score, which returns "score" and "predicted_absolute_error"
  • Additional smart binning layer that can handle input of much higher peak resolution (not used as a default!)
  • New validation concept: all-vs-all scores are computed for the validation spectra, but the loss is then computed per score bin. This gives better and more significant statistics on model performance.
  • New loss functions "Risk Aware MAE" and "Risk Aware MSE", which function similarly to MAE or MSE but try to counteract the tendency of a model to predict towards 0.5.
  • Losses can now be weighted with a weighting_factor.
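
The per-bin validation and the risk-aware loss described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the function names (`risk_aware_mae`, `mae`, `loss_per_score_bin`), the choice of ten equal-width bins, and the pinball-style formulation of the risk-aware loss are all hypothetical choices made for this example.

```python
from statistics import mean


def mae(predictions, targets):
    """Plain mean absolute error, used here as the reference loss."""
    return mean(abs(t - p) for p, t in zip(predictions, targets))


def risk_aware_mae(predictions, targets):
    """Assumed pinball-style variant of MAE (hypothetical formulation).

    Under-prediction is weighted by the target and over-prediction by
    (1 - target), so pulling a high target down towards 0.5 costs more
    than overshooting it (and the reverse for low targets).
    """
    losses = []
    for pred, target in zip(predictions, targets):
        error = target - pred
        losses.append(max(target * error, (target - 1.0) * error))
    return mean(losses)


def loss_per_score_bin(predictions, targets, n_bins=10, loss_fn=mae):
    """Group pairs by the bin of their target score and compute one loss per bin.

    Assumes target scores lie in [0, 1]; a target of exactly 1.0 is
    placed in the last bin.
    """
    bins = {}
    for pred, target in zip(predictions, targets):
        index = min(int(target * n_bins), n_bins - 1)
        preds_in_bin, targets_in_bin = bins.setdefault(index, ([], []))
        preds_in_bin.append(pred)
        targets_in_bin.append(target)
    return {i: loss_fn(p, t) for i, (p, t) in sorted(bins.items())}
```

Reporting one loss per bin keeps the sparsely populated high-score bins from being drowned out by the many low-score pairs that dominate an all-vs-all comparison.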

Changed

  • No longer supports Tensorflow/Keras
  • The concept of Spectrum binning has changed and is now implemented differently (i.e. no more "missing peaks" as before)
  • Monte-Carlo Dropout now returns a score (mean or median) together with percentile-based upper and lower bounds (instead of the STD or IQR returned before).
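
As a rough illustration of percentile-based bounds for Monte-Carlo Dropout, the sketch below summarizes a list of scores obtained from repeated forward passes with dropout active. The function name `summarize_mc_scores` and the default 5th/95th percentiles are assumptions made for this example and are not taken from the MS2DeepScore API.

```python
from statistics import median, quantiles


def summarize_mc_scores(scores, lower_percentile=5, upper_percentile=95):
    """Return (median, lower bound, upper bound) for a list of sampled scores.

    Percentiles must be integers between 1 and 99, and at least two
    scores are required.
    """
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99,
    # so percentile p sits at index p - 1.
    cut_points = quantiles(scores, n=100, method="inclusive")
    lower = cut_points[lower_percentile - 1]
    upper = cut_points[upper_percentile - 1]
    return median(scores), lower, upper
```

Unlike a standard deviation, percentile bounds can be asymmetric around the central score, which suits scores that are confined to a fixed range.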

1.0.0 - 2024-03-12

Last version using Tensorflow. Next versions will be using PyTorch.

Added

  • Added split_positive_and_negative_mode.py #148
  • Added SettingMS2Deepscore #151
  • Clearer warnings when too few input spectra are used in the data generator. #155

Changed

  • Change the max oversampling rate to max_pairs_per_bin #148
  • Made spectrum pair selection a lot simpler and fixed mistake #148
  • Use DataGeneratorCherrypicked instead of DataGeneratorAllInchikeys in pipelines #148
  • Removed M1 chip compatibility, which led to faulty results depending on the Tensorflow version #200

0.5.0 - 2023-08-18

Added

  • New DataGeneratorCherrypicked as an alternative to the former data generators #145. This works better for large datasets and also tries to counteract biases in the chemical similarity scores.
  • Models can now be trained on selected metadata entries in addition to the spectrum peaks #128.
  • New MetadataFeatureGenerator class to handle additional metadata more robustly #128
  • Workflow scripts for training a new MS2DeepScore model #124. The ease of training MS2Deepscore models is improved, including standard settings and splitting validation and training data.

Changed

  • In SiameseModel, the attributes are not passed as an argument but instead used by the class.
  • Improved plotting functionality. Some additional plotting options were added and plots previously created in notebooks are now functions.
  • Linting (code and imports) #145.

0.4.0 - 2023-04-25

Added

  • Functions to cover the full pipeline of training a new model #129

Fixed

  • Tensorflow issues when saving/loading models #123

Changed

  • Random seed is now optional when fixed_set=True for the data generator #134
  • The load_model() function now auto-detects whether a model is multi_inputs or not
  • Python version support was changed to 3.8, 3.9, 3.10 (other versions should still work but are not systematically tested)

0.3.1 - 2023-01-06

Changed

  • Minor changes to make tests work with new matchms (>=0.18.0). Older versions should work as well though. #120

0.3.0 - 2022-11-29

Added

  • Allow adding metadata to the network inputs, e.g. precursor-m/z using the additional_inputs parameter #115

Fixed

  • Update test to work with Tensorflow 2.11 #114

0.2.3 - 2022-03-02

Fixed

  • Fixes issue #97 by raising a ValueError when duplicate InChiKey14 are specified by the user in the reference_scores_df DataFrame.

Changed

  • Minor linting #93

Fixed

  • Handled numpy dependency issues #94 and #95

0.2.2 - 2021-08-19

Fixed

  • now compatible with new Tensorflow 2.6, also checked by additional CI runs for Tensorflow 2.4, 2.5 and 2.6 #92

0.2.1 - 2021-07-20

Changed

  • Speed improvement of spectrum binning step #90

0.2.0 - 2021-04-01

Added

  • MS2DeepScoreMonteCarlo: Monte-Carlo dropout based ensembling to obtain mean/median score and STD #65
  • choice between median (default) and mean ensemble score which come with IQR and STD as uncertainty measures #86
  • dropout_in_first_layer option for SiameseModel (default is False) #86
  • use_fixed_set option for data generators to create deterministic training/testing data with fixed random seed #73

Changed

  • small update of create_histograms_plot to make the plot prettier/better to read #85

Fixed

  • resolved a minor ambiguity in the pair selection for non-available reference scores #79
  • resolved a minor ambiguity in the addition of noise peaks during data augmentation #78

0.1.3 - 2021-03-09

Changed

  • Allow users to define L1 and L2 regularization of SiameseModel #67
  • Allow users to define number and size of the layers in SiameseModel #64

0.1.2 - 2021-03-05

Added

  • create_confusion_matrix_plot in plotting #58

0.1.1 - 2021-02-09

Added

  • noise peak addition during training via data generators #55
  • L1 and L2 regularization for first dense layer #55

Changed

  • move vector calculation to separate calculate_vectors method #52

0.1.0 - 2021-02-08

Added

  • This is the initial version of MS2DeepScore