Dev #52

Merged
merged 76 commits into from
Dec 13, 2023

Commits on Sep 13, 2023

  1. ✨ dump and rename peptide files with PEP score

    create a new intermediate dump of 7,444 HeLa runs
    Henry Webel committed Sep 13, 2023
    e2e7521

Commits on Sep 14, 2023

  1. ✨ KNN comp. in updated workflow (v2)

    - KNN dumps val and test data with specified "args.model_key"
      in "config.yaml"
    - update color palette for "unknown" models
    - make performance_plots.py more robust
    - training configs are created and saved on the fly
      (-> avoid separate model configs, collect all in one)
    
    R methods are fixed, with no customization so far. Customizing them
    would probably require generating separate NBs for each method.
    Henry committed Sep 14, 2023
    771178f

Commits on Sep 15, 2023

  1. ✨ MCAR-MNAR sampling

    based on Lazar et al. (2016)
    - below a quantile -> MNAR, select from there
    - quantile is defined based on overall frac of missing values
    - mix MCAR and MNAR
    
    - format and clean-up code in script
    Henry committed Sep 15, 2023
    afc4f02
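
The sampling idea in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name `sample_mnar_mcar`, the flat list of intensities, and all parameter names are hypothetical, and the quantile level is simply passed in (the commit derives it from the overall fraction of missing values).

```python
import random


def sample_mnar_mcar(values, frac_remove, frac_mnar, quantile_level, seed=0):
    """Return (mnar_idx, mcar_idx): positions in `values` to mask.

    values: observed intensities (hypothetical flat representation)
    frac_remove: share of observed values to mask in total
    frac_mnar: share of the masked values drawn from the low-intensity range
    quantile_level: quantile below which values are MNAR candidates
                    (assumption: set to the overall fraction of missing values)
    """
    rng = random.Random(seed)
    n_total = round(len(values) * frac_remove)
    n_mnar = round(n_total * frac_mnar)

    # Threshold: empirical quantile of the observed intensities.
    ordered = sorted(values)
    k = min(int(quantile_level * len(ordered)), len(ordered) - 1)
    threshold = ordered[k]

    # MNAR part: sample only among values below the threshold.
    low = [i for i, v in enumerate(values) if v < threshold]
    n_mnar = min(n_mnar, len(low))
    mnar_idx = set(rng.sample(low, n_mnar))

    # MCAR part: sample uniformly among everything not selected yet.
    rest = [i for i in range(len(values)) if i not in mnar_idx]
    mcar_idx = set(rng.sample(rest, n_total - n_mnar))
    return mnar_idx, mcar_idx


intensities = [float(x) for x in range(20, 120)]  # toy data, 100 values
mnar, mcar = sample_mnar_mcar(
    intensities, frac_remove=0.1, frac_mnar=0.5, quantile_level=0.3
)
```

With `frac_mnar=0.5`, half of the masked values come from the low-intensity range and the other half are drawn uniformly from the remaining values.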

Commits on Sep 18, 2023

  1. 🐛 regression: use filtered dataset

    - refactoring error -> select correct data
    Henry committed Sep 18, 2023
    ac66abc
  2. ⚡ CICD pipeline: some R methods are slow

    - only test CF, DAE and VAE functionally
    - select configs in example folder...
    Henry committed Sep 18, 2023
    a2b97b8
  3. 🎨 format all code using autopep8

    - both scripts (notebooks)
    - and library code
    Henry committed Sep 18, 2023
    13dba85

Commits on Sep 19, 2023

  1. ✨ add further methods

    - msImpute
    - trKNN (from source)
    
    Add to workflow check.
    Henry committed Sep 19, 2023
    0517113

Commits on Sep 20, 2023

  1. ✨ group plots by secondary nb number

    - start grouping output for an easier overview (than only alphabetical)
    Henry committed Sep 20, 2023
    25468d2

Commits on Sep 27, 2023

  1. 👽 update code to pandas 2.0

    - update deprecated functionality in pandas
    
    -> some scripts might have further deprecation warnings
    Henry committed Sep 27, 2023
    a1dfa2b
  2. ➕ add igraph (for windows)

    - igraph installation in conda on the fly fails for windows otherwise:
      https://stackoverflow.com/a/71711600/9684872
    Henry committed Sep 27, 2023
    7118bec
  3. 🐛 remove peptides from reversed protein groups

    - reversed decoy sequence matches should be removed (it's only a few)
    Henry committed Sep 27, 2023
    1da36a6
  4. 🐛 set index correctly

    Henry Webel committed Sep 27, 2023
    9725406
  5. 🐛 target renamed plot in rule

    - grouping of plots was not reflected in Snakemake workflow
    Henry committed Sep 27, 2023
    4085578
  6. 🚧 prepare cluster execution

    - aim: specify long run time for R jobs with a high max
    - run long running job in parallel on one big node
    Henry committed Sep 27, 2023
    2dbd185
  7. ✨ enable torque cluster execution

    - log file paths for submitted jobs added (should be unique)
    - -V: forward set environment for submitted job
    Henry Webel committed Sep 27, 2023
    c5236f7

Commits on Sep 29, 2023

  1. 🐛 remove reversed sequences from evidence

    - precursors from reversed protein sequences are removed from the evidence
      table
    - adapt code to use local information (yaml files)
    Henry committed Sep 29, 2023
    6d8ecbf
  2. ⬆️ remove constraints on pandas and pytorch

    - Colab uses pandas 2 and PyTorch 2
    - datetime_is_numeric parameter removed from
       describe, see
       https://pandas.pydata.org/docs/whatsnew/v2.0.0.html
    Henry committed Sep 29, 2023
    8a4ab8b

Commits on Oct 9, 2023

  1. 🐛 make pandas 2.0 compatible

    - append is deprecated.
    Henry committed Oct 9, 2023
    12218c5
  2. ✨📝 specify task specific log files

    in case a tool, e.g. the torque scheduler, creates log files, these
    can be requested per task (job):
    in the run_snakemake_cluster bash script, this is done
    using -e and -o options.
    Henry Webel committed Oct 9, 2023
    26dc891
  3. 🔧 set default to cpu (no accelerator, e.g. gpu)

    Henry Webel committed Oct 9, 2023
    c7503e1
  4. ✨ torque (qsub) script with parameters using -v

    - submit required parameters using the -v option, e.g.
      qsub run_snakemake_cluster.sh \
       -N snakemake_exp0 \
       -v configfile=path_to/config.yaml,prefix=exp0
    Henry Webel committed Oct 9, 2023
    6e780a2
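
A submission like the one in the commit message above could be assembled programmatically, for instance when several experiments are queued. The helper below is a hypothetical sketch (the name `build_qsub_command` is not from the repository). It only builds the argument list, mirroring the argument order shown in the commit message, and does not submit anything.

```python
import shlex


def build_qsub_command(script, job_name, **variables):
    """Build a qsub call that forwards key=value pairs via the -v option."""
    # qsub expects the variables as one comma-separated list.
    var_list = ",".join(f"{key}={value}" for key, value in variables.items())
    return ["qsub", script, "-N", job_name, "-v", var_list]


cmd = build_qsub_command(
    "run_snakemake_cluster.sh",
    "snakemake_exp0",
    configfile="path_to/config.yaml",
    prefix="exp0",
)
# Render the argument list as a single shell-safe command line.
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` on a machine where `qsub` is available.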
  5. 🎨🐛 reverse column for evidence, rename all dumps

    - rename also protein groups and precursors (evidence) dumps
    - drop entries from reversed sequences in evidence files
    Henry Webel committed Oct 9, 2023
    0fcc086

Commits on Oct 10, 2023

  1. 🐛 methods that give all NA are not filtered

    - increase robustness of notebook, ignoring all NA methods
     (here: IMPSEQ)
    - To consider:
      should 01_1_train_NAGuideR.ipynb throw an error if all pred are NAs?
    Henry Webel committed Oct 10, 2023
    2c78071
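
Ignoring methods whose predictions are all NA, as described in the commit message above, might look like this in plain Python. The dictionary layout and the function name `drop_all_na_methods` are assumptions for illustration. The notebook itself presumably works on data frames.

```python
import math


def drop_all_na_methods(predictions):
    """Keep only methods with at least one non-missing prediction."""

    def is_na(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    # A method with an empty prediction list is dropped as well.
    return {
        method: values
        for method, values in predictions.items()
        if not all(is_na(v) for v in values)
    }


preds = {
    "KNN": [24.1, 25.3, float("nan")],
    "IMPSEQ": [float("nan"), float("nan"), float("nan")],  # all NA
}
kept = drop_all_na_methods(preds)
```

In this toy input only `KNN` survives the filter. Whether the training notebook should raise an error instead remains the open question noted in the commit.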

Commits on Oct 11, 2023

  1. 🎨✨ unify and showcase internals, add IDs

    - function loading and filtering data
    - add IDs making it possible to map precursors (Evidence IDs),
      Peptide IDs and Protein Groups IDs to each other.
      Within a file, the id column is always "id" (e.g. the id column of proteinGroups.txt corresponds to Protein Groups IDs in the other two)
    Henry Webel committed Oct 11, 2023
    8b3ffc8
  2. 🎨📝 order R methods alphabetically, start documenting

    - tbc: see what works
    
    Next: merge with version where parameters for python based models
    can be set in config.yaml
    Henry Webel committed Oct 11, 2023
    f53aa96
  3. Merge pull request #49 from RasmussenLab/filter_reversed

    Filter reversed -> parts for collecting data will be factored out
    Henry Webel authored Oct 11, 2023
    cfc5fe5
  4. Merge pull request #50 from RasmussenLab/pbs_cluster_exec

    🚧 prepare cluster execution
    
    - default: CPU execution, not accelerated (e.g. GPU)
    - job script for torque cluster
    - logs with notebook outputs
    Henry Webel authored Oct 11, 2023
    c00f600
  5. Merge pull request #51 from RasmussenLab/pip_dependencies

    ⬆️ remove constraints on pandas and pytorch
    
    -> faster setup on Google Colab
    - fewer constraints on version
    Henry Webel authored Oct 11, 2023
    41ace60

Commits on Oct 12, 2023

  1. 🔥 move hela data collection code to new repo

    -> https://github.com/RasmussenLab/hela_qc_mnt_data
    
    commit link:
    RasmussenLab/hela_qc_mnt_data@f88586b
    
    - make minor adaptations needed due to deletions
    Henry committed Oct 12, 2023
    5b3c0b0

Commits on Oct 16, 2023

  1. Merge pull request #53 from RasmussenLab/move_hela_data_code

    🔥 move hela data collection code to new repo: 
    
    https://github.com/RasmussenLab/hela_qc_mnt_data
    Henry Webel authored Oct 16, 2023
    fd1d07d
  2. 🚧 update Snakefile v2

    - cluster execution and renamed files
    - format
    Henry committed Oct 16, 2023
    02b2d9f
  3. ✨ Integrate data splitting config into main config

    - allow setting frac_mnar from the command line using:
       --config frac_mnar=.5
    - dump created data config using separate rule (into experiment folder)
    Henry committed Oct 16, 2023
    288d78e

Commits on Oct 17, 2023

  1. 🎨 increase fonts, improve plotting

    - fixed heatmap from -1 to 1 for correlations
    - shrink heatmap legend
    
    :bug: greater equal, not strictly greater for cutoff
    Henry committed Oct 17, 2023
    27ba67d

Commits on Oct 18, 2023

  1. 📝 update configs

    - managed to include more models for comparison
    Henry Webel committed Oct 18, 2023
    225af4f

Commits on Oct 19, 2023

  1. ✨ exec status script, use conda, write configs

    - query torque cluster execution status changed, updated and moved
    - v2 of workflow now creates config files automatically which can still be used with version one
    - use pre-created conda environment with rule
    Henry Webel committed Oct 19, 2023
    45afa79

Commits on Oct 20, 2023

  1. 🎨 Snakefile_v2 ready as new default

    - can create the configs needed for the current workflow in Snakefile
    
    Next: Test and swap + document
    Henry Webel committed Oct 20, 2023
    47824ce
  2. 🚚 group grid search configs

    Henry Webel committed Oct 20, 2023
    464cc1c

Commits on Oct 31, 2023

  1. 🎨 format workflows

    Henry Webel committed Oct 31, 2023
    345d256
  2. 🔥 move to hela_qc_mnt_data project

    -> Snakefile in project folder
    Henry Webel committed Oct 31, 2023
    43aa9a1
  3. 📝 add "version" to config file

    - document relationship between updated and old workflow's
      configuration files (-> how to specify models and parameters)
    Henry Webel committed Oct 31, 2023
    a358d78
  4. 🚚 move shell scripts to folder bin

    Henry Webel committed Oct 31, 2023
    801c823

Commits on Nov 1, 2023

  1. 📝🐛 Test execution of "misc_*" notebooks

    - `testNotebooks.smk` can do it.
    - set kernel to current environment (generic "Python 3")
    Henry Webel committed Nov 1, 2023
    51b57db
  2. 📝 add version

    for base workflow configuration (Snakefile vs Snakefile_v2)
    Henry Webel committed Nov 1, 2023
    500220d

Commits on Nov 3, 2023

  1. :error: update expected rule output

    - msImpute and trKNN failed
    Henry Webel committed Nov 3, 2023
    f46777b

Commits on Nov 8, 2023

  1. 📝 document distributed cluster execution

    Henry Webel committed Nov 8, 2023
    c849cb5

Commits on Nov 9, 2023

  1. Merge pull request #54 from RasmussenLab/workflow_config

    Workflows update
    
    - create a new version of the basic workflow that allows running models twice with different configurations (e.g. a different number of neighbours in KNN) while not breaking everything else.
      For now only a new workflow is added; the default switch will be done later.
    - update workflow for testing all misc_* scripts
    - execution scripts for the cluster moved to a separate folder (and adapted to time out after 24h)
    - collect cluster scripts in project/bin
    - format all Snakemake files using snakefmt
    Henry Webel authored Nov 9, 2023
    089cc8e
  2. 🎨 format R code

    Henry committed Nov 9, 2023
    26213e5
  3. ✨ Added GSIMP and MsImpute v2-mnar

    - developed locally -> up for testing if dependencies are met
      (GSIMP is not a package; its dependencies were figured out locally)
    - test in example workflow (-> GSimp did not finish locally)
    Henry committed Nov 9, 2023
    ec85618

Commits on Nov 11, 2023

  1. ✨ log papermill output for each job

    - create individual logs for nb execution
      -> separate files on local execution
      -> documentation of how long training step took
    Henry Webel committed Nov 11, 2023
    7a55767
  2. 🐛 None is not dumped as null without cp dict

    - config dict has to be copied. Otherwise value
      None is not dumped as null:
      Before:
      - column_names: "None"
      Now:
      - column_names: null
    Henry Webel committed Nov 11, 2023
    8bd3211

Commits on Nov 14, 2023

  1. d015517
  2. 🐛 few features have fewer than 4 training observations

    - with 50 samples, one or two features have fewer than 4 intensities
      in the training data split
      -> move the validation data for these to the training split
    Henry committed Nov 14, 2023
    139c792
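
The fix described in the commit message above (moving validation values of sparse features into the training split) can be illustrated with a small sketch. The nested-dictionary layout, the function name `rebalance_splits`, and the toy feature names are assumptions; only the threshold of 4 comes from the commit message.

```python
MIN_TRAIN_OBS = 4  # threshold named in the commit message


def rebalance_splits(train, val, min_obs=MIN_TRAIN_OBS):
    """Move validation values of sparse features into the training split.

    train, val: dict mapping feature -> {sample: intensity}
    Returns new dictionaries; the inputs are left unchanged.
    """
    train = {feat: dict(obs) for feat, obs in train.items()}
    val = {feat: dict(obs) for feat, obs in val.items()}
    for feat in list(val):
        # Too few training observations: hand the validation values over.
        if len(train.get(feat, {})) < min_obs:
            train.setdefault(feat, {}).update(val.pop(feat))
    return train, val


train = {
    "pepA": {"s1": 24.0, "s2": 25.0, "s3": 24.5, "s4": 26.0, "s5": 25.5},
    "pepB": {"s1": 20.0, "s2": 21.0},  # only 2 training observations
}
val = {"pepA": {"s6": 25.0}, "pepB": {"s3": 22.0}}
new_train, new_val = rebalance_splits(train, val)
```

Note that a feature can still end up below the threshold after the move, as `pepB` does here with three values. The sketch only ensures that no validation data is held out for such features.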

Commits on Nov 15, 2023

  1. 🐛 GSIMP slow, SEQKNN does not like too few features

    - new dataset balancing between GSIMP runtime and
      SEQKNN's need for a minimum number of features
    - run each method one by one (avoid race conditions when installing, only
      a problem on first time setup)
    Henry committed Nov 15, 2023
    296cbf9

Commits on Nov 16, 2023

  1. 0bee424
  2. 🐛 SeqKNN crashes with too few samples

    - is GSIMP fast enough (227-> ~1h)?
    - probably test GSIMP here once, then remove from "fast testing" workflow
    Henry committed Nov 16, 2023
    1b23b95
  3. 🎨 update metadata

    remove warnings thrown by papermill
    Henry Webel committed Nov 16, 2023
    2bb3d4d
  4. 🐛 add method and set defaults from grid search

    - update defaults to results from small grid search (smallest of top 3)
    Henry Webel committed Nov 16, 2023
    9456d91
  5. ✨ configs for MNAR MCAR experiments

    document also qsub command and update submission script
    Henry Webel committed Nov 16, 2023
    1569b97
  6. 🎨🚧 improve plots for Figure 2

    (add more models)
    - needs to be completed and cleaned-up
    Henry Webel committed Nov 16, 2023
    35322b3

Commits on Nov 17, 2023

  1. 🐛 fix remaining colors, test

    Henry committed Nov 17, 2023
    0b0d747
  2. 🐛 don't train with too small batches

    - rather "bigger" batches with more training steps
    - update Fig. 2 plots generation to 25MNAR
    Henry Webel committed Nov 17, 2023
    89046b4

Commits on Nov 26, 2023

  1. Merge pull request #55 from RasmussenLab/further_R_methods

    Methods:
    
    - added GSimp.
    - reduced the dimensionality of the example data in the GitHub Action so 
      GSimp finishes (~1h) -> does not scale
    - MNAR algorithm of MSIMPUTE added
    
    Data:
    
    - ensure that training data has at least 4 samples (MSIMPUTE includes that check)
    - Formatted and updated workflow configs and declarations (v1&v2). Added script for command creation
    Henry Webel authored Nov 26, 2023
    29a549a
  2. c9e00e4
  3. 🎨 allow custom subselection, add NA if not available

    - Figure 2: add custom selection of models to aggregate best 5 models
      of several datasets (custom plotting for paper)
    - rotate performance label
    - add NA if model did not run (here: error or not finished within 24h)
    Henry committed Nov 26, 2023
    748a1d7
  4. 🐛 sync and specify selected

    - for large pep and evi, the top five are already
      the correct set
    Henry Webel committed Nov 26, 2023
    1f2d682

Commits on Nov 27, 2023

  1. 🐛✨ Pick colors for selected

    - for subselected models the colors were not reselected
    Henry committed Nov 27, 2023
    e206483
  2. 🎨 switch colors and show model tag for color

    - based on seaborn example of _ColorPalette
    Henry committed Nov 27, 2023
    e62f80b
  3. 🎨 center swarmplot labels

    Henry committed Nov 27, 2023
    db2469a

Commits on Nov 28, 2023

  1. 🎨 rotate other direction, use space better

    improve readability
    Henry committed Nov 28, 2023
    49ee8e5
  2. 🎨 allow custom display name of feat

    Henry committed Nov 28, 2023
    e49c1eb

Commits on Dec 5, 2023

  1. 🔧🎨 25MNAR share as default, update path & fontsize

    - tables for Supp. Data
    - update plots (fontsize, support)
    Henry committed Dec 5, 2023
    43de8bd
  2. 🎨🔧 Update comp. with subsetted data

    - use a share of 25% MNAR in removed data
    - use a share of 25% MNAR in comparison
    - update figures for publication (names, label, fontsize, etc)
    Henry committed Dec 5, 2023
    bbe068e
  3. 🔧 update exp.: repeated runs of full ald data

    - dump config
    Henry committed Dec 5, 2023
    052ed78

Commits on Dec 7, 2023

  1. 🎨🐛 update overfitting analysis (25MNAR)

    - 🐛 remove metadata fpath from train_X.yaml
    - also run KNN comp. with workflow v2 with a share of 25MNAR
    Henry committed Dec 7, 2023
    49d628b

Commits on Dec 12, 2023

  1. 📝 add three newly added methods to overview

    Henry committed Dec 12, 2023
    27d8ad2
  2. 🔖 bump version to v.0.2.0

    Henry committed Dec 12, 2023
    af85dd7