Dev #52

Merged 76 commits on Dec 13, 2023

Commits
e2e7521
:sparkles: dump and rename peptide files with PEP score
Sep 13, 2023
771178f
:sparkles: KNN comp. in updated workflow (v2)
Sep 14, 2023
afc4f02
:sparkles: MCAR-MNAR sampling
Sep 15, 2023
ac66abc
:bug: regression: use filtered dataset
Sep 18, 2023
a2b97b8
:zap: CICD pipeline: some R methods are slow
Sep 18, 2023
13dba85
:art: format all code using autopep8
Sep 18, 2023
0517113
:sparkles: add further methods
Sep 19, 2023
25468d2
:sparkles: group plots by secondary nb number
Sep 20, 2023
a1dfa2b
:alien: update code to pandas 2.0
Sep 27, 2023
7118bec
:heavy_plus_sign: add igraph (for windows)
Sep 27, 2023
1da36a6
:bug: remove peptides from reversed protein groups
Sep 27, 2023
9725406
:bug: set index correctly
Sep 27, 2023
4085578
:bug: target renamed plot in rule
Sep 27, 2023
2dbd185
:construction: prepare cluster execution
Sep 27, 2023
c5236f7
:sparkles: enable torque cluster execution
Sep 27, 2023
6d8ecbf
:bug: remove reversed sequences from evidence
Sep 29, 2023
8a4ab8b
:arrow_up: remove constraints on pandas and pytorch
Sep 29, 2023
12218c5
:bug: make pandas 2.0 compatible
Oct 9, 2023
26dc891
:sparkles::memo: specify task specific log files
Oct 9, 2023
c7503e1
:wrench: set default to cpu (no accelerator, e.g. gpu)
Oct 9, 2023
6e780a2
:sparkles: torque (qsub) script with parameters using -v
Oct 9, 2023
0fcc086
:art::bug: reverse column for evidence, rename all dumps
Oct 9, 2023
2c78071
:bug: methods that give all NA are not filtered
Oct 10, 2023
8b3ffc8
:art::sparkles: unify and showcase internals, add IDs
Oct 11, 2023
f53aa96
:art::memo: order by R methods by alphabet, start documenting
Oct 11, 2023
cfc5fe5
Merge pull request #49 from RasmussenLab/filter_reversed
Oct 11, 2023
c00f600
Merge pull request #50 from RasmussenLab/pbs_cluster_exec
Oct 11, 2023
41ace60
Merge pull request #51 from RasmussenLab/pip_dependencies
Oct 11, 2023
5b3c0b0
:fire: move hela data collection code to new repo
Oct 12, 2023
fd1d07d
Merge pull request #53 from RasmussenLab/move_hela_data_code
Oct 16, 2023
02b2d9f
:construction: update Snakefile v2
Oct 16, 2023
288d78e
:sparkles: Integrate data splitting config into main config
Oct 16, 2023
27ba67d
:art: increase fonts, improve plotting
Oct 17, 2023
225af4f
:memo: update configs
Oct 18, 2023
45afa79
:sparkles: exec status script, use conda, write configs
Oct 19, 2023
47824ce
:art: Snakefile_v2 ready as new default
Oct 20, 2023
464cc1c
:truck: group grid search configs
Oct 20, 2023
345d256
:art: format workflows
Oct 31, 2023
43aa9a1
:fire: move to hela_qc_mnt_data project
Oct 31, 2023
a358d78
:memo: add "version" to config file
Oct 31, 2023
801c823
:truck: move shell scripts to folder bin
Oct 31, 2023
51b57db
:memo::bug: Test execution of "misc_*" notebooks
Nov 1, 2023
500220d
:memo: add version
Nov 1, 2023
f46777b
:error: update expected rule output
Nov 3, 2023
c849cb5
:memo: document distributed cluster execution
Nov 8, 2023
089cc8e
Merge pull request #54 from RasmussenLab/workflow_config
Nov 9, 2023
26213e5
:art: format R code
Nov 9, 2023
ec85618
:sparkles: Added GSIMP and MsImpute v2-mnar
Nov 9, 2023
7a55767
:sparkles: log papermill output for each job
Nov 11, 2023
8bd3211
:bug: None is not dumped as null without cp dict
Nov 11, 2023
d015517
:bug: Quote strings to allow white spaces in folder names
Nov 14, 2023
139c792
:bug: few features have less than 4 training observations
Nov 14, 2023
296cbf9
:bug: GSIMP slow, SEQKNN does not like too few features
Nov 15, 2023
0bee424
:bug: execute one-by-one, show errors in main process
Nov 16, 2023
1b23b95
:bug: SeqKNN crashes with too few samples
Nov 16, 2023
2bb3d4d
:art: update metadata
Nov 16, 2023
9456d91
:bug: add method and set defaults from grid search
Nov 16, 2023
1569b97
:sparkles: configs for MNAR MCAR experiments
Nov 16, 2023
35322b3
:art::construction: improve plots for Figure 2
Nov 16, 2023
0b0d747
:bug: fix remaining colors, test
Nov 17, 2023
89046b4
:bug: don't train with too small batches
Nov 17, 2023
29a549a
Merge pull request #55 from RasmussenLab/further_R_methods
Nov 26, 2023
c9e00e4
:art::wrench: improve swarmplots, add methods in ALD comp.
Nov 26, 2023
748a1d7
:art: allow custom subselection, add NA if not available
Nov 26, 2023
1f2d682
:bug: sync and specify selected
Nov 26, 2023
e206483
:bug::sparkles: Pick colors for selected
Nov 27, 2023
e62f80b
:art: switch colors and show model tag for color
Nov 27, 2023
db2469a
:art: center swarmplot labels
Nov 27, 2023
49ee8e5
:art: rotate other direction, use space better
Nov 28, 2023
e49c1eb
:art: allow custom display name of feat
Nov 28, 2023
43de8bd
:wrench::art: 25MNAR share as default, update path & fontsize
Dec 5, 2023
bbe068e
:art::wrench: Update comp. with subsetted data
Dec 5, 2023
052ed78
:wrench: update exp.: repeated runs of full ald data
Dec 5, 2023
49d628b
:art::bug: update overfitting analysis (25MNAR)
Dec 7, 2023
27d8ad2
:memo: add three newly added methods to overview
Dec 12, 2023
af85dd7
:bookmark: bump version to v.0.2.0
Dec 12, 2023

2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -65,7 +65,7 @@ jobs:
run: |
cd project
snakemake -p -c1 --configfile config/single_dev_dataset/example/config.yaml -n
- snakemake -p -c2 -k --configfile config/single_dev_dataset/example/config.yaml
+ snakemake -p -c1 -k --configfile config/single_dev_dataset/example/config.yaml
- name: Archive results
uses: actions/upload-artifact@v3
with:
24 changes: 5 additions & 19 deletions README.md
@@ -196,7 +196,7 @@ Packages either are based on this repository, or were referenced by NAGuideR (Ta
From the brief description in the table the exact procedure is not always clear.

| Method | Package | source | status | name |
- | ------------- | ----------------- | ------ | --- |------------------ |
+ | ------------- | ----------------- | ------ | ------ |------------------ |
| CF | pimms | pip | | Collaborative Filtering |
| DAE | pimms | pip | | Denoising Autoencoder |
| VAE | pimms | pip | | Variational Autoencoder |
@@ -206,7 +206,7 @@ From the brief description in the table the exact procedure is not always clear.
| COLMEDIAN | e1071 | CRAN | | replace NA with column median |
| ROWMEDIAN | e1071 | CRAN | | replace NA with row median |
| KNN_IMPUTE | impute | BIOCONDUCTOR | | k nearest neighbor imputation |
- | SEQKNN | SeqKnn | tar file | | Sequential k- nearest neighbor imputation <br> start with feature with least missing values and re-use imputed values for not yet imputed features
+ | SEQKNN | SeqKnn | tar file | | Sequential k- nearest neighbor imputation <br> starts with feature with least missing values and re-use imputed values for not yet imputed features
| BPCA | pcaMethods | BIOCONDUCTOR | | Bayesian PCA missing value imputation
| SVDMETHOD | pcaMethods | BIOCONDUCTOR | | replace NA initially with zero, use k most significant eigenvalues using Singular Value Decomposition for imputation until convergence
| LLS | pcaMethods | BIOCONDUCTOR | | Local least squares imputation of a feature based on k most correlated features
@@ -222,26 +222,12 @@ From the brief description in the table the exact procedure is not always clear.
| TRKNN | - | script | | truncation k-nearest neighbor imputation
| RF | missForest | CRAN | | Random Forest imputation (one feature at a time)
| PI | - | - | | Downshifted normal distribution (per sample)
+ | GSIMP | - | script | | QRILC initialization and iterative Gibbs sampling with generalized linear models (glmnet)
+ | MSIMPUTE | msImpute | BIOCONDUCTOR | | Missing at random algorithm using low rank approximation
+ | MSIMPUTE_MNAR | msImpute | BIOCONDUCTOR | | Missing not at random algorithm using low rank approximation
| ~~grr~~ | DreamAI | - | Fails to install | Ridge regression
| ~~GMS~~ | GMSimpute | tar file | Fails on Windows | Lasso model



- ## Workflows
-
- The workflows folder in the repository contains snakemake workflows used for rawfile data processing,
- both for running MaxQuant over a large set of HeLa raw files
- and ThermoRawFileParser on a list of raw files to extract their meta data. For details see:
-
- > Webel, Henry, Yasset Perez-Riverol, Annelaura Bach Nielson, and Simon Rasmussen. 2023. “Mass Spectrometry-Based Proteomics Data from Thousands of HeLa Control Samples.” Research Square. https://doi.org/10.21203/rs.3.rs-3083547/v1.
-
- ### MaxQuant
-
- Process single raw files using MaxQuant. See [README](workflows/maxquant/README.md) for details.
-
- ### Metadata
-
- Read metadata from single raw files using MaxQuant. See [README](workflows/metadata/README.md) for details.

## Build status
[![Documentation Status](https://readthedocs.org/projects/pimms/badge/?version=latest)](https://pimms.readthedocs.io/en/latest/?badge=latest)
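
To make the shortest descriptions in the README method table above concrete, here is a minimal pandas sketch of COLMEDIAN, ROWMEDIAN and PI. It is an illustration only, not the code used by the pimms package or by the R packages listed in the table. The function names, the layout (rows are samples, columns are features) and the `shift` and `scale` defaults of the downshifted normal distribution are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def impute_col_median(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each NA with the median of its column (COLMEDIAN-style)."""
    return df.fillna(df.median(axis=0))


def impute_row_median(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each NA with the median of its row (ROWMEDIAN-style)."""
    # fillna aligns a Series on column labels, so transpose, fill, transpose back
    return df.T.fillna(df.median(axis=1)).T


def impute_downshifted_normal(
    df: pd.DataFrame, shift: float = 1.8, scale: float = 0.3, seed: int = 0
) -> pd.DataFrame:
    """Per sample (row), draw NAs from N(mean - shift * std, (scale * std)^2).

    shift and scale are assumed defaults for illustration, not values
    taken from the repository.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    for idx, row in df.iterrows():
        mask = row.isna()
        if mask.any():
            mu, sd = row.mean(), row.std()
            out.loc[idx, mask] = rng.normal(mu - shift * sd, scale * sd, mask.sum())
    return out


# Hypothetical toy data: three samples (rows) and three features (columns)
data = pd.DataFrame(
    {"f1": [1.0, np.nan, 3.0], "f2": [4.0, 5.0, np.nan], "f3": [7.0, 8.0, 9.0]},
    index=["s1", "s2", "s3"],
)
col_imputed = impute_col_median(data)
row_imputed = impute_row_median(data)
pi_imputed = impute_downshifted_normal(data)
```

In this toy example the missing `f1` value of sample `s2` becomes 2.0 with the column median and 6.5 with the row median, which shows how strongly the choice of axis can matter. The downshifted normal draw depends on the random seed, so only its location (below the sample mean) is predictable.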
10 changes: 5 additions & 5 deletions environment.yml
@@ -47,8 +47,8 @@ dependencies:
- jupyter-dash
- papermill # execute ipynb's
# R packages (listed in NAGuideR)
- - r-base #=3.6
- - r-devtools # is it needed for source installs on windows server?
+ - r-base
+ - r-devtools # is it needed for source installs on windows server?
- r-irkernel
- r-reshape2
- r-stringi # + rmarkdown hack for reshape2
@@ -66,6 +66,7 @@ dependencies:
- r-rrcov
- r-gmm
- r-tmvtnorm
+ - r-igraph
# - bioconductor-biocinstaller
# - r-imputelcmd # bioconda
# - bioconductor-impute
@@ -83,6 +84,5 @@ dependencies:
# - jupyterlab_code_formatter
# - jupyterlab-git
- pip:
- - -e .
- - mrmr-selection
-
+ - -e .
+ - mrmr-selection