🐛 make all imports explicit w.r.t. pkg (#68)
- make all imports explicit
- let the workflow run, as a last resort, with one job at a time (although this wasn't necessary for the merge run)

* 🐛 make all imports explicit w.r.t. pkg
* 📝 update M1 installation instructions

- still does not seem to work

* 🐛 deactivating njab makes it run!

- committed from Marc's M1 laptop

* ⏪ use njab after fix in downstream pkg

The underlying cause is the polars import in mrmr-selection; see RasmussenLab/njab#13
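
A minimal sketch of the delayed-import pattern behind that fix (illustrative names, not the actual njab code):

```python
# Hypothetical njab-style helper: the heavy import is moved from module
# level into the function body, so `import njab` succeeds even when
# polars (a transitive dependency of mrmr-selection) fails to import.
def run_mrmr_selection(X, y, n_features: int):
    from mrmr import mrmr_classif  # imported lazily, only when needed
    return mrmr_classif(X=X, y=y, K=n_features)
```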

* 🐛 try to allow parallel installations for R

- Ubuntu runs are regularly failing, as the lock can cause issues
- alternatively, one job at a time could be run in the retry

* 🐛 try to manually add missing dependency: gmm

* ⚡📝 make CI more robust to installation issues, update README for Mac M1 chips

---------

Co-authored-by: mpielies <[email protected]>
Henry Webel and mpielies authored Jun 10, 2024
1 parent bf4629f commit ea902a6
Showing 14 changed files with 74 additions and 31 deletions.
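
The core change, applied across the package, replaces relative imports with absolute ones. A representative before/after, mirroring the diffs below:

```python
# Before: relative import, implicit w.r.t. the package
# from . import analysis

# After: absolute import, explicit w.r.t. the package
from vaep.models import analysis
```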
12 changes: 10 additions & 2 deletions .github/workflows/ci.yaml
@@ -69,17 +69,25 @@ jobs:
mkdir runs
papermill 04_1_train_pimms_models.ipynb runs/04_1_train_pimms_models.ipynb
papermill 04_1_train_pimms_models.ipynb runs/04_1_train_pimms_models_no_val.ipynb -p sample_splits False
- name: Run demo workflow (integration test)
- name: Dry-Run demo workflow (integration test)
continue-on-error: true
run: |
cd project
snakemake -p -c1 --configfile config/single_dev_dataset/example/config.yaml -n
- name: Run demo workflow (integration test)
continue-on-error: true
run: |
cd project
snakemake -p -c4 -k --configfile config/single_dev_dataset/example/config.yaml
- name: Run demo workflow again (in case of installation issues)
continue-on-error: true
run: |
cd project
snakemake -p -c1 -n --configfile config/single_dev_dataset/example/config.yaml
snakemake -p -c4 -k --configfile config/single_dev_dataset/example/config.yaml
- name: Run demo workflow again (in case of installation issues) - one thread
run: |
cd project
snakemake -p -c1 --configfile config/single_dev_dataset/example/config.yaml
- name: Archive results
# https://github.com/actions/upload-artifact
uses: actions/upload-artifact@v4
6 changes: 6 additions & 0 deletions .github/workflows/workflow_website.yaml
@@ -39,9 +39,15 @@ jobs:
cd project
snakemake -s workflow/Snakefile_v2.smk --configfile config/alzheimer_study/config.yaml -p -c4 -k
- name: Run demo workflow again (in case of installation issues)
continue-on-error: true
run: |
cd project
snakemake -s workflow/Snakefile_v2.smk --configfile config/alzheimer_study/config.yaml -p -c4 -k
- name: Run demo workflow again (in case of installation issues) with one thread
continue-on-error: true
run: |
cd project
snakemake -s workflow/Snakefile_v2.smk --configfile config/alzheimer_study/config.yaml -p -c1 -k
- name: Run differential analysis workflow
run: |
cd project
17 changes: 12 additions & 5 deletions README.md
@@ -127,17 +127,24 @@ mamba env create -n pimms -f environment.yml # faster, less than 5 mins

If on a Mac M1 or M2, or otherwise having issues using your accelerator (e.g. GPUs): install the pytorch dependencies first, then the rest of the environment:

### Install pytorch first (M-chips)
### Install pytorch first

> :warning: We currently see issues with some installations on M1 chips. A dependency
> of one workflow is polars, which causes the issue. This should be [fixed now](https://github.com/RasmussenLab/njab/pull/13)
> for general use by the delayed import
> of `mrmr-selection` in `njab`. If you encounter issues, please open an issue.
Check how to install pytorch for your system [here](https://pytorch.org/get-started).

- select the version compatible with your CUDA version if you have an NVIDIA GPU or a Mac M-chip.

```bash
conda create -n vaep python=3.9 pip
conda activate vaep
# Follow instructions on https://pytorch.org/get-started
# conda env update -f environment.yml -n vaep # should not install the rest.
conda create -n pimms python=3.9 pip
conda activate pimms
# Follow instructions on https://pytorch.org/get-started:
# CUDA is not available on MacOS, please use default package
# pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
conda install pytorch::pytorch torchvision torchaudio fastai -c pytorch -c fastai -y
pip install pimms-learn
pip install jupyterlab papermill # to run notebooks interactively or as scripts

6 changes: 6 additions & 0 deletions project/01_1_train_NAGuideR_methods.R
@@ -20,6 +20,8 @@
# - BiocManager could be moved to methods who are installed from BioConductor

# + tags=["hide-input"] vscode={"languageId": "r"}
# options("install.lock"=FALSE)

packages_base_R <-
c("BiocManager", "reshape2", "data.table", "readr", "tibble")

@@ -130,6 +132,7 @@ nafunctions <- function(x, method = "zero") {
else if (method == "qrilc") {
install_bioconductor("impute")
install_bioconductor("pcaMethods")
install_rpackage('gmm')
install_rpackage('imputeLCMD')
xxm <- t(df1)
data_zero1 <-
@@ -139,13 +142,15 @@
else if (method == "mindet") {
install_bioconductor("impute")
install_bioconductor("pcaMethods")
install_rpackage('gmm')
install_rpackage('imputeLCMD')
xxm <- as.matrix(df1)
df <- imputeLCMD::impute.MinDet(xxm, q = 0.01)
}
else if (method == "minprob") {
install_bioconductor("impute")
install_bioconductor("pcaMethods")
install_rpackage('gmm')
install_rpackage('imputeLCMD')
xxm <- as.matrix(df1)
df <-
@@ -278,6 +283,7 @@ nafunctions <- function(x, method = "zero") {

install_bioconductor("impute")
install_bioconductor("pcaMethods")
install_rpackage('gmm')
install_rpackage('imputeLCMD')
install_rpackage("magrittr")
install_rpackage("glmnet")
6 changes: 6 additions & 0 deletions project/01_1_train_NAGuideR_methods.ipynb
@@ -26,6 +26,8 @@
},
"outputs": [],
"source": [
"# options(\"install.lock\"=FALSE)\n",
"\n",
"packages_base_R <-\n",
" c(\"BiocManager\", \"reshape2\", \"data.table\", \"readr\", \"tibble\")\n",
"\n",
@@ -160,6 +162,7 @@
" else if (method == \"qrilc\") {\n",
" install_bioconductor(\"impute\")\n",
" install_bioconductor(\"pcaMethods\")\n",
" install_rpackage('gmm')\n",
" install_rpackage('imputeLCMD')\n",
" xxm <- t(df1)\n",
" data_zero1 <-\n",
@@ -169,13 +172,15 @@
" else if (method == \"mindet\") {\n",
" install_bioconductor(\"impute\")\n",
" install_bioconductor(\"pcaMethods\")\n",
" install_rpackage('gmm')\n",
" install_rpackage('imputeLCMD')\n",
" xxm <- as.matrix(df1)\n",
" df <- imputeLCMD::impute.MinDet(xxm, q = 0.01)\n",
" }\n",
" else if (method == \"minprob\") {\n",
" install_bioconductor(\"impute\")\n",
" install_bioconductor(\"pcaMethods\")\n",
" install_rpackage('gmm')\n",
" install_rpackage('imputeLCMD')\n",
" xxm <- as.matrix(df1)\n",
" df <-\n",
@@ -308,6 +313,7 @@
" \n",
" install_bioconductor(\"impute\")\n",
" install_bioconductor(\"pcaMethods\")\n",
" install_rpackage('gmm')\n",
" install_rpackage('imputeLCMD')\n",
" install_rpackage(\"magrittr\")\n",
" install_rpackage(\"glmnet\")\n",
3 changes: 0 additions & 3 deletions vaep/__init__.py
@@ -9,10 +9,7 @@
from importlib import metadata

import njab
import pandas as pd
import pandas.io.formats.format as pf

# from . import logging, nb, pandas, plotting
import vaep.logging
import vaep.nb
import vaep.pandas
2 changes: 1 addition & 1 deletion vaep/analyzers/__init__.py
@@ -2,7 +2,7 @@
"""
from types import SimpleNamespace

from . import compare_predictions, diff_analysis
from vaep.analyzers import compare_predictions, diff_analysis

__all__ = ['diff_analysis', 'compare_predictions', 'Analysis']

2 changes: 0 additions & 2 deletions vaep/data_handling.py
@@ -4,8 +4,6 @@
import numpy as np
import pandas as pd

# coverage


def coverage(X: pd.DataFrame, coverage_col: float, coverage_row: float):
"""Select proteins by column depending on their coverage.
2 changes: 0 additions & 2 deletions vaep/filter.py
@@ -4,8 +4,6 @@

logger = logging.getLogger(__name__)

# ! use in data selection and tutorial


def select_features(df: pd.DataFrame,
feat_prevalence: float = .2,
3 changes: 1 addition & 2 deletions vaep/models/__init__.py
@@ -16,8 +16,7 @@
from fastcore.foundation import L

import vaep

from . import ae, analysis, collab, vae
from vaep.models import ae, analysis, collab, vae

logger = logging.getLogger(__name__)

2 changes: 1 addition & 1 deletion vaep/models/ae.py
@@ -21,7 +21,7 @@
import vaep.models
import vaep.transform

from . import analysis
from vaep.models import analysis

logger = logging.getLogger(__name__)

3 changes: 1 addition & 2 deletions vaep/models/collab.py
@@ -11,8 +11,7 @@

import vaep.io.dataloaders
import vaep.io.datasplits

from . import analysis
from vaep.models import analysis

logger = logging.getLogger(__name__)

34 changes: 28 additions & 6 deletions vaep/pandas/__init__.py
@@ -7,7 +7,30 @@
import omegaconf
import pandas as pd

from .calc_errors import calc_errors_per_feat, get_absolute_error
from vaep.pandas.calc_errors import calc_errors_per_feat, get_absolute_error

__all__ = [
'calc_errors_per_feat',
'get_absolute_error',
'unique_cols',
'get_unique_non_unique_columns',
'prop_unique_index',
'replace_with',
'index_to_dict',
'get_columns_accessor',
'get_columns_accessor_from_iterable',
'select_max_by',
'get_columns_namedtuple',
'highlight_min',
'_add_indices',
'interpolate',
'flatten_dict_of_dicts',
'key_map',
'parse_query_expression',
'length',
'get_last_index_matching_proportion',
'get_lower_whiskers',
'get_counts_per_bin']


def unique_cols(s: pd.Series) -> bool:
@@ -285,16 +308,15 @@ def get_lower_whiskers(df: pd.DataFrame, factor: float = 1.5) -> pd.Series:
return ret


def get_counts_per_bin(df: pd.DataFrame, bins: range, columns: Optional[List[str]] = None) -> pd.DataFrame:
def get_counts_per_bin(df: pd.DataFrame,
bins: range,
columns: Optional[List[str]] = None) -> pd.DataFrame:
"""Return counts per bin for selected columns in DataFrame."""
counts_per_bin = dict()
if columns is None:
columns = df.columns.to_list()
for col in columns:
_series = (pd.cut(df[col], bins=bins)
.to_frame()
.groupby(col)
.size())
_series = (pd.cut(df[col], bins=bins).to_frame().groupby(col).size())
_series.index.name = 'bin'
counts_per_bin[col] = _series
counts_per_bin = pd.DataFrame(counts_per_bin)
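
For reference, a minimal usage sketch of the reformatted helper, based on the signature shown above (toy data, not from the repository):

```python
import pandas as pd

from vaep.pandas import get_counts_per_bin

# Count values per unit-width bin; by default all columns are used.
df = pd.DataFrame({"intensity": [21.5, 23.1, 22.8, 25.0]})
counts = get_counts_per_bin(df, bins=range(20, 27))
print(counts)  # one count column per input column, indexed by 'bin'
```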
7 changes: 2 additions & 5 deletions vaep/plotting/__init__.py
@@ -11,11 +11,8 @@
import seaborn

import vaep.pandas

from . import data, defaults, errors, plotly
from .errors import plot_rolling_error

# from . defaults import order_categories, labels_dict, IDX_ORDER
from vaep.plotting import data, defaults, errors, plotly
from vaep.plotting.errors import plot_rolling_error

seaborn.set_style("whitegrid")
# seaborn.set_theme()
