Merge pull request #4 from mlexchange/organize
Organize
taxe10 authored Sep 24, 2024
2 parents ee63e70 + 9a4f474 commit a3e7d2a
Showing 21 changed files with 697 additions and 120 deletions.
7 changes: 7 additions & 0 deletions .flake8
@@ -0,0 +1,7 @@
[flake8]
# 127 is width of the Github code viewer,
# black default is 88 so this will only warn about comments >127
max-line-length = 127
# Ignore errors due to incompatibility with black
#https://black.readthedocs.io/en/stable/guides/using_black_with_other_tools.html
extend-ignore = E203,E701
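
The settings above can be read back programmatically, which is handy when another tool needs the same line-length limit. The following is a minimal sketch and not part of this commit: it assumes the `.flake8` file is plain INI text, uses only the standard library's `configparser`, and the helper name `read_flake8_settings` is hypothetical.

```python
import configparser

# Inline copy of the [flake8] section shown above, so the sketch is self-contained.
FLAKE8_TEXT = """\
[flake8]
# 127 is width of the Github code viewer,
# black default is 88 so this will only warn about comments >127
max-line-length = 127
# Ignore errors due to incompatibility with black
extend-ignore = E203,E701
"""


def read_flake8_settings(text: str) -> dict:
    """Parse flake8-style INI text and return the two settings defined above."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["flake8"]
    return {
        "max_line_length": section.getint("max-line-length"),
        # Comma-separated error codes become a clean list of strings.
        "extend_ignore": [code.strip() for code in section.get("extend-ignore").split(",")],
    }


settings = read_flake8_settings(FLAKE8_TEXT)
print(settings)
```

To read the real file instead of the inline string, `parser.read(".flake8")` could replace `parser.read_string(text)`.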
33 changes: 33 additions & 0 deletions .github/workflows/python-app.yml
@@ -0,0 +1,33 @@
name: dimension_reduction_pca

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.9
      uses: actions/setup-python@v5
      with:
        python-version: 3.9
        cache: 'pip'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install .
        pip install .[dev]
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest
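
The lint step above runs flake8 twice: a strict pass that fails the build on syntax errors and undefined names, then an advisory pass whose findings are only reported. A rough way to reproduce that behaviour locally is sketched below. It is not part of this commit; the function name `run_lint` and the injectable `runner` argument are hypothetical conveniences, and the flake8 arguments are copied from the workflow.

```python
import subprocess

# Pass 1: only syntax errors and undefined names; a non-zero exit should fail the build.
STRICT = ["flake8", ".", "--count", "--select=E9,F63,F7,F82", "--show-source", "--statistics"]

# Pass 2: all remaining checks; --exit-zero makes flake8 report findings as warnings.
ADVISORY = [
    "flake8", ".", "--count", "--exit-zero",
    "--max-complexity=10", "--max-line-length=127", "--statistics",
]


def run_lint(runner=subprocess.call) -> int:
    """Run both passes in order and return the exit code of the strict pass."""
    strict_code = runner(STRICT)
    if strict_code != 0:
        # Mirror the CI step, which stops as soon as the strict pass fails.
        return strict_code
    runner(ADVISORY)  # exit code intentionally ignored, as with --exit-zero
    return 0
```

With flake8 installed, calling `run_lint()` from the repository root should behave like the CI step; passing a fake `runner` makes the ordering easy to test without flake8.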
187 changes: 184 additions & 3 deletions .gitignore
@@ -1,5 +1,186 @@
*~
data/output/
data/upload/
.file_manager_vars
data/upload/
build/


.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
data/output/
data/upload/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
EC-lab
lab book
LTspice files
TRS
proc
ESpectrum Stream 00012.bin
Frame Stream 00012.bin
Metadata 00012.json
test.zarr
misc
test_fast.zarr
zarr_utils.py


data/

.vscode/
34 changes: 34 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,34 @@
default_language_version:
  python: python3
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-ast
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-yaml
      - id: debug-statements
  - repo: https://github.com/gitguardian/ggshield
    rev: v1.25.0
    hooks:
      - id: ggshield
        language_version: python3
        stages: [commit]
  # Using this mirror lets us use mypyc-compiled black, which is about 2x faster
  - repo: https://github.com/psf/black-pre-commit-mirror
    rev: 24.2.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
        args: ["--profile", "black"]
14 changes: 14 additions & 0 deletions Dockerfile
@@ -0,0 +1,14 @@
FROM python:3.9
LABEL maintainer="THE MLEXCHANGE TEAM"

RUN apt-get update
RUN pip3 install --upgrade pip && \
    pip3 install .

WORKDIR /app/work
ENV HOME=/app/work
ENV PYTHONUNBUFFERED=1

COPY umap_run.py umap_run.py
COPY src src
CMD ["echo", "running umap"]
7 changes: 5 additions & 2 deletions Makefile
@@ -17,13 +17,16 @@ test:
	echo ${ID_USER}

build_docker:
	docker build -t ${IMG_WEB_SVC} -f ./docker/Dockerfile .
	docker build -t ${IMG_WEB_SVC} -f ./Dockerfile .

build_podman:
	podman build -t ghcr.io/runboj/mlex_dimension_reduction_umap:main -f ./Dockerfile .

run_docker:
	docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --memory-swap -1 -it -v ${PWD}/data:/app/work/data/ ${IMG_WEB_SVC} bash

UMAP_example:
	docker run -u ${ID_USER $USER}:${ID_GROUP $USER} --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --memory-swap -1 -it -v ${PWD}:/app/work/ ${IMG_WEB_SVC} python umap_run.py data/example_shapes/Demoshapes.npz data/output '{"n_components": 2, "min_dist": 0.1, "n_neighbors": 7}'
	docker run -u ${ID_USER $USER}:${ID_GROUP $USER} --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --memory-swap -1 -it -v ${PWD}:/app/work/ ${IMG_WEB_SVC} python src/mlex_dimension_reduction_umap/umap_run.py example_umap.yaml


push_docker:
31 changes: 29 additions & 2 deletions README.md
@@ -14,11 +14,38 @@ First, build the dimension reduction image in terminal:

Once built, you can run the following examples:
`make UMAP_example`
which is equivalent to first `make run_docker` then `python umap_run.py data/example_shapes/Demoshapes.npz data/output '{"n_components": 2, "min_dist": 0.1, "n_neighbors": 7}'`.
which is equivalent to first `make run_docker` then `python umap_run.py example_umap.yaml`.

These examples utilize the information stored in the folder /data. The computed latent vectors will be saved in data/output.

#### TODO: run the container interactively
## Developer Setup
If you are developing this library, there are a few things to note.

1. Install development dependencies:

```
pip install .
pip install ".[dev]"
```

2. Install pre-commit
This step sets up the pre-commit package. After this, each commit will be checked by flake8, black, and isort.

```
pre-commit install
```

3. (Optional) If you want to check what pre-commit would do before committing, you can run:

```
pre-commit run --all-files
```

4. To run test cases:

```
python -m pytest
```

## Copyright
MLExchange Copyright (c) 2023, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
18 changes: 0 additions & 18 deletions docker/Dockerfile

This file was deleted.

10 changes: 0 additions & 10 deletions docker/requirements.txt

This file was deleted.

20 changes: 20 additions & 0 deletions example_umap.yaml
@@ -0,0 +1,20 @@
# Example parameters to execute

# I/O
io_parameters:
  uid_retrieve: # uid for feature vectors from autoencoder
  data_type: 'file' # either "file" or "tiled"
  root_uri: https://tiled-seg.als.lbl.gov/api/v1/metadata/reconstruction/rec20190524_085542_clay_testZMQ_8bit
  data_uris: ['20190524_085542_clay_testZMQ_']
  data_tiled_api_key:
  result_tiled_uri: http://localhost:8888
  result_tiled_api_key:
  uid_save:
  output_dir: 'data/output'
  load_model_path:
  save_model_path:

model_parameters:
  n_components: 2
  min_dist: 0.1
  n_neighbors: 5
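
Since the run script now takes all of its settings from this YAML file, it can help to validate the `model_parameters` block before starting a run. The sketch below is illustrative only and does not show how `umap_run.py` actually does it: the class `UMAPParameters` and the function `parse_model_parameters` are hypothetical, the bounds are assumptions based on the usual UMAP constraints (at least one component, a non-negative `min_dist`, at least two neighbors), and the config is written as a plain dict mirroring the YAML so that no YAML library is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UMAPParameters:
    """Hypothetical container for the `model_parameters` block above."""

    n_components: int = 2
    min_dist: float = 0.1
    n_neighbors: int = 5

    def __post_init__(self):
        # Assumed bounds; check the UMAP documentation for the authoritative rules.
        if self.n_components < 1:
            raise ValueError("n_components must be at least 1")
        if self.min_dist < 0:
            raise ValueError("min_dist must be non-negative")
        if self.n_neighbors < 2:
            raise ValueError("n_neighbors must be at least 2")


def parse_model_parameters(config: dict) -> UMAPParameters:
    """Build validated parameters from a dict shaped like the YAML file."""
    return UMAPParameters(**config.get("model_parameters", {}))


# Dict mirroring the `model_parameters` section of example_umap.yaml.
example_config = {
    "model_parameters": {"n_components": 2, "min_dist": 0.1, "n_neighbors": 5}
}

params = parse_model_parameters(example_config)
print(params)
```

In a real script the dict would come from parsing the YAML file, and the validated fields could then be passed on to the UMAP estimator as keyword arguments.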