Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PTV-1882 upload fixes and improvements #189

Open
wants to merge 33 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
d9e2453
ignore local files that make creep into image [PTV-1882]
eapearson Jun 14, 2023
ca66236
ignore more IDEs and convenient developer dir [PTV-1882]
eapearson Jun 14, 2023
fce8e7c
update dependencies, reorder alphabetically, adds pytest-asyncio (par…
eapearson Jun 14, 2023
5b27706
Set of changes to support multiple fixes to upload, as well as necess…
eapearson Jun 14, 2023
81b9b09
move to python 3.11, simplify and improve [PTV-1882]
eapearson Jun 14, 2023
5894114
refactor entrypoint to use gunicorn and remove local dev support [PTV…
eapearson Jun 14, 2023
6e17a5c
testing fixes and improvements [PTV-1882]
eapearson Jun 14, 2023
7c679b6
Run tests in GHA with Python 3.11 [PTV-1882]
eapearson Jun 14, 2023
18aad93
Development tools [PTV-1882]
eapearson Jun 14, 2023
baa17da
Standalone small testing doc [PTV-1882]
eapearson Jun 14, 2023
9222cc3
making codacy, and me, happier [PTV-1882]
eapearson Jun 14, 2023
3651c5d
make random string generator deterministic [PTV-1882]
eapearson Jun 15, 2023
dc30414
more changes to make codacy and me happy, more to come [PTV-1882]
eapearson Jun 15, 2023
66def97
more refactoring to fix usage errors and warnings [PTV-1882]
eapearson Jun 15, 2023
e6efb80
more code quality fixes [PTV-1882]
eapearson Jun 15, 2023
1158081
just a few more linting fixes [PTV-1882]
eapearson Jun 15, 2023
e3164c9
a few more code quality fixes [PTV-1882]
eapearson Jun 15, 2023
2fb8db0
last of the "critical" code smell issues? [PTV-1882]
eapearson Jun 15, 2023
680d81f
couple more for sonarcloud [PTV-1882]
eapearson Jun 15, 2023
e53e1c3
finally fix the file-existence-check race condition [PTV-1882]
eapearson Jun 15, 2023
4479bc2
few more code quality fixes [PTV-1882]
eapearson Jun 15, 2023
1afa4d1
more code quality fixes, thanks to sonarlint [PTV-1882]
eapearson Jun 15, 2023
ba42da0
fix wrong parameter [PTV-1882]
eapearson Jun 15, 2023
61a2f03
don't run local tool container as root [PTV-1882]
eapearson Jun 15, 2023
428a928
does NOSONAR work? [PTV-1882]
eapearson Jun 15, 2023
b360863
exclude usage of md5() fom sonarcloud [PTV-1882]
eapearson Jun 15, 2023
d2ff902
run as kbapp [PTV-1882]
eapearson Jun 15, 2023
571de6f
codacy can get some satisfaction? [PTV-1882]
eapearson Jun 15, 2023
00a0d43
maybe this will make bandit happy [PTV-1882]
eapearson Jun 16, 2023
9ad78e4
paper over corrupt metadata file (race condition), write json not wri…
eapearson Jun 16, 2023
e22b98a
handle temp file better, as well as canceled uploads [PTV-1882]
eapearson Jun 16, 2023
3140c0a
add section for running dev server [PTV-1882]
eapearson Jun 16, 2023
5c36473
dev 'run' command improvement [PTV-1882]
eapearson Jun 16, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Even though the Dockerfile only copies over necessary files, certain
# artifacts may creep in that we should exclude from the image.
.DS_Store
9 changes: 2 additions & 7 deletions .github/workflows/run_tests.yml
Original file line number Diff line number Diff line change
@@ -1,22 +1,17 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Run Staging Service Tests

on:
[pull_request]

jobs:
build:

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
- name: Set up Python 3.11
uses: actions/setup-python@v2
with:
python-version: 3.9
python-version: 3.11
- name: Install dependencies
run: |
python -m pip install --upgrade pip
Expand Down
10 changes: 9 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ celerybeat-schedule
.venv
venv/
ENV/
Pipfile

# Spyder project settings
.spyderproject
Expand All @@ -100,8 +101,15 @@ ENV/
# mypy
.mypy_cache/

# IDEs
.idea
.fleet

# misc

data/
.DS_Store
.virtualenvs/
test.env
run_tests_single.sh
run_tests_single.sh
local_work/
34 changes: 15 additions & 19 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,37 +1,33 @@
FROM python:3.9-slim-buster
FROM python:3.11.4-slim-buster
# -----------------------------------------
RUN mkdir -p /kb/deployment/lib
RUN apt-get update && apt-get install -y --no-install-recommends apt-utils
RUN apt-get install -y zip && \
apt-get install -y unzip && \
apt-get install -y bzip2 && \
apt-get install -y libmagic-dev


RUN apt-get install -y htop wget
RUN apt-get update && \
apt-get install -y --no-install-recommends zip=3.0-11+b1 unzip=6.0-23+deb10u3 bzip2=1.0.6-9.2~deb10u2 libmagic-dev=1:5.35-4+deb10u2 htop=2.2.0-1+b1 wget=1.20.1-1.1 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Standard simplified python setup.
COPY ./requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
RUN python -m pip install "pip==23.1.2" && pip install -r /requirements.txt

COPY ./ /kb/module
COPY ./globus.cfg /etc/globus.cfg
RUN touch /var/log/globus.log && chmod 777 /var/log/globus.log
RUN cp -r /kb/module/staging_service /kb/deployment/lib
RUN cp -r /kb/module/deployment /kb

COPY ./staging_service /kb/deployment/lib/staging_service
COPY ./deployment /kb/deployment

# TODO: should be port 5000 to match other kbase services
EXPOSE 3000

WORKDIR /kb/deployment/lib


# The BUILD_DATE value seem to bust the docker cache when the timestamp changes, move to
# the end
LABEL org.label-schema.build-date=$BUILD_DATE \
org.label-schema.vcs-url="https://github.com/kbase/staging_service.git" \
org.label-schema.vcs-ref=$VCS_REF \
org.label-schema.schema-version="1.1.8" \
us.kbase.vcs-branch=$BRANCH \
maintainer="Steve Chan [email protected]"
org.label-schema.vcs-url="https://github.com/kbase/staging_service.git" \
org.label-schema.vcs-ref=$VCS_REF \
org.label-schema.schema-version="1.1.8" \
us.kbase.vcs-branch=$BRANCH \
maintainer="Steve Chan [email protected]"

ENTRYPOINT ["/kb/deployment/bin/entrypoint.sh"]
45 changes: 34 additions & 11 deletions deployment/bin/entrypoint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,40 @@
# please also update the launch.json for vscode accordingly
# the things that must be kept in sync are KB_DEPLOYMENT_CONFIG and PYTHONPATH

#top section for local running
DIR="$( cd "$( dirname "$0" )" && pwd )"
if [ -d "$DIR/../../staging_service" ]; then
PYTHONPATH="$DIR/../../staging_service"
export KB_DEPLOYMENT_CONFIG="$DIR/../conf/local.cfg"
export FILE_LIFETIME="90"
fi

#bottom section for running inside docker
if [ -d "kb/deployment/lib/staging_service" ]; then
PYTHONPATH="kb/deployment/lib/staging_service"
# environment variable for KB_DEPLOYMENT_CONFIG set in docker-compose.yml
echo "staging_service not installed in /kb/deployment/lib as expected"
exit 1
fi
python3 -m staging_service

PYTHONPATH="kb/deployment/lib/staging_service"

# disabled in favor of running behind gunicorn
# python3 -m staging_service

# The port at which to run this service.
# TODO: this should probably be 5000 in order to operate like other KBase services.
# Who likes surprises like this?
SERVICE_PORT="${SERVICE_PORT:-3000}"

# See https://docs.gunicorn.org/en/latest/run.html
# Recommended to have 2-4 workers per core. Default assumes 1 core.
GUNICORN_WORKERS="${GUNICORN_WORKERS:-4}"

# Workers silent for more than this many seconds are killed and restarted.
# This setting affects the /upload endpoint, as it processes files after
# upload, calculating the md5 and potentially, in the future, additional tasks
# such as validating the file.
# Other endpoints may encounter a delay as well, as any endpoint which access
# metadata may trigger a metadata rebuild, including the md5.
# Default to 10 minutes, a generous amount of time to wait for this to complete.
# Alternative designs could solve this in a different way.
GUNICORN_TIMEOUT="${GUNICORN_TIMEOUT:-600}"

# TODO: should be port 5000 to match other kbase services
gunicorn staging_service.main:web_app \
--bind 0.0.0.0:"${SERVICE_PORT}" \
--worker-class aiohttp.GunicornWebWorker \
--pythonpath "${PYTHONPATH}" \
--workers "${GUNICORN_WORKERS}" \
--timeout "${GUNICORN_TIMEOUT}"
27 changes: 27 additions & 0 deletions development/docker-compose-kbase-ui.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# This docker compose file works with kbase-ui - specifically the kbase-dev
# network that allows proxying service requests from kbase-ui and the narrative.
version: "3.8"
networks:
kbase-dev:
name: kbase-dev
services:
staging_service:
hostname: staging_service
build:
context: .
ports:
# TODO: should be port 5000 to match other kbase services
- "3010:3000"
volumes:
- "./data:/kb/deployment/lib/src/data"
- "./staging_service:/kb/deployment/lib/staging_service"
networks:
- kbase-dev
dns:
- 8.8.8.8
environment:
- KB_DEPLOYMENT_CONFIG=/kb/deployment/conf/deployment.cfg
- FILE_LIFETIME=90
# - SAVE_STRATEGY=SAVE_STRATEGY_TEMP_THEN_COPY
# - SAVE_STRATEGY=SAVE_STRATEGY_SAVE_TO_DESTINATION
# - CHUNK_SIZE=
13 changes: 13 additions & 0 deletions development/scripts/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# Scripts

## `generate-data-files.py`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these files also appear in the other PR -- should one of the PRs be merged first and then the other one or is there duplicate content?


A small tool, an example function really, to generate files for manual testing of upload and download. I use it by copy/pasting into a Python REPL session in the directory in which I want to generate files, and just run the function with the required parameters.

## `list-uploads.sh`

Handy for listing the contents of the `./data` directory while developing the `/upload` endpoint of the server locally. It simply lists the directory every 0.5s.

## `run`

TODO
15 changes: 15 additions & 0 deletions development/scripts/generate-data-files.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
import io


def generate_null_file(size: int, name: str):
"""
Generate a file full of 0 bytes, for testing different
file size scenarios.

e.g. generate_null_file(5100000000, "5.1g.out")
"""
with open(name, "wb") as out:
bw = io.BufferedWriter(out, 10000000)
for _ in range(size):
bw.write(b"\0")
bw.flush()
13 changes: 13 additions & 0 deletions development/scripts/list-uploads.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#!/bin/bash

#
# A handy script to display the contents of the data directory in a
# terminal window, refreshing every 0.5 seconds and excluding
# any macOS .DS_Store Desktop Services files.
#
while :
do
clear
find data -type f \( ! -name ".DS_Store" \) -ls
sleep 0.5
done
8 changes: 8 additions & 0 deletions development/scripts/run
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#!/bin/sh
PROJECT_DIRECTORY=${DIR:-$PWD}
echo "Project Directory: ${PROJECT_DIRECTORY}"
docker compose \
-f "${PROJECT_DIRECTORY}/development/tools/runner/docker-compose.yml" \
--project-directory "${PROJECT_DIRECTORY}" \
--project-name tools \
run --rm runner "${@:1}"
7 changes: 7 additions & 0 deletions development/scripts/run-dev-server.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#!/bin/bash

echo "Running server in development..."
docker compose \
-f development/docker-compose-kbase-ui.yml \
--project-directory "${PWD}" \
run --rm staging_service
29 changes: 29 additions & 0 deletions development/tools/runner/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
FROM python:3.11.4-slim-buster
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment at the top of the file to explain what this dockerfile is for?

# -----------------------------------------
RUN apt-get update && \
apt-get install -y --no-install-recommends zip=3.0-11+b1 unzip=6.0-23+deb10u3 bzip2=1.0.6-9.2~deb10u2 libmagic-dev=1:5.35-4+deb10u2 htop=2.2.0-1+b1 wget=1.20.1-1.1 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

SHELL ["/bin/bash", "-c"]

RUN mkdir -p /app

WORKDIR /app

# Standard simplified python setup.
COPY ./requirements.txt /app/requirements.txt
RUN python -m pip install "pip==23.1.2" && pip install -r /app/requirements.txt

ENV PATH="/app:${PATH}"

# Run as a regular user, not root.
RUN addgroup --system kbapp && \
adduser --system --ingroup kbapp kbapp && \
chown -R kbapp:kbapp /app

USER kbapp

ENTRYPOINT [ "/app/development/tools/runner/entrypoint.sh" ]

CMD [ "bash" ]
10 changes: 10 additions & 0 deletions development/tools/runner/docker-compose.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
version: '3.8'
services:
runner:
build:
context: .
dockerfile: development/tools/runner/Dockerfile
volumes:
- .:/app
command:
- bash
17 changes: 17 additions & 0 deletions development/tools/runner/entrypoint.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
#!/bin/bash

set -e

# Nice to set this up for all future python code being run.
export PYTHONPATH="${PWD}/src"

#
# This execs whatever is provided as a COMMAND to the container. By default, as established
# in the image via the Dockerfile, it is scripts/start-server.sh
# This mechanism, though, allows us to run anything inside the container. We use this
# for running tests, mypy, black, etc. through docker compose.
# Note that surrounding $@ with double quote causes the expansion (which starts with
# parameter 1 not 0) to result in each parameter being retained without splitting.
# Sort of like "$@" -> "$1" "$2" "$3" ..
#
exec "$@"
72 changes: 72 additions & 0 deletions docs/testing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# Testing

## In Container

You can run tests in a container which uses the same base image and uses the same dependencies. (This container can also run other python tasks.)

```shell
./development/scripts/run run_tests.sh
```

To run tests in individual test file you may supply the path to it. By default, the tests run against `tests/`.

```shell
./development/scripts/run run_tests.sh tests/test_app.py
```

## On Host

### install or activate python 3.11

E.g. macOS with macports:

```shell
sudo port install python311
sudo port select --set python python311
```

### install venv

```shell
python -m venv venv
```

### update pip

```shell
python -m pip install --upgrade pip
```

## Running Dev Server

For integration tests, the server can be stood up inside a docker container:

```shell
./development/scripts/run-dev-server.sh
```

### install deps

```shell
pip install -r requirements.txt
```

### run tests

```shell
./run_tests.sh
```

### Other

#### cherrypick tests

e.g. to just test app.py:

```shell
./run_tests.sh tests/test_app.py
```

#### coverage

check coverage in `htmlcov/index.html`
Loading