GitHub wf on self-hosted runner #163

Merged
merged 50 commits into from
Sep 20, 2024
Changes from 47 commits
Commits
24da6d8
reorganizing tests #2
fabiocat93 Sep 18, 2024
01d5961
reorganizing tests #3
fabiocat93 Sep 18, 2024
6475e35
reorganizing tests #6
fabiocat93 Sep 18, 2024
22bba37
reorganizing tests #8
fabiocat93 Sep 18, 2024
6c38697
reorganizing tests #9
fabiocat93 Sep 18, 2024
c4b2ea1
reorganizing tests #10
fabiocat93 Sep 18, 2024
8deeacd
reorganizing tests #11
fabiocat93 Sep 18, 2024
4007ae8
reorganizing tests #12
fabiocat93 Sep 18, 2024
72cc0b1
reorganizing tests #14
fabiocat93 Sep 18, 2024
63db09b
reorganizing tests #16
fabiocat93 Sep 18, 2024
5809725
reorganizing tests #17
fabiocat93 Sep 18, 2024
7dadfbb
reorganizing tests #19
fabiocat93 Sep 18, 2024
48a9b53
reorganizing tests #20
fabiocat93 Sep 18, 2024
362ce38
reorganizing tests #21
fabiocat93 Sep 18, 2024
7c68b83
reorganizing tests #22
fabiocat93 Sep 18, 2024
d565d11
trying with a new machine #2
fabiocat93 Sep 18, 2024
134842f
trying with a new machine #3
fabiocat93 Sep 18, 2024
f95115a
moving to in CI
fabiocat93 Sep 19, 2024
08cf32b
moving to in CI #2
fabiocat93 Sep 19, 2024
289253d
moving to in CI #3
fabiocat93 Sep 19, 2024
64b42f1
moving to in CI #5
fabiocat93 Sep 19, 2024
0b9b89e
moving to in CI #6
fabiocat93 Sep 19, 2024
883553b
moving to in CI #7
fabiocat93 Sep 19, 2024
af19160
moving to in CI #8
fabiocat93 Sep 19, 2024
155603a
adjusting speech enhancement test
fabiocat93 Sep 19, 2024
c63cb74
fixing github token issue #4
fabiocat93 Sep 19, 2024
384ce07
fixing transcribe_timestamped workflow
fabiocat93 Sep 19, 2024
ba1ffac
removing transcribe_timestamped wf #3
fabiocat93 Sep 19, 2024
e3d5acc
fixing issues with cuda #2
fabiocat93 Sep 19, 2024
3548b61
fixing speech enhancing test
fabiocat93 Sep 19, 2024
5485819
fixing speech enhancing test #3
fabiocat93 Sep 19, 2024
83fe489
fixing speech enhancing test #5
fabiocat93 Sep 19, 2024
927ecdd
codecov
fabiocat93 Sep 19, 2024
d98d0ce
codecov #2
fabiocat93 Sep 20, 2024
71c2f66
codecov #3
fabiocat93 Sep 20, 2024
79d1ce9
codecov #4
fabiocat93 Sep 20, 2024
48d61cd
restructure yaml file #3
fabiocat93 Sep 20, 2024
8d58674
restructure yaml file #4
fabiocat93 Sep 20, 2024
7f4acae
splitting the flows #2
fabiocat93 Sep 20, 2024
08ef6db
fixing poetry cache directory
fabiocat93 Sep 20, 2024
f407e64
moving back the repository folder
fabiocat93 Sep 20, 2024
d551ff2
copy instead of mv
fabiocat93 Sep 20, 2024
972b463
Added fixture for cache clearing
wilke0818 Sep 20, 2024
af99bea
Merge branch 'main' into github-wf
wilke0818 Sep 20, 2024
4d98d99
Added docstring
wilke0818 Sep 20, 2024
2f31f42
fixing style issue
fabiocat93 Sep 20, 2024
3a11d97
add check for gpu tests
fabiocat93 Sep 20, 2024
bbd9646
adding tests on gpu for ubuntu-python 3.11
fabiocat93 Sep 20, 2024
f430b41
playing with the flow #2
fabiocat93 Sep 20, 2024
a8aa72b
cleaning
fabiocat93 Sep 20, 2024
149 changes: 149 additions & 0 deletions .github/workflows/e2c-runner-tests-310.yaml
Collaborator:

Should we go ahead and add 3.11 and 3.12 before merging?

Collaborator Author:

I have added 3.11 and would leave it like this for now. We can add 3.12 in the future (I first want to check how much money we spend with 3.10 and 3.11).

@@ -0,0 +1,149 @@
name: e2c-runner-tests-310

on:
  pull_request:
    types: [opened, synchronize, reopened, labeled]

jobs:
  start-runner:
    if: github.event.pull_request.draft == false && contains(github.event.pull_request.labels.*.name, 'to-test-gpu')
    name: start-runner
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
      job-ran: ${{ steps.set-ran.outputs.ran }}
    steps:
      - id: set-ran
        run: echo "::set-output name=ran::true"
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_KEY_SECRET }}
          aws-region: ${{ vars.AWS_REGION }}
      - name: Start EC2 runner
        id: start-ec2-runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_TOKEN }}
          ec2-image-id: ${{ vars.AWS_IMAGE_ID }}
          ec2-instance-type: ${{ vars.AWS_INSTANCE_TYPE }}
          subnet-id: ${{ vars.AWS_SUBNET }}
          security-group-id: ${{ vars.AWS_SECURITY_GROUP }}


  ubuntu-tests-310:
    name: ubuntu-tests-310
    needs: start-runner
    runs-on: ${{ needs.start-runner.outputs.label }}
    defaults:
      run:
        shell: bash
        working-directory: ${{ vars.WORKING_DIR }}
    strategy:
      matrix:
        python-version: ['3.10']
    env:
      WORKING_DIR: ${{ vars.WORKING_DIR }}
      POETRY_CACHE_DIR: ${{ vars.WORKING_DIR }}
    outputs:
      job-ran: ${{ steps.set-ran.outputs.ran }}
    steps:
      - id: set-ran
        run: echo "::set-output name=ran::true"
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1 # no need for the history
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install ffmpeg (Ubuntu)
        if: startsWith(matrix.os, 'ubuntu')
        run: sudo apt-get update && sudo apt-get install -y ffmpeg
        shell: bash
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: 1.7.1
          virtualenvs-create: true
          virtualenvs-in-project: true
      - name: Check available space
        run: |
          df -h
        shell: bash
      - name: Echo python info
        run: |
          python --version
          which python
        shell: bash
      - name: Copy senselab directory to current directory
        run: |
          cp -r /actions-runner/_work/senselab/senselab .
      - name: Install dependencies with Poetry
        run: |
          cd senselab
          poetry env use ${{ matrix.python-version }}
          poetry run pip install iso-639
          poetry install --with dev
        shell: bash
      - name: Check poetry info
        run: |
          cd senselab
          poetry env info
          poetry --version
        shell: bash
      - name: Check NVIDIA SMI details
        run: |
          cd senselab
          poetry run nvidia-smi
          poetry run nvidia-smi -L
          poetry run nvidia-smi -q -d Memory
        shell: bash
      - name: Prepare cache folder for pytest
        run: mkdir -p $WORKING_DIR/pytest/temp
        shell: bash
      - name: Run unit tests
        id: run-tests
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: >
          cd senselab && poetry run pytest \
            --rootdir=$WORKING_DIR/pytest \
            --basetemp=$WORKING_DIR/pytest/temp \
            --junitxml=pytest.xml \
            --cov-report=term-missing:skip-covered \
            --cov-report=xml:coverage.xml \
            --cov=src src/tests \
            --log-level=DEBUG \
            --verbose
        shell: bash

  stop-runner:
    name: stop-runner
    needs:
      - start-runner # waits for the EC2 instance to be created
      - ubuntu-tests-310 # waits for the actual job to finish
    runs-on: ubuntu-latest
    if: ${{ needs.start-runner.outputs.job-ran == 'true' && needs.ubuntu-tests-310.outputs.job-ran == 'true' || failure() }} # required to stop the runner even if an error occurred in previous jobs
    steps:
      - name: Check available space
        run: |
          df -h
        shell: bash
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_KEY_SECRET }}
          aws-region: ${{ vars.AWS_REGION }}
      - name: Stop EC2 runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.GH_TOKEN }}
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
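
Note on the `stop-runner` condition: the `if:` expression combines `&&` and `||`, so it may help to spell out how it evaluates. The sketch below is only a Python model of that expression, not part of the workflow. The function and parameter names are made up for illustration, and it assumes job outputs arrive as strings (empty when a job was skipped) and that `failure()` is true when a job listed in `needs` has failed.

```python
def should_stop_runner(start_job_ran: str, tests_job_ran: str, ancestor_failed: bool) -> bool:
    """Model of: start.job-ran == 'true' && tests.job-ran == 'true' || failure().

    `&&` binds tighter than `||`, so the two output checks are grouped first.
    All names here are hypothetical; only the shape of the condition comes
    from the workflow above.
    """
    both_jobs_ran = start_job_ran == "true" and tests_job_ran == "true"
    return both_jobs_ran or ancestor_failed


# Label missing: both jobs were skipped, outputs are empty, nothing failed.
print(should_stop_runner("", "", False))
# Runner started but the test job failed before or after setting its output.
print(should_stop_runner("true", "", True))
```

Under these assumptions the runner is stopped after a normal run and after a failed run, and the stop job is skipped when the GPU label was never applied and no instance was started.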
Original file line number Diff line number Diff line change
@@ -1,55 +1,43 @@
name: Python Tests
name: github-runner-tests

on:
pull_request:
types: [opened, synchronize, reopened, labeled]

jobs:
unit:
macos-tests:
if: github.event.pull_request.draft == false && contains(github.event.pull_request.labels.*.name, 'to-test')
name: macOS-tests
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
fail-fast: true
matrix:
include:
- {os: ubuntu-latest, architecture: x64, python-version: '3.10'}
- {os: ubuntu-latest, architecture: x64, python-version: '3.11'}
- {os: ubuntu-latest, architecture: x64, python-version: '3.12'}
- {os: macos-latest, architecture: x64, python-version: '3.10'}
- {os: macos-latest, architecture: arm64, python-version: '3.10'}
- {os: macos-latest, architecture: x64, python-version: '3.11'}
- {os: macos-latest, architecture: arm64, python-version: '3.11'}
- {os: macos-latest, architecture: x64, python-version: '3.12'}
- {os: macos-latest, architecture: arm64, python-version: '3.12'}
# - {os: windows-latest, architecture: x64, python-version: '3.10'}
# - {os: windows-latest, architecture: x64, python-version: '3.11'}
env:
GITHUB_ACTIONS: true
# - {os: macos-latest, architecture: arm64, python-version: '3.11'}
# the reason why we commented out 3.11 is that it hits github rate limit for some modules (e.g., knn-vc, Camb-ai/mars5-tts)
steps:
- uses: actions/checkout@v4
with: # no need for the history
fetch-depth: 1
with:
fetch-depth: 1 # no need for the history
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install ffmpeg (Ubuntu)
if: startsWith(matrix.os, 'ubuntu')
run: sudo apt-get update && sudo apt-get install -y ffmpeg
shell: bash
- name: Install ffmpeg (macOS)
if: startsWith(matrix.os, 'macos')
run: brew install ffmpeg
- name: Install ffmpeg (Windows)
if: startsWith(matrix.os, 'windows')
run: choco install ffmpeg

- name: Install pipx and ensure it's up to date
run: |
python -m pip install --upgrade pipx
pipx ensurepath
shell: bash
- name: Install poetry
run: pipx install poetry==1.7.1
shell: bash
- name: Install Poetry
uses: snok/install-poetry@v1
with:
version: 1.7.1
virtualenvs-create: true
virtualenvs-in-project: true
- name: Install dependencies with Poetry
run: |
poetry run pip install iso-639
@@ -58,9 +46,10 @@ jobs:
- name: Run unit tests
id: run-tests
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: >
poetry run pytest \
poetry run pytest -n auto \
--junitxml=pytest.xml \
--cov-report=term-missing:skip-covered \
--cov-report=xml:coverage.xml \
@@ -74,11 +63,12 @@ jobs:
token: ${{ secrets.CODECOV_TOKEN }}

pre-commit:
if: github.event.pull_request.draft == false && contains(github.event.pull_request.labels.*.name, 'to-test')
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest] # For demonstration, other OSes are commented out: macos-latest, windows-latest
python-version: ['3.10'] # For speeding up the process we removed "3.11" for now
os: [ubuntu-latest]
python-version: ['3.10']
steps:
- uses: actions/checkout@v4
with: # no need for the history
@@ -87,14 +77,12 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install pipx and ensure it's up to date
run: |
python -m pip install --upgrade pipx
pipx ensurepath
shell: bash
- name: Install poetry
run: pipx install poetry==1.7.1
shell: bash
- name: Install Poetry
uses: snok/install-poetry@v1
with:
version: 1.7.1
virtualenvs-create: true
virtualenvs-in-project: true
- name: Install dependencies with Poetry
run: |
poetry run pip install iso-639
@@ -104,8 +92,6 @@ jobs:
run: pipx install pre-commit
shell: bash
- name: Run pre-commit
env:
SKIP: pytest
run: |
poetry run pre-commit run --all-files
shell: bash
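
Note on the label gating used by these jobs: each job runs only when the pull request is not a draft and carries a specific label (`to-test` here, `to-test-gpu` for the EC2 workflow). The sketch below models that check in Python under stated assumptions: the payload is reduced to the two fields the expression reads, the function name is invented for illustration, and string comparison is treated as case-insensitive, which is how the GitHub expression documentation describes it.

```python
from typing import Any, Dict


def should_run(pull_request: Dict[str, Any], required_label: str) -> bool:
    """Model of: draft == false && contains(labels.*.name, required_label).

    `labels.*.name` yields a list of label names, so the check is a match on
    whole list elements, not a substring search.
    """
    if pull_request.get("draft", False):
        return False
    label_names = [label["name"].casefold() for label in pull_request.get("labels", [])]
    return required_label.casefold() in label_names


gpu_only_pr = {"draft": False, "labels": [{"name": "to-test-gpu"}]}
print(should_run(gpu_only_pr, "to-test-gpu"))
print(should_run(gpu_only_pr, "to-test"))
```

One consequence of the element-wise match is that a pull request labeled only `to-test-gpu` does not trigger the jobs gated on `to-test`, even though one label name is a prefix of the other.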
10 changes: 0 additions & 10 deletions .pre-commit-config.yaml
@@ -73,13 +73,3 @@ repos:
entry: YAML files must have .yaml extension.
language: fail
files: \.yml$

- repo: local
hooks:
- id: pytest
name: pytest
entry: poetry run pytest --testmon
language: system
types: [python]
pass_filenames: false
always_run: true
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -55,7 +55,7 @@ If you feel that the functionality you have added to senselab requires some extr

### An example of well documented function following Google-style

````
```python
import statistics
from typing import Dict, List

@@ -99,4 +99,4 @@ def calculate_statistics(data: List[float]) -> Dict[str, float]:
'variance': variance,
'std_dev': std_dev
}
````
```
6 changes: 2 additions & 4 deletions pyproject.toml
@@ -20,7 +20,6 @@ classifiers = [
"Development Status :: 3 - Alpha",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent"
]
@@ -62,16 +61,15 @@ vocos = "~=0.1"
optional = true

[tool.poetry.group.dev.dependencies]
pytest = "~=8.2"
pytest-xdist = {version = "~=3.6.1", extras = ["psutil"]}
pytest-mock = "~=3.14"
pytest-cov = "~=5.0"
mypy = "~=1.9"
pre-commit = "~=3.7"
pytest-cov = "~=5.0"
ruff = "~=0.3"
codespell = "~=2.3"
jupyter = "~=1.0"
ipywidgets = "~=8.1"
pytest-testmon = "~=2.1.1"

[tool.poetry.group.docs]
optional = true
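
Note on the `~=` pins in the dev group: these are PEP 440 "compatible release" specifiers, so `~=3.6.1` allows `3.6.x` from `3.6.1` upward while `~=8.2` allows any `8.x` from `8.2` upward. The sketch below is a simplified illustration rather than how Poetry resolves versions. It assumes purely numeric release segments (no pre-release, post-release, or local version parts), and the helper names are invented for this note.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Split a purely numeric version such as '3.6.1' into (3, 6, 1)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate: str, spec: str) -> bool:
    """Check `candidate` against a '~=X.Y[.Z]' compatible-release spec.

    '~=X.Y.Z' is treated as '>= X.Y.Z, == X.Y.*': the last component may
    grow, everything before it must match exactly.
    """
    if not spec.startswith("~="):
        raise ValueError("only '~=' specifiers are handled in this sketch")
    base = parse_release(spec[2:])
    if len(base) < 2:
        raise ValueError("a compatible-release spec needs at least two components")
    cand = parse_release(candidate)
    # Pad with zeros so that 8.2 and 8.2.0 compare as equal.
    width = max(len(base), len(cand))
    base_padded = base + (0,) * (width - len(base))
    cand_padded = cand + (0,) * (width - len(cand))
    prefix = base[:-1]
    return cand_padded >= base_padded and cand_padded[: len(prefix)] == prefix


print(is_compatible("3.6.5", "~=3.6.1"))
print(is_compatible("3.7.0", "~=3.6.1"))
print(is_compatible("8.3.2", "~=8.2"))
```

This is why the three-component pin on `pytest-xdist` is tighter than the two-component pin on `pytest`: the former freezes the minor version, the latter only the major version.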
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
"""Workflow for timestamped transcription."""

"""
# TODO: Please double-check this because tests are failing
from senselab.audio.workflows.transcribe_timestamped.transcribe_timestamped import transcribe_timestamped

__all__ = ["transcribe_timestamped"]
"""
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
"""Transcribes audio files with timestamps."""

'''
# TODO: Please double-check this because tests are failing
from typing import List

import pydra
@@ -77,7 +79,7 @@ def transcribe_task(audios: List[Audio], model: HFModel, language: Language) ->
model=wf.lzin.model,
language=wf.lzin.language,
)
).split("batched_audios", batched_audios=wf.inputs.batched_audios)
).split("batched_audios", batched_audios=wf.transcribe.lzin.batched_audios)

align_transcriptions_task = pydra.mark.task(align_transcriptions)
wf.add(
@@ -99,3 +101,4 @@ def transcribe_task(audios: List[Audio], model: HFModel, language: Language) ->
sub(wf)

return wf.result()[0].output.aligned_transcriptions
'''
9 changes: 7 additions & 2 deletions src/senselab/text/tasks/embeddings_extraction/huggingface.py
@@ -78,6 +78,9 @@ def extract_text_embeddings(
device, _ = _select_device_and_dtype(
user_preference=device, compatible_devices=[DeviceType.CUDA, DeviceType.CPU]
)

print(f"Using device: {device}")

# Load tokenizer and model
tokenizer = cls._get_tokenizer(model=model)
ssl_model = cls._load_model(model=model, device=device)
@@ -87,13 +90,15 @@ def extract_text_embeddings(
# Process each piece of text individually
for text in pieces_of_text:
# Tokenize sentence
encoded_input = tokenizer(text, return_tensors="pt").to(device)
encoded_input = tokenizer(text, return_tensors="pt").to(device.value)

# Compute token embeddings
with torch.no_grad():
model_output = ssl_model(**encoded_input, output_hidden_states=True)
hidden_states = model_output.hidden_states
concatenated_hidden_states = torch.cat([state.unsqueeze(0) for state in hidden_states], dim=0)
concatenated_hidden_states = torch.cat(
[state.to(device.value).unsqueeze(0) for state in hidden_states], dim=0
)
embeddings.append(concatenated_hidden_states.squeeze())

return embeddings
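
Note on the `device.value` change: the diff replaces `.to(device)` with `.to(device.value)`, which suggests `device` is an enum member and the receiving call expects a plain device string. The sketch below illustrates that pattern without PyTorch. `DeviceType` here is a hypothetical stand-in whose member values are assumed, not senselab's actual definition, and `move_batch` is a toy function that mimics an API accepting only strings.

```python
from enum import Enum
from typing import Dict, List, Tuple


class DeviceType(Enum):
    """Hypothetical stand-in for the project's enum; the values are assumptions."""

    CPU = "cpu"
    CUDA = "cuda"


def move_batch(batch: Dict[str, List[int]], device: str) -> Dict[str, Tuple[List[int], str]]:
    """Toy replacement for a `.to(device)` call that accepts only device strings."""
    if not isinstance(device, str):
        raise TypeError(f"expected a device string, got {type(device).__name__}")
    return {name: (values, device) for name, values in batch.items()}


batch = {"input_ids": [101, 2023, 102]}

# Passing the enum member itself is rejected by the string-only API.
try:
    move_batch(batch, DeviceType.CUDA)  # type: ignore[arg-type]
except TypeError as error:
    print(f"rejected: {error}")

# Passing the underlying value works.
print(move_batch(batch, DeviceType.CUDA.value))
```

Unwrapping the enum at the call site keeps the enum as the type used inside the project while handing third-party code the string it understands.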