Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: write a regression test suite that can be sbatch submitted #616

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

joanise
Copy link
Member

@joanise joanise commented Dec 20, 2024

Starting to address #611

PR Goal?

Start to write a regression test suite that we can sbatch submit to the cluster with GPUs and whatever memory we need.

For now, I'm using a tiny dataset, but it's easy to adjust to something larger.

Feedback sought?

While this is early, you can already try to give it a spin: edit the two user config variables at the top if needed, and run

sbatch EveryVoice/everyvoice/tests/regression-test.sh

Priority?

beta I guess

Tests added?

nothing but

Confidence?

medium, this still needs more work

Version change?

no

Related PRs?

no

TODO

  • use coverage analysis
  • add at least minimal checks to make sure the output files are as expected
  • run the demo and use playwright to click in it
  • add other variants -- there are lots of paths we can take through the EV pipeline!

Copy link

Review changes with  SemanticDiff

@joanise joanise marked this pull request as draft December 20, 2024 20:56
Copy link
Contributor

github-actions bot commented Dec 20, 2024

CLI load time: 0:00.27
Pull Request HEAD: 63ce1a2e4c487140e42f3b958b4289471df83fd4
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:      1012 |     102363 |     typer.main
import time:       294 |     121565 |   typer
import time:      7891 |     201562 | everyvoice.cli

Copy link

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.34%. Comparing base (9a210df) to head (63ce1a2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #616   +/-   ##
=======================================
  Coverage   76.34%   76.34%           
=======================================
  Files          47       47           
  Lines        3483     3483           
  Branches      479      479           
=======================================
  Hits         2659     2659           
  Misses        721      721           
  Partials      103      103           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marctessier
Copy link
Collaborator

marctessier commented Jan 9, 2025

This test is hardcoded for GPSC7. ( GPSC7 was busy so I modified for V100 , GPSC5) .

I ran the test

I did get this message in my error_log , where I did not need... ( I can ignore this below..)

cat EV-regress.e5091562
/var/lib/slurm-llnl/slurmd/job5091562/slurm_script: line 22: /home/tes001/start_ev.sh: No such file or directory

BUT I am getting this message:

Your resume-from file is likely out of sync. Aborting.

Can we add this to the script where the error and output log will be written where the script was ran using the jobname "EV-regress" .

#SBATCH --output=./%x.o%j
#SBATCH --error=./%x.e%j
(EveryVoice_pr616) [U20-GPSC5]:$ cat EV-regress.o5091580
╭──────────────────────────────────────────────────────────────────────────────╮
│ Welcome to the EveryVoice Wizard. We will guide you through the process of   │
│ setting up the configuration for a new EveryVoice project.                   │
│                                                                              │
│ Navigation: as any point, you can hit Ctrl+C to: go back a step, view        │
│ progress, save progress, or exit the wizard.                                 │
│                                                                              │
│ From saved progress, you can resume at any time by running the same command  │
│ with the --resume-from option.                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
Applying saved response 'regress' for Name Step
Great! Launching Configuration Wizard 🧙 for project named 'regress'.
Applying saved response 'EveryVoice Regressor' for Contact Name Step
Great! Nice to meet you, 'EveryVoice Regressor'.
Applying saved response '[email protected]' for Contact Email Step
Great! Your contact email '[email protected]' will be saved to your models.
Applying saved response '.' for Output Path Step
The Configuration Wizard 🧙 will put your files here: 'regress'
Applying saved response 'metadata.csv' for Filelist Step
Applying saved response 'Yes, I do have permission to use this data.' for 
Dataset Permission Step
Applying saved response 'psv' for Filelist Format Step
Error: next tour question is Filelist Text Representation Step but resume list 
has Filelist Has Header Line Step instead.
Your resume-from file is likely out of sync. Aborting.

@joanise
Copy link
Member Author

joanise commented Jan 9, 2025

Nice, thank you for this output, I'll take a look when I resume work on this.

@joanise joanise force-pushed the dev.ej/regression branch from 8d7f6ab to 5f89017 Compare January 9, 2025 22:53
@joanise
Copy link
Member Author

joanise commented Jan 9, 2025

@marctessier I have fixed the issues you noticed, and more:

  • I was assuming the metadata file did not have a header, but you added one in your copy of LJ. I added code to skip the first line so my assumptions are met.
  • I now ignore the activate script if it's not found
  • I've reduce the tqdm logging frequency to every >=5s with export TQDM_MININTERVAL=5. Makes batch job logs nicer to read. (Still looking for the equivalent with torch, to make it show the loss on screen less often.)
  • I've merge stderr and stdout into a file in the current directory as you suggested.
  • I now log each command that gets run before it starts, so it's easier to follow the logs afterwards.

@joanise joanise force-pushed the dev.ej/regression branch 3 times, most recently from a06aa56 to a42061e Compare January 16, 2025 22:04
Defining `export TQDM_MININTERVAL=5` (or the desired number of seconds) will
reduce the frequency at which TQDM updates progress bars, which is nicer for
logs that get saved to file.
@joanise joanise marked this pull request as ready for review January 23, 2025 17:21
@joanise
Copy link
Member Author

joanise commented Jan 24, 2025

Feedback from Samuel:

  • fetching bundle.js from unpkg fails if the cluster proxy is not defined; needs graceful fall back like readalongs CLI
  • make sure timing out trying to fetch from unpkg does mean a lot of time waiting for it
  • LJ-150 fails to train with everyvoice.utils:filter_dataset_based_on_target_text_representation_level:96 - Sorry you do not have enough characters data in your current validation filelist to run the model with a batch size of 16.
  • When training fails, exit right away, don't continue to do stuff that's guaranteed to fail
  • in the readme.md, document that you can use sbatch --partition=... --account=... ../../myscript.sh to override the partition and account that are hard-coded in the scripts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants