feat: write a regression test suite that can be sbatch submitted #616

joanise · 2024-12-20T20:56:41Z

Starting to address #611

PR Goal?

Start to write a regression test suite that we can sbatch submit to the cluster with GPUs and whatever memory we need.

For now, I'm using a tiny dataset, but it's easy to adjust to something larger.

Feedback sought?

While this is early, you can already try to give it a spin: edit the two user config variables at the top if needed, and run

sbatch EveryVoice/everyvoice/tests/regression-test.sh

Priority?

beta I guess

Tests added?

nothing but

Confidence?

medium, this still needs more work

Version change?

no

Related PRs?

no

TODO

use coverage analysis
add at least minimal checks to make sure the output files are as expected
run the demo and use playwright to click in it
add other variants -- there are lots of paths we can take through the EV pipeline!

semanticdiff-com · 2024-12-20T20:56:44Z

Review changes with

github-actions · 2024-12-20T21:01:00Z

CLI load time: 0:00.27
Pull Request HEAD: 63ce1a2e4c487140e42f3b958b4289471df83fd4
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:      1012 |     102363 |     typer.main
import time:       294 |     121565 |   typer
import time:      7891 |     201562 | everyvoice.cli

codecov · 2024-12-20T21:13:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.34%. Comparing base (9a210df) to head (63ce1a2).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #616   +/-   ##
=======================================
  Coverage   76.34%   76.34%           
=======================================
  Files          47       47           
  Lines        3483     3483           
  Branches      479      479           
=======================================
  Hits         2659     2659           
  Misses        721      721           
  Partials      103      103

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

marctessier · 2025-01-09T19:05:51Z

This test is hardcoded for GPSC7. ( GPSC7 was busy so I modified for V100 , GPSC5) .

I ran the test

I did get this message in my error_log , where I did not need... ( I can ignore this below..)

cat EV-regress.e5091562
/var/lib/slurm-llnl/slurmd/job5091562/slurm_script: line 22: /home/tes001/start_ev.sh: No such file or directory

BUT I am getting this message:

Your resume-from file is likely out of sync. Aborting.

Can we add this to the script where the error and output log will be written where the script was ran using the jobname "EV-regress" .

#SBATCH --output=./%x.o%j
#SBATCH --error=./%x.e%j

(EveryVoice_pr616) [U20-GPSC5]:$ cat EV-regress.o5091580
╭──────────────────────────────────────────────────────────────────────────────╮
│ Welcome to the EveryVoice Wizard. We will guide you through the process of   │
│ setting up the configuration for a new EveryVoice project.                   │
│                                                                              │
│ Navigation: as any point, you can hit Ctrl+C to: go back a step, view        │
│ progress, save progress, or exit the wizard.                                 │
│                                                                              │
│ From saved progress, you can resume at any time by running the same command  │
│ with the --resume-from option.                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
Applying saved response 'regress' for Name Step
Great! Launching Configuration Wizard 🧙 for project named 'regress'.
Applying saved response 'EveryVoice Regressor' for Contact Name Step
Great! Nice to meet you, 'EveryVoice Regressor'.
Applying saved response '[email protected]' for Contact Email Step
Great! Your contact email '[email protected]' will be saved to your models.
Applying saved response '.' for Output Path Step
The Configuration Wizard 🧙 will put your files here: 'regress'
Applying saved response 'metadata.csv' for Filelist Step
Applying saved response 'Yes, I do have permission to use this data.' for 
Dataset Permission Step
Applying saved response 'psv' for Filelist Format Step
Error: next tour question is Filelist Text Representation Step but resume list 
has Filelist Has Header Line Step instead.
Your resume-from file is likely out of sync. Aborting.

joanise · 2025-01-09T19:11:56Z

Nice, thank you for this output, I'll take a look when I resume work on this.

joanise · 2025-01-09T22:57:34Z

@marctessier I have fixed the issues you noticed, and more:

I was assuming the metadata file did not have a header, but you added one in your copy of LJ. I added code to skip the first line so my assumptions are met.
I now ignore the activate script if it's not found
I've reduce the tqdm logging frequency to every >=5s with export TQDM_MININTERVAL=5. Makes batch job logs nicer to read. (Still looking for the equivalent with torch, to make it show the loss on screen less often.)
I've merge stderr and stdout into a file in the current directory as you suggested.
I now log each command that gets run before it starts, so it's easier to follow the logs afterwards.

Defining `export TQDM_MININTERVAL=5` (or the desired number of seconds) will reduce the frequency at which TQDM updates progress bars, which is nicer for logs that get saved to file.

Starting to address #611

joanise · 2025-01-24T22:59:06Z

Feedback from Samuel:

fetching bundle.js from unpkg fails if the cluster proxy is not defined; needs graceful fall back like readalongs CLI
make sure timing out trying to fetch from unpkg does mean a lot of time waiting for it
LJ-150 fails to train with everyvoice.utils:filter_dataset_based_on_target_text_representation_level:96 - Sorry you do not have enough characters data in your current validation filelist to run the model with a batch size of 16.
When training fails, exit right away, don't continue to do stuff that's guaranteed to fail
in the readme.md, document that you can use sbatch --partition=... --account=... ../../myscript.sh to override the partition and account that are hard-coded in the scripts

joanise requested review from SamuelLarkin, roedoejet and marctessier December 20, 2024 20:56

joanise marked this pull request as draft December 20, 2024 20:56

joanise force-pushed the dev.ej/regression branch from 8d7f6ab to 5f89017 Compare January 9, 2025 22:53

joanise force-pushed the dev.ej/regression branch 3 times, most recently from a06aa56 to a42061e Compare January 16, 2025 22:04

joanise added 2 commits January 23, 2025 11:26

build(deps): require tqdm>=4.66 to support TQDM_MININTERVAL=5 in jobs

cbebf70

Defining `export TQDM_MININTERVAL=5` (or the desired number of seconds) will reduce the frequency at which TQDM updates progress bars, which is nicer for logs that get saved to file.

feat: write a regression test suite that can be sbatch submitted

63ce1a2

Starting to address #611

joanise force-pushed the dev.ej/regression branch from f72019e to 63ce1a2 Compare January 23, 2025 17:17

joanise marked this pull request as ready for review January 23, 2025 17:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: write a regression test suite that can be sbatch submitted #616

feat: write a regression test suite that can be sbatch submitted #616

joanise commented Dec 20, 2024 •

edited

Loading

semanticdiff-com bot commented Dec 20, 2024

github-actions bot commented Dec 20, 2024 •

edited

Loading

codecov bot commented Dec 20, 2024 •

edited

Loading

marctessier commented Jan 9, 2025 •

edited by joanise

Loading

joanise commented Jan 9, 2025

joanise commented Jan 9, 2025

joanise commented Jan 24, 2025

feat: write a regression test suite that can be sbatch submitted #616

Are you sure you want to change the base?

feat: write a regression test suite that can be sbatch submitted #616

Conversation

joanise commented Dec 20, 2024 • edited Loading

PR Goal?

Feedback sought?

Priority?

Tests added?

Confidence?

Version change?

Related PRs?

TODO

semanticdiff-com bot commented Dec 20, 2024

github-actions bot commented Dec 20, 2024 • edited Loading

codecov bot commented Dec 20, 2024 • edited Loading

Codecov Report

marctessier commented Jan 9, 2025 • edited by joanise Loading

joanise commented Jan 9, 2025

joanise commented Jan 9, 2025

joanise commented Jan 24, 2025

joanise commented Dec 20, 2024 •

edited

Loading

github-actions bot commented Dec 20, 2024 •

edited

Loading

codecov bot commented Dec 20, 2024 •

edited

Loading

marctessier commented Jan 9, 2025 •

edited by joanise

Loading