
Pull Request For Research Purposes #166

Open

wants to merge 37 commits into base: runner_dev_jeremy
Conversation

TimotheeeNiven

This is a pull request to work with Professor Holleman on research. The branch right now has changes to most of the reference submissions and how the runner works.

TimotheeeNiven and others added 2 commits November 12, 2024 13:26
Added troubleshooting notes for 'nonetype' error
@TimotheeeNiven TimotheeeNiven requested a review from a team as a code owner December 4, 2024 15:51

github-actions bot commented Dec 4, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

TimothyNiven and others added 26 commits December 5, 2024 16:50
* moved training_torch to experimental and added a README

* starting to move code into here

* some updates to streaming wakeword

* updated streaming wakeword model to be the actual candidate DS-TCN model w/ no residual layers

* set default features to 40-D LFBEs

* changed num_classes to 3 in train.py

* demo notebook (in progress) added

* demo notebook runs through small training run

* added count_labels and is_batched()

* demo now adds some silence waveforms (which then have noise added) to training dataset

* updated get_dataset (mostly copied from demo.ipynb) and removed use_sam part from train.py

* fixed default model architecture flag

* fixed some issues with building model

* added from_logits argument to model.compile

* cleanup changes to get_dataset and demo notebook

* catching up on edits

* keras_model does not need tf datasets module

* cleaning up demo notebook

* cleaning up demo notebook

* made path to speech commands dataset easier to config per location/user without upsetting git

* beginning of code to test long waveform in python

* some updates

* added option to read in model config file

* set validation set to incorporate background noise.  also fixed issue where background_volume option was being ignored

* moved code to add silent (or white noise) frames to dataset into its own function.  Applies to val set now, but still not test set

* fixed argument error

* added post-training quantization

* several changes in order to use QAT and evaluate on long waveforms:
  - Replaced flatten with reshape in order to preserve time duration.
  - Added empty noise frames and duplicates of the wakeword to the validation set (instead of just the training set).
  - Number of duplicates and noise level are now command-line flags, separate for the training and validation sets.
  - Input shape is now (batch, time, 1, features) (None,49,1,40) instead of just (None,49,40) to avoid the extra expand_dims layer, which caused problems for QAT.
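
The flatten-vs-reshape point above can be sketched in plain numpy (shapes taken from the commit message; the model code itself is not shown here): flattening a (batch, time, 1, features) tensor collapses the time axis, while a reshape keeps time as a distinct dimension that a streaming model can slide over.

```python
import numpy as np

# Batch of 49 time frames with 40 features each: (batch, time, 1, features).
x = np.zeros((1, 49, 1, 40))

# Flatten destroys the time axis: (1, 1960). The model can no longer
# treat time steps individually when run on a long streaming waveform.
flat = x.reshape(1, -1)

# Reshape keeps time as its own axis: (1, 49, 40).
kept = x.reshape(1, 49, 40)
```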

* notebook updated to work with last commits on get_data, keras_model

* changed default LR schedule to reduce_on_plateau so it scales better with more epochs

* some more edits to get QAT working

* changed labels to one-hot to work with precision/recall metrics. Also changed feature extractor code in dataset preparation to optionally work in a standalone model.

* added notebook to develop tflite model for feature extraction

* removed some prints from get_dataset.  added an evaluation to train

* adjusted reduce lr on plateau settings

* fixed plotting error

* working on different options to run the feature extractor on MCU

* small changes to notebook

* removed old commented-out code that loaded pre-built dataset

* tflite_feature_extractor.ipynb very much a work in progress

* added setup instructions and a  to the streaming wakeword benchmark (mlcommons#155)

* cache datasets after spectrogram computation to avoid recomputing them at every epoch
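
In the repo this is tf.data's built-in `cache()`; the effect can be sketched with a plain dictionary cache (the `spectrogram` stand-in below is hypothetical): the expensive transform runs once per element, not once per element per epoch.

```python
compute_count = 0

def spectrogram(x):
    # Stand-in for an expensive feature transform.
    global compute_count
    compute_count += 1
    return x * 2

cache = {}

def cached_spectrogram(x):
    if x not in cache:
        cache[x] = spectrogram(x)
    return cache[x]

dataset = [1, 2, 3]
for epoch in range(3):
    feats = [cached_spectrogram(x) for x in dataset]
# spectrogram ran 3 times total (once per element), not 9 (once per epoch).
```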

* fixed data_dir default to point to speech_commands_v0.02

* fixed data_dir default to point to speech_commands_v0.02

* added BooleanOptionalAction to correctly parse boolean Flags

* fixed parsing of bool args (use_qat, run_test_set) to work with python 3.8
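
The two commits above reflect a real argparse wrinkle: `argparse.BooleanOptionalAction` was only added in Python 3.9, so running on 3.8 needs a fallback. A common pattern (flag names taken from the commit message; the exact repo code may differ) is an explicit string-to-bool converter:

```python
import argparse

def str2bool(v):
    # Python 3.8 lacks argparse.BooleanOptionalAction (added in 3.9),
    # so parse "true"/"false" style strings explicitly.
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--use_qat", type=str2bool, nargs="?", const=True, default=False)
parser.add_argument("--run_test_set", type=str2bool, nargs="?", const=True, default=False)

# "--use_qat true" and bare "--use_qat" both enable the flag.
args = parser.parse_args(["--use_qat", "true"])
```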

* changed so parse_command raises exception on unrecognized flags

* changed so parse_command raises exception on unrecognized flags

* added foreground scaling args foreground_volume_min, _max to train on quieter wakewords.  Also changed defaults for data/background paths (to HOME/speech_commands_v0.02) to align with default filename

* set is_training true for ds_val so it gets noise added

* edits to str ww model

* edits to data set building

* saved training history along with plot

* removed average pooling, increased initial feature stride

* Fixed bug where np.random is only evaluated at graph creation, so all foregrounds are scaled by the same amount.  Also added condition so empty frames are not added to calibration set.

* fixed several places where np.random was used in a tf graph, resulting in the same value being used for the whole dataset
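
The bug class in the two commits above is worth spelling out: a `np.random` call placed outside the per-element function (or traced once into a TF graph) produces a single value that gets reused for every element; in the repo the fix is to draw randomness inside the mapped function (with `tf.random.*` for graph code). A minimal numpy-only illustration of the same pitfall:

```python
import numpy as np

rng = np.random.default_rng(0)
waveforms = [1.0, 1.0, 1.0]

# Buggy pattern: the scale is drawn once, so every foreground
# waveform is scaled by the same amount.
scale = rng.uniform(0.5, 1.0)
scaled_bug = [w * scale for w in waveforms]

# Fixed pattern: draw a fresh scale per element.
scaled_ok = [w * rng.uniform(0.5, 1.0) for w in waveforms]
```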

* widened filters in 2nd,3rd layers to 128

* changed back from 32 LFBEs to 40

* minor cleanup -- whitespace, removing old commented out lines, etc.

* fixed error - val set was using target words from training set

* minor cleanup -- whitespace, removing old commented out lines, etc.

* changed ordering in data prep, now shuffle before batching

* adding current version of trained and quantized streaming ww model

* minor edits/cleanup

* changed Flags.num_train_samples to num_samples_training. same for test, validation. refactoring get_dataset code

* added 1st pass at get_data_config(), refactoring dataset build

* refactored dataset building. train.py runs now, have not tested performance

* setup_example is work in progress, just capturing progress

* train.py runs but gives random-level validation accuracy.  demo notebook fails

* flag parsing used 'train' instead of 'training' and therefore was not shuffling the training set

* updated demo to match changes in data

* minor updates

* dumps options as json into plot_dir

* fixed demo to work with new get_data code.  moved take after shuffle so subsets are correctly mixed, but this makes even runs on a small subset of the training data slow, because shuffle runs on everything

* moved softmax to inside the model; adjusted loss function accordingly

* moved softmax calculation into the model
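
Moving softmax inside the model means the loss must consume probabilities rather than raw logits (in Keras terms, `from_logits=False`); the two are numerically equivalent. A small numpy check of that equivalence (names here are illustrative, not the repo's functions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ce_from_probs(probs, y):
    # Cross-entropy given probabilities, i.e. from_logits=False.
    return -np.log(probs[np.arange(len(y)), y]).mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
labels = np.array([0, 1])

probs = softmax(logits)               # softmax now lives "inside the model"
loss = ce_from_probs(probs, labels)   # loss takes probabilities, not logits
```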

* working on true pos/false pos computation

* fixed error, post-wakeword extension was being added twice

* fixing notebook counting of true/false positives

* removed commented-out code; added zero2nan()

* added multiple background noise paths, can split long bg files into smaller chunks, added clip_len_samples arg to prepare_bg_data

* change QAT initial LR to Flags.lr, LR is too small after float pre-training

* fixed cmd line arg processing to accommodate multiple bg noise paths

* removed commented out code from demo notebook

* convert only-target dataset to numpy array and back so cardinality() works

* refactored num_silent, num_repeats in to fraction_silent and fraction_target to make varied-length experiments work better

* fixed cmd line arg processing to accommodate multiple bg noise paths

* fixing code for smaller datasets

* catching up on demo edits

* added code to run quantized model on long waveform

* working on long wav file creation; added poisson process to place wake words
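
A Poisson process places events with exponentially distributed inter-arrival times, which is presumably how wake words get scattered through the long test wav. A sketch under assumed parameters (rate and duration are illustrative, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 1.0 / 20.0      # on average one wake word every 20 s (assumed)
duration_s = 300.0     # length of the long test wav (assumed)

# Poisson process: cumulative sum of exponential inter-arrival gaps.
times, t = [], 0.0
while True:
    t += rng.exponential(1.0 / rate)
    if t >= duration_s:
        break
    times.append(t)
# `times` now holds candidate wake-word start offsets in seconds.
```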

* updated long wave creation, need to move it to a separate file soon. moved get_true_and_false_detections() to util

* increased number of background files from 50 to 100

* added code to illustrate false detects/rejects

* updating background noise creation to avoid train/val duplicates

* added exclude_background_files.txt

* put code to build the long test wav into its own (two) files

* added eval_long_wav.py to test fpr, fnr on a long wav

* made build_long_wav work with the musan_path from streaming_config.json

* made build_long_wav work with the musan_path from streaming_config.json

* fixed a typo

* fixed issue with path construction in long wav spec

* added l2 reg to conv layers

* added L2 reg to conv layers

* removed some old commented-out code

* eval_long_wav can now test either h5 models or tflite models

* added script to create indices into the val set for calibration

* code to create calibration set is working

* fixed quantize.py to work with extracted npz calibration set

* adjusted volume of foreground and background for testing

* added code to save spectrogram in build_long_wav.py

* demo notebook should now work with current code

* separated augmentation (built by get_augment_wavs_func()) and feature extractor (from get_lfbe_func())

* made l2 reg parameter a command line flag

* fixed eval_long_wav to work with feature extractor changes

* added validation set measurements to eval_long_wav.py

* moved eval_long_wav to evaluate.py

* added threshold=0.95 to precision/recall metrics to match evaluate.py

* added a list of 'bad' marvin wav files. modified build_long_wav_spec and get_dataset to exclude them

* edited comment on saved_model_path to reflect evaluate.py

* added bad_marvin_files.txt

* fixed error in number of unknown samples for reduced runs

* renamed build_long_wav_spec.py -> build_long_wav_def.py to avoid ambiguity of spec (specification vs spectrogram)

* renamed features back to audio to allow easy skipping of feature extraction

* minor edits

* removed debug print statement

* fixed code for tflite models

* adjusted some default training params

* catching notebook up to other code

* clearing out some debug prints

* added trained model

* moved label_count out of model_settings into a flag

* minor edits

* added random timeshift

* added a couple more bad marvins to exclude

* added flag to enforce a minimum SNR
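
Enforcing a minimum SNR typically means attenuating the background when the mix would be too noisy. A hedged sketch of that logic (the helper name and dB convention are assumptions, not the repo's API):

```python
import numpy as np

def scale_background_for_min_snr(foreground, background, min_snr_db):
    # Hypothetical helper: scale the background down if mixing at current
    # levels would fall below the requested SNR floor.
    p_fg = np.mean(foreground ** 2)
    p_bg = np.mean(background ** 2)
    snr_db = 10.0 * np.log10(p_fg / p_bg)
    if snr_db < min_snr_db:
        # Amplitude scale of 10^(dB/20) shifts the SNR up to the floor.
        background = background * 10.0 ** ((snr_db - min_snr_db) / 20.0)
    return background

fg = np.sin(np.linspace(0.0, 100.0, 16000))
bg = np.random.default_rng(0).normal(0.0, 1.0, 16000)
bg_scaled = scale_background_for_min_snr(fg, bg, min_snr_db=10.0)
```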

* centralized data paths in streaming_config.json (no command line arguments needed or observed)

* removed some obsolete cmd line args and modified get_dataset to respect time_shift_ms

* fixed evaluate.py to work with changes on speech_commands_path

* changed evaluate and quantize to use model_init_path, so by default the reference model will still be used

* adjusted training params

* adjusted training params

* updated long wav info

* updated README

* updated reference model

* removed some info messages

* add line to create plot_dir if it does not exist

* reduced noise level in long wav

* refactored command line argument parsing

* refactored command line argument parsing

* fixed some errors in README

* fixed quantize to use saved_model_path instead of model_init_path

* added calibration_samples.npz

* fixing argument processing for evaluate.py to work with either tflite or h5 model

* fixed typo in evaluate.py

* fixed typo in evaluate

* updated tflite model

* fixed issue with plot_dir

* ignoring trained models other than reference model

* updated readme

* updated demo notebook

* added note about the demo notebook to readme

* merged work from runner_dev_jeremy

---------

Co-authored-by: Alexander Montgomerie-Corcoran <[email protected]>
Co-authored-by: Peter Chang <[email protected]>