ENH: Switch to version 1.0 of config file format, fix #685 #345 #748 (#750)

* WIP: Add src/vak/config/dataset.py

* Add module-level docstring + type annotations in src/vak/config/parse.py

* WIP: Fix how cli.prep adds dataset path to toml config file

* Change table names in src/vak/config/valid.toml

* Rename section -> table in config/parse.py

* In cli/prep change 'section' -> 'table' and lowercase table names

* In config/config.py, change 'section' -> 'table' and lowercase table names

* Change '[PREP]' -> '[vak.prep]' in config/prep.py

* WIP: Change table names in config files in tests/data_for_tests/configs

* Make tomlkit a dependency in pyproject.toml, drop toml

* Change config/parse.py to use tomlkit

* Update example configs in doc/toml/

* Add link to example config files in docs, in error messages in config/validators.py

* Remove 'spect_params' from REQUIRED_OPTIONS in config/parse.py, this is not a top-level table and will be an attribute of prep instead

* Rename 'config_toml' -> 'config_dict' in config/parse.py

* Fix function _validate_tables_arg_convert_list in config/parse.py

* Fix error message formatting in src/vak/config/validators.py

* Add ModelConfig class to config/model.py, add type annotations, fix config_from_toml_dict to look in specific section

* Fixup fixing config_from_toml_dict to look in specific section

* Rewrite config/eval.py with 'modern' attrs

* Fixup rewrite config/eval with 'modern' attrs

* Rewrite config/learncurve.py with 'modern' attrs

* Rewrite config/predict.py with 'modern' attrs

* Rewrite config/prep.py with 'modern' attrs

* Rewrite config/train.py with 'modern' attrs

* Rename Dataset -> DatasetConfig in config/dataset.py

* Add are_table_options_valid to config/validators.py, will be used by classmethods from_config_dict

* WIP: Add from_config_dict classmethod to EvalConfig

* WIP: Add tests/test_config/test_dataset.py

* Make fixes to ModelConfig class, fix circular imports in config/model.py module

* Write tests in tests/test_config/test_dataset.py

* Use tomlkit not toml in cli/prep.py

* Use tomlkit in tests/fixtures/annot.py

* Use tomlkit in tests/scripts/vaktestdata/configs.py

* Use tomlkit in tests/scripts/vaktestdata/source_files.py

* Use tomlkit in tests/test_config/test_validators.py

* Remove spect_params attribute from Config in config/config.py, fix class' docstring

* Reorder attributes, fix typo in docstring of DatasetConfig

* Rewrite config/parse.py assuming config classes have from_config_dict classmethod

* Rename `table` -> `table_name` in a couple validators in config/validators.py

* Remove use of config.model.config_from_toml_path in cli/eval.py

* Remove use of config.model.config_from_toml_path in cli/learncurve.py

* Remove use of config.model.config_from_toml_path in cli/predict.py

* Remove use of config.model.config_from_toml_path in cli/train.py

* Remove functions from config/model.py: config_from_toml_path and config_from_toml_dict

* Add `to_dict` method to ModelConfig

* Use to_dict() method of ModelConfig class in cli functions

* Fix how we get labelset from config in tests/fixtures/annot.py

* WIP: Clean up / rewrite tests/fixtures/config.py

* Fix model tables in tests/data_for_tests/configs

* Finish unit tests in tests/test_config/test_model.py

* Fix model tables in doc/toml

* Rename data_for_tests/configs/invalid_option_config.toml -> invalid_key_config.toml

* Rename are_options_valid/are_table_options_valid -> are_keys_valid/are_table_keys_valid in config/validators.py

* Rename two fixtures in fixtures/config.py: invalid_section_config_path -> invalid_table_config_path, invalid_option_config_path -> invalid_key_config_path

* Fix validator names in config/parse.py, rename TABLE_CLASSES constant -> TABLE_CLASSES_MAP

* Rename config/valid.toml -> valid-version-1.0.toml, fix how model table is declared

* Fix VALID_TOML_PATH in config/validators.py after renaming config/valid.toml -> config/valid-version-1.0.toml

* Import config classes in vak/config/__init__.py

* Add _tomlkit_to_popo to tests/fixtures/config.py so we operate on dicts not tomlkit.TOMLDocument

* Add _tomlkit_to_popo to config/parse.py so we operate on dicts not tomlkit.TOMLDocument

* Finish rewriting tests for tests/test_config/test_prep.py

* Rewrite EvalConfig with from_config_dict method

* Rewrite LearncurveConfig with from_config_dict method

* Rewrite PredictConfig with from_config_dict method

* Rewrite PrepConfig with from_config_dict method

* Rewrite TrainConfig with from_config_dict method

* Remove functions from config/parse.py

* Rename config/parse.py -> config/load.py

* Make functions in config/parse.py into classmethods on Config class

* Use config.Config.from_toml_path everywhere instead of config.parse.from_toml_path

* Make fixes in Config classmethods

* Change load._load_toml_from_path again so that it returns config_dict['vak'], to avoid writing ['vak'] everywhere in calling functions

* Add docstring to are_tables_valid in config/validators.py

* Lowercase config table names in tests/scripts/vaktestdata/configs.py

* In tests/scripts/vaktestdata/source_files.py, change cfg.spect_params -> cfg.prep.spect_params, fix how we change values in toml, add tables_to_parse arg to call to Config.from_toml_path

* In test_cli/test_prep.py, call vak.config.load not vak.config.parse

* Fix how we instantiate DatasetConfig and ModelConfig in EvalConfig.from_config_dict method

* Fix how we instantiate DatasetConfig and ModelConfig in PredictConfig.from_config_dict method

* Fix how we instantiate DatasetConfig and ModelConfig in TrainConfig.from_config_dict method

* Fix how we instantiate DatasetConfig and ModelConfig in LearncurveConfig.from_config_dict method

* Remove breakpoint in src/vak/config/model.py

* Fix wrong variable name so we save configs correctly in tests/scripts/vaktestdata/source_files.py, and add tables_to_parse arg to Config.from_toml_path, so we don't get 'missing dataset' errors

* Fix how we re-write configs, in tests/scripts/vaktestdata/configs.py

* Add model and dataset tables to get those keys in top-level tables, in src/vak/config/valid-version-1.0.toml

* Change cfg.table.dataset_path -> cfg.table.dataset.path in vak/cli modules (e.g., vak.train.dataset.path)

* Get tests passing for tests/test_config/test_eval.py

* Clean up tests/test_config/test_eval.py

* Get tests passing in tests/test_config/test_predict.py

* Fix how we access config_toml in tests/scripts/vaktestdata/configs.py -- missing 'vak' key

* Add pytest.mark.parametrize to tests/test_config/test_learncurve.py

* Rewrite tests in tests/test_config/test_train.py

* Rewrite tests in tests/test_config/test_config.py

* Add unit test to tests/test_config/test_model.py

* Add unit test for exceptions in tests/test_config/test_eval.py

* Fix 'cfg.spect_params' -> 'cfg.prep.spect_params' in src/vak/cli/predict.py

* Add unit test for exceptions in tests/test_config/test_learncurve.py

* Add unit test for exceptions in tests/test_config/test_train.py

* Add more test cases to TestEvalConfig.test_from_config_dict_raises

* Add more test cases to TestLearncurveConfig.test_from_config_dict_raises

* Add unit test for exceptions in tests/test_config/test_predict.py

* Add two unit tests that PrepConfig raises expected exceptions

* Fix/add unit tests in tests/test_config/test_config.py

* Change order of parameters for Config.from_config_dict, make toml_path last param

* Fix/add unit tests in tests/fixtures/config.py

* Rename test_config/test_parse.py -> test_load.py, fix/rewrite tests

* Fix tests in tests/test_config/test_spect_params.py

* Make fixups in tests/test_config

* Apply fixes from linter

* Make more linting fixes

* Speed up install in nox session 'lint', only install linting tools

* Change names 'section'/'option' -> 'table'/'key' in tests

* Fix tests in tests/test_cli/test_eval.py

* Finish fixing cli tests, fix renaming

* Fix how we get 'path' from 'dataset' table in configs, in tests/fixtures/csv.py

* Fix how we get 'path' from 'dataset' table in configs, in tests/fixtures/dataset.py

* Change .dataset_path -> .dataset.path in tests/

* Fix how we get model config and rename config attribute .dataset_path -> .dataset.path throughout tests

* In tests/, fixup change .dataset_path -> .dataset.path, use model.name where we used to use just 'model' attribute of config

* Fix fixture specific_config_toml_path in fixtures/config.py to handle case where we need to access sub-table and change a key in it--right now this is just ['dataset']['path']

* Fix how we change ['dataset']['path'] value in tests/test_eval/test_frame_classification.py

* Fix how we change ['dataset']['path'] value in config in several tests

* Use ModelConfig attribute name where needed in tests/test_learncurve/test_frame_classification.py

* In tests, replace calls to vak.config.model.config_from_toml_path with calls to ModelConfig method to_dict()

* Change cfg.spect_params -> cfg.prep.spect_params in tests

* Fix cfg.predict -> cfg.predict.dataset.path in tests/test_predict/test_frame_classification.py

* Fix constant LABELSET_NOTMAT in fixtures/annot.py so it is a list of str, not a Tomlkit.String class

* Fix cfg.learncurve -> cfg.learncurve.dataset.path in tests/test_prep/test_frame/test_learncurve.py

* Cast pathlib to str before adding to tomldoc, in tests/test_train/

* Change transform/dataset params keys in data_for_tests/configs to a dataset table with a params key

* Add `params` attribute to DatasetConfig

* Change transform/dataset params keys in doc/toml/ to a dataset table with a params key

* Rewrite vak/config/model.py method 'to_dict' as 'asdict', using attrs asdict function. We now return 'name' and will just get it from the dict instead of having a separate 'model_name' parameter for functions that take 'model_config' (see the sketch at the end of this list)

* Add asdict method to DatasetConfig class, like ModelConfig.asdict

* Fix calls to model.to_dict() -> model.asdict()

* Add unit tests for DatasetConfig.asdict

* Add unit tests for ModelConfig.asdict

* Add an assertion in tests/test_config/test_dataset.py

* Remove transform params and dataset_params from EvalConfig, will just use dataset attribute, a DatasetConfig, with its params attribute

* Remove dataset/transform_params key-value pairs in valid-version-1.0.toml, and add params key to dataset tables with in-line table params

* Remove train/val/dataset/transform_params from TrainConfig, will use DatasetConfig attribute params instead

* Remove train/val/dataset/transform_params from PredictConfig, will use DatasetConfig attribute params instead

* Revise transforms.defaults.frame_classification.TrainItemTransform and change get_default_frame_classification_transform to return an instance of the TrainItemTransform when 'mode' is 'train'

* Make vak.table.dataset.params into an in-line table in toml files in tests/data_for_tests/configs

* Fix attribute name in frame_classification.TrainItemTransform.__init__: source_transform -> frames_transform

* Rewrite datasets.frame_classification.WindowDataset to require item_transform, and assume that it is an instance of transforms.frame_classification.TrainItemTransform

* Rewrite datasets.frame_classification.FramesDataset to make item_transform required

* Rewrite src/vak/train/frame_classification.py: remove params model_name, train/val_transform_params, train/val_dataset_params, and dataset_path, replace with dataset_config and just have model_config contain name

* Rewrite src/vak/train/_train.py: remove params model_name, train/val_transform_params, train/val_dataset_params, and dataset_path, replace with dataset_config and just have model_config contain name

* Rewrite vak/cli/train.py to call train._train.train with just model_config and dataset_config, remove model_name, dataset_path, train/val_transform_params and train/val_dataset_params

* Fix how we unpack batch in training_step method of FrameClassificationModel

* Change transform_kwargs parameter of transforms.defaults.parametric_umap.get_default_parametric_umap_transform to default to None, and if None to be an empty dict

* Change transform_kwargs parameter of transforms.defaults.frame_classification.get_default_frame_classification_transform to default to None, and if None to be an empty dict

* Change DatasetConfig.params attribute to default to empty dict, so we can unpack with ** operator even when no params are specified

* Fix DatasetConfig.from_config_dict method to not use dict.get method, so we don't set attributes to None inadvertently

* Modify transforms.defaults.get so that transform_kwargs is None by default. Also revise docstring and type annotations

* Rewrite src/vak/train/parametric_umap.py to use model_config and dataset_config parameters, removing parameters val/train_transform_params + val/train_dataset_params and dataset_path

* Rewrite vak/eval/frame_classification.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Rewrite vak/eval/parametric_umap.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Rewrite vak/eval/eval_.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Rewrite cli.eval to pass model_config and dataset_config into eval_module.eval, remove dataset_path/transform_params/dataset_params arguments

* Unpack dataset_config[params] with ** inside train/frame_classification.py, instead of directly getting window_size from the params dict

* Rewrite vak/learncurve/frame_classification.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Rewrite vak/learncurve/learncurve.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Rewrite cli.learncurve to pass model_config and dataset_config into learning_curve.learncurve, remove dataset_path/transform_params/dataset_params arguments

* Rewrite vak/predict/frame_classification.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Rewrite vak/predict/parametric_umap.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Rewrite vak/predict/predict.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Fix dataset_path -> dataset_config[path] and add missing variable model_name in src/vak/learncurve/learncurve.py

* Fix dataset_path -> dataset_config[path] and add missing variable model_name in src/vak/learncurve/frame_classification.py

* Rewrite vak/cli/predict.py to use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Remove non-existent dataset_params variable in vak/predict/frame_classification.py

* Fix unit tests for DatasetConfig to test 'params' attribute gets handled correctly

* Remove train/val_dataset_params and train/val_transform_params from test cases we parametrize with in tests/test_config/

* Use DatasetConfig.params attribute where we need to in tests/test_datasets

* Fix method name ModelConfig.to_dict -> asdict in tests/

* In tests for eval/learncurve/predict/train, use model_config and dataset_config parameters, removing parameters transform_params + dataset_params and dataset_path

* Fix use of default transform and dataset.params attribute in test_models/test_base.py

* Fix config snippets in docs

* Apply linting to src/

* Raise 'from e' with errors in eval/predict/train/frame_classification modules
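
To make the shape of these changes concrete, here is a minimal sketch, not vak's actual code, of the pattern the commits above describe: `attrs`-based config classes with a `from_config_dict` classmethod and an `asdict` method, plus a small helper in the spirit of `_tomlkit_to_popo` that converts `tomlkit` containers to plain Python objects. All names and signatures here are illustrative assumptions.

```python
from collections.abc import Mapping, Sequence

import attrs
import tomlkit


def _to_popo(obj):
    """Recursively convert tomlkit containers and wrapped strings to
    plain old Python objects (the role `_tomlkit_to_popo` plays above)."""
    if isinstance(obj, str):
        return str(obj)  # tomlkit strings subclass str; cast to plain str
    if isinstance(obj, Mapping):
        return {str(key): _to_popo(val) for key, val in obj.items()}
    if isinstance(obj, Sequence):
        return [_to_popo(val) for val in obj]
    return obj


@attrs.define
class DatasetConfig:
    """Sketch of a dataset sub-table, e.g. [vak.train.dataset]."""

    path: str
    # default to an empty dict so callers can always unpack params with **
    params: dict = attrs.field(factory=dict)

    @classmethod
    def from_config_dict(cls, config_dict: dict) -> "DatasetConfig":
        # index directly instead of using dict.get, so a missing required
        # key raises instead of silently becoming None
        return cls(
            path=config_dict["path"],
            params=config_dict["params"] if "params" in config_dict else {},
        )

    def asdict(self) -> dict:
        return attrs.asdict(self)


def load_config_dict(toml_path: str) -> dict:
    """Return the dict under the top-level 'vak' table of a config file,
    so callers don't have to index ['vak'] everywhere."""
    with open(toml_path) as fp:
        doc = tomlkit.load(fp)
    return _to_popo(doc)["vak"]
```

For example, `DatasetConfig.from_config_dict(load_config_dict("gy6or6_train.toml")["train"]["dataset"])` would build the dataset config for the train table.
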
NickleDave authored May 5, 2024
1 parent e2c1a71 commit df876e3
Showing 150 changed files with 3,455 additions and 2,383 deletions.
78 changes: 39 additions & 39 deletions doc/get_started/autoannotate.md
@@ -20,10 +20,10 @@ Below is an example of some annotated Bengalese finch song, which is what we'll

:::{hint}
`vak` has built-in support for widely-used annotation formats.
Even if your data is not annotated with one of these formats,
you can use `vak` by converting your annotations to a simple `.csv` format
that is easy to create with Python libraries like `pandas`.
For more information, please see:
{ref}`howto-user-annot`
:::
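
For instance, a minimal sketch of creating such a file with `pandas` might look like this; the column names below are illustrative assumptions, so see the how-to page linked above for the exact format `vak` expects.

```python
import pandas as pd

# one row per annotated segment: a text label plus onset and offset times
annots = pd.DataFrame(
    {
        "label": ["a", "b", "a"],
        "onset_s": [0.10, 0.52, 1.34],   # segment onsets, in seconds
        "offset_s": [0.41, 0.93, 1.78],  # segment offsets, in seconds
    }
)
annots.to_csv("bird1_annotations.csv", index=False)
```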

@@ -42,39 +42,39 @@ Before going through this tutorial, you'll need to:
or [notepad++](https://notepad-plus-plus.org/)
3. Download example data from this dataset: <https://figshare.com/articles/Bengalese_Finch_song_repository/4805749>

- one day of birdsong, for training data (click to download)
{download}`https://figshare.com/ndownloader/files/41668980`
- another day, to use to predict annotations (click to download)
{download}`https://figshare.com/ndownloader/files/41668983`
- Be sure to extract the files from these archives!
Please use the program "tar" to extract the archives,
on either macOS/Linux or Windows.
Using other programs like WinZIP on Windows
can corrupt the files when extracting them,
causing confusing errors.
Tar should be available on newer Windows systems
(as described
[here](https://learn.microsoft.com/en-us/virtualization/community/team-blog/2017/20171219-tar-and-curl-come-to-windows)).
- Alternatively you can copy the following command and then
paste it into a terminal to run a Python script
that will download and extract the files for you.

:::{eval-rst}

.. tabs::

.. code-tab:: shell macOS / Linux

curl -sSL https://raw.githubusercontent.com/vocalpy/vak/main/src/scripts/download_autoannotate_data.py | python3 -

.. code-tab:: shell Windows

(Invoke-WebRequest -Uri https://raw.githubusercontent.com/vocalpy/vak/main/src/scripts/download_autoannotate_data.py -UseBasicParsing).Content | py -
:::

4. Download the corresponding configuration files (click to download):
{download}`gy6or6_train.toml <../toml/gy6or6_train.toml>`,
{download}`gy6or6_eval.toml <../toml/gy6or6_eval.toml>`,
and {download}`gy6or6_predict.toml <../toml/gy6or6_predict.toml>`

## Overview
@@ -181,7 +181,7 @@ Change the part of the path in capital letters to the actual location
on your computer:

```toml
[vak.prep]
dataset_type = "frame classification"
input_type = "spect"
# we change the next line
```

@@ -230,11 +230,11 @@ When you run `prep`, `vak` converts the data from `data_dir` into a special data
automatically adds the path to that file to the `[TRAIN]` section of the `config.toml` file, as the option
`csv_path`.
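
If you want to script this bookkeeping, the sketch below shows how a dataset path can be written back into a config file with `tomlkit`. The helper name is hypothetical, the table and key names follow the version-1.0 format this PR introduces, and it mirrors, rather than reproduces, what `vak prep` does automatically.

```python
import pathlib

import tomlkit


def add_dataset_path(toml_path: str, table: str, dataset_path: pathlib.Path) -> None:
    """Hypothetical helper: write [vak.<table>.dataset] path = ... into a config file."""
    with open(toml_path) as fp:
        doc = tomlkit.load(fp)
    dataset_table = tomlkit.table()
    # cast pathlib.Path to str before adding it to the TOML document
    dataset_table["path"] = str(dataset_path)
    doc["vak"][table]["dataset"] = dataset_table
    with open(toml_path, "w") as fp:
        tomlkit.dump(doc, fp)


add_dataset_path("gy6or6_train.toml", "train", pathlib.Path("/PATH/TO/dataset"))
```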

You have now prepared a dataset for training a model!
You'll probably have more questions about
how to do this later,
when you start to work with your own data.
When that time comes, please see the how-to page:
{ref}`howto-prep-annotate`.
For now, let's move on to training a neural network with this dataset.

@@ -294,7 +294,7 @@ from that checkpoint later when we predict annotations for new data.

(prepare-prediction-dataset)=

An important step when using neural network models is to evaluate the model's performance
on a held-out dataset that has never been used during training, often called the "test" set.

Here we show you how to evaluate the model we just trained.
@@ -356,33 +356,33 @@ This file will also be found in the root `results_{timestamp}` directory.
```toml
spect_scaler = "/home/users/You/Data/vak_tutorial_data/vak_output/results_{timestamp}/SpectScaler"
```

The last path you need is actually in the TOML file that we used
to train the neural network: `dataset_path`.
You should copy that `dataset_path` option exactly as it is
and then paste it at the bottom of the `[EVAL]` table
in the configuration file for evaluation.
We do this instead of preparing another dataset,
because we already created a test split when we ran
`vak prep` with the training configuration.
This is a good practice, because it helps ensure
that we do not mix the training data with the test data;
`vak` makes sure that the data from the `data_dir` option
is placed in two separate splits, the train and test splits.
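
The same copy step can be scripted. Below is a hedged sketch with `tomlkit`, using the table and key names as written in this tutorial (`[TRAIN]`/`[EVAL]` with a flat `dataset_path` key); with the version-1.0 format this PR introduces, the equivalent value lives in the `[vak.eval.dataset]` table as `path`.

```python
import tomlkit

# read dataset_path from the training config...
with open("gy6or6_train.toml") as fp:
    train_doc = tomlkit.load(fp)

# ...and paste it at the bottom of the [EVAL] table of the eval config
with open("gy6or6_eval.toml") as fp:
    eval_doc = tomlkit.load(fp)
eval_doc["EVAL"]["dataset_path"] = train_doc["TRAIN"]["dataset_path"]

with open("gy6or6_eval.toml", "w") as fp:
    tomlkit.dump(eval_doc, fp)
```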

Once you have prepared the configuration file as described,
you can run the following in the terminal:

```shell
vak eval gy6or6_eval.toml
```

You will see output to the console as the network is evaluated.
Notice that for this model we evaluate it *with* and *without*
post-processing transforms that clean up the predictions
of the model.
The parameters of the post-processing transform are specified
with the `post_tfm_kwargs` option in the configuration file.
You may find this helpful to understand factors affecting
the performance of your own model.

## 4. Preparing a prediction dataset
Expand All @@ -400,7 +400,7 @@ Just like before, you're going to modify the `data_dir` option of the
This time you'll change it to the path to the directory with the other day of data we downloaded.

```toml
[vak.prep]
data_dir = "/home/users/You/Data/vak_tutorial_data/032312"
```

@@ -428,7 +428,7 @@ and then add the path to that file as the option `csv_path` in the `[PREDICT]` s
Finally you will use the trained network to predict annotations.
This is the part that requires you to find paths to files saved by `vak`.

There are three you need. These are the exact same paths we used above
in the configuration file for evaluation, so you can copy them from that file.
We explain them again here for completeness.
All three paths will be in the `results` directory
22 changes: 7 additions & 15 deletions doc/reference/config.md
@@ -19,7 +19,7 @@ for each class.
## Valid section names

Following is the set of valid section names:
`{eval, learncurve, predict, prep, train}`.
In the code, these names correspond to attributes
of the main `Config` class, as shown below.

@@ -43,50 +43,42 @@ that are considered valid.
Valid options for each section are presented below.

(ref-config-prep)=
### `[vak.prep]` section

```{eval-rst}
.. autoclass:: vak.config.prep.PrepConfig
```

(ref-config-spect-params)=
### `[vak.prep.spect_params]` section

```{eval-rst}
.. autoclass:: vak.config.spect_params.SpectParamsConfig
```


(ref-config-train)=
### `[vak.train]` section

```{eval-rst}
.. autoclass:: vak.config.train.TrainConfig
```

(ref-config-eval)=
### `[vak.eval]` section

```{eval-rst}
.. autoclass:: vak.config.eval.EvalConfig
```

(ref-config-predict)=
### `[vak.predict]` section

```{eval-rst}
.. autoclass:: vak.config.predict.PredictConfig
```

(ref-config-learncurve)=
### `[vak.learncurve]` section

```{eval-rst}
.. autoclass:: vak.config.learncurve.LearncurveConfig
```
22 changes: 10 additions & 12 deletions doc/toml/gy6or6_eval.toml
@@ -1,4 +1,4 @@
[vak.prep]
# dataset_type: corresponds to the model family such as "frame classification" or "parametric umap"
dataset_type = "frame classification"
# input_type: input to model, either audio ("audio") or spectrogram ("spect")
@@ -19,16 +19,15 @@ train_dur = 50
val_dur = 15

# SPECT_PARAMS: parameters for computing spectrograms
[vak.prep.spect_params]
# fft_size: size of window used for Fast Fourier Transform, in number of samples
fft_size = 512
# step_size: size of step to take when computing spectra with FFT for spectrogram
# also known as hop size
step_size = 64

# EVAL: options for evaluating a trained model. This is done using the "test" split.
[vak.eval]
# checkpoint_path: path to saved model checkpoint
checkpoint_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
# labelmap_path: path to file that maps from outputs of model (integers) to text labels in annotations;
@@ -51,7 +50,7 @@ output_dir = "/PATH/TO/FOLDER/results/eval"
# ADD THE dataset_path OPTION FROM THE TRAIN FILE HERE (we already created a test split when we ran `vak prep` with that config)

# EVAL.post_tfm_kwargs: options for post-processing
[vak.eval.post_tfm_kwargs]
# both these transforms require that there is an "unlabeled" label,
# and they will only be applied to segments that are bordered on both sides
# by the "unlabeled" label.
@@ -65,12 +64,11 @@ majority_vote = true
# Only applied if this option is specified.
min_segment_dur = 0.02

# dataset.params = parameters used for datasets
# for a frame classification model, we use dataset classes with a specific `window_size`
[vak.eval.dataset.params]
window_size = 176

# Note we do not specify any options for the model, and just use the defaults
# We need to put this table here though so we know which model we are using
[vak.eval.model.TweetyNet]
19 changes: 8 additions & 11 deletions doc/toml/gy6or6_predict.toml
@@ -1,5 +1,5 @@
# PREP: options for preparing dataset
[vak.prep]
# dataset_type: corresponds to the model family such as "frame classification" or "parametric umap"
dataset_type = "frame classification"
# input_type: input to model, either audio ("audio") or spectrogram ("spect")
@@ -15,17 +15,15 @@ audio_format = "wav"
# all data found in `data_dir` will be assigned to a "predict split" instead

# SPECT_PARAMS: parameters for computing spectrograms
[vak.prep.spect_params]
# fft_size: size of window used for Fast Fourier Transform, in number of samples
fft_size = 512
# step_size: size of step to take when computing spectra with FFT for spectrogram
# also known as hop size
step_size = 64

# PREDICT: options for generating predictions with a trained model
[vak.predict]
# checkpoint_path: path to saved model checkpoint
checkpoint_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
# labelmap_path: path to file that maps from outputs of model (integers) to text labels in annotations;
@@ -61,12 +59,11 @@ majority_vote = true
min_segment_dur = 0.01
# dataset_path : path to dataset created by prep. This will be added when you run `vak prep`, you don't have to add it

# dataset.params = parameters used for datasets
# for a frame classification model, we use dataset classes with a specific `window_size`
[vak.predict.dataset.params]
window_size = 176

# Note we do not specify any options for the network, and just use the defaults
# We need to put this table here though, to indicate which model we are using.
[vak.predict.model.TweetyNet]
29 changes: 12 additions & 17 deletions doc/toml/gy6or6_train.toml
@@ -1,5 +1,5 @@
# PREP: options for preparing dataset
[vak.prep]
# dataset_type: corresponds to the model family such as "frame classification" or "parametric umap"
dataset_type = "frame classification"
# input_type: input to model, either audio ("audio") or spectrogram ("spect")
@@ -22,17 +22,15 @@ val_dur = 15
test_dur = 30

# SPECT_PARAMS: parameters for computing spectrograms
[vak.prep.spect_params]
# fft_size: size of window used for Fast Fourier Transform, in number of samples
fft_size = 512
# step_size: size of step to take when computing spectra with FFT for spectrogram
# also known as hop size
step_size = 64

# TRAIN: options for training model
[vak.train]
# root_results_dir: directory where results should be saved, as a sub-directory within `root_results_dir`
root_results_dir = "/PATH/TO/FOLDER/results/train"
# batch_size: number of samples from dataset per batch fed into network
@@ -58,23 +56,20 @@ num_workers = 4
device = "cuda"
# dataset_path : path to dataset created by prep. This will be added when you run `vak prep`, you don't have to add it

# dataset.params = parameters used for datasets
# for a frame classification model, we use dataset classes with a specific `window_size`
[vak.train.dataset.params]
window_size = 176


# To indicate the model to train, we use a "dotted key" with `model` followed by the string name of the model.
# This name must be a name within `vak.models` or added e.g. with `vak.model.decorators.model`
# We use another dotted key to indicate options for configuring the model, e.g. `TweetyNet.optimizer`
[vak.train.model.TweetyNet.optimizer]
# vak.train.model.TweetyNet.optimizer: we specify options for the model's optimizer in this table
# lr: the learning rate
lr = 0.001

[vak.train.model.TweetyNet.network]
# hidden_size: the number of elements in the hidden state in the recurrent layer of the network
hidden_size = 256