Implement PyTorch Metrics #158
Comments
I don't know anything about it, but it sounds like a positive.
It's just a suite of metrics that torch sets up to manage multi-GPU runs under the hood. You just pass it your tensors during the training loop and it stores the sub-metric state. Then when you need the actual metric you call it and it does the calculation and memory collection under the hood. It saves you from desync issues if you have more than one GPU for training. Also, PTL supports it, so it helps reduce boilerplate for other metrics.
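For anyone unfamiliar, here is a minimal sketch of the update/compute pattern being described, using a stock `torchmetrics` class; the loop, shapes, and tensors are stand-ins for illustration only:

```python
import torch
from torchmetrics.classification import MulticlassAccuracy

metric = MulticlassAccuracy(num_classes=5)

# Simulated training loop: update() only accumulates per-process state
# (no cross-process sync, no final computation yet).
for _ in range(3):
    preds = torch.randn(8, 5)            # stand-in logits
    target = torch.randint(0, 5, (8,))   # stand-in labels
    metric.update(preds, target)

# compute() syncs state across processes (if any) and returns the value.
print(metric.compute())
metric.reset()
```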
I think I tried to do this last year and had some issues getting it to track the features I actually wanted, so I gave up, but I think it was in beta or something then. I'd be happy if you got it working :)
I just got it running for BLEU scores on another project, so CER/WER should be stable by now.
I piloted this a bit. We can do our form of accuracy using
Yeah, they have a bias towards strings as the final output, which is a bit annoying since you need to do the metric calculation on CPU at the very end. It's somewhat painless to implement new metrics so long as the distributed training is managed properly. I support implementing it on our side, and once it's robust enough we can push it to the library if they ever allow more robust metric balancing.
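To illustrate the string-at-the-end point, a small sketch using one of the built-in text metrics (the example strings are arbitrary): the character error rate metric takes decoded strings, so this step necessarily happens on CPU after detokenization.

```python
from torchmetrics.text import CharErrorRate

cer = CharErrorRate()
# update() takes hypothesis and reference strings (or lists of strings),
# so it runs after decoding, off the GPU.
cer.update(["kitten"], ["sitting"])
cer.update(["flaw"], ["lawn"])
print(cer.compute())  # aggregate CER over everything seen so far
```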
Closes CUNY-CL#158.

* Loss is computed as before, but streamlined somewhat.
* `torchmetrics`' implementation of exact match accuracy is lightly adapted. This does everything in tensor-land and should keep things on device. My tests confirm that accuracy is EXACTLY what it was before.
* A `torchmetrics`-compatible implementation of symbol error rate (here defined as the edit distance divided by the sum of target lengths) is added. It is heavily documented and compatible with our existing implementation. The hot inner loop is still on CPU, but as noted in the documentation, this is probably the best option, and I don't observe any obvious performance penalty when enabling it.
* The `evaluation` module is removed altogether. Instead, the metric objects are treated as nullables living in the base class, a design adapted from UDTube. The CLI interface is unaffected, and my side-by-side comparison shows the metrics are exactly the same as before this change.
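This is not the PR's actual code, but for reviewers, here is a rough sketch of what a `torchmetrics`-compatible symbol error rate of this shape (total edit distance over total target length, with the edit-distance loop on CPU) can look like; the names and the symbol-ID sequence representation are illustrative assumptions.

```python
import torch
from torchmetrics import Metric


def _edit_distance(pred: list[int], target: list[int]) -> int:
    """Classic Levenshtein distance over symbol ID sequences (CPU loop)."""
    dp = list(range(len(target) + 1))
    for i, p in enumerate(pred, 1):
        prev, dp[0] = dp[0], i
        for j, t in enumerate(target, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (p != t))
    return dp[-1]


class SymbolErrorRate(Metric):
    """Sketch only: edit distance summed over the batch / total target length."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # add_state registers accumulators that torchmetrics sum-reduces
        # across processes when compute() is called.
        self.add_state("edits", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("length", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: list[list[int]], target: list[list[int]]) -> None:
        for p, t in zip(preds, target):
            self.edits += _edit_distance(p, t)
            self.length += len(t)

    def compute(self) -> torch.Tensor:
        return self.edits.float() / self.length


# Toy usage with made-up symbol IDs.
ser = SymbolErrorRate()
ser.update([[1, 2, 3]], [[1, 3, 3, 4]])
print(ser.compute())
```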
TorchMetrics support is pretty reliable nowadays and makes distributed training less annoying (no more world sizes, yay!). It also syncs well with Wandb logging and allows monitoring of training batch performance. Any complaints about me migrating validation logging to this?
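For reference, a hedged sketch of the standard Lightning + torchmetrics logging pattern such a migration would typically follow; the module, layer, and attribute names below are made up for illustration and are not the project's actual ones.

```python
import pytorch_lightning as pl
import torch
from torchmetrics.classification import MulticlassAccuracy


class DemoModule(pl.LightningModule):
    """Illustrative module; layers and attribute names are placeholders."""

    def __init__(self, hidden_size: int = 32, num_classes: int = 10):
        super().__init__()
        self.classifier = torch.nn.Linear(hidden_size, num_classes)
        # Assigning the metric as a module attribute lets Lightning move it
        # to the right device and handle resetting/syncing across processes.
        self.val_accuracy = MulticlassAccuracy(num_classes=num_classes)

    def forward(self, features):
        return self.classifier(features)

    def validation_step(self, batch, batch_idx):
        features, target = batch
        preds = self(features)
        self.val_accuracy.update(preds, target)
        # Logging the metric object (not a plain value) tells Lightning to
        # call compute() at epoch end and forward the result to the attached
        # logger (e.g. the Wandb logger).
        self.log("val_accuracy", self.val_accuracy, on_epoch=True)
```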