Components in a pipeline sometimes reach optimal validation score at very different times #10529
einarbmag started this conversation in Help: Best practices
Replies: 1 comment 2 replies
-
The best solution is to train the components separately. You can write a short third config that assembles the trained components with `source`. We do this for the provided trained pipelines.
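As a minimal sketch, the assembly config can source each component from its own training run; the pipeline names and paths below are assumptions:

```ini
[nlp]
lang = "en"
pipeline = ["textcat","ner"]

[components.textcat]
source = "training/textcat/model-best"

[components.ner]
source = "training/ner/model-best"
```

Since each component keeps its own embedding layer, sourcing them this way doesn't require any listener rewiring; each component is copied in with the weights from its own best checkpoint.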
-
Let's say I have both a text classifier and an NER component in my pipeline, and I haven't found that multi-task learning improves performance, so I keep the components independent with their own embedding layers. In fact, the optimal validation score for the two components occurs at significantly different times in the training process. Selecting the best overall model by picking the checkpoint with the best total score yields a model whose components may be under- or overtrained.
What to do in this situation? I know I can split this into two configs, one per component, run training separately, and then merge the pipelines in code afterwards. I'd really like to avoid that if possible. Looking at the training script, though, I don't see a way for the components to be persisted independently. Another idea is to set different learning rates for the components (if that's possible), but that would be very fiddly. Any thoughts? Maybe I just have to write a script that trains each component separately and merges them automatically at the end?