Training a shared transformer layer #7513
fcggamou started this conversation in Help: Best practices
Replies: 1 comment
-
What is the recommended approach for training multiple pipeline components that share the same transformer layer? Should I freeze the transformer layer after training one component and then train the rest of them?
Thanks for any feedback.
-
If you have data annotated with multiple tasks, there is no reason why you can't train the transformer on all of these tasks simultaneously.
If you're training in multiple stages, then yes, one solution is to first train the transformer together with one component, then freeze the transformer and train a second component on top of it. It's difficult to say upfront what the effect on accuracy would be, though.
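
To make the first option concrete, here is a rough sketch of a jointly trained pipeline with a shared transformer in a spaCy v3 config. It assumes `spacy-transformers` is installed; the `roberta-base` model and the `tagger`/`ner` pair are illustrative choices, not anything prescribed in this thread, and the remaining sections (paths, corpora, optimizer) can be filled in with `spacy init fill-config`:

```ini
# Sketch only: one transformer component shared by a tagger and an NER
# component via listeners. Model and component names are illustrative.

[nlp]
lang = "en"
pipeline = ["transformer","tagger","ner"]

[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
tokenizer_config = {"use_fast": true}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.tagger]
factory = "tagger"

[components.tagger.model]
@architectures = "spacy.Tagger.v2"

[components.tagger.model.tok2vec]
# The listener connects the tagger to the shared transformer; during
# training, the tagger's gradients flow back into the transformer,
# scaled by grad_factor.
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.tagger.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```

A single `python -m spacy train` run over a config like this updates the transformer on both tasks at once, since each listener backpropagates into the shared weights.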
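
And a sketch of the staged alternative, assuming stage 1 trained `transformer` + `tagger` and saved the pipeline to `./stage1/model-best` (the path is hypothetical). The sourced transformer is frozen via `frozen_components`; my understanding is that, as of spaCy 3.1, it should also be listed under `annotating_components` so that it still runs during training and the new component's listener receives its output:

```ini
# Sketch only: stage 2 trains an NER component on top of the frozen,
# previously trained transformer. Paths and names are illustrative.

[nlp]
lang = "en"
pipeline = ["transformer","ner"]

[components.transformer]
# Reuse the transformer trained in stage 1.
source = "./stage1/model-best"

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false

[components.ner.model.tok2vec]
# Listens to the frozen transformer; gradients sent back to it
# are simply discarded.
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[training]
# Keep the sourced transformer's weights fixed ...
frozen_components = ["transformer"]
# ... but still run it during training so the listener gets its output.
annotating_components = ["transformer"]
```

The stage-1 tagger can then be sourced into the final pipeline in the same way, since both trained components sit on top of the same frozen transformer.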