How we optimize hyperparameters

Overview

Hyperparameters: epochs, learning rate, batch size
Method: Grid search
Evaluate: Compare average train loss and average validation loss
Result: Prevent overfitting and underfitting

Method

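Rationale for the search space: the experiment environment is limited to a 6 GB GPU, and we considered the general options used when fine-tuning T5 base.
Problem: a memory error occurs when the batch size is greater than 16. Solution: reduce the batch size and set gradient_accumulation_step, which has the same effect as training with a larger batch.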
Implementation: uses the itertools library
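
A minimal sketch of how such a grid could be built with `itertools.product`. The candidate values, the helper name `build_grid`, and the dictionary keys are assumptions made for illustration; they are not the values or code used in the actual experiments. The only facts taken from the notes above are the batch size limit of 16 and the use of gradient accumulation to reach larger effective batches.

```python
import itertools

# Hypothetical candidate values; the real search space is not listed on this page.
EPOCHS = [3, 5]
LEARNING_RATES = [1e-4, 3e-4]
EFFECTIVE_BATCH_SIZES = [16, 32]

# Largest per-step batch assumed to fit in memory
# (batch sizes above 16 caused a memory error).
MAX_MICRO_BATCH = 16


def build_grid(epochs, learning_rates, batch_sizes, max_micro_batch):
    """Return one config dict per combination of hyperparameters."""
    grid = []
    for n_epochs, lr, batch in itertools.product(epochs, learning_rates, batch_sizes):
        # Cap the per-step batch and make up the difference by accumulating
        # gradients over several steps before each optimizer update.
        micro_batch = min(batch, max_micro_batch)
        if batch % micro_batch != 0:
            raise ValueError(
                f"batch size {batch} is not a multiple of {micro_batch}"
            )
        grid.append(
            {
                "epochs": n_epochs,
                "learning_rate": lr,
                "effective_batch_size": batch,
                "micro_batch_size": micro_batch,
                "gradient_accumulation_step": batch // micro_batch,
            }
        )
    return grid


GRID = build_grid(EPOCHS, LEARNING_RATES, EFFECTIVE_BATCH_SIZES, MAX_MICRO_BATCH)

for config in GRID:
    print(config)
```

Each dictionary describes one training run. Keeping the effective batch size as the searched value, and deriving the per-step batch and accumulation count from it, means the grid stays the same if the memory limit changes.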

Evaluate

Compare average train loss and average validation loss
Logic:

Compute the average train loss and average validation loss for each epoch.
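
The per-epoch comparison could look roughly like the sketch below. It assumes the batch losses of each epoch have already been collected into plain Python lists; the function names, the sample numbers, and the thresholds in `diagnose` are all hypothetical and only illustrate the idea of reading overfitting or underfitting from the two curves.

```python
def mean(values):
    """Average of a non-empty sequence of losses."""
    values = list(values)
    if not values:
        raise ValueError("no losses recorded for this epoch")
    return sum(values) / len(values)


def epoch_averages(train_losses_per_epoch, val_losses_per_epoch):
    """Return a list of (avg_train_loss, avg_val_loss), one pair per epoch."""
    return [
        (mean(train), mean(val))
        for train, val in zip(train_losses_per_epoch, val_losses_per_epoch)
    ]


def diagnose(averages, gap_tolerance=0.1, high_loss=1.0):
    """Rough heuristic; both thresholds are assumed values, not tuned ones."""
    final_train, final_val = averages[-1]
    best_val = min(val for _, val in averages)
    # Both losses still high at the end: the model has not learned enough.
    if final_train > high_loss and final_val > high_loss:
        return "underfitting"
    # Validation loss well above train loss and worse than its best epoch.
    if final_val - final_train > gap_tolerance and final_val > best_val:
        return "overfitting"
    return "ok"


# Made-up batch losses for three epochs of one run.
train_losses = [[1.2, 1.0], [0.6, 0.4], [0.2, 0.2]]
val_losses = [[1.1, 0.9], [0.7, 0.7], [0.9, 0.9]]

averages = epoch_averages(train_losses, val_losses)
for epoch, (avg_train, avg_val) in enumerate(averages, start=1):
    print(f"epoch {epoch}: avg train loss {avg_train:.3f}, avg val loss {avg_val:.3f}")
print(diagnose(averages))
```

In the sample data the train loss keeps falling while the validation loss rises in the last epoch, which is the pattern the comparison is meant to catch.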

Training logic code explanation
