diff --git a/report/sections/methodology.tex b/report/sections/methodology.tex
index ccd475a..ee1aadd 100644
--- a/report/sections/methodology.tex
+++ b/report/sections/methodology.tex
@@ -27,6 +27,7 @@ \subsection*{Phase 2: Transferring Knowledge via Finetuning}
 
 % Training
 Training is performed on the \texttt{curlie-gpt3.5-10k} and \texttt{curlie-gpt4-10k} datasets for a maximum of 100 epochs. We use a 30\% held-out validation split from the \texttt{crowdsourced} dataset to monitor the validation F1 score and stop training if no improvement is observed for 10 epochs, which prevents overfitting to the LLM labels. We perform a hyperparameter search with the Bayesian TPE sampler from Optuna~\cite{optuna}, using $\eta=100$ trials and $\tau=10$ startup trials to effectively explore the hyperparameter space. The hyperparameter values are detailed in Table~\ref{tab:hyperparameters}. The model that performs best on macro F1 on the validation split is chosen for evaluation.
+The training loss is the average binary cross-entropy over the 14 classes, with a per-class reweighting factor based on the negative-to-positive sample ratio to address class imbalance.
 
 \input{tables/hyperparameters.tex}
 
diff --git a/report/sections/summary.tex b/report/sections/summary.tex
index c82d7b2..661ded6 100644
--- a/report/sections/summary.tex
+++ b/report/sections/summary.tex
@@ -1,3 +1,4 @@
 \section{Summary}\label{sec:summary}
 
-We have demonstrated that LLMs can provide cost-effective, and high-quality annotations in the settign of multilingual, multilabel website topic classification. Our approach, which involved finetuning a pre-trained Homepage2vec model on LLM-generated labels, resulted in a improvement of 4.3 percentage points in the macro F1 score. Additionally, we are releasing the \texttt{curlie-gpt3.5-10k} and \texttt{curlie-gpt4-10k} datasets \cite{curlie-gpt-10k} with the intention of supporting further research in the open-source community.
\ No newline at end of file
+We have demonstrated that LLMs can provide cost-effective, high-quality annotations in the setting of multilingual, multilabel website topic classification. Our approach, which involved finetuning a pre-trained Homepage2vec model on LLM-generated labels, resulted in an improvement of 4.3 percentage points in the macro F1 score.
+Additionally, we release the \texttt{curlie-gpt3.5-10k} and \texttt{curlie-gpt4-10k} datasets~\cite{curlie-gpt-10k} to support further open-source research.
\ No newline at end of file
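Note: as a sketch of the hyperparameter search described in the methodology hunk above, the Optuna setup might look like the following. The search space, parameter names, and the train_and_validate helper are illustrative assumptions, not the paper's actual code; the real values are defined in tables/hyperparameters.tex.

# Sketch of the TPE-based hyperparameter search described above (assumed
# setup, not the paper's actual code): eta = 100 trials, tau = 10 startup
# trials, maximizing validation macro F1.
import optuna


def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space for illustration; the real grid is
    # given in tables/hyperparameters.tex.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    # train_and_validate is a hypothetical helper that finetunes
    # Homepage2vec for up to 100 epochs with early stopping (patience of
    # 10 epochs, monitored on the held-out crowdsourced split) and
    # returns the validation macro F1 score.
    return train_and_validate(lr=lr, batch_size=batch_size)


# TPE falls back to random sampling for the first n_startup_trials,
# which corresponds to the tau = 10 reported in the text.
sampler = optuna.samplers.TPESampler(n_startup_trials=10)
study = optuna.create_study(direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=100)
print(study.best_params)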
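Note: the reweighted loss added in the methodology hunk can be written out explicitly. The notation below ($w_c$, $y_c$, $\hat{y}_c$, $N_c^{\pm}$) is our own shorthand for the description in the prose, not taken from the paper.

% Assumed notation: y_c is the binary label for class c, \hat{y}_c the
% predicted probability, and N_c^-, N_c^+ the negative/positive counts.
\begin{equation}
  \mathcal{L}
  = -\frac{1}{14} \sum_{c=1}^{14}
    \Bigl[\, w_c \, y_c \log \hat{y}_c
           + (1 - y_c) \log\bigl(1 - \hat{y}_c\bigr) \Bigr],
  \qquad
  w_c = \frac{N_c^{-}}{N_c^{+}}
\end{equation}

Weighting only the positive term by the negative-to-positive ratio matches the pos_weight convention used by common binary cross-entropy implementations.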