my1100/CS-162
CS-162: AI-Generated Text Detection with SVM and LoRA-Finetuned RoBERTa

This project detects AI-generated text using two models: a traditional Support Vector Machine (SVM) baseline and a transformer-based RoBERTa model fine-tuned with Low-Rank Adaptation (LoRA). Both models are trained and evaluated on public datasets containing human-written and AI-generated text.


Features

  • SVM baseline (linear kernel, statistical features)
  • LoRA-finetuned RoBERTa transformer model
  • Training and evaluation on HC3, Reddit, and other public corpora
  • Comprehensive metrics: accuracy, precision, recall, F1, ROC-AUC
  • Error analysis and cross-domain robustness
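
As a rough illustration of the threshold-based metrics listed above, the sketch below computes accuracy, precision, recall, and F1 from binary labels in plain Python. The function name `binary_metrics` and the label convention (1 = AI-generated, 0 = human-written) are assumptions made for this example, not taken from the repository. ROC-AUC is left out because it needs model scores rather than hard predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = AI-generated, 0 = human-written (assumed convention).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3 in this toy case.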

Datasets


Modeling Approach

  • SVM: linear kernel, feature-based baseline
  • RoBERTa + LoRA: encoder-only transformer with LoRA adapters
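
To make the LoRA idea concrete, here is a minimal pure-Python sketch of a single adapted linear layer: the pretrained weight `W` stays frozen and a low-rank update `B A`, scaled by `alpha / r`, is added on top. The dimensions, rank, and `alpha` below are illustrative assumptions; the repository's actual LoRA hyperparameters and target modules are not stated here.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x), with W treated as frozen."""
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, alpha = 8, 8, 2, 4     # illustrative sizes, not the repo's

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: adapter starts as a no-op
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

full_params = d_out * d_in             # weights updated by full fine-tuning
lora_params = r * (d_in + d_out)       # weights updated by LoRA
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the `r * (d_in + d_out)` adapter weights are trained. The saving grows with layer width: for a 768 x 768 projection (the hidden size of RoBERTa-base) and rank 8, that is 12,288 trainable weights instead of 589,824.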

Test the Model

To run the model, clone the repository onto a GCP VM and open a terminal in the repository folder.

Then, run the following:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

This creates a virtual environment and installs the dependencies.

Then, to evaluate a specific .json or .jsonl file, run the following:

python grade_model.py --input [name of file to test]

The output should display the model's precision, recall, F1-score, and accuracy on the evaluation dataset.
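
For readers preparing their own evaluation file, the sketch below shows one way to read records from either a .json or a .jsonl file. It is not the code in grade_model.py: the helper name `load_records` and the `text` and `label` fields are hypothetical, so check the repository for the schema the script actually expects.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load a list of records from a .json or .jsonl file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".json":
        data = json.loads(text)
        # Accept either a top-level list or a single object.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported file type: {path.suffix}")


# Hypothetical records; the field names are assumptions for illustration.
samples = [
    {"text": "An example human-written sentence.", "label": 0},
    {"text": "An example machine-generated sentence.", "label": 1},
]

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "eval.json"
    jsonl_path = Path(tmp) / "eval.jsonl"
    json_path.write_text(json.dumps(samples), encoding="utf-8")
    jsonl_path.write_text(
        "\n".join(json.dumps(s) for s in samples) + "\n", encoding="utf-8"
    )
    records_json = load_records(json_path)
    records_jsonl = load_records(jsonl_path)

print(len(records_json), len(records_jsonl))
```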
