CS-162: AI-Generated Text Detection with SVM and LoRA-Finetuned RoBERTa

This project detects AI-generated text using two models: a traditional Support Vector Machine (SVM) baseline and a transformer-based RoBERTa model fine-tuned with Low-Rank Adaptation (LoRA). Both models are trained and evaluated on public datasets containing human-written and AI-generated text.

Features

SVM baseline (linear kernel, statistical features)
LoRA-finetuned RoBERTa transformer model
Training and evaluation on HC3, Reddit, and other public corpora
Comprehensive metrics: accuracy, precision, recall, F1, ROC-AUC
Error analysis and cross-domain robustness
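
The README does not list which statistical features the SVM baseline uses. As a rough illustration only, the sketch below computes a few common stylometric statistics that such a baseline might consume. The function name `stylometric_features` and the specific features are assumptions, not taken from train_SVM.ipynb.

```python
import re
from collections import Counter


def stylometric_features(text: str) -> dict:
    """Compute a few simple text statistics (hypothetical feature set)."""
    # Lowercased alphabetic tokens; apostrophes kept so "don't" stays one word.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    if n_words == 0:
        return {
            "avg_word_len": 0.0,
            "type_token_ratio": 0.0,
            "avg_sentence_len": 0.0,
            "punct_per_word": 0.0,
        }
    vocab = Counter(words)
    punct = sum(1 for ch in text if ch in ",;:!?.")
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(vocab) / n_words,  # vocabulary richness
        "avg_sentence_len": n_words / max(len(sentences), 1),
        "punct_per_word": punct / n_words,
    }


features = stylometric_features("The cat sat. The cat ran!")
print(features)
```

Each text would be mapped to a fixed-length vector of such values before being passed to a linear classifier.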

Datasets

HC3 (Human ChatGPT Comparison Corpus): English Q&A pairs from ELI5, finance, medicine, open-domain, and CS/AI.
ahmadreza13 Human-vs-AI Dataset: 3.6M samples.
Evaluation sets: arxiv_chatgpt, arxiv_cohere, reddit_chatgpt, reddit_cohere, german_wikipedia, toefl, hewlett

Modeling Approach

SVM: linear kernel, feature-based baseline
RoBERTa + LoRA: encoder-only transformer with LoRA adapters
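
To make the LoRA idea concrete, the sketch below shows the low-rank update in plain Python: the pretrained weight W0 stays frozen, and only two small matrices A and B are trained, giving an effective weight of W0 + (alpha / r) * B A. The dimensions, rank, and scaling values are illustrative assumptions and are not the settings used in train_roberta_lora.py.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w0, a, b, alpha, r):
    """Return W0 + (alpha / r) * B @ A, where W0 is frozen and A, B are trainable."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


def lora_trainable_params(d_in, d_out, r):
    """Number of trainable parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


# Toy sizes, chosen only for illustration.
d_in, d_out, r, alpha = 4, 3, 2, 4
rng = random.Random(0)
w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the adapter is initially a no-op

w = lora_effective_weight(w0, a, b, alpha, r)
print(w == w0)  # the adapter changes nothing until B is trained

# For a 768 x 768 projection (the hidden size of roberta-base) at an assumed rank of 8:
print(lora_trainable_params(768, 768, 8), "trainable vs", 768 * 768, "frozen")
```

Because only A and B receive gradients, the number of trainable parameters per adapted matrix grows with r * (d_in + d_out) rather than d_in * d_out.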

Test the Model

To run the model, clone the git repository onto a GCP VM and open a terminal in the repository folder.

Then, run the following:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

This creates a virtual environment and installs the required dependencies.

Then, to evaluate a specific .json or .jsonl file, run the following:

python grade_model.py --input [name of file to test]

The output should display the precision, recall, F1 score, and accuracy of the model on the evaluation dataset.
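
For reference, these four metrics can be derived from the confusion counts of a binary classifier. The sketch below is a minimal, dependency-free illustration, not the code in grade_model.py. The function name `binary_metrics` and the convention that label 1 means AI-generated are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Assumed convention: 1 = AI-generated, 0 = human-written.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Precision measures how many texts flagged as AI-generated really are, while recall measures how many AI-generated texts were caught; F1 is their harmonic mean.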
