Authors:
Rohan Khera MD MS, Aline F Pedroso PhD, Vipina K Keloth PhD, Hua Xu PhD, Gisele S Silva MD PhD, Lee H Schwamm MD
Links:
- [Manuscript – TBD]
- CarDS Lab Website
This repository contains organized Python scripts and supporting materials used in the analyses for a manuscript that sought to characterize the linguistic features that distinguish AI-generated from human-authored scientific text and to evaluate the performance of AI-detection tools on this task.
Contains scripts for processing and annotating the main dataset, including:
- Parsing feature values from rater annotations
- Cleaning and transforming data for modeling
- Creating categorical variables for AI and human impact
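The sketch below roughly illustrates the parsing and categorization steps above. It parses a rater annotation string into numeric feature values, then bins a numeric impact score into ordered categories. The `feature=value; ...` annotation format, the function names `parse_annotation` and `impact_category`, and the cutpoints are hypothetical assumptions. They are not the format or thresholds used by the repository scripts.

```python
import re
from bisect import bisect_right

# Hypothetical annotation format: "feature=value" pairs separated by semicolons,
# e.g. "formality=4; hedging=2". The real rater sheets may be structured differently.
PAIR_RE = re.compile(r"\s*([A-Za-z_ ]+?)\s*=\s*(-?\d+(?:\.\d+)?)\s*")

# Hypothetical cutpoints: score < 2 -> "low", 2 to < 4 -> "moderate", >= 4 -> "high".
CUTPOINTS = [2.0, 4.0]
LABELS = ["low", "moderate", "high"]


def parse_annotation(text: str) -> dict[str, float]:
    """Parse a rater annotation string into {feature name: numeric value}."""
    values: dict[str, float] = {}
    if not text:
        return values
    for chunk in text.split(";"):
        match = PAIR_RE.fullmatch(chunk)
        if match:  # silently skip malformed fragments such as "n/a"
            name, value = match.groups()
            values[name.strip().lower()] = float(value)
    return values


def impact_category(score: float) -> str:
    """Map a numeric impact score to an ordered category label."""
    return LABELS[bisect_right(CUTPOINTS, score)]
```

Using `bisect_right` keeps the cutpoints in a single list. Each boundary value falls into the higher category, and changing the thresholds only means editing `CUTPOINTS` and `LABELS`.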
Contains scripts for:
- Calculating best essay ratios
- Generating summary tables
- Statistical comparisons of text features by reviewer ratings
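The README does not define "best essay ratio." The sketch below assumes one plausible reading: the share of rater choices in which the essay from a given source (for example, AI-generated) was picked as best. It also assumes that per-rating summaries mean a count, mean, and median of a text feature grouped by reviewer rating. Both definitions and all function names are hypothetical. A rank-based test such as `scipy.stats.mannwhitneyu` is one common option for a formal comparison, but this README does not say which test the repository uses.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def best_essay_ratio(choices: list[str], source: str = "ai") -> float:
    """Fraction of rater choices in which the essay from `source` was picked as best."""
    if not choices:
        raise ValueError("no rater choices recorded")
    return Counter(choices)[source] / len(choices)


def summarize_by_rating(rows):
    """Group (rating, feature_value) pairs by rating -> {rating: (n, mean, median)}."""
    groups = defaultdict(list)
    for rating, value in rows:
        groups[rating].append(value)
    # Sort by rating so the summary table has a stable row order.
    return {r: (len(v), mean(v), median(v)) for r, v in sorted(groups.items())}
```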
Includes code for visualizing:
- Distributions of AI/Human impact annotations
- Essay classification metrics such as GPTZero prediction and subjectivity
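The following is a minimal sketch, under stated assumptions, of how the distribution of impact annotations could be computed and plotted. The category labels and function names are hypothetical. The plot uses matplotlib (a third-party library), which is imported inside the plotting function, so the counting helper works even where matplotlib is not installed.

```python
from collections import Counter

# Hypothetical category labels; the repository's actual labels may differ.
CATEGORY_ORDER = ["none", "low", "moderate", "high"]


def category_distribution(labels, order=CATEGORY_ORDER):
    """Proportion of annotations in each category; labels outside `order` are ignored."""
    counts = Counter(labels)
    total = sum(counts[c] for c in order)
    return {c: (counts[c] / total if total else 0.0) for c in order}


def plot_distribution(dist, title, path):
    """Save a bar chart of a category distribution to `path` (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, suitable for saving files
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar(list(dist), list(dist.values()))
    ax.set_ylabel("Proportion of annotations")
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(path, dpi=300)
    plt.close(fig)
```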
Includes code for:
- Merging individual essay PDF files into a combined supplemental file
- Adding labels to essays and formatting the output for manuscript submission
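The PDF steps above could be sketched as follows. This assumes the third-party `pypdf` package (3.x or later) and uses PDF bookmarks (outline items) as one possible way to label each essay. The repository may instead stamp labels directly onto the pages. The label text format, `label_prefix`, and the function names are hypothetical. A natural sort key is included so that `essay_10.pdf` is merged after `essay_2.pdf`, not before it.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that orders embedded numbers numerically ('essay_2' before 'essay_10')."""
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def merge_essays(pdf_paths, out_path, label_prefix="Essay"):
    """Concatenate essay PDFs into one file, bookmarking the first page of each essay."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    writer = PdfWriter()
    for i, path in enumerate(sorted(pdf_paths, key=natural_key), start=1):
        first_page = len(writer.pages)  # index where this essay will start
        for page in PdfReader(path).pages:
            writer.add_page(page)
        writer.add_outline_item(f"{label_prefix} {i}: {Path(path).stem}", first_page)
    with open(out_path, "wb") as fh:
        writer.write(fh)
```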
- Ensure that all required Excel or CSV input files are present at the paths the scripts expect.
- Run each script in sequence to process data, compute tables/figures, and compile supplemental PDFs.
- For figures, additional visualization libraries such as matplotlib or seaborn may be required.
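A minimal pre-flight runner for the usage steps above might look like this. The caller supplies the script and input file names. Nothing here reflects the repository's actual file names or execution order, so treat both as assumptions to fill in. The names `missing_inputs` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def missing_inputs(paths):
    """Return the input paths (as strings) that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


def run_pipeline(scripts, inputs):
    """Check that all inputs exist, then run each script in order with the current Python."""
    missing = missing_inputs(inputs)
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    for script in scripts:
        # check=True stops the sequence at the first script that exits with an error.
        subprocess.run([sys.executable, script], check=True)
```

Checking every input before running anything avoids a half-finished pipeline. Similarly, `check=True` keeps later steps from running on stale outputs after an earlier step fails.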
Contact: Rohan Khera - [email protected]
Version: June 2025
"""