CarDS-Yale/stroke_essay_llm

stroke_essay_llm

Authors:
Rohan Khera MD MS, Aline F Pedroso PhD, Vipina K Keloth PhD, Hua Xu PhD, Gisele S Silva MD PhD, Lee H Schwamm MD

Links:

Repository Overview

This repository contains the Python scripts and supporting materials used in the analysis for a manuscript that characterizes the linguistic features distinguishing AI-generated from human-authored scientific text and evaluates how well AI-detection tools perform at this task.

Repository Structure

1. 1_cohort_creation.py

Contains scripts for processing and annotating the main dataset, including:

  • Parsing feature values from rater annotations
  • Cleaning and transforming data for modeling
  • Creating categorical variables for AI and human impact
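
As a rough illustration of the parsing and categorization steps listed above, the sketch below turns one rater annotation string into feature values and maps a numeric impact score to a category. The annotation format (`feature=value` pairs separated by semicolons), the function names, and the cut points are all assumptions made for this example; the actual script may store and bin these values differently.

```python
from typing import Dict


def parse_annotation(raw: str) -> Dict[str, int]:
    """Parse a 'feature=value; feature=value' string into a dict.

    The 'feature=value' format is an assumption for illustration only.
    """
    features: Dict[str, int] = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled separators
        name, _, value = part.partition("=")
        features[name.strip()] = int(value)
    return features


def impact_category(score: int) -> str:
    """Map a numeric impact score to a label (cut points are hypothetical)."""
    if score <= 1:
        return "low"
    if score <= 3:
        return "moderate"
    return "high"


parsed = parse_annotation("fluency=3; specificity=1;")
labels = {name: impact_category(value) for name, value in parsed.items()}
```

Keeping parsing and categorization in separate functions makes it easier to change the cut points without touching the parsing logic.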

2. 2_tables.py

Contains scripts for:

  • Calculating best essay ratios
  • Generating summary tables
  • Statistical comparisons of text features by reviewer ratings
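
One plausible reading of a "best essay ratio" is the share of reviewer selections that went to each essay source. The sketch below computes that share from a list of labels; the function name, the `"AI"`/`"Human"` labels, and this definition of the ratio are assumptions for illustration, not the manuscript's confirmed method.

```python
from collections import Counter
from typing import Dict, Iterable


def best_essay_ratios(choices: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of 'best essay' selections per label.

    `choices` holds one label per reviewer decision (hypothetical layout).
    """
    counts = Counter(choices)
    total = sum(counts.values())
    if total == 0:
        return {}  # no selections, so no ratios to report
    return {label: n / total for label, n in counts.items()}


ratios = best_essay_ratios(["AI", "AI", "Human", "AI"])
```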

3. 3_figures.py

Includes code for visualizing:

  • Distributions of AI/Human impact annotations
  • Essay classification metrics such as GPTZero prediction and subjectivity

4. 4_supplement_material.py

Includes code for:

  • Merging individual essay PDF files into a combined supplemental file
  • Adding labels to essays and formatting the output for manuscript submission

How to Use

  1. Ensure all required Excel or CSV input files are present in the correct paths as expected by the scripts.
  2. Run each script in sequence to process data, compute tables/figures, and compile supplemental PDFs.
  3. For figures, additional visualization libraries such as matplotlib or seaborn may be required.
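
The steps above can be wrapped in a small runner. This is a minimal sketch that assumes all four scripts sit in one folder and take no command-line arguments; `missing_scripts` and `run_pipeline` are hypothetical helpers and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script names as listed under "Repository Structure", in run order.
SCRIPTS = [
    "1_cohort_creation.py",
    "2_tables.py",
    "3_figures.py",
    "4_supplement_material.py",
]


def missing_scripts(folder: str = ".") -> List[str]:
    """Return the scripts that are not present in `folder`."""
    return [name for name in SCRIPTS if not (Path(folder) / name).is_file()]


def run_pipeline(folder: str = ".") -> None:
    """Run each script in sequence, stopping at the first failure."""
    missing = missing_scripts(folder)
    if missing:
        raise FileNotFoundError("Missing scripts: " + ", ".join(missing))
    for name in SCRIPTS:
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run([sys.executable, name], cwd=folder, check=True)
```

Calling `run_pipeline()` from the repository root would then execute the scripts in order. Stopping at the first failure matters here because later scripts presumably depend on outputs from earlier ones.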

Contact: Rohan Khera - [email protected]

Version: June 2025