This is the starter repository for Codebasics's Resume Project Challenge 2.0.
This project focuses on building an AI-powered pipeline to detect Adverse Drug Events (ADEs) from symptom text, group them into symptom-based and age-specific clusters, and classify each event by severity.
Please fork this repository to get started.
Contestants will use the VAERS dataset provided by the U.S. Vaccine Adverse Event Reporting System.
- Visit the official VAERS Data page: https://vaers.hhs.gov/data/datasets.html
- Scroll to the table listing data by year.
- Download the ZIP file for your target year(s) from the "Zip File" column.
  - Example: For 2025, click the link in the Zip File column (e.g., 4.95 MB).
  - The ZIP will contain three CSV files:
    - `VAERSDATA.csv`: Main case and patient data
    - `VAERSSYMPTOMS.csv`: Coded adverse event terms using the MedDRA (Medical Dictionary for Regulatory Activities) terminology.
      - Each row can hold up to five coded symptoms (`SYMPTOM1` to `SYMPTOM5`), representing standardized MedDRA Preferred Terms; a report with more than five coded symptoms spans multiple rows.
    - `VAERSVAX.csv`: Vaccine/product details
- Extract the ZIP files for all target years, and move all three CSV files from each ZIP into the `data/raw` folder of this repository.
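
The extraction step above can also be scripted. The following is a minimal sketch, not part of the starter code: the function names `extract_vaers_zips` and `symptoms_by_report`, the `zip_dir` argument, and the `latin-1` encoding default are assumptions made for illustration. It copies every CSV found in the downloaded ZIPs into `data/raw`, then groups the coded symptoms by `VAERS_ID` using only the standard library. If two archives contain CSVs with identical file names, the later one overwrites the earlier one, so check the file names before running it on several years.

```python
import csv
import shutil
import zipfile
from collections import defaultdict
from pathlib import Path


def extract_vaers_zips(zip_dir, raw_dir="data/raw"):
    """Copy every CSV member of each ZIP in zip_dir into raw_dir."""
    raw_path = Path(raw_dir)
    raw_path.mkdir(parents=True, exist_ok=True)
    extracted = []
    for zip_file in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_file) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".csv"):
                    continue
                # Drop any folder structure inside the archive.
                target = raw_path / Path(member).name
                # Stream the member to disk instead of loading it into memory.
                with archive.open(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    return extracted


def symptoms_by_report(symptoms_csv, encoding="latin-1"):
    """Collect the SYMPTOM1..SYMPTOM5 terms of every row under its VAERS_ID.

    The encoding default is an assumption; adjust it if your files differ.
    """
    terms = defaultdict(list)
    with open(symptoms_csv, newline="", encoding=encoding) as handle:
        for row in csv.DictReader(handle):
            for i in range(1, 6):
                term = (row.get(f"SYMPTOM{i}") or "").strip()
                if term:
                    terms[row["VAERS_ID"]].append(term)
    return dict(terms)
```

For example, `extract_vaers_zips("downloads")` would populate `data/raw` from a hypothetical `downloads` folder, and `symptoms_by_report` can then be pointed at the extracted symptoms file.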
Before starting annotation or model training, review the Annotation Guidelines in the docs/ folder.
They explain in detail:
- ADE annotation rules: how to identify Adverse Drug Events in text, including what to include and what to skip.
- DRUG annotation rules: how to label vaccine or drug mentions exactly as reported, and how to handle brand names, code names, and generic terms.
- Special cases: rules for compound symptoms, repeated mentions, death/hospitalization references, and COVID-19 mentions.
- Span formatting: keeping the longest medically accurate terms, excluding durations, and labeling each occurrence separately.
- Quick checklist: a step-by-step reminder to ensure annotations are consistent and compliant.
 
Tip: Following these rules strictly ensures the labels are high quality and consistent, which is critical for training the NER model effectively.
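
To make the span formatting rules more concrete, here is a rough sketch of how "longest term" and "label each occurrence separately" could be expressed as character-offset spans. Everything in it is an assumption for illustration: the `(start, end, label)` tuple format, the helper names `find_spans` and `keep_longest`, the `ADE` label string, and the example sentence. The Annotation Guidelines in the docs/ folder remain the authority on what actually counts as a valid span.

```python
import re


def find_spans(text, term, label):
    """Return one (start, end, label) tuple per occurrence of term in text."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
    return [(m.start(), m.end(), label) for m in pattern.finditer(text)]


def keep_longest(spans):
    """Where spans overlap, keep only the longest one; return spans by position."""
    kept = []
    # Visit longer spans first so a shorter overlapping span is rejected.
    for start, end, label in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


# Hypothetical narrative; the duration "3 days" stays outside every span.
text = "Injection site pain started the next day. The pain lasted 3 days."
candidates = find_spans(text, "injection site pain", "ADE") + find_spans(text, "pain", "ADE")
spans = keep_longest(candidates)
for start, end, label in spans:
    print(label, repr(text[start:end]))
```

In this sketch the shorter match for "pain" inside "Injection site pain" is discarded in favor of the longer term, while the later standalone "pain" is kept as its own span.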
Visit the challenge page to learn more: DS RPC-2.0