Generating “STD‑Only” Synthetic Data — Best Practice? #1596
Unanswered
Aryansoxyboiiii asked this question in Q&A
Replies: 0 comments
Hi Synthea team,
I’m using Synthea in a university project to build a probabilistic AI that predicts sexually‑transmitted infections (STIs). To keep the dataset focused (and small), I’d like to generate only the records relevant to STDs and strip everything else (claims, payers, unrelated chronic‑disease modules, etc.).
What I’m hoping to do
Generate patients who either
Export only diagnostic/clinical data: patients.csv, conditions.csv, observations.csv, encounters.csv, procedures.csv, medications.csv
Avoid manual surgery on the modules folder if there’s a built‑in whitelist feature.
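To make the export item above concrete, here is a minimal sketch of the fallback I have in mind if no built-in option exists: a post-processing step that copies only the six clinical CSV files out of a finished run. The file names come from the list above; the function name `copy_clinical_csvs` and the directory paths are hypothetical, and this is not a Synthea feature, just plain Python run after generation.

```python
import shutil
from pathlib import Path

# File names taken from the list above; any other CSV is left behind.
KEEP = {
    "patients.csv", "conditions.csv", "observations.csv",
    "encounters.csv", "procedures.csv", "medications.csv",
}

def copy_clinical_csvs(src_dir, dst_dir):
    """Copy only the whitelisted CSV files from src_dir into dst_dir.

    Returns the sorted list of file names that were copied.
    Both directory arguments are hypothetical paths chosen by the caller.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.glob("*.csv")):
        if path.name in KEEP:
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

For example, `copy_clinical_csvs("output/csv", "std_dataset")` would leave claims and payer files behind, assuming the run wrote its CSVs to `output/csv`. This only trims files; it does not restrict which patients or conditions get generated, which is the part I am unsure how to do.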
Any guidance (or “don’t do that, do this instead!”) would be hugely appreciated.
Thanks for maintaining such a fantastic open‑source project!
Best regards,
Aryan
1st‑year student — IIoT & AI Enthusiast