This project uses Synthea™ to generate realistic synthetic patient data for medical notes.
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
# Or specify Synthea CSV directory directly:
gen = NoteGenerator(synthea_csv_dir="/path/to/synthea/output/csv")
gen.generate_notes(10, "output_dir")
You can specify the Synthea CSV directory directly in your config file. Add the following line to your config.yaml:
Example config.yaml:
count: 10
output_dir: output_dir
synthea_csv_dir: /path/to/synthea/output/csv
Then generate notes using:
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
This project requires Synthea™, an open-source synthetic patient generator, as an external dependency. You must clone and build Synthea yourself before using mednotegen.
To set up Synthea:
- Clone Synthea
git clone https://github.com/synthetichealth/synthea.git
- Build the Synthea JAR
cd synthea
./gradlew build check test
cp build/libs/synthea-with-dependencies.jar .
cd ..
Ensure synthea-with-dependencies.jar is in the synthea/ directory at the root of your project.
You can customize patient generation and report output using a config.yaml file. Example options:
count: 10 # Number of reports to generate
output_dir: output_dir # Output directory for PDFs
use_llm: false # Use LLM for report generation
synthea_csv_dir: /path/to/synthea/output/csv # Path to Synthea-generated CSV files
seed: 1234 # Random seed for reproducibility
reference_date: "20250628" # Reference date for data generation (YYYYMMDD)
clinician_seed: 5678 # Optional: separate seed for clinician assignment
gender: female # male, female, or any
min_age: 30 # Minimum patient age
max_age: 60 # Maximum patient age
state: New York # Synthea state parameter
modules:
- cardiovascular-disease
- diabetes
- hypertension
- asthma
local_config: custom_synthea.properties # Custom Synthea config file
local_modules: ./synthea_modules # Directory for custom modules
- count: Number of reports to generate
- output_dir: Directory to save generated PDFs
- use_llm: If true, uses an OpenAI LLM to generate the report text
- seed: Random seed for reproducibility
- reference_date: Reference date for age calculations (YYYYMMDD)
- clinician_seed: Optional, separate seed for clinician assignment
- gender: Gender filter for patients (male, female, or any)
- min_age, max_age: Age range for patients
- state: US state for Synthea simulation
- modules: Synthea disease modules to enable
- local_config: Path to a custom Synthea config file
- local_modules: Directory for custom Synthea modules
For an up-to-date and complete list of available modules, see the official Synthea modules directory.
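Before running a long generation job, it can help to sanity-check the filter options described above. The following is a minimal sketch, not part of mednotegen: the function name check_config is hypothetical, and it assumes the config has already been loaded into a plain dict (for example with a YAML parser). It only checks the gender, min_age/max_age, and reference_date options as they are documented here.

```python
from datetime import datetime

# Values documented for the gender option above.
ALLOWED_GENDERS = {"male", "female", "any"}


def check_config(cfg):
    """Return a list of problems found in a config mapping (hypothetical helper)."""
    problems = []

    # gender must be one of the documented values when it is present.
    gender = cfg.get("gender")
    if gender is not None and gender not in ALLOWED_GENDERS:
        problems.append(f"gender must be male, female, or any (got {gender!r})")

    # The age range must not be inverted.
    min_age, max_age = cfg.get("min_age"), cfg.get("max_age")
    if min_age is not None and max_age is not None and min_age > max_age:
        problems.append(f"min_age ({min_age}) is greater than max_age ({max_age})")

    # reference_date must be exactly eight digits forming a real YYYYMMDD date.
    ref = cfg.get("reference_date")
    if ref is not None:
        text = str(ref)  # an unquoted YAML value would arrive as an int
        try:
            if len(text) != 8 or not text.isdigit():
                raise ValueError(text)
            datetime.strptime(text, "%Y%m%d")
        except ValueError:
            problems.append(f"reference_date must be YYYYMMDD (got {text!r})")

    return problems


if __name__ == "__main__":
    example = {"gender": "female", "min_age": 30, "max_age": 60,
               "reference_date": "20250628"}
    for problem in check_config(example):
        print(problem)
```

An empty list means none of the checked options looked wrong; the sketch says nothing about options it does not inspect.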
If you see errors about missing patients.csv, medications.csv, or conditions.csv, make sure you have generated Synthea data and that the path you provide (via synthea_csv_dir, CLI, or config) points to the correct directory containing those files.
If you installed mednotegen via pip, the default location for these CSV files is inside the package directory. For custom or system-wide Synthea runs, always specify the output CSV directory explicitly.
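To confirm that a directory actually contains the three files named above before passing it as synthea_csv_dir, a small standalone check can be used. This is a sketch using only the Python standard library; the helper name missing_synthea_files is hypothetical and is not part of mednotegen.

```python
from pathlib import Path

# The three files this README says mednotegen needs.
REQUIRED_FILES = ("patients.csv", "medications.csv", "conditions.csv")


def missing_synthea_files(csv_dir):
    """Return the required CSV file names that are absent from csv_dir."""
    base = Path(csv_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_synthea_files("/path/to/synthea/output/csv")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required CSV files are present.")
```

If the directory itself does not exist, all three names are reported as missing.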
- No CSV files generated:
  - Make sure you edited the correct synthea.properties and used the -c flag when running Synthea.
  - Ensure exporter.csv.export = true is set and not overridden elsewhere in the file.
- FileNotFoundError for CSVs:
  - Confirm the CSV files exist in the path specified by synthea_csv_dir or in the expected package location.
- ValueError: No patients found matching the specified filters:
  - Check your age/gender filters in config.yaml. Try relaxing them if you have too few patients.
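When the "No patients found" error appears, it can be useful to estimate how many patients in patients.csv would pass a given age and gender filter. The sketch below is an independent approximation, not mednotegen's own filtering logic. It assumes the Synthea CSV export uses a BIRTHDATE column in YYYY-MM-DD form and a GENDER column with the values M and F; check your own file's header if the counts look off. The names age_on and count_matching are hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Assumed mapping from the config's gender values to Synthea's GENDER codes.
GENDER_CODES = {"male": "M", "female": "F"}


def age_on(birth, ref):
    """Whole years between birth and ref (both datetime.date)."""
    years = ref.year - birth.year
    if (ref.month, ref.day) < (birth.month, birth.day):
        years -= 1  # birthday has not happened yet in the reference year
    return years


def count_matching(rows, gender, min_age, max_age, reference_date):
    """Count rows whose gender and age (on reference_date, YYYYMMDD) pass the filter."""
    ref = datetime.strptime(reference_date, "%Y%m%d").date()
    wanted = GENDER_CODES.get(gender)  # None for "any": no gender filtering
    total = 0
    for row in rows:
        if wanted and row["GENDER"] != wanted:
            continue
        birth = date.fromisoformat(row["BIRTHDATE"])
        if min_age <= age_on(birth, ref) <= max_age:
            total += 1
    return total


if __name__ == "__main__":
    # Tiny in-memory stand-in for patients.csv (made-up rows for illustration).
    sample = io.StringIO(
        "Id,BIRTHDATE,GENDER\n"
        "a,1980-01-15,F\n"
        "b,2010-07-01,F\n"
        "c,1975-03-09,M\n"
    )
    print(count_matching(csv.DictReader(sample), "female", 30, 60, "20250628"))  # prints 1
```

To run it against real data, open patients.csv with newline="" and pass csv.DictReader(f) as rows. A count of zero suggests relaxing the filters or generating more patients.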
Edit src/main/resources/synthea.properties in your Synthea directory:
exporter.csv.export = true
(Ensure any exporter.csv.export = false lines are removed or commented out.)
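Because a later exporter.csv.export line overrides an earlier one, it is worth scanning the whole properties file rather than only the first match. The following is a minimal sketch of such a scan; the function names are hypothetical, and it only understands simple "key = value" lines, not the full Java properties syntax (for example ":" separators or line continuations).

```python
def csv_export_values(properties_text):
    """Return every value assigned to exporter.csv.export, in file order."""
    values = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "exporter.csv.export":
            values.append(value.strip().lower())
    return values


def csv_export_enabled(properties_text):
    """True only if the last uncommented assignment sets the option to true."""
    values = csv_export_values(properties_text)
    return bool(values) and values[-1] == "true"


if __name__ == "__main__":
    with open("src/main/resources/synthea.properties", encoding="utf-8") as f:
        text = f.read()
    print("assignments found:", csv_export_values(text))
    print("CSV export enabled:", csv_export_enabled(text))
```

More than one entry in the list of assignments is a hint that the setting is being overridden somewhere in the file.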
From your Synthea directory, clean any old output and generate new data:
rm -rf output/
java -jar synthea-with-dependencies.jar -c src/main/resources/synthea.properties -p 1000
- The -p 1000 flag generates 1000 patients.
- After running, check for CSV files in output/csv/.
See README_SYNTHEA_NOTICE.md and LICENSE-APACHE-2.0 for license and attribution requirements.