This repository contains the implementation of GenSynthPop for the South-West district of The Hague, The Netherlands.
The synthetic population is generated for 14 neighborhoods and contains 10 individual-level attributes and 5 household-level attributes. The attributes are listed in the first column of the two tables in the Evaluation section below.
The synthetic population was generated as a case study for GenSynthPop, which is described in the paper GenSynthPop: Generating a Spatially Explicit Synthetic Population of Agents and Households from Aggregated Data [1]. For a full report on the evaluation, please refer to that paper.
The fourteen neighborhoods are located in the South-West district of The Hague ('s-Gravenhage), The Netherlands:
Neighborhood Code (CBS) | Neighborhood Name |
---|---|
BU05181785 | Kerketuinen en Zichtenburg |
BU05183284 | Leyenburg |
BU05183387 | Venen, Oorden en Raden |
BU05183396 | Zijden, Steden en Zichten |
BU05183398 | Dreven en Gaarden |
BU05183399 | De Uithof |
BU05183480 | Morgenstond-Zuid |
BU05183488 | Morgenstond-West |
BU05183489 | Morgenstond-Oost |
BU05183536 | Zuiderpark |
BU05183620 | Moerwijk-Oost |
BU05183637 | Moerwijk-West |
BU05183638 | Moerwijk-Noord |
BU05183639 | Moerwijk-Zuid |
The following map shows where in The Netherlands this synthetic population is located, and what the region looks like:
All data sets have been obtained through the Open Data portal of Statistics Netherlands (Centraal Bureau voor de Statistiek, or CBS, in Dutch).
All data sources have been made available by CBS under the CC BY 4.0 license, as stated here.
- marginal
  - marginal_distributions_84583NED.csv. Source: Kerncijfers wijken en buurten 2019 (CBS StatLine)
  - Regionale_kerncijfers_Nederland_19052024_185018.csv. Source: Regionale kerncijfers Nederland (CBS StatLine)
- individual
  - gender
    - gender_age-03759NED-formatted.csv. Source: Bevolking op 1 januari en gemiddeld; geslacht, leeftijd en regio (CBS StatLine)
  - integer_age
    - Leeftijdsopbouw Nederland 2019.csv. Source: Bevolkingspiramide (CBS)
  - migration_background
  - education
    - Primair_onderwijs__schoolregio_18012024_150722.csv. Source: (Speciaal) basisonderwijs en speciale scholen; leerlingen, schoolregio (CBS StatLine)
    - Studenten__woonregio_2000_2022_18012024_150722.csv. Source: Leerlingen en studenten; onderwijssoort, woonregio 2000/'01-2022/'23 (CBS StatLine)
    - Bevolking__onderwijsniveau_en_herkomst_26012024_115001.csv. Source: Bevolking; hoogst behaald onderwijsniveau en herkomst (CBS StatLine)
  - drivers_license
  - household_position
- household
  - household_composition
    - table_7ab235bf-b5a7-4077-bf56-3f5c8efec7d0.csv. Source: Groom usually older than bride (CBS)
    - Marriages__key_figures_25052024_182843.csv. Source: Marriages and partnership registrations; key figures (CBS StatLine)
    - Geboorte__kerncijfers_per_regio_25052024_182014.csv. Source: Geboorte; kerncijfers vruchtbaarheid, leeftijd moeder, regio (CBS StatLine)
  - postal_code
  - household_income
  - vehicle_ownership

| Attribute | | DoF | Z² score | Z² p-value | X² score | X² p-value | Absolute error (total) | Absolute error (standardized) |
|---|---|---|---|---|---|---|---|---|
| age group | neighborhood | 70 | 0.160028 | 1.000000 | 0.016845 | 1.000000 | 20.000000 | 0.000236 |
| gender | neighborhood | 28 | 0.168613 | 1.000000 | 0.185536 | 1.000000 | 26.000000 | 0.000306 |
| | age group | 10 | 0.612648 | 0.999983 | 0.579719 | 0.999987 | 150.559563 | 0.001773 |
| integer age | | 106 | 2.144096 | 1.000000 | 3.601348 | 1.000000 | 124.606355 | 0.001468 |
| | gender | 212 | 26.086832 | 1.000000 | 31.153039 | 1.000000 | 1183.331065 | 0.013938 |
| migration background | neighborhood | 42 | 0.191206 | 1.000000 | 0.500781 | 1.000000 | 62.000000 | 0.000730 |
| | age group | 60 | 26.784031 | 0.999936 | 27.521184 | 0.999898 | 1175.066312 | 0.013844 |
| | gender | 6 | 1.421525 | 0.964536 | 1.140259 | 0.979733 | 296.952058 | 0.003498 |
| | age group × gender | 120 | 27.329810 | 1.000000 | 30.074286 | 1.000000 | 1195.071783 | 0.014080 |
| absolved education | | 9 | 0.522445 | 0.999963 | 0.490611 | 0.999972 | 151.258870 | 0.001782 |
| | neighborhood | 42 | 0.066162 | 1.000000 | 0.237732 | 1.000000 | 43.868289 | 0.000517 |
| | age group | 171 | 69.205262 | 1.000000 | 71.689507 | 1.000000 | 1392.723678 | 0.016408 |
| | gender | 18 | 3.668375 | 0.999874 | 3.583914 | 0.999894 | 460.845914 | 0.005429 |
| | age group × gender | 342 | 73.542488 | 1.000000 | 78.444162 | 1.000000 | 1438.030022 | 0.016942 |
| current education | | 18 | 5.271613 | 0.998370 | 5.559259 | 0.997681 | 241.257163 | 0.002842 |
| | age group | 738 | 400.828726 | 1.000000 | 88.304314 | 1.000000 | 344.817221 | 0.004062 |
| | gender | 36 | 5.406962 | 1.000000 | 6.937750 | 1.000000 | 265.580831 | 0.003129 |
| | migration background | 54 | 13.529318 | 1.000000 | 9.604675 | 1.000000 | 255.267884 | 0.003007 |
| | absolved education | 162 | 6.639501 | 1.000000 | 6.341646 | 1.000000 | 241.481890 | 0.002845 |
| | age group × gender | 1476 | 707.310600 | 1.000000 | 122.133588 | 1.000000 | 415.442018 | 0.004894 |
| | age group × migration background | 2214 | 853.547746 | 1.000000 | 108.171522 | 1.000000 | 392.444181 | 0.004624 |
| | age group × absolved education | 6642 | 464.316168 | 1.000000 | 92.685205 | 1.000000 | 360.126489 | 0.004243 |
| | gender × migration background | 108 | 20.653331 | 1.000000 | 13.456209 | 1.000000 | 281.248106 | 0.003313 |
| | gender × absolved education | 324 | 7.692929 | 1.000000 | 8.568989 | 1.000000 | 271.231784 | 0.003195 |
| | migration background × absolved education | 486 | 19.424479 | 1.000000 | 12.676304 | 1.000000 | 258.486825 | 0.003045 |
| | age group × gender × migration background | 4428 | 1574.162354 | 1.000000 | 157.970713 | 1.000000 | 487.830608 | 0.005747 |
| | age group × gender × absolved education | 13284 | 933.428873 | 1.000000 | 130.189879 | 1.000000 | 443.881705 | 0.005230 |
| | age group × migration background × absolved education | 19926 | 1093.462434 | 1.000000 | 113.877827 | 1.000000 | 416.662545 | 0.004909 |
| | gender × migration background × absolved education | 972 | 29.108628 | 1.000000 | 21.881322 | 1.000000 | 305.042419 | 0.003594 |
| | age group × gender × migration background × absolved education | 39852 | 2194.267693 | 1.000000 | 167.084847 | 1.000000 | 527.015116 | 0.006209 |
| car license | | 2 | 0.000165 | 0.999917 | 0.000159 | 0.999921 | 3.595921 | 0.000042 |
| | age | 26 | 0.005861 | 1.000000 | 0.006811 | 1.000000 | 14.370524 | 0.000169 |
| motor cycle license | | 2 | 0.002777 | 0.998612 | 0.001954 | 0.999024 | 6.370590 | 0.000075 |
| | age | 26 | 0.875771 | 1.000000 | 1.484794 | 1.000000 | 15.663175 | 0.000185 |
| moped license | | 2 | 0.000220 | 0.999890 | 0.000196 | 0.999902 | 3.975836 | 0.000047 |
| | age | 26 | 0.003083 | 1.000000 | 0.010500 | 1.000000 | 13.535124 | 0.000159 |
| | car license | 4 | 0.002277 | 0.999999 | 0.003920 | 0.999998 | 3.975836 | 0.000047 |
| | age × car license | 52 | 0.027725 | 1.000000 | 0.075585 | 1.000000 | 13.535124 | 0.000159 |
| household position | | 21 | 1.803299 | 1.000000 | 1.847752 | 1.000000 | 250.398797 | 0.002950 |
| | neighborhood | 42 | 1.802755 | 1.000000 | 2.501504 | 1.000000 | 146.000000 | 0.001720 |
| | age group | 420 | 249.202608 | 1.000000 | 126.578850 | 1.000000 | 754.259661 | 0.008886 |
| | gender | 42 | 2.699896 | 1.000000 | 2.918825 | 1.000000 | 336.681633 | 0.003967 |
| | age group × gender | 840 | 334.941882 | 1.000000 | 182.929638 | 1.000000 | 896.190081 | 0.010558 |


| Attribute | | DoF | Z² score | Z² p-value | X² score | X² p-value | Absolute error (total) | Absolute error (standardized) |
|---|---|---|---|---|---|---|---|---|
| household type | neighborhood | 42 | 906.521255 | 0.000000 | 894.854325 | 0.000000 | 4098.000000 | 0.096605 |
| postal code | neighborhood | 2069 | 392.432681 | 1.000000 | 570.688983 | 1.000000 | 3199.000000 | 0.077825 |
| income_group | age group | 40 | 4.811664 | 1.000000 | 5.578801 | 1.000000 | 162.006477 | 0.003825 |
| | income household type | 40 | 0.858700 | 1.000000 | 1.054674 | 1.000000 | 138.199964 | 0.003263 |
| | main bread winner migration background | 30 | 1.270564 | 1.000000 | 1.394234 | 1.000000 | 164.570737 | 0.003886 |
| | age group × income household type | 160 | 21.969606 | 1.000000 | 32.654659 | 1.000000 | 269.499612 | 0.006364 |
| | age group × main bread winner migration background | 120 | 34.366320 | 1.000000 | 40.854727 | 1.000000 | 256.078370 | 0.006047 |
| | income household type × main bread winner migration background | 120 | 4.172994 | 1.000000 | 5.809870 | 1.000000 | 239.738984 | 0.005661 |
| | age group × main bread winner migration background × income household type | 480 | 57.803160 | 1.000000 | 76.238676 | 1.000000 | 420.321388 | 0.009925 |
| cars | income household type | 16 | 11.333855 | 0.788424 | 11.687847 | 0.765174 | 130.374345 | 0.003078 |
| | hh size | 20 | 2.453303 | 0.999999 | 2.831404 | 0.999998 | 138.374003 | 0.003267 |
| | vehicle ownership income group | 20 | 3.199855 | 0.999993 | 4.353822 | 0.999907 | 115.344156 | 0.002724 |
| | car license | 20 | 17.717134 | 0.606036 | 20.764026 | 0.411137 | 132.576966 | 0.003131 |
| | income household type × hh size | 80 | 10.263020 | 1.000000 | 11.653642 | 1.000000 | 148.560213 | 0.003508 |
| | income household type × vehicle ownership income group | 80 | 9.298031 | 1.000000 | 12.164716 | 1.000000 | 164.088294 | 0.003875 |
| | income household type × car license | 80 | 27.702845 | 1.000000 | 31.583354 | 1.000000 | 134.961108 | 0.003187 |
| | hh size × vehicle ownership income group | 100 | 6.117136 | 1.000000 | 8.300281 | 1.000000 | 170.569850 | 0.004028 |
| | hh size × car license | 100 | 18.241554 | 1.000000 | 25.092836 | 1.000000 | 154.187093 | 0.003641 |
| | vehicle ownership income group × car license | 100 | 26.546443 | 1.000000 | 34.161878 | 1.000000 | 156.713236 | 0.003700 |
| | income household type × hh size × vehicle ownership income group | 400 | 23.725567 | 1.000000 | 29.353700 | 1.000000 | 203.311026 | 0.004801 |
| | income household type × hh size × car license | 400 | 29.404266 | 1.000000 | 40.145015 | 1.000000 | 166.019409 | 0.003920 |
| | income household type × vehicle ownership income group × car license | 400 | 34.714700 | 1.000000 | 45.360212 | 1.000000 | 203.841026 | 0.004813 |
| | hh size × vehicle ownership income group × car license | 500 | 30.191990 | 1.000000 | 44.540901 | 1.000000 | 222.846215 | 0.005262 |
| | income household type × hh size × vehicle ownership income group × car license | 2000 | 54.483863 | 1.000000 | 73.215653 | 1.000000 | 265.690539 | 0.006274 |
| motorcycles | income household type | 16 | 3.331365 | 0.999661 | 3.876334 | 0.999100 | 37.931565 | 0.000896 |
| | hh size | 20 | 10.280284 | 0.962800 | 10.596614 | 0.956021 | 37.931565 | 0.000896 |
| | vehicle ownership income group | 20 | 2.359904 | 1.000000 | 2.954304 | 0.999996 | 35.458634 | 0.000837 |
| | motorcycle license | 12 | 1.270946 | 0.999947 | 1.348853 | 0.999926 | 35.458634 | 0.000837 |
| | income household type × hh size | 80 | 13.105943 | 1.000000 | 14.205072 | 1.000000 | 37.931565 | 0.000896 |
| | income household type × vehicle ownership income group | 80 | 7.924885 | 1.000000 | 11.107216 | 1.000000 | 44.933394 | 0.001061 |
| | income household type × motorcycle license | 48 | 3.537022 | 1.000000 | 4.063794 | 1.000000 | 37.931565 | 0.000896 |
| | hh size × vehicle ownership income group | 100 | 9.686371 | 1.000000 | 12.806302 | 1.000000 | 49.673925 | 0.001173 |
| | hh size × motorcycle license | 60 | 10.822410 | 1.000000 | 11.299931 | 1.000000 | 37.931565 | 0.000896 |
| | vehicle ownership income group × motorcycle license | 60 | 2.479304 | 1.000000 | 3.112846 | 1.000000 | 35.458634 | 0.000837 |
| | income household type × hh size × vehicle ownership income group | 400 | 16.255885 | 1.000000 | 19.531411 | 1.000000 | 50.254357 | 0.001187 |
| | income household type × hh size × motorcycle license | 240 | 13.667388 | 1.000000 | 14.965049 | 1.000000 | 37.931565 | 0.000896 |
| | income household type × vehicle ownership income group × motorcycle license | 240 | 8.115136 | 1.000000 | 11.201584 | 1.000000 | 44.933394 | 0.001061 |
| | hh size × vehicle ownership income group × motorcycle license | 300 | 10.301489 | 1.000000 | 13.574983 | 1.000000 | 49.673925 | 0.001173 |
| | income household type × hh size × vehicle ownership income group × motorcycle license | 1200 | 16.946660 | 1.000000 | 20.319974 | 1.000000 | 50.254357 | 0.001187 |

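The exact definitions of the table columns are given in the paper, not here. As a rough illustration only, the sketch below computes an X² score and a total and standardized absolute error from made-up counts. It assumes that X² is the Pearson chi-square statistic and that the standardized error is the total error divided by the number of agents. Both are assumptions rather than definitions taken from this repository, and the function names are hypothetical.

```python
def pearson_x2(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E.

    Cells with an expected count of zero are skipped (a choice made
    for this sketch, not necessarily what the paper does).
    """
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def absolute_error(observed, expected):
    """Sum of absolute differences between observed and expected counts."""
    return sum(abs(o - e) for o, e in zip(observed, expected))


# Made-up counts for one attribute with three categories.
expected = [50.0, 30.0, 20.0]   # counts implied by the source data
observed = [48, 33, 19]         # counts in the synthetic population

total = absolute_error(observed, expected)   # 2 + 3 + 1
standardized = total / sum(observed)         # assumed: divide by population size
x2 = pearson_x2(observed, expected)          # 4/50 + 9/30 + 1/20

print(f"total={total:.6f} standardized={standardized:.6f} X2={x2:.6f}")
```

The Z² score and the p-values are left out on purpose, because their exact definitions are not stated here.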
This repository uses Git LFS. Before cloning it, make sure to run:
git lfs install
After cloning, create a virtual environment, and install the requirements:
python -m venv env
source env/bin/activate
pip install -r requirements.txt
First, generate the individual-level attributes:
python3 generate_individuals.py
This adds the configured attributes one at a time. After each attribute, the intermediate stage of the synthetic population is stored as both a CSV file and a Pandas DataFrame pickle. If these files already exist, the step is skipped when the script is run again. To make a change take effect, remove from output the files of the first stage the change affects and of every later stage.
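The skip-if-stored behavior described above can be sketched as follows. This is a minimal illustration, not the code in generate_individuals.py: the function `run_stage`, the step `add_gender`, and the file naming scheme are all hypothetical, and a plain list of dicts stands in for the Pandas DataFrame so that the example needs only the standard library.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def run_stage(name, step, population, output_dir):
    """Run one stage unless its output files already exist.

    Returns the population after this stage and a flag telling whether
    the step was actually executed. All names here are hypothetical.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    csv_path = output_dir / f"{name}.csv"
    pkl_path = output_dir / f"{name}.pkl"

    if csv_path.exists() and pkl_path.exists():
        # Both files are present: reuse the stored stage, do not recompute.
        with pkl_path.open("rb") as fh:
            return pickle.load(fh), False

    population = step(population)
    # Store the stage as CSV (human-readable) and as a pickle (fast reload).
    with csv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(population[0].keys()))
        writer.writeheader()
        writer.writerows(population)
    with pkl_path.open("wb") as fh:
        pickle.dump(population, fh)
    return population, True


def add_gender(rows):
    # Placeholder step: a real step would sample the attribute from CBS data.
    return [{**row, "gender": "unknown"} for row in rows]


def demo():
    """Run the same stage twice in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        start = [{"agent_id": 1}, {"agent_id": 2}]
        first, ran_first = run_stage("01_gender", add_gender, start, tmp)
        second, ran_second = run_stage("01_gender", add_gender, start, tmp)
        return first, ran_first, second, ran_second
```

In this sketch the second call to `run_stage` loads the stored result instead of running the step again, which is why the files of a stage and of all later stages have to be removed before a change can take effect.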
The second step is the household generation:
python3 generate_households.py
This also runs in multiple steps and again stores the intermediate results. In this step, both the synthetic population and the synthetic households may change at every stage, so both are stored for each stage, even if no changes occur.
Footnotes
1. Marco Pellegrino, Jan de Mooij, Tabea Sonnenschein et al. GenSynthPop: Generating a Spatially Explicit Synthetic Population of Agents and Households from Aggregated Data, 09 October 2023, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-3405645/v1]