A Synthetic Population for the South-West District of The Hague, The Netherlands, constructed using GenSynthPop-Python

A-Practical-Agent-Programming-Language/Synthetic-Population-The-Hague-South-West

This repository contains the implementation of GenSynthPop for the South-West district of The Hague, The Netherlands.

The synthetic population is generated for 14 neighborhoods and contains 10 individual-level attributes and 5 household-level attributes. The attributes are shown in the first column in the two tables in the Evaluation section below.

The synthetic population was generated as a case study for GenSynthPop, which is described in the paper GenSynthPop: Generating a Spatially Explicit Synthetic Population of Agents and Households from Aggregated Data [1]. For a full report on the evaluation, please refer to that paper.

Neighborhoods

The fourteen neighborhoods are located in the South-West District of The Hague ('s-Gravenhage), The Netherlands:

Neighborhood Code (CBS) Neighborhood Name
BU05181785 Kerketuinen en Zichtenburg
BU05183284 Leyenburg
BU05183387 Venen, Oorden en Raden
BU05183396 Zijden, Steden en Zichten
BU05183398 Dreven en Gaarden
BU05183399 De Uithof
BU05183480 Morgenstond-Zuid
BU05183488 Morgenstond-West
BU05183489 Morgenstond-Oost
BU05183536 Zuiderpark
BU05183620 Moerwijk-Oost
BU05183637 Moerwijk-West
BU05183638 Moerwijk-Noord
BU05183639 Moerwijk-Zuid
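
When working with the CBS source tables, it can help to keep the fourteen neighborhood codes in one place. The following is a minimal sketch and not code from this repository: the names `NEIGHBORHOODS` and `filter_rows`, and the `region_code` column, are assumptions made for illustration, and the rows at the bottom are toy data.

```python
# Neighborhood codes and names, copied from the table above.
NEIGHBORHOODS = {
    "BU05181785": "Kerketuinen en Zichtenburg",
    "BU05183284": "Leyenburg",
    "BU05183387": "Venen, Oorden en Raden",
    "BU05183396": "Zijden, Steden en Zichten",
    "BU05183398": "Dreven en Gaarden",
    "BU05183399": "De Uithof",
    "BU05183480": "Morgenstond-Zuid",
    "BU05183488": "Morgenstond-West",
    "BU05183489": "Morgenstond-Oost",
    "BU05183536": "Zuiderpark",
    "BU05183620": "Moerwijk-Oost",
    "BU05183637": "Moerwijk-West",
    "BU05183638": "Moerwijk-Noord",
    "BU05183639": "Moerwijk-Zuid",
}


def filter_rows(rows, code_column="region_code"):
    """Keep only rows whose region code is one of the 14 neighborhoods.

    `rows` is any iterable of dicts (for example from csv.DictReader).
    `code_column` is a hypothetical column name; adjust it to the actual file.
    """
    # strip() tolerates codes that are padded with whitespace.
    return [row for row in rows if row.get(code_column, "").strip() in NEIGHBORHOODS]


# Toy rows (made up), only to show the helper in use.
toy_rows = [
    {"region_code": "BU05183284 ", "value": "1"},
    {"region_code": "BU00000000", "value": "2"},
]
selected = filter_rows(toy_rows)
print([NEIGHBORHOODS[row["region_code"].strip()] for row in selected])
```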

The following map shows where in The Netherlands this synthetic population is located, and what the region looks like:

Map of the 14 neighborhoods in the South-West district of The Hague, and a map of the 12 provinces in The Netherlands with the South-West district of The Hague highlighted in red

Data sets:

All data sets have been obtained through the Open Data portal of Statistics Netherlands (Centraal Bureau voor de Statistiek, or CBS, in Dutch).

All data sources have been made available by CBS under the CC BY 4.0 license, as stated here.

  1. marginal
    1. marginal_distributions_84583NED.csv. Source: Kerncijfers wijken en buurten 2019 (CBS StatLine)
    2. Regionale_kerncijfers_Nederland_19052024_185018.csv. Source: Regionale kerncijfers Nederland (CBS StatLine)
  2. individual
    1. gender
      1. gender_age-03759NED-formatted.csv. Source: Bevolking op 1 januari en gemiddeld; geslacht, leeftijd en regio (CBS StatLine)
    2. integer_age
      1. Leeftijdsopbouw Nederland 2019.csv. Source: Bevolkingspiramide (CBS)
    3. migration_background
      1. Bev__migratieachtergr__regio__2010_2022_29122023_115517.csv. Source: Bevolking; migratieachtergrond, generatie, lft, regio, 1 jan; 2010-2022 (CBS StatLine)
    4. education
      1. Primair_onderwijs__schoolregio_18012024_150722.csv. Source: (Speciaal) basisonderwijs en speciale scholen; leerlingen, schoolregio (CBS StatLine)
      2. Studenten__woonregio_2000_2022_18012024_150722.csv. Source: Leerlingen en studenten; onderwijssoort, woonregio 2000/'01-2022/'23 (CBS StatLine)
      3. Bevolking__onderwijsniveau_en_herkomst_26012024_115001.csv. Source: Bevolking; hoogst behaald onderwijsniveau en herkomst (CBS StatLine)
    5. drivers_license
      1. Personen_met_rijbewijs__categorie__regio_19052024_184228.csv. Source: Personen met een rijbewijs; rijbewijscategorie, leeftijd, regio, 1 januari (CBS StatLine)
    6. household_position
      1. Huishoudens__personen__regio_26122023_151215.csv. Source: Huishoudens; personen naar geslacht, leeftijd en regio, 1 januari (CBS StatLine)
      2. Huishoudens__samenstelling__regio_25052024_174751.csv. Source: Huishoudens; samenstelling, grootte, regio, 1 januari (CBS StatLine)
  3. household
    1. household_composition
      1. table_7ab235bf-b5a7-4077-bf56-3f5c8efec7d0.csv. Source: Groom usually older than bride (CBS)
      2. Marriages__key_figures_25052024_182843.csv. Source: Marriages and partnership registrations; key figures (CBS StatLine)
      3. Geboorte__kerncijfers_per_regio_25052024_182014.csv. Source: Geboorte; kerncijfers vruchtbaarheid, leeftijd moeder, regio (CBS StatLine)
    2. postal_code
      1. pc6hnr20190801_gwb.csv. Source: Buurt, wijk en gemeente 2019 voor postcode huisnummer (CBS)
    3. household_income
      1. Inkomen_huishoudens__kenmerken__regio_25052024_182249.csv. Source: Inkomen van huishoudens; huishoudenskenmerken, regio (indeling 2021) (CBS StatLine)
    4. vehicle_ownership
      1. Huishoudens_met_auto_of_motor__2010_2015_14062024_171657.csv. Source: Huishoudens in bezit van auto of motor; huishoudkenmerken, 2010-2015 (CBS StatLine)

Evaluation

Personal Level Attributes

Box plots of percentage difference in synthetic population and marginal data for the values of three attributes (left), repeated with the percentage difference in joint data for one of the attributes (right).

Columns, after the attribute names: DoF, Z² score, Z² p-value, X² score, X² p-value, total absolute error, standardized absolute error
age group neighborhood 70 0.160028 1.000000 0.016845 1.000000 20.000000 0.000236
gender neighborhood 28 0.168613 1.000000 0.185536 1.000000 26.000000 0.000306
age group 10 0.612648 0.999983 0.579719 0.999987 150.559563 0.001773
integer age 106 2.144096 1.000000 3.601348 1.000000 124.606355 0.001468
gender 212 26.086832 1.000000 31.153039 1.000000 1183.331065 0.013938
migration background neighborhood 42 0.191206 1.000000 0.500781 1.000000 62.000000 0.000730
age group 60 26.784031 0.999936 27.521184 0.999898 1175.066312 0.013844
gender 6 1.421525 0.964536 1.140259 0.979733 296.952058 0.003498
age group × gender 120 27.329810 1.000000 30.074286 1.000000 1195.071783 0.014080
absolved education 9 0.522445 0.999963 0.490611 0.999972 151.258870 0.001782
neighborhood 42 0.066162 1.000000 0.237732 1.000000 43.868289 0.000517
age group 171 69.205262 1.000000 71.689507 1.000000 1392.723678 0.016408
gender 18 3.668375 0.999874 3.583914 0.999894 460.845914 0.005429
age group × gender 342 73.542488 1.000000 78.444162 1.000000 1438.030022 0.016942
current education 18 5.271613 0.998370 5.559259 0.997681 241.257163 0.002842
age group 738 400.828726 1.000000 88.304314 1.000000 344.817221 0.004062
gender 36 5.406962 1.000000 6.937750 1.000000 265.580831 0.003129
migration background 54 13.529318 1.000000 9.604675 1.000000 255.267884 0.003007
absolved education 162 6.639501 1.000000 6.341646 1.000000 241.481890 0.002845
age group × gender 1476 707.310600 1.000000 122.133588 1.000000 415.442018 0.004894
age group × migration background 2214 853.547746 1.000000 108.171522 1.000000 392.444181 0.004624
age group × absolved education 6642 464.316168 1.000000 92.685205 1.000000 360.126489 0.004243
gender × migration background 108 20.653331 1.000000 13.456209 1.000000 281.248106 0.003313
gender × absolved education 324 7.692929 1.000000 8.568989 1.000000 271.231784 0.003195
migration background × absolved education 486 19.424479 1.000000 12.676304 1.000000 258.486825 0.003045
age group × gender × migration background 4428 1574.162354 1.000000 157.970713 1.000000 487.830608 0.005747
age group × gender × absolved education 13284 933.428873 1.000000 130.189879 1.000000 443.881705 0.005230
age group × migration background × absolved education 19926 1093.462434 1.000000 113.877827 1.000000 416.662545 0.004909
gender × migration background × absolved education 972 29.108628 1.000000 21.881322 1.000000 305.042419 0.003594
age group × gender × migration background × absolved education 39852 2194.267693 1.000000 167.084847 1.000000 527.015116 0.006209
car license 2 0.000165 0.999917 0.000159 0.999921 3.595921 0.000042
age 26 0.005861 1.000000 0.006811 1.000000 14.370524 0.000169
motor cycle license 2 0.002777 0.998612 0.001954 0.999024 6.370590 0.000075
age 26 0.875771 1.000000 1.484794 1.000000 15.663175 0.000185
moped license 2 0.000220 0.999890 0.000196 0.999902 3.975836 0.000047
age 26 0.003083 1.000000 0.010500 1.000000 13.535124 0.000159
car license 4 0.002277 0.999999 0.003920 0.999998 3.975836 0.000047
age × car license 52 0.027725 1.000000 0.075585 1.000000 13.535124 0.000159
household position 21 1.803299 1.000000 1.847752 1.000000 250.398797 0.002950
neighborhood 42 1.802755 1.000000 2.501504 1.000000 146.000000 0.001720
age group 420 249.202608 1.000000 126.578850 1.000000 754.259661 0.008886
gender 42 2.699896 1.000000 2.918825 1.000000 336.681633 0.003967
age group × gender 840 334.941882 1.000000 182.929638 1.000000 896.190081 0.010558
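
The score and error columns in the table above compare counts in the synthetic population with the counts expected from the source data. As a rough illustration, and not the evaluation code behind these numbers, the sketch below computes a Pearson chi-squared score and a total absolute error for toy counts. The function names are hypothetical, the counts are made up, and dividing the total error by the population size is only an assumption about how the standardized column is derived. The Z² score and the p-values are not reproduced here; see the paper for their definitions.

```python
def pearson_chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells.

    Cells with an expected count of zero are skipped to avoid division by zero.
    """
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def total_absolute_error(observed, expected):
    """Sum of absolute differences between observed and expected counts."""
    return sum(abs(o - e) for o, e in zip(observed, expected))


# Toy counts for one attribute with two values (made up for illustration).
observed = [48, 52]      # counts in a synthetic population
expected = [50.0, 50.0]  # counts implied by the source data

chi2 = pearson_chi_squared(observed, expected)
tae = total_absolute_error(observed, expected)

# Assumption: "standardized" means the total error divided by population size.
standardized = tae / sum(expected)

print(chi2, tae, standardized)
```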

Household Attributes

Columns, after the attribute names: DoF, Z² score, Z² p-value, X² score, X² p-value, total absolute error, standardized absolute error
household type neighborhood 42 906.521255 0.000000 894.854325 0.000000 4098.000000 0.096605
postal code neighborhood 2069 392.432681 1.000000 570.688983 1.000000 3199.000000 0.077825
income_group age group 40 4.811664 1.000000 5.578801 1.000000 162.006477 0.003825
income household type 40 0.858700 1.000000 1.054674 1.000000 138.199964 0.003263
main bread winner migration background 30 1.270564 1.000000 1.394234 1.000000 164.570737 0.003886
age group × income household type 160 21.969606 1.000000 32.654659 1.000000 269.499612 0.006364
age group × main bread winner migration background 120 34.366320 1.000000 40.854727 1.000000 256.078370 0.006047
income household type × main bread winner migration background 120 4.172994 1.000000 5.809870 1.000000 239.738984 0.005661
age group × main bread winner migration background × income household type 480 57.803160 1.000000 76.238676 1.000000 420.321388 0.009925
cars income household type 16 11.333855 0.788424 11.687847 0.765174 130.374345 0.003078
hh size 20 2.453303 0.999999 2.831404 0.999998 138.374003 0.003267
vehicle ownership income group 20 3.199855 0.999993 4.353822 0.999907 115.344156 0.002724
car license 20 17.717134 0.606036 20.764026 0.411137 132.576966 0.003131
income household type × hh size 80 10.263020 1.000000 11.653642 1.000000 148.560213 0.003508
income household type × vehicle ownership income group 80 9.298031 1.000000 12.164716 1.000000 164.088294 0.003875
income household type × car license 80 27.702845 1.000000 31.583354 1.000000 134.961108 0.003187
hh size × vehicle ownership income group 100 6.117136 1.000000 8.300281 1.000000 170.569850 0.004028
hh size × car license 100 18.241554 1.000000 25.092836 1.000000 154.187093 0.003641
vehicle ownership income group × car license 100 26.546443 1.000000 34.161878 1.000000 156.713236 0.003700
income household type × hh size × vehicle ownership income group 400 23.725567 1.000000 29.353700 1.000000 203.311026 0.004801
income household type × hh size × car license 400 29.404266 1.000000 40.145015 1.000000 166.019409 0.003920
income household type × vehicle ownership income group × car license 400 34.714700 1.000000 45.360212 1.000000 203.841026 0.004813
hh size × vehicle ownership income group × car license 500 30.191990 1.000000 44.540901 1.000000 222.846215 0.005262
income household type × hh size × vehicle ownership income group × car license 2000 54.483863 1.000000 73.215653 1.000000 265.690539 0.006274
motorcycles income household type 16 3.331365 0.999661 3.876334 0.999100 37.931565 0.000896
hh size 20 10.280284 0.962800 10.596614 0.956021 37.931565 0.000896
vehicle ownership income group 20 2.359904 1.000000 2.954304 0.999996 35.458634 0.000837
motorcycle license 12 1.270946 0.999947 1.348853 0.999926 35.458634 0.000837
income household type × hh size 80 13.105943 1.000000 14.205072 1.000000 37.931565 0.000896
income household type × vehicle ownership income group 80 7.924885 1.000000 11.107216 1.000000 44.933394 0.001061
income household type × motorcycle license 48 3.537022 1.000000 4.063794 1.000000 37.931565 0.000896
hh size × vehicle ownership income group 100 9.686371 1.000000 12.806302 1.000000 49.673925 0.001173
hh size × motorcycle license 60 10.822410 1.000000 11.299931 1.000000 37.931565 0.000896
vehicle ownership income group × motorcycle license 60 2.479304 1.000000 3.112846 1.000000 35.458634 0.000837
income household type × hh size × vehicle ownership income group 400 16.255885 1.000000 19.531411 1.000000 50.254357 0.001187
income household type × hh size × motorcycle license 240 13.667388 1.000000 14.965049 1.000000 37.931565 0.000896
income household type × vehicle ownership income group × motorcycle license 240 8.115136 1.000000 11.201584 1.000000 44.933394 0.001061
hh size × vehicle ownership income group × motorcycle license 300 10.301489 1.000000 13.574983 1.000000 49.673925 0.001173
income household type × hh size × vehicle ownership income group × motorcycle license 1200 16.946660 1.000000 20.319974 1.000000 50.254357 0.001187

Using this repository

This repository uses Git LFS. Before cloning this repository, make sure to run

git lfs install

After cloning, create a virtual environment, and install the requirements:

python -m venv env
source env/bin/activate
pip install -r requirements.txt

First, generate the individual-level attributes:

python3 generate_individuals.py

This adds the configured attributes one at a time. After each attribute, the intermediate state of the synthetic population is stored as both a CSV file and a Pandas DataFrame pickle. If these files already exist, the corresponding step is skipped when the script is run again. To make a change take effect, remove the stored files from output for the stage where the change applies and for all later stages.
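
The store-and-skip behaviour described above can be sketched as follows. This is an illustration of the general pattern under stated assumptions, not the repository's implementation: `run_stage`, `invalidate_from`, the stage names, and the file naming are all hypothetical, and a list of dicts with the standard `pickle` and `csv` modules stands in for the Pandas DataFrame so that the example runs without extra dependencies.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def run_stage(name, compute, output_dir):
    """Run one stage unless its stored result already exists.

    `compute` is a zero-argument callable returning a list of dicts.
    Returns (rows, was_computed).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    pickle_path = output_dir / f"{name}.pkl"
    csv_path = output_dir / f"{name}.csv"

    # A stored pickle means the stage ran before: load it and skip the work.
    if pickle_path.exists():
        with pickle_path.open("rb") as fh:
            return pickle.load(fh), False

    rows = compute()
    with pickle_path.open("wb") as fh:
        pickle.dump(rows, fh)
    with csv_path.open("w", newline="") as fh:
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows, True


def invalidate_from(stage_names, first_stage, output_dir):
    """Delete the stored files of `first_stage` and of every later stage."""
    start = stage_names.index(first_stage)
    for name in stage_names[start:]:
        for suffix in (".pkl", ".csv"):
            (Path(output_dir) / f"{name}{suffix}").unlink(missing_ok=True)


# Small demonstration in a temporary directory with made-up stage names.
stages = ["01_gender", "02_age"]
with tempfile.TemporaryDirectory() as tmp:
    def make_rows():
        return [{"id": 1, "gender": "female"}]

    _, first_computed = run_stage("01_gender", make_rows, tmp)
    _, second_computed = run_stage("01_gender", make_rows, tmp)  # reuses stored result
    invalidate_from(stages, "01_gender", tmp)
    _, third_computed = run_stage("01_gender", make_rows, tmp)   # recomputed after removal

print(first_computed, second_computed, third_computed)
```

Removing a stage together with all later stages matters because each later stage was built on the earlier result; keeping stale later files would silently mix old and new data.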

The second step is the household generation:

python3 generate_households.py

This also runs in multiple steps and again stores the intermediate results. In this step, both the synthetic population and the synthetic households may change at every stage, so both are stored for each stage, even if no changes occur.

Footnotes

  1. Marco Pellegrino, Jan de Mooij, Tabea Sonnenschein, et al. GenSynthPop: Generating a Spatially Explicit Synthetic Population of Agents and Households from Aggregated Data. Preprint (Version 1), Research Square, 9 October 2023. https://doi.org/10.21203/rs.3.rs-3405645/v1
