Changes from 59 commits
65 commits
6bbc25c
use modern pyproject package definition
nick-fournier-rsg Mar 28, 2025
7be2722
pin python to 3.8 for now
nick-fournier-rsg Mar 28, 2025
1d49c6f
Cleanup test runners
nick-fournier-rsg Mar 28, 2025
9bf8dc4
Allow hard constraints on balancer.
nick-fournier-rsg Mar 28, 2025
8ac131b
minor performance enhancements and bug fixes.
nick-fournier-rsg Mar 28, 2025
3ec970f
repop fix
nick-fournier-rsg Mar 28, 2025
9e6dee5
add oceanside repop example
nick-fournier-rsg Apr 2, 2025
80c38a0
Fix oceanside inputs
nick-fournier-rsg Apr 2, 2025
aadc2fd
bugfix summarize empty
nick-fournier-rsg Apr 2, 2025
70efe68
Major cleanup of tests and examples to share data and configs.
nick-fournier-rsg Apr 14, 2025
077556e
Pytests gitaction
nick-fournier-rsg Apr 14, 2025
475b8dd
Minor update to test gha
nick-fournier-rsg Apr 14, 2025
48af270
cleanup gha testing
nick-fournier-rsg Apr 14, 2025
6262619
disable linting for now.
nick-fournier-rsg Apr 14, 2025
e2c94ec
bugfix weighting test
nick-fournier-rsg Apr 14, 2025
e9bae6b
simplify test_steps
nick-fournier-rsg Apr 14, 2025
2e665d2
Normalize df to hash
nick-fournier-rsg Apr 14, 2025
5a21557
debug data hash.
nick-fournier-rsg Apr 15, 2025
5e61a06
debug
nick-fournier-rsg Apr 15, 2025
dc892d7
test
nick-fournier-rsg Apr 15, 2025
86dd98f
more debug
nick-fournier-rsg Apr 15, 2025
11048a6
test sorting
nick-fournier-rsg Apr 15, 2025
0614c66
debug
nick-fournier-rsg Apr 15, 2025
49264e3
further sort
nick-fournier-rsg Apr 15, 2025
f68a444
debug
nick-fournier-rsg Apr 15, 2025
b86cf01
debugging
nick-fournier-rsg Apr 15, 2025
0972171
more debug
nick-fournier-rsg Apr 15, 2025
d79c1c9
Revert "more debug"
nick-fournier-rsg Apr 15, 2025
37f5b97
Revert "debugging"
nick-fournier-rsg Apr 15, 2025
349229b
debug
nick-fournier-rsg Apr 15, 2025
cba2b6a
more debug...
nick-fournier-rsg Apr 15, 2025
57ae826
Linux - Windowx ortools bugfix.
nick-fournier-rsg Apr 15, 2025
b29441b
Cleanup tests and stabilize.
nick-fournier-rsg Apr 15, 2025
77eef56
Working refactor of activitysim pipeline into populationsim
nick-fournier-rsg Apr 16, 2025
814c5d0
linting
nick-fournier-rsg Apr 16, 2025
2dd9034
Possible fix for repop error.
nick-fournier-rsg Apr 17, 2025
b9811d4
Cleanup unused code
nick-fournier-rsg Apr 17, 2025
c9863ed
cleanup dependencies and test python versions
nick-fournier-rsg Apr 17, 2025
b421d87
Cleanup imports
nick-fournier-rsg Apr 17, 2025
c653536
Pinned versions to work with python 3.12
nick-fournier-rsg Apr 17, 2025
d421a47
Dropped support for Python 3.13 because ortools must be <=3.12
nick-fournier-rsg Apr 17, 2025
862d904
Cleaned up future warnings, expanded tests, and resurrected the lp_cv…
nick-fournier-rsg Apr 17, 2025
9c221a0
iter version
nick-fournier-rsg Apr 17, 2025
11f4720
Add pre-commit
nick-fournier-rsg Apr 17, 2025
0c92ac8
Fixed test bug.
nick-fournier-rsg Apr 17, 2025
92e976c
Import bugfix
nick-fournier-rsg May 25, 2025
bc37955
Numba balancer
nick-fournier-rsg May 26, 2025
c854bad
Implemented Numba for significant perf improvement. Need to cleanup S…
nick-fournier-rsg May 27, 2025
9742ee4
Test fix. But needs organizing in sub_balance and do_balance.
nick-fournier-rsg May 27, 2025
d30c8be
Update test_balancer.py
nick-fournier-rsg May 27, 2025
d588705
cleanup uv lock
nick-fournier-rsg May 27, 2025
099ff38
Merge branch 'develop' of github.com:RSGInc/populationsim into develop
nick-fournier-rsg May 27, 2025
6336002
Organize into modules
nick-fournier-rsg May 27, 2025
053731d
split numba functions
nick-fournier-rsg May 27, 2025
4964540
fixed import paths
nick-fournier-rsg May 27, 2025
f155ddd
more organizing
nick-fournier-rsg May 27, 2025
f1a90dc
Added configurable optimizer timeout parameter in settings. Also furt…
nick-fournier-rsg Jun 4, 2025
16c2f21
Cleanup unused code.
nick-fournier-rsg Jun 4, 2025
2d2290e
Added CLI option
nick-fournier-rsg Jun 12, 2025
d51a53d
Bugfix CLI option
nick-fournier-rsg Jun 25, 2025
4474f60
Bugfixes
nick-fournier-rsg Jul 18, 2025
21b9949
Revert "Bugfixes"
nick-fournier-rsg Jul 18, 2025
3d6fe19
Bugfix max delta
nick-fournier-rsg Jul 18, 2025
fbd218a
Hardcode constants instead of as args
nick-fournier-rsg Aug 13, 2025
160d8fd
Update pyproject.toml
nick-fournier-rsg Aug 14, 2025
37 changes: 37 additions & 0 deletions .github/workflows/python-package.yml
@@ -0,0 +1,37 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python package

on:
  push:
    branches: [ "master", "develop"]
  pull_request:
    branches: [ "master" ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
    - uses: actions/checkout@v4

    - name: Install uv and set the python version
      uses: astral-sh/setup-uv@v5
      with:
        python-version: ${{ matrix.python-version }}
        version: "0.6.14"

    - name: Install the project
      run: uv sync --all-extras --dev

    - name: Lint with ruff
      uses: astral-sh/ruff-action@v3

    - name: Run tests
      run: uv run pytest
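To reproduce this test matrix locally, one option is to loop over the same Python versions with uv. The sketch below only builds the command lines and does not run them; it assumes `uv run --python <version> pytest` is used to select each interpreter, and the helper name `ci_commands` is hypothetical, not part of this repository.

```python
# Python versions copied from the workflow matrix (hypothetical local helper).
PYTHON_VERSIONS = ["3.9", "3.10", "3.11", "3.12"]


def ci_commands(versions=PYTHON_VERSIONS):
    """Build one `uv run` pytest command per Python version."""
    commands = []
    for version in versions:
        # `--python` tells uv which interpreter to use for this run.
        commands.append(["uv", "run", "--python", version, "pytest"])
    return commands


for cmd in ci_commands():
    print(" ".join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)`. Switching `--python` between runs may cause uv to rebuild the project's virtual environment, so this is slower than the parallel CI jobs.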
4 changes: 1 addition & 3 deletions .gitignore
@@ -1,10 +1,8 @@
sandbox/
regress/
example_test_no_integerizing/
example_mtc/
.idea
.ipynb_checkpoints

.coverage.*

# Byte-compiled / optimized / DLL files
__pycache__/
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,19 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0 # Use latest stable version
    hooks:
      # - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: https://github.com/psf/black
    rev: 24.3.0 # Use latest Black version
    hooks:
      - id: black
        language_version: python3 # Ensures compatibility with Python 3+

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.3 # Replace with latest Ruff release
    hooks:
      - id: ruff
        args: [--fix] # Optional: auto-fix simple issues
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12
16 changes: 16 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,16 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": true,
        }
    ]
}
9 changes: 9 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,9 @@
{
    "python.testing.pytestArgs": [
        "populationsim",
        "tests"
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "ruff.enable": true,
}
7 changes: 0 additions & 7 deletions MANIFEST.in

This file was deleted.

10 changes: 10 additions & 0 deletions README.md
@@ -10,6 +10,16 @@ easily adapted for statewide, regional, and urban transportation planning
needs. PopulationSim is implemented in the
[ActivitySim](https://github.com/activitysim/activitysim) framework.

## Command-Line Interface

PopulationSim can be run directly from the command line:

```bash
populationsim -c /path/to/configs -d /path/to/data -o /path/to/output
```

See the [examples directory](examples/) for more information on using the command-line interface.
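The three flags above can be mirrored with `argparse` to show how such a command line is parsed. This is a hypothetical sketch, not PopulationSim's actual CLI code: the destination names (`configs_dir`, `data_dir`, `output_dir`) are assumptions, and `-c` uses `action="append"` on the assumption that, as with `run_populationsim.py`, several config folders may be passed.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the README's -c/-d/-o flags."""
    parser = argparse.ArgumentParser(prog="populationsim")
    # Repeatable so several config folders can be layered (assumption).
    parser.add_argument("-c", dest="configs_dir", action="append", default=None,
                        help="configs directory (may be repeated)")
    parser.add_argument("-d", dest="data_dir", help="input data directory")
    parser.add_argument("-o", dest="output_dir", help="output directory")
    return parser


args = build_parser().parse_args(
    ["-c", "configs_mp", "-c", "configs", "-d", "data", "-o", "output"]
)
print(args.configs_dir)  # ['configs_mp', 'configs']
```

With `action="append"`, each `-c` adds to a list in the order given, which keeps the folder precedence explicit.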

## Documentation

https://activitysim.github.io/populationsim/
48 changes: 24 additions & 24 deletions docs/application_configuration.rst
@@ -121,7 +121,7 @@ PopulationSim is configured using the settings.yaml file. PopulationSim can be c

:regular mode:

The regular configuration runs PopulationSim from beginning to end and produces a new synthetic population. It can be run either single-process or multi-process to save on runtime.

:repop mode:

@@ -263,17 +263,17 @@ This sub-directory is populated at the end of the PopulationSim run. The table b
Configuring Settings File
~~~~~~~~~~~~~~~~~~~~~~~~~

PopulationSim is configured using the *configs/settings.yaml* file. The user has the flexibility to specify algorithm functionality, list geographies, invoke tracing, provide input specifications, select outputs, list the steps to run, and specify multiprocessing settings.

.. note::
When running PopulationSim, multiple settings files can be specified so long as the ``inherit_settings: True`` setting is included in
subsequent files. This feature is used for the multi-processing configuration described below. To utilize this feature, one can run PopulationSim
with the following command: ``python run_populationsim.py -c configs_mp -c configs``. This command specifies two config folders, each with
a settings file, and the ``configs_mp`` settings inherit from the earlier ``configs`` settings.
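The inheritance described in the note can be pictured as a dictionary merge. This is a minimal illustration, not ActivitySim's implementation: it assumes that settings from the folder given first on the command line win, and that the next folder's file is only consulted when the preceding file sets ``inherit_settings: True``. The helper name and the sample keys are illustrative.

```python
def merge_settings(settings_chain):
    """Layer settings dicts given in -c order; earlier folders take precedence.

    Hypothetical helper: a later file is only read when the file before it
    sets inherit_settings to True.
    """
    merged = {}
    for settings in settings_chain:
        for key, value in settings.items():
            # setdefault keeps the value from the earliest file defining the key
            merged.setdefault(key, value)
        if not settings.get("inherit_settings", False):
            break  # stop layering unless this file asks to inherit
    merged.pop("inherit_settings", None)
    return merged


configs_mp = {"inherit_settings": True, "multiprocess": True, "num_processes": 2}
configs = {"multiprocess": False, "data_dir": "data"}
print(merge_settings([configs_mp, configs]))
# {'multiprocess': True, 'num_processes': 2, 'data_dir': 'data'}
```

Here the multiprocessing keys come from ``configs_mp`` while everything else falls through to ``configs``.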

The settings shown below are from the PopulationSim application for the CALM region, as an example of how a run can be configured. The meta geography for the CALM region is named *Region*, the seed geography is *PUMA*, and the two sub-seed geographies are *TRACT* and *TAZ*. The settings below are for this four-geography application, but the user can configure PopulationSim for any number of geographies and use different geography names.

Some of the settings are configured differently for the *repop* mode. The settings specific to the *repop* mode are described in the :ref:`settings_repop` section. The settings specific to the *multiprocessing* setup are described in the :ref:`settings_mp` section.

**Algorithm/Software Configuration**:

@@ -395,11 +395,11 @@ Note that Seed-Households, Seed-Persons and Geographic CrossWalk are all require
- tablename: households
filename : seed_households.csv
index_col: hh_id
column_map:
rename_columns:
hhnum: hh_id
- tablename: persons
filename : seed_persons.csv
column_map:
rename_columns:
hhnum: hh_id
SPORDER: per_num
# drop mixed type fields that appear to have been incorrectly generated
@@ -414,7 +414,7 @@ Note that Seed-Households, Seed-Persons and Geographic CrossWalk are all require
- naicsp07
- tablename: geo_cross_walk
filename : geo_cross_walk.csv
column_map:
rename_columns:
TRACTCE: TRACT
- tablename: TAZ_control_data
filename : control_totals_taz.csv
@@ -454,7 +454,7 @@ Note that Seed-Households, Seed-Persons and Geographic CrossWalk are all require
+--------------+---------------------------------------------------------------------------------------+
| index_col | Name of the unique ID field in the seed household data |
+--------------+---------------------------------------------------------------------------------------+
| column_map | Column map of fields to be renamed. The format for the column map is as follows: |br| |
| rename_columns | Column map of fields to be renamed. The format for the column map is as follows: |br| |
| | ``Name in CSV: New Name`` |
+--------------+---------------------------------------------------------------------------------------+
| drop_columns | List of columns to be dropped from the input data |
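The ``rename_columns``, ``drop_columns`` and ``index_col`` attributes can be illustrated with the standard library. This sketch is hypothetical (the function name and the dict-based spec are assumptions, not PopulationSim's loader); it assumes ``drop_columns`` names refer to the original CSV headers and that ``index_col`` names a column after renaming, as in the ``hhnum: hh_id`` / ``index_col: hh_id`` example above.

```python
import csv
import io


def load_input_table(csv_text, spec):
    """Apply a hypothetical input-table spec (rename, drop, index) to CSV text."""
    rename = spec.get("rename_columns", {})
    drop = set(spec.get("drop_columns", []))
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Drop by original header name, then rename: "Name in CSV: New Name".
        rows.append({rename.get(k, k): v for k, v in raw.items() if k not in drop})
    index_col = spec.get("index_col")
    if index_col is None:
        return rows
    # Key each row by its (renamed) unique ID field.
    return {row[index_col]: row for row in rows}


sample = "hhnum,NP,naicsp07\n1,2,x\n2,3,y\n"
spec = {"rename_columns": {"hhnum": "hh_id"}, "drop_columns": ["naicsp07"], "index_col": "hh_id"}
households = load_input_table(sample, spec)
print(households["1"])  # {'hh_id': '1', 'NP': '2'}
```

All values stay strings here; a real loader would also parse types.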
@@ -627,17 +627,17 @@ For detailed information on software implementation refer to :ref:`core_componen
Configuring Settings File for Multiprocessing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section describes the settings that are additionally configured for running PopulationSim with
multiprocessing to reduce runtime. PopulationSim uses ActivitySim's multiprocessing capabilities, which
are described in more detail `here <https://activitysim.github.io/activitysim/howitworks.html#multiprocessing>`_.

The example below can be found in the ``example_calm\configs_mp\settings.yaml`` file. The group of model steps
identified as ``mp_seed_balancing`` and starting with ``input_pre_processor``
is run in a single process until the next group of model steps, identified as ``mp_sub_balancing_TAZ`` and starting with
``sub_balancing.geography=TAZ``, is reached, at which point PopulationSim runs these steps in parallel using two processors
by slicing the problem into separate geographic batches based on the ``slice_geography: TRACT`` setting. It then
returns to a single process for the final group of model steps, identified as ``mp_summarize`` and
beginning with ``expand_households``.

::

@@ -666,8 +666,8 @@ beginning with ``expand_households``.
- trace_TAZ_weights
- name: mp_summarize
begin: expand_households
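The slicing step can be pictured as grouping zones by their ``slice_geography`` value so that all TAZs in one TRACT land in the same batch. The sketch below is hypothetical: the helper name, the round-robin assignment of tracts to processes, and the dict-based zone records are assumptions, not ActivitySim's actual slicing logic.

```python
def slice_by_geography(zones, slice_key, num_processes):
    """Split zone records into batches without splitting any slice_key group."""
    # Sorted unique slice values (e.g. TRACT IDs) give a deterministic assignment.
    slice_values = sorted({zone[slice_key] for zone in zones})
    # Round-robin: the i-th slice value goes to process i % num_processes.
    owner = {value: i % num_processes for i, value in enumerate(slice_values)}
    batches = [[] for _ in range(num_processes)]
    for zone in zones:
        batches[owner[zone[slice_key]]].append(zone)
    return batches


zones = [
    {"TAZ": 1, "TRACT": "A"},
    {"TAZ": 2, "TRACT": "A"},
    {"TAZ": 3, "TRACT": "B"},
    {"TAZ": 4, "TRACT": "C"},
]
for batch in slice_by_geography(zones, "TRACT", num_processes=2):
    print([zone["TAZ"] for zone in batch])
# [1, 2, 4]
# [3]
```

Round-robin ignores how many zones each tract holds; a production splitter might balance batches by size instead.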


+-------------------------------+--------------------------------------------------------------------------------------------------------------+
| Attribute | Description |
+===============================+==============================================================================================================+
@@ -859,7 +859,7 @@ Some conventions for writing expressions:
* Expressions must be vectorized expressions and can use most numpy and pandas expressions.
* When editing the CSV files in Excel, use single quote ' or space at the start of a cell to get Excel to accept the expression
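A vectorized expression is evaluated once against whole columns rather than row by row, returning one value per household. The sketch below is an illustration only: it stands in a ``types.SimpleNamespace`` of NumPy arrays for the households table and uses Python's ``eval``; the column names ``NP`` and ``HINCP`` are assumed PUMS-style fields, and PopulationSim's own evaluator may differ.

```python
import types

import numpy as np


def evaluate_expression(expression, households):
    """Evaluate one control expression against whole columns at once."""
    # Expose only the table and numpy; empty builtins is NOT a security sandbox.
    namespace = {"households": households, "np": np}
    return np.asarray(eval(expression, {"__builtins__": {}}, namespace))


households = types.SimpleNamespace(
    NP=np.array([1, 3, 5, 2]),
    HINCP=np.array([10000, 60000, 150000, 40000]),
)
large = evaluate_expression("households.NP >= 4", households).tolist()
print(large)  # [False, False, True, False]
both = evaluate_expression("(households.HINCP > 50000) & (households.NP > 1)", households)
print(int(both.sum()))  # 2
```

The same expression text would work on a pandas DataFrame, since ``households.NP`` is attribute access to a column there as well. Note the element-wise ``&`` with parentheses, not Python's ``and``.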

.. _importance:

What are importance weights
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -882,18 +882,18 @@ Where, :math:`z_{i}` are relaxation factors and :math:`a_{in}` are incidence val

Where, :math:`u_{i}` are the penalties termed as importance factors or importance weights in PopulationSim.

:math:`x_{n}` and :math:`z_{i}` are the parameters solved by the optimization, while the importance weights (:math:`u_{i}`) are hyperparameters that are exposed to the user and influence the optimization externally. The objective of the relative entropy optimization is to find a set of weights that are as uniform as possible and that satisfy the marginal controls. The importance weights allow the user to trade off between these two objectives. High importance weights (e.g., 1E10) on all controls result in a hard-constrained optimization that gives strong preference to matching the marginal controls. Low importance weights (e.g., <50) result in an almost unconstrained problem. The user may also specify a different importance weight for each marginal control; in this case, controls with higher importance weights are given preference over those with lower importance weights. Therefore, both the absolute and the relative values of the importance weights affect the optimization problem and its solution.
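To see how an importance weight trades off matching a control against staying close to the seed weights, the sketch below swaps in a quadratic penalty for PopulationSim's relative-entropy objective. This is a deliberate simplification, not the solver PopulationSim uses: for initial weights :math:`w_{n}`, a single control with incidence :math:`a_{n}`, target :math:`A` and importance :math:`u`, it minimizes :math:`\sum_n (x_n - w_n)^2 + u\,(\sum_n a_n x_n - A)^2`, which has a closed-form solution. The function name is hypothetical.

```python
import numpy as np


def penalized_fit(initial_weights, incidence, target, importance):
    """Minimize sum((x - w)**2) + importance * (a @ x - target)**2 in closed form.

    Quadratic stand-in for the entropy objective, single control only.
    """
    w = np.asarray(initial_weights, dtype=float)
    a = np.asarray(incidence, dtype=float)
    a_sq = a @ a
    # Zero gradient gives the fitted control total s directly:
    # s * (1 + u * |a|^2) = a.w + u * |a|^2 * target
    s = (a @ w + importance * a_sq * target) / (1.0 + importance * a_sq)
    # Each weight shifts along the incidence vector: x = w - u * a * (s - target)
    return w - importance * a * (s - target)


w = [10.0, 10.0, 10.0]  # initial (seed) weights
a = [1.0, 1.0, 0.0]     # households 1 and 2 count toward the control
for u in (1e-6, 1.0, 1e6):
    x = penalized_fit(w, a, target=30.0, importance=u)
    print(u, round(float(np.dot(a, x)), 3))
```

As :math:`u` grows, the controlled total moves from the seed total of 20 toward the target of 30, mirroring the "almost unconstrained" versus "hard-constrained" behaviour described above. Unlike the entropy formulation, this stand-in does not keep weights positive.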

.. _setting-importance:

Setting importance weights
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given the flexibility that importance weights offer to the user, they need to be tuned to get the desired optimality in the outputs for the given seed sample and marginal controls. The quality of the outputs is defined by a uniformity measure of the weights and goodness of fit across marginal controls. Here are general guidelines on setting importance weights:

* Start with a reasonable importance factor value across all controls (e.g., 1000 has typically worked well for multiple regions). This excludes the control on the total number of households, which should be set to a very high importance to ensure that the right number of households is generated for each zone.
* After achieving reasonable goodness of fit across controls, the importance weights can be increased or decreased to favor one control over another, or all importance weights can be reduced to improve the uniformity of the weights. Which controls to favor depends on the type of application and the quality of the marginal data.
* The importance weights are generally adjusted by factors of 10. The user may need to run PopulationSim multiple times using various combinations of importance weights to reach the desired quality of outputs.
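The tuning loop in these guidelines can be scripted by generating candidate importance settings: a baseline of 1000 for every control except the household total, plus variants that move one control up or down by a factor of 10. The helper and control names below are hypothetical, and the household-total importance of 1e9 is an assumed placeholder for "very high"; each candidate would still have to be written into the controls file (assumed here to hold an importance value per control) and run.

```python
def importance_candidates(controls, total_control, base=1000.0, total_importance=1e9):
    """Yield (label, {control: importance}) settings to try, one run each."""
    baseline = {name: base for name in controls}
    # The household-total control stays very high in every candidate.
    baseline[total_control] = total_importance
    yield "baseline", dict(baseline)
    for name in controls:
        if name == total_control:
            continue
        # Move a single control up or down by a factor of 10.
        for label, value in (("x10", base * 10), ("div10", base / 10)):
            candidate = dict(baseline)
            candidate[name] = value
            yield f"{name}_{label}", candidate


for label, weights in importance_candidates(["num_hh", "hh_size_1", "hh_inc_low"], "num_hh"):
    print(label, weights)
```

Varying one control at a time keeps each run's effect attributable; lowering all non-total weights together is the other lever the guidelines mention for improving uniformity.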


