@nick-fournier-rsg commented Jun 4, 2025

This is a substantial revision to the PopulationSim code. Major changes include:

  • Refactored PopulationSim to not depend on legacy ActivitySim
  • Removed unused dependencies that were previously pulled in through the legacy ActivitySim import
  • Updated the critical NumPy and Pandas dependencies; versions are now shared with current ActivitySim
  • Updated Python support and tested against versions 3.9, 3.10, 3.11, and 3.12
  • Stabilized existing tests and incorporated them into GitHub Actions CI
  • Extended tests to validate results, not just run without error
  • Rewrote the list balancer functions as Numba JIT-compiled functions (5x-10x speed improvement; a minimal sketch of the pattern appears after this list)
  • Reimplemented optional CVXPY integerization backend
  • Added tests for numba functions and integerization options
  • Resolved repop bug for CALM example
  • Added an optional config parameter to override the integerization solver timeout
  • Modernized Python packaging to use pyproject.toml
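
As a rough illustration of the Numba rewrite mentioned above, the sketch below shows the general pattern of moving a tight per-control reweighting loop into a JIT-compiled function. The function name, arguments, and update rule are illustrative assumptions, not the actual PopulationSim balancer.

```python
# Illustrative sketch only -- not the actual PopulationSim list balancer.
import numpy as np
from numba import njit


@njit(cache=True)
def balance_weights(incidence, controls, weights, max_iterations=10000, tol=1e-9):
    """Iteratively scale household weights until weighted incidence matches controls."""
    n_hh, n_controls = incidence.shape
    for _ in range(max_iterations):
        max_delta = 0.0
        for c in range(n_controls):
            # weighted total for this control
            total = 0.0
            for h in range(n_hh):
                total += weights[h] * incidence[h, c]
            if total > 0.0:
                factor = controls[c] / total
                # rescale only the households that contribute to this control
                for h in range(n_hh):
                    if incidence[h, c] > 0.0:
                        weights[h] *= factor
                max_delta = max(max_delta, abs(factor - 1.0))
        if max_delta < tol:
            break
    return weights


# Toy usage: two controls, three households.
incidence = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
controls = np.array([100.0, 80.0])
print(balance_weights(incidence, controls, np.ones(3)))
```

Compiling this kind of loop with @njit is what typically yields the reported several-fold speedups, since the per-household inner loops no longer run in interpreted Python.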

Issues Addressed:

@nick-fournier-rsg changed the title from Develop to Phase 10A Populationsim Updates on Jun 12, 2025
@nick-fournier-rsg marked this pull request as ready for review June 12, 2025 21:27
@jpn-- requested a review from Copilot June 17, 2025 18:22

Copilot AI left a comment


Pull Request Overview

This PR removes legacy example scripts and seed data, modernizes documentation and packaging, and adds IDE and CI configurations.

  • Removed outdated example run scripts and seed data CSVs
  • Updated documentation formatting, renamed config keys, and added CLI usage to README
  • Added VSCode settings, pre-commit hooks, Python version pinning, and GitHub Actions CI

Reviewed Changes

Copilot reviewed 192 out of 192 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| example_test/run_populationsim.py | Removed old example runner script |
| example_test/data2/seed_persons.csv | Deleted legacy seed persons data |
| example_test/data2/seed_households.csv | Deleted legacy seed households data |
| example_test/configs2/controls.csv | Deleted legacy controls configuration |
| example_survey_weighting/run_populationsim.py | Removed survey-weighting example runner |
| example_calm_repop/run_populationsim.py | Removed CALM repop example runner |
| example_calm/run_populationsim.py | Removed CALM example runner |
| docs/validation.rst | Trimmed trailing blank lines |
| docs/software.rst | Removed trailing whitespace |
| docs/conf.py | Reformatted quotes and list syntax |
| docs/application_configuration.rst | Renamed column_map to rename_columns |
| README.md | Added CLI usage section |
| MANIFEST.in | Removed graft directives for example and data files |
| .vscode/settings.json | Added VSCode testing and linter settings |
| .vscode/launch.json | Added VSCode debug configuration |
| .python-version | Pinned Python version to 3.12 |
| .pre-commit-config.yaml | Introduced pre-commit hooks for formatting and linting |
| .github/workflows/python-package.yml | Added CI workflow for multi-version testing |
Comments suppressed due to low confidence (4)

README.md:21

  • The README references an examples/ directory, but the legacy example scripts and data have been removed. Either update the path to the new examples location or remove this reference to avoid confusion.
See the [examples directory](examples/) for more information on using the command-line interface.

.github/workflows/python-package.yml:25

  • [nitpick] Using actions/setup-python@v4 is more standard and widely supported than astral-sh/setup-uv. Consider switching to the official action to simplify setup and improve community familiarity.
      uses: astral-sh/setup-uv@v5

.vscode/settings.json:1

  • [nitpick] Committing IDE-specific configuration can clutter the repository and affect other contributors. Consider adding .vscode/ to .gitignore or isolating personal settings.
{

@j-murray1
Collaborator

Pull Request Testing Feedback

My colleague and I reviewed the pull request code, examples, testing, and documentation. Additionally, I tested the functionality with an independently developed set of inputs. This scenario, referred to as NCTCOG, has a household sample size of 141,851: an order of magnitude larger than the provided examples, but not large enough to encounter the convergence failures under the default parameters, and the accompanying inconsistent results, that have been noted by others. We have a document containing our full set of comments, but I'm posting only our primary findings below to limit the length of this comment.

Findings

  1. The final_summary files have incorrect differences between the control and result fields in the NCTCOG scenario. For example, num_hh_control = 1675 and num_hh_result = 1675, but num_hh_diff = -686 in the first row of final_summary_TRACT.csv. I observed this in the final_summary_TRACT.csv for the example_calm output as well. The issue appears to come from an expectation that the ‘results’ and ‘controls’ used to create summary_df and controls_df, respectively, in summarize.py share the same inherent geographic order. When this is not the case, dif_df ends up subtracting controls and results for different zone IDs. The result and control values become correctly aligned during the subsequent concatenation, but dif_df assumes the same geographic ordering as the control values, which leads to the discrepancy between the control/result values and the differences in the concatenated table.

    • There may be a better way to go about this that can be included in the initialization of summary_df in summarize.py, but I resolved the issue by locating the rows in summary_df based on zone_ids (an Index established earlier in the function). This should guarantee that summary_df and controls_df have the same ordering when used to create dif_df. The exact line of code I used is:
      summary_df = summary_df.loc[zone_ids]
      This appears to resolve the issue across all the examples as well as the NCTCOG run. Note that summary file checking is not currently covered by the testing suite. A short alignment sketch illustrating this fix appears after this list.
  2. There is a discrepancy between documentation and the example settings.yaml regarding output tables. The example_calm settings.yaml states if no action is specified then no output tables will be written. The documentation states that if no action is specified then all output tables will be written.

    • Testing shows that the former is true: if you do not specify an action, then no summary tables are written. Further, the documentation states that all summary files are written regardless of the action setting, which is the opposite of what happens in the code. All non-summary output files (besides expanded household IDs) are written regardless of the action specified. So, if no action is specified, all outputs except the summary files are written.
    • I think users would benefit from the default being to write all output tables, with the settings-file parameters available for experienced users to curate which files are created.
  3. Validation.ipynb – The validation notebook includes two uses of DataFrame.append(), which was deprecated in pandas 1.4.0 and removed in pandas 2.0. The import-packages code block does not pin a pandas version, so running it at the time of this review (August 2025) installs pandas 2.0+, and the code blocks that use DataFrame.append() throw an error. To fix this, the lines that use DataFrame.append() could be updated to use pd.concat(), or the notebook's pandas installation could be pinned to an earlier version where append() still works. I recommend the former because the latter may introduce cascading inter-package compatibility issues (a minimal sketch of the pd.concat() replacement appears after this list).

  4. I encountered several tests that failed when I ran them using pytest. Note that more tests failed when I ran all tests in sequence versus running them individually; I note the reason for this in a bullet below. The hh_id column in the expanded_household_ids dataframe has a variable dtype (int32 or int64). The intended dtype appears to be int64, based on the data in the ‘expected’ folder within ‘tests’. However, when the GROUP_BY_INCIDENCE_SIGNATURE setting is ‘True’, the chooser function used to select hh_id for each group_id produces an hh_id column with dtype=int32. This shouldn't affect the results, but it causes problems with the testing. Previously, the pandas function used to apply the hh_id chooser had a parameter called ‘convert_dtype’ which must have preserved the hh_id column as int64, but this parameter is deprecated in pandas 2.1.0+ and the PR configuration calls for pandas 2.2+. I don't think you necessarily want to force the hh_id column to be int64, but for the purposes of testing I cast the result of the hh_id chooser for group_id to int64, and this resolved all the issues with the unit tests. The best solution may be to use pandas' built-in dataframe-equality assertion, which can be told to ignore dtypes (see the sketch after this list).

    • The test changes where the problematic assert statements were added are part of commit c854bad.
    • Issue with test_full_run1() in test_steps.py (Python 3.10). The assertion that the expanded household IDs equal the expected expanded household IDs fails because the hh_id column in the dataframe read from the parquet file has dtype int64 while the hh_id column in the dataframe from the test run has dtype int32. The pandas df1.equals(df2) function compares dtypes as part of its equality check and returns False here, which causes the test to fail. Further, the failed assertion breaks out of the function, so the statement past that point that closes the pipeline is not executed, which causes the following tests to fail because they try to access an already open pipeline. Note that the tests following test_full_run1() are meant to fail if run1 fails; however, the open-pipeline issue cascades into tests in the test_weighting.py file as well. Tests in test_weighting.py pass when run individually. Pandas has a built-in assertion for checking that the values of two dataframes are equal while ignoring dtypes: pd.testing.assert_frame_equal(df1, df2, check_dtype=False).
    • The same issue identified above also affects the full run tests in test_flex.py.
    • Note that the expanded_household_ids dataframe created in the multi-processing version of the steps test has an hh_id column with dtype int64 instead of int32, so this test passes the df1.equals(df2) assertion. Its GROUP_BY_INCIDENCE_SIGNATURE setting is ‘False’, which is what ultimately led me to find the root cause of the issue.
  5. The Oceanside repop example appears to have an incomplete scenario folder or an incorrect settings.yaml in the current pull request. As such, it won't run 'out-of-the-box' without some edits from the user. I'm curious whether this is reproducible or whether it was an issue with my local setup.

    • The output folder is missing, which causes populationsim to fail.
      • It may be worth adding the capability for populationsim to create the necessary output folder within the scenario structure when it doesn't exist.
    • The pipeline_templates folder and construct_pipe.py have been added to provide ‘existing run’ data, but settings.yaml is still configured to include all the non-repop run input data in the input table list.
      • Once the settings.yaml input table list is updated to match the example_calm_repop settings.yaml, populationsim runs.
  6. Documentation - We recommend the documentation be updated to reflect the changes made in the PR. We have a number of detailed comments with recommended updates to the documentation; I'm including two here but am happy to provide the others upon request.

    • 'Getting Started' Section: The repository in the pull request comes with a pyproject.toml, which multiple Python tools can use to make setting up a project environment very easy (I used ‘pdm’). However, the ‘Getting Started’ section of the documentation does not mention the *.toml file or how a user should interact with it to set up their Python environment. Further, this section suggests creating a conda environment with Python 3.8, which is incompatible with the pull request version of populationsim.
      • We recommend the ‘Installation’ section be revised to be consistent with the current Python version requirements. The instructions on setting up a Python virtual environment can be updated, and the current step 4 can be replaced with direction on how to use pyproject.toml to install all necessary dependencies (e.g. install ‘pdm’ or an equivalent tool with pip, then use that tool to read the pyproject.toml and install dependencies). Additionally, the ‘ActivitySim’ section can be removed, as everything noted there is handled by pyproject.toml.
      • The instructions for running the examples in the ‘Getting Started’ section are not updated to show the new CLI capabilities introduced in this pull request.
        • There is a README in the examples folder that describes the CLI capability that should probably be included in the documentation.
    • Currently, you cannot successfully build the documentation locally because software.rst has not been updated for the revised structure of the populationsim folder. For example, it references the assign component as populationsim.assign (i.e. located directly in the populationsim folder), but this is no longer the case: assign now resides in a sub-folder called ‘core’, so the reference in software.rst should be populationsim.core.assign. The same holds for any software component that has been relocated in the updated populationsim folder structure.
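
Regarding finding 1, here is a minimal sketch of the alignment fix, with toy frames standing in for summarize.py's summary_df and controls_df. The contents and the positional subtraction are illustrative of the misalignment mechanism described above, not the actual summarize.py code.

```python
import pandas as pd

# Toy stand-ins: the two frames carry the same zones but in different row orders.
zone_ids = pd.Index([100, 200, 300], name="TRACT")
controls_df = pd.DataFrame({"num_hh_control": [1675, 900, 1200]}, index=zone_ids)
summary_df = pd.DataFrame(
    {"num_hh_result": [1200, 1675, 900]},
    index=pd.Index([300, 100, 200], name="TRACT"),
)

# The one-line fix from the review: force summary_df into the zone_ids order
# before any positional subtraction against controls_df.
summary_df = summary_df.loc[zone_ids]

dif = summary_df["num_hh_result"].to_numpy() - controls_df["num_hh_control"].to_numpy()
dif_df = pd.DataFrame({"num_hh_diff": dif}, index=zone_ids)
print(dif_df)  # all zeros once the row orders match
```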
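
Regarding findings 3 and 4, minimal sketches of the pandas 2.x fixes suggested above (the dataframe contents are toy examples):

```python
import numpy as np
import pandas as pd

# Finding 3: DataFrame.append() was removed in pandas 2.0; pd.concat() replaces it.
summary = pd.DataFrame({"control": ["num_hh"], "diff": [0]})
new_row = pd.DataFrame({"control": ["num_persons"], "diff": [3]})
summary = pd.concat([summary, new_row], ignore_index=True)  # was: summary.append(new_row)

# Finding 4: compare expanded household IDs while ignoring the int32/int64
# difference, instead of df1.equals(df2), which also compares dtypes.
expected = pd.DataFrame({"hh_id": np.array([1, 2, 3], dtype=np.int64)})
actual = pd.DataFrame({"hh_id": np.array([1, 2, 3], dtype=np.int32)})
pd.testing.assert_frame_equal(expected, actual, check_dtype=False)  # passes
```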

@danielsclint

danielsclint commented Aug 18, 2025

I started reviewing the code today. I focused first on documentation and examples. I'll save the core code until tomorrow. I'm a little afraid I'm going to get sidetracked, so I figured it best to drop comments as I go.

Initial Thought on the CLI: I like the move to more standardized CLI functions and the use of the TOML file. I was able to create a new environment and run the samples within a matter of minutes.

Sphinx Documentation

Similar to @j-murray1's experience, I had several issues with the documentation being seemingly out of date and likely not updated against the most current versions of Sphinx. I had several missing references and a malformed-table error when I attempted to build the docs. More importantly, given the scale of the changes to the codebase, it looks like the only changes in the docs folder are cosmetic linter fixes (e.g., single-quote to double-quote, linefeeds, spacing). Given the scale of the changes, it would seem appropriate to update the documentation too. A first pass by an AI agent like Claude or Copilot could likely go a long way.

Examples

  • example_calm: This ran without any problems.
  • example_calm_repop: Followed the notes in the documentation, and this ran without any problems.
  • example_oceanside_repop: Ran into similar issues as @j-murray1. This did not work out of the box, and it is not immediately clear what the user should do to make this example work. I looked around in the documentation, but I didn't find any special notes on this specific example.
  • example_calm: Ran without any issues.
  • example_test: Ran without any issues out-of-the-box. I did try to run with populationsim -m 4, and I got a cryptic error. Maybe I'm not supposed to run with multiprocessing enabled, but the error message did not make it clear that the -m 4 was the problem. See the shell window below.
(popsim) ➜  example_test git:(fork/RSGInc/develop) ✗ populationsim -m 4
Configured logging using basicConfig
INFO:populationsim:Configured logging using basicConfig
INFO:populationsim.run:adding 'configs_mp' to config_dir list...
/Users/cdaniels/apps/activitysim/populationsim/populationsim/__main__.py:33: FutureWarning: Support for 'run_list' settings group will be removed.
The run_list.steps setting is renamed 'models'.
The run_list.resume_after setting is renamed 'resume_after'.
Specify both 'models' and 'resume_after' directly in settings config file.
  sys.exit(run(args))
Error: Don't expect 'steps' in run_list and 'models' as stand-alone setting!

Overall Note on Examples

  • With the new CLI functionality, it wasn't immediately clear why each example still has the legacy run_populationsim.py file. I suggest removing these or, at the very least, keeping only one and adding a comment that it remains only for legacy comparisons.
  • I did eventually find the calm validation.ipynb. This is great. I recommend moving it inside of the example_calm folder, so other users can find it more quickly.
  • While I realize it is likely outside the scope of the PR (and task order), I recommend adding some documentation or tests to help guide the user about the outputs in the examples. It is great that they mechanically work, but it is much more difficult to understand on their face whether the algorithms themselves worked. The current examples are great "Hello World" examples, but don't really convey much about the quality / use of outputs. (I realize I'm throwing rocks from a glass house, so, if there is interest from the consortium, let me know, and we can help with this.)

I'll add more notes on the actual software changes in the coming days.

@danielsclint

danielsclint commented Aug 20, 2025

Recommendation: Approve with documentation enhancements and example fixes

The Numba performance optimizations deliver impressive results with 2.6x speedup in core balancing algorithms in my tests while maintaining statistical accuracy. Code functionality is good and ready for production. Testing validates both technical implementation and algorithmic consistency across various scenarios.

Key Value Delivered:

  • 2.6x faster execution on core computational bottlenecks
  • Maintained algorithmic precision and convergence characteristics
  • Production-ready performance improvements with minimal trade-offs

The comprehensive testing below reveals valuable insights about system behavior that would benefit from documentation to help future users optimize their workflows.

Testing Environment & Approach

Test Configuration:

  • Geographic Scope: Wyoming statewide (457 zones, complete coverage)
  • Data Scale: 11,812 seed households, 27,799 persons (2022 ACS 5-Year PUMS)
  • Control Variables: 75 demographic controls across household and person dimensions
  • Test Focus: Single-zone processing (Motionworks typical use case)

Note: Motionworks runs PopulationSim per individual block group with 75 control targets rather than multi-level configurations.

Test Results Summary

Test 1: Core Functionality Validation ✅

Objective: Verify system stability and end-to-end pipeline execution

Results: Complete success with excellent performance metrics

  • Runtime: 60.3 seconds total
  • Memory: 1.51 GB peak RSS
  • Convergence: 9.84e-10 precision in 2,577 iterations
  • Processing: 11.8K households + 27.8K persons with 75 controls

Timing Breakdown:

  • Initial Balancing: 48.6s (80% of runtime - primary optimization target)
  • Data Setup: 5.8s
  • Input Processing: 1.8s
  • Integerization: 2.2s
  • Final Steps: 1.9s

Validation: Confirms stable algorithm performance and memory management for production workloads.

Test 2: Numba Performance Impact 🚀

Objective: Quantify performance improvements from Numba JIT compilation

Configuration Changes:

  • Enabled USE_NUMBA: True
  • Precision: float32 (vs Test 1's float64)

Performance Results:

| Metric | Python Implementation | Numba Implementation | Improvement |
| --- | --- | --- | --- |
| Total Runtime | 60.3 seconds | 30.6 seconds | 2.0x faster |
| Initial Balancing | 48.6 seconds | 18.8 seconds | 2.6x faster |
| Memory Usage | 1.51 GB RSS | 2.0 GB RSS | +33% (JIT overhead) |
| Convergence | 2,577 iterations | 2,523 iterations | Equivalent |
| Precision | 9.8e-10 | 3.6e-07 | Acceptable reduction |

Key Finding: Numba delivers substantial performance gains with minimal algorithmic impact. The computational bottleneck (initial balancing) sees 2.6x improvement, halving total processing time.

Additional Insight: Counter-intuitively, testing with float64 and int64 precision showed even better performance (2.2x vs 2.0x) with identical convergence, suggesting optimal Numba configuration should consider 64-bit precision for both speed and accuracy benefits.

Test 3: Monte Carlo Behavior Analysis 🎲

Objective: Assess statistical consistency and randomness across multiple runs

Test Setup: 36 independent runs with different rng_base_seed values (1-36)

Statistical Level Results:

  • Perfect Consistency: All 36 runs produced identical aggregate statistics across all 75 control variables
  • Zero Variation: 0.00% coefficient of variation for all demographic totals
  • Validation Rates: Consistent 86.7% passed, 8.0% warnings, 5.3% failed across runs

Record Level Results:

  • Household Variation: ~70% household overlap between runs despite identical statistics
  • Micro-level Differences: Individual households vary while maintaining perfect aggregate targets
  • Selection Diversity: Each run selects different seed households from the same pool

Assessment: PopulationSim demonstrates ideal Monte Carlo characteristics:

  • ✅ Statistical fidelity preserved across all runs
  • ✅ Controlled micro-level variation for uncertainty analysis
  • ✅ Deterministic reproducibility with fixed seeds
  • ✅ Configurable randomness via rng_base_seed parameter

This validates the system's suitability for both reproducible production use and uncertainty analysis applications.

Test 4: Geographic Boundary Effects 📍

Objective: Determine if additional GEOIDs affect individual zone outputs

Test Setup:

  • Target GEOID: US2020XXBG560019627001 (identical control targets in both runs)
  • Run 1: Single GEOID configuration
  • Run 2: Multi-GEOID configuration (target + 7 additional Wyoming GEOIDs)

Key Finding: Adding additional GEOIDs significantly affects the target GEOID's synthetic population despite identical control targets.

Impact Quantification:

  • Household Overlap: Only 12.6% (103 of 819) shared households between runs
  • Population Variance: 971 vs 966 persons despite identical 969-person target
  • Demographic Shifts: Notable changes in income brackets, age groups, household composition

Selected Control Variations:

| Control Variable | Target | Single GEOID | Multi-GEOID | Difference |
| --- | --- | --- | --- | --- |
| Total Households | 461 | 461 | 461 | ✅ Maintained |
| Total Persons | 969 | 971 | 966 | ❌ ±5 variance |
| Income $75-100k | 56 | 57 | 68 | ❌ +12 households |
| Age 30-39 | 108 | 108 | 132 | ❌ +24 persons |
| Male/Female | 503/466 | 503/468 | 478/488 | ❌ 25-person shift |

Root Cause: PopulationSim's initial balancing optimizes across all GEOIDs simultaneously. When additional zones are included, seed households compete globally, resulting in different optimal solutions that satisfy all zone targets collectively.

Architectural Insight: The system lacks zone-level independence - synthetic populations for any geography depend on the broader modeling context. This is algorithmically sound but represents important behavior for users designing geographic scopes.

Technical Observations for Future Enhancement

Configuration Access Patterns

Observation: Mixed usage patterns between config.setting() and settings[] dictionary access within the same functions, particularly in setup_data_structures.py.

Examples:

  • Lines 30, 113, 134: config.setting("geographies")
  • Line 365: settings["geographies"] (same file, different pattern)
  • Line 364: config.setting("seed_geography") despite having settings parameter available

Impact: Potential configuration mismatches if settings parameter differs from global config values.
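
A small self-contained sketch of the mismatch risk and the single-source alternative. The config class and settings dict here are stand-ins for illustration, not the real populationsim objects.

```python
# Stand-ins only: `config.setting` mimics the global accessor, `settings` the
# dict passed into a model step; neither is the actual populationsim code.
GLOBAL_SETTINGS = {"geographies": ["ST", "GEOID"], "seed_geography": "PUMA"}


class config:
    @staticmethod
    def setting(key, default=None):
        return GLOBAL_SETTINGS.get(key, default)


def mixed_access(settings):
    # the flagged pattern: two sources that can silently disagree
    geographies = config.setting("geographies")
    seed_geography = settings["seed_geography"]
    return geographies, seed_geography


def consistent_access(settings):
    # one source of truth for every lookup within the function
    return settings["geographies"], settings["seed_geography"]


step_settings = {"geographies": ["ST"], "seed_geography": "PUMA"}
print(mixed_access(step_settings))       # (['ST', 'GEOID'], 'PUMA') -- values diverge
print(consistent_access(step_settings))  # (['ST'], 'PUMA')
```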

Pipeline Dependencies

Discovery: The pipeline configuration appears highly flexible but contains undocumented hard dependencies between steps.

Critical Dependencies:

  • Multi-level geography (geographies: [ST,GEOID]) requires sub_balancing step
  • expand_households expects weight tables from previous steps to exist
  • Step sequence cannot be arbitrarily reordered without understanding data flow

Error Example: Commenting out sub_balancing.geography=GEOID causes expand_households to fail with "TypeError: 'NoneType' object is not subscriptable" rather than a meaningful error message.
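
As a hypothetical illustration of clearer error handling here (the function and message are assumptions, not existing populationsim code), a small guard could surface the missing-step problem directly:

```python
# Hypothetical guard, not the actual populationsim API: fail early with an
# actionable message when an upstream balancing step did not produce its
# weight table, instead of letting expand_households raise
# "TypeError: 'NoneType' object is not subscriptable".
def require_weight_table(weight_table, geography, producing_step="sub_balancing"):
    if weight_table is None:
        raise RuntimeError(
            f"expand_households needs the weight table for geography '{geography}', "
            f"which is produced by the '{producing_step}.geography={geography}' step; "
            "add that step back to the models list before expand_households."
        )
    return weight_table
```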

Documentation Enhancement Opportunities

Based on testing insights, several areas would benefit from documentation additions:

1. Performance Configuration Guide

Document the new Numba-related settings and their performance implications:

# Performance Optimization Settings
USE_NUMBA: True              # Enable JIT compilation (2.6x speedup)
NUMBA_PRECISION: 'int64'     # Numerical precision (int64 fastest in testing)

2. Poorly Documented Configuration Variables

Several settings lack clear documentation:

  • INTEGERIZE_WITH_BACKSTOPPED_CONTROLS: Controls upper geography constraint usage during integerization
  • SUB_BALANCE_WITH_FLOAT_SEED_WEIGHTS: Determines weight type inheritance from upper geography
  • GROUP_BY_INCIDENCE_SIGNATURE: Enables household grouping optimization for performance
  • USE_SIMUL_INTEGERIZER: Simultaneous vs sequential integerization modes
  • max_expansion_factor: Limits household weight expansion to prevent over-representation

3. Monte Carlo Behavior Guide

Explain the rng_base_seed parameter and its effects:

  • Statistical consistency across runs
  • Micro-level household variation
  • Applications for uncertainty analysis

4. Geographic Boundary Effects

Document how geographic scope affects individual zone results:

  • Zone outputs depend on broader modeling context
  • Implications for reproducibility across different geographic scopes
  • Considerations for modeling strategy design

5. Pipeline Step Dependencies

Create explicit dependency documentation:

  • Required step sequences for different geographic configurations
  • Clear error messages for configuration validation
  • Template configurations for common use cases

Conclusions & Recommendations

Primary Recommendation: Approve PR #192 - the Numba optimizations provide substantial performance benefits with maintained algorithmic integrity.

Secondary Recommendation: Enhance documentation to capture the valuable behavioral insights revealed during testing. This would help future users:

  • Optimize their performance configurations
  • Understand system behavior for their specific use cases
  • Design appropriate geographic modeling strategies
  • Troubleshoot configuration issues effectively

Offer: I'm happy to contribute documentation updates that incorporate these testing findings - either as an addendum to this PR or as a separate documentation enhancement PR.

The performance improvements are excellent and the code is production-ready. The testing insights simply provide an opportunity to make this valuable functionality even more accessible to the user community.

@nick-fournier-rsg
Author

nick-fournier-rsg commented Aug 25, 2025

Thanks @j-murray1 @danielsclint for the review!

I've done my best to distill the comments above and the items from the Word doc to populate the PopulationSim development priority board, @jpn-- @dhensle.

I estimated the level of effort with this rough framework in mind:

  • low ~ 1-4 hours
  • medium ~ day or more
  • high ~ week or more

I left the priority fields blank to be determined.

As you can all see, I did not touch the documentation, so that seems like the minimum place to start. I also definitely welcome any help, documentation or otherwise (@danielsclint). We can sort out a clean way to collaborate, via separate PRs or otherwise.

@jpn--
Member

jpn-- commented Aug 28, 2025

Here is the "full text" of TTI's review.
PopulationSim PR Review.docx
