Commit 47ece66

Ben Stabler, jamiecook, Blake Rosenthal, bstabler, and Leah Flake authored
publish v0.5.1 (#141)
* update package version number as well
* Allow non-binary incidence (#123)
* Allow non-binary incidence
* style
* update tests to pass
* add some progress indication
* tidy up validation script, use histogram for a histogram
* fix render and some typos
* increment version
* deprecate py2.7
* Multiprocess (#130)
* [Bugfix] Allow seed and meta geography to be the same (#139)
* Fixes bug where if the seed geography is the same as the meta_geography, pandas has a small panic attack and the run will fail.
* add cytoolz to the "requirements"
* fix another activitysim change
* Absolute bounds (#136)
* adding upper/lower bounds to weighting use case
* #137, #134, #133, #131

Co-authored-by: Jamie Cook <[email protected]>
Co-authored-by: Blake Rosenthal <[email protected]>
Co-authored-by: Ben Stabler <[email protected]>
Co-authored-by: Leah Flake <[email protected]>
1 parent b664d22 commit 47ece66

12 files changed, with 70 additions and 48 deletions

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ install:
  - conda info -a
  - conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION
  - conda activate test-environment
- - conda install pytest pytest-cov coveralls pycodestyle
+ - conda install pytest pytest-cov coveralls pycodestyle cytoolz
  - pip install .
  - pip freeze

docs/application_configuration.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -320,7 +320,7 @@ These settings control the functionality of the PopulationSim algorithm. The set
320320
| | | The maximum expansion factor may have to be adjusted upwards if the target |br| |
321321
| | | is much greater than the seed number of households. |br| |
322322
+--------------------------------------+------------+---------------------------------------------------------------------------------+
323-
| MAX_BALANCE_ITERATIONS_SIMULTANEOUS | Integer | Number of simultaneous list balancer iterations |
323+
| MAX_BALANCE_ITERATIONS_SIMULTANEOUS | Integer | Number of list balancer iterations. The default may be more than is needed. |
324324
+--------------------------------------+------------+---------------------------------------------------------------------------------+
325325

326326

@@ -693,7 +693,7 @@ This sections describes the settings that are configured differently for the *re
693693

694694
**Input Data Tables for repop mode**
695695

696-
The repop mode runs over an existing synthetic population and uses the data pipeline (HDF5 file) from the regular run as an input. User should copy the HDF5 file from the regular outputs to the *output* folder of the repop set up. The data input which needs to be specified in this setting is the control data for the subset of geographies to be modified. Input tables for the repop mode can be specified in the same manner as regular mode. However, only one geography can be controlled. In the example below, TAZ controls are specified. The controls specified in TAZ_control_data do not have to be consistent with the controls specified in the data used to control the initial population. Only those geographic units to be repopulated should be specified in the control data (for example, TAZs 314 through 317).
696+
The repop mode runs over an existing synthetic population and uses the data pipeline (HDF5 file) from the regular run as an input. User should copy the HDF5 file from the regular outputs to the *output* folder of the repop set up. The data input which needs to be specified in this setting is the control data for the subset of geographies to be modified. Input tables for the repop mode can be specified in the same manner as regular mode. However, only one geography can be controlled and the geography must be the lowest in "geographies" setting. In the example below, TAZ controls are specified. The controls specified in TAZ_control_data do not have to be consistent with the controls specified in the data used to control the initial population. Only those geographic units to be repopulated should be specified in the control data (for example, TAZs 314 through 317).
697697

698698
::
699699

@@ -713,6 +713,7 @@ The repop mode runs over an existing synthetic population and uses the data pipe
713713
| Attribute | Description |
714714
+===========================+=============================================================+
715715
| repop_control_file_name | Name of the CSV control specification file for repop mode |
716+
| | Must include total_hh_control field |
716717
+---------------------------+-------------------------------------------------------------+
717718

718719

docs/getting_started.rst

Lines changed: 8 additions & 2 deletions
@@ -12,7 +12,13 @@ This page describes how to install and run PopulationSim with the provided examp
  Installation
  ------------

- 1. Install `Anaconda 64bit Python 3 <https://www.anaconda.com/distribution/>`__. Anaconda Python is required for PopulationSim.
+ 1. It is recommended that you install and use a *conda* package manager
+ for your system. One easy way to do so is by using `Anaconda 64bit Python 3 <https://www.anaconda.com/distribution/>`__,
+ although you should consult the `terms of service <https://www.anaconda.com/terms-of-service>`__
+ for this product and ensure you qualify (as of summer 2021, businesses and
+ governments with over 200 employees do not qualify for free usage). If you prefer
+ a completely free open source *conda* tool, you can download and install the
+ appropriate version of `Miniforge <https://github.com/conda-forge/miniforge#miniforge3>`__.

  2. If you access the internet from behind a firewall, then you will need to configure your proxy server. To do so, create a .condarc file in your Anaconda installation folder (i.e. ``C:\ProgramData\Anaconda3``), such as:

@@ -62,7 +68,7 @@ ActivitySim
  ActivitySim depends + some handy Python installation management tools.

  For more information on Anaconda and ActivitySim, see ActivitySim's `getting started
- <https://activitysim.github.io/activitysim/gettingstarted.html#anaconda>`__ guide.
+ <https://activitysim.github.io/activitysim/gettingstarted.html>`__ guide.


  Run Examples

docs/software.rst

Lines changed: 0 additions & 15 deletions
@@ -224,18 +224,3 @@ Contribution Guidelines

  PopulationSim development follows the same `development guidelines <https://activitysim.github.io/activitysim/development.html>`__ as ActivitySim.

-
- Release Notes
- -------------
-
- * v0.3 - first release
- * v0.3.1 - allow zones with zero households
- * v0.3.2 - fix bug in mult-integerizer with total_hh_parent_control_index
- * v0.3.3 - add disgnostic printouts on assert fail in mult_integerizer
- * v0.3.4 - add survey weighting use case
- * v0.3.5 - add Python 3.5+ support
- * v0.4 - transfer to ActivitySim.org
- * v0.4.1 - package updates
- * v0.4.2 - validation script in Python
- * v0.4.3 - allow non-binary incidence
- * v0.5 - support for multiprocessing

example_survey_weighting/configs/settings.yaml

Lines changed: 2 additions & 1 deletion
@@ -18,7 +18,8 @@ USE_SIMUL_INTEGERIZER: True
  USE_CVXPY: False
  max_expansion_factor: 4 # Default is 30
  min_expansion_factor: 0.5
-
+ absolute_upper_bounds: 20000
+ absolute_lower_bounds: 1

  # Geographic Settings
  # ------------------------------------------------------------------

populationsim/balancer.py

Lines changed: 18 additions & 3 deletions
@@ -242,6 +242,7 @@ def np_balancer(
  def do_balancing(control_spec,
                   total_hh_control_col,
                   max_expansion_factor, min_expansion_factor,
+                  absolute_upper_bound, absolute_lower_bound,
                   incidence_df, control_totals, initial_weights):

      # incidence table should only have control columns
@@ -262,14 +263,21 @@ def do_balancing(control_spec,

      if min_expansion_factor:

-         # number_of_households in this seed geograpy as specified in seed_controlss
+         # number_of_households in this seed geograpy as specified in seed_controls
          number_of_households = control_totals[total_hh_control_index]

          total_weights = initial_weights.sum()
          lb_ratio = min_expansion_factor * float(number_of_households) / float(total_weights)

          lb_weights = initial_weights * lb_ratio
-         lb_weights = lb_weights.clip(lower=0)
+
+         if absolute_lower_bound:
+             lb_weights = lb_weights.clip(lower=absolute_lower_bound)
+         else:
+             lb_weights = lb_weights.clip(lower=0)
+
+     elif absolute_lower_bound:
+         lb_weights = initial_weights.clip(lower=absolute_lower_bound)

      else:
          lb_weights = None
@@ -283,7 +291,14 @@ def do_balancing(control_spec,
          ub_ratio = max_expansion_factor * float(number_of_households) / float(total_weights)

          ub_weights = initial_weights * ub_ratio
-         ub_weights = ub_weights.round().clip(lower=1).astype(int)
+
+         if absolute_upper_bound:
+             ub_weights = ub_weights.round().clip(upper=absolute_upper_bound, lower=1).astype(int)
+         else:
+             ub_weights = ub_weights.round().clip(lower=1).astype(int)
+
+     elif absolute_upper_bound:
+         ub_weights = ub_weights.round().clip(upper=absolute_upper_bound, lower=1).astype(int)

      else:
          ub_weights = None
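
The bound logic in the hunks above can be sketched as a standalone function. This is a minimal illustration, not the project's code: the function name `weight_bounds` and its signature are hypothetical, and `number_of_households` is passed in directly rather than looked up from `control_totals`. One assumption is worth flagging: in the diff, the `elif absolute_upper_bound:` branch reads `ub_weights` before that branch assigns it, so the sketch assumes the intent was to clip `initial_weights`, mirroring the lower-bound branch.

```python
import pandas as pd


def weight_bounds(initial_weights, number_of_households,
                  max_expansion_factor=None, min_expansion_factor=None,
                  absolute_upper_bound=None, absolute_lower_bound=None):
    """Return (lb_weights, ub_weights); either is None when unbounded.

    Hypothetical helper that mirrors the branching shown in the diff.
    """
    total_weights = float(initial_weights.sum())

    if min_expansion_factor:
        lb_ratio = min_expansion_factor * float(number_of_households) / total_weights
        # Floor at the absolute lower bound when one is given, otherwise at zero.
        lb_weights = (initial_weights * lb_ratio).clip(lower=absolute_lower_bound or 0)
    elif absolute_lower_bound:
        lb_weights = initial_weights.clip(lower=absolute_lower_bound)
    else:
        lb_weights = None

    if max_expansion_factor:
        ub_ratio = max_expansion_factor * float(number_of_households) / total_weights
        # upper=None leaves the upper side unclipped, matching the diff's else branch.
        ub_weights = (initial_weights * ub_ratio).round() \
            .clip(lower=1, upper=absolute_upper_bound).astype(int)
    elif absolute_upper_bound:
        # Assumption: start from initial_weights, since ub_weights is not yet defined here.
        ub_weights = initial_weights.round() \
            .clip(lower=1, upper=absolute_upper_bound).astype(int)
    else:
        ub_weights = None

    return lb_weights, ub_weights


if __name__ == "__main__":
    weights = pd.Series([1.0, 2.0, 3.0, 4.0])
    lb, ub = weight_bounds(weights, number_of_households=20,
                           max_expansion_factor=4, min_expansion_factor=0.5,
                           absolute_upper_bound=20, absolute_lower_bound=1.5)
    print(lb.tolist(), ub.tolist())
```

With both an expansion factor and an absolute bound set, the absolute bound acts as a cap (or floor) on top of the ratio-scaled weights; with only the absolute bound set, it is applied to the initial weights directly.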

populationsim/steps/final_seed_balancing.py

Lines changed: 4 additions & 0 deletions
@@ -68,6 +68,8 @@ def final_seed_balancing(settings, crosswalk, control_spec, incidence_table):

      max_expansion_factor = settings.get('max_expansion_factor', None)
      min_expansion_factor = settings.get('min_expansion_factor', None)
+     absolute_upper_bound = settings.get('absolute_upper_bound', None)
+     absolute_lower_bound = settings.get('absolute_lower_bound', None)

      relaxation_factors = pd.DataFrame(index=seed_controls_df.columns.tolist())

@@ -86,6 +88,8 @@ def final_seed_balancing(settings, crosswalk, control_spec, incidence_table):
              total_hh_control_col=total_hh_control_col,
              max_expansion_factor=max_expansion_factor,
              min_expansion_factor=min_expansion_factor,
+             absolute_lower_bound=absolute_lower_bound,
+             absolute_upper_bound=absolute_upper_bound,
              incidence_df=seed_incidence_df,
              control_totals=seed_controls_df.loc[seed_id],
              initial_weights=seed_incidence_df['sample_weight'])
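
One detail worth checking when using these settings: the step code reads the singular keys `absolute_upper_bound` and `absolute_lower_bound`, while the example `settings.yaml` hunk in this commit adds the plural keys `absolute_upper_bounds` and `absolute_lower_bounds`. The commit does not say which spelling is intended. The sketch below uses a plain dict in place of the loaded settings, and the helper names `read_bounds` and `near_miss_keys` are hypothetical; it only shows that a `dict.get` lookup with the singular key does not see the plural one.

```python
# Keys the step functions in this commit look up via settings.get(...).
EXPECTED_KEYS = ("absolute_upper_bound", "absolute_lower_bound")

# Stand-in for the parsed example settings.yaml (values taken from the diff).
example_settings = {
    "max_expansion_factor": 4,
    "min_expansion_factor": 0.5,
    "absolute_upper_bounds": 20000,  # plural, as written in the example file
    "absolute_lower_bounds": 1,      # plural, as written in the example file
}


def read_bounds(settings):
    """Mimic the lookups added to the balancing steps."""
    return (settings.get("absolute_upper_bound", None),
            settings.get("absolute_lower_bound", None))


def near_miss_keys(settings, expected=EXPECTED_KEYS):
    """List settings keys that extend an expected key without matching it."""
    return [key for key in settings
            if key not in expected
            and any(key.startswith(name) for name in expected)]


if __name__ == "__main__":
    print(read_bounds(example_settings))
    print(near_miss_keys(example_settings))
```

If the example file is meant to activate the bounds, the key names in the file and in the code would need to agree; a check like `near_miss_keys` is one cheap way to surface a silently ignored setting.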

populationsim/steps/initial_seed_balancing.py

Lines changed: 4 additions & 0 deletions
@@ -65,6 +65,8 @@ def initial_seed_balancing(settings, crosswalk, control_spec, incidence_table):

      max_expansion_factor = settings.get('max_expansion_factor', None)
      min_expansion_factor = settings.get('min_expansion_factor', None)
+     absolute_upper_bound = settings.get('absolute_upper_bound', None)
+     absolute_lower_bound = settings.get('absolute_lower_bound', None)

      # run balancer for each seed geography
      weight_list = []
@@ -82,6 +84,8 @@ def initial_seed_balancing(settings, crosswalk, control_spec, incidence_table):
              total_hh_control_col=total_hh_control_col,
              max_expansion_factor=max_expansion_factor,
              min_expansion_factor=min_expansion_factor,
+             absolute_upper_bound=absolute_upper_bound,
+             absolute_lower_bound=absolute_lower_bound,
              incidence_df=seed_incidence_df,
              control_totals=seed_controls_df.loc[seed_id],
              initial_weights=seed_incidence_df['sample_weight'])

populationsim/steps/repop_balancing.py

Lines changed: 4 additions & 0 deletions
@@ -60,6 +60,8 @@ def repop_balancing(settings, crosswalk, control_spec, incidence_table):

      max_expansion_factor = settings.get('max_expansion_factor', None)
      min_expansion_factor = settings.get('min_expansion_factor', None)
+     absolute_upper_bound = settings.get('absolute_upper_bound', None)
+     absolute_lower_bound = settings.get('absolute_lower_bound', None)

      # run balancer for each low geography
      low_weight_list = []
@@ -101,6 +103,8 @@ def repop_balancing(settings, crosswalk, control_spec, incidence_table):
              total_hh_control_col=total_hh_control_col,
              max_expansion_factor=max_expansion_factor,
              min_expansion_factor=min_expansion_factor,
+             absolute_upper_bound=absolute_upper_bound,
+             absolute_lower_bound=absolute_lower_bound,
              incidence_df=seed_incidence_df,
              control_totals=low_controls_df.loc[low_id],
              initial_weights=initial_weights)

populationsim/steps/setup_data_structures.py

Lines changed: 5 additions & 5 deletions
@@ -111,11 +111,11 @@ def add_geography_columns(incidence_table, households_df, crosswalk_df):
      # add seed_geography col to incidence table
      incidence_table[seed_geography] = households_df[seed_geography]

-     # add meta column to incidence table
-     seed_to_meta = \
-         crosswalk_df[[seed_geography, meta_geography]] \
-         .groupby(seed_geography, as_index=True).min()[meta_geography]
-     incidence_table[meta_geography] = incidence_table[seed_geography].map(seed_to_meta)
+     # add meta column to incidence table (unless it's already there)
+     if seed_geography != meta_geography:
+         tmp = crosswalk_df[list({seed_geography, meta_geography})]
+         seed_to_meta = tmp.groupby(seed_geography, as_index=True).min()[meta_geography]
+         incidence_table[meta_geography] = incidence_table[seed_geography].map(seed_to_meta)

      return incidence_table
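
The guard added above can be exercised in isolation. The sketch below is an illustration under assumptions: the function name `add_meta_column`, the column names `PUMA` and `REGION`, and the sample values are all hypothetical. It shows the one pandas behaviour the fix relies on: selecting the same label twice yields a frame with duplicate column names, which is the situation the old code created when the seed and meta geographies were the same and which, per the commit message, made the run fail.

```python
import pandas as pd


def add_meta_column(incidence_table, crosswalk_df, seed_geography, meta_geography):
    """Map each seed zone to its meta zone, skipping the step when they coincide."""
    if seed_geography != meta_geography:
        # A set removes a repeated label; here the two names differ, so both survive.
        columns = list({seed_geography, meta_geography})
        seed_to_meta = crosswalk_df[columns] \
            .groupby(seed_geography, as_index=True).min()[meta_geography]
        incidence_table[meta_geography] = incidence_table[seed_geography].map(seed_to_meta)
    return incidence_table


if __name__ == "__main__":
    crosswalk = pd.DataFrame({"PUMA": [1, 1, 2], "REGION": [10, 10, 20]})
    incidence = pd.DataFrame({"PUMA": [1, 2, 2]})

    # Distinct geographies: a REGION column is added by lookup.
    print(add_meta_column(incidence.copy(), crosswalk, "PUMA", "REGION"))

    # Same geography for both: the table is returned unchanged.
    print(add_meta_column(incidence.copy(), crosswalk, "PUMA", "PUMA"))

    # What the old selection produced in that case: two columns with the same label.
    print(list(crosswalk[["PUMA", "PUMA"]].columns))
```

Since the seed column is already on the incidence table, skipping the mapping when the names match leaves the meta geography available under the same label.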
