Add baseline validation tool to resstockpostproc #1529

Open
rajeee wants to merge 226 commits into develop from baseline_validation

rajeee commented on Nov 5, 2025

Pull Request Description

This tool compares a ResStock baseline run against truth data sources such as EIA 861, EIA 176, RECS, LRD, and others, and generates comparison graphics and a dashboard.

See the README for more details.

Live dashboard is here.

Validation test on CI

The baseline-validation CI job runs python resstockpostproc/baseline_validation/main.py --test --no-parallel end-to-end on every PR. This exercises the full pipeline (BSQ/Athena queries, RECS/EIA/LRD truth-data loads, RECS uncertainty computation, and plot rendering for the test subset) and uploads the generated dashboard tree as a CI artifact (baseline-validation-plots.tar.gz) for reviewers to inspect. The artifact contains only a small fraction of the plots produced by a full run without the --test flag.

The test run hits real AWS sources (Athena in the rescore workgroup, s3://resstock-core/ for truth data, s3://oedi-data-lake/ for ResStock baseline parquets), but the data exchanged is small (~100 MB from the paid resstock-core bucket; the rest is free OEDI egress), so the AWS cost per CI run is negligible (roughly $0.01 or less).

Checklist

Required:

Optional (not all items may apply):

rajeee added 8 commits March 18, 2026 10:48
Delete unused plotters (eia_plotter, base_plotter, timeseries_plotter,
monthly_plotter), empty stubs (eiaid_mapping), and obsolete tests
(test_eia_viz). Fix test_data column assertions to match current
DataCol-normalized output names. Uncomment utility_vertical rename
in lrd_plotter for day_of_year resolution.
GitHub Action and others added 3 commits April 21, 2026 23:00
- README: match upgrade_comparison's declarative tone; cross-reference it
- Add thorough field-by-field Configuration section; remove duplicate
  Configuration Details block that repeated the Input YAML
- Replace 'Plot Types' bullet lists with a link to plot_showcase.md
- main.py: --help description updated to match
- New plot_showcase.md gallery covers 9 plot families with 36 images
  (dashboard + EIA + RECS + cross-filter dims + box plots + histograms
  + difference views + LRD)
- All images constrained via <img width> so single-entity plots don't
  dominate the page (400 px), with 700 px for portrait histograms and
  900 px for multi-panel tilemaps
- Exclude .svg and .png from typos pre-commit hook; Plotly-generated
  SVGs have random short substrings (e.g. 'daa') in internal IDs that
  aren't real typos
GitHub auto-renders a README.md when the user navigates to a directory,
so clicking into example_plots/ in the GitHub web UI now shows the
gallery directly without needing a sibling file. Strip the
'example_plots/' prefix from all in-gallery image paths since they're
now relative to the gallery itself, and point the top-level README's
link at the new location.
rajeee requested a review from asparke2 on April 22, 2026 00:03
- Remove unused 'workflow: WorkflowConfig' parameter from all 5 plotters
  (bar, box, heatmap, histogram, choropleth). The param was added but
  never read in any plotter body; two of its three callers
  (all_plots_generator, highlights_generator) were never updated to
  pass it, which silently broke those paths.
- Update the dashboard caller to drop the 'workflow=run_workflow' kwarg
  accordingly so all 3 call sites use a uniform 2-arg signature.
- Revert the default port change (8052 -> 8051) in dynamic_dashboard.py
  — a personal-preference drive-by unrelated to the shared refactor.

All other upgrade_comparison changes on this branch (theme/colors/
end_use_dicts moves to shared_utils, local plotter-helper dedup,
diagonal -> diagonal_relaxed in input_manager) are legitimate and
retained.

Copilot AI left a comment

Pull request overview

Adds a new baseline validation tool under resstockpostproc to compare ResStock baseline runs against external datasets (EIA, RECS, LRD, etc.), generate plots, and assemble a browsable dashboard. The PR also introduces shared utilities (caching, S3 helpers, generic plotters) and refactors parts of upgrade_comparison to reuse them.

Changes:

  • Add baseline_validation package: schema, data loading/processing, plotters, dashboard generation, and extensive pytest coverage.
  • Add shared utilities (shared_utils) for caching, S3 downloads, sorting/mapping/colors, and reusable plotter helpers.
  • Refactor upgrade_comparison plotters/schema to use shared plot theme and shared end-use dictionaries.

Reviewed changes

Copilot reviewed 43 out of 157 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `postprocessing/tests/test_caching.py` | Adds unit tests for new disk-cache root and @cached decorator behavior. |
| `postprocessing/ruff.toml` | Updates Ruff ignores and adds E501 per-file ignores for HTML/JS-heavy dashboard modules. |
| `postprocessing/resstockpostproc/upgrade_comparison/schema/workflow_schema.py` | Switches end-use dict import to shared_utils location. |
| `postprocessing/resstockpostproc/upgrade_comparison/plotters/histogram_plotter.py` | Uses shared theme and new palette builder. |
| `postprocessing/resstockpostproc/upgrade_comparison/plotters/heatmap_plotter.py` | Uses shared theme module. |
| `postprocessing/resstockpostproc/upgrade_comparison/plotters/choropleth_plotter.py` | Uses shared theme module. |
| `postprocessing/resstockpostproc/upgrade_comparison/plotters/box_plotter.py` | Replaces local box plot implementation with shared generic box plotter. |
| `postprocessing/resstockpostproc/upgrade_comparison/plotters/bar_plotter.py` | Replaces local bar plot implementation with shared generic bar plotter. |
| `postprocessing/resstockpostproc/upgrade_comparison/io_managers/input_manager.py` | Changes Polars concat mode to diagonal_relaxed. |
| `postprocessing/resstockpostproc/upgrade_comparison/data_processing/data_processor.py` | Switches end-use dict import to shared_utils location. |
| `postprocessing/resstockpostproc/upgrade_comparison/dashboard/callbacks/plotting.py` | Calls plotting function with keyword arg for plot_spec. |
| `postprocessing/resstockpostproc/upgrade_comparison/.gitignore` | Ignores sdr_plots/ output folder for this submodule. |
| `postprocessing/resstockpostproc/shared_utils/sorting.py` | Adds shared “human sort” helper with custom sort orders for dashboard labels. |
| `postprocessing/resstockpostproc/shared_utils/s3_manager.py` | Adds shared S3 download + local caching utilities. |
| `postprocessing/resstockpostproc/shared_utils/mapping.py` | Adds shared month/state/utility mappings used by baseline validation. |
| `postprocessing/resstockpostproc/shared_utils/generic_plotters/theme.py` | Refactors theme to use shared color constants and adds category palette logic. |
| `postprocessing/resstockpostproc/shared_utils/generic_plotters/range_utils.py` | Adds shared axis range computation helper. |
| `postprocessing/resstockpostproc/shared_utils/generic_plotters/hover_formatting.py` | Adds shared hover formatting utilities (compact/precise/CI/count). |
| `postprocessing/resstockpostproc/shared_utils/generic_plotters/box_plotter.py` | Adds shared Plotly box/violin plot renderer used by baseline validation and upgrade comparison. |
| `postprocessing/resstockpostproc/shared_utils/colors.py` | Adds shared qualitative color series and fuel color mapping. |
| `postprocessing/resstockpostproc/shared_utils/caching.py` | Adds disk-backed caching decorator and repo-local cache root. |
| `postprocessing/resstockpostproc/shared_utils/__init__.py` | Introduces shared_utils package marker/docstring. |
| `postprocessing/resstockpostproc/baseline_validation/workflow.yaml` | Adds baseline validation workflow config example (data sources, labels, outputs). |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_workflow_schema_histogram.py` | Tests workflow schema path inference and download behavior for histogram fast path. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_utils.py` | Tests utility helpers (add_us_total, add_missing_states, apply_aggregation). |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_template_expansion.py` | Tests template expansion and work-item generation behavior. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_rse_column_resolution.py` | Tests bound-column resolution logic for CI/RSE columns. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_resstock_annual_fast_path.py` | Tests raw-parquet “fast path” vs Athena fallback for ResStock annual data. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_recs_rse.py` | Tests RECS replicate-weight CI implementation (log-normal bounds) and batching. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_range_utils.py` | Tests shared axis range helper for raw/bounded data. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_plot_spec.py` | Tests PlotSpec validators and label/display formatting rules. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_main.py` | Tests CLI entry behavior and rejection of removed flags. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_html_utils.py` | Tests Plotly HTML post-processing including CDN/local fallback behavior. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_get_recs_data_bounds.py` | Tests RECS loader helpers for producing uncertainty bounds. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_gather_data.py` | Tests data routing for histogram pipeline and bounds column filtering. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_footnotes.py` | Tests generated plot/table footnotes and coverage notes. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_error_bars.py` | Tests error bar computation for bar plotter bounds. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_data.py` | Tests basic loading for EIA/LRD reference datasets. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_create_html_metric_order.py` | Tests metric ordering used in HTML viewer. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_create_html_filter_pair_tabs.py` | Tests HTML viewer Filter 1/2 tab and swap behavior. |
| `postprocessing/resstockpostproc/baseline_validation/tests/test_create_html.py` | Tests sharded HTML index generation and viz cell parsing. |
| `postprocessing/resstockpostproc/baseline_validation/tests/_helpers.py` | Adds shared PlotSpec builders for tests. |
| `postprocessing/resstockpostproc/baseline_validation/schema/workflow_schema.py` | Adds Pydantic workflow schema and ResStock parquet download helpers. |
| `postprocessing/resstockpostproc/baseline_validation/schema/recs_chars_mapping.py` | Adds RECS↔ResStock characteristic mapping for grouping/filtering. |
| `postprocessing/resstockpostproc/baseline_validation/plotters/stacked_plotter.py` | Adds stacked plot orchestrator (bar/box/histogram) with range logic and splitting. |
| `postprocessing/resstockpostproc/baseline_validation/plotters/main_plotter.py` | Adds main plot dispatch across tilemap/stacked/single-entity renderers. |
| `postprocessing/resstockpostproc/baseline_validation/plotters/lrd_plotter.py` | Adds LRD-specific plotting behaviors across multiple resolutions/views. |
| `postprocessing/resstockpostproc/baseline_validation/plotters/box_plot_data.py` | Adds quartile extraction + box plot dataframe preparation. |
| `postprocessing/resstockpostproc/baseline_validation/plot_helpers/utils.py` | Adds baseline validation helper utilities (month/season handling, BSQ creation). |
| `postprocessing/resstockpostproc/baseline_validation/plot_helpers/theme.py` | Adds baseline-validation-specific Plotly theming and template registration. |
| `postprocessing/resstockpostproc/baseline_validation/plot_helpers/resstock_raw.py` | Adds raw-parquet column resolution for schema variants and quantity mappings. |
| `postprocessing/resstockpostproc/baseline_validation/plot_helpers/plot_semantics.py` | Adds shared semantic helpers (units, timeseries col, source labels, quartile semantics). |
| `postprocessing/resstockpostproc/baseline_validation/plot_helpers/footnotes.py` | Adds note generation for plots/tables (RECS CI, RSE, penetration notes, histogram note). |
| `postprocessing/resstockpostproc/baseline_validation/main.py` | Adds baseline validation CLI entry point. |
| `postprocessing/resstockpostproc/baseline_validation/io_managers/utils.py` | Adds aggregation helpers and state/US-total normalization for loaded data. |
| `postprocessing/resstockpostproc/baseline_validation/io_managers/stats.py` | Adds weighted quantile helper used in annual aggregations. |
| `postprocessing/resstockpostproc/baseline_validation/io_managers/output_manager.py` | Adds figure/data saving logic incl. Plotly bundle vendoring and HTML postprocessing. |
| `postprocessing/resstockpostproc/baseline_validation/io_managers/get_lrd_data.py` | Adds LRD loader, caching, aggregation by resolution, and mapping to eiaid. |
| `postprocessing/resstockpostproc/baseline_validation/io_managers/data_table_columns.py` | Adds table column config builder and humanized labels for dashboard tables. |
| `postprocessing/resstockpostproc/baseline_validation/io_managers/comparison_data_paths.py` | Adds S3 paths to EIA/LRD/RECS truth datasets. |
| `postprocessing/resstockpostproc/baseline_validation/generation/index_rows.py` | Adds index-row assembly helpers (incl. LRD sidebar semantics normalization). |
| `postprocessing/resstockpostproc/baseline_validation/generation/__init__.py` | Adds generation package marker file. |
| `postprocessing/resstockpostproc/baseline_validation/example_plots/recs_electricity_ustotal_difference.svg` | Adds example SVG output artifact. |
| `postprocessing/resstockpostproc/baseline_validation/example_plots/eia_electricity_perunit_difference_ustotal.svg` | Adds example SVG output artifact. |
| `postprocessing/resstockpostproc/baseline_validation/data_scraping/scrap_eia861M.py` | Adds EIA 861M scraping script for monthly sales. |
| `postprocessing/resstockpostproc/baseline_validation/data_scraping/scrap_eia861.py` | Adds EIA 861 annual scraping script for sales/territories. |
| `postprocessing/resstockpostproc/baseline_validation/data_scraping/scrap_eia176.py` | Adds EIA 176 scraping/conversion script for gas consumption/heat content. |
| `postprocessing/resstockpostproc/baseline_validation/data_processing/recs_rse.py` | Adds RECS replicate-weight CI calculation helpers (single + batch). |
| `postprocessing/resstockpostproc/baseline_validation/data_processing/recs_mapping.py` | Adds mapping helpers for RECS columns/enduses/chars into internal schema. |
| `postprocessing/resstockpostproc/baseline_validation/data_processing/metrics.py` | Adds metric computations (e.g., per-source MAPE) over assembled plot data. |
| `postprocessing/resstockpostproc/baseline_validation/data_processing/histogram_data.py` | Adds exact histogram pipeline using raw ResStock parquet + RECS microdata. |
| `postprocessing/resstockpostproc/baseline_validation/data_processing/dataset_adapters.py` | Adds dataset adapters to pair reference + ResStock data and define join keys. |
| `postprocessing/resstockpostproc/baseline_validation/dashboard/dashboard_paths.py` | Adds path helpers for dashboard layout and dataset output directories. |
| `postprocessing/pyproject.toml` | Adds baseline-validation dependencies (excel, playwright, pyathena, etc.). |
| `postprocessing/.typos.toml` | Adds domain acronyms to typos allowlist and adjusts ignore patterns. |
| `postprocessing/.pre-commit-config.yaml` | Extends pre-commit scope to baseline_validation and updates exclusions. |
| `postprocessing/.gitignore` | Ignores baseline_validation scripts directory under postprocessing. |
| `.gitignore` | Adds .history/, .cache/, and test_output/ ignores. |

Comment on lines +141 to +152
    @classmethod
    def from_yaml(cls, yaml_path: str | Path) -> WorkflowConfig:
        with open(yaml_path) as f:
            config_dict = yaml.safe_load(f)
        return cls.model_validate(config_dict)

    def get_names_of_data_sources(self) -> list[str]:
        return [ds.name for ds in self.data_sources]


workflow_config_path = Path(__file__).parent.parent / "workflow.yaml"
workflow = WorkflowConfig.from_yaml(workflow_config_path)

Copilot AI Apr 22, 2026

This module performs file I/O at import time by loading workflow.yaml and instantiating a global workflow. Import-time side effects make it hard to override config (e.g., via CLI --config), complicate tests, and can raise exceptions on import if the YAML is missing/invalid. Prefer moving the load into the CLI entry point (or a load_workflow(path) function) and keeping this module limited to schema definitions.

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +28
@lru_cache(maxsize=None)
def get_df_from_s3(full_s3_path, cache_dir: Path | None = None) -> pl.DataFrame:
    """Download (if needed) and read an S3 file as a Polars DataFrame.

    Results are cached in-memory so repeated calls with the same arguments
    skip the S3 HEAD check, local MD5 computation, and disk read entirely.
    """
    s3bucket, s3path = full_s3_path.replace("s3://", "").split("/", 1)
    local_path = cache_dir / s3path
    if not _is_file_same(s3bucket, s3path, local_path):

Copilot AI Apr 22, 2026

get_df_from_s3(..., cache_dir: Path | None = None) will raise a TypeError when cache_dir is left as None because local_path = cache_dir / s3path is unconditional. Either make cache_dir required, or provide a sensible default (and/or raise an actionable ValueError when it is not provided). Also consider validating that full_s3_path starts with s3:// (as download_s3_file does) to avoid confusing split errors.
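A minimal sketch of the validation this comment asks for, assuming cache_dir becomes a required argument. The helper name resolve_local_path is hypothetical and not part of the PR; it only illustrates the s3:// prefix check and the bucket/key split.

```python
from pathlib import Path


def resolve_local_path(full_s3_path: str, cache_dir: Path) -> tuple[str, str, Path]:
    """Split an s3:// URI into (bucket, key) and map the key under cache_dir."""
    prefix = "s3://"
    if not full_s3_path.startswith(prefix):
        raise ValueError(f"Expected a path starting with {prefix!r}, got {full_s3_path!r}")
    # partition() never raises, so a missing key is reported with a clear message.
    bucket, sep, key = full_s3_path[len(prefix):].partition("/")
    if not bucket or not sep or not key:
        raise ValueError(f"Expected 's3://<bucket>/<key>', got {full_s3_path!r}")
    return bucket, key, Path(cache_dir) / key
```

Because cache_dir has no default here, a caller that omits it fails at the call site with a missing-argument error rather than with a TypeError deep inside the function body.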

Comment on lines +1 to +5
from typing import Dict


NUM2MONTH = {
    1: "JAN",

Copilot AI Apr 22, 2026

from typing import Dict is unused and will trigger unused-import linting. Remove the import (or use it for the exported constants’ type annotations if needed).

Comment on lines +11 to +19
def test_cached_decorator_creates_cache_under_postprocessing():
    cache_name = "unit_test_cache_location"

    @caching.cached(cache_name)
    def cached_value(x: int) -> int:
        return x + 1

    assert cached_value(1) == 2
    assert (caching.CACHE_ROOT / cache_name).is_dir()

Copilot AI Apr 22, 2026

This test writes cache artifacts into the repo-level .cache directory and doesn’t clean them up. That makes the test suite stateful across runs and can interfere with parallel testing. Prefer using tmp_path and monkeypatching caching.CACHE_ROOT (or providing a way to override it) so the test is isolated and leaves no persistent files behind.
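One way to isolate such a test, sketched with only the standard library. The caching namespace below is a hypothetical stand-in for shared_utils.caching, and the sketch assumes the real decorator reads CACHE_ROOT at call time; if it captures the root at import time, a different hook would be needed. Under pytest, tmp_path and monkeypatch.setattr would play the roles of TemporaryDirectory and mock.patch.object.

```python
import tempfile
import types
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for shared_utils.caching; the real module may differ.
caching = types.SimpleNamespace(CACHE_ROOT=Path(".cache"))


def cached(cache_name):
    """Toy decorator that creates CACHE_ROOT/<cache_name> when the function runs."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            # CACHE_ROOT is looked up at call time, so patching it takes effect.
            (caching.CACHE_ROOT / cache_name).mkdir(parents=True, exist_ok=True)
            return func(*args, **kwargs)
        return wrapper
    return decorator


caching.cached = cached


def test_cached_decorator_uses_isolated_cache_root():
    with tempfile.TemporaryDirectory() as tmp:
        with mock.patch.object(caching, "CACHE_ROOT", Path(tmp)):
            @caching.cached("unit_test_cache_location")
            def cached_value(x: int) -> int:
                return x + 1

            assert cached_value(1) == 2
            assert (Path(tmp) / "unit_test_cache_location").is_dir()
    # The temporary directory is removed on exit, so nothing persists between runs.
    assert not Path(tmp).exists()
```

The patch is undone when the with block exits, so other tests still see the original cache root.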

Comment on lines +60 to +67
def _is_file_same(bucket, s3_key, local_path):
    if not local_path.exists():
        return False
    client = _get_s3_client()
    local_md5 = _calculate_md5(local_path)
    s3_metadata = client.head_object(Bucket=bucket, Key=s3_key)
    s3_etag = s3_metadata["ETag"].strip('"')
    return local_md5 == s3_etag

Copilot AI Apr 22, 2026

For multipart-uploaded S3 objects, the ETag is not a plain MD5 of the object content, so comparing a locally computed MD5 to head_object()['ETag'] can incorrectly report mismatches and trigger unnecessary re-downloads. Consider using ContentLength + LastModified, storing a known checksum in object metadata, or handling multipart ETags (the -<parts> suffix) explicitly.
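For reference, a sketch of how a multipart ETag is commonly reproduced locally: MD5 each part, MD5 the concatenated binary digests, and append -<parts>. This assumes the object is not encrypted with SSE-KMS or SSE-C and that the uploader's part size is known (8 MiB is assumed here as a typical default; the ETag itself does not record it). The function names are illustrative and not from the PR.

```python
import hashlib
from pathlib import Path

DEFAULT_PART_SIZE = 8 * 1024 * 1024  # assumed uploader chunk size, not stored in the ETag


def local_etag(path: Path, multipart: bool, part_size: int = DEFAULT_PART_SIZE) -> str:
    """Compute the ETag commonly reported for an unencrypted upload of `path`."""
    whole = hashlib.md5(usedforsecurity=False)
    part_digests = []
    with open(path, "rb") as f:
        while chunk := f.read(part_size):
            whole.update(chunk)
            part_digests.append(hashlib.md5(chunk, usedforsecurity=False).digest())
    if not multipart:
        return whole.hexdigest()
    # Multipart form: MD5 over the concatenated per-part digests, plus the part count.
    combined = hashlib.md5(b"".join(part_digests), usedforsecurity=False)
    return f"{combined.hexdigest()}-{len(part_digests)}"


def etag_matches(path: Path, s3_etag: str, part_size: int = DEFAULT_PART_SIZE) -> bool:
    """Compare a local file against an S3 ETag, handling the '-<parts>' suffix."""
    s3_etag = s3_etag.strip('"')
    return local_etag(path, multipart="-" in s3_etag, part_size=part_size) == s3_etag
```

If the part size cannot be known, the comment's other suggestions (ContentLength plus LastModified, or a checksum stored in object metadata) avoid the guess entirely.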

df_heat_content_monthly = pd.read_excel(raw_path, sheet_name="Data 1", skiprows=2, skipfooter=1, na_values=["."])

annul_heatcontent_file = "NG_CONS_HEAT_A_EPG0_VGTH_BTUCF_A.xls"
raw_path = eia861_raw_path / monthly_heatcontent_file

Copilot AI Apr 22, 2026

Bug: the annual heat-content download writes into the monthly filename (raw_path = ... / monthly_heatcontent_file). This overwrites the monthly file and also means the annual file is never cached under its own name. Use annul_heatcontent_file (or rename the variable to annual_heatcontent_file) when constructing raw_path.

Suggested change
- raw_path = eia861_raw_path / monthly_heatcontent_file
+ raw_path = eia861_raw_path / annul_heatcontent_file

Comment on lines +12 to +39
from resstockpostproc.baseline_validation.generation.plot_generator import generate_plots
from resstockpostproc.baseline_validation.schema.workflow_schema import workflow


def main() -> int:
    """Main entry point for ResStock comparison plot generation."""
    parser = argparse.ArgumentParser(
        description=(
            "Generate comparison graphics and data between a ResStock baseline and other data sources (EIA, RECS, LRD)."
        ),
    )

    parser.add_argument(
        "--config",
        type=str,
        default=str(Path(__file__).parent / "workflow.yaml"),
        help="Path to workflow configuration YAML file (default: workflow.yaml in script directory)",
    )

    args = parser.parse_args()

    config_path = Path(args.config)
    if not config_path.exists():
        print(f"Error: Configuration file not found at {config_path}")
        return 1

    workflow.ensure_resstock_data_files()
    generate_plots()

Copilot AI Apr 22, 2026

--config is validated for existence but never actually used to load a workflow; the CLI always uses the module-level workflow imported from schema.workflow_schema, and generate_plots() is invoked without the parsed config. This makes --config misleading and prevents running multiple configs without editing the default YAML. Load the config from config_path (e.g., WorkflowConfig.from_yaml) and either pass it into generate_plots(...)/downstream code or set the active workflow based on the CLI arg before calling ensure_resstock_data_files().
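A sketch of how the CLI could honor --config, assuming the workflow is loaded inside main() and passed down explicitly. WorkflowConfig, from_file, and the one-argument generate_plots below are simplified, hypothetical stand-ins (a dataclass reading JSON, to stay dependency-free); the real tool uses a Pydantic model and YAML. A later commit in this PR removes the flag instead of wiring it through.

```python
from __future__ import annotations

import argparse
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorkflowConfig:
    """Hypothetical stand-in for the Pydantic WorkflowConfig in workflow_schema.py."""

    run_name: str

    @classmethod
    def from_file(cls, path: Path) -> WorkflowConfig:
        # JSON keeps the sketch dependency-free; the real tool reads YAML.
        return cls(**json.loads(path.read_text()))


def generate_plots(workflow: WorkflowConfig) -> None:
    """Placeholder for the real plot generation, which would receive the config."""
    print(f"Generating plots for run {workflow.run_name!r}")


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Sketch of a CLI that honors --config.")
    parser.add_argument("--config", type=Path, required=True, help="Path to the workflow file")
    args = parser.parse_args(argv)

    if not args.config.exists():
        print(f"Error: Configuration file not found at {args.config}")
        return 1

    # Load from the path the user gave; nothing is read at import time.
    workflow = WorkflowConfig.from_file(args.config)
    generate_plots(workflow)
    return 0
```

Passing the loaded object down also addresses the import-time loading concern raised earlier in this review, since the schema module no longer needs a module-level workflow instance.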

Comment on lines +25 to +27
output:
  output_dir: /Users/radhikar/Documents/buildstock2025/
  run_name: baseline_val_test_2024_2025_final_test

Copilot AI Apr 22, 2026

The checked-in default config hard-codes a user-specific absolute output_dir ("/Users/..."), which will not exist for most users/CI and makes the example config non-portable. Consider using a relative path, a placeholder (e.g. ./test_output), or documenting that users must override this field (and keeping the committed value generic).

Comment on lines +1 to +2
QUALITATIVE_SERIES = [  # from light series in https://sronpersonalpages.nl/~pault/
    "#77AADD",  # light blude

Copilot AI Apr 22, 2026

Typo in comment: “light blude” should be “light blue”.

rajeee added 19 commits April 21, 2026 20:34
- main.py: drop --config flag; workflow.yaml next to main.py is the
  single source of config. README Running-the-Generator section and
  tests/test_main.py updated (the latter deleted; its removed-flags
  regression check isn't worth carrying for a first-release tool).
- data_scraping/scrap_eia176.py: fix the annual-heat-content download
  writing into the monthly filename (Copilot #16). Also rename the
  'annul_' variable to 'annual_'.
- workflow.yaml: replace hardcoded /Users/radhikar/... output_dir with
  a portable ~/baseline_validation_outputs placeholder (Copilot #18).
- shared_utils/colors.py: 'light blude' -> 'light blue' (Copilot #19).
- shared_utils/mapping.py: drop unused 'from typing import Dict'
  (Copilot #13).
- shared_utils/db_column_names.py: fix OEDI-schema column-name typo
  'out.outdoor_air_dryblub_temp.c' -> 'drybulb' to match the enum name
  OUTDOOR_DRYBULB_TEMP and the sibling OEDI_NEW entry.
Verified against live Athena (workgroup=rescore): the column in
buildstock_sdr.resstock_2024_amy2018_release_2_by_state_vu is literally
named 'out.outdoor_air_dryblub_temp.c' (upstream typo). The previous
commit corrected the spelling to match the enum name, which would have
broken LRD plots against OEDI_VU tables — good catch from Rajendra.

Added a comment flagging that the typo tracks an upstream schema
choice so future reviewers (Copilot, humans) don't 'fix' it again.
- save_figure(formats=[FileType.html]) used a mutable list default
  evaluated once at function-definition time. Switch to the standard
  None-default + body-side fallback pattern.
- Remove three unused helpers from plot_helpers/utils.py:
    filter_by_season, add_month_column, format_large_number,
  plus their supporting NUM2MONTH and SEASON2MONTHS dicts.
  All have zero external references. The local NUM2MONTH used full
  month names while the canonical shared_utils.mapping.NUM2MONTH used
  3-letter abbreviations — the loaded gun is gone.
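For readers unfamiliar with the mutable-default hazard mentioned in the save_figure item above, a small illustration. FileType here is a hypothetical stand-in for the tool's enum, and the append call exists only to make the shared state visible; the real save_figure may never mutate its argument.

```python
from enum import Enum


class FileType(Enum):
    """Hypothetical stand-in for the tool's FileType enum."""

    html = "html"
    svg = "svg"


def save_figure_shared_default(formats=[FileType.html]):
    # The default list is created once, so mutations leak into later calls.
    formats.append(FileType.svg)
    return formats


def save_figure(formats=None):
    # A fresh list is built on every call that omits the argument.
    if formats is None:
        formats = [FileType.html]
    formats.append(FileType.svg)
    return formats


print(len(save_figure_shared_default()))  # 2
print(len(save_figure_shared_default()))  # 3, the default grew between calls
print(len(save_figure()))  # 2
print(len(save_figure()))  # 2
```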
cache_dir was annotated as Optional with a None default but the body
unconditionally used 'cache_dir / s3path', so calling with the default
crashed with TypeError. Drop the Optional — every caller already passes
a real Path, so make the contract honest.

Also validate that full_s3_path starts with 's3://' before stripping;
the previous .replace('s3://', '') silently no-op'd on non-s3 inputs
and produced a confusing split error downstream. Mirrors what
download_s3_file() already does.
- bar_plotter.py: drop dead xtitle/ytitle assignments (4 sites — lint
  F841/RUF059); dict() call -> {} literal (C408); add noqa to the
  second_category_title param that's accepted for API uniformity but
  never read (ARG001, same rationale as tilemap_plotter).
- hover_formatting.py: int(math.floor(x)) -> math.floor(x) (RUF046 —
  math.floor already returns int).
- histogram_utils.py: wrap a 127-char line (E501); dict comprehension ->
  dict.fromkeys (C420).
- s3_manager.py: @lru_cache(maxsize=None) -> @functools.cache (UP033);
  hashlib.md5(usedforsecurity=False) since we're using it as an integrity
  checksum against S3 ETag, not for security (S324); noqa on the
  module-level _s3_client singleton global (PLW0603).
- timing.py: mutable class attrs annotated with ClassVar (RUF012 — _stats
  and _pending_events); noqa on the long-lived trace-file open (SIM115 —
  closed by stop_trace).
- tilemap_plotter.py: drop 3 unused signature params (orientation,
  label_formatter, categories — no callers pass them); noqa the remaining
  2 (first_category_title, second_category_title — callers DO pass
  meaningful values but the body silently hardcodes empty strings;
  a known design debt, preserving current behavior).
- caching.py: noqa the 'shelve file may not exist in read-only mode'
  try/except/pass (S110) — the comment already explains the intent.
RECS now ships jackknife replicate weights; the Fay (1-epsilon)^2
denominator was the wrong variance estimator for them. Jackknife uses
(R-1)/R * sum of squared log-residuals. CI ribbons widen ~3.84x across
854 RECS plots; point estimates unchanged.
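The two estimators named in this commit message can be compared numerically. The sketch below assumes 60 replicate weights and a Fay coefficient of 0.5; neither value is stated in the commit message, but together they reproduce the reported factor of about 3.84 in the log-space standard error. The replicate estimates are made up for illustration.

```python
import math


def _sum_sq_log_residuals(full_estimate, replicate_estimates):
    """Sum of squared differences between replicate and full-sample log estimates."""
    log_full = math.log(full_estimate)
    return sum((math.log(r) - log_full) ** 2 for r in replicate_estimates)


def fay_log_variance(full_estimate, replicate_estimates, epsilon=0.5):
    """Fay's BRR: sum of squares divided by R * (1 - epsilon)^2."""
    n = len(replicate_estimates)
    return _sum_sq_log_residuals(full_estimate, replicate_estimates) / (n * (1 - epsilon) ** 2)


def jackknife_log_variance(full_estimate, replicate_estimates):
    """Jackknife: (R - 1) / R times the sum of squares."""
    n = len(replicate_estimates)
    return (n - 1) / n * _sum_sq_log_residuals(full_estimate, replicate_estimates)


# Made-up replicate estimates, for illustration only.
full = 100.0
replicates = [100.0 * (1 + 0.001 * ((i % 7) - 3)) for i in range(60)]

ratio = math.sqrt(jackknife_log_variance(full, replicates) / fay_log_variance(full, replicates))
print(round(ratio, 2))  # 3.84, i.e. sqrt((60 - 1) * (1 - 0.5) ** 2)
```

The ratio depends only on the replicate count and the Fay coefficient, not on the data, which is consistent with a uniform widening across plots while point estimates stay the same.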
rajeee changed the title from "Add Baseline validation tool to resstockpostproc" to "Add baseline validation tool to resstockpostproc" on Apr 22, 2026