fix(runner): remove __getstate__/__setstate__, load driver hooks at init by ocervell · Pull Request #1267 · freelabz/secator

ocervell · 2026-07-03T12:49:03Z

Problem

With SECATOR_DEBUG=runner, a chunked task on a worker emits hundreds of
thousands of hook registered lines. The flood appears during the
replace(self, workflow) call (celery.py), right after Celery logs
"Adding chain task secator.celery.run_command from request chain". Reproduced
with a custom driver dropped in ~/.secator/templates/ + a chunked task.

Root cause

PR #1126 (fixes #1124) added two things:

discover_external_drivers() at worker startup (celery.py, IN_WORKER) —
the actual fix: it puts every secator.hooks.<driver> module (built-in and
custom ones from ~/.secator/templates/) into sys.modules, so pickled hook
functions resolve on the worker.
A custom __getstate__/__setstate__ on Runner: __getstate__ strips hooks;
__setstate__ rebuilds them from context['drivers'] by calling
register_hooks().

Because (1) already makes the driver modules importable on the worker, hook
functions pickle/unpickle natively by qualified name — making (2) redundant.
And register_hooks() logs one line per hook, while __setstate__ runs on
every unpickle. Celery's replace() + chord synchronization re-deserialize
the runner-bearing signatures O(chunks) times → the flood.

Fix

register_hooks() is idempotent — skip a hook already present (and stop
mutating a shared module-level HOOKS list in place). Keeps both driver
sources — explicit hooks= (library callers) and context['drivers'] (CLI) —
without registering or logging anything twice.
New _apply_context_drivers(), called once in __init__ — loads
context['drivers'] hooks at construction (the logic __setstate__ used to
run on unpickle). Every runner — initial dispatch, chunk task rebuilt by
run_command, chord callback — gets its driver hooks exactly once.
Registering on the client is inert: dispatching runners set
enable_hooks=False, and run_hooks gates firing on it.
Remove __getstate__/__setstate__. Native pickling preserves the
already-registered resolved_hooks across the Celery boundary; nothing
re-registers on unpickle → the flood is gone. This structurally removes the
fix: restore on_end hooks for chunk-parent tasks on unpickle #1180 class of bug too (hooks are never stripped/rebuilt).

Backwards compatibility

Library callers who pass driver hooks explicitly keep working:

from secator.hooks.mongodb import HOOKS
from secator.workflows import url_crawl
url_crawl(..., hooks=HOOKS)

This actually fixes a latent regression for exactly those callers: today
__getstate__ strips their hooks=, and __setstate__ restores only from
context['drivers'] — so a caller passing hooks= with no matching driver in
context silently lost their hooks on the worker. Native pickling preserves
them.

Tests

New tests/unit/test_runner_hooks.py: driver hooks load at init from
context['drivers']; a custom-driver hook survives a pickle round-trip with
zero re-registration on unpickle; registration is idempotent; library-mode
explicit hooks= survive pickling.
Updated TestRunnerPickle in tests/unit/test_celery.py to the native-pickling
contract (module present as on a discovered worker; unpickle does not
re-register).
Full tests/unit suite: no new failures (the remaining reds are pre-existing
and environmental — AI guardrails / CLI install / offline / CVE / config — and
fail identically on main).

Fixes the SECATOR_DEBUG=runner log flood on chunked tasks; builds on #1126.

Summary by CodeRabbit

Bug Fixes
- Hook-based behavior is now initialized earlier and stays consistent through serialization, reducing cases where actions were missing or duplicated.
- Reapplying the same driver configuration no longer registers the same hook more than once.
- Hook-provided behavior now survives save/restore flows more reliably.

PR #1126 fixed dynamic driver hooks on workers by (1) discovering external drivers at worker startup (celery.py IN_WORKER) and (2) adding a custom __getstate__/__setstate__ pair that strips hooks on pickle and re-registers them from context['drivers'] on every unpickle. Because (1) already makes every secator.hooks.<driver> module importable on the worker, hook functions pickle/unpickle natively by qualified name, making (2) redundant. Worse, __setstate__ calls register_hooks() (one debug log per hook) on every unpickle, and Celery's replace()/chord synchronization re-deserialize the runner-bearing signatures O(chunks) times — so a chunked task with SECATOR_DEBUG=runner emitted hundreds of thousands of "hook registered" lines. Changes: - register_hooks() is now idempotent (skip a hook already present, and stop mutating a shared module-level HOOKS list in place). Keeps both driver-hook sources — explicit hooks= (library callers) and context['drivers'] — without double-registering. Only actually-registered hooks are logged. - New _apply_context_drivers(), called once in __init__, loads context['drivers'] hooks at construction (the logic __setstate__ used to run on unpickle). Every runner — initial dispatch, chunk task rebuilt by run_command, chord callback — gets its driver hooks exactly once. Registering on the client is inert because dispatching runners set enable_hooks=False (run_hooks gates firing on it). - Remove __getstate__/__setstate__. Native pickling preserves the already- registered resolved_hooks across the Celery boundary; nothing re-registers on unpickle, so the flood is gone. Also fixes a latent regression for library callers passing hooks=HOOKS with no context['drivers'] (their hooks were stripped on pickle and never restored on the worker). Tests: new tests/unit/test_runner_hooks.py (init-time load, native pickle survival with zero re-registration, idempotent registration, library-mode hooks=); updated TestRunnerPickle to the native-pickling contract. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm

coderabbitai · 2026-07-03T12:53:27Z

Walkthrough

Runner initialization now applies context-driven hooks via a new _apply_context_drivers() method invoked during __init__, replacing the previous __getstate__/__setstate__ pickle-based restoration. register_hooks() was made idempotent to avoid duplicate hook registration. Tests were updated and added accordingly.

Changes

Driver Hook Lifecycle Fix

Layer / File(s)	Summary
Apply driver hooks at init, remove pickle restoration `secator/runners/_base.py`	Adds `_apply_context_drivers()` invoked from `__init__` to discover, order, import, and merge driver `HOOKS` before calling `register_hooks()`; removes `__getstate__`/`__setstate__`; makes `register_hooks()` skip hooks already present in `resolved_hooks`.
Celery pickling test updates `tests/unit/test_celery.py`	Replaces the missing-module unpickle test with one verifying hooks resolve before pickling, `register_hooks` is not called again on unpickle, and hooks survive round-trip pickling; updates related comments.
New runner hook test coverage `tests/unit/test_runner_hooks.py`	Adds a custom driver template, library-mode hook, temp-template setup/teardown, and tests validating hook presence at construction, non-duplicated registration across pickling/re-application, and survival of explicit hooks through pickling.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Init as Runner.__init__
  participant Apply as _apply_context_drivers
  participant Discover as discover_external_drivers
  participant Register as register_hooks

  Init->>Register: register_hooks(hooks)
  Init->>Apply: _apply_context_drivers()
  Apply->>Discover: discover and order drivers
  Discover-->>Apply: driver modules
  Apply->>Apply: import HOOKS, deep-merge
  Apply->>Register: register_hooks(merged_hooks)
  Register->>Register: skip hooks already in resolved_hooks

Possibly related PRs

freelabz/secator#1126: Both PRs change secator/runners/_base.py to ensure dynamically discovered driver hook functions are correctly handled across Celery worker serialization/deserialization.
freelabz/secator#1172: Both PRs modify how runner hooks are loaded/applied from context['drivers'] during runner construction/pickling.

Poem

A rabbit hops through pickled state,
No more __setstate__ to relitigate,
Hooks load early, clean and true,
No duplicates hopping through! 🐇
With drivers merged at init's dawn,
Onward the worker runners spawn~

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is concise and accurately summarizes the runner hook and pickle handling change.
Linked Issues check	✅ Passed	The changes load dynamic driver hooks at init and preserve them through Celery pickling, matching the worker-mode fix.
Out of Scope Changes check	✅ Passed	The runner refactor and new tests are directly tied to the stated worker-mode hook serialization fix.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/runner-hook-pickle-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

🧹 Nitpick comments (2)

secator/runners/_base.py (1)
468-515: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Logic is sound; consider deduplicating the driver-hook loading path.

_apply_context_drivers() and _apply_profile_drivers() (Lines 1527-1567) share the same core sequence: discover_external_drivers() → import_dynamic('secator.hooks.{driver}', 'HOOKS') → deep_merge_dicts(*hooks_list) → register_hooks(...). Extracting a small helper (e.g. _load_driver_hooks(drivers) -> merged_hooks) would remove the duplication and keep the discovery/merge behavior in one place.

Note: the _apply_profile_drivers docstring/comment still reference the now-removed Runner.__setstate__ (Lines 1500, 1531) — worth updating while touching this area.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@secator/runners/_base.py` around lines 468 - 515, Extract the repeated
driver-hook loading sequence from _apply_context_drivers() and
_apply_profile_drivers() into a shared helper that handles
discover_external_drivers(), import_dynamic('secator.hooks.{driver}', 'HOOKS'),
deep_merge_dicts(*hooks_list), and the register_hooks call. Keep the
context/profile-specific selection logic in each caller, and update the
surrounding docstring/comments in Runner to remove references to the removed
__setstate__ path.
tests/unit/test_runner_hooks.py (1)
38-56: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer addClassCleanup for robust restoration of global state.

setUpClass mutates process-global state (CONFIG.dirs.templates and the discover_external_drivers cache). If any step after Line 44 raises (e.g. discover_external_drivers() on Line 48), tearDownClass is not invoked and CONFIG.dirs.templates is left pointing at a temp dir that later gets cleaned up — polluting subsequent test classes. Registering cleanups right after each mutation guarantees restoration even on partial-setup failure.
♻️ Suggested restructure
 	`@classmethod`
 	def setUpClass(cls):
 		cls._tmpdir = tempfile.TemporaryDirectory()
+		cls.addClassCleanup(cls._tmpdir.cleanup)
 		tmp = Path(cls._tmpdir.name)
 		(tmp / 'mydriver.py').write_text(CUSTOM_DRIVER)
 		cls._orig_templates = CONFIG.dirs.templates
 		CONFIG.dirs.templates = tmp
+		cls.addClassCleanup(setattr, CONFIG.dirs, 'templates', cls._orig_templates)
+		cls.addClassCleanup(sys.modules.pop, 'secator.hooks.mydriver', None)
+		cls.addClassCleanup(discover_external_drivers.cache_clear)
 		# discover_external_drivers is `@cache`'d; clear it so it re-scans our tmp dir
 		# (an earlier test/import may have populated the cache with the real dir)
 		discover_external_drivers.cache_clear()
 		discover_external_drivers()
(and drop the now-redundant tearDownClass.)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_runner_hooks.py` around lines 38 - 56,
`test_runner_hooks.setUpClass` mutates process-wide state
(`CONFIG.dirs.templates` and `discover_external_drivers.cache_clear()`), so
register restoration with `addClassCleanup` immediately after each mutation
instead of relying on `tearDownClass`. Ensure the cleanup restores
`CONFIG.dirs.templates`, clears the cache again, and removes
`secator.hooks.mydriver` so state is reset even if `discover_external_drivers()`
fails during setup. After moving cleanup into `setUpClass`, remove the now
redundant `tearDownClass`.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@secator/runners/_base.py`:
- Around line 468-515: Extract the repeated driver-hook loading sequence from
_apply_context_drivers() and _apply_profile_drivers() into a shared helper that
handles discover_external_drivers(), import_dynamic('secator.hooks.{driver}',
'HOOKS'), deep_merge_dicts(*hooks_list), and the register_hooks call. Keep the
context/profile-specific selection logic in each caller, and update the
surrounding docstring/comments in Runner to remove references to the removed
__setstate__ path.

In `@tests/unit/test_runner_hooks.py`:
- Around line 38-56: `test_runner_hooks.setUpClass` mutates process-wide state
(`CONFIG.dirs.templates` and `discover_external_drivers.cache_clear()`), so
register restoration with `addClassCleanup` immediately after each mutation
instead of relying on `tearDownClass`. Ensure the cleanup restores
`CONFIG.dirs.templates`, clears the cache again, and removes
`secator.hooks.mydriver` so state is reset even if `discover_external_drivers()`
fails during setup. After moving cleanup into `setUpClass`, remove the now
redundant `tearDownClass`.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d397a750-87da-4f88-ac93-0f1c736ad4ac

📥 Commits

Reviewing files that changed from the base of the PR and between b556414 and 1cc4781.

📒 Files selected for processing (3)

secator/runners/_base.py
tests/unit/test_celery.py
tests/unit/test_runner_hooks.py

## Problem The `Tests` workflow is **red on main** — the `coverage` job fails at *Run coverage report* (`coverage combine --keep` succeeds, then `coverage report -m` prints no table and exits 1). Introduced by #1267. Root cause (reproduced locally): ``` No source for code: '/tmp/tmp8dwgpdlo/mydriver.py'; see .../messages.html#error-no-source EXIT=1 ``` #1267's `tests/unit/test_runner_hooks.py` dropped the fake driver into a system `TemporaryDirectory` and imported it as `secator.hooks.mydriver`. Coverage recorded that `/tmp/...` path. The unit job passed (it only *writes* coverage data), but the separate `coverage` job combines the unit+integration data and runs `coverage report`; by then the tempdir is gone, so coverage raises `NoSource` and exits 1. The path isn't caught by `--omit=*/site-packages/*,*/tests/*,*/templates/*`. ## Fix Mirror the existing `tests/unit/test_loader.py` convention: write the driver into the **real** `CONFIG.dirs.templates` dir (its path contains `/templates/`, so the report's `--omit=*/templates/*` excludes it) and clean it up in `tearDownClass`. No change to the runtime fix from #1267 — this only relocates where the test's throwaway driver file lives. ## Verification - `coverage run … test_runner_hooks.py` → 4 passed; then `coverage report --omit=…` → **exit 0, full table, 0 "No source" errors** (was exit 1 before). - Driver file is removed after the run (no stray file left in `~/.secator/templates/`). - `flake8` clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code)  ## Summary by CodeRabbit * **Tests** * Updated hook-related test coverage to use a real external driver fixture during initialization. * Improved validation for custom hook registration, persistence after serialization, and repeated context application.  Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

🤖 I have created a release *beep* *boop* --- ## [0.39.0](v0.38.0...v0.39.0) (2026-07-03) ### Features * **vulnerability:** add status field + carry-over across re-scans ([#1209](#1209)) ([0d7bf7d](0d7bf7d)) ### Bug Fixes * **cli:** missing "help" field in option raises ValueError ([#1230](#1230)) ([74a6b39](74a6b39)) * **config:** SECATOR_ADDONS_* invalid key override breaks Secator totally ([#1205](#1205)) ([2164993](2164993)) * dynamic task import order causing SyntaxError with external tasks ([#1213](#1213)) ([0e2fcd6](0e2fcd6)) * **exporters:** add runner warnings to report ([#1228](#1228)) ([1734f6b](1734f6b)) * **output:** display full node id instead of task name ([#1227](#1227)) ([e26f394](e26f394)) * **output:** Vulnerability colors display by severity, show tags for Tag ([#1229](#1229)) ([6ad35e3](6ad35e3)) * **runner:** remove __getstate__/__setstate__, load driver hooks at init ([#1267](#1267)) ([2c6d2e8](2c6d2e8)) * **tests:** coverage report NoSource failure on main ([#1270](#1270)) ([8b3a2a3](8b3a2a3)) ### Performance Improvements * **output-types:** cache keys() + O(1) deduplicate membership check ([#1220](#1220)) ([7a956e9](7a956e9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

coderabbitai Bot reviewed Jul 3, 2026

View reviewed changes

ocervell merged commit 2c6d2e8 into main Jul 3, 2026
10 of 11 checks passed

This was referenced Jul 3, 2026

chore(main): release 0.39.0 #1208

Merged

fix(tests): coverage report NoSource failure on main #1270

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(runner): remove getstate/setstate, load driver hooks at init#1267

fix(runner): remove getstate/setstate, load driver hooks at init#1267
ocervell merged 1 commit into
mainfrom
fix/runner-hook-pickle-refactor

ocervell commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ocervell commented Jul 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root cause

Fix

Backwards compatibility

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ocervell commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading