Skip to content

Resolve various thread-unsafe patterns detected by Copilot#314

Merged
knoepfel merged 1 commit intoFramework-R-D:mainfrom
knoepfel:resolve-thread-unsafeties
Feb 13, 2026
Merged

Resolve various thread-unsafe patterns detected by Copilot#314
knoepfel merged 1 commit intoFramework-R-D:mainfrom
knoepfel:resolve-thread-unsafeties

Conversation

@knoepfel
Copy link
Copy Markdown
Member

This PR addresses some data races that were uncovered via recent commits. The most prominent change addressed the caching system, which was unsafely exposing flush flags and counts to multiple threads when such flags and counts were not necessarily initialized.

@greenc-FNAL
Copy link
Copy Markdown
Contributor

Review the full CodeQL report for details.

@knoepfel knoepfel force-pushed the resolve-thread-unsafeties branch from d25aa49 to 2654969 Compare February 11, 2026 20:32
Copilot AI added a commit that referenced this pull request Feb 11, 2026
Fully restore store_counters.hpp/cpp, declared_fold.hpp,
declared_transform.hpp to their main branch versions. The fold
race condition fix belongs in PR #314.

Remove speculative null checks from BASIC_CONVERTER and
NUMPY_ARRAY_CONVERTER that are not needed by construction.

Retain only targeted Python fixes:
- Py_XINCREF in call()/callv() for refcount safety with caching
- lifeline_transform symmetric in call()/callv() per original author
- safe_decref in decref_all for null safety
- VECTOR_CONVERTER throws instead of returning null sentinel
- lifelinewrap.cpp: return nullptr on tp_alloc failure

Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
Copilot AI added a commit that referenced this pull request Feb 11, 2026
Output converters (py_to_* in BASIC_CONVERTER and NUMPY_ARRAY_CONVERTER)
were calling Py_DECREF on their input, which is a borrowed reference from
the upstream product store cache. When declared_transform's caching reuses
the same product store for multiple events with the same hash, the first
event's converter frees the PyObject, and subsequent events access freed
memory — causing intermittent SEGFAULTs in py:coverage, py:veclists, etc.

Fix: Remove Py_DECREF from all output converters. The input is a borrowed
reference owned by the product store cache. Only py_callback::call()/callv()
needs XINCREF/XDECREF around the Python function call to protect against
concurrent cache eviction.

Also integrates PR #314's thread-safety fixes for store_counters,
products.hpp, multiplexer, filter_impl, and all declared_* node types.

Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
Copilot AI added a commit that referenced this pull request Feb 11, 2026
Separate concerns:
- Remove all PR #314 core framework changes (store_counters, products,
  multiplexer, declared_observer/predicate/provider/unfold, edge_maker,
  filter_impl, node_catalog, test files). These belong in PR #314.
- Keep only the try/catch in declared_transform.hpp (exception safety
  for cache entries when call() throws).

Add Python reference counting model documentation:
- plugins/python/src/python-refcounting.md describes the ownership
  rules for PyObject* references flowing through declared_transform
  caching, including why the bug was masked by CPython's small integer
  cache.

Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
greenc-FNAL added a commit that referenced this pull request Feb 11, 2026
Fully restore store_counters.hpp/cpp, declared_fold.hpp,
declared_transform.hpp to their main branch versions. The fold
race condition fix belongs in PR #314.

Remove speculative null checks from BASIC_CONVERTER and
NUMPY_ARRAY_CONVERTER that are not needed by construction.

Retain only targeted Python fixes:
- Py_XINCREF in call()/callv() for refcount safety with caching
- lifeline_transform symmetric in call()/callv() per original author
- safe_decref in decref_all for null safety
- VECTOR_CONVERTER throws instead of returning null sentinel
- lifelinewrap.cpp: return nullptr on tp_alloc failure

Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
greenc-FNAL added a commit that referenced this pull request Feb 11, 2026
Output converters (py_to_* in BASIC_CONVERTER and NUMPY_ARRAY_CONVERTER)
were calling Py_DECREF on their input, which is a borrowed reference from
the upstream product store cache. When declared_transform's caching reuses
the same product store for multiple events with the same hash, the first
event's converter frees the PyObject, and subsequent events access freed
memory — causing intermittent SEGFAULTs in py:coverage, py:veclists, etc.

Fix: Remove Py_DECREF from all output converters. The input is a borrowed
reference owned by the product store cache. Only py_callback::call()/callv()
needs XINCREF/XDECREF around the Python function call to protect against
concurrent cache eviction.

Also integrates PR #314's thread-safety fixes for store_counters,
products.hpp, multiplexer, filter_impl, and all declared_* node types.

Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
greenc-FNAL added a commit that referenced this pull request Feb 11, 2026
Separate concerns:
- Remove all PR #314 core framework changes (store_counters, products,
  multiplexer, declared_observer/predicate/provider/unfold, edge_maker,
  filter_impl, node_catalog, test files). These belong in PR #314.
- Keep only the try/catch in declared_transform.hpp (exception safety
  for cache entries when call() throws).

Add Python reference counting model documentation:
- plugins/python/src/python-refcounting.md describes the ownership
  rules for PyObject* references flowing through declared_transform
  caching, including why the bug was masked by CPython's small integer
  cache.

Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
@greenc-FNAL greenc-FNAL force-pushed the resolve-thread-unsafeties branch from 2654969 to aef3812 Compare February 12, 2026 21:02
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
+ Coverage   80.16%   80.19%   +0.02%     
==========================================
  Files         126      126              
  Lines        3071     3070       -1     
  Branches      545      547       +2     
==========================================
  Hits         2462     2462              
  Misses        381      381              
+ Partials      228      227       -1     
Flag Coverage Δ
unittests 80.19% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
phlex/core/declared_observer.hpp 100.00% <100.00%> (ø)
phlex/core/declared_predicate.hpp 89.65% <100.00%> (ø)
phlex/core/declared_provider.hpp 100.00% <100.00%> (ø)
phlex/core/declared_transform.hpp 91.30% <100.00%> (ø)
phlex/core/declared_unfold.hpp 94.73% <100.00%> (ø)
phlex/core/detail/filter_impl.cpp 95.23% <100.00%> (ø)
phlex/core/multiplexer.cpp 73.91% <ø> (-5.40%) ⬇️
phlex/core/store_counters.cpp 89.58% <100.00%> (+3.21%) ⬆️
phlex/model/products.hpp 95.23% <100.00%> (+0.23%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 881866f...aef3812. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@knoepfel knoepfel merged commit bda6402 into Framework-R-D:main Feb 13, 2026
47 checks passed
@knoepfel knoepfel deleted the resolve-thread-unsafeties branch February 13, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants