Skip to content

Commit cb7aa5b

Browse files
Copilotgreenc-FNAL
andcommitted
Add corrected python-refcounting.md documenting wlav's one-to-one model
The previous PR #313 document described an incorrect model with XINCREF/XDECREF wrapping. This version documents the correct one-to-one ownership model: each converter creates a reference, each consumer DECREFs it exactly once. Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
1 parent a3c6b24 commit cb7aa5b

File tree

1 file changed

+203
-0
lines changed

1 file changed

+203
-0
lines changed
Lines changed: 203 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,203 @@
1+
# Python Reference Counting in the Phlex Python Plugin
2+
3+
## Overview
4+
5+
Phlex's Python plugin bridges C++ and Python through `intptr_t` values
6+
that represent `PyObject*` pointers. These values flow through the
7+
framework's `declared_transform` nodes, which cache their outputs for
8+
reuse when duplicate hashes appear. This document describes the
9+
reference counting discipline used to avoid both leaks and crashes.
10+
11+
## Architecture
12+
13+
A typical Python transform pipeline looks like this:
14+
15+
```text
16+
Provider → [input converter] → [py_callback] → [output converter] → Observer
17+
(C++ → PyObject) (Py → Py) (PyObject → C++)
18+
```
19+
20+
Each `[…]` above is its own `declared_transform` node. Each node is
21+
named with a combination of the type, the product label, and the
22+
consumer algorithm, so every converter has a **unique** identity.
23+
24+
## The Caching Mechanism
25+
26+
`declared_transform` (see `phlex/core/declared_transform.hpp`) caches
27+
its output product store by `data_cell_index::hash()`:
28+
29+
```cpp
30+
if (stores_.insert(a, hash)) {
31+
// First time: call the transform, store result
32+
a->second = make_shared<product_store>(...);
33+
} else {
34+
// Cache hit: reuse a->second without calling the transform
35+
}
36+
```
37+
38+
Each transform function is called **exactly once** per unique hash.
39+
Duplicate events with the same hash reuse the cached product store
40+
without re-invoking the transform. The cache entry is erased once
41+
`done_with()` returns true (all events for that hash have been
42+
processed).
43+
44+
## The One-to-One Model
45+
46+
Because each converter node has a unique name and each transform is
47+
called once per unique hash, reference counting is **one-to-one**:
48+
49+
1. **Input converters** create a new `PyObject*` reference (refcnt=1).
50+
2. **Consumers** (`py_callback` or output converters) `Py_DECREF` the
51+
reference after use (refcnt→0, freed).
52+
53+
No `XINCREF`/`XDECREF` wrapping is needed around the callback call.
54+
Adding one would produce a net +1 leak per product.
55+
56+
### Input Converters (`_to_py`)
57+
58+
Each input converter creates a **new reference** that is stored in the
59+
product store as an `intptr_t`. The framework treats `intptr_t` as an
60+
opaque integer — it has no C++ destructor and cannot call `Py_DECREF`.
61+
The consumer is responsible for releasing the reference.
62+
63+
```cpp
64+
// BASIC_CONVERTER: creates new reference via Python C API
65+
static intptr_t int_to_py(int a) {
66+
PyGILRAII gil;
67+
return (intptr_t)PyLong_FromLong(a); // new reference, refcnt=1
68+
}
69+
70+
// VECTOR_CONVERTER: creates new PhlexLifeline wrapping a numpy view
71+
static intptr_t vint_to_py(shared_ptr<vector<int>> const& v) {
72+
// ... creates PyArrayObject + PhlexLifeline ...
73+
return (intptr_t)pyll; // new reference, refcnt=1
74+
}
75+
```
76+
77+
### py_callback (`call` / `callv`)
78+
79+
`py_callback::call()` and `py_callback::callv()` receive `intptr_t`
80+
args that are **owned references** created by the upstream input
81+
converter, dedicated to this specific consumer. They:
82+
83+
1. Call `lifeline_transform()` to unwrap any `PhlexLifeline` objects
84+
(extracting the numpy view for the Python function).
85+
2. Pass the args to `PyObject_CallFunctionObjArgs()`.
86+
3. Call `decref_all(args...)` to release the input references.
87+
88+
```cpp
89+
template <typename... Args>
90+
intptr_t call(Args... args) {
91+
PyGILRAII gil;
92+
PyObject* result = PyObject_CallFunctionObjArgs(
93+
m_callable, lifeline_transform(args)..., nullptr);
94+
// ... error handling ...
95+
decref_all(args...); // release input references
96+
return (intptr_t)result; // new reference from Python call
97+
}
98+
```
99+
100+
Both `call()` and `callv()` use `lifeline_transform()` symmetrically.
101+
102+
### Output Converters (`py_to_*`)
103+
104+
Output converters receive the `intptr_t` from `py_callback`'s product
105+
store — again, an **owned reference** created by the Python call,
106+
dedicated to this specific output converter. They extract the C++
107+
value, then `Py_DECREF` the input:
108+
109+
```cpp
110+
// BASIC_CONVERTER py_to_*
111+
static int py_to_int(intptr_t pyobj) {
112+
PyGILRAII gil;
113+
int i = (int)PyLong_AsLong((PyObject*)pyobj);
114+
// ... error handling ...
115+
Py_DECREF((PyObject*)pyobj); // release the reference
116+
return i;
117+
}
118+
119+
// NUMPY_ARRAY_CONVERTER py_to_*
120+
static shared_ptr<vector<int>> py_to_vint(intptr_t pyobj) {
121+
PyGILRAII gil;
122+
auto vec = make_shared<vector<int>>();
123+
// ... copy data from PyArray or PyList ...
124+
Py_DECREF((PyObject*)pyobj); // release the reference
125+
return vec;
126+
}
127+
```
128+
129+
## Reference Flow Diagram
130+
131+
```text
132+
┌──────────────┐
133+
│ Provider │ C++ value (int, float, vector<T>)
134+
└──────┬───────┘
135+
136+
┌──────▼───────┐
137+
│ input conv. │ Creates NEW PyObject* (refcnt=1)
138+
│ (e.g. int_ │ Stored in product_store as intptr_t
139+
│ to_py) │ One per unique hash; consumer owns it
140+
└──────┬───────┘
141+
│ intptr_t (owned reference, refcnt=1)
142+
┌──────▼───────┐
143+
│ py_callback │ lifeline_transform() to unwrap
144+
│ ::call() │ PyObject_CallFunctionObjArgs()
145+
│ │ decref_all(args...) → refcnt=0, freed
146+
│ │ Returns result (NEW reference, refcnt=1)
147+
└──────┬───────┘
148+
│ intptr_t (owned reference, refcnt=1)
149+
┌──────▼───────┐
150+
│ output conv. │ Reads PyObject* value
151+
│ (e.g. py_to_ │ Py_DECREF → refcnt=0, freed
152+
│ int) │ Returns C++ value
153+
└──────┬───────┘
154+
155+
┌──────▼───────┐
156+
│ Observer │ Uses C++ value
157+
└──────────────┘
158+
```
159+
160+
## Error Handling
161+
162+
### VECTOR_CONVERTER Must Throw on Error
163+
164+
`VECTOR_CONVERTER` error paths throw `std::runtime_error` instead of
165+
returning `(intptr_t)nullptr`. A null `intptr_t` passed to
166+
`PyObject_CallFunctionObjArgs` would act as the argument-list sentinel,
167+
silently truncating the argument list.
168+
169+
### lifeline_transform Returns a Borrowed Reference
170+
171+
`lifeline_transform()` unwraps `PhlexLifeline` objects to extract the
172+
numpy array view (`m_view`). The returned pointer is a borrowed
173+
reference from the lifeline object, which remains alive because the
174+
caller still holds the owned reference to the lifeline (the DECREF
175+
happens after the call returns).
176+
177+
### ll_new Must Return nullptr on Allocation Failure
178+
179+
`ll_new` (in `lifelinewrap.cpp`) returns `nullptr` when `tp_alloc`
180+
fails, rather than falling through to dereference the null pointer.
181+
182+
## Common Pitfalls
183+
184+
1. **Do not add `Py_XINCREF`/`Py_XDECREF` around `py_callback` calls.**
185+
The one-to-one model means each reference is created once and
186+
consumed once. Adding INCREF/DECREF introduces a net +1 leak per
187+
product — potentially millions of leaked objects.
188+
189+
2. **Do not remove `Py_DECREF` from output converters or `decref_all`
190+
from `py_callback`.** These are the "consume" side of the
191+
one-to-one contract. Removing them leaks every converted object.
192+
193+
3. **Do not return `(intptr_t)nullptr` from converters.** It acts as
194+
the `PyObject_CallFunctionObjArgs` sentinel. Throw an exception
195+
instead.
196+
197+
## Known Limitations
198+
199+
- `BASIC_CONVERTER`'s `_to_py` functions can return
200+
`(intptr_t)nullptr` if the underlying Python C API call fails (e.g.,
201+
`PyLong_FromLong` on OOM). This null would propagate through the
202+
product store and cause a crash in `decref_all`. This is an
203+
extremely rare edge case that has not been addressed yet.

0 commit comments

Comments
 (0)