
Commit ebe88a2

test: comprehensive test suite for issue #262 tool call hang
Created 34 tests attempting to reproduce the issue where call_tool() hangs while list_tools() works. Tested many scenarios, including:

- Zero-buffer memory streams (inspired by issue #1764)
- Server buffering and flushing behavior
- Interleaved notifications during tool execution
- Bidirectional communication (sampling during tool call)
- Timing/race conditions with various delay patterns
- Big delays (2-3 seconds) as suggested in issue comments
- Slow callbacks that block processing
- CPU pressure tests
- Raw subprocess communication
- Concurrent and stress tests

All 34 tests pass on native Linux, indicating the issue is likely environment-specific (WSL Ubuntu, as reported). Added investigation notes documenting the most likely root cause based on issue #1764: zero-buffer memory streams combined with the start_soon pattern can cause a deadlock when the sender is faster than receiver initialization.

Github-Issue: #262

File tree

2 files changed: +1311 −0 lines


ISSUE_262_INVESTIGATION.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
# Issue #262 Investigation Notes

## Problem Statement

`session.call_tool()` hangs indefinitely while `session.list_tools()` works fine.
The server executes successfully and produces results, but the client cannot receive them.
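For context, the symptom in client code looks roughly like this (the server command and tool name below are placeholders, not taken from the issue):

```python
import anyio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Placeholder server command; the issue reproduces with the reporter's own server.
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            await session.list_tools()                       # completes fine
            await session.call_tool("echo", {"text": "hi"})  # hangs on affected setups

anyio.run(main)
```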
## Key Observations from Issue

- Debugger stepping makes the issue disappear (timing/race condition)
- Works on native Windows, fails on WSL Ubuntu
- Affects both stdio and SSE transports
- Server produces output but the client doesn't receive it
## Related Issues

### Issue #1764 - CRITICAL INSIGHT!

**Problem:** Race condition in StreamableHTTPServerTransport with SSE connections hanging.

**Root Cause:** Zero-buffer memory streams plus the `tg.start_soon()` pattern cause a deadlock:

- `send()` blocks until `receive()` is called on zero-buffer streams
- When the sender is faster than the receiver task initializes, a deadlock occurs
- Responses with 1-2 items work, 3+ items deadlock (timing dependent!)
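A minimal, SDK-free sketch of the semantics behind that root cause. It only demonstrates that a zero-capacity `send()` suspends until a receiver is actively waiting; in this toy example the task group does eventually schedule the consumer, so it completes rather than deadlocks:

```python
import anyio

async def main() -> None:
    # Capacity 0: send() cannot complete until a receiver is actively waiting.
    send_stream, receive_stream = anyio.create_memory_object_stream(0)

    async def consumer() -> None:
        async with receive_stream:
            async for item in receive_stream:
                print("received:", item)

    async with anyio.create_task_group() as tg:
        tg.start_soon(consumer)  # scheduled, but not necessarily running yet
        async with send_stream:
            # This send() suspends until consumer() reaches its receive loop.
            # Per #1764, with more concurrent state in play the sender can stall for good.
            await send_stream.send("tools/call result")

anyio.run(main)
```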

**Fix:** Either increase the buffer size OR use `await tg.start()` to ensure the receiver is ready.

**Relevance to #262:** The `stdio_client` uses EXACTLY this pattern:

```python
read_stream_writer, read_stream = anyio.create_memory_object_stream(0)  # Zero buffer!
write_stream, write_stream_reader = anyio.create_memory_object_stream(0)  # Zero buffer!
# ...
tg.start_soon(stdout_reader)  # Not awaited!
tg.start_soon(stdin_writer)  # Not awaited!
```

This could cause the exact hang described in #262 if the server responds before the client's receive loop is ready to receive!
## Comprehensive Test Results

### Test Categories and Results

| Category | Tests | Result | Notes |
|----------|-------|--------|-------|
| Basic tool call | 1 | PASS | Simple scenario works |
| Buffering tests | 3 | PASS | Flush/no-flush, unbuffered all work |
| 0-capacity streams | 3 | PASS | Rapid responses, notifications work |
| Interleaved notifications | 2 | PASS | Server notifications during tool work |
| Sampling during tool | 1 | PASS | Bidirectional communication works |
| Timing races | 2 | PASS | Small delays don't trigger |
| Big delays (2-3 sec) | 1 | PASS | Server delays don't cause hang |
| Instant response | 1 | PASS | Immediate response works |
| Burst responses | 1 | PASS | 20 rapid log messages handled |
| Slow callbacks | 2 | PASS | Slow logging/message handlers work |
| Many iterations | 1 | PASS | 50 rapid iterations all succeed |
| Concurrent sessions | 2 | PASS | Multiple parallel sessions work |
| Stress tests | 2 | PASS | 30 sequential sessions work |
| Patched SDK | 3 | PASS | Delays in SDK don't trigger |
| CPU pressure | 1 | PASS | Heavy CPU load doesn't trigger |
| Raw subprocess | 2 | PASS | Direct pipe communication works |
| Preemptive response | 1 | PASS | Unbuffered immediate response works |

**Total: 34 tests, all passing**
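The suite's exact code lives in `tests/issues/test_262_tool_call_hang.py`; the rough shape of each test, with a hypothetical `make_session` fixture, is a tool call guarded by a timeout so a hang fails fast instead of blocking CI:

```python
import anyio
import pytest

@pytest.mark.anyio
async def test_tool_call_does_not_hang(make_session):  # hypothetical fixture
    async with make_session() as session:
        with anyio.fail_after(10):  # a hang raises TimeoutError -> test failure
            result = await session.call_tool("echo", {"text": "ping"})
        assert not result.isError
```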
### Theories Tested

1. **Stdout Buffering** - Server not flushing stdout after responses
   - Result: NOT the cause - works with and without flush

2. **0-Capacity Streams** - `stdio_client` uses unbuffered streams (capacity 0)
   - Result: NOT the cause on this platform - works in the test environment

3. **Interleaved Notifications** - Server sending log notifications during tool execution
   - Result: NOT the cause - notifications handled correctly

4. **Bidirectional Communication** - Server requesting sampling during tool execution
   - Result: NOT the cause - bidirectional communication works

5. **Timing/Race Conditions** - Small delays in the server response
   - Result: Could not reproduce with various delay patterns

6. **Big Delays (2-3 seconds)** - As the issue comments suggest
   - Result: NOT the cause - big delays work fine

7. **Slow Callbacks** - Message handler/logging callback that blocks
   - Result: NOT the cause - slow callbacks work

8. **Zero-buffer + start_soon race** (from #1764)
   - Result: Could not reproduce, but this remains the most likely cause

9. **CPU Pressure** - Heavy CPU load exposing timing issues
   - Result: NOT the cause on this platform

10. **Raw Subprocess Communication** - Direct pipe handling (see the sketch after this list)
    - Result: Works correctly; the issue is not in pipe handling
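For theory 10, the raw-pipe check amounts to speaking newline-delimited JSON-RPC to the server process directly, bypassing the SDK's memory streams entirely. A simplified sketch (placeholder server command, and omitting the `initialize` handshake a real MCP server requires first):

```python
import json
import subprocess

proc = subprocess.Popen(
    ["python", "server.py"],  # placeholder server command
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

request = {
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "echo", "arguments": {"text": "hi"}},
}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()  # rules out client-side buffering

response = proc.stdout.readline()  # only blocks if the pipe itself misbehaves
print(json.loads(response))
proc.terminate()
```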
## Environment Notes

- Testing on: Linux (not WSL)
- Python: 3.11.14
- Using anyio for async
- All 34 tests pass consistently
## Conclusions

### Why We Cannot Reproduce

The issue appears to be **highly environment-specific**:

1. **WSL-specific behavior** - The original reporter experienced this on WSL Ubuntu, not native Linux/Windows
2. **Timing-dependent** - Debugger stepping makes it disappear, suggesting a very narrow timing window
3. **Platform-specific pipe behavior** - WSL has different I/O characteristics than native Linux
### Most Likely Root Cause

Based on issue #1764, the most likely cause is the **zero-buffer memory stream + start_soon pattern**:

1. `stdio_client` creates 0-capacity streams
2. Reader/writer tasks are started with `start_soon` (not awaited)
3. In certain environments (WSL), the timing allows responses to arrive before the receive loop is ready
4. This causes the send to block indefinitely (deadlock)
### Potential Fixes (to be verified on WSL)

1. **Increase stream buffer size** - Change from `anyio.create_memory_object_stream(0)` to `anyio.create_memory_object_stream(1)` or higher
2. **Use `await tg.start()`** - Ensure the receive loop is ready before returning from the context manager (see the sketch after this list)
3. **Add synchronization** - Use an Event to signal when the receive loop is ready
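A sketch of fix #2; `stdout_reader` here is a stand-in for the SDK's internal reader task, not its real signature. `await tg.start()` only returns once the child task calls `task_status.started()`, which closes the race window:

```python
import anyio
from anyio.abc import TaskStatus

async def stdout_reader(task_status: TaskStatus = anyio.TASK_STATUS_IGNORED) -> None:
    # Stand-in for stdio_client's reader: signal readiness, then loop.
    task_status.started()        # the parent's `await tg.start()` returns now
    await anyio.sleep_forever()  # placeholder for the real receive loop

async def main() -> None:
    async with anyio.create_task_group() as tg:
        await tg.start(stdout_reader)  # blocks until started() is called
        # The reader is now guaranteed to be running, so a fast server
        # response can no longer outrun the receive loop.
        tg.cancel_scope.cancel()  # end the demo

anyio.run(main)
```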
## Files Created

- `tests/issues/test_262_tool_call_hang.py` - Comprehensive test suite (34 tests)
- `tests/issues/reproduce_262_standalone.py` - Standalone reproduction script
- `ISSUE_262_INVESTIGATION.md` - This investigation document
## Recommendations

1. **For users experiencing this issue:**
   - Try running on native Linux or Windows instead of WSL
   - Check if adding a small delay after session creation helps

2. **For maintainers:**
   - Consider changing the stream buffer size in `stdio_client` from 0 to 1
   - Consider using the `await tg.start()` pattern instead of `start_soon` for critical tasks
   - Test changes specifically on WSL Ubuntu to verify the fix

3. **For further investigation:**
   - A WSL Ubuntu environment is needed to reproduce
   - Could try patching `stdio_client` to use `anyio.create_memory_object_stream(1)` and testing (see the sketch below)
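
The experiment in that last bullet amounts to a one-character change to the lines quoted from `stdio_client` earlier, so `send()` no longer needs a live receiver on the other end:

```python
# Proposed experiment inside stdio_client: give each stream one slot of slack.
read_stream_writer, read_stream = anyio.create_memory_object_stream(1)
write_stream, write_stream_reader = anyio.create_memory_object_stream(1)
```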
