|
1 | 1 | # Issue #262 Investigation Notes |
2 | 2 |
|
| 3 | +## *** REPRODUCTION CONFIRMED! *** |
| 4 | + |
| 5 | +We have successfully reproduced the race condition that causes issue #262! |
| 6 | + |
| 7 | +**Key finding:** The combination of zero-capacity memory streams + `start_soon()` creates |
| 8 | +a race condition where `send()` blocks forever if the receiver task hasn't started yet. |
| 9 | + |
| 10 | +See `tests/issues/test_262_minimal_reproduction.py` for the simplest reproduction: |
| 11 | +``` |
| 12 | +REPRODUCED: Send blocked because receiver wasn't ready! |
| 13 | +Receiver started: False |
| 14 | +``` |
| 15 | + |
3 | 16 | ## Problem Statement |
4 | 17 | `session.call_tool()` hangs indefinitely while `session.list_tools()` works fine. |
5 | 18 | The server executes successfully and produces results, but the client cannot receive them. |
@@ -113,13 +126,49 @@ Based on issue #1764, the most likely cause is the **zero-buffer memory stream + |
113 | 126 | 3. In certain environments (WSL), the timing allows responses to arrive before the receive loop is ready |
114 | 127 | 4. This causes the send to block indefinitely (deadlock) |
115 | 128 |
|
116 | | -### Potential Fixes (to be verified on WSL) |
| 129 | +### Confirmed Fixes (tested in reproduction) |
117 | 130 | 1. **Increase stream buffer size** - Change from `anyio.create_memory_object_stream(0)` to `anyio.create_memory_object_stream(1)` or higher |
| 131 | + - CONFIRMED: `test_demonstrate_fix_with_buffer` shows this works |
| 132 | + - A one-item buffer lets `send()` complete even when no receiver is waiting yet |
| 133 | + |
118 | 134 | 2. **Use `await tg.start()`** - Ensure receive loop is ready before returning from context manager |
| 135 | + - CONFIRMED: `test_demonstrate_fix_with_start` shows this works |
| 136 | + - `start()` waits for the task to call `task_status.started()` before continuing |
| 137 | + |
119 | 138 | 3. **Add synchronization** - Use an Event to signal when receive loop is ready |
| 139 | + - Similar to fix 2: ensures the receiver is ready before the sender proceeds |
| 140 | + |
| 141 | +### Where to Apply Fixes |
| 142 | +The fix should be applied in `src/mcp/client/stdio/__init__.py`: |
| 143 | + |
| 144 | +**Option 1: Change buffer size from 0 to 1 (simplest)** |
| 145 | +```python |
| 146 | +# Lines 117-118: Change from: |
| 147 | +read_stream_writer, read_stream = anyio.create_memory_object_stream(0) |
| 148 | +write_stream, write_stream_reader = anyio.create_memory_object_stream(0) |
| 149 | + |
| 150 | +# To: |
| 151 | +read_stream_writer, read_stream = anyio.create_memory_object_stream(1) |
| 152 | +write_stream, write_stream_reader = anyio.create_memory_object_stream(1) |
| 153 | +``` |
| 154 | + |
| 155 | +**Option 2: Use start() instead of start_soon() (more robust)** |
| 156 | +```python |
| 157 | +# Lines 186-187: Change from: |
| 158 | +tg.start_soon(stdout_reader) |
| 159 | +tg.start_soon(stdin_writer) |
| 160 | + |
| 161 | +# To tasks that signal when ready: |
| 162 | +await tg.start(stdout_reader) |
| 163 | +await tg.start(stdin_writer) |
| 164 | +# (requires modifying stdout_reader and stdin_writer to call task_status.started()) |
| 165 | +``` |
120 | 166 |
|
121 | 167 | ## Files Created |
122 | 168 | - `tests/issues/test_262_tool_call_hang.py` - Comprehensive test suite (34 tests) |
| 169 | +- `tests/issues/test_262_aggressive.py` - Aggressive tests with SDK patches |
| 170 | +- `tests/issues/test_262_standalone_race.py` - Standalone reproduction of SDK patterns |
| 171 | +- `tests/issues/test_262_minimal_reproduction.py` - **Minimal reproduction that CONFIRMS the bug** |
123 | 172 | - `tests/issues/reproduce_262_standalone.py` - Standalone reproduction script |
124 | 173 | - `ISSUE_262_INVESTIGATION.md` - This investigation document |
125 | 174 |