Improve test_gevent_monkeypatch robustness.#2419
Conversation
Previously the test could hang when the log fifo was closed by the gunicorn server before the port was written to the log; now it fails. Follow up to pex-tool#2417 which introduced the test and led to observed hangs in CI.
| @property | ||
| def port(self): | ||
| # type: () -> int | ||
| self.port_seen.wait() |
There was a problem hiding this comment.
The hang was here and was revealed by this test rig in the new, simplified code on the RHS.
Here the 16th round out of 34 (x 5 interpreters) exhibited the issue:
:; COUNT=0; while true; do echo -e "\nRound $((count+=1)):" && tox -p -epy3{8,9,10,11,12}-integration -- --devpi -vvsk test_gevent_monkeypatch -n0; done
...
Round 16:
ROOT: will run in automatically provisioned tox, host /home/jsirois/bin/tox.venv/bin/python3.11 is missing [requires (has)]: tox<4 (4.11.4), virtualenv<20.16 (20.25.0)
ROOT: provision> .tox/.tox/bin/python -m tox -p -epy38-integration -epy39-integration -epy310-integration -epy311-integration -epy312-integration -- --devpi -vvsk test_gevent_monkeypatch -n0
⠸ [5] py38-integration | py39-integration | py310-integration | py311-integration | py312-integrationERROR: invocation failed (exit code 1), logfile: /home/jsirois/dev/pex-tool/pex/.tox/py311-integration/log/py311-integration-33.log
=================================================================================================== log start ====================================================================================================
py311-integration installed: ansicolors==1.1.8,coloredlogs==15.0.1,execnet==2.1.1,humanfriendly==10.0,iniconfig==2.0.0,packaging==24.0,-e git+https://github.com/jsirois/pex@1a02d5483cd41751e856043fce40676e474e2c27#egg=pex,pexpect==4.9.0,pluggy==1.5.0,ptyprocess==0.7.0,py==1.11.0,pytest==7.4.0,pytest-forked==1.6.0,pytest-xdist==1.34.0,six==1.16.0
py311-integration run-test-pre: PYTHONHASHSEED='3609177135'
py311-integration run-test: commands[0] | python testing/bin/run_tests.py --it --devpi -vvsk test_gevent_monkeypatch -n0
INFO Logging configured at level INFO.
INFO Devpi server already running.
INFO Test control environment variables:
INFO PIP_INDEX_URL=http://0.0.0.0:41825/root/pypi/+simple/
INFO PIP_TRUSTED_HOST=0.0.0.0
INFO _PEX_TEST_PROJECT_DIR=/home/jsirois/dev/pex-tool/pex
INFO _PEX_USE_PIP_CONFIG=True
============================= test session starts ==============================
platform linux -- Python 3.11.9, pytest-7.4.0, pluggy-1.5.0 -- /home/jsirois/dev/pex-tool/pex/.tox/py311-integration/bin/python
cachedir: .tox/py311-integration/.pytest_cache
rootdir: /home/jsirois/dev/pex-tool/pex
plugins: xdist-1.34.0, forked-1.6.0
collecting ... collected 716 items / 715 deselected / 1 selected
tests/integration/test_issue_2415.py::test_gevent_monkeypatch FAILED
=================================== FAILURES ===================================
___________________________ test_gevent_monkeypatch ____________________________
tmpdir = local('/tmp/pytest-of-jsirois/pytest-93/test_gevent_monkeypatch0')
@pytest.mark.skipif(
IS_PYPY or PY_VER < (3, 8) or PY_VER >= (3, 13),
reason=(
"The lock file for this test only supports CPythons >=3.8,<3.13 which were the officially "
"supported CPythons at the time issue 2415 was reported."
),
)
def test_gevent_monkeypatch(tmpdir):
# type: (Any) -> None
with safe_open(os.path.join(str(tmpdir), "app.py"), "w") as app_fp:
app_fp.write(
dedent(
"""\
from gevent import monkey
monkey.patch_all()
from flask import Flask
app = Flask(__name__)
@app.route("/")
def hello_world():
return "Hello, World!"
"""
)
)
pex_root = os.path.join(str(tmpdir), "pex_root")
pex = os.path.join(str(tmpdir), "pex")
# N.B.: Created with the following, where gevent 1.3.4 was picked as a lower bound since it
# 1st introduced the `from gevent import monkey; monkey.patch_all()` ssl check warning that is
# the subject of issue 2415:
#
# pex3 lock create \
# --resolver-version pip-2020-resolver \
# --pip-version latest \
# --style universal \
# --interpreter-constraint ">=3.8,<3.13" \
# --indent 2 \
# flask \
# "gevent>=1.3.4" \
# gunicorn
lock = data.path("locks", "issue-2415.lock.json")
run_pex_command(
args=[
"--pex-root",
pex_root,
"--runtime-pex-root",
pex_root,
"--lock",
lock,
"-M",
"app",
"-c",
"gunicorn",
"--inject-args",
"app:app",
"-o",
pex,
],
cwd=str(tmpdir),
).assert_success()
log = os.path.join(str(tmpdir), "log")
os.mkfifo(log)
with open(os.path.join(str(tmpdir), "stderr"), "wb+") as stderr_fp:
gunicorn = subprocess.Popen(
args=[pex, "--bind", "127.0.0.1:0", "--log-file", log], stderr=stderr_fp
)
atexit.register(gunicorn.kill)
# N.B.: Since the fifo is blocking, we can only open it now, after the server is launched
# in the background where it too is blocked waiting to write to the log.
with open(log) as log_fp:
port = None # type: Optional[int]
for line in log_fp:
match = re.search(r"Listening at: http://127.0.0.1:(?P<port>\d{1,5})", line)
if match:
port = int(match.group("port"))
break
> assert port is not None, "Failed to determine port from gunicorn log at {log}".format(
log=log
)
E AssertionError: Failed to determine port from gunicorn log at /tmp/pytest-of-jsirois/pytest-93/test_gevent_monkeypatch0/log
E assert None is not None
tests/integration/test_issue_2415.py:108: AssertionError
=========================== short test summary info ============================
FAILED tests/integration/test_issue_2415.py::test_gevent_monkeypatch - Assert...
====================== 1 failed, 715 deselected in 11.80s ======================
ERROR: InvocationError for command /home/jsirois/dev/pex-tool/pex/.tox/py311-integration/bin/python testing/bin/run_tests.py --it --devpi -vvsk test_gevent_monkeypatch -n0 (exited with code 1)
==================================================================================================== log end =====================================================================================================
✖ FAIL py311-integration in 13.369 seconds
✔ OK py38-integration in 14.154 seconds
✔ OK py310-integration in 16.026 seconds
✔ OK py39-integration in 16.316 seconds
✔ OK py312-integration in 18.007 seconds
____________________________________________________________________________________________________ summary _____________________________________________________________________________________________________
py38-integration: commands succeeded
py39-integration: commands succeeded
py310-integration: commands succeeded
ERROR: py311-integration: parallel child exit code 1
py312-integration: commands succeeded
...
Round 34:
ROOT: will run in automatically provisioned tox, host /home/jsirois/bin/tox.venv/bin/python3.11 is missing [requires (has)]: tox<4 (4.11.4), virtualenv<20.16 (20.25.0)
ROOT: provision> .tox/.tox/bin/python -m tox -p -epy38-integration -epy39-integration -epy310-integration -epy311-integration -epy312-integration -- --devpi -vvsk test_gevent_monkeypatch -n0
✔ OK py311-integration in 13.348 seconds
✔ OK py38-integration in 13.399 seconds
✔ OK py310-integration in 13.53 seconds
✔ OK py39-integration in 15.95 seconds
✔ OK py312-integration in 18.083 seconds
____________________________________________________________________________________________________ summary _____________________________________________________________________________________________________
py38-integration: commands succeeded
py39-integration: commands succeeded
py310-integration: commands succeeded
py311-integration: commands succeeded
py312-integration: commands succeeded
congratulations :)
There was a problem hiding this comment.
And, got that failure here on the 1st CI burn: https://github.com/pex-tool/pex/actions/runs/9403807835/job/25901223784?pr=2419#step:5:1976
I'd like to figure out how / why the gunicorn server starts up, opens the log file, then crashes before ever writing out the ephemeral port it picks, but this is at least forward progress - fail fast is better than hang.
There was a problem hiding this comment.
Ok, I re-worked things and now the test appears to be fully robust to all but incredibly swamped CI machines (there is now a 30s connection timeout).
Polling for a unix socket to be opened by gunicorn has proven to be both fully robust via extended local testing and helps the test run a few seconds faster. Also switch gevent monkey-patching to being handled by gunicorn, which appears to be how this is actually done in production setups. With lazy imports reverted, the test fails as expected; so the monkey patching by gunicorn is confirmed to be happening.
The stdlib is wild here.
Previously the test could hang when the log fifo was closed by the
gunicorn server before the port was written to the log; now it fails
when this happens.
Follow up to #2417 which introduced the test and led to observed hangs
in CI.