You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Nov 6, 2019. It is now read-only.
When running tests in Edge via Sauce Labs, we currently divide WPT into 100 segments (referred to by the WPT infrastructure as "chunks"). On 2018-12-26, the collection from chunk 10 of 100 appeared to complete successfully, but the Buildbot master did not detect completion, and after 20 minutes, the job timed out. This has occurred for that chunk with regularity in the weeks that have passed since.
Example of logs (from 2019-01-28)
2019-01-28 14:05:26,345 INFO validate-wpt-results wpt-run:stdout Testing ed53b0478faeb87abd71cd9972d4e2b7a6103241 == a7c05b88682fd5c1adb58984bb0a9edf46daee95
2019-01-28 14:05:26,345 INFO validate-wpt-results wpt-run:stdout 28 Jan 14:05:26 - Got signal terminated
2019-01-28 14:05:26,442 INFO validate-wpt-results wpt-run:stdout 28 Jan 14:05:26 - Cleaning up.
2019-01-28 14:05:26,442 INFO validate-wpt-results wpt-run:stdout 28 Jan 14:05:26 - Removing tunnel 9f7ba991b73046a593aab88fbfebe745.
2019-01-28 14:05:26,443 INFO validate-wpt-results wpt-run:stdout 28 Jan 14:05:26 - Waiting for any active jobs using this tunnel to finish.
2019-01-28 14:05:26,443 INFO validate-wpt-results wpt-run:stdout 28 Jan 14:05:26 - Press CTRL-C again to shut down immediately.
2019-01-28 14:05:26,443 INFO validate-wpt-results wpt-run:stdout 28 Jan 14:05:26 - Note: if you do this, tests that are still running will fail.
2019-01-28 14:05:31,267 INFO validate-wpt-results wpt-run:stdout 108:10.77 INFO Closing logging queue
2019-01-28 14:05:31,270 INFO validate-wpt-results wpt-run:stdout 108:10.77 INFO queue closed
2019-01-28 14:05:31,577 INFO validate-wpt-results WPT CLI exited with return code 1
2019-01-28 14:05:31,632 INFO validate-wpt-results Expected 1229 results
2019-01-28 14:05:31,633 INFO validate-wpt-results Found 1229 results
2019-01-28 14:05:31,633 INFO validate-wpt-results Found 0 unexpected results
2019-01-28 14:05:31,633 INFO validate-wpt-results Found 0 missing results
command timed out: 1200 seconds without output running ['run-and-verify.py', '--max-attempts', '3', '--log-wptreport', '/tmp/tmpXvRNyX/report.json', '--log-raw', '/tmp/tmpXvRNyX/log-raw.txt', '--', '--log-mach', '-', '--this-chunk', '10', '--total-chunks', '100', '--sauce-platform', 'windows 10', '--sauce-user', 'wpt-shiek', '--sauce-key', '<sauce_labs_key_shiek>', '--sauce-tunnel-id', 'shiek', '--sauce-connect-binary', 'sc', '--sauce-connect-arg=--logfile=/var/log/sauce-connect/sc.log', '--sauce-init-timeout', '45', '--no-restart-on-unexpected', '--run-by-dir', '3', 'sauce:MicrosoftEdge:17'], attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1
remoteFailed: [Failure instance: Traceback from remote host -- exceptions.RuntimeError: SIGKILL failed to kill process
]
Between Buildbot, the "retry" wrapper script, the WPT CLI, and the Sauce Connect binary, the process tree is pretty deep for this code. My initial thought is that some thread is active and causing one of these processes to not report as complete, though the message "WPT CLI exited with return code 1" suggests this is happening at a pretty high level.
The text was updated successfully, but these errors were encountered:
I haven't been able to find a conclusive explanation for this behavior, nor have I been able to reproduce it under more controlled circumstances (running "chunk" number 10 in Edge takes about 2 hours).
Still, if the pattern is known to be susceptible to deadlocking, then we ought to avoid it. I've pushed up a commit to do just that. It's possible that this will also address our issue. While I'd certainly appreciate a more conclusive resolution, we need to time box investigation on this latest glitch.
Following a couple of successes, we failed to collect from the segment in question for the very same reason. I'll continue my effort to reproduce the bug locally and find the true fix.
When this issue is expressed, the Sauce Labs logs are conspicuously short. They abruptly end when the browser is accessing resources for the tests at html/infrastructure/urls/resolving-urls/query-encoding. It's odd that the WPT logs are complete in these cases, describing the results for all subsequent tests.
My working theory is that the Sauce Connect binary is falling into an uninterruptable sleep.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
When running tests in Edge via Sauce Labs, we currently divide WPT into 100 segments (referred to by the WPT infrastructure as "chunks"). On 2018-12-26, the collection from chunk 10 of 100 appeared to complete successfully, but the Buildbot master did not detect completion, and after 20 minutes, the job timed out. This has occurred for that chunk with regularity in the weeks that have passed since.
Example of logs (from 2019-01-28)
Between Buildbot, the "retry" wrapper script, the WPT CLI, and the Sauce Connect binary, the process tree is pretty deep for this code. My initial thought is that some thread is active and causing one of these processes to not report as complete, though the message "WPT CLI exited with return code 1" suggests this is happening at a pretty high level.
The text was updated successfully, but these errors were encountered: