[BUG]: self-hosted agent - We stopped hearing from agent agent - when running ReadyAPI tests in parallel #4813

Open
1 of 4 tasks
LeaCCC opened this issue May 24, 2024 · 3 comments

Comments

@LeaCCC

LeaCCC commented May 24, 2024

What happened?

I used an Azure DevOps pipeline with self-hosted Windows agents to run ReadyAPI tests in parallel.
The self-hosted agents are set up on our on-premises Windows server.
The agent version is 3.239.1.
The operating system is Windows Server 2019 Standard.

This issue happened multiple times this week.
The agent appears to have accepted the job request, but the pipeline got stuck at the "Initialize job" stage.
After about 13 minutes, the pipeline failed with the error below:
##[error]We stopped hearing from agent agentTFSServer4. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610

The agent log shows the message below repeated for about 11 minutes:
"Sent GetAgentMessage to keep alive agent 80,"

Then we got this Visual Studio Service error:
PATCH request to https://dev.azure.com/cccgovtnz/_apis/distributedtask/pools/35/jobrequests/19344 failed. HTTP Status: BadRequest, AFD Ref: Ref A: F32BF2EFB8EF4E7A9CD5388CA79CD0B7 Ref B: AKL30EDGE0210 Ref C: 2024-05-20T03:30:39Z

The log entries that follow show that the job request could no longer be renewed:
[2024-05-20 03:30:39Z INFO JobDispatcher] TaskAgentJobTokenExpiredException received renew job request 19344, job is no longer valid, stop renew job request.
[2024-05-20 03:30:39Z INFO JobDispatcher] Unable to renew job request for job 275f1d19-1bd8-5591-b06b-07d489ea915a for the first time, stop dispatching job to worker.

Versions

Azure DevOps self-hosted agent version 3.239.1, Windows

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Windows Server 2019 Standard

Version control system

No response

Relevant log output

[2024-05-20 03:19:21Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\agentDevOps4\.credentials_rsaparams
[2024-05-20 03:19:21Z INFO MessageListener] Message '43' received from session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:19:21Z INFO JobDispatcher] Job request 19344 for plan d75273f5-3cc3-42c7-9095-dcc1f355a53a job 275f1d19-1bd8-5591-b06b-07d489ea915a received.
[2024-05-20 03:19:21Z INFO Terminal] WRITE LINE: 2024-05-20 03:19:21Z: Running job: Agent job - parallel readyAPI Test Use Repo3 - DEV - CCOTFSDPLYUI01
[2024-05-20 03:19:52Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:20:23Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:20:54Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:21:25Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:21:56Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:22:27Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:22:58Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:23:29Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:24:00Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:24:31Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:25:02Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:25:33Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:26:04Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:26:35Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:27:06Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:27:37Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:28:09Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:28:40Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:29:11Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:29:42Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:30:13Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.
[2024-05-20 03:30:39Z INFO AgentServer] Refresh JobRequest VssConnection to get on a different AFD node.
[2024-05-20 03:30:39Z INFO AgentServer] Establish connection with 30 seconds timeout.
[2024-05-20 03:30:39Z INFO VisualStudioServices] Starting operation Location.GetConnectionData
[2024-05-20 03:30:39Z INFO VisualStudioServices] Finished operation Location.GetConnectionData
[2024-05-20 03:30:39Z INFO JobDispatcher] Start renew job request 19344 for job 275f1d19-1bd8-5591-b06b-07d489ea915a.
[2024-05-20 03:30:39Z ERR  VisualStudioServices] PATCH request to https://dev.azure.com/cccgovtnz/_apis/distributedtask/pools/35/jobrequests/19344 failed. HTTP Status: BadRequest, AFD Ref: Ref A: F32BF2EFB8EF4E7A9CD5388CA79CD0B7 Ref B: AKL30EDGE0210 Ref C: 2024-05-20T03:30:39Z
[2024-05-20 03:30:39Z INFO JobDispatcher] TaskAgentJobTokenExpiredException received renew job request 19344, job is no longer valid, stop renew job request.
[2024-05-20 03:30:39Z INFO JobDispatcher] Unable to renew job request for job 275f1d19-1bd8-5591-b06b-07d489ea915a for the first time, stop dispatching job to worker.
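
For anyone comparing their own agent logs against this report, the timeline in the excerpt above can be measured with a short script. This is only a sketch: the line format is inferred from the log lines quoted in this issue, the function names (`parse_line`, `summarize`) are hypothetical, and it is not part of the agent or any official tooling. It reports how long the listener waited between receiving the job request and the first logged renew attempt, and how evenly the keep-alive messages were spaced.

```python
import re
from datetime import datetime

# Line format inferred from the excerpt above (an assumption, not a documented format):
# [YYYY-MM-DD HH:MM:SSZ LEVEL Component] message
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z\s+(\w+)\s+(\w+)\]\s+(.*)$"
)


def parse_line(line):
    """Return (timestamp, level, component, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts, m.group(2), m.group(3), m.group(4)


def summarize(lines):
    """Measure the gap between job receipt and the first renew attempt, plus keep-alive spacing."""
    received = None
    first_renew = None
    keepalives = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _level, component, msg = parsed
        if component == "JobDispatcher" and msg.startswith("Job request") and msg.endswith("received."):
            received = received or ts
        elif component == "JobDispatcher" and msg.startswith("Start renew job request"):
            first_renew = first_renew or ts
        elif component == "MessageListener" and msg.startswith("Sent GetAgentMessage to keep alive"):
            keepalives.append(ts)
    gaps = [(b - a).total_seconds() for a, b in zip(keepalives, keepalives[1:])]
    wait = (first_renew - received).total_seconds() if received and first_renew else None
    return {
        "seconds_until_first_renew": wait,
        "keepalive_count": len(keepalives),
        "max_keepalive_gap_seconds": max(gaps) if gaps else None,
    }


# A few lines copied from the log excerpt in this issue.
sample_lines = [
    "[2024-05-20 03:19:21Z INFO JobDispatcher] Job request 19344 for plan d75273f5-3cc3-42c7-9095-dcc1f355a53a job 275f1d19-1bd8-5591-b06b-07d489ea915a received.",
    "[2024-05-20 03:19:52Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.",
    "[2024-05-20 03:20:23Z INFO MessageListener] Sent GetAgentMessage to keep alive agent 80, session '89294bff-b3eb-417a-afb0-2f951587c456'.",
    "[2024-05-20 03:30:39Z INFO JobDispatcher] Start renew job request 19344 for job 275f1d19-1bd8-5591-b06b-07d489ea915a.",
]

summary = summarize(sample_lines)
print(summary)
```

On the lines quoted here, the job request is received at 03:19:21Z and the first renew attempt is logged at 03:30:39Z, a gap of 678 seconds (11 minutes 18 seconds), while the keep-alive messages continue roughly every 31 seconds. To check a full log, read the listener log file from the agent's diagnostics folder and pass its lines to `summarize`.
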
@DmitriiBobreshev
Contributor

Hi @LeaCCC, thanks for the feedback! I suspect this might be connected with the stuck worker job from issue #4812. We'll try to figure out the problem soon, but right now we have higher-priority items.

@LeaCCC
Author

LeaCCC commented May 26, 2024

Hi Dmitrii, thank you for triaging this issue. I have seen the same problem reported several times by other users as well (please see the URLs below). I hope the priority can be raised in the future. Thanks, Lea

https://learn.microsoft.com/en-us/answers/questions/1179302/(error)we-stopped-hearing-from-agent-azure-pipelin?comment=question#newest-question-comment

#3994
#4313

@LeaCCC
Author

LeaCCC commented May 28, 2024

Hi Dmitrii, just a note that bug #4813 did not occur together with bug #4812 during my trials. Thanks, Lea
