# Issue/8653 fix quickstart failures #141

## Conversation
**Review comment:** Preliminary review. Reviewed all but the `do_test_deployment_and_verify` script for OSS.
**lsm-srlinux/ci/Jenkinsfile** (outdated)

```groovy
matcher = isoProductVersion =~ /dev/
if (matcher.matches()) {
    return "code.inmanta.com:4567/solutions/containers/service-orchestrator:dev"
```
**Review comment:** Wouldn't it be simpler to move this to the second check? I.e., change `\d+` to `\d*`, and check whether that part matched anything?
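A minimal Python sketch of the suggested single-pattern idea (the Jenkinsfile itself is Groovy and its second regex is not shown in this hunk, so the pattern and the versioned tag format below are assumptions):

```python
import re

# Assumed shape of the version check: with \d* instead of \d+, the group is
# allowed to match nothing, so a dev build no longer needs a separate /dev/ check.
VERSION_RE = re.compile(r"^(\d*)")

def orchestrator_image(iso_product_version: str) -> str:
    major = VERSION_RE.match(iso_product_version).group(1)
    if not major:
        # The digit part matched nothing -> treat it as a dev build.
        return "code.inmanta.com:4567/solutions/containers/service-orchestrator:dev"
    # Hypothetical tag format for released versions.
    return f"code.inmanta.com:4567/solutions/containers/service-orchestrator:{major}"
```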
**Comment:** Doc PR on core: inmanta/inmanta-core#8693
**Review comment:** I think it would be good to additionally request a review from someone on the solutions team for this file.
**Reply:** Do you have anything specific in mind to pay attention to?
**Reply:** Just whether this is a good approach to validate that everything was deployed. But looking it over once more, I don't think it necessarily has to be someone from the solutions team. I would add a second reviewer, but for the full PR rather than just this file.

**Comment:** @arnaudsjs Marking you as a second reviewer since you already worked on this issue, IIRC.
```python
    print(stdout.decode())
    print(stderr.decode())


async def check_successful_deploy(file: str, expected_resources: set[str]):
```
**Review comment:** The name of this method is not very well chosen. It does more than just checking.
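One way to act on this remark, as a sketch (the new name is only an illustration, not something taken from the PR):

```python
async def deploy_and_verify(file: str, expected_resources: set[str]) -> None:
    """Export the given model file and wait until the expected resources deploy.

    Renamed from check_successful_deploy: the function also drives the
    deployment, so a plain "check" name undersells what it does.
    """
```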
```python
subprocess.check_call(
    "sudo docker logs clab-srlinux-inmanta-server >server.log", shell=True
)
subprocess.check_call(
    "sudo docker logs clab-srlinux-postgres >postgres.log", shell=True
)
subprocess.check_call(
    "sudo docker exec -i clab-srlinux-inmanta-server sh -c cat /var/log/inmanta/resource-*.log >resource-actions.log",
    shell=True,
)
subprocess.check_call(
    "sudo docker exec -i clab-srlinux-inmanta-server sh -c cat /var/log/inmanta/agent-*.log >agents.log",
    shell=True,
)
```
**Review comment:** It's a bit weird that this file creates subprocesses using both the blocking and the non-blocking API. I think we can rely on the blocking API only. (Not necessarily a change request.)
**Review comment (same snippet):** As far as I can see, we only collect these logs if the deployment was successful, which pretty much defeats the purpose of collecting them.
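A sketch of one way to address this, assuming the log dump can be factored into a helper that runs from a `finally` block (the helper names are hypothetical):

```python
import subprocess

def collect_logs() -> None:
    # Dump the container logs into the workspace regardless of the outcome.
    subprocess.check_call(
        "sudo docker logs clab-srlinux-inmanta-server >server.log", shell=True
    )
    subprocess.check_call(
        "sudo docker logs clab-srlinux-postgres >postgres.log", shell=True
    )

def run_deployment_test(deploy) -> None:
    # deploy is a stand-in for the real deployment step.
    try:
        deploy()
    finally:
        # Runs on failure too, so the logs are available when we need them most.
        collect_logs()
```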
```python
    return set(
        [
            f"yang::GnmiResource[spine,name=global],v={version}",
            f"yang::GnmiResource[leaf2,name=global],v={version}",
            f"yang::GnmiResource[leaf1,name=global],v={version}",
            f"std::AgentConfig[internal,agentname=spine],v={version}",
            f"std::AgentConfig[internal,agentname=leaf2],v={version}",
            f"std::AgentConfig[internal,agentname=leaf1],v={version}",
        ]
    )
```
**Suggested change** (use a set literal instead of `set([...])`):

```python
    return {
        f"yang::GnmiResource[spine,name=global],v={version}",
        f"yang::GnmiResource[leaf2,name=global],v={version}",
        f"yang::GnmiResource[leaf1,name=global],v={version}",
        f"std::AgentConfig[internal,agentname=spine],v={version}",
        f"std::AgentConfig[internal,agentname=leaf2],v={version}",
        f"std::AgentConfig[internal,agentname=leaf1],v={version}",
    }
```
```python
async def done_deploying(expected_resources: set[str]) -> bool:
    result = await client.resource_list(tid=environment_id, deploy_summary=True)
```
**Suggested change** (short-circuit when nothing is expected):

```python
async def done_deploying(expected_resources: set[str]) -> bool:
    if not expected_resources:
        return True
    result = await client.resource_list(tid=environment_id, deploy_summary=True)
```
```python
process = await asyncio.subprocess.create_subprocess_exec(
    *cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
try:
    (stdout, stderr) = await asyncio.wait_for(process.communicate(), timeout=30)
except asyncio.TimeoutError as e:
    process.kill()
    (stdout, stderr) = await process.communicate()
    raise e
finally:
    print(stdout.decode())
    print(stderr.decode())
```
**Review comment:** Why first buffer all the output in memory and then only print it once the process has finished? This means that we:

- don't see the output live in the Jenkins output;
- consume more memory.
**Reply:** Refactored to use `subprocess.check_call` instead.
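For reference, a sketch of the blocking equivalent (the command is a placeholder): `check_call` inherits the parent's stdout/stderr by default, so output streams live into the Jenkins console instead of being buffered in memory first.

```python
import subprocess

cmd = ["inmanta", "export"]  # placeholder for the actual command
# Output goes straight to our stdout/stderr, so Jenkins shows it live;
# the timeout replaces the asyncio.wait_for guard and raises TimeoutExpired.
subprocess.check_call(cmd, timeout=30)
```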
```python
    print(stderr.decode())


async def done_deploying(expected_resources: set[str]) -> bool:
    result = await client.resource_list(tid=environment_id, deploy_summary=True)
```
**Review comment:** There is a race condition between the `export` command and this `resource_list()` call. The export command releases the version, but that happens asynchronously, so the `resource_list()` might be executed before the version is released. This way we might be checking an outdated state.
**Reply:** I might be missing something, but I think this is currently acceptable, since the `done_deploying` check is performed inside a `retry_limited` and the `expected_resources` parameter holds the version we want to check for? I.e.:

- we release version N;
- we keep checking until the resources for version N were correctly deployed.
**Reply:** Currently the `done_deploying` method does an assert on the `expected_resources`, which means that it raises an exception in case of failure. The `retry_limited` doesn't catch this exception, so it will crash. If we change this assertion to an if-condition that returns False on a mismatch, I agree with the above-mentioned statement.
**Reply:** Oof, nice catch.
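A sketch of the fix discussed above, assuming `retry_limited` keeps awaiting the predicate until it returns True; the shape of the `resource_list` response below is an assumption based on the `deploy_summary=True` parameter:

```python
async def done_deploying(expected_resources: set[str]) -> bool:
    result = await client.resource_list(tid=environment_id, deploy_summary=True)
    if result.code != 200:
        return False
    # Assumed response shape: one record per resource with its deploy status.
    deployed = {
        resource["resource_version_id"]
        for resource in result.result["data"]
        if resource["status"] == "deployed"
    }
    # Return False instead of asserting, so retry_limited can keep polling
    # rather than crashing on a transient mismatch.
    return expected_resources <= deployed
```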
Closes inmanta/inmanta-core#8653