Issue/8653 fix quickstart failures #141

Hugo-Inmanta · 2025-01-20T14:04:57Z

Closes inmanta/inmanta-core#8653

This reverts commit 64993d4.

This reverts commit 338e59b.

This reverts commit dec3585.

This reverts commit 86b1da5.

This reverts commit d7fda7a.

This reverts commit 75d0094.

sanderr

Preliminary review. Reviewed all but the do_test_deployment_and_verify script for OSS.

Networking/SR Linux/ci/Jenkinsfile

Networking/SR Linux/ci/do_test_deployment_and_verify.py

Networking/SR Linux/ci/Jenkinsfile

Networking/SR Linux/containerlab/topology.yml

sanderr · 2025-01-24T10:50:37Z

lsm-srlinux/ci/Jenkinsfile

+    matcher = isoProductVersion =~ /dev/
+    if (matcher.matches()) {
+        return "code.inmanta.com:4567/solutions/containers/service-orchestrator:dev"


Wouldn't it be simpler to move this to the second check? i.e. change \d+ to \d*, and check if that part matched anything?

Hugo-Inmanta · 2025-01-30T13:24:35Z

doc pr on core: inmanta/inmanta-core#8693

Networking/SR Linux/ci/do_test_deployment_and_verify.py

sanderr · 2025-01-31T13:59:37Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

I think it would be good to additionally request review from someone of the solutions team for this file.

Do you have anything specific in mind to pay attention to?

Just whether this is a good approach to validate that everything was deployed. But looking it over once more, I don't think it necessarily has to be someone from the solutions team. I would add a second reviewer, but rather for the full PR, not just this file.

Hugo-Inmanta · 2025-02-03T13:38:42Z

@arnaudsjs Marking you as a second reviewer since you already worked on this issue iirc

Networking/SR Linux/ci/Jenkinsfile

arnaudsjs · 2025-02-03T14:07:01Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

+            print(stdout.decode())
+            print(stderr.decode())
+
+    async def check_successful_deploy(file: str, expected_resources: set[str]):


The name of this method is not very well chosen. It does more than just checking.

arnaudsjs · 2025-02-03T14:12:28Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

+    subprocess.check_call(
+        "sudo docker logs clab-srlinux-inmanta-server >server.log", shell=True
+    )
+    subprocess.check_call(
+        "sudo docker logs clab-srlinux-postgres >postgres.log", shell=True
+    )
+    subprocess.check_call(
+        "sudo docker exec -i clab-srlinux-inmanta-server sh -c cat /var/log/inmanta/resource-*.log >resource-actions.log",
+        shell=True,
+    )
+    subprocess.check_call(
+        "sudo docker exec -i clab-srlinux-inmanta-server sh -c cat /var/log/inmanta/agent-*.log >agents.log",
+        shell=True,
+    )


It's a bit weird that this file creates sub-processes using both the blocking and the non-blocking API. I think we can rely on the blocking API only. (Not necessarily a change request)

arnaudsjs · 2025-02-03T14:17:22Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

+    subprocess.check_call(
+        "sudo docker logs clab-srlinux-inmanta-server >server.log", shell=True
+    )
+    subprocess.check_call(
+        "sudo docker logs clab-srlinux-postgres >postgres.log", shell=True
+    )
+    subprocess.check_call(
+        "sudo docker exec -i clab-srlinux-inmanta-server sh -c cat /var/log/inmanta/resource-*.log >resource-actions.log",
+        shell=True,
+    )
+    subprocess.check_call(
+        "sudo docker exec -i clab-srlinux-inmanta-server sh -c cat /var/log/inmanta/agent-*.log >agents.log",
+        shell=True,
+    )


As far as I can see we only collect these logs if the deployment was successful, which pretty much defeats the purpose of collecting them.
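A minimal sketch of one way to address this, assuming the deployment step and the log collection can each be wrapped in a callable. The names `run_with_log_collection`, `deploy` and `collect_logs` are hypothetical and do not come from the script under review. The point is only that a `finally` block runs the collection on both the success and the failure path.

```python
import sys
from typing import Callable


def run_with_log_collection(deploy: Callable[[], None], collect_logs: Callable[[], None]) -> None:
    """Run `deploy`, then always run `collect_logs`, even when `deploy` raised."""
    try:
        deploy()
    finally:
        # Executed on success and on failure, so the logs of a failed
        # deployment are archived as well.
        try:
            collect_logs()
        except Exception as exc:
            # Design choice of this sketch: a failing log collection is only
            # reported, so it cannot mask the original deployment error.
            print(f"log collection failed: {exc}", file=sys.stderr)
```

If `deploy` raises, the exception still propagates to the caller after the logs have been collected, so the CI job keeps failing as before.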

arnaudsjs · 2025-02-03T14:18:48Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

+        return set(
+            [
+                f"yang::GnmiResource[spine,name=global],v={version}",
+                f"yang::GnmiResource[leaf2,name=global],v={version}",
+                f"yang::GnmiResource[leaf1,name=global],v={version}",
+                f"std::AgentConfig[internal,agentname=spine],v={version}",
+                f"std::AgentConfig[internal,agentname=leaf2],v={version}",
+                f"std::AgentConfig[internal,agentname=leaf1],v={version}",
+            ]
+        )


Suggested change

-        return set(
-            [
-                f"yang::GnmiResource[spine,name=global],v={version}",
-                f"yang::GnmiResource[leaf2,name=global],v={version}",
-                f"yang::GnmiResource[leaf1,name=global],v={version}",
-                f"std::AgentConfig[internal,agentname=spine],v={version}",
-                f"std::AgentConfig[internal,agentname=leaf2],v={version}",
-                f"std::AgentConfig[internal,agentname=leaf1],v={version}",
-            ]
-        )
+        return {
+            f"yang::GnmiResource[spine,name=global],v={version}",
+            f"yang::GnmiResource[leaf2,name=global],v={version}",
+            f"yang::GnmiResource[leaf1,name=global],v={version}",
+            f"std::AgentConfig[internal,agentname=spine],v={version}",
+            f"std::AgentConfig[internal,agentname=leaf2],v={version}",
+            f"std::AgentConfig[internal,agentname=leaf1],v={version}",
+        }
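For illustration, a short check that the two spellings build the same set. The value of `version` is made up for this example, and only two of the six resource ids are repeated to keep it short.

```python
version = 3  # hypothetical version number, for illustration only

# Original spelling: build a list first, then convert it to a set.
from_list = set(
    [
        f"yang::GnmiResource[spine,name=global],v={version}",
        f"std::AgentConfig[internal,agentname=spine],v={version}",
    ]
)

# Suggested spelling: a set literal, without the intermediate list.
literal = {
    f"yang::GnmiResource[spine,name=global],v={version}",
    f"std::AgentConfig[internal,agentname=spine],v={version}",
}

assert from_list == literal
assert isinstance(literal, set)
```

One caveat with the literal form: empty braces create a `dict`, so an empty set still has to be written as `set()`.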

arnaudsjs · 2025-02-03T14:20:31Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

+        async def done_deploying(expected_resources: set[str]) -> bool:
+            result = await client.resource_list(tid=environment_id, deploy_summary=True)


Suggested change

-        async def done_deploying(expected_resources: set[str]) -> bool:
-            result = await client.resource_list(tid=environment_id, deploy_summary=True)
+        async def done_deploying(expected_resources: set[str]) -> bool:
+            if not expected_resources:
+                return True
+            result = await client.resource_list(tid=environment_id, deploy_summary=True)

arnaudsjs · 2025-02-03T14:24:01Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

+        process = await asyncio.subprocess.create_subprocess_exec(
+            *cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
+        )
+        try:
+            (stdout, stderr) = await asyncio.wait_for(process.communicate(), timeout=30)
+        except asyncio.TimeoutError as e:
+            process.kill()
+            (stdout, stderr) = await process.communicate()
+            raise e
+        finally:
+            print(stdout.decode())
+            print(stderr.decode())


Why first buffer all output in memory and only print it once the process has finished? This means that:

1. We don't see the output live in the Jenkins output.

2. It is heavier on memory consumption.

Refactored to use subprocess.check_call instead
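A rough sketch of what such a blocking call can look like. It is not the code from the PR, and `run_step` is a hypothetical helper name. Because no `stdout=` or `stderr=` argument is passed, the child inherits the parent's streams, so its output is not accumulated in this process.

```python
import subprocess
import sys


def run_step(cmd: list[str], timeout: float = 30) -> None:
    """Run `cmd` and let its output flow straight to our own stdout/stderr."""
    # check_call raises CalledProcessError on a non-zero exit code and
    # TimeoutExpired if the timeout expires (the child is killed in that case).
    subprocess.check_call(cmd, timeout=timeout)


if __name__ == "__main__":
    run_step([sys.executable, "-c", "print('hello from the child')"])
```

The 30 second default mirrors the timeout of the original `asyncio.wait_for` call. The explicit `process.kill()` from the async version is no longer needed, since `check_call` kills the child itself on timeout.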

arnaudsjs · 2025-02-03T15:04:48Z

Networking/SR Linux/ci/do_test_deployment_and_verify.py

+            print(stderr.decode())
+
+        async def done_deploying(expected_resources: set[str]) -> bool:
+            result = await client.resource_list(tid=environment_id, deploy_summary=True)


There is a race condition between the export command and this resource_list() command. The export command releases the version, but this happens asynchronously, so the resource_list() might be executed before the version is released. This way we might be checking an outdated state.

I might be missing something, but I think this is currently acceptable, since the done_deploying check is performed inside a retry_limited and the expected_resources parameter holds the version we want to check for? I.e.:

1. We release version N.

2. We keep checking until the resources for version N are correctly deployed.

Currently the done_deploying method does an assert on the expected_resources, which means that it raises an exception in case of failure. The retry_limited doesn't catch this exception so it will crash. If we change this assertion to an if-condition that returns False in case of a mismatch, I agree with the above-mentioned statement.

Oof nice catch
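The conclusion of this thread can be sketched as follows. This is a self-contained illustration under stated assumptions, not the script's actual code: `retry_until` is a hypothetical stand-in for retry_limited, `fetch_deployed` is a hypothetical callable replacing the `client.resource_list()` call, and the resource ids are made up. The check returns False on a mismatch instead of asserting, so an outdated view simply leads to another poll.

```python
import asyncio
from typing import Awaitable, Callable


async def retry_until(check: Callable[[], Awaitable[bool]], timeout: float, interval: float = 0.1) -> None:
    """Poll `check` until it returns True; raise TimeoutError after `timeout` seconds."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not await check():
        if loop.time() >= deadline:
            raise TimeoutError("condition not met in time")
        await asyncio.sleep(interval)


def make_done_deploying(
    fetch_deployed: Callable[[], Awaitable[set[str]]], expected_resources: set[str]
) -> Callable[[], Awaitable[bool]]:
    async def done_deploying() -> bool:
        if not expected_resources:
            return True  # nothing to wait for
        deployed = await fetch_deployed()
        # Return False rather than assert: if the new version is not released
        # yet, the retry loop just polls again instead of crashing.
        return expected_resources <= deployed

    return done_deploying


async def main() -> int:
    # Fake server: the first two polls still report version 1, the third reports version 2.
    responses = [{"res[a],v=1"}, {"res[a],v=1"}, {"res[a],v=2"}]
    calls = 0

    async def fake_fetch() -> set[str]:
        nonlocal calls
        calls += 1
        return responses[min(calls, len(responses)) - 1]

    await retry_until(make_done_deploying(fake_fetch, {"res[a],v=2"}), timeout=2, interval=0.01)
    return calls


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the expected ids carry the version number, resources deployed for an older version never satisfy the check. A genuinely failed deployment then surfaces as a timeout rather than as an assertion error on the first poll.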

sanderr and others added 30 commits January 17, 2025 12:27

release hacks

0502806

default

2442ec8

fix

d209614

no sudo

0d31ffe

fixes

64993d4

Revert "fixes"

2228169

This reverts commit 64993d4.

removed outdated pip url

338e59b

Revert "removed outdated pip url"

ebbef80

This reverts commit 338e59b.

setup.sh

a43f622

async fix

3fb0435

OSS fixes

c5f8fdf

attempt

dec3585

Revert "attempt"

86b1da5

This reverts commit dec3585.

Reapply "attempt"

d7fda7a

This reverts commit 86b1da5.

Revert "Reapply "attempt""

75d0094

This reverts commit d7fda7a.

Reapply "Reapply "attempt""

d616ff0

This reverts commit 75d0094.

wait

2cf773b

wip

ce14f99

wip

77e2640

wip

3eef27b

wip

6c04519

wip

77fe8a1

wip

9df5444

wip

5fef7b9

wip

49a048d

wip

74e3771

wip

a0c8155

wip

f8fd8aa

wip

19607f9

wip

5248dd3

sanderr requested changes Jan 24, 2025

sanderr self-requested a review January 24, 2025 10:57

sanderr mentioned this pull request Jan 24, 2025

Release hacks #139

Closed

Hugo-Inmanta added 2 commits January 30, 2025 14:44

remove code dir binding from topology file

601368d

refactor dev version logic

650b6a7

sanderr requested changes Jan 31, 2025

Hugo-Inmanta added 13 commits January 31, 2025 15:05

wip

2062d8f

remove redundant workdir argument

64f2fed

wip

b7e514b

wip

ee0c4bd

wip

31cc1c4

wip

0499788

wip

4b56cd3

wip

d527eea

add log archiving back

04f4ac7

remove setup.sh

341458f

wip

402ae54

wip

c8fb9dc

wip

12f97eb

sanderr approved these changes Feb 3, 2025

Hugo-Inmanta requested a review from arnaudsjs February 3, 2025 13:37

arnaudsjs requested changes Feb 4, 2025

Hugo-Inmanta added 3 commits February 4, 2025 09:54

wip

051719d

fix brittle logic

545d00c

revert unbuffered option removal

0ac01c7

Hugo-Inmanta requested a review from arnaudsjs February 4, 2025 09:20

arnaudsjs approved these changes Feb 4, 2025

Hugo-Inmanta merged commit f66765b into master Feb 4, 2025
1 check passed
