Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libvirt: Enable multiple PodVM image scenario #2061

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ajaypvictor
Copy link
Contributor

Enable support for multiple PodVM images for libvirt provider with the merge of kata-containers/kata-containers#10274

Fixes #1282

Working Scenario:
Tried with the following busybox manifest:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: openshift-sandboxed-containers-operator
  labels:
    app: busybox
  annotations:
    io.katacontainers.config.hypervisor.image: "pppvm-nov-vol.qcow2"
spec:
  containers:
  - image: quay.io/prometheus/busybox
    imagePullPolicy: Always
    name: busybox-container
    ports:
    - containerPort: 80
      protocol: TCP
  restartPolicy: Always
  runtimeClassName: kata-remote

Here the pppvm-nov-vol.qcow2 is a valid libvirt volume and a podvm image is uploaded there.

CAA logs:

2024/09/26 09:20:17 [adaptor/cloud] initdata in Pod annotation:
2024/09/26 09:20:17 [adaptor/cloud] initdata in pod annotation is empty, use global initdata:
2024/09/26 09:20:17 [adaptor/cloud] create a sandbox 20a284a33e1e6222844abd4f92bda5ee7b897d752f2b3a505f4f32b5ef5eb33e for pod busybox in namespace openshift-sandboxed-containers-operator (netns: /var/run/netns/7745570f-3b24-4f0e-a623-e060ded3f3db)
2024/09/26 09:20:17 [adaptor/cloud/libvirt] LaunchSecurityType: None
2024/09/26 09:20:17 [adaptor/cloud/libvirt] Choosing pppvm-nov-vol.qcow2 as libvirt volume for the PodVM image
2024/09/26 09:20:17 [adaptor/cloud/libvirt] Checking if instance (podvm-busybox-20a284a3) exists
2024/09/26 09:20:17 [adaptor/cloud/libvirt] Uploaded volume key /var/lib/libvirt/images/kata-pod-vms/podvm-busybox-20a284a3-root.qcow2
2024/09/26 09:20:17 [adaptor/cloud/libvirt] Create cloudInit iso
2024/09/26 09:20:17 [adaptor/cloud/libvirt] Uploading iso file: podvm-busybox-20a284a3-cloudinit.iso
......
......
2024/09/26 09:21:28 [adaptor/proxy]         io.katacontainers.pkg.oci.container_type: pod_container
2024/09/26 09:21:28 [adaptor/proxy]         io.kubernetes.cri-o.Image: quay.io/prometheus/busybox@sha256:59d0ed3060aef57d1b23bc353a2223af24a6e1d035486647eb599a77ff2d446e
2024/09/26 09:21:28 [adaptor/proxy]         io.kubernetes.cri-o.Labels: {"io.kubernetes.container.name":"busybox-container","io.kubernetes.pod.name":"busybox","io.kubernetes.pod.namespace":"openshift-sandboxed-containers-operator","io.kubernetes.pod.uid":"ff0368c7-c9f1-4c2f-9c95-3a8a168c6ed8"}
2024/09/26 09:21:28 [adaptor/proxy]         io.katacontainers.config.hypervisor.image: pppvm-nov-vol.qcow2
2024/09/26 09:21:28 [adaptor/proxy]         io.kubernetes.container.restartCount: 0

Failure Scenario:
Tried with the following busybox manifest:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: openshift-sandboxed-containers-operator
  labels:
    app: busybox
  annotations:
    io.katacontainers.config.hypervisor.image: "rhel9-os"
spec:
  containers:
  - image: quay.io/prometheus/busybox
    imagePullPolicy: Always
    name: busybox-container
    ports:
    - containerPort: 80
      protocol: TCP
  restartPolicy: Always
  runtimeClassName: kata-remote

Here the rhel9-os is a dummy variable and not an actual libvirt volume.

CAA logs:

2024/09/26 09:29:09 [adaptor/cloud] create a sandbox 57bbf987d92ff21b508adddff69ef36a400f6ba7e9dfaabf13c1b54897aca0d3 for pod busybox in namespace openshift-sandboxed-containers-operator (netns: /var/run/netns/dd971e3c-d1e8-4156-8832-35ead351c3b0)
2024/09/26 09:29:09 [adaptor/cloud/libvirt] LaunchSecurityType: None
2024/09/26 09:29:09 [adaptor/cloud/libvirt] Choosing rhel9-os as libvirt volume for the PodVM image
2024/09/26 09:29:09 [adaptor/cloud/libvirt] Checking if instance (podvm-busybox-57bbf987) exists
2024/09/26 09:29:09 [adaptor/cloud/libvirt] failed to create an instance : Error in creating volume: Can't retrieve volume rhel9-os
2024/09/26 09:29:09 [adaptor/cloud] error starting instance: creating an instance : Error in creating volume: Can't retrieve volume rhel9-os
2024/09/26 09:29:09 [adaptor/proxy] shutting down socket forwarder

@ajaypvictor ajaypvictor requested a review from a team as a code owner September 26, 2024 09:33
@stevenhorsman
Copy link
Member

I think this need to be draft until #2036 which bumps us to the 3.9.0 version of the kata runtime is merged?

@ajaypvictor ajaypvictor marked this pull request as draft September 26, 2024 09:47
@ajaypvictor ajaypvictor marked this pull request as ready for review September 30, 2024 01:38
@snir911
Copy link
Contributor

snir911 commented Sep 30, 2024

I missed the kata-containers PR, do we expect other providers to use the same annotations for their podvm images? e.g. in AWS, will io.katacontainers.config.hypervisor.image point to an AMI image id?

@ajaypvictor
Copy link
Contributor Author

I missed the kata-containers PR, do we expect other providers to use the same annotations for their podvm images? e.g. in AWS, will io.katacontainers.config.hypervisor.image point to an AMI image id?

Hi Snir,
As you correctly pointed it's expected to have this annotation for other providers as well.
The discussion about this happened on the kata-dev slack channel.

@@ -79,6 +79,11 @@ func (p *libvirtProvider) CreateInstance(ctx context.Context, podName, sandboxID
}
logger.Printf("LaunchSecurityType: %s", vm.launchSecurityType.String())

if spec.Image != "" {
logger.Printf("Choosing %s as libvirt volume for the PodVM image", spec.Image)
p.libvirtClient.volName = spec.Image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ajay, I'd like to understand better how this change is propagated through if needed. My (very limited) understanding is that we set this value initially when we create the libvirt provider client, so it overriding that here sufficient to pick up this custom specified image, or is there any other code we need to skip to stop the previous version being used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Steve,
Exactly, that's what my understanding also is..

With the enhancement of CreateVM() function to support getting the image from GetImageFromAnnotation(), we are able to retrieve the same on CreateInstance() and then furthermore override the default Volume Name which is set initially once the CAA comes up for the libvirt provider as you pointed.

I guess with the CreateDomain() flow starting post overriding the volume and initially the volume being picked up from the LIBVIRT_VOL_NAME which is being assigned to libvirtcfg.VolName, we might not have anywhere else to update?

Kindly suggest as I might be missing something here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, there isn't anything I know that is missing, I just wanted to check my understanding.

I'm assuming that you've tested this locally. Is there a way to integrate this into our e2e tests for libvirt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Steve..

I have posted the test results (CAA logs) for these changes on the PR description.

I will start exploring the options of adding e2e tests for this..

@ajaypvictor ajaypvictor force-pushed the multiple-podvm-images branch 2 times, most recently from 077b833 to 4f00cce Compare October 3, 2024 15:37
@ajaypvictor
Copy link
Contributor Author

e2e test results:

test:~/cloud-api-adaptor/src/cloud-api-adaptor# RUN_TESTS=TestLibvirtCreatePeerPodContainerWithAlternateImage TEST_PROVISION=no TEST_INSTALL_CAA=no make TEST_PROVISION_FILE=/root/cloud-api-adaptor/src/cloud-api-adaptor/libvirt.properties CLOUD_PROVIDER=libvirt test-e2e
go test -v -tags=libvirt -timeout 60m -count=1 -run TestLibvirtCreatePeerPodContainerWithAlternateImage ./test/e2e
time="2024-10-03T15:36:01Z" level=info msg="Do setup"
time="2024-10-03T15:36:01Z" level=info msg="Container runtime: containerd"
time="2024-10-03T15:36:01Z" level=info msg="Creating namespace 'coco-pp-e2e-test-5ef29963'..."
time="2024-10-03T15:36:01Z" level=info msg="Wait for namespace 'coco-pp-e2e-test-5ef29963' be ready..."
time="2024-10-03T15:36:06Z" level=info msg="Wait for default serviceaccount in namespace 'coco-pp-e2e-test-5ef29963'..."
time="2024-10-03T15:36:06Z" level=info msg="default serviceAccount exists, namespace 'coco-pp-e2e-test-5ef29963' is ready for use"
=== RUN   TestLibvirtCreatePeerPodContainerWithAlternateImage
=== RUN   TestLibvirtCreatePeerPodContainerWithAlternateImage/PodVMwithAnnotationsAlternateImage_test
=== RUN   TestLibvirtCreatePeerPodContainerWithAlternateImage/PodVMwithAnnotationsAlternateImage_test/Failed_to_Create_PodVM_with_a_non-existent_image
=== NAME  TestLibvirtCreatePeerPodContainerWithAlternateImage/PodVMwithAnnotationsAlternateImage_test
    assessment_runner.go:617: Deleting pod annotations-instance-type...
    assessment_runner.go:624: Pod annotations-instance-type has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePeerPodContainerWithAlternateImage (10.03s)
    --- PASS: TestLibvirtCreatePeerPodContainerWithAlternateImage/PodVMwithAnnotationsAlternateImage_test (10.03s)
        --- PASS: TestLibvirtCreatePeerPodContainerWithAlternateImage/PodVMwithAnnotationsAlternateImage_test/Failed_to_Create_PodVM_with_a_non-existent_image (0.00s)
PASS
time="2024-10-03T15:36:16Z" level=info msg="Deleting namespace 'coco-pp-e2e-test-5ef29963'..."
time="2024-10-03T15:36:26Z" level=info msg="Namespace 'coco-pp-e2e-test-5ef29963' has been successfully deleted within 60s"
ok  	github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e	25.107s

@@ -406,6 +406,32 @@ func DoTestPodVMwithAnnotationsLargerCPU(t *testing.T, e env.Environment, assert
NewTestCase(t, e, "PodVMwithAnnotationsLargerCPU", assert, "Failed to Create PodVM with Annotations Larger CPU").WithPod(pod).WithInstanceTypes(testInstanceTypes).WithCustomPodState(v1.PodPending).Run()
}

func DoTestCreatePeerPodContainerWithAlternateImage(t *testing.T, e env.Environment, assert CloudAssert) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the negative test is a good start to check that the code path works. I'm wondering if we could use the provisioner.UploadPodvm logic to upload another podvmImage if a test image is provided (I guess the same image with a different name might be the best we can hope for) and then switch the annotation to try using that. I guess the problem then becomes how we check that the alternate podvm is being used might be tricky?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's true.. but if checking the CAA logs and ensuring that the PodVM picked up is the alternate one as per the annotation sounds good, then I guess we can try that option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - the CAA logs would be a good place to check - I didn't think about that. I'd say it's probably worth spending a bit of time to investigate this approach at least then

Enable support for multiple PodVM images for libvirt provider with the merge of kata-containers/kata-containers#10274

Fixes confidential-containers#1282

Signed-off-by: Ajay Victor <[email protected]>
@ajaypvictor
Copy link
Contributor Author

Latest e2e test results:

~/cloud-api-adaptor/src/cloud-api-adaptor# RUN_TESTS="^TestLibvirtCreatePeerPodContainerWith\(Valid\|Invalid\)AlternateImage$" TEST_PROVISION=no TEST_INSTALL_CAA=no make TEST_PROVISION_FILE=/root/cloud-api-adaptor/src/cloud-api-adaptor/libvirt.properties CLOUD_PROVIDER=libvirt test-e2e
go test -v -tags=libvirt -timeout 60m -count=1 -run ^TestLibvirtCreatePeerPodContainerWith\(Valid\|Invalid\)AlternateImage$ ./test/e2e
time="2024-10-08T09:35:56Z" level=info msg="Do setup"
time="2024-10-08T09:35:56Z" level=info msg="Container runtime: containerd"
time="2024-10-08T09:35:56Z" level=info msg="Creating namespace 'coco-pp-e2e-test-0e4be14f'..."
time="2024-10-08T09:35:56Z" level=info msg="Wait for namespace 'coco-pp-e2e-test-0e4be14f' be ready..."
time="2024-10-08T09:36:01Z" level=info msg="Wait for default serviceaccount in namespace 'coco-pp-e2e-test-0e4be14f'..."
time="2024-10-08T09:36:01Z" level=info msg="default serviceAccount exists, namespace 'coco-pp-e2e-test-0e4be14f' is ready for use"
=== RUN   TestLibvirtCreatePeerPodContainerWithValidAlternateImage
=== RUN   TestLibvirtCreatePeerPodContainerWithValidAlternateImage/PodVMwithAnnotationsValidAlternateImage_test
    assessment_runner.go:270: Waiting for containers in pod: annotations-valid-alternate-image are ready
=== RUN   TestLibvirtCreatePeerPodContainerWithValidAlternateImage/PodVMwithAnnotationsValidAlternateImage_test/PodVM_created_with_an_alternate_image
    assessment_runner.go:569: PodVM was brought up using the alternate PodVM image another-podvm-base.qcow2
=== NAME  TestLibvirtCreatePeerPodContainerWithValidAlternateImage/PodVMwithAnnotationsValidAlternateImage_test
    assessment_runner.go:650: Deleting pod annotations-valid-alternate-image...
    assessment_runner.go:657: Pod annotations-valid-alternate-image has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePeerPodContainerWithValidAlternateImage (95.05s)
    --- PASS: TestLibvirtCreatePeerPodContainerWithValidAlternateImage/PodVMwithAnnotationsValidAlternateImage_test (95.05s)
        --- PASS: TestLibvirtCreatePeerPodContainerWithValidAlternateImage/PodVMwithAnnotationsValidAlternateImage_test/PodVM_created_with_an_alternate_image (0.03s)
=== RUN   TestLibvirtCreatePeerPodContainerWithInvalidAlternateImage
=== RUN   TestLibvirtCreatePeerPodContainerWithInvalidAlternateImage/PodVMwithAnnotationsInvalidAlternateImage_test
=== RUN   TestLibvirtCreatePeerPodContainerWithInvalidAlternateImage/PodVMwithAnnotationsInvalidAlternateImage_test/Failed_to_Create_PodVM_with_a_non-existent_image
=== NAME  TestLibvirtCreatePeerPodContainerWithInvalidAlternateImage/PodVMwithAnnotationsInvalidAlternateImage_test
    assessment_runner.go:650: Deleting pod annotations-invalid-alternate-image...
    assessment_runner.go:657: Pod annotations-invalid-alternate-image has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePeerPodContainerWithInvalidAlternateImage (10.03s)
    --- PASS: TestLibvirtCreatePeerPodContainerWithInvalidAlternateImage/PodVMwithAnnotationsInvalidAlternateImage_test (10.03s)
        --- PASS: TestLibvirtCreatePeerPodContainerWithInvalidAlternateImage/PodVMwithAnnotationsInvalidAlternateImage_test/Failed_to_Create_PodVM_with_a_non-existent_image (0.00s)
PASS
time="2024-10-08T09:37:47Z" level=info msg="Deleting namespace 'coco-pp-e2e-test-0e4be14f'..."
time="2024-10-08T09:37:57Z" level=info msg="Namespace 'coco-pp-e2e-test-0e4be14f' has been successfully deleted within 60s"
ok  	github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e	120.159s

Copy link
Member

@stevenhorsman stevenhorsman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few refactor suggestions, but otherwise it looks good. Thanks!

Comment on lines 556 to 579
var podlist v1.PodList
var podLogString string
expectedSuccessMessage := "Choosing " + tc.alternateImageName
if err := client.Resources("confidential-containers-system").List(ctx, &podlist); err != nil {
t.Fatal(err)
}
for _, pod := range podlist.Items {
if pod.Labels["app"] == "cloud-api-adaptor" {
podLogString, _ = GetPodLog(ctx, client, pod)
break
}
}
if strings.Contains(podLogString, expectedSuccessMessage) {
t.Logf("PodVM was brought up using the alternate PodVM image %s", tc.alternateImageName)
} else {
t.Logf("cloud-api-adaptor pod logs: %s", podLogString)
yamlData, err := yaml.Marshal(tc.pod.Status)
if err != nil {
log.Errorf("Error marshaling pod.Status to JSON: %s", err.Error())
} else {
t.Logf("Current Pod State: %v", string(yamlData))
}
t.Fatal("failed due to unknown reason")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional request: Could we move this section to assessment_helper.go as a new function? I'm trying to reduce how massive the assessment_runner has got with this strategy in some WIP commits I'm working on on the side.

@@ -68,6 +69,9 @@ func NewLibvirtProvisioner(properties map[string]string) (pv.CloudProvisioner, e
vol_name = properties["libvirt_vol_name"]
}

// alternate_vol_name is used for multi-podvm testing.
alternate_vol_name := "another-podvm-base.qcow2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a possibility to extract this as a constant, so we can reference it in libvirt_test rather than having to copy the value there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I tried to do this way, but unfortunately common.go inside the e2e folder is not extending to this provision_common.go, also thought about adding it to common.go under provisioner, which is again not referenced on libvirt_test/common_suite under e2e. if it sounds fair to import, i can try this out..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. One approach we could take (which I'm not sure is even good) is create the const in libvirt/provision_common.go and then we can reference it in libvirt_test.go and pass it in as an argument to the common_test.go version? Otherwise I think we can leave it for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Steve, this sounds like an idea, will test it out..

@@ -135,6 +136,11 @@ func (tc *TestCase) WithInstanceTypes(testInstanceTypes InstanceValidatorFunctio
return tc
}

func (tc *TestCase) WithMultiplePodVM(alternateImageName string) *TestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if WithMultiplePodVM is the best name here? Maybe something like WithAlternateImage would be clearer?

Negative test for multiple PodVM images on libvirt provider with a non existing volume

Fixes confidential-containers#1282

Signed-off-by: Ajay Victor <[email protected]>
@ajaypvictor
Copy link
Contributor Author

ajaypvictor commented Oct 9, 2024

@stevenhorsman Would it be fair to add test_e2e_libvirt label now, if we are good to test the PR via the pipeline?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow modifying the VM image used for the podvm instance
3 participants