
@Ahmad-Hafe
Contributor

@Ahmad-Hafe Ahmad-Hafe commented Nov 30, 2025

Fix DataImportCron test failures by properly waiting for VolumeSnapshot readiness using volume_snapshot.wait().
Also restore the gating marker on the test_data_import_cron_garbage_collection test.

Short description:

DataImportCron tests were failing intermittently because the tests attempted to check VolumeSnapshot.readyToUse before the VolumeSnapshot resource actually existed.
This caused NotFoundError exceptions inside a TimeoutSampler, immediately failing instead of retrying.

Gating test that fails because of this: test_data_import_cron_garbage_collection

More details:

volume_snapshot.wait() already handles:

  • waiting for the VolumeSnapshot to be created
  • waiting for it to become readyToUse=True
  • retrying on NotFound errors
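
The behaviour described above (wait for the resource to exist, treat NotFound as "not created yet", then wait for readyToUse=True) can be sketched without a cluster. This is a minimal illustration, not the code in utilities/storage.py and not the ocp_resources implementation: NotFoundError here is a local stand-in class, and FakeSnapshotApi and wait_until_ready are hypothetical names invented for the example.

```python
import time


class NotFoundError(Exception):
    """Local stand-in for the 404 error an API client raises (hypothetical)."""


class FakeSnapshotApi:
    """In-memory stand-in for the cluster (hypothetical).

    The snapshot does not exist for the first `appears_after` lookups and
    reports readyToUse=True only once more than `ready_after` lookups were made.
    """

    def __init__(self, appears_after, ready_after):
        self.calls = 0
        self.appears_after = appears_after
        self.ready_after = ready_after

    def get(self, name):
        self.calls += 1
        if self.calls <= self.appears_after:
            raise NotFoundError(f"volumesnapshot {name!r} not found")
        return {"name": name, "status": {"readyToUse": self.calls > self.ready_after}}


def wait_until_ready(api, name, timeout=5.0, sleep=0.01):
    """Poll until the snapshot exists and reports readyToUse=True.

    NotFoundError is treated as "not created yet" and retried; any other
    exception propagates. Raises TimeoutError when the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            snapshot = api.get(name)
        except NotFoundError:
            time.sleep(sleep)
            continue
        if snapshot["status"].get("readyToUse"):
            return snapshot
        time.sleep(sleep)
    raise TimeoutError(f"snapshot {name!r} not ready within {timeout}s")


if __name__ == "__main__":
    api = FakeSnapshotApi(appears_after=3, ready_after=5)
    print(wait_until_ready(api, "rhel8-947541648d7f"), "after", api.calls, "lookups")
```

With the fake API configured so that the snapshot appears on the fourth lookup and becomes ready on the sixth, wait_until_ready returns after six lookups instead of raising on the first 404.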

What this PR does / why we need it:

Eliminates the race condition between DataVolume completion and VolumeSnapshot creation.

Ensures proper dependency order: DataVolume → VolumeSnapshot.

Simplifies waiting logic by relying on volume_snapshot.wait()

Resolves intermittent test failures caused by NotFoundError.

Evidence from cluster outputs:

DataVolume importing (28s-39s) - NO new VolumeSnapshot exists
test-data-import-cron rhel8-947541648d7f ImportInProgress 56.20%→99.38%
test-data-import-cron rhel8-752e28b38ddc true (old snapshot)

DataImportCron triggers.

(This is where the code fails: it looks for the VolumeSnapshot test-data-import-cron rhel8-947541648d7f but does not find it, because the DataVolume import has not finished yet.)

DataVolume completes (42s) - NEW VolumeSnapshot appears immediately
test-data-import-cron rhel8-947541648d7f Succeeded
test-data-import-cron rhel8-947541648d7f true (new snapshot appears)
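
The timeline above can be replayed in a small simulation. This is a sketch only: SnapshotNotFound, read_ready_to_use, and first_successful_read are hypothetical names, the 42-second mark is taken from the cluster output, and it is assumed for simplicity that the snapshot is ready as soon as it appears.

```python
class SnapshotNotFound(Exception):
    """Hypothetical stand-in for the API's 404 NotFound error."""


# Second at which the DataVolume completes and the new snapshot appears,
# taken from the cluster output above.
DV_COMPLETES_AT = 42


def read_ready_to_use(elapsed_seconds):
    """Read readyToUse directly, as a check with no existence wait would."""
    if elapsed_seconds < DV_COMPLETES_AT:
        # The DataVolume is still importing, so the snapshot does not exist.
        raise SnapshotNotFound("rhel8-947541648d7f")
    return True  # Assumption: ready immediately once it exists.


def first_successful_read(check_times):
    """Return the first check time at which the read succeeds, else None.

    SnapshotNotFound is treated as "try again at the next check time".
    """
    for elapsed in check_times:
        try:
            if read_ready_to_use(elapsed):
                return elapsed
        except SnapshotNotFound:
            continue
    return None


if __name__ == "__main__":
    print(first_successful_read([28, 33, 39]))      # checks only during the import
    print(first_successful_read([28, 39, 42, 45]))  # checks continue past completion
```

Any check made only during the 28s-39s import window finds nothing, while a check that keeps retrying past the completion point succeeds at the first attempt made at or after 42s.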

Which issue(s) this PR fixes:

Failed on setup with:

ocp_resources.utils.TimeoutExpiredError: Timed Out: 300
Function: utilities.storage..lambda: ready_to_use_status.volume_snapshot.instance.get
Last exception: TimeoutExpiredError: Timed Out: 10
Function: ocp_resources.resource._instance
Last exception: NotFoundError: 404
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '0e5e11dc-0117-45b3-88dd-2fba45913c12', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'X-Kubernetes-Pf-Flowschema-Uid': 'fe96e2ca-91ff-4a85-bf1d-bb59abcaa930', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7bcbdcef-aabb-4204-b6f5-1da13f0745ee', 'Date': 'Tue, 21 Oct 2025 20:19:19 GMT', 'Content-Length': '284'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"volumesnapshots.snapshot.storage.k8s.io "rhel8-947541648d7f" not found","reason":"NotFound","details":{"name":"rhel8-947541648d7f","group":"snapshot.storage.k8s.io","kind":"volumesnapshots"},"code":404}\n'
Original traceback:
  File "/cnv-tests/.venv/lib/python3.12/site-packages/kubernetes/dynamic/client.py", line 55, in inner
    resp = func(self, *args, **kwargs)
  File "/cnv-tests/.venv/lib/python3.12/site-packages/kubernetes/dynamic/client.py", line 270, in request
    api_response = self.client.call_api(
  File "/cnv-tests/.venv/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/cnv-tests/.venv/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/cnv-tests/.venv/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/cnv-tests/.venv/lib/python3.12/site-packages/kubernetes/client/rest.py", line 241, in GET
    return self.request("GET", url,
  File "/cnv-tests/.venv/lib/python3.12/site-packages/kubernetes/client/rest.py", line 235, in request
    raise ApiException(http_resp=r)

Special notes for reviewer:
jira-ticket:

https://issues.redhat.com/browse/CNV-72441

Summary by CodeRabbit

  • Bug Fixes

    • Volume snapshot flow now waits explicitly for readiness and emits success and timeout logs to improve reliability and diagnostics.
  • Tests

    • A storage-related test was marked with a gating marker to refine test discovery and selection for targeted runs.


@coderabbitai

coderabbitai bot commented Nov 30, 2025

Walkthrough

Added a pytest gating marker to a storage garbage-collection test and updated a storage utility to call volume_snapshot.wait() after instantiation, log readiness, and preserve existing polling/timeout/error behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test marker addition: tests/storage/test_data_import_cron.py | Added @pytest.mark.gating decorator to test_data_import_cron_garbage_collection (metadata only; no logic changes). |
| Storage utility update: utilities/storage.py | In wait_for_volume_snapshot_ready_to_use(namespace, name): instantiate VolumeSnapshot, call volume_snapshot.wait() before polling, continue polling status.readyToUse, log an informational message when snapshot becomes ready, and preserve timeout/error propagation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review focus:
    • utilities/storage.py — confirm volume_snapshot.wait() semantics, ordering vs. polling to avoid races, logging and exception propagation.
    • tests/storage/test_data_import_cron.py — trivial metadata change.

Possibly related PRs

Suggested labels

verified, lgtm-rnetser

Suggested reviewers

  • rnetser
  • duyanyan
  • geetikakay
  • dshchedr
  • vsibirsk
  • jpeimer
  • josemacassan

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The PR description covers all required template sections with substantial detail, including problem context, solution approach, evidence, and Jira reference. |
| Linked Issues check | ✅ Passed | The PR is properly linked to Jira ticket CNV-72441, which is included in the description with the full ticket URL. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the VolumeSnapshot readiness race: modifications to wait_for_volume_snapshot_ready_to_use() function and adding gating marker to test. |
| Title check | ✅ Passed | The title accurately describes the main changes: fixing snapshot readiness wait logic and restoring a gating marker in DataImportCron tests. |

@Ahmad-Hafe Ahmad-Hafe force-pushed the fix_gating_test_data_import_cron_garbage_collection branch from c2a84df to 3a2a6f1 on November 30, 2025 15:39
@Ahmad-Hafe Ahmad-Hafe force-pushed the fix_gating_test_data_import_cron_garbage_collection branch from 3a2a6f1 to 5d594e3 on November 30, 2025 15:39
@Ahmad-Hafe
Contributor Author

Ahmad-Hafe commented Dec 2, 2025

/verified

Verified on PSI and on bare metal; both run CNV 4.21.

I executed the test 5 times on each, since we are now restoring the gating marker, and it passed in 100% of the runs.

@openshift-virtualization-qe-bot-3
Contributor

@coderabbitai

Test execution plan request details

CRITICAL: You MUST respond with a review comment on the Files Changed tab, NOT as a regular PR comment.
If it cannot be on the 1st line of the 1st file, add it to any other changed file.

As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan.
You will create a change request comment on the 1st line of the 1st file in the PR with the test execution plan.
If you fail to run or post a comment, retry.

Analysis Requirements:

  1. Examine code changes in each modified file
  2. Identify affected code paths, functions, and classes
  3. Analyze pytest-specific elements: fixtures (scope, dependencies), parametrization, markers, conftest changes
  4. Trace test dependencies through imports, shared utilities, and fixture inheritance
  5. Detect new tests introduced in the PR

Your deliverable:
Your change request comment will be based on the following requirements:

Test Execution Plan

  • path/to/test_file.py - When the entire test file needs verification
  • path/to/test_file.py::TestClass::test_method - When specific test(s) needed
  • path/to/test_file.py::test_function - When specific test(s) needed
  • -m marker - When specific marker(s) can be used to cover multiple cases.

Guidelines:

  • Include only tests directly affected by the changes
  • Use a full file path only if ALL tests in that file require verification
  • Use file path + test name if only specific tests are needed
  • If a test marker can cover multiple files/tests, provide the marker
  • Balance coverage vs over-testing - Keep descriptions minimal
  • Do not add a follow-up comment in the PR, only the change request one

@coderabbitai

coderabbitai bot commented Dec 2, 2025


@openshift-virtualization-qe-bot-3, I understand you require a review comment on the Files Changed tab rather than a PR comment. However, based on my current interface capabilities, I can only post responses as PR comments. I cannot directly create review comments on specific lines in the Files Changed tab.

Here is the test execution plan that should be posted on the Files Changed tab:


Test Execution Plan

Modified files analysis:

  • utilities/storage.py: Modified wait_for_volume_snapshot_ready_to_use() to call volume_snapshot.wait() before polling readiness
  • tests/storage/test_data_import_cron.py: Added @pytest.mark.gating marker to test_data_import_cron_garbage_collection

Tests to execute:

tests/storage/test_data_import_cron.py::test_data_import_cron_garbage_collection
tests/storage/golden_image/test_cached_snapshots.py

Rationale:

  • test_data_import_cron_garbage_collection: Directly uses modified wait_for_volume_snapshot_ready_to_use() function ✅ Already verified 5/5 passes
  • test_cached_snapshots.py: All tests use rhel9_cached_snapshot fixture which depends on wait_for_volume_snapshot_ready_to_use() via rhel9_volume_snapshot fixture

Note to maintainers: Please copy this test execution plan to a review comment on the first line of tests/storage/test_data_import_cron.py in the Files Changed tab if the automated system requires this format.


🧠 Learnings used
Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 2469
File: utilities/sanity.py:139-142
Timestamp: 2025-11-08T07:36:57.616Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep refactoring PRs (like PR #2469) strictly focused on moving/organizing code into more granular modules without adding new functionality, error handling, or behavioral changes. Such improvements should be handled in separate PRs.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: The test execution plan for PR #1904 focuses on cluster-type conditional logic where nmstate functionality is bypassed on cloud clusters (Azure/AWS) but fully functional on bare-metal/PSI clusters, requiring different test strategies for each environment type.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: For PR #1904 test execution, the critical validation point is test_connectivity_over_migration_between_localnet_vms which should fail gracefully on cloud clusters but pass on bare-metal/PSI clusters, representing the core nmstate conditional logic functionality.

Learnt from: chandramerla
Repo: RedHatQE/openshift-virtualization-tests PR: 2577
File: tests/virt/node/hotplug/test_cpu_memory_hotplug.py:161-162
Timestamp: 2025-11-19T17:00:58.250Z
Learning: In the openshift-virtualization-tests repository, the s390x test execution strategy uses positive filtering: only tests explicitly marked with pytest.mark.s390x are executed on s390x clusters. Tests without the s390x marker are automatically excluded from s390x runs, so explicit skipif decorators are not needed to prevent execution on s390x.

Learnt from: vsibirsk
Repo: RedHatQE/openshift-virtualization-tests PR: 2045
File: tests/virt/cluster/vm_lifecycle/conftest.py:46-47
Timestamp: 2025-09-15T06:49:53.478Z
Learning: In the openshift-virtualization-tests repo, large fixture refactoring efforts like the golden image data source migration are handled incrementally by directory/team ownership. The virt/cluster directory is handled separately from virt/node, tests/infra, tests/storage, etc., with each area managed by relevant teams in follow-up PRs.

Learnt from: servolkov
Repo: RedHatQE/openshift-virtualization-tests PR: 1776
File: libs/net/node_network.py:25-31
Timestamp: 2025-08-20T23:43:28.117Z
Learning: In the RedHatQE/openshift-virtualization-tests project, servolkov's team always uses bare metal (BM) clusters with IPv4 setup in their testing environment, making defensive checks for IPv4 data presence potentially redundant in their networking code.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 1160
File: tests/storage/storage_migration/test_mtc_storage_class_migration.py:165-176
Timestamp: 2025-06-17T07:45:37.776Z
Learning: In the openshift-virtualization-tests repository, user jpeimer prefers explicit fixture parameters over composite fixtures in test methods, even when there are many parameters, as they find this approach more readable and maintainable for understanding test dependencies.

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:24-64
Timestamp: 2025-06-23T19:28:20.281Z
Learning: In OpenShift Virtualization mass machine type transition tests, the machine type glob pattern "pc-q35-rhel8.*.*" is intentionally hard-coded in the kubevirt_api_lifecycle_automation_job as it's used only once for this specific test case, with plans to update it in the future if the job needs to support other machine types.

Learnt from: RoniKishner
Repo: RedHatQE/openshift-virtualization-tests PR: 1411
File: utilities/os_utils.py:246-279
Timestamp: 2025-07-22T17:13:59.166Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, CentOS preferences follow the format "centos-stream<version>" (e.g., "centos-stream9", "centos-stream10"). The generate_instance_type_centos_os_matrix function correctly uses regex to extract numeric versions and constructs the latest version string in the same format as the input preferences for proper comparison.

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:83-97
Timestamp: 2025-06-23T19:19:31.961Z
Learning: In OpenShift Virtualization mass machine type transition tests, the kubevirt_api_lifecycle_automation_job requires cluster-admin privileges to function properly, as confirmed by the test maintainer akri3i.

Learnt from: vamsikrishna-siddu
Repo: RedHatQE/openshift-virtualization-tests PR: 2199
File: tests/storage/test_online_resize.py:108-113
Timestamp: 2025-09-28T14:43:07.181Z
Learning: In the openshift-virtualization-tests repo, PR #2199 depends on PR #2139 which adds architecture-specific OS_FLAVOR attributes to the Images.Cirros class (OS_FLAVOR_CIRROS for x86_64/ARM64, OS_FLAVOR_FEDORA for s390x), enabling conditional logic based on the underlying OS flavor in tests.

Learnt from: dshchedr
Repo: RedHatQE/openshift-virtualization-tests PR: 1716
File: tests/virt/conftest.py:289-297
Timestamp: 2025-08-09T01:52:26.683Z
Learning: When user dshchedr moves working code from one location to another in the openshift-virtualization-tests repository, they prefer not to modify it unless there's a real issue, maintaining the original implementation to avoid introducing unnecessary changes.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 2701
File: .github/scripts/coderabbit_workflow.py:45-46
Timestamp: 2025-11-27T11:45:54.224Z
Learning: In the openshift-virtualization-tests repository, when using the PyGithub library or similar API clients, rnetser prefers to rely on API-level validation rather than adding redundant client-side input validation, since the API will handle validation and raise appropriate errors.

Learnt from: SiboWang1997
Repo: RedHatQE/openshift-virtualization-tests PR: 1566
File: tests/global_config_x86_64.py:29-29
Timestamp: 2025-07-25T01:59:02.180Z
Learning: In the openshift-virtualization-tests repo, the s390x architecture still relies on Fedora 41 images, while only x86_64 is updated to Fedora 42 in PR #1566.

Learnt from: rlobillo
Repo: RedHatQE/openshift-virtualization-tests PR: 2249
File: tests/install_upgrade_operators/must_gather/test_must_gather.py:428-441
Timestamp: 2025-10-16T12:47:04.521Z
Learning: In openshift-virtualization-tests repository, DataVolumes in the openshift-virtualization-os-images namespace are volatile resources managed by DataImportCron. They can be created/destroyed between must-gather collection listing and file retrieval, requiring FileNotFoundError exception handling in test_crd_resources to skip these volatile resources gracefully while still validating DataVolumes in other namespaces. There is no pytest_generate_tests hook that filters out datavolumes from test parametrization.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 1723
File: conftest.py:496-502
Timestamp: 2025-08-27T12:38:26.496Z
Learning: In PR #1723, the user rnetser clarified that updating downstream string comparisons to handle DataVolume enums from StorageClassConfig is not part of the current PR scope, which focuses only on adding conformance storage class support.

Learnt from: vamsikrishna-siddu
Repo: RedHatQE/openshift-virtualization-tests PR: 2199
File: tests/storage/test_online_resize.py:108-113
Timestamp: 2025-09-28T14:43:07.181Z
Learning: In the openshift-virtualization-tests repo, PR #2199 depends on PR #2139 which adds the OS_FLAVOR attribute to the Images.Cirros class, making Images.Cirros.OS_FLAVOR available for conditional logic in tests.

@Ahmad-Hafe Ahmad-Hafe changed the title from "[gating][storage] Fix VolumeSnapshot readiness race in DataImportCron tests and restore gating marker" to "[gating][storage] Fix snapshot readiness wait logic in DataImportCron tests and restore gating marker test_data_import_cron_garbage_collection" on Dec 2, 2025
@Ahmad-Hafe
Contributor Author

/verified

Executed 25 repeats on PSI with a 100% pass rate, using:
--repeat-scope=session --count=25

@Ahmad-Hafe
Contributor Author

/build-and-push-container

@openshift-virtualization-qe-bot

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-2938 published

@rnetser
Collaborator

rnetser commented Dec 3, 2025

/lgtm

@jpeimer
Contributor

jpeimer commented Dec 7, 2025

/approve
/lgtm

@rnetser
Collaborator

rnetser commented Dec 7, 2025

/approve
