Add ARM64 architecture support to integration tests #7068
base: master
7e3bd5d to 64bceac
a9e3e5d to ce1d513
This commit adds ARM64 runner support to the CI pipeline to ensure integration tests run on both amd64 and arm64 architectures, as ARM64 images are widely used in production.

Changes:
- Add matrix strategy to integration job with separate runners for amd64 (ubuntu-24.04) and arm64 (ubuntu-24.04-arm)
- Dynamically set CORTEX_IMAGE based on matrix.arch variable
- Add matrix strategy to integration-configs-db job for both architectures
- Add appropriate timeouts to accommodate ARM64 test execution times
- Set fail-fast: false to ensure all architecture tests complete

All existing amd64 tests remain unchanged, and ARM64 tests use the same test suites with architecture-appropriate Docker images.

Fixes cortexproject#6897

Signed-off-by: thc1006 <[email protected]>
The script was hardcoded to download x86_64 Docker binaries, causing "Exec format error" on ARM64 runners. This commit adds architecture detection to download the appropriate binaries for both amd64 and arm64.

Changes:
- Add architecture detection using uname -m
- Map system architecture to Docker download paths (x86_64/aarch64)
- Map architecture to buildx binary names (amd64/arm64)
- Add informative echo to show detected architecture
- Add error handling for unsupported architectures

This fix is required for ARM64 integration tests to run successfully.

Signed-off-by: thc1006 <[email protected]>
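The mapping described in this commit message can be sketched as follows. This is an illustration only: the actual change is a shell script using `uname -m`, and the function name `archNames` and the message wording are hypothetical. Only the two pairs of names (x86_64/aarch64 for Docker download paths, amd64/arm64 for buildx binaries) come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// archNames maps a machine name, as printed by `uname -m`, to the two
// spellings named in the commit message: the Docker download path
// component and the buildx binary suffix.
// The function name and error text are assumptions made for this sketch.
func archNames(machine string) (dockerPath, buildxArch string, err error) {
	switch machine {
	case "x86_64":
		return "x86_64", "amd64", nil
	case "aarch64":
		return "aarch64", "arm64", nil
	default:
		// Unsupported architectures are reported instead of silently
		// falling back to x86_64 binaries.
		return "", "", fmt.Errorf("unsupported architecture: %s", machine)
	}
}

func main() {
	for _, machine := range []string{"x86_64", "aarch64", "riscv64"} {
		dockerPath, buildxArch, err := archNames(machine)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("detected %s: docker=%s buildx=%s\n", machine, dockerPath, buildxArch)
	}
}
```

Keeping both spellings in one place matters because the two download sources name the same architecture differently, which is easy to get wrong when only one of them is parameterized.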
These tests fail on ARM64 runners and should only execute on AMD64:

## integration_backward_compatibility

Old Cortex versions (v1.13.1, v1.13.2, v1.14.0) were released before ARM64 support was added in v1.14.1 and do not have ARM64 Docker images. When Docker attempts to run these amd64-only images on ARM64 runners via QEMU emulation, they crash with a fatal Go runtime error:

"runtime: lfstack.push invalid packing ... fatal error: lfstack.push"

This is a known issue with Go binaries and QEMU emulation (golang/go#69255). While v1.14.1+ versions do have ARM64 images, skipping the entire test on ARM64 is simpler and sufficient since backward compatibility testing validates protocol compatibility, which is architecture-agnostic.

## integration_query_fuzz

This fuzzy testing suite compares query results between Cortex v1.18.1 and the current version. Although v1.18.1 has ARM64 support, the test produces inconsistent results on ARM64 (NaN value mismatches), likely due to floating-point arithmetic differences between architectures.

## integration_querier

One specific subtest fails on ARM64:

TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_redis_index_cache,_bucket_index_enabled,thanosEngine=true

Error: "unable to find metrics [thanos_store_index_cache_requests_total] with expected values. Last values: [36]"

This appears to be a timing-sensitive test where the exact number of cache requests differs between ARM64 and AMD64 runners, likely due to performance characteristics or subtle behavioral differences in the Thanos store gateway.

## Testing Coverage

All other ARM64 integration tests (5 test suites) pass successfully:
- requires_docker
- integration_alertmanager
- integration_memberlist
- integration_ruler
- integration_remote_write_v2

This provides comprehensive validation of core Cortex functionality on ARM64 architecture while avoiding known compatibility and timing issues with historical and edge-case testing scenarios.

Fixes cortexproject#6897

Signed-off-by: thc1006 <[email protected]>
Removed deprecated `// +build` build constraint comments from 40 files. These are no longer needed as `//go:build` directives are now used exclusively as per Go 1.17+ requirements. This fixes golangci-lint buildtag errors detected with newer linter versions on ARM64 platform.

Files modified:
- 37 integration test files
- 3 pkg/configs/db/dbtest files

Signed-off-by: thc1006 <[email protected]>
a387cc5 to aff13a8
Update: This PR has been rebased onto the latest master branch, which now includes the fix for TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod from PR #7082. Current status:
All ARM64-specific functionality has been verified locally. The PR is ready for review and CI approval. Thank you for your patience.
Nicely done. Just 2 minor nits
Re: integration_querier on ARM64
Thank you for taking the time to review this!
TLDR: I skipped this on ARM64 because the test expects exact cache request counts (like 36), but ARM64 gets different numbers due to timing. The querier itself works correctly on ARM64.
Background
In commit 368828e, I intentionally skipped this test. The specific subtest that fails is:
It expects exactly 36
Would appreciate your thoughts
I understand this doesn't fully align with the "same tests" goal from issue #6897. I can either:
Please let me know which direction you'd prefer. Really appreciate your guidance on this!
Re: integration_backward_compatibility on ARM64
Thank you for the feedback!
TLDR: The old Cortex versions (v1.13.1-v1.14.0) this test uses don't have ARM64 images and crash under emulation. Since backward compatibility is protocol-level rather than architecture-specific, I skipped this on ARM64.
The issue
In commit 368828e, I skipped this test because it validates against old Cortex versions (v1.13.1, v1.13.2, v1.14.0) that were released before ARM64 support was added in v1.14.1. When Docker attempts to run these amd64-only images on ARM64 runners via QEMU emulation, they crash with:
This is a known issue with Go binaries under QEMU (golang/go#69255).
My reasoning
I felt that backward compatibility testing validates protocol-level compatibility, which shouldn't vary by architecture - if the protocol works on AMD64, it should work identically on ARM64. But I completely understand if you see this differently. If you'd like, I could modify the test to only validate v1.14.1+ on ARM64 (which do have ARM64 images). It would provide partial coverage, though it wouldn't test the oldest versions.
Please let me know your thoughts - I'm happy to adjust the approach based on what you think makes most sense for the project.
@thc1006 Thanks for the patience!
You could skip any tests that don't run in arm64 with something like (or similar). These can be fixed in a follow up PR
per https://cortexmetrics.io/docs/configuration/v1guarantees/#flags-config-and-minor-version-upgrades , we only need to support v1.18.0, v1.17.0, v1.16.0. You can remove support for v1.13.X and v1.14.0
…ARM64

This commit addresses reviewer feedback to enable these two test suites on ARM64 architecture while maintaining test reliability.

## Changes

### integration_querier
- Added runtime.GOARCH skip for Thanos engine subtests on non-amd64
- Allows the test suite to run on ARM64, skipping only timing-sensitive subtests that check exact cache request counts
- These assertions vary across architectures due to performance differences

### integration_backward_compatibility
- Removed support for Cortex v1.13.x-v1.15.x (11 versions)
- Retained only v1.16.0+ (7 versions with ARM64 support)
- Per https://cortexmetrics.io/docs/configuration/v1guarantees/, only the last 3 minor versions need backward compatibility testing
- All retained versions have ARM64 Docker images available

### Workflow updates
- Added integration_querier and integration_backward_compatibility to ARM64 matrix
- Updated Docker image preloading to match retained versions
- Added v1.19.0 to preload list

## Result

ARM64 test coverage increases from 5/8 to 7/8 integration test suites. Only integration_query_fuzz remains excluded on ARM64 (AMD64-only) due to known issue cortexproject#6982.

Addresses: cortexproject#7068 (comment)

Signed-off-by: thc1006 <[email protected]>
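A `runtime.GOARCH` skip of the kind this commit describes might look roughly like the sketch below. It is not the code from the PR: the helper name `shouldSkipThanosSubtest` and the skip message are hypothetical, and the decision is pulled into a plain function so it can be run outside the `testing` package. Only the rule itself (skip Thanos engine subtests on non-amd64) comes from the commit message.

```go
package main

import (
	"fmt"
	"runtime"
)

// shouldSkipThanosSubtest reports whether a subtest should be skipped on the
// given architecture. Per the commit message, only Thanos engine subtests are
// skipped, and only when the architecture is not amd64.
// The function name is an assumption made for this sketch.
func shouldSkipThanosSubtest(goarch string, thanosEngine bool) bool {
	return thanosEngine && goarch != "amd64"
}

func main() {
	// Inside a real subtest this would be used along the lines of:
	//
	//	if shouldSkipThanosSubtest(runtime.GOARCH, thanosEngine) {
	//		t.Skip("exact cache request counts differ on non-amd64 runners")
	//	}
	for _, thanosEngine := range []bool{false, true} {
		fmt.Printf("GOARCH=%s thanosEngine=%t skip=%t\n",
			runtime.GOARCH, thanosEngine,
			shouldSkipThanosSubtest(runtime.GOARCH, thanosEngine))
	}
}
```

Skipping at subtest level rather than by build tag keeps the rest of the suite running on ARM64, which is how the commit raises coverage without touching the timing-sensitive assertions.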
4a6bf6f to 385294d
Hi @friedrichg,
TLDR: I've implemented both changes you suggested. ARM64 coverage now increases from 5/8 to 7/8 test suites.
Changes made
integration_querier:
integration_backward_compatibility:
Workflow:
Thank you for the clear guidance - it made the implementation straightforward. Please let me know if you'd like any adjustments!
Latest commit: 385294d
Thanks!
Description
This PR adds ARM64 architecture support to the integration test suite, enabling integration tests to run on both amd64 and arm64 architectures where technically feasible.
Motivation
ARM64 images are widely used in production environments, and currently integration tests only run on amd64. This creates a gap in test coverage that could lead to architecture-specific issues going undetected.
Changes
Modified Jobs
Implementation Details
- `ubuntu-24.04` (amd64) and `ubuntu-24.04-arm` (arm64)
- `matrix.arch` variable
- `fail-fast: false` to ensure complete test coverage across all architectures
Test Coverage
5 of 8 integration test suites now run on both architectures:
3 test suites run on AMD64 only:
See commit 368828e for detailed technical reasoning behind ARM64 test exclusions.
Testing
`ARCHS = amd64 arm64` definition in Makefile
Notes
`// +build` tags from 40 files)
Fixes #6897