Add support for Azure Compute Gallery in CPI #710

Open: wants to merge 4 commits into base: master

@s4heid (Contributor) commented Feb 10, 2025

Summary

This change relates to #709 and introduces support for Azure Compute Gallery, enabling centralized management and distribution of stemcell images across regions and accounts while maintaining backwards compatibility with existing user-image workflows.

  • Instead of storing stemcells only as VHD blobs in storage accounts, BOSH can now store them in an Azure Compute Gallery
  • When BOSH deploys VMs in different Azure regions, it can use the gallery's built-in replication capabilities
  • The gallery acts as a central repository for all stemcell versions, making them easily accessible across the Azure subscription

When enabled via the CPI config parameter azure.compute_gallery_name, BOSH will:

  1. Create a gallery image definition for each stemcell series that is uploaded
  2. Create a gallery image version for each stemcell version that is uploaded
  3. Automatically replicate images to regions where deployments are needed
  4. Fall back to user images if gallery operations fail (e.g. if the stemcell was uploaded without creating a gallery image, so the image properties cannot be retrieved from the blob's metadata)
  5. Clean up old image versions when stemcells are deleted
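The fallback in step 4 can be pictured as a small resolution step at VM-creation time. The following is a minimal Ruby sketch under stated assumptions, not the CPI's implementation: resolve_image_source, ImageSource and the two lookup callables are hypothetical stand-ins for the gallery and user-image lookups.

```ruby
# Hypothetical sketch, not the CPI's code: prefer a gallery image version and
# fall back to the stemcell's user image when the gallery lookup fails or
# finds nothing.
ImageSource = Struct.new(:kind, :id)

def resolve_image_source(stemcell_id, gallery_lookup:, user_image_lookup:)
  gallery_id =
    begin
      gallery_lookup.call(stemcell_id)
    rescue StandardError => e
      warn "compute gallery lookup failed for #{stemcell_id}: #{e.message}"
      nil
    end
  return ImageSource.new(:gallery_image_version, gallery_id) if gallery_id

  # Fallback: the pre-existing user-image path keeps working unchanged.
  ImageSource.new(:user_image, user_image_lookup.call(stemcell_id))
end

# Example: an in-memory "gallery" that only knows one stemcell.
gallery = { 'bosh-stemcell-a' => 'gallery-image-version-a' }
user_images = ->(id) { "user-image-#{id}" }

resolve_image_source('bosh-stemcell-a', gallery_lookup: gallery.method(:[]),
                                        user_image_lookup: user_images).kind # => :gallery_image_version
resolve_image_source('bosh-stemcell-b', gallery_lookup: gallery.method(:[]),
                                        user_image_lookup: user_images).kind # => :user_image
```

Passing the lookups in as callables keeps the sketch independent of any Azure client; a lookup that raises is treated the same way as one that finds nothing.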

Stemcell Workflow with Compute Gallery

sequenceDiagram
    participant BOSH as BOSH Director
    participant CPI as Azure CPI
    participant Gallery as Azure Compute Gallery
    participant Blob as Azure Storage Account
    
    Note over BOSH,Blob: Stemcell Upload Flow
    BOSH->>CPI: create_stemcell(image_path, properties)
    CPI->>Blob: Upload VHD blob
    CPI->>Blob: Store gallery metadata
    CPI->>Gallery: Create image definition<br/>(publisher/offer/sku)
    CPI->>Gallery: Create image version<br/>(with replica count)
    
    Note over BOSH,Blob: VM Deployment Flow
    BOSH->>CPI: Deploy VM in region
    CPI->>Gallery: get_compute_gallery_image_version
    alt Image exists in region
        Gallery->>CPI: Return image details
    else Image not in region
        CPI->>Gallery: Update image version<br/>(add target region)
        Gallery->>Gallery: Replicate to new region
        Gallery->>CPI: Return image details
    end
    CPI->>BOSH: Return success

    Note over BOSH,Blob: Cleanup Flow
    BOSH->>CPI: delete_stemcell
    CPI->>Gallery: Delete gallery image version
    CPI->>Blob: Delete VHD blob
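The "Image not in region" branch of the deployment flow amounts to appending the VM's location to the image version's target regions. A minimal Ruby sketch, assuming target regions are modeled as name/replica-count pairs (mirroring the targetRegions and regionalReplicaCount fields of a gallery image version); ensure_target_region is a hypothetical helper, not a CPI method.

```ruby
# Hypothetical sketch of the "Image not in region" branch.
TargetRegion = Struct.new(:name, :regional_replica_count)

# Compare case- and space-insensitively so "East US" matches "eastus"
# (an assumption about how locations may be written in the config).
def normalize_region(name)
  name.downcase.delete(' ')
end

# Returns [regions, changed]; changed is true when the image version would
# need an update, which in turn triggers replication to the new region.
def ensure_target_region(target_regions, location, replica_count)
  wanted = normalize_region(location)
  already_targeted = target_regions.any? { |region| normalize_region(region.name) == wanted }
  return [target_regions, false] if already_targeted

  [target_regions + [TargetRegion.new(wanted, replica_count)], true]
end

regions = [TargetRegion.new('eastus', 3)]
regions, changed = ensure_target_region(regions, 'West Europe', 3)
regions.map(&:name) # => ["eastus", "westeurope"]
changed             # => true
```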

Checklist

Please check each of the boxes below for which you have completed the corresponding task:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • All unit tests pass locally (after my changes)
  • Rubocop reports zero errors (after my changes)

Unit Test output:

$ bundle exec rake spec:unit
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./workspaces/bosh-azure-cpi-release/src/bosh_azure_cpi/vendor/package/ruby/3.2.0/gems/azure-storage-table-2.0.4/lib/azure/storage/table/table_service.rb:30: warning: already initialized constant Azure::Storage::StorageService
/workspaces/bosh-azure-cpi-release/src/bosh_azure_cpi/vendor/package/ruby/3.2.0/gems/azure-storage-blob-2.0.3/lib/azure/storage/blob/blob_service.rb:33: warning: previous definition of StorageService was here
............................................................................................................................................................................................................................................................

Finished in 5.24 seconds (files took 5.35 seconds to load)
1065 examples, 0 failures

Coverage report generated for RSpec to /workspaces/bosh-azure-cpi-release/src/bosh_azure_cpi/coverage.
Line Coverage: 49.35% (20234 / 41002)

Rubocop output:

$ bundle exec rake rubocop

These violations look unrelated to this change.

Running RuboCop...
Inspecting 192 files
......WW..........................W....W.W.....W............................W..............................W.............................WW...........W........WW...............................

Offenses:

lib/cloud/azure.rb:18:1: W: [Correctable] Lint/RedundantRequireStatement: Remove unnecessary require statement.
require 'pp'
^^^^^^^^^^^^
lib/cloud/azure.rb:19:1: W: [Correctable] Lint/RedundantRequireStatement: Remove unnecessary require statement.
require 'set'
^^^^^^^^^^^^^
lib/cloud/azure/cloud.rb:21:5: W: Lint/MissingSuper: Call super to initialize state of the parent class.
    def initialize(options, api_version = 1) ...
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lib/cloud/azure/restapi/azure_client.rb:21:5: W: Lint/MissingSuper: Call super to initialize state of the parent class.
    def initialize(status = nil) ...
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lib/cloud/azure/restapi/azure_client.rb:584:35: W: Lint/ShadowingOuterLocalVariable: Shadowing outer local variable - disk.
        disk = data_disks.find { |disk| disk['lun'] == i }
                                  ^^^^
lib/cloud/azure/restapi/azure_client.rb:640:70: W: Lint/ShadowingOuterLocalVariable: Shadowing outer local variable - disk.
      disk = vm['properties']['storageProfile']['dataDisks'].find { |disk| disk['name'] == disk_name }
                                                                     ^^^^
lib/cloud/azure/restapi/azure_client.rb:646:68: W: Lint/ShadowingOuterLocalVariable: Shadowing outer local variable - disk.
      vm['properties']['storageProfile']['dataDisks'].delete_if { |disk| disk['name'] == disk_name }
                                                                   ^^^^
lib/cloud/azure/storage/blob_manager.rb:355:25: W: [Correctable] Lint/AssignmentInCondition: Use == if you meant to do a comparison or wrap the expression in parentheses to indicate you meant to assign in a condition.
            while chunk = chunks.shift
                        ^
lib/cloud/azure/telemetry/telemetry_event.rb:43:7: W: Lint/DuplicateBranch: Duplicate branch body detected.
      when FalseClass ...
      ^^^^^^^^^^^^^^^
lib/cloud/azure/telemetry/telemetry_event.rb:45:7: W: Lint/DuplicateBranch: Duplicate branch body detected.
      when Hash ...
      ^^^^^^^^^
lib/cloud/azure/telemetry/telemetry_event.rb:47:7: W: Lint/DuplicateBranch: Duplicate branch body detected.
      else
      ^^^^
lib/cloud/azure/utils/helpers.rb:200:5: W: Lint/SuppressedException: Do not suppress exceptions.
    rescue error
    ^^^^^^^^^^^^
spec/manual_tests/resource_group_name/helpers.rb:29:29: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check File.exist?.
  FileUtils.rm_rf(dest_dir) if File.exist?(dest_dir)
                            ^^^^^^^^^^^^^^^^^^^^^^^^
spec/manual_tests/resource_group_name/helpers.rb:30:3: W: Lint/NonAtomicFileOperation: Use atomic file operation method FileUtils.mkdir_p.
  Dir.mkdir(dir) unless File.exist?(dir)
  ^^^^^^^^^^^^^^
spec/manual_tests/resource_group_name/helpers.rb:30:18: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check File.exist?.
  Dir.mkdir(dir) unless File.exist?(dir)
                 ^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/blob_manager_spec.rb:149:9: W: Lint/NonAtomicFileOperation: Use atomic file operation method FileUtils.rm_f.
        File.delete(@file_path) if File.exist?(@file_path)
        ^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/blob_manager_spec.rb:149:33: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check File.exist?.
        File.delete(@file_path) if File.exist?(@file_path)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/blob_manager_spec.rb:200:9: W: Lint/ConstantDefinitionInBlock: Do not define constants this way within a block.
        MAX_CHUNK_SIZE = 2 * 1024 * 1024 # 2MB
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/blob_manager_spec.rb:206:9: W: Lint/NonAtomicFileOperation: Use atomic file operation method FileUtils.rm_f.
        File.delete(@empty_file_path) if File.exist?(@empty_file_path)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/blob_manager_spec.rb:206:39: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check File.exist?.
        File.delete(@empty_file_path) if File.exist?(@empty_file_path)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/blob_manager_spec.rb:357:7: W: Lint/NonAtomicFileOperation: Use atomic file operation method FileUtils.rm_f.
      File.delete(@file_path) if File.exist?(@file_path)
      ^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/blob_manager_spec.rb:357:31: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check File.exist?.
      File.delete(@file_path) if File.exist?(@file_path)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/helpers_spec.rb:21:3: W: Lint/ConstantDefinitionInBlock: Do not define constants this way within a block.
  class HelpersTester ...
  ^^^^^^^^^^^^^^^^^^^
spec/unit/instance_id_spec.rb:249:9: W: Lint/EmptyBlock: Empty block detected.
        let(:instance_id) {}
        ^^^^^^^^^^^^^^^^^^^^
spec/unit/models/obj_id_spec.rb:24:9: W: Lint/ConstantDefinitionInBlock: Do not define constants this way within a block.
        ErrorMsg = Bosh::AzureCloud::ErrorMsg
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/table_manager_spec.rb:60:3: W: Lint/ConstantDefinitionInBlock: Do not define constants this way within a block.
  class MyArray < Array ...
  ^^^^^^^^^^^^^^^^^^^^^
spec/unit/table_manager_spec.rb:64:3: W: Lint/ConstantDefinitionInBlock: Do not define constants this way within a block.
  class MyEntity ...
  ^^^^^^^^^^^^^^
spec/unit/telemetry/telemetry_event_handler_spec.rb:90:7: W: Lint/NonAtomicFileOperation: Use atomic file operation method FileUtils.rm_f.
      Dir.delete(cpi_events_dir) if Dir.exist?(cpi_events_dir)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/telemetry/telemetry_event_handler_spec.rb:90:34: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check Dir.exist?.
      Dir.delete(cpi_events_dir) if Dir.exist?(cpi_events_dir)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/telemetry/telemetry_event_handler_spec.rb:147:7: W: Lint/NonAtomicFileOperation: Use atomic file operation method FileUtils.rm_f.
      Dir.delete(cpi_events_dir) if Dir.exist?(cpi_events_dir)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/telemetry/telemetry_event_handler_spec.rb:147:34: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check Dir.exist?.
      Dir.delete(cpi_events_dir) if Dir.exist?(cpi_events_dir)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/telemetry/telemetry_event_handler_spec.rb:220:9: W: Lint/NonAtomicFileOperation: Use atomic file operation method FileUtils.rm_f.
        File.delete(timestamp_file) if File.exist?(timestamp_file)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
spec/unit/telemetry/telemetry_event_handler_spec.rb:220:37: W: [Correctable] Lint/NonAtomicFileOperation: Remove unnecessary existence check File.exist?.
        File.delete(timestamp_file) if File.exist?(timestamp_file)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

192 files inspected, 33 offenses detected, 11 offenses autocorrectable

Changelog

  • Added support for Azure Compute Gallery to manage stemcells across regions and accounts:
    • Automatically creates gallery image definitions for stemcell series
    • Manages stemcell versions as gallery image versions
    • Handles cross-region replication automatically
    • Includes cleanup of gallery resources when stemcells are deleted
    • Configurable via compute_gallery_name in CPI config
    • Requires use_managed_disks: true and location to be set.
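The last two changelog items imply a configuration check along these lines. This is a hedged Ruby sketch: the keys are the ones named in this PR (compute_gallery_name, compute_gallery_replicas, use_managed_disks, location), but validate_compute_gallery! is hypothetical and the positive-integer rule for compute_gallery_replicas is an assumption.

```ruby
# Hypothetical validation of the azure section of the CPI config.
# Returns true when the compute gallery is enabled and correctly configured,
# false when it is not configured at all, and raises on invalid combinations.
def validate_compute_gallery!(azure)
  return false if azure['compute_gallery_name'].to_s.empty?

  unless azure['use_managed_disks'] == true
    raise ArgumentError, 'compute_gallery_name requires use_managed_disks: true'
  end
  if azure['location'].to_s.empty?
    raise ArgumentError, "Missing the property 'location' in the global configuration"
  end

  replicas = azure['compute_gallery_replicas']
  unless replicas.nil? || (replicas.is_a?(Integer) && replicas.positive?)
    raise ArgumentError, 'compute_gallery_replicas must be a positive integer'
  end

  true
end

validate_compute_gallery!('compute_gallery_name' => 'bosh_gallery', 'use_managed_disks' => true,
                          'location' => 'eastus', 'compute_gallery_replicas' => 3) # => true
```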

Manual Testing

I deployed this pull request to a BOSH Director and enabled the compute gallery via the CPI config. I uploaded the latest stemcell from bosh.io, and the CPI created the image definition and image version shown in the screenshots below.

Screenshot: image definition

Screenshot: image version

Duration of bosh upload-stemcell:

  • ~2m -- Unpacking the VHD and uploading to the storage account (heavily depends on your network)
  • ~1s -- Creating compute gallery image definition
  • ~8m -- Creating new image version

Performance Analysis: Azure Compute Gallery vs Regular Images

Test Methodology

  • Deployment: Zookeeper with 10 parallel instances
  • VM Size: Standard_B2as_v2
  • Versions Tested:
    • Regular Images: azure-cpi v50.0.1
    • Gallery Images: azure-cpi 5af7bbe (with 3 replicas)
  • Location: eastus

Results

VM Creation Times (mm:ss)

| VM # | Gallery Run 1 | Gallery Run 2 | Gallery Run 3 | Regular Run 1 | Regular Run 2 | Regular Run 3 |
|------|---------------|---------------|---------------|---------------|---------------|---------------|
| 1    | 02:07 | 02:10 | 02:10 | 03:01 | 02:46 | 02:44 |
| 2    | 02:11 | 02:13 | 02:14 | 03:03 | 02:58 | 02:48 |
| 3    | 02:12 | 02:14 | 02:14 | 03:22 | 03:09 | 02:56 |
| 4    | 02:12 | 02:15 | 02:15 | 03:22 | 03:09 | 03:02 |
| 5    | 02:14 | 02:15 | 02:16 | 03:22 | 03:10 | 04:20 |
| 6    | 02:14 | 02:15 | 02:16 | 03:23 | 03:18 | 04:30 |
| 7    | 02:14 | 02:16 | 02:17 | 04:37 | 04:26 | 04:30 |
| 8    | 02:26 | 02:17 | 02:19 | 04:41 | 04:29 | 04:31 |
| 9    | 02:29 | 02:20 | 02:20 | 04:41 | 04:31 | 04:40 |
| 10   | 03:06 | 02:21 | 02:23 | 04:42 | 04:33 | 04:42 |
| mean | 02:20 | 02:15 | 02:16 | 03:49 | 03:38 | 03:52 |
| σ    | 00:16 | 00:03 | 00:03 | 00:42 | 00:42 | 00:49 |
| min  | 02:07 | 02:10 | 02:10 | 03:01 | 02:46 | 02:44 |
| max  | 03:06 | 02:21 | 02:23 | 04:42 | 04:33 | 04:42 |

Statistics

| Metric       | Gallery Images | Regular Images |
|--------------|----------------|----------------|
| Mean (mm:ss) | 02:17          | 03:46          |
| Std dev (s)  | 10             | 45             |
| Min (mm:ss)  | 02:07          | 02:44          |
| Max (mm:ss)  | 03:06          | 04:42          |
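The summary statistics can be recomputed from the per-run times. The Ruby sketch below treats all 30 VMs of each image type as one sample and uses the population standard deviation, which is an assumption about how σ was computed (it reproduces the reported 10 s and 45 s; the sample standard deviation would give about 46 s for regular images).

```ruby
# Per-run VM creation times (mm:ss) from the results table, run 1..3 in order.
GALLERY = %w[
  02:07 02:11 02:12 02:12 02:14 02:14 02:14 02:26 02:29 03:06
  02:10 02:13 02:14 02:15 02:15 02:15 02:16 02:17 02:20 02:21
  02:10 02:14 02:14 02:15 02:16 02:16 02:17 02:19 02:20 02:23
].freeze
REGULAR = %w[
  03:01 03:03 03:22 03:22 03:22 03:23 04:37 04:41 04:41 04:42
  02:46 02:58 03:09 03:09 03:10 03:18 04:26 04:29 04:31 04:33
  02:44 02:48 02:56 03:02 04:20 04:30 04:30 04:31 04:40 04:42
].freeze

def to_seconds(mmss)
  minutes, seconds = mmss.split(':').map(&:to_i)
  minutes * 60 + seconds
end

# Truncates to whole seconds, which matches how the tables report means.
def to_mmss(seconds)
  whole = seconds.floor
  format('%02d:%02d', whole / 60, whole % 60)
end

# Mean, population standard deviation, min and max, all in seconds.
def summarize(times)
  secs = times.map { |t| to_seconds(t) }
  mean = secs.sum.fdiv(secs.size)
  sigma = Math.sqrt(secs.sum { |s| (s - mean)**2 } / secs.size)
  { mean: mean, sigma: sigma, min: secs.min, max: secs.max }
end

gallery = summarize(GALLERY)
regular = summarize(REGULAR)
to_mmss(gallery[:mean])                         # => "02:17"
to_mmss(regular[:mean])                         # => "03:46"
[gallery[:sigma], regular[:sigma]].map(&:round) # => [10, 45]
(regular[:mean] - gallery[:mean]).round         # => 89
```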

Figure: box plot comparing VM creation times for gallery and regular images

Observations

  • VMs from gallery images are created 89 s faster on average (02:17 vs 03:46)
  • Performance is more consistent across runs, with a 35 s lower standard deviation in creation times (σ = 10 s vs 45 s)
  • Within each run, regular-image creation times split into two groups (roughly 02:44 to 03:23 and 04:20 to 04:42); gallery-image times show no such step

@jpalermo requested review from jpalermo, a team and ystros and removed request for a team on February 13, 2025 16:02
- Introduces new methods in `AzureClient` to manage gallery images
- Enhances `StemcellManager2` to handle gallery operations, when configured
  - Creates gallery image definition and version when uploading stemcells
  - Stores gallery metadata in blob properties during stemcell creation
  - Supports cross-region replication by updating target regions
  - Falls back to user images if gallery operations fail
  - Properly cleans up gallery resources during stemcell deletion
- Enables efficient stemcell distribution across regions using Azure's managed
  gallery service while maintaining backwards compatibility with existing user
  image workflows.
- Adds a config option for the replica count of compute gallery image versions
- Removes the dependency on the storage blob when a compute gallery image exists
@s4heid force-pushed the compute-gallery-support branch from f3d2521 to 147ae42 on February 13, 2025 18:12
@s4heid (Contributor Author) commented Feb 13, 2025

I have resolved the conflicts and added another commit addressing the feedback from @MSSedusch. With this change, gallery images are found directly by their own tags rather than through the tags of the VHD blob, which decouples gallery images from the VHD blobs. In theory, the VHD blobs could then be deleted, but I would suggest keeping them as a fallback.
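Looking up gallery images by their own tags might look roughly like this. A minimal Ruby sketch, assuming each image version carries the stemcell's name and version as tags (the PR copies the stemcell properties into 'tags', but the exact keys are an assumption) and that the listing has already been fetched; find_image_version_by_tags is hypothetical.

```ruby
# Hypothetical: image_versions is a list of hashes shaped like ARM resources,
# reduced to the 'id' and 'tags' fields used here.
def find_image_version_by_tags(image_versions, stemcell_name, stemcell_version)
  image_versions.find do |image_version|
    tags = image_version.fetch('tags', {})
    tags['name'] == stemcell_name && tags['version'] == stemcell_version
  end
end

versions = [
  { 'id' => 'version-1', 'tags' => { 'name' => 'bosh-azure-hyperv-ubuntu-bionic-go_agent', 'version' => '1.651' } },
  { 'id' => 'version-2', 'tags' => { 'name' => 'bosh-azure-hyperv-ubuntu-jammy-go_agent', 'version' => '1.651' } }
]
find_image_version_by_tags(versions, 'bosh-azure-hyperv-ubuntu-jammy-go_agent', '1.651')['id'] # => "version-2"
```

No VHD blob is consulted, which is what lets the blob become an optional fallback.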

@a-hassanin commented Feb 14, 2025

Just to clarify, this PR covers the proposed solution in #709. Alternative approach 1 mentioned there is not covered by this PR, right? @s4heid

FYI @s4heid, as expected, alternative approach 1 also seems attractive to the community, as mentioned by @jpalermo, since it basically matches the other IaaS providers. Do you think it makes sense to support both approaches?

@s4heid (Contributor Author) commented Feb 14, 2025

Correct. I believe both approaches have their benefits and could be valuable for the community. If you're also interested in the alternative approach of #709, I am happy to provide a complete implementation.

@jpalermo (Member) commented:

This seems like a great idea. I've never heard of the Compute Gallery before and the CPI does take a bit longer to create the stemcell, but the VM creation savings seem to make it more than worth it.

I think this PR stands alone just fine. It would be great for us to explore adding the additional functionality to the CPI to support "light" stemcells stored in a public gallery. We'd need to figure out some things on the publish side, but finally having light stemcells for Azure would be great.

One change request for this PR though. The compute_gallery_name and compute_gallery_replicas properties are parsed in the config, but not taken in via the job/spec file or rendered in the cpi.json.erb file. If you were testing this in its current form, I'm guessing you were instead providing the CPI properties via a CPI config, but we should also add them to the job/spec file.

@s4heid (Contributor Author) commented Feb 18, 2025

Thank you for the positive feedback, @jpalermo. It appears I missed the most crucial part of making this change usable. During my tests, I patched the CPI config directly on the director VM and rsync'ed the CPI changes. In the latest commit I have added the parameters to the job/spec. Good catch!

cloud_error("Missing the property 'location' in the global configuration") if location.nil?

metadata = stemcell_properties.dup
stemcell_series = metadata['name']&.delete_prefix('bosh-azure-hyperv-ubuntu-')&.delete_suffix('-go_agent')
Contributor commented:

I notice a hardcoded ubuntu here, which feels strange. What's the significance of shortening the stemcell name for use as a "sku" and "compute_gallery_image_definition"? Is the gallery feature not compatible with Windows stemcells?

Conceptually there isn't anything to prevent someone from creating heavy Windows stemcells, which would go through this code path.

We only produce light Windows stemcells. Previously, light stemcells would have gone through a different code path. However, the addition of the !@use_compute_gallery in cloud.rb implies that all stemcells, light and heavy, will go through this code path now when configured to use compute gallery.
https://github.com/cloudfoundry/bosh-azure-cpi-release/pull/710/files#diff-a4feb37b1f55ca0dede269baecf464669cd9a30bfadf2c14568cf43e8a2771f1R85

Contributor Author commented:

In theory, nothing should prevent a user from uploading a full Windows stemcell, but, as you said, no full Windows stemcells are currently published. When uploading windows light stemcells, they are handled by the light-stemcell-manager which does not interfere with compute gallery.

Since compute gallery images are referenced by resource ID in create_vm, unlike light stemcells which use publisher/offer/sku, the publisher/offer/sku of compute gallery images can essentially be chosen arbitrarily.
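To illustrate the difference: a gallery image version is addressed by its full ARM resource ID, while a light stemcell uses a publisher/offer/sku/version reference. A Ruby sketch of both forms of a VM's imageReference; the helper names and example values are hypothetical, and the light-stemcell values are taken from the Windows 2019 CPI log in this thread.

```ruby
# ARM resource ID of a gallery image version (hypothetical helper).
def gallery_image_version_id(subscription_id:, resource_group:, gallery:, image:, version:)
  "/subscriptions/#{subscription_id}/resourceGroups/#{resource_group}" \
    "/providers/Microsoft.Compute/galleries/#{gallery}" \
    "/images/#{image}/versions/#{version}"
end

# imageReference for a VM storage profile: either a resource ID or a
# publisher/offer/sku/version tuple (hypothetical helper).
def image_reference(source)
  if source[:gallery_image_version_id]
    { 'id' => source[:gallery_image_version_id] }
  else
    source.slice(:publisher, :offer, :sku, :version).transform_keys(&:to_s)
  end
end

id = gallery_image_version_id(subscription_id: '00000000-0000-0000-0000-000000000000',
                              resource_group: 'bosh-rg', gallery: 'bosh_gallery',
                              image: 'example-image-definition', version: '1.651.0')
image_reference(gallery_image_version_id: id) # gallery image: only 'id' matters
image_reference(publisher: 'pivotal', offer: 'bosh-windows-server-2019',
                sku: '2019-sku2', version: '2019.83.004001') # light stemcell
```

Because the VM only sees the resource ID in the first case, the publisher/offer/sku stored on the gallery image definition never has to resolve to a marketplace image.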

After reconsidering the current choices and discussing with @MSSedusch, we concluded that the current naming in this PR is not ideal. We have come up with the following alternative:

{
    "publisher": "bosh",
    "offer": "bosh-azure-hyperv-ubuntu-bionic-go_agent",
    "sku": "gen1"
}

The value of offer is cloud_properties.name, unchanged. publisher is set to bosh, which is more appropriate than azure. sku typically indicates a version or generation, and we propose gen1 as the default. For newer stemcell series like ubuntu-noble, the correct hyperVGeneration value is needed; otherwise, features like the EFI bootloader won't work correctly (see #706). This means the generation should be added to the stemcell's cloud properties. Once added, we can read it from the metadata, replace the static reference, and set the value both when creating the image definition and for regular images.
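A minimal Ruby sketch of the proposed identifiers, assuming a hypothetical 'generation' key in the stemcell's cloud properties (the proposal says such a field still has to be added; its real name is not decided here). The mapping of gen1/gen2 to the Azure hyperVGeneration values V1/V2 is the only Azure-specific part.

```ruby
# Hypothetical mapping from the proposed sku values to Azure's hyperVGeneration.
HYPER_V_GENERATIONS = { 'gen1' => 'V1', 'gen2' => 'V2' }.freeze

# 'generation' is a made-up cloud_properties key; it defaults to gen1 as proposed.
def image_definition_identity(cloud_properties)
  generation = cloud_properties.fetch('generation', 'gen1')
  hyper_v = HYPER_V_GENERATIONS.fetch(generation) do
    raise ArgumentError, "unknown stemcell generation '#{generation}'"
  end

  {
    'publisher' => 'bosh',
    'offer' => cloud_properties.fetch('name'), # used as-is, no prefix/suffix stripping
    'sku' => generation,
    'hyperVGeneration' => hyper_v
  }
end

image_definition_identity('name' => 'bosh-azure-hyperv-ubuntu-bionic-go_agent')['sku'] # => "gen1"
```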

@ystros if this proposal makes sense to you, I would apply the changes.

Contributor commented:

Thanks @s4heid for the explanation of the chosen publisher, sku, and offer fields. I'd defer to your expertise on the required values for these fields. Your updated proposal makes more sense as a generalized solution than trying to massage the stemcell name.

> When uploading windows light stemcells, they are handled by the light-stemcell-manager which does not interfere with compute gallery.

This used to be the case, but as I mentioned, this PR breaks that flow when compute gallery is configured. I'll open up a new thread on the specific line so that it is more clear.

Contributor Author commented:

Agreed, I didn't like massaging the stemcell name either. I have changed it as part of 236ff8e.

{
'location' => location,
'tags' => metadata,
'storage_account_name' => @default_storage_account_name,
Contributor commented:

ℹ️ The usage of a storage account here confused me initially, but it appears even when using managed disks the CPI will either search for a usable storage account, or create one if it can't find one.

Contributor Author commented:

Yeah, I also needed a moment to understand this. The CPI will then attempt to copy the stemcell blob into that storage account.

@@ -80,7 +82,7 @@ def create_stemcell(image_path, cloud_properties)
 'stemcell' => "#{cloud_properties.fetch('name', 'unknown_name')}-#{cloud_properties.fetch('version', 'unknown_version')}"
 }
 @telemetry_manager.monitor('create_stemcell', extras: extras) do
-if has_light_stemcell_property?(cloud_properties)
+if has_light_stemcell_property?(cloud_properties) && !@use_compute_gallery
Contributor commented:

This change (and the corresponding update in delete_stemcell) breaks uploading light Windows stemcells when the new compute gallery feature is active. Since !@use_compute_gallery is false in this scenario, uploading a light Windows stemcell will not use the light stemcell manager. Instead, it will call the stemcell manager 2 code, and attempt to create a gallery image out of the light stemcell.

I created a dev release from this branch + deployed a Director to confirm this. Uploading the latest Windows 2019 light stemcell results in a CPI error as it tries to treat the light stemcell as a heavy one, and convert it into a gallery image:

$ bosh upload-stemcell https://bosh-windows-stemcells-production.s3.amazonaws.com/2019/light-bosh-stemcell-2019.83-azure-hyperv-windows2019-go_agent.tgz
Using environment '10.0.8.5' as client 'admin'

Task 6

Task 6 | 22:07:05 | Update stemcell: Downloading remote stemcell (00:00:00)
Task 6 | 22:07:05 | Update stemcell: Extracting stemcell archive (00:00:00)
Task 6 | 22:07:05 | Update stemcell: Verifying stemcell manifest (00:00:00)
Task 6 | 22:07:07 | Update stemcell: Checking if this stemcell already exists (00:00:00)
Task 6 | 22:07:07 | Update stemcell: Uploading stemcell bosh-azure-hyperv-windows2019-go_agent/2019.83 to the cloud (00:00:01)
                  L Error: Unknown CPI error 'Unknown' with message 'undefined method `split' for nil' in 'create_stemcell' CPI method (CPI request ID: 'cpi-133183')

Snippet from the CPI log:

I, [2025-03-04T22:07:08.851675 #19043 #2440] INFO -- [req_id cpi-133183]: StemcellManager2.create_stemcell(/var/vcap/data/director/tmp/stemcell20250304-19012-7g86nu/image, {"infrastructure"=>"azure", "os_type"=>"windows", "image"=>{"offer"=>"bosh-windows-server-2019", "publisher"=>"pivotal", "sku"=>"2019-sku2", "version"=>"2019.83.004001"}})
I, [2025-03-04T22:07:08.851941 #19043]  INFO -- [req_id cpi-133183]: Finished create_stemcell in 0.78 seconds
Rescued Unknown: undefined method `split' for nil. backtrace: /var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/lib/cloud/azure/stemcell/stemcell_manager2.rb:122:in `_make_semver_compliant'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/lib/cloud/azure/stemcell/stemcell_manager2.rb:24:in `create_stemcell'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/lib/cloud/azure/cloud.rb:88:in `block (2 levels) in create_stemcell'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/lib/cloud/azure/telemetry/telemetry_manager.rb:71:in `monitor'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/lib/cloud/azure/cloud.rb:84:in `block in create_stemcell'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/gem_home/ruby/3.3.0/gems/bosh_common-2.0.0/lib/common/thread_formatter.rb:50:in `with_thread_name'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/lib/cloud/azure/cloud.rb:80:in `create_stemcell'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/gem_home/ruby/3.3.0/gems/bosh_cpi-2.6.0/lib/bosh/cpi/cli.rb:90:in `public_send'
/var/vcap/data/packages/bosh_azure_cpi/50e6cd4e8908c1507e2d71d4ad0c162e92c078fa/gem_home/ruby/3.3.0/gems/bosh_cpi-2.6.0/lib/bosh/cpi/cli.rb:90:in `run'
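The failure comes from converting a missing stemcell version into a gallery version number. Gallery image version names must follow a MAJOR.MINOR.PATCH pattern of integers, whereas stemcell versions such as 2019.83 have two parts. A hedged Ruby sketch of a conversion that fails with a descriptive error instead of a NoMethodError; gallery_version_name is hypothetical and the PR's _make_semver_compliant may behave differently.

```ruby
# Hypothetical conversion of a stemcell version into a gallery image version
# name (MAJOR.MINOR.PATCH, integers only), padding missing parts with 0.
def gallery_version_name(stemcell_version)
  if stemcell_version.nil? || stemcell_version.to_s.strip.empty?
    raise ArgumentError, 'stemcell version is missing; light stemcells should not reach the gallery path'
  end

  parts = stemcell_version.to_s.split('.')
  unless parts.size.between?(1, 3) && parts.all? { |part| part.match?(/\A\d+\z/) }
    raise ArgumentError, "cannot convert stemcell version '#{stemcell_version}' to MAJOR.MINOR.PATCH"
  end

  (parts + ['0'] * (3 - parts.size)).map(&:to_i).join('.')
end

gallery_version_name('2019.83') # => "2019.83.0"
gallery_version_name('1.651')   # => "1.651.0"
```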

Contributor Author commented:

Apologies for the confusion. You are absolutely right; the logic was incorrect, and I also forgot to add tests to the change, which explains why I missed it. @ystros please review 236ff8e, which should fix the logic.
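One way to structure the dispatch so that enabling the compute gallery cannot redirect light stemcells is to check for light stemcells first. This is a minimal Ruby sketch, not necessarily what 236ff8e does; light_stemcell? is a hypothetical stand-in for has_light_stemcell_property?, keyed on the 'image' hash visible in the CPI log above.

```ruby
# Hypothetical stand-in for has_light_stemcell_property?: the light stemcell in
# the CPI log carries an 'image' hash (publisher/offer/sku/version).
def light_stemcell?(cloud_properties)
  cloud_properties['image'].is_a?(Hash)
end

# The light-stemcell check comes first, so the gallery flag only affects
# heavy stemcells.
def stemcell_upload_path(cloud_properties, use_compute_gallery:)
  return :light_stemcell if light_stemcell?(cloud_properties)

  use_compute_gallery ? :compute_gallery : :heavy_stemcell
end

windows_light = { 'os_type' => 'windows',
                  'image' => { 'publisher' => 'pivotal', 'offer' => 'bosh-windows-server-2019',
                               'sku' => '2019-sku2', 'version' => '2019.83.004001' } }
stemcell_upload_path(windows_light, use_compute_gallery: true) # => :light_stemcell
```

The same ordering would apply to delete_stemcell, since the reviewer noted it carried the same condition.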

@jpalermo (Member) commented Mar 8, 2025

I did some additional testing today and was able to upload and deploy multiple versions and lines of Ubuntu stemcells, as well as windows2019 light stemcells.

Labels: None yet
Projects: Status: Pending Review | Discussion

4 participants