This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Pool resize with NC24rs_v3 fails to find PKEYS during nodeprep #360

@themorey

Problem Description

Creating a multi-instance pool with NC24rs_v3 fails during node start prep because shipyard_nodeprep.sh reads the PKEYs from a hard-coded mlx5_0 device (lines 1609-1612):

export_ib_pkey()
{
    key0=$(cat /sys/class/infiniband/mlx5_0/ports/1/pkeys/0)
    key1=$(cat /sys/class/infiniband/mlx5_0/ports/1/pkeys/1)

The NC24rs_v3 has a ConnectX-3 card, which is identified as mlx4_0, not mlx5_0, so the reads above fail. Manually modifying shipyard_nodeprep.sh each time a pool is created works around the issue.
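
One possible way to avoid the hard-coded device name is to detect the device at runtime. The following is only a sketch under stated assumptions, not the actual fix in Batch Shipyard: the function names `find_ib_device` and `read_ib_pkeys` and the `IB_SYSFS_ROOT` variable are hypothetical and exist only so the logic can be tried without InfiniBand hardware. It assumes the first `mlx*` device with a readable port 1 PKEY is the one to use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; these are NOT part of shipyard_nodeprep.sh.
# IB_SYSFS_ROOT is an assumed override so the lookup can be tested
# against a fake directory tree instead of real sysfs.
IB_SYSFS_ROOT="${IB_SYSFS_ROOT:-/sys/class/infiniband}"

# Print the name of the first mlx* device (e.g. mlx4_0 or mlx5_0)
# that exposes a readable PKEY 0 on port 1. Returns 1 if none is found.
find_ib_device() {
    local dev
    for dev in "$IB_SYSFS_ROOT"/mlx*; do
        if [ -r "$dev/ports/1/pkeys/0" ]; then
            basename "$dev"
            return 0
        fi
    done
    return 1
}

# Read PKEYs 0 and 1 from the detected device and print
# "<device> <key0> <key1>" on one line.
read_ib_pkeys() {
    local dev key0 key1
    dev=$(find_ib_device) || { echo "no InfiniBand device found" >&2; return 1; }
    key0=$(cat "$IB_SYSFS_ROOT/$dev/ports/1/pkeys/0")
    key1=$(cat "$IB_SYSFS_ROOT/$dev/ports/1/pkeys/1")
    echo "$dev $key0 $key1"
}
```

On an NC24rs_v3 node, `find_ib_device` would be expected to print mlx4_0, given that the card is identified that way; on SKUs with newer cards it would pick up mlx5_0 without any manual edit.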

Batch Shipyard Version

3.9.1 (Mac)

Steps to Reproduce

Resize a multi-instance pool containing NC24rs_v3 and wait for it to fail.

Expected Results

Node finds the PKEYS and boots normally without intervention.

Actual Results

Manual intervention is required each time a pool is created or modified.

Redacted Configuration

pool_specification:
  id: arvinas-relion-pool-NCv3
  vm_configuration:
    platform_image:
      offer: CentOS-HPC
      publisher: OpenLogic
      sku: '7.7'
      version: '7.7.2020062600'
  vm_count:
    dedicated: 0
    low_priority: 0
  vm_size: STANDARD_NC24rs_v3
  autoscale:
    evaluation_interval: 00:05:00
    scenario:
      name: active_tasks
      maximum_vm_count:
        dedicated: 4
        low_priority: 4
      maximum_vm_increment_per_evaluation:
        dedicated: -1
        low_priority: -1
      bias_node_type: low_priority
  inter_node_communication_enabled: true
  virtual_network:
    arm_subnet_id: /subscriptions/{sub}/resourceGroups/{RG}/providers/Microsoft.Network/virtualNetworks/{Vnet}/subnets/{sn}
  ssh:
    username: shipyard
