feat(nodegroup): new integration: spot ocean
liranp authored and IdanShohamNetApp committed Jul 13, 2023
1 parent 5a37b45 commit 43a7b74
Showing 44 changed files with 4,843 additions and 564 deletions.
108 changes: 108 additions & 0 deletions docs/proposal-009-spot-ocean.md
@@ -0,0 +1,108 @@
# Spot ocean integration

## Authors

Spot By NetApp Ocean (@spotinst/sig-developers)

## Status

In progress.

## Table of Contents
<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Linked Docs](#linked-docs)
- [Proposal](#proposal)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Alternatives](#alternatives)
<!-- /toc -->

## Summary

We implemented Spot Ocean structures based on the eksctl Cluster and NodeGroup structures from release `0.146.0`. This implementation
lets Spot Ocean users manage their clusters and node groups with eksctl in various ways.
No dependencies exist between the Spot Ocean and eksctl structures that could cause problems in the future.

The value of integrating Spot Ocean with `eksctl` is to give existing and future AWS customers a way of:

a) Creating new clusters and/or node groups with Spot Ocean integration using a
single command.

b) Modifying clusters and/or node groups with Spot Ocean integration using a
single command.

Spot by NetApp pledges to fully maintain this integration.
This includes:
- Monthly updates with new features
- Code reviews and feature assessment from the eksctl community
- Feature parity with our direct API and UI, bringing all the latest features to eksctl
- Spot by NetApp fully managing support and maintenance of this integration
- Bug fixes directly from the eksctl community
- Urgent 24/7 support available on our platform
- Ensuring full compatibility with the newest versions of Kubernetes and EKS

## Motivation

This proposal aims to solve two problems:

- Many AWS customers with EKS clusters want Spot Ocean integration.
- AWS customers want to integrate their EKS clusters and nodegroups with Spot Ocean through eksctl's configuration.

### Goals

- Enable AWS users to create Spot Ocean clusters and nodegroups using eksctl.
- Enable AWS users to modify their Spot Ocean cluster configs and nodegroups using eksctl.
- Enable AWS users to perform utility actions on their Spot Ocean clusters and nodegroups using eksctl.

### Linked Docs

- [Original PR](https://github.com/weaveworks/eksctl/pull/6731)
- [Spot Ocean docs](../userdocs/src/usage/spot)
- [Expansion issue](https://github.com/weaveworks/eksctl/issues/6694)

## Proposal

This design proposes adding a new `spotOcean` field at both the cluster and the nodegroup level,
and creating clusters with Spot Ocean managed nodegroups.

For example, the command

```bash
eksctl create cluster \
--name example \
--spot-ocean \
--managed=false
```

creates a new Spot Ocean cluster.
In addition, the design proposes two new `utils` subcommands, `update-spot-ocean-cluster` and `update-spot-ocean-credentials`.

For example:
```bash
eksctl utils update-spot-ocean-cluster -v 4 -f ./cluster.yaml
```

where `cluster.yaml` contains the updated cluster definition.

## Design Details

The new `--spot-ocean` flag will be added to `eksctl create cluster` and `eksctl create nodegroup`. The option will also be supported in the ClusterConfig file for self-managed nodegroups.
In addition, we have added two new `eksctl utils` subcommands, `update-spot-ocean-cluster` and `update-spot-ocean-credentials`. Both require a configuration file and are mainly meant for update actions on the cluster.
- For more details, see our [Spot Ocean guides](../userdocs/src/usage/spot/ocean/spot-ocean-cluster.md).
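
As a rough illustration of the design described above, the following Go sketch models an optional `spotOcean` field at both the cluster and the nodegroup level and lists the nodegroups that opt in. The type and function names (`SpotOcean`, `NodeGroup`, `ClusterConfig`, `oceanNodeGroupNames`) are hypothetical simplifications and not eksctl's actual API types. It also assumes that the presence of a `spotOcean` block, even an empty one, is what marks a nodegroup as Ocean-managed.

```go
package main

import "fmt"

// SpotOcean is a hypothetical, trimmed-down stand-in for the `spotOcean`
// settings block; the real structure carries many more fields.
type SpotOcean struct {
	SpotPercentage *int // nil means "not set"
}

// NodeGroup is a simplified illustration, not the eksctl type.
type NodeGroup struct {
	Name      string
	SpotOcean *SpotOcean // assumption: nil means the nodegroup is not Ocean-managed
}

// ClusterConfig is a simplified illustration, not the eksctl type.
type ClusterConfig struct {
	Name       string
	SpotOcean  *SpotOcean // cluster-level settings
	NodeGroups []NodeGroup
}

// oceanNodeGroupNames returns the names of the nodegroups that carry a
// `spotOcean` block, in their original order.
func oceanNodeGroupNames(cfg ClusterConfig) []string {
	var names []string
	for _, ng := range cfg.NodeGroups {
		if ng.SpotOcean != nil {
			names = append(names, ng.Name)
		}
	}
	return names
}

func main() {
	pct := 100
	cfg := ClusterConfig{
		Name:      "example",
		SpotOcean: &SpotOcean{},
		NodeGroups: []NodeGroup{
			{Name: "ocean-ng1", SpotOcean: &SpotOcean{}},
			{Name: "ocean-ng2", SpotOcean: &SpotOcean{SpotPercentage: &pct}},
			{Name: "plain-ng"},
		},
	}
	fmt.Println(oceanNodeGroupNames(cfg))
}
```

Using a pointer for the field keeps "absent" distinct from "present but empty", which matters if an empty block is meant to opt a nodegroup in with default settings.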

### Test Plan

Following maintenance or the release of a new feature, we check the following:

- Run all existing unit tests to make sure our changes broke nothing.
- Create new EKS clusters on the various up-to-date Kubernetes versions.
- Create and modify nodegroups inside those clusters.
- Verify the utility actions for Ocean cluster management within eksctl.

## Alternatives

The current alternative is to use our own fork of eksctl, branched from the main eksctl branch ([repo](https://github.com/spotinst/weaveworks-eksctl/releases/tag/v0.146.0)), for customer purposes.
110 changes: 110 additions & 0 deletions examples/38-spot-ocean.yaml
@@ -0,0 +1,110 @@
# An example of ClusterConfig object with Spot Ocean nodegroups.
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster-22
  region: us-west-2

spotOcean:
  strategy:
    utilizeReservedInstances: true
    fallbackToOnDemand: true

  scheduling:
    shutdownHours:
      isEnabled: true
      timeWindows:
        - Mon:22:00-Tue:06:00
        - Tue:22:00-Wed:06:00
        - Wed:22:00-Thu:06:00
        - Thu:22:00-Fri:06:00
        - Fri:22:00-Mon:06:00

    tasks:
      - isEnabled: true
        taskType: manualHeadroomUpdate
        cronExpression: 0 1 * * *
        config:
          headrooms:
            - cpuPerUnit: 2000
              memoryPerUnit: 4000
              gpuPerUnit: 0
              numOfUnits: 1
      - isEnabled: true
        taskType: manualHeadroomUpdate
        cronExpression: 0 2 * * *
        config:
          headrooms:
            - cpuPerUnit: 0
              memoryPerUnit: 200
              gpuPerUnit: 0
              numOfUnits: 2

  autoScaler:
    enabled: true
    cooldown: 300
    autoConfig: false
    headrooms:
      cpuPerUnit: 2
      gpuPerUnit: 0
      memoryPerUnit: 64
      numOfUnits: 1

  compute:
    instanceTypes:
      whitelist:
        - t3a.large
        - t3a.xlarge
        - t3a.2xlarge
        - m5a.large
        - m5a.xlarge
        - m5a.2xlarge
        - m5a.4xlarge
        - c5.large
        - c5.xlarge
        - c5.2xlarge
        - c5.4xlarge

nodeGroups:
  - name: ocean-ng1
    spotOcean: {}

  - name: ocean-ng2
    spotOcean:
      strategy:
        spotPercentage: 100

      compute:
        instanceTypes:
          - t3a.large
          - t3a.xlarge
          - t3a.2xlarge

      autoScaler:
        headrooms:
          - cpuPerUnit: 2
            gpuPerUnit: 0
            memoryPerUnit: 32
            numOfUnits: 1

  - name: ocean-ng3
    spotOcean:
      strategy:
        spotPercentage: 70

      compute:
        instanceTypes:
          - m5a.large
          - m5a.xlarge
          - m5a.2xlarge
          - m5a.4xlarge
          - c5.large
          - c5.xlarge
          - c5.2xlarge
          - c5.4xlarge

      autoScaler:
        resourceLimits:
          maxInstanceCount: 10
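
The `timeWindows` entries in the example above appear to follow a `ddd:hh:mm-ddd:hh:mm` pattern. The Go sketch below shows one way such a string could be split into a start and an end edge. The format is inferred from the example values only, and the names `windowEdge`, `parseEdge`, and `parseTimeWindow` are hypothetical; the validation that Spot Ocean or eksctl actually applies is not shown in this commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// windowEdge is one end of a time window: a weekday plus a time of day.
type windowEdge struct {
	Day    int // 0 = Sunday ... 6 = Saturday
	Hour   int
	Minute int
}

// dayIndex maps the three-letter day names used in the example to indices.
var dayIndex = map[string]int{
	"Sun": 0, "Mon": 1, "Tue": 2, "Wed": 3, "Thu": 4, "Fri": 5, "Sat": 6,
}

// parseEdge parses one "ddd:hh:mm" edge such as "Mon:22:00".
func parseEdge(s string) (windowEdge, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 3 {
		return windowEdge{}, fmt.Errorf("edge %q: want ddd:hh:mm", s)
	}
	day, ok := dayIndex[parts[0]]
	if !ok {
		return windowEdge{}, fmt.Errorf("edge %q: unknown day %q", s, parts[0])
	}
	hour, err := strconv.Atoi(parts[1])
	if err != nil || hour < 0 || hour > 23 {
		return windowEdge{}, fmt.Errorf("edge %q: bad hour %q", s, parts[1])
	}
	minute, err := strconv.Atoi(parts[2])
	if err != nil || minute < 0 || minute > 59 {
		return windowEdge{}, fmt.Errorf("edge %q: bad minute %q", s, parts[2])
	}
	return windowEdge{Day: day, Hour: hour, Minute: minute}, nil
}

// parseTimeWindow splits "start-end" at the first '-' and parses both edges.
func parseTimeWindow(s string) (start, end windowEdge, err error) {
	from, to, ok := strings.Cut(s, "-")
	if !ok {
		return start, end, fmt.Errorf("window %q: missing '-'", s)
	}
	if start, err = parseEdge(from); err != nil {
		return start, end, err
	}
	end, err = parseEdge(to)
	return start, end, err
}

func main() {
	start, end, err := parseTimeWindow("Fri:22:00-Mon:06:00")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v -> %+v\n", start, end)
}
```

Note that the last window in the example (`Fri:22:00-Mon:06:00`) wraps across the weekend, so any code consuming these edges should not assume the end day follows the start day within the same week.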
2 changes: 1 addition & 1 deletion go.mod
@@ -442,7 +442,7 @@ require (
 	github.com/spf13/cast v1.5.0 // indirect
 	github.com/spf13/jwalterweatherman v1.1.0 // indirect
 	github.com/spf13/viper v1.15.0 // indirect
-	github.com/spotinst/spotinst-sdk-go v1.129.0 // indirect
+	github.com/spotinst/spotinst-sdk-go v1.149.0 // indirect
 	github.com/ssgreg/nlreturn/v2 v2.2.1 // indirect
 	github.com/stbenjam/no-sprintf-host-port v0.1.1 // indirect
 	github.com/stretchr/objx v0.5.0 // indirect
2 changes: 2 additions & 0 deletions go.sum
@@ -1811,6 +1811,8 @@ github.com/spf13/viper v1.15.0 h1:js3yy885G8xwJa6iOISGFwd+qlUo5AvyXb7CiihdtiU=
 github.com/spf13/viper v1.15.0/go.mod h1:fFcTBJxvhhzSJiZy8n+PeW6t8l+KeT/uTARa0jHOQLA=
 github.com/spotinst/spotinst-sdk-go v1.129.0 h1:1HuySAZ0LuBTmPWGa2I1c6CHx8j+mnrf7B475F2Ub9o=
 github.com/spotinst/spotinst-sdk-go v1.129.0/go.mod h1:C6mrT7+mqOgPyabacjyYTvilu8Xm96mvTvrZQhj99WI=
+github.com/spotinst/spotinst-sdk-go v1.149.0 h1:mg5srf81kTy7mqPJDm8epWDopOnTqP66j4X9I3o4OxE=
+github.com/spotinst/spotinst-sdk-go v1.149.0/go.mod h1:Ku9c4p+kRWnQqmXkzGcTMHLcQKgLHrQZISxeKY7mPqE=
 github.com/src-d/gcfg v1.4.0/go.mod h1:p/UMsR43ujA89BJY9duynAwIpvqEujIH/jFlfL7jWoI=
 github.com/ssgreg/nlreturn/v2 v2.2.1 h1:X4XDI7jstt3ySqGU86YGAURbxw3oTDPK9sPEi6YEwQ0=
 github.com/ssgreg/nlreturn/v2 v2.2.1/go.mod h1:E/iiPB78hV7Szg2YfRgyIrk1AD6JVMTRkkxBiELzh2I=
17 changes: 17 additions & 0 deletions pkg/actions/cluster/delete.go
@@ -25,6 +25,7 @@ import (
 	"github.com/weaveworks/eksctl/pkg/elb"
 	"github.com/weaveworks/eksctl/pkg/fargate"
 	"github.com/weaveworks/eksctl/pkg/kubernetes"
+	"github.com/weaveworks/eksctl/pkg/spot"
 	ssh "github.com/weaveworks/eksctl/pkg/ssh/client"
 	"github.com/weaveworks/eksctl/pkg/utils/apierrors"
 	"github.com/weaveworks/eksctl/pkg/utils/kubeconfig"
@@ -65,6 +66,22 @@ func deleteSharedResources(ctx context.Context, cfg *api.ClusterConfig, ctl *eks
 			return err
 		}
 	}
+
+	// Spot Ocean.
+	{
+		// List all nodegroup stacks.
+		stacks, err := stackManager.ListNodeGroupStacks(ctx)
+		if err != nil {
+			return err
+		}
+
+		// Execute pre-delete actions.
+		if err := spot.RunPreDelete(ctx, ctl.AWSProvider, cfg, cfg.NodeGroups,
+			stacks, spot.NewAlwaysFilter(), false, 0, false); err != nil {
+			return err
+		}
+	}
+
 	return nil
 }

35 changes: 25 additions & 10 deletions pkg/actions/nodegroup/create.go
@@ -5,6 +5,7 @@ import (
 	"fmt"
 	"io"

+	"github.com/aws/amazon-ec2-instance-selector/v2/pkg/selector"
 	"github.com/kris-nova/logger"
 	"github.com/pkg/errors"

@@ -232,19 +233,33 @@ func (m *Manager) nodeCreationTasks(ctx context.Context, isOwnedCluster bool) er
 		vpcImporter = vpc.NewSpecConfigImporter(*m.ctl.Status.ClusterInfo.Cluster.ResourcesVpcConfig.ClusterSecurityGroupId, cfg.VPC)
 	}

-	allNodeGroupTasks := &tasks.TaskTree{
-		Parallel: true,
-	}
-	nodeGroupTasks := m.stackManager.NewUnmanagedNodeGroupTask(ctx, cfg.NodeGroups, !awsNodeUsesIRSA, vpcImporter)
-	if nodeGroupTasks.Len() > 0 {
-		allNodeGroupTasks.Append(nodeGroupTasks)
+	nodeGroupTasks, err := m.stackManager.NewNodeGroupTask(ctx, cfg.NodeGroups, cfg.ManagedNodeGroups, !awsNodeUsesIRSA, vpcImporter)
+	if err != nil {
+		return fmt.Errorf("failed to create nodegroup tasks: %v", err)
 	}
-	managedTasks := m.stackManager.NewManagedNodeGroupTask(ctx, cfg.ManagedNodeGroups, !awsNodeUsesIRSA, vpcImporter)
-	if managedTasks.Len() > 0 {
-		allNodeGroupTasks.Append(managedTasks)
+
+	// Spot Ocean.
+	{
+		for _, ng := range cfg.NodeGroups {
+			if ng.Name != api.SpotOceanClusterNodeGroupName {
+				continue
+			}
+
+			logger.Debug("ocean: normalizing cluster nodegroup")
+
+			instanceSelector, err := selector.New(ctx, m.ctl.AWSProvider.AWSConfig())
+			if err != nil {
+				return fmt.Errorf("ocean: failed to create instance selector: %v", err)
+			}
+
+			svc := eks.NewNodeGroupService(m.ctl.AWSProvider, instanceSelector, nil)
+			if err := svc.Normalize(ctx, []api.NodePool{ng}, cfg); err != nil {
+				return fmt.Errorf("ocean: failed to normalize cluster nodegroup: %v", err)
+			}
+		}
 	}

-	taskTree.Append(allNodeGroupTasks)
+	taskTree.Append(nodeGroupTasks)
 	return eks.DoAllNodegroupStackTasks(taskTree, meta.Region, meta.Name)
 }
