The GitOps Fleet Management system is a comprehensive solution for managing multiple Kubernetes clusters using ArgoCD. It provides a flexible approach to cluster management with support for both centralized and distributed (agent) deployment models.
This mono-repo establishes a GitOps-based fleet management system using ArgoCD, enabling both centralized and distributed management of multiple Kubernetes clusters. The repository is designed so that it can be split up later to match the responsibility model and team structure.
The application-sets chart is the foundation of the fleet management system. It serves as a wrapper/template generator for creating multiple ArgoCD ApplicationSets based on different configurations.
Key Features:
- Dynamic value merging from component files
- Support for different release types and versions
- Flexible deployment models (centralized or distributed)
- Hierarchical configuration with tenant/environment/cluster overrides
The fleet management system is organized into two main components:
This directory contains the core ApplicationSets for initializing the hub cluster:
- addons.yaml:
  - Manages cluster add-on deployment across the fleet
  - Uses the application-sets Helm chart wrapper
  - Deploys core controllers for the 'Agent' model (ArgoCD, ALB, etc.)
  - References configurations in addons/bootstrap/default to override default values
- resources.yaml:
  - Handles fleet-wide resource deployments
  - Follows a similar structure to addons.yaml
  - Supports separate team management
  - Independent resource lifecycle
- fleetv2.yaml:
  - Manages the spoke cluster registration process
  - References fleet-bootstrap configurations
  - Controls cluster onboarding
- monitoring.yaml:
  - An example of separating ApplicationSets by team
  - Configures monitoring solutions across the fleet
The bootstrap process uses the versions/applicationSets.yaml file to determine which version of the application-sets chart to deploy for each component (addons, resources, monitoring, fleet).
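The exact schema of that file lives in the repository, but as a rough sketch (the field names below are assumptions for illustration, not copied from the repo), it maps each component group to a default release and optional named releases of the application-sets chart:

```yaml
# Illustrative sketch of versions/applicationSets.yaml; field names are assumptions
applicationSets:
  addons:
    defaultRelease:
      chartVersion: "1.2.0"        # application-sets chart version pulled from ECR
    releases:
      release1:
        chartVersion: "1.3.0"      # named release targeted at selected clusters
  resources:
    defaultRelease:
      chartVersion: "1.2.0"
  monitoring:
    defaultRelease:
      chartVersion: "1.1.0"
  fleet:
    defaultRelease:
      chartVersion: "1.2.0"
useVersionSelectors: true          # when enabled, version labels are added to cluster secrets
```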
This directory handles the registration and initialization of spoke clusters:
- fleet-hub-external-secrets.yaml:
  - Creates external secrets on the hub cluster
  - Pulls spoke registration data from AWS Secrets Manager
  - Configures tenant repository connections
  - References fleet-members configurations
- fleet-members/:
  - Spoke cluster configurations
  - Deployment model selection
  - Tenant and environment settings
  - Repository access configurations
  - AWS Secrets Manager references
- fleet-hub-secret-store.yaml:
  - Hub cluster secret store configuration
  - AWS Secrets Manager connection settings
  - Core secrets management
- fleet-members-bootstrap.yaml:
  - Spoke cluster initialization for the distributed setup
  - Triggers members-init-v2 configurations
  - Enabled for remote ArgoCD deployments
- members-application-sets/:
  - Spoke cluster ApplicationSet templates
  - Used in the distributed ArgoCD setup
  - Referenced by spoke ArgoCD instances
  - Resource deployment definitions
- members-init-v2/:
  - Spoke initialization configurations
  - ApplicationSet deployment settings
  - Remote ArgoCD management setup
In this model, the hub cluster directly manages all spoke clusters. This is configured by setting use_remote_argo: "false" in the spoke cluster configuration.
Characteristics:
- Hub cluster's ArgoCD instance manages all resources on spoke clusters
- Single point of control and visibility
- Suitable for smaller fleets or environments requiring centralized governance
In this model, the hub deploys ArgoCD to spoke clusters, which then manage their own resources. This is configured by setting use_remote_argo: "true" in the spoke cluster configuration.
Characteristics:
- Improved scalability for large fleets
- Spoke clusters can operate independently
- Reduced load on the hub cluster
- Configurable with the enable_remote_resources and enable_remote_addons flags
The system uses a hierarchical approach to configuration:
- Default values: Base configurations for all clusters
- Tenant values: Configurations specific to a tenant
- Environment values: Configurations specific to an environment (dev, prod, etc.)
- Cluster values: Configurations specific to individual clusters
This hierarchy allows for flexible configuration management while maintaining consistency across the fleet.
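As a quick illustration (the paths follow the hierarchy above, but the keys and values are made up), a system-wide default can be partially overridden for a single cluster; deeper levels win on conflicting keys while everything else is inherited:

```yaml
# addons/defaults/addons/values.yaml (system-wide default, illustrative content)
metricsServer:
  enabled: true
  replicas: 1

# addons/tenants/tenant1/clusters/spoke-dev-workload1/addons/values.yaml (cluster override, illustrative)
metricsServer:
  replicas: 2    # overrides only this key; 'enabled: true' is inherited from the default layer
```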
The system supports multiple release types through the versions/applicationSets.yaml file:
- Default release: The standard version to deploy when no specific version is requested
- Named releases: Specific versions that can be deployed to selected clusters
The useVersionSelectors flag enables version-specific deployments by adding version labels to cluster secrets.
- Store spoke credentials in AWS Secrets Manager
- Create a fleet member configuration in fleet-bootstrap/fleet-members/
- The hub cluster's external-secrets operator pulls the credentials (see the sketch below)
- Based on the configuration flags (use_remote_argo, enable_remote_resources, etc.), the appropriate resources are deployed to the spoke cluster
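The ExternalSecret itself is generated by the chart, but conceptually it looks something like the sketch below: it reads the registration secret from AWS Secrets Manager and templates it into an ArgoCD cluster secret. The store name, labels, and template fields here are assumptions for illustration:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: spoke-dev-workload1          # illustrative; one per fleet member
  namespace: argocd
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: hub-secret-store           # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: spoke-dev-workload1
    template:
      metadata:
        labels:
          argocd.argoproj.io/secret-type: cluster   # makes ArgoCD treat it as a cluster secret
          tenant: tenant1
          environment: dev
          addonsRelease: release1                   # labels later matched by ApplicationSet selectors
  dataFrom:
    - extract:
        key: hub-cluster/spoke-dev-workload1        # registration secret in AWS Secrets Manager
```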
When using the distributed model (use_remote_argo: "true"):
- ArgoCD is deployed to the spoke cluster
- External secrets operator is deployed to the spoke cluster
- The spoke cluster's ArgoCD instance is configured to connect to the Git repositories
- ApplicationSets from members-application-sets/ are deployed to the spoke cluster
- The spoke cluster begins managing its own resources based on the enable_remote_resources and enable_remote_addons flags
The _import_values.tpl template dynamically merges values from component files based on the mergeValues configuration. This allows for modular configuration of different components (addons, resources, monitoring, etc.) while maintaining a single source of truth.
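The real template ships with the chart; the following is only a rough sketch of the idea, assuming a mergeValues list such as [addons, resources, monitoring] and per-component value files bundled with the chart (the file layout and helper name are assumptions):

```yaml
{{- /* Rough sketch of the dynamic merge idea; the actual _import_values.tpl differs */ -}}
{{- define "application-sets.importValues" -}}
{{- $merged := dict -}}
{{- range $component := .Values.mergeValues }}
{{- /* assumed file layout: one values file per component */ -}}
{{- $file := printf "components/%s.yaml" $component -}}
{{- $overrides := fromYaml ($.Files.Get $file) | default dict -}}
{{- /* later files win on conflicting keys */ -}}
{{- $merged = mergeOverwrite $merged $overrides -}}
{{- end }}
{{- toYaml $merged -}}
{{- end -}}
```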
The system uses several types of selectors to determine which resources to deploy:
- Global selectors: Applied to all ApplicationSets
- Component-specific selectors: Applied to specific components (e.g., addons, resources)
- Version selectors: Used to deploy specific versions of components
The selector logic in addons.yaml handles three cases, sketched below:
- When useSelectors is true, include all group releases
- When a specific group release is defined, only include matching releases
- Otherwise, use the default release
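A rough rendering of that branching in Helm template form (the variable and label names here are illustrative, not the chart's actual code):

```yaml
{{- /* Illustrative only; see addons.yaml in the chart for the real logic */ -}}
{{- if .Values.useSelectors }}
{{- /* Case 1: emit a selector per named group release */ -}}
{{- range $releaseName, $release := .Values.releases }}
selector:
  matchLabels:
    addonsRelease: {{ $releaseName }}
{{- end }}
{{- else if .Values.groupRelease }}
{{- /* Case 2: only clusters pinned to the requested group release */}}
selector:
  matchLabels:
    addonsRelease: {{ .Values.groupRelease }}
{{- else }}
{{- /* Case 3: fall back to the default release */}}
selector:
  matchLabels:
    addonsRelease: default
{{- end }}
```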
- Modular Configuration: Use the hierarchical configuration approach to maintain consistency while allowing for customization
- Version Control: Manage releases through the versions/applicationSets.yaml file
- Deployment Model Selection: Choose the appropriate deployment model based on fleet size and governance requirements
- Secret Management: Use AWS Secrets Manager and external-secrets operator for secure credential management
- Tenant Isolation: Use the tenant structure to isolate configurations between different teams or environments
- Shared cluster add-ons
- Common controllers
- Reusable components
- Default configurations
- Fleet-wide Helm charts
- Shared configurations
- Reusable templates
- Common resource definitions
- Shared configurations
- Fleet-wide settings
- External-secrets operator deployed
- Fleet bootstrap components configured
- Secret store established for AWS Secrets Manager access (see the sketch below)
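A minimal sketch of that secret store, assuming the external-secrets operator runs with an IAM role granted via EKS Pod Identity (names and region are placeholders):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: hub-secret-store            # illustrative name, referenced by the fleet ExternalSecrets
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2             # placeholder; use the hub cluster's region
      # No explicit auth block: with EKS Pod Identity (or IRSA) the operator's
      # service account resolves credentials from its associated IAM role.
```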
# Example spoke cluster configuration
tenant: "eks-auto"
clusterName: "spoke-auto-workload1"
secretManagerSecretName: "hub-cluster/spoke-workload1"
# Management Model Selection
use_remote_argo: "false"          # Deploy ArgoCD to the spoke (distributed model); disabled here
enable_remote_resources: "false"  # Let the spoke manage its own resources; disabled here
enable_remote_addons: "false"     # Let the spoke manage its own add-ons; disabled here
use_external_secrets: "false"     # Deploy external-secrets on the spoke; disabled here
push_argo_secrets: "false"        # Push ArgoCD secrets to the spoke; disabled here

The addons and resources folders serve two critical purposes in the GitOps Fleet Management system:
- Override Default Values: They provide a hierarchical structure to override default values from the application-sets Helm chart that is versioned and pushed to ECR.
- Define Cluster Labels: They contain configuration files that define labels applied to cluster secrets, which control what components are deployed to each cluster.
addons/
├── bootstrap/ # Bootstrap configurations
├── control-plane/ # Control plane specific configurations
├── defaults/ # Default configurations for all clusters
│ ├── addons/ # Default addon configurations
│ ├── fleet/ # Default fleet configurations
│ ├── monitoring/ # Default monitoring configurations
│ └── application-sets/ # Default application-sets configurations
└── tenants/ # Tenant-specific configurations
├── defaults/ # Default configurations for all tenants
│ ├── addons/
│ ├── fleet/
│ │ └── fleet-secret/
│ │ └── values.yaml # Defines cluster labels and releases
│ └── monitoring/
└── tenant1/ # Specific tenant configurations
The file addons/tenants/defaults/fleet/fleet-secret/values.yaml defines which releases and components will be deployed on clusters:
externalSecret:
  labels:
    fleetRelease: default
    addonsRelease: release1
    monitoringRelease: default
    enable_metrics_server: "true"
    enable_external_secrets: "true"

These labels are applied to the cluster secrets and determine:
- Which version of each component group (fleet, addons, monitoring) to deploy
- Which specific components to enable (metrics-server, external-secrets, etc.)
The fleet-hub-external-secrets.yaml ApplicationSet in the fleet/fleet-bootstrap folder uses these configurations to:
- Read the cluster registration information from AWS Secrets Manager
- Apply the labels defined in the values.yaml files
- Create ArgoCD secrets for cluster registration
- Configure the deployment model (centralized or distributed) based on the labels
The ApplicationSet uses a matrix generator to:
- Find all registered clusters in the fleet-members directory
- Read the version information from fleetSecrets.yaml
- Combine this information to create properly labeled secrets
The labels on these secrets are then used by other ApplicationSets (addons.yaml, resources.yaml, etc.) to determine what to deploy to each cluster based on selector matching.
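For example, an addons ApplicationSet rendered for release1 would typically carry a cluster generator selector along these lines (a sketch of the pattern, not the chart's exact output):

```yaml
generators:
  - clusters:
      selector:
        matchLabels:
          addonsRelease: release1          # only clusters whose secret carries this release label
          enable_metrics_server: "true"    # and that have the component switched on
```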
The GitOps Fleet Management system uses a configuration hierarchy that allows for flexible overrides at multiple levels:
- Cluster-specific configurations
  - Path: addons/tenants/{tenant}/clusters/{cluster-name}/{component}
  - Most specific configurations; apply to a single cluster
- Environment-specific configurations
  - Path: addons/tenants/{tenant}/environments/{environment}/{component}
  - Configurations shared by all clusters in a specific environment
- Tenant-specific configurations
  - Path: addons/tenants/{tenant}/defaults/{component}
  - Configurations shared by all clusters belonging to a tenant
- Global tenant defaults
  - Path: addons/tenants/defaults/{component}
  - Default configurations for all tenants
- System-wide defaults
  - Path: addons/defaults/{component}
  - Base configurations for the entire system
When deploying components to a cluster, the system resolves values by merging files from the hierarchy above. This is handled in the ApplicationSet templates, which include value files from each level in the hierarchy.
For example, in the fleet-hub-external-secrets.yaml ApplicationSet:
valueFiles:
  - $values/{{ .metadata.annotations.addons_repo_basepath }}/bootstrap/defaults/{{.values.applicationSetGroup}}.yaml
  - $values/{{ .metadata.annotations.addons_repo_basepath }}/{{ .metadata.labels.tenant }}/defaults/{{ .values.chartName }}/{{.values.applicationSetGroup}}.yaml
  - $values/{{ .metadata.annotations.addons_repo_basepath }}/{{ .metadata.labels.tenant }}/clusters/{{ .name }}/{{ .values.chartName }}/{{.values.applicationSetGroup}}.yaml
  - $values/{{ .metadata.annotations.addons_repo_basepath }}/{{ .metadata.labels.tenant }}/environments/{{ .metadata.labels.environment }}/{{ .values.chartName }}/{{.values.applicationSetGroup}}.yaml

This approach allows for:
- Common configurations to be defined once at a higher level
- Specific overrides to be applied only where needed
- Consistent base configurations across the fleet
- Flexibility to customize at any level of the hierarchy
The following steps outline the process of adding a new cluster to the GitOps Fleet Management system:
Create a new YAML file in the appropriate tenant directory:
fleet/fleet-bootstrap/fleet-members/{hub-cluster}/{spoke-cluster}.yaml
tenant: "tenant1"
clusterName: "spoke-dev-workload1"
# If the secret lives in a remote account, use the full ARN; if it is in the same account, the secret name alone is enough
secretManagerSecretName: "arn:aws:secretsmanager:region:account:secret:hub-cluster/spoke-dev-workload1"
# GitHub repository access (if needed)
githubSecret: "true"
githubSecretName: "github/gitops/app-creds"
# Deployment model selection
use_remote_argo: "true" # Use distributed ArgoCD
enable_remote_resources: "true" # Enable spoke self-management for resources
enable_remote_addons: "true" # Enable spoke self-management for addons
use_fleet_ack: "false" # Don't use fleet ACK controllers
use_argocd_ingress: "false"      # Don't create an ArgoCD ingress

Create or modify values files to customize the cluster's configuration:
# addons/tenants/tenant1/clusters/spoke-dev-workload1/fleet-secret/values.yaml
externalSecret:
  labels:
    fleetRelease: default
    addonsRelease: release1
    monitoringRelease: default
    enable_metrics_server: "true"
    enable_external_secrets: "true"
    enable_aws_load_balancer_controller: "true"

Commit the changes to the Git repository and push to trigger the GitOps workflow.
- The fleet-hub-external-secrets.yaml ApplicationSet detects the new cluster configuration
- It creates an external secret that pulls the cluster's kubeconfig from AWS Secrets Manager
- The external secret creates an ArgoCD cluster secret with the appropriate labels
- Based on the use_remote_argo flag:
  - If false: the hub's ArgoCD begins managing the cluster directly
  - If true: the hub deploys ArgoCD to the spoke cluster, which then manages its own resources
Based on the labels applied to the cluster secret:
- The appropriate version of each component group (fleet, addons, monitoring) is deployed
- Specific components are enabled or disabled according to the labels
- Configuration values are applied following the hierarchy described earlier
The GitOps Fleet Management system includes Terraform configurations for provisioning both hub and spoke clusters. These configurations handle the initial infrastructure setup before the GitOps processes take over.
The hub cluster is the central management cluster that orchestrates the fleet. Its Terraform configuration:
- Creates the EKS Cluster
  - Sets up the hub cluster with appropriate node groups
  - Configures VPC and networking components
  - Establishes IAM roles and permissions
- Installs Initial Components
  - Deploys ArgoCD as the GitOps engine
  - Sets up the external-secrets operator for secret management
  - Configures repository access using GitHub App credentials
- Establishes GitOps Bridge
  - Creates Kubernetes secrets for Git repository access
  - Bootstraps the initial ApplicationSet that will manage the fleet
  - Points to the fleet repository and bootstrap directory
- Registers the Hub Cluster
  - Creates an AWS Secrets Manager secret for the hub cluster
  - Stores cluster metadata and connection information
  - Sets up IAM roles for ArgoCD to manage spoke clusters
- Configures Pod Identity
  - Sets up AWS IAM roles for service accounts
  - Creates roles for external-secrets, ArgoCD, and other components
  - Stores role ARNs in SSM parameters for cross-account access
Spoke clusters are the workload clusters managed by the hub. Their Terraform configuration:
- Creates the EKS Cluster
  - Sets up the spoke cluster with appropriate node groups
  - Configures VPC and networking components
  - Establishes IAM roles and permissions
- Establishes Hub-Spoke Connection
  - Creates an IAM role that allows the hub's ArgoCD to access the spoke
  - Sets up trust relationships between hub and spoke accounts
  - Configures cross-account access policies
- Registers the Spoke Cluster
  - Creates an AWS Secrets Manager secret containing the spoke's connection information
  - Includes cluster metadata, endpoint, and certificate data
  - Sets up a KMS key for encrypting the secret
  - Configures a resource policy allowing the hub's external-secrets to access the secret
- Optional: Configures Distributed ArgoCD
  - When using the distributed model, sets up IAM roles for the spoke's ArgoCD
  - Configures pod identity for the spoke's components
  - Establishes permissions for the spoke to manage its own resources
The Terraform configurations implement a secure cross-account access model:
- Hub Account
  - ArgoCD uses pod identity to assume a role for managing spoke clusters
  - External-secrets uses pod identity to access secrets in spoke accounts
  - Role ARNs are stored in SSM parameters for reference
- Spoke Accounts
  - Create IAM roles that trust the hub's ArgoCD role
  - Set up resource policies on secrets to allow access from the hub
  - Use KMS encryption for secure secret storage
After Terraform provisions the infrastructure:
- The hub cluster's ArgoCD instance is bootstrapped with an initial ApplicationSet
- This ApplicationSet points to the fleet bootstrap directory in the Git repository (a simplified sketch follows this list)
- ArgoCD applies the configurations from the repository, setting up the fleet management system
- The system transitions from infrastructure-as-code to GitOps for ongoing management
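As a rough illustration of what that bootstrap resolves to, an Argo CD object pointing at the bootstrap directory could look like the following. The repository URL and names are placeholders, and the actual object created by the Terraform GitOps bridge may be an ApplicationSet with additional generators:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fleet-bootstrap                  # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/gitops-fleet.git   # placeholder repository
    targetRevision: main
    path: fleet/bootstrap                # contains addons.yaml, resources.yaml, fleetv2.yaml, ...
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```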
This approach provides a clean separation between initial infrastructure provisioning (Terraform) and ongoing configuration management (GitOps), while ensuring secure cross-account access between hub and spoke clusters.
The complete process for onboarding a new cluster to the GitOps Fleet Management system involves both Terraform infrastructure provisioning and GitOps configuration. Here's the end-to-end workflow:
- Deploy the hub cluster using the Terraform configuration in terraform/hub/
- This installs ArgoCD and bootstraps the initial ApplicationSet
- The hub cluster is registered in AWS Secrets Manager
- IAM roles and pod identity are configured for cross-account access
- Deploy the spoke cluster using the Terraform configuration in terraform/spokes/
- Configure cross-account access between hub and spoke
- Register the spoke cluster in AWS Secrets Manager
- Set up resource policies to allow the hub to access the spoke's secret
- The bootstrapped ApplicationSet on the hub discovers the fleet bootstrap directory
- It applies the configurations from fleet/bootstrap/
- This sets up the core ApplicationSets for addons, resources, and fleet management
- Create a spoke cluster configuration file in fleet/fleet-bootstrap/fleet-members/{hub-cluster}/{spoke-cluster}.yaml
- Configure the deployment model (centralized or distributed) and other settings
- Commit and push the changes to the Git repository
- The fleet-hub-external-secrets.yaml ApplicationSet detects the new configuration
- It creates an external secret that pulls the spoke's connection information from AWS Secrets Manager
- The external secret creates an ArgoCD cluster secret with labels from the configuration
- ArgoCD registers the spoke cluster and begins managing it according to the deployment model
- The hub's ArgoCD directly deploys components to the spoke cluster
- ApplicationSets for addons and resources select the spoke based on its labels
- Components are deployed according to the configuration hierarchy
- The hub deploys ArgoCD to the spoke cluster
- The hub configures the spoke's ArgoCD with repository access
- The spoke's ArgoCD deploys components based on ApplicationSets from members-application-sets/
- The spoke manages its own resources according to the enable_remote_resources and enable_remote_addons flags
- Update values files in the appropriate location in the configuration hierarchy
- Commit and push changes to the Git repository
- ArgoCD automatically applies the changes to the affected clusters
- Update the versions/applicationSets.yaml file to change component versions
- Update cluster labels in addons/tenants/defaults/fleet/fleet-secret/values.yaml or in cluster-specific overrides (see the example below)
- Commit and push changes to the Git repository
- ArgoCD automatically updates components to the specified versions
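For example, rolling a single cluster onto a newer addons release is just a label change in its override file (the release name here is illustrative):

```yaml
# addons/tenants/tenant1/clusters/spoke-dev-workload1/fleet-secret/values.yaml
externalSecret:
  labels:
    addonsRelease: release2    # was release1; the matching ApplicationSets re-render on sync
```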
This end-to-end process combines infrastructure-as-code for initial provisioning with GitOps for ongoing management, providing a scalable and maintainable approach to managing a fleet of Kubernetes clusters.