Drive terraform from ansible inventory #240

Draft
wants to merge 57 commits into
base: main

sjpb
Collaborator

@sjpb sjpb commented Dec 1, 2022

Release Notes

Define infrastructure using Ansible inventory variables instead of Terraform variables:

  • Infrastructure is defined by new inventory vars cluster_* and node_* in an environments/<environment>/inventory/cluster.yml inventory file. See ansible/roles/terraform/README.md for available variables.
  • Infrastructure is provisioned using a new playbook: ansible-playbook ansible/provision.yml

New/changed functionality

  • Any node_*-prefixed var can vary per node, so it can be defined using groupvars (for all or for specific groups) and/or hostvars, allowing far more flexibility in node/group definition.
  • More flexible definition of node names, which may now also use host ranges.
  • Support for multiple network interfaces per node.
  • Instance hostnames are now FQDNs.
  • The default Terraform templates add an instance_id variable to hosts, to support operations on specific instances even when multiple instances have identical hostnames.
  • TODO: what else?
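
As a rough illustration of the points above, an inventory file might look something like the sketch below. Only the cluster_* and node_* prefixes and the host-range syntax come from this PR; the specific variable names (cluster_name, node_flavor), their values and the group layout are hypothetical, so see ansible/roles/terraform/README.md for the variables that are actually supported.

```yaml
# environments/<environment>/inventory/cluster.yml (illustrative sketch only)
all:
  vars:
    cluster_name: demo               # hypothetical cluster_* variable
    node_flavor: general.small       # hypothetical node_* default for every node
  hosts:
    control:
      node_flavor: control.medium    # hostvar override for a single node
  children:
    compute:
      vars:
        node_flavor: compute.large   # groupvar override for the whole group
      hosts:
        compute-[0:3]:               # host range, inclusive: compute-0 ... compute-3
```

Here the same node_* variable is set at three levels (all, group, host), relying on normal Ansible variable precedence to pick the most specific value for each node.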

Configuration

  • If creating a new environment with cookiecutter, a file environments/<environment>/inventory/cluster.yml will be created containing an example infrastructure definition. This should be modified as required.

  • Defaults are defined in environments/common/inventory/cluster.yml. Note that this is lower-precedence than all groupvars.

  • If necessary, the default Terraform templates can also be replaced or extended with environment-specific ones. See TODO:

  • TODO: Add skeleton cluster config file.

Upgrading

Note that this functionality is opt-in; any current terraform in e.g. environments/<environment>/terraform/ will not be replaced unless the ansible/provision.yml playbook is run. To "upgrade" a cluster using the previous Terraform:

  1. Delete all Terraform templates TODO NOT the state files.
  2. Copy the skeleton file TODO to environments/<environment>/inventory/cluster.yml
  3. Modify inventory variables and/or Terraform templates
  4. Run the ansible/provision.yml playbook and cancel the apply if Terraform says there will be changes.
  5. Repeat steps 3 and 4 until Terraform reports no changes.

Design Notes

  1. Currently the "inputs" to the cluster are split between TF and ansible variables. This PR drives TF from ansible, so ansible is the "single source of truth".
  2. This design uses an actual inventory file to define the infrastructure, meaning inventory hosts are defined before provisioning. This means:
    a. groupvars and hostvars can be used when templating ansible, as opposed to the CaaS approach where only all groupvars are available from the default localhost.
    b. With some tweaks to the stackhpc.openhpc role, environment-specific image builds will be possible without having deployed a cluster (as e.g. the Slurm control hostname is already known).
    c. Ansible host patterns can be used to define node names, e.g. compute-[0:10].
  3. inventory_hostnames are now short names (e.g. control) which do not contain the cluster name. Actual hostnames are FQDNs which include the cluster name. This change is rolled through to ansible/roles/etc_hosts and ansible/adhoc/rebuild.yml. This maintains the current (pervasive!) assumption that inventory_hostnames are resolvable names.
  4. The community.general.terraform module is wrapped to (by default) require user confirmation before making changes to infra.
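
For item 4, one way such a confirmation wrapper could be built is sketched below. This is not necessarily how the role in this PR does it: the variable terraform_project_path and the plan file name are hypothetical, and the approach shown (save a plan, prompt, then apply that plan) is just one plausible pattern using the module's state: planned and plan_file options.

```yaml
# Illustrative tasks only; not the wrapper shipped in this PR.
- name: Write a Terraform plan without applying it
  community.general.terraform:
    project_path: "{{ terraform_project_path }}"               # hypothetical variable
    state: planned
    plan_file: "{{ terraform_project_path }}/cluster.tfplan"   # hypothetical file name

- name: Ask for confirmation before changing infrastructure
  ansible.builtin.pause:
    prompt: "Apply the Terraform plan? Type 'yes' to continue"
  register: tf_confirm

- name: Apply the saved plan only if confirmed
  community.general.terraform:
    project_path: "{{ terraform_project_path }}"
    state: present
    plan_file: "{{ terraform_project_path }}/cluster.tfplan"
  when: tf_confirm.user_input | default('') == 'yes'
```

Applying the saved plan file, rather than re-planning, means the changes made are the ones the user was asked to confirm.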

TODO:

  • Remove the terraform in environments/skeleton/{{cookiecutter.environment}}/ - leaving it in currently to make merges easier.

@sjpb sjpb added the no-ci Don't run CI on this PR label Feb 3, 2023
@sjpb sjpb removed the no-ci Don't run CI on this PR label Feb 7, 2023
1 participant