
Drive terraform from ansible inventory #240

Draft · wants to merge 57 commits into base: main

Changes from all commits (57 commits)
cee5da4
wip - adding provision before switching
sjpb Nov 23, 2022
fcc2834
working TF templates - inventory/partitions probably not right
sjpb Nov 23, 2022
39bfa2c
add simple inventory templating
sjpb Nov 24, 2022
472f50b
remove duplicate basic_user definition in arcus
sjpb Nov 24, 2022
bf3c409
Merge branch 'feat/remove-cloud-init' into refactor/templatetf2
sjpb Nov 24, 2022
fb5cb9f
remove unneeded/dead cloud_init groupvars symlink from arcus
sjpb Nov 25, 2022
fb09911
add tf_autoapprove
sjpb Nov 30, 2022
95483b3
use inventory_hostname.cluster_name.tld as hostname
sjpb Nov 30, 2022
7af1300
autogenerate cluster-prefixed groups for partitions
sjpb Nov 30, 2022
63092b8
make OOD proxying more flexible for compute nodenames with autogenera…
sjpb Dec 1, 2022
8d6bc2e
revert arcus nodenames to original pattern
sjpb Dec 1, 2022
be7b812
tidy up cluster.yml files
sjpb Dec 1, 2022
487813d
update CI workflow for ansible-driven TF
sjpb Dec 1, 2022
b54644f
bump CI setup-terraform action version
sjpb Dec 1, 2022
175161e
don't ask for user confirmation on provisioning in CI
sjpb Dec 14, 2022
70422f6
Merge branch 'main' into refactor/templatef3
sjpb Dec 14, 2022
a9becff
debug CI: disable setup-terraform
sjpb Dec 14, 2022
7ad999a
remove unneeded cloud-init environment directory
sjpb Dec 15, 2022
a146783
use sd* prefixed devices for volumes on arcus/CI
sjpb Dec 15, 2022
792d4d7
fix CI slurm checks now clustername not in slurm nodename
sjpb Dec 15, 2022
577f115
make partition order & hence default partition match main branch
sjpb Dec 15, 2022
4cc2287
make rebuild adhoc support inventory_hostname != instance name
sjpb Dec 16, 2022
baf8272
fix slurm-driven rebuild for 'short' nodenames
sjpb Dec 16, 2022
c6fcb56
remove duplicate/wrong extension tf template file
sjpb Jan 17, 2023
f5c8120
Merge branch 'main' into refactor/templatef3
sjpb Jan 17, 2023
21d3a1c
add node_ and cluster_ prefixes to tf-templated vars
sjpb Jan 17, 2023
3d90921
make openhpc_slurm_partitions groups templating stable
sjpb Jan 17, 2023
00bef16
fix arcus/CI image (from main branch)
sjpb Jan 17, 2023
f8e6455
move provision playbook to top level
sjpb Jan 17, 2023
52510e8
add instance_id to host vars (allows use from rebuild ad-hoc)
sjpb Jan 17, 2023
763b08c
fix CI
sjpb Jan 17, 2023
f7ae629
define node_fqdn for brevity
sjpb Jan 17, 2023
32ca35c
fix templating of network + subnet
sjpb Jan 17, 2023
b137c94
support multiple interfaces
sjpb Jan 18, 2023
295fe8a
only create per-cluster security-group TF objects
sjpb Jan 19, 2023
8a5c3d4
make node_volumes autogenerate appropriate userdata
sjpb Jan 19, 2023
0aaf136
minor tweaks to TF for clarity
sjpb Feb 3, 2023
ff3c440
support volume type
sjpb Feb 3, 2023
f6743dd
support cluster_ssh_keys
sjpb Feb 3, 2023
8cbeffd
add node_flavor_name and node_flavor_id
sjpb Feb 3, 2023
b356f05
add support for FIPs
sjpb Feb 3, 2023
5e00a6a
add auth scope data resource
sjpb Feb 3, 2023
c50d475
bugfix node TF template
sjpb Feb 3, 2023
0c4fabf
support either node_image_id or node_image_name
sjpb Feb 3, 2023
132ec99
add cluster-specific security groups + rules
sjpb Feb 3, 2023
2a5c41e
disable workflow for PRs tagged 'no-ci'
sjpb Feb 3, 2023
3a1490a
add comments on default cluster vars
sjpb Feb 3, 2023
3e88e10
remove unneeded tf_template_path
sjpb Feb 3, 2023
c7ec87d
remove identity scope - CaaS-specific
sjpb Feb 3, 2023
5cf07cf
add indentation in hosts template
sjpb Feb 3, 2023
733441c
produce yaml inventory hosts file
sjpb Feb 3, 2023
7718e7b
export entire network info in inventory host file
sjpb Feb 3, 2023
19557dd
fix no-ci logic
sjpb Feb 3, 2023
9e57904
cleanup cluster.yml definitions
sjpb Feb 3, 2023
6a47b7f
add terraform role README
sjpb Feb 3, 2023
9e77a95
Merge branch 'main' into refactor/templatef3
sjpb Feb 7, 2023
916e313
change fqdn->node_fqnd and tld->cluster_domain_suffix
sjpb Feb 1, 2023
20 changes: 10 additions & 10 deletions .github/workflows/stackhpc.yml
@@ -8,6 +8,7 @@ on:
pull_request:
jobs:
openstack:
if: ${{ github.event.label.name != 'no-ci' }}
name: openstack-ci-${{ matrix.cloud }}
strategy:
matrix:
@@ -19,7 +20,7 @@ jobs:
env:
ANSIBLE_FORCE_COLOR: True
OS_CLOUD: openstack
TF_VAR_cluster_name: ci${{ github.run_id }}
CI_CLUSTER_NAME: ci${{ github.run_id }}
steps:
- uses: actions/checkout@v2

@@ -39,12 +40,12 @@ jobs:
- name: Install ansible etc
run: dev/setup-env.sh

- name: Install terraform
uses: hashicorp/setup-terraform@v1
# - name: Install terraform
# uses: hashicorp/setup-terraform@v2

- name: Initialise terraform
run: terraform init
working-directory: ${{ github.workspace }}/environments/${{ matrix.cloud }}/terraform
# - name: Initialise terraform
# run: terraform init
# working-directory: ${{ github.workspace }}/environments/${{ matrix.cloud }}/terraform

- name: Write clouds.yaml
run: |
@@ -63,13 +64,12 @@ jobs:
env:
TESTUSER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}

- name: Provision servers
- name: Provision cluster
id: provision_servers
run: |
. venv/bin/activate
. environments/${{ matrix.cloud }}/activate
cd $APPLIANCES_ENVIRONMENT_ROOT/terraform
terraform apply -auto-approve
ansible-playbook ansible/provision.yml -e tf_autoapprove=yes

- name: Get server provisioning failure messages
id: provision_failure
@@ -156,7 +156,7 @@ jobs:
run: |
. venv/bin/activate
. environments/${{ matrix.cloud }}/activate
ansible login -v -a "sudo scontrol reboot ASAP nextstate=RESUME reason='rebuild image:${{ steps.packer_build.outputs.NEW_COMPUTE_IMAGE_ID }}' ${TF_VAR_cluster_name}-compute-[0-3]"
ansible login -v -a "sudo scontrol reboot ASAP nextstate=RESUME reason='rebuild image:${{ steps.packer_build.outputs.NEW_COMPUTE_IMAGE_ID }}' compute-[0-3]"
ansible compute -m wait_for_connection -a 'delay=60 timeout=600' # delay allows node to go down
ansible-playbook -v ansible/ci/check_slurm.yml

4 changes: 3 additions & 1 deletion ansible/.gitignore
@@ -33,4 +33,6 @@ roles/*
!roles/mysql/
!roles/mysql/**
!roles/systemd/
!roles/systemd/**
!roles/systemd/**
!roles/terraform/
!roles/terraform/**
4 changes: 3 additions & 1 deletion ansible/adhoc/rebuild.yml
@@ -13,6 +13,8 @@
become: no
gather_facts: no
tasks:
- command: "openstack server rebuild {{ instance_id | default(inventory_hostname) }}{% if rebuild_image is defined %} --image {{ rebuild_image }}{% endif %}"
- command: "openstack server rebuild {{ instance_id | default(instance_name) }}{% if rebuild_image is defined %} --image {{ rebuild_image }}{% endif %}"
delegate_to: localhost
vars:
instance_name: "{{ node_fqdn if cluster_domain_suffix is defined else inventory_hostname }}"
- wait_for_connection:
4 changes: 2 additions & 2 deletions ansible/ci/check_slurm.yml
@@ -21,5 +21,5 @@
<end>
vars:
expected_sinfo:
- "{{ openhpc_cluster_name }}-compute-[0-1] small* up 60-00:00:00 2 idle"
- "{{ openhpc_cluster_name }}-compute-[2-3] extra up 60-00:00:00 2 idle"
- "compute-[0-1] small* up 60-00:00:00 2 idle"
- "compute-[2-3] extra up 60-00:00:00 2 idle"
15 changes: 15 additions & 0 deletions ansible/provision.yml
@@ -0,0 +1,15 @@
- hosts: all
become: no
gather_facts: no
tasks:
- import_role:
name: terraform
tasks_from: template.yml

- hosts: localhost
become: no
gather_facts: no
tasks:
- import_role:
name: terraform
tasks_from: apply.yml
4 changes: 2 additions & 2 deletions ansible/roles/etc_hosts/templates/hosts.j2
@@ -1,6 +1,6 @@
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

{% for hostname in groups['etc_hosts'] | sort -%}
{{ hostvars[hostname]['ansible_host'] }} {{ hostname }}
{% for inventory_hostname in groups['etc_hosts'] | sort -%}
{{ hostvars[inventory_hostname]['ansible_host'] }} {% if cluster_domain_suffix is defined %}{{ node_fqdn }}{% endif %} {{ inventory_hostname }}
{% endfor -%}
81 changes: 81 additions & 0 deletions ansible/roles/terraform/README.md
@@ -0,0 +1,81 @@
# terraform

Define infrastructure using Terraform from Ansible inventory:
- Creates Terraform configuration files from Jinja templates using Ansible inventory variables.
- Runs `terraform apply` on the configuration.

Note that by default, interactive user confirmation is required before changes are made to the infrastructure.
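The role is driven from a playbook; this PR adds `ansible/provision.yml`, which runs the templating step on every host and then applies the configuration once from `localhost`:

```yaml
# ansible/provision.yml (as added by this PR)
- hosts: all
  become: no
  gather_facts: no
  tasks:
    - import_role:
        name: terraform
        tasks_from: template.yml

- hosts: localhost
  become: no
  gather_facts: no
  tasks:
    - import_role:
        name: terraform
        tasks_from: apply.yml
```

Passing `-e tf_autoapprove=yes` (as the CI workflow does) skips the interactive confirmation.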

# Role Variables

These are split into three groups:
- `cluster_*` variables define infrastructure parameters which are the same across all nodes. They must be defined as `all` groupvars.
- `node_*` variables define infrastructure parameters which *may* vary between nodes. They may be defined in `all` groupvars, specific groupvars or hostvars.
- `tf_*` variables control operation of the role itself.

## `cluster_*` and `node_*` variables

### Required variables
These have no default values and must be specified for an environment (e.g. in `environments/<environment>/inventory/cluster.yml`):
- Either `cluster_key_pair` (name of an existing OpenStack keypair) **or** `cluster_ssh_keys` (list of public SSH keys to add as authorized keys). Note that the private key for one of these public keys must be available on the deploy host to allow configuration of the cluster.

- `cluster_network_name`: Required unless `node_interfaces` is defined. Name of existing network to use for cluster.
- `node_flavor_name` or `node_flavor_id`: Name or ID of flavor.
- `node_image_name` or `node_image_id`: Name or ID of image.
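A minimal environment definition covering the required variables might look like the following sketch (all values here are illustrative, not defaults):

```yaml
# environments/<environment>/inventory/cluster.yml -- minimal sketch
cluster_key_pair: deploy-keypair      # existing OpenStack keypair (illustrative name)
cluster_network_name: cluster-net     # existing network (illustrative name)
node_flavor_name: general.small       # or node_flavor_id
node_image_name: openhpc-compute      # or node_image_id
```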

### Optional variables
Defaults for these are provided by [environments/common/inventory/cluster.yml](../../../environments/common/inventory/cluster.yml):

- `cluster_name`: Name of cluster. Defaults to the name of the current environment directory.
- `cluster_tld`: Top level domain name for nodes. Default `invalid` which is [guaranteed](https://www.rfc-editor.org/rfc/rfc2606.html#section-2) not to exist in global DNS.

- `cluster_security_groups`: List of security group mappings as follows:
- `name`: Required. Unique name for this security group.
- `description`: Required. Description.
- `rules`: Required. List of security group rule mappings (see [openstack_networking_secgroup_v2](https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs/resources/networking_secgroup_rule_v2) for details) as follows:
- `direction`: Required, `ingress` or `egress`.
- `ethertype`: Optional, default `IPv4`.
- `remote_group`: Optional, name of a security group in `cluster_security_groups`.
- `protocol`: Required if `port` specified.
- `port`: Allowed port number.

Note security groups are cluster-specific (they will be prefixed with `cluster_name`) and any default OpenStack rules will not be applied. The defaults allow:
- All IPv4 traffic between all cluster nodes.
- All outbound IPv4 traffic from all cluster nodes.
- Inbound SSH and HTTPS on `login` nodes.

- `node_interfaces`: List of mappings defining the network interfaces for a node (see [openstack_networking_port_v2](https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs/resources/networking_port_v2) for details) as follows:
- `network_name`: Required. Name of existing network.
- `fixed_ip`: Optional. Mapping defining the fixed/private IP as follows:
- `subnet_name`: Optional, name of subnet to use.
- `ip_address`: Optional, address to use for port.
- `port_security_enabled`: Optional bool. Whether to explicitly enable or disable port security, or use the OpenStack default (usually `true`). If `false`, no security groups may be defined on this interface.
- `security_groups`: Optional, list of names of security groups defined in `cluster_security_groups`. NB: Currently externally-defined security groups cannot be applied.
- `binding`: Optional. Mapping defining port binding information:
- `vnic_type`: Optional. Type of VNIC, as per [openstack_networking_port_v2](https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs/resources/networking_port_v2). Useful for RDMA-capable interfaces or baremetal nodes.
- `profile`: Optional. Custom binding profile mapping (converted to JSON internally). Useful for some RDMA-capable interfaces.

- `node_fqdn`: Fully-qualified domain name of the node. Default `{{ inventory_hostname }}.{{ cluster_name }}.{{ cluster_tld }}`.
- `node_volumes`: List of mappings defining empty OpenStack volumes to create and attach to the node as follows:
- `label`: Required. Label for volume, used to mount it.
- `description`: Optional.
- `size`: Size of the volume in GB.
- `filesystem`: Optional. Type of filesystem, default `ext4`.
- `mount_point`: Required. Path of mount point (will be created if necessary).
- `mount_options`: Optional. Comma-separated string giving mount options as per the fourth (fs_mntops) field in `/etc/fstab`.
The default is to attach two volumes to the `control` node, for user `$HOME` and state information. The sizes of these are defined by `home_volume_size` and `state_volume_size`.
See also `node_volume_device_prefix`.
- `node_volume_device_prefix`: Prefix of path at which volumes will be mounted, default `/dev/vd`. Note if `virtio-scsi` properties are set on the image this should be changed to `/dev/sd`.
- `node_tf_ignore_changes`: List of strings giving Terraform [openstack_compute_instance_v2](https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs/resources/compute_instance_v2) attributes to ignore when calculating changes to instances. See `ignore_changes` for Terraform's [lifecycle meta-argument](https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle).
- `node_user_data`: String of cloud-init userdata. Default is empty for all nodes except `control`, where it defines the filesystems and mounts for `node_volumes`.
- `node_floating_ip_address`: Address of floating IP to attach to a node. NB specifying a particular network is not currently supported.
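As a sketch of how the optional `cluster_*`/`node_*` mappings above fit together (all names and values here are illustrative assumptions, not defaults):

```yaml
cluster_security_groups:
  - name: ssh
    description: Allow inbound SSH
    rules:
      - direction: ingress
        protocol: tcp
        port: 22

node_interfaces:
  - network_name: cluster-net
    fixed_ip:
      subnet_name: cluster-subnet
    security_groups:
      - ssh        # must be defined in cluster_security_groups

node_volumes:
  - label: home
    size: 100                      # GB
    mount_point: /exports/home
    mount_options: defaults,nofail
```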

## `tf_*` variables
Default values for these are provided in the role:
- `tf_project_path`: Optional. Where to output Terraform configuration files. Default `{{ appliances_environment_root }}/terraform/`.
- `tf_cluster_templates`: Optional. List of Jinja templates to template once per cluster, using `localhost`. Default is the in-role files `main.tf.j2`, `network.tf.j2`, `inventory.tf.j2`.
- `tf_host_templates`: Optional. List of Jinja templates to template per-host. Default is the in-role file `node.tf.j2`.
- `tf_autoapprove`: Optional bool. Set true to make infrastructure changes without user confirmation. Default `no`.
8 changes: 8 additions & 0 deletions ansible/roles/terraform/defaults/main.yml
@@ -0,0 +1,8 @@
tf_project_path: "{{ appliances_environment_root }}/terraform/"
tf_cluster_templates: # these are templated once per cluster, on localhost
- main.tf.j2
- network.tf.j2
- inventory.tf.j2
tf_host_templates: # these are templated by each host
- node.tf.j2
tf_autoapprove: false
34 changes: 34 additions & 0 deletions ansible/roles/terraform/tasks/apply.yml
@@ -0,0 +1,34 @@
- name: Create Terraform plan
community.general.terraform:
project_path: "{{ tf_project_path }}"
state: planned
plan_file: tf.plan
force_init: yes
register: _tf_plan

- name: Show Terraform plan
debug:
msg: "{{ _tf_plan.stdout }}"

- name: Confirm Terraform plan execution
pause:
prompt: "Do you want to execute this plan? (Only 'yes' executes)"
register: confirm_plan
when:
- "'No changes. Your infrastructure matches the configuration.' not in _tf_plan.stdout"
- 'not tf_autoapprove | bool'

- name: Execute Terraform plan
community.general.terraform:
project_path: "{{ tf_project_path }}"
state: present
plan_file: tf.plan
force_init: yes
when: |
((not confirm_plan.skipped | default(false)) and (confirm_plan.user_input | bool)) or
(tf_autoapprove | bool )
register: _tf_apply

- debug:
msg: "{{ _tf_apply.stdout }}"
when: "'stdout' in _tf_apply"
14 changes: 14 additions & 0 deletions ansible/roles/terraform/tasks/template.yml
@@ -0,0 +1,14 @@
- name: Template host-independent Terraform configuration files
template:
src: "{{ item }}"
dest: "{{ tf_project_path }}/{{ item | splitext | first }}"
loop: "{{ tf_cluster_templates }}"
run_once: true
delegate_to: localhost

- name: Template host-dependent Terraform configuration files
template:
src: "{{ item }}"
dest: "{{ tf_project_path }}/{{ inventory_hostname }}_{{ item | splitext | first }}"
loop: "{{ tf_host_templates }}"
delegate_to: localhost
24 changes: 24 additions & 0 deletions ansible/roles/terraform/templates/inventory.tf.j2
@@ -0,0 +1,24 @@

#jinja2: lstrip_blocks: "True"

resource "local_file" "hosts" {
content = <<-EOT
# define host addresses:
all:
hosts:
{% for inventory_hostname in groups['all'] | sort %}
{{ inventory_hostname }}:
ansible_host: ${[for n in openstack_compute_instance_v2.{{ inventory_hostname }}.network: n if n.access_network][0].fixed_ip_v4 }
networks:
${indent(16,yamlencode(openstack_compute_instance_v2.{{ inventory_hostname }}.network))}
{% endfor %}

# auto-define groups for Slurm partitions:
{% for part in (openhpc_slurm_partitions | sort(attribute='name') ) %}
{{ cluster_name }}_{{ part.name }}:
children:
{{ part.name }}
{% endfor %}
EOT
filename = "../inventory/hosts"
}
8 changes: 8 additions & 0 deletions ansible/roles/terraform/templates/main.tf.j2
@@ -0,0 +1,8 @@
terraform {
required_version = ">= 0.14"
required_providers {
openstack = {
source = "terraform-provider-openstack/openstack"
}
}
}
51 changes: 51 additions & 0 deletions ansible/roles/terraform/templates/network.tf.j2
@@ -0,0 +1,51 @@
#jinja2: lstrip_blocks: "True"

# NB: This template runs on localhost so has 'all' groupvars

{# Networks/subnets groups are specified as part of a node but aren't actually node-specific (data) resources, so this aggregates them #}

# Networks
{% set unique_network_names = (hostvars | json_query('*.node_interfaces[].network_name') | unique | sort ) %}
{% for network_name in unique_network_names %}
data "openstack_networking_network_v2" "{{ network_name }}" {
name = "{{ network_name }}"
}
{% endfor %}

# Subnets
{% set unique_network_subnet_pairs = (hostvars | json_query('*.node_interfaces[].[network_name, fixed_ip.subnet_name]') | unique | sort ) %}
{% for (network_name, subnet_name) in unique_network_subnet_pairs %}
{% if subnet_name is not none %}
data "openstack_networking_subnet_v2" "{{ network_name }}_{{ subnet_name }}" {
network_id = data.openstack_networking_network_v2.{{ network_name }}.id
name = "{{ subnet_name }}"
}
{% endif %}
{% endfor %}

# Security groups
{% for secgroup in cluster_security_groups %}
resource "openstack_networking_secgroup_v2" "{{ cluster_name }}_{{ secgroup.name }}" {
name = "{{ cluster_name }}_{{ secgroup.name }}"
description = "{{ secgroup.description }}"
delete_default_rules = true
}

{% for secrule in secgroup.rules %}
resource "openstack_networking_secgroup_rule_v2" "{{ cluster_name }}_{{ secgroup.name }}_{{ loop.index0 }}" {
direction = "{{ secrule.direction }}"
ethertype = "{{ secrule.ethertype | default('IPv4') }}"
{% if 'remote_group' in secrule %}
remote_group_id = openstack_networking_secgroup_v2.{{ cluster_name }}_{{ secrule.remote_group }}.id
{% endif %}
{% if 'protocol' in secrule %}
protocol = "{{ secrule.protocol }}"
{% endif %}
{% if 'port' in secrule %}
port_range_min = {{ secrule.port }}
port_range_max = {{ secrule.port }}
{% endif %}
security_group_id = openstack_networking_secgroup_v2.{{ cluster_name }}_{{ secgroup.name }}.id
}
{% endfor %}
{% endfor %}