Remove extra logging receivers and add additional labels to cloud ops config
abbas1902 committed Sep 27, 2024
1 parent 36e226c commit 0e1ba21
Showing 22 changed files with 241 additions and 175 deletions.
49 changes: 49 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,49 @@
+---
+# Copyright 2024 "Google LLC"
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: 'Use pre-commit to validate Pull Request'
+
+# yamllint disable-line rule:truthy
+on:
+  pull_request:
+    types:
+      - edited
+      - opened
+      - labeled
+      - synchronize
+    branches:
+      - master
+      - v5
+
+jobs:
+  pre-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.10'
+          check-latest: true
+          cache: 'pip'
+      - uses: terraform-linters/setup-tflint@v4
+        with:
+          tflint_version: v0.49.0
+      - run: tflint --init
+        env:
+          # https://github.com/terraform-linters/tflint/blob/master/docs/user-guide/plugins.md#avoiding-rate-limiting
+          GITHUB_TOKEN: ${{ github.token }}
+      - uses: pre-commit/[email protected]
+        with:
+          extra_args: --show-diff-on-failure --all-files
52 changes: 20 additions & 32 deletions ansible/roles/cloudagents/templates/ops_agent.yaml.j2
@@ -14,66 +14,54 @@

 logging:
   receivers:
-    slurmdbd:
+    slurm_daemon:
       type: files
       include_paths:
         - /var/log/slurm/slurmdbd.log
-    slurmrestd:
-      type: files
-      include_paths:
-        - /var/log/slurm/slurmrestd.log
-    slurmctld:
-      type: files
-      include_paths:
         - /var/log/slurm/slurmctld.log
-    slurmd:
-      type: files
-      include_paths:
         - /var/log/slurm/slurmd-*.log
-    slurm_resume:
-      type: files
-      include_paths:
-        - /var/log/slurm/resume.log
-    slurm_suspend:
+        - /var/log/slurm/slurmrestd.log
+      record_log_file_path: true
+    slurm:
       type: files
       include_paths:
         - /var/log/slurm/suspend.log
-    slurm_sync:
-      type: files
-      include_paths:
         - /var/log/slurm/slurmsync.log
+        - /var/log/slurm/resume.log
+      record_log_file_path: true
     setup:
       type: files
       include_paths:
         - /slurm/scripts/setup.log
+      record_log_file_path: true
   processors:
     parse_slurmlog:
       type: parse_regex
       field: message
-      regex: "^\[(?<time>\S+)\] (?<message>((?<severity>(fatal|error|verbose|debug[0-9]?)):)?.*)$"
-      #time_key: time
-      #time_format: "%Y-%M-%dT%H:%M:%S.%L"
+      regex: '^\[(?<time>\S+)\] (?<message>((?<severity>(fatal|error|verbose|debug[0-9]?)):)?.*)$'
     parse_slurmlog2:
       type: parse_regex
       field: message
-      regex: "^(?<time>\S+ \S+) (?<message>(?<severity>(CRITICAL|ERROR|WARNING|INFO|DEBUG))(\(\S+\))?:.*)$"
-      #time_key: time
-      #time_format: "%Y-%M-%d %H:%M:%S,%L"
+      regex: '^(?<time>\S+ \S+) (?<message>(?<severity>(CRITICAL|ERROR|WARNING|INFO|DEBUG))(\(\S+\))?:.*)$'
+    add_cluster_info:
+      type: modify_fields
+      fields:
+        labels."cluster_name":
+          static_value: placeholder_clustername
+        labels."hostname":
+          static_value: placeholder_hostname
   service:
     pipelines:
       slurmlog_pipeline:
         receivers:
-          - slurmdbd
-          - slurmrestd
-          - slurmctld
-          - slurmd
+          - slurm_daemon
         processors:
           - parse_slurmlog
+          - add_cluster_info
       slurmlog2_pipeline:
         receivers:
-          - slurm_resume
-          - slurm_suspend
-          - slurm_sync
+          - slurm
           - setup
         processors:
           - parse_slurmlog2
+          - add_cluster_info
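
Review note: the two `parse_regex` patterns in this template can be sanity-checked offline before deploying. The sketch below is only an approximation of what the Ops Agent does. It assumes Python's `re` module behaves like the agent's regex engine for these patterns, it rewrites the named groups from `(?<name>...)` to Python's `(?P<name>...)` spelling, and the sample log lines and the `parse_line` helper are hypothetical.

```python
import re

# Python's re module spells named groups (?P<name>...), while the Ops Agent
# config uses (?<name>...); the patterns are otherwise copied unchanged.
DAEMON_LINE = re.compile(
    r"^\[(?P<time>\S+)\] "
    r"(?P<message>((?P<severity>(fatal|error|verbose|debug[0-9]?)):)?.*)$"
)
SCRIPT_LINE = re.compile(
    r"^(?P<time>\S+ \S+) "
    r"(?P<message>(?P<severity>(CRITICAL|ERROR|WARNING|INFO|DEBUG))(\(\S+\))?:.*)$"
)


def parse_line(pattern, line):
    """Return the time, severity and message fields, or None if no match."""
    match = pattern.match(line)
    if match is None:
        return None
    return {key: match.group(key) for key in ("time", "severity", "message")}


# Hypothetical sample lines, not taken from a real cluster.
daemon = parse_line(DAEMON_LINE, "[2024-09-27T10:15:30.123] error: node c0 not responding")
script = parse_line(SCRIPT_LINE, "2024-09-27 10:15:30,123 INFO: resume c0")
plain = parse_line(DAEMON_LINE, "[2024-09-27T10:15:30.123] slurmctld started")
```

In the first pattern the severity group is optional, so a daemon line without a recognised prefix still matches and simply carries no severity. The second pattern requires a severity keyword, so lines without one do not match at all.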
1 change: 0 additions & 1 deletion scripts/resume.py
@@ -75,7 +75,6 @@ def instance_properties(nodeset, model, placement_group, labels=None):
         "startup-script": (
             Path(cfg.slurm_scripts_dir or util.dirs.scripts) / "startup.sh"
         ).read_text(),
-        "VmDnsSetting": "GlobalOnly",
     }
     info_metadata = {
         item.get("key"): item.get("value") for item in template_info.metadata["items"]
32 changes: 32 additions & 0 deletions scripts/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import sys
import stat
import time
import yaml
from pathlib import Path

import util
Expand Down Expand Up @@ -331,6 +332,36 @@ def configure_dirs():
scripts_log.symlink_to(dirs.log)


def configure_cloud_ops():
cloudOpsStatus = run(
"systemctl is-active --quiet google-cloud-ops-agent.service"
).returncode
if cloudOpsStatus == 0:
try:
with open("/etc/google-cloud-ops-agent/config.yaml", "r") as f:
file = yaml.safe_load(f)
file["logging"]["processors"]["add_cluster_info"]["fields"][
'labels."cluster_name"'
]["static_value"] = f"{cfg.slurm_cluster_name}"
file["logging"]["processors"]["add_cluster_info"]["fields"][
'labels."hostname"'
]["static_value"] = f"{lkp.hostname}"

with open("/etc/google-cloud-ops-agent/config.yaml", "w") as f:
yaml.dump(file, f, sort_keys=False)

except Exception as e:
log.exception(
"Cloud Ops Agent setup has encountered an exception while trying to edit its configuration"
)
raise e

run("systemctl restart google-cloud-ops-agent.service", timeout=30)

log.info("Check status of cloud-ops agent")
run("systemctl status google-cloud-ops-agent.service")


def setup_controller(args):
"""Run controller setup"""
log.info("Setting up controller")
Expand Down Expand Up @@ -477,6 +508,7 @@ def setup_compute(args):
def main(args):
start_motd()
configure_dirs()
configure_cloud_ops()

# call the setup function for the instance type
setup = dict.get(
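
Review note: the placeholder substitution in `configure_cloud_ops` can be exercised without touching `/etc` or `systemctl`. The following is a minimal sketch under stated assumptions, not the code from this commit. `set_cluster_labels` is a hypothetical helper, the `template` dict only imitates what `yaml.safe_load` would presumably return for the relevant part of `ops_agent.yaml.j2`, and the cluster and host names are made up.

```python
import copy


def set_cluster_labels(config, cluster_name, hostname):
    """Return a copy of config with the add_cluster_info placeholders filled in."""
    updated = copy.deepcopy(config)
    fields = updated["logging"]["processors"]["add_cluster_info"]["fields"]
    # The keys are the literal strings written in the YAML, quotes included.
    fields['labels."cluster_name"']["static_value"] = cluster_name
    fields['labels."hostname"']["static_value"] = hostname
    return updated


# Hypothetical stand-in for the parsed Ops Agent config.
template = {
    "logging": {
        "processors": {
            "add_cluster_info": {
                "type": "modify_fields",
                "fields": {
                    'labels."cluster_name"': {"static_value": "placeholder_clustername"},
                    'labels."hostname"': {"static_value": "placeholder_hostname"},
                },
            }
        }
    }
}

result = set_cluster_labels(template, "demo-cluster", "demo-controller")
labels = result["logging"]["processors"]["add_cluster_info"]["fields"]
```

Working on a deep copy keeps the input untouched, which makes the substitution easy to unit-test separately from the file I/O and the service restart.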
8 changes: 4 additions & 4 deletions terraform/_network/README_TF.md
@@ -41,16 +41,16 @@ No resources.

 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
-| <a name="input_auto_create_subnetworks"></a> [auto\_create\_subnetworks](#input\_auto\_create\_subnetworks) | When set to true, the network is created in 'auto subnet mode' and it will<br>create a subnet for each region automatically across the 10.128.0.0/9<br>address range. When set to false, the network is created in 'custom subnet mode'<br>so the user can explicitly connect subnetwork resources. | `bool` | `false` | no |
-| <a name="input_delete_default_internet_gateway_routes"></a> [delete\_default\_internet\_gateway\_routes](#input\_delete\_default\_internet\_gateway\_routes) | If set, ensure that all routes within the network specified whose names begin<br>with 'default-route' and with a next hop of 'default-internet-gateway' are<br>deleted. | `bool` | `false` | no |
+| <a name="input_auto_create_subnetworks"></a> [auto\_create\_subnetworks](#input\_auto\_create\_subnetworks) | When set to true, the network is created in 'auto subnet mode' and it will<br/>create a subnet for each region automatically across the 10.128.0.0/9<br/>address range. When set to false, the network is created in 'custom subnet mode'<br/>so the user can explicitly connect subnetwork resources. | `bool` | `false` | no |
+| <a name="input_delete_default_internet_gateway_routes"></a> [delete\_default\_internet\_gateway\_routes](#input\_delete\_default\_internet\_gateway\_routes) | If set, ensure that all routes within the network specified whose names begin<br/>with 'default-route' and with a next hop of 'default-internet-gateway' are<br/>deleted. | `bool` | `false` | no |
 | <a name="input_description"></a> [description](#input\_description) | An optional description of this resource. The resource must be recreated to modify this field. | `string` | `""` | no |
 | <a name="input_firewall_rules"></a> [firewall\_rules](#input\_firewall\_rules) | List of additional firewall rules. | `list(map(string))` | `[]` | no |
-| <a name="input_mtu"></a> [mtu](#input\_mtu) | The network MTU. Must be a value between 1460 and 1500 inclusive. If set to 0<br>(meaning MTU is unset), the network will default to 1460 automatically. | `number` | `0` | no |
+| <a name="input_mtu"></a> [mtu](#input\_mtu) | The network MTU. Must be a value between 1460 and 1500 inclusive. If set to 0<br/>(meaning MTU is unset), the network will default to 1460 automatically. | `number` | `0` | no |
 | <a name="input_network_name"></a> [network\_name](#input\_network\_name) | The name of the network being created. | `string` | n/a | yes |
 | <a name="input_project_id"></a> [project\_id](#input\_project\_id) | The ID of the project where this VPC will be created. | `string` | n/a | yes |
 | <a name="input_routes"></a> [routes](#input\_routes) | List of routes being created in this VPC. | `list(map(string))` | `[]` | no |
 | <a name="input_routing_mode"></a> [routing\_mode](#input\_routing\_mode) | The network routing mode (default 'GLOBAL') | `string` | `"GLOBAL"` | no |
-| <a name="input_secondary_ranges"></a> [secondary\_ranges](#input\_secondary\_ranges) | Secondary ranges that will be used in some of the subnets | <pre>map(list(object({<br> range_name = string,<br> ip_cidr_range = string<br> })))</pre> | `{}` | no |
+| <a name="input_secondary_ranges"></a> [secondary\_ranges](#input\_secondary\_ranges) | Secondary ranges that will be used in some of the subnets | <pre>map(list(object({<br/> range_name = string,<br/> ip_cidr_range = string<br/> })))</pre> | `{}` | no |
 | <a name="input_shared_vpc_host"></a> [shared\_vpc\_host](#input\_shared\_vpc\_host) | Makes this project a Shared VPC host if 'true' (default 'false') | `bool` | `false` | no |
 | <a name="input_subnets"></a> [subnets](#input\_subnets) | The list of subnets being created. | `list(map(string))` | `[]` | no |
