feat: bootstrap agent identities and automated install flow#37
EfrainGaray wants to merge 16 commits into openclaw:main from
@codex please run a full review of this PR focusing on identity bootstrap, workspace seed idempotency, SQL reconcile safety, and auto-install flow regressions.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c5ca81a6fb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```makefile
@if [[ "$(AUTO_PURGE)" == "1" ]]; then \
  $(MAKE) purge CONFIRM=1 ENV="$(ENV)" INVENTORY="$(INVENTORY)" LIMIT="$(LIMIT)"; \
fi
@$(MAKE) auth-sync ENV="$(ENV)" INVENTORY="$(INVENTORY)" LIMIT="$(LIMIT)" PROFILES="$(PROFILES)" OAUTH_PROVIDER="$(OAUTH_PROVIDER)" MODEL_REF="$(MODEL_REF)"
```
Reorder auto-install to run install before auth-sync
auto-install currently calls make auth-sync before make install, but ops/auth-sync.sh requires an existing openclaw account (id -u openclaw) and invokes /home/openclaw/.local/bin/openclaw to set models; on a fresh host (or after purging binaries/users), this fails before any provisioning starts, so the advertised automated install flow is not bootstrappable.
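A minimal shell sketch of the reordered flow, assuming `make install` is the step that creates the `openclaw` account and CLI that `auth-sync` depends on. `run_target` and `auto_install` are hypothetical names; the real recipe lives in the Makefile.

```bash
# Hypothetical wrapper around `make`, mirroring the variables the recipe forwards.
run_target() {
  local target="$1"
  shift
  make "$target" ENV="${ENV:-}" INVENTORY="${INVENTORY:-}" LIMIT="${LIMIT:-}" "$@"
}

auto_install() {
  # Optional destructive reset, gated the same way as the original recipe.
  if [[ "${AUTO_PURGE:-0}" == "1" ]]; then
    run_target purge CONFIRM=1
  fi
  # Provision first so the openclaw user and binary exist...
  run_target install
  # ...then sync auth, which runs `id -u openclaw` and the openclaw CLI.
  run_target auth-sync PROFILES="${PROFILES:-}" OAUTH_PROVIDER="${OAUTH_PROVIDER:-}" MODEL_REF="${MODEL_REF:-}"
  run_target smoke
}
```

If `auth-sync` turns out to be needed before some install tasks, an alternative is to keep the order and have `auth-sync` skip cleanly when `id -u openclaw` fails.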
```bash
-e openclaw_control_plane_enabled=true \
-e openclaw_control_plane_manage_stack=true \
```
Respect inventory control-plane toggle during install
ops/install.sh unconditionally passes -e openclaw_control_plane_enabled=true, which overrides environment inventories and forces the control-plane role even where no openclaw_control_plane_profiles are defined; this causes installs for non-dev inventories to fail at the role assertion that requires a non-empty profile list.
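One way to stop overriding the inventory is to make the flags opt-in. This sketch assumes a hypothetical `CONTROL_PLANE=1` switch; when it is unset, nothing is passed and `openclaw_control_plane_enabled` comes from the selected inventory.

```bash
# Hypothetical opt-in: fills the global array extra_vars with control-plane
# overrides only when CONTROL_PLANE=1; otherwise the array stays empty.
collect_control_plane_overrides() {
  extra_vars=()
  if [[ "${CONTROL_PLANE:-0}" == "1" ]]; then
    extra_vars+=(-e openclaw_control_plane_enabled=true)
    extra_vars+=(-e openclaw_control_plane_manage_stack=true)
  fi
}
```

The existing `ansible-playbook` call in `ops/install.sh` would then append `"${extra_vars[@]}"`, which expands to nothing when the array is empty.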
```bash
required_keys=(
  "vault_openclaw_gateway_token_dev_main"
  "vault_openclaw_gateway_token_andrea"
  "vault_openclaw_cp_postgres_password_efra_core"
  "vault_openclaw_cp_nats_password_efra_core"
```
Derive required secret keys from selected environment
validate-secrets.sh hardcodes dev-specific keys (dev_main, andrea, efra_core) and is always invoked by ops/install.sh, so valid staging/prod vault files with different profile names are rejected as missing placeholders before Ansible runs; this blocks installs outside the baked-in topology.
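A sketch of deriving the list from the selected profiles instead. The naming scheme (fixed prefix plus the profile name with `-` mapped to `_`) is inferred from the hard-coded keys above and is an assumption, not a documented contract; `derive_required_keys` is a hypothetical helper.

```bash
# Hypothetical helper: print one required vault key per line.
#   $1 = space-separated gateway profiles (e.g. "dev-main andrea")
#   $2 = space-separated control-plane profiles (e.g. "efra-core")
derive_required_keys() {
  local -a gateway_profiles cp_profiles
  local p
  read -r -a gateway_profiles <<< "${1:-}"
  read -r -a cp_profiles <<< "${2:-}"
  for p in "${gateway_profiles[@]}"; do
    printf 'vault_openclaw_gateway_token_%s\n' "${p//-/_}"
  done
  for p in "${cp_profiles[@]}"; do
    printf 'vault_openclaw_cp_postgres_password_%s\n' "${p//-/_}"
    printf 'vault_openclaw_cp_nats_password_%s\n' "${p//-/_}"
  done
}
```

The validator could then build its array with `mapfile -t required_keys < <(derive_required_keys "$PROFILES" "$CP_PROFILES")`, where `CP_PROFILES` is a hypothetical variable (ideally read from the inventory).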
```yaml
- name: Seed AGENTS.md per agent workspace
  ansible.builtin.template:
    src: workspace-agents.md.j2
    dest: "{{ agent_workspace }}/AGENTS.md"
```
Prevent persona files from overwriting in shared workspace
Workspace seed templates are written per agent to {{ agent_workspace }}/AGENTS.md (and related files) without enforcing unique workspace paths, so when multiple agents inherit the same default workspace only the last loop iteration survives; this breaks identity bootstrap correctness and makes seeded persona docs non-idempotent for shared-workspace profiles.
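A pre-flight uniqueness check is one way to catch this before any template runs. In the role it would more naturally be an `ansible.builtin.assert` over the resolved workspace paths; this shell sketch only shows the logic, and its `agent=/path` input format is hypothetical.

```bash
# Hypothetical check: return 1 if two agents resolve to the same workspace.
check_unique_workspaces() {
  local -A owner=()          # workspace path -> first agent that claimed it
  local pair agent ws status=0
  for pair in "$@"; do
    agent="${pair%%=*}"
    ws="${pair#*=}"
    ws="${ws%/}"             # treat "/ws/a/" and "/ws/a" as the same directory
    if [[ -z "$ws" ]]; then
      printf 'agent %s has no workspace path\n' "$agent" >&2
      status=1
      continue
    fi
    if [[ -n "${owner[$ws]:-}" ]]; then
      printf 'workspace %s is shared by %s and %s\n' "$ws" "${owner[$ws]}" "$agent" >&2
      status=1
    else
      owner[$ws]="$agent"
    fi
  done
  return "$status"
}
```

Failing fast keeps the seeded `AGENTS.md` deterministic; the alternative is to default each agent to its own subdirectory so shared workspaces cannot arise.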
```sql
  CREATE ROLE {{ profile.postgres_user | default('openclaw') }} LOGIN PASSWORD '{{ profile.postgres_password | replace("'", "''") }}';
ELSE
  ALTER ROLE {{ profile.postgres_user | default('openclaw') }} WITH LOGIN PASSWORD '{{ profile.postgres_password | replace("'", "''") }}';
```
Quote postgres role identifier in reconcile block
The reconcile SQL injects profile.postgres_user directly into CREATE ROLE/ALTER ROLE identifiers, so profile-specific usernames containing characters like -, uppercase, or quotes produce invalid SQL (or malformed statements) and break reconciliation; use identifier-safe quoting (e.g., format('%I', ...)) instead of raw interpolation.
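A sketch of the SQL shape this suggests: pass the role name and password in as plain string literals and let PostgreSQL's `format()` quote them with `%I` (identifier) and `%L` (literal) inside `EXECUTE`. The PR renders this block from a Jinja template; `render_role_reconcile` and `sql_literal` are hypothetical shell stand-ins used only to show the output.

```bash
# Quote a value as a SQL string literal by doubling embedded single quotes.
sql_literal() {
  local q="'"
  local escaped=${1//$q/$q$q}
  printf "'%s'" "$escaped"
}

# Hypothetical renderer for the role reconcile block.
render_role_reconcile() {
  local user password
  user="$(sql_literal "$1")"
  password="$(sql_literal "$2")"
  # Unquoted heredoc: ${user}/${password} expand, \$ keeps the dollar-quote tag literal.
  cat <<SQL
DO \$reconcile\$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = ${user}) THEN
    EXECUTE format('CREATE ROLE %I LOGIN PASSWORD %L', ${user}, ${password});
  ELSE
    EXECUTE format('ALTER ROLE %I WITH LOGIN PASSWORD %L', ${user}, ${password});
  END IF;
END
\$reconcile\$;
SQL
}
```

Note that `%I` quotes only when needed, so a name like `ocp-user` becomes `"ocp-user"`, and a mixed-case name becomes a case-sensitive role rather than being folded to lowercase.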
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71d4bfcfa0
```bash
read_yaml_value() {
  local key="$1"
  awk -F': *' -v key="$key" '$1 == key {sub(/^["'"'"']/, "", $2); sub(/["'"'"']$/, "", $2); print $2; exit}' "${vault_file}"
```
Bypass plaintext key scan for ansible-vault files
read_yaml_value parses vault.yml with a raw awk key lookup, so encrypted vault files (which start with $ANSIBLE_VAULT) always return empty values and are reported as missing secrets. Because ops/install.sh runs this validator before Ansible, installs fail immediately in environments that keep group_vars/vault.yml encrypted, even though Ansible itself could decrypt and use those values.
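A sketch that detects the `$ANSIBLE_VAULT` header before scanning. Here an encrypted file is decrypted with `ansible-vault view`, assuming the vault password is available non-interactively (for example via `ANSIBLE_VAULT_PASSWORD_FILE`); the simpler bypass the comment suggests is to return early from the validator when `is_vault_encrypted` succeeds. The helper names are hypothetical.

```bash
# True if the file starts with the ansible-vault header line.
is_vault_encrypted() {
  local first_line
  first_line="$(head -n 1 -- "$1")"
  [[ "$first_line" == '$ANSIBLE_VAULT;'* ]]
}

# Emit the vault file as plaintext YAML, decrypting it if needed.
vault_plaintext() {
  if is_vault_encrypted "$1"; then
    ansible-vault view "$1"
  else
    cat -- "$1"
  fi
}

# Same awk lookup as above, but reading YAML text from stdin.
read_yaml_value() {
  local key="$1"
  awk -F': *' -v key="$key" '$1 == key {sub(/^["'"'"']/, "", $2); sub(/["'"'"']$/, "", $2); print $2; exit}'
}
```

Usage: `vault_plaintext "$vault_file" | read_yaml_value vault_openclaw_gateway_token_andrea`.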
```bash
run_sudo docker compose -f /home/efra/openclaw-control-plane/efra-core/docker-compose.yml -p ocp-efra-core ps >/dev/null
run_sudo docker compose -f /home/efra/openclaw-control-plane/andrea/docker-compose.yml -p ocp-andrea ps >/dev/null
```
Parameterize smoke checks instead of fixed dev stack paths
The smoke workflow is hardcoded to efra-core/andrea compose files under /home/efra, so make smoke fails on hosts where control-plane project directories or profile names differ (including non-dev inventories). This breaks the advertised ENV/INVENTORY-driven operations flow by coupling smoke validation to one specific machine layout instead of deployed inventory data.
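A sketch of the same checks driven by a base directory and a profile list, which in practice would come from inventory data. The `<base>/<profile>/docker-compose.yml` layout and `ocp-<profile>` project names mirror the hard-coded commands above; `smoke_control_plane` is a hypothetical name, and the `run_sudo` fallback is only there to keep the sketch self-contained.

```bash
# Fallback only if the script does not already define run_sudo.
if ! declare -F run_sudo >/dev/null; then
  run_sudo() { sudo "$@"; }
fi

# Usage: smoke_control_plane BASE_DIR PROFILE...
smoke_control_plane() {
  local base_dir="$1" profile compose
  shift
  if (( $# == 0 )); then
    echo "smoke: no control-plane profiles given" >&2
    return 1
  fi
  for profile in "$@"; do
    compose="${base_dir}/${profile}/docker-compose.yml"
    if ! run_sudo docker compose -f "$compose" -p "ocp-${profile}" ps >/dev/null; then
      echo "smoke: docker compose ps failed for ocp-${profile}" >&2
      return 1
    fi
  done
}
```

Failing on an empty profile list avoids a smoke run that silently checks nothing.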
```yaml
environment:
  METRICS_PORT: 9413
  WORKER_AGENT_ID: browser-login
  NATS_URL: nats://{{ profile.nats_user | default('queue') }}:{{ profile.nats_password }}@127.0.0.1:{{ profile.nats_host_port | default(14222) }}
```
Use service-network NATS endpoint for browser worker
This worker always overrides NATS_URL to 127.0.0.1:<host-port>, but host networking is enabled only when worker_exec_mode is openclaw. With the default stub mode the container stays on the compose bridge network, where 127.0.0.1 refers to the container itself, not the NATS service. In that configuration worker-browser-login cannot connect to NATS and browser-login intents will not be consumed.
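A sketch of selecting the endpoint by network mode. The compose service name `nats` and in-network port `4222` (NATS's default client port) are assumptions about the stack's compose file; `nats_url` is a hypothetical helper, and in the template the same branch would be a Jinja condition on `worker_exec_mode`.

```bash
# Usage: nats_url MODE USER PASSWORD HOST_PORT [SERVICE]
nats_url() {
  local mode="$1" user="$2" password="$3" host_port="$4"
  local service="${5:-nats}"   # assumed compose service name
  if [[ "$mode" == "openclaw" ]]; then
    # Host networking: the published host port is reachable on loopback.
    printf 'nats://%s:%s@127.0.0.1:%s\n' "$user" "$password" "$host_port"
  else
    # Bridge networking: dial the NATS service by name on its container port.
    printf 'nats://%s:%s@%s:4222\n' "$user" "$password" "$service"
  fi
}
```

Passwords containing URL-reserved characters such as `@` or `:` would still need percent-encoding in either branch.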
Follow-up after permission hardening commit
What was fixed
Full recovery validation (fresh cycle)
Telegram E2E evidence
Regression
Thanks for the substantial contribution here. There’s clearly a lot of thoughtful work in this PR, and the intent is well documented. The multi-profile gateway setup, agent workspace seeding, control-plane role, and operator automation all move toward a more full-featured deployment framework for an advanced multi-agent environment.

That said, I don’t think this is a fit for this repository’s current goals. The purpose of […]

None of that is inherently bad, and it may be genuinely useful for some users, but it’s a different product direction than the one this repo is trying to serve. Because of that, I don’t think we should merge it into […]

I appreciate the effort and the detail that went into this. If you want to keep developing this direction, it may make sense as a separate repo or an extension layered on top of this installer.
Summary
- … (`AGENTS.md`, `SOUL.md`, `IDENTITY.md`, `USER.md`)
- `name` + `identity` + `agent_personas` for `dev-main` agents (`main`, `research`, `browser-login`, `coolify-ops`) and `andrea/main`
- … (`DO $$ ... $$`) to avoid install breakage
- `make auto-install` to automate `auth-sync + install + smoke` (with optional backup/purge toggles)
Validation
- `ansible-playbook --syntax-check playbook.yml`
- `ansible-playbook --syntax-check -i inventories/dev/hosts.yml playbooks/enterprise.yml`
- `ansible-playbook --syntax-check -i inventories/dev/hosts.yml playbooks/control-plane-only.yml`
- `make -n auto-install ENV=dev LIMIT=zennook PROFILES='dev-main andrea' OAUTH_PROVIDER=openai-codex MODEL_REF=openai-codex/gpt-5.3-codex`
Reviewer Focus
@codex please review these points in depth:
- `inventories/dev/group_vars/all.yml`
- `roles/openclaw_enterprise/tasks/main.yml`
- `roles/openclaw_control_plane/tasks/profile.yml`
- `Makefile` `auto-install` flow safety and expected operator ergonomics
- … (`architecture-installed-layout.md`, `AGENT_HANDOFF.md`)
Target after review: merge to `main` for `v0.0.0` baseline.