feat: bootstrap agent identities and automated install flow #37

Closed
EfrainGaray wants to merge 16 commits into openclaw:main from EfrainGaray:feature/auth-sync-codex-v0

Conversation

@EfrainGaray

Summary

  • add initial identity/persona bootstrap for agents using official OpenClaw docs model
  • seed per-agent workspace files on install (AGENTS.md, SOUL.md, IDENTITY.md, USER.md)
  • define name + identity + agent_personas for dev-main agents (main, research, browser-login, coolify-ops) and andrea/main
  • fix control-plane postgres reconcile SQL block (DO $$ ... $$) to avoid install breakage
  • add make auto-install to automate auth-sync + install + smoke (with optional backup/purge toggles)
  • include architecture and handoff docs for installed runtime layout and operational continuity

Validation

  • ansible-playbook --syntax-check playbook.yml
  • ansible-playbook --syntax-check -i inventories/dev/hosts.yml playbooks/enterprise.yml
  • ansible-playbook --syntax-check -i inventories/dev/hosts.yml playbooks/control-plane-only.yml
  • make -n auto-install ENV=dev LIMIT=zennook PROFILES='dev-main andrea' OAUTH_PROVIDER=openai-codex MODEL_REF=openai-codex/gpt-5.3-codex

Reviewer Focus

@codex please review these points in depth:

  1. agent identity/persona schema and mapping in inventories/dev/group_vars/all.yml
  2. workspace seeding task ordering/idempotency in roles/openclaw_enterprise/tasks/main.yml
  3. postgres SQL reconcile safety/compatibility in roles/openclaw_control_plane/tasks/profile.yml
  4. Makefile auto-install flow safety and expected operator ergonomics
  5. docs consistency with runtime behavior (architecture-installed-layout.md, AGENT_HANDOFF.md)

Target after review: merge to main for v0.0.0 baseline.

@EfrainGaray
Author

@codex please run a full review of this PR focusing on identity bootstrap, workspace seed idempotency, SQL reconcile safety, and auto-install flow regressions.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5ca81a6fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

	@if [[ "$(AUTO_PURGE)" == "1" ]]; then \
		$(MAKE) purge CONFIRM=1 ENV="$(ENV)" INVENTORY="$(INVENTORY)" LIMIT="$(LIMIT)"; \
	fi
	@$(MAKE) auth-sync ENV="$(ENV)" INVENTORY="$(INVENTORY)" LIMIT="$(LIMIT)" PROFILES="$(PROFILES)" OAUTH_PROVIDER="$(OAUTH_PROVIDER)" MODEL_REF="$(MODEL_REF)"

P1: Reorder auto-install to run install before auth-sync

auto-install currently calls make auth-sync before make install. However, ops/auth-sync.sh requires an existing openclaw account (id -u openclaw) and invokes /home/openclaw/.local/bin/openclaw to set models. On a fresh host (or after purging binaries/users), auth-sync therefore fails before any provisioning starts, so the advertised automated install flow cannot bootstrap a host.
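
One way to picture the suggested ordering is the minimal sketch below. It is not code from the PR: plan_steps and auth_sync_ready are hypothetical helper names, and only the account name and binary path are taken from the comment above. The step order (install, then auth-sync, then smoke) matches the recovery cycle the author later reports running by hand.

```bash
#!/usr/bin/env bash
# Sketch only: plan_steps and auth_sync_ready are hypothetical helpers,
# not part of the PR. The account name and binary path come from the review.
OPENCLAW_USER="${OPENCLAW_USER:-openclaw}"
OPENCLAW_BIN="${OPENCLAW_BIN:-/home/openclaw/.local/bin/openclaw}"

# Succeeds only when the prerequisites that ops/auth-sync.sh relies on exist.
auth_sync_ready() {
  id -u "$OPENCLAW_USER" >/dev/null 2>&1 && [ -x "$OPENCLAW_BIN" ]
}

# Prints the make targets in the order they should run.
# $1 is the AUTO_PURGE toggle ("1" enables the purge step).
plan_steps() {
  local steps=()
  if [ "${1:-0}" = "1" ]; then
    steps+=(purge)
  fi
  # install comes first so the account and binary exist before auth-sync.
  steps+=(install auth-sync smoke)
  echo "${steps[*]}"
}

plan_steps "${AUTO_PURGE:-0}"
```

A wrapper could call auth_sync_ready between install and auth-sync and stop with a clear message when it fails, instead of letting auth-sync error out partway through.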

Comment on lines +38 to +39
-e openclaw_control_plane_enabled=true \
-e openclaw_control_plane_manage_stack=true \

P1: Respect inventory control-plane toggle during install

ops/install.sh unconditionally passes -e openclaw_control_plane_enabled=true, which overrides environment inventories and forces the control-plane role even where no openclaw_control_plane_profiles are defined. Installs for non-dev inventories then fail at the role assertion that requires a non-empty profile list.
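
A possible shape for the fix, sketched under assumptions: the override is emitted only when the operator sets it explicitly, so inventory values win by default. CONTROL_PLANE_ENABLED and build_extra_args are hypothetical names introduced for illustration; only the Ansible variable name comes from the diff.

```bash
#!/usr/bin/env bash
# Sketch only: CONTROL_PLANE_ENABLED and build_extra_args are hypothetical.
# Prints one ansible-playbook argument per line, or nothing when the
# operator has not asked for an override.
build_extra_args() {
  local args=()
  if [ -n "${CONTROL_PLANE_ENABLED:-}" ]; then
    args+=(-e "openclaw_control_plane_enabled=${CONTROL_PLANE_ENABLED}")
  fi
  if [ "${#args[@]}" -gt 0 ]; then
    printf '%s\n' "${args[@]}"
  fi
}

build_extra_args
```

With no override set, the function prints nothing and the playbook would see whatever the selected inventory defines.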

Comment on lines +25 to +29
required_keys=(
  "vault_openclaw_gateway_token_dev_main"
  "vault_openclaw_gateway_token_andrea"
  "vault_openclaw_cp_postgres_password_efra_core"
  "vault_openclaw_cp_nats_password_efra_core"

P2: Derive required secret keys from selected environment

validate-secrets.sh hardcodes dev-specific keys (dev_main, andrea, efra_core) and is always invoked by ops/install.sh. Valid staging/prod vault files with different profile names are therefore rejected as missing placeholders before Ansible runs, which blocks installs outside the baked-in topology.
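
The key names in the excerpt appear to follow a pattern (profile name with hyphens turned into underscores, appended to a fixed prefix), so the list could be derived rather than hardcoded. The sketch below assumes that pattern holds. required_keys_for is a hypothetical helper, and how the profile names are read from the selected inventory is left out.

```bash
#!/usr/bin/env bash
# Sketch only: required_keys_for is hypothetical, and the naming pattern is
# inferred from the four keys quoted in the review excerpt.
# Usage: required_keys_for <gateway|control-plane> <profile>...
required_keys_for() {
  local kind="$1"
  shift
  local profile slug
  for profile in "$@"; do
    slug="${profile//-/_}"   # dev-main -> dev_main
    case "$kind" in
      gateway)
        echo "vault_openclaw_gateway_token_${slug}"
        ;;
      control-plane)
        echo "vault_openclaw_cp_postgres_password_${slug}"
        echo "vault_openclaw_cp_nats_password_${slug}"
        ;;
    esac
  done
}

required_keys_for gateway dev-main andrea
required_keys_for control-plane efra-core
```

Run as written, this prints the same four keys as the hardcoded array, while a staging or prod profile list would yield its own key names.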

- name: Seed AGENTS.md per agent workspace
  ansible.builtin.template:
    src: workspace-agents.md.j2
    dest: "{{ agent_workspace }}/AGENTS.md"

P2: Prevent persona files from overwriting in shared workspace

Workspace seed templates are written per agent to {{ agent_workspace }}/AGENTS.md (and related files) without enforcing unique workspace paths. When multiple agents inherit the same default workspace, each loop iteration overwrites the previous one and only the last agent's files survive. This breaks identity bootstrap correctness and makes seeded persona docs non-idempotent for shared-workspace profiles.
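
A preflight check for this collision could look like the sketch below. It assumes the agent-to-workspace mapping can be dumped as one "agent workspace" pair per line; that format and the find_shared_workspaces name are hypothetical, and the sample paths are made up.

```bash
#!/usr/bin/env bash
# Sketch only: find_shared_workspaces is hypothetical.
# stdin: one "<agent> <workspace>" pair per line (no spaces inside either field).
# stdout: each workspace claimed by more than one agent, with its agents.
find_shared_workspaces() {
  awk '{ count[$2]++; agents[$2] = agents[$2] " " $1 }
       END { for (w in count) if (count[w] > 1) print w ":" agents[w] }' | sort
}

# Illustrative input: two agents share /ws/a, one has its own workspace.
printf '%s\n' 'main /ws/a' 'research /ws/a' 'browser-login /ws/b' \
  | find_shared_workspaces
```

An install wrapper could refuse to seed files whenever this prints anything, which would surface the shared-workspace case before any persona file is overwritten.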

Comment on lines +112 to +114
CREATE ROLE {{ profile.postgres_user | default('openclaw') }} LOGIN PASSWORD '{{ profile.postgres_password | replace("'", "''") }}';
ELSE
ALTER ROLE {{ profile.postgres_user | default('openclaw') }} WITH LOGIN PASSWORD '{{ profile.postgres_password | replace("'", "''") }}';

P2: Quote postgres role identifier in reconcile block

The reconcile SQL injects profile.postgres_user directly into CREATE ROLE/ALTER ROLE identifiers. Profile-specific usernames containing characters such as -, uppercase letters, or quotes produce invalid or malformed SQL and break reconciliation. Use identifier-safe quoting (e.g., format('%I', ...)) instead of raw interpolation.
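
The review suggests server-side quoting with format('%I', ...). As a rough client-side counterpart, the sketch below applies the standard SQL rule for delimited identifiers: wrap the name in double quotes and double any embedded double quote. pg_quote_ident is a hypothetical helper, not something from the PR. Unlike %I it always quotes, so case is preserved and "Openclaw" would be a different role from openclaw.

```bash
#!/usr/bin/env bash
# Sketch only: pg_quote_ident is hypothetical and always quotes its input.
pg_quote_ident() {
  local ident="$1"
  local dq='"'
  # Double every embedded double quote, then wrap the result in double quotes.
  printf '"%s"' "${ident//$dq/$dq$dq}"
}

# Illustrative role name containing a hyphen.
role="$(pg_quote_ident 'ocp-efra-core')"
echo "ALTER ROLE ${role} WITH LOGIN;"
```

The password literal still needs its own escaping, which the template already handles by doubling single quotes.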

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71d4bfcfa0

read_yaml_value() {
  local key="$1"
  awk -F': *' -v key="$key" '$1 == key {sub(/^["'"'"']/, "", $2); sub(/["'"'"']$/, "", $2); print $2; exit}' "${vault_file}"

P1: Bypass plaintext key scan for ansible-vault files

read_yaml_value parses vault.yml with a raw awk key lookup, so encrypted vault files (which start with $ANSIBLE_VAULT) always return empty values and are reported as missing secrets. Because ops/install.sh runs this validator before Ansible, installs fail immediately in environments that keep group_vars/vault.yml encrypted, even though Ansible itself could decrypt and use those values.
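
A validator could detect the encrypted case up front and skip the plaintext scan. The sketch below relies on the fact that a file encrypted with ansible-vault starts with a header line such as $ANSIBLE_VAULT;1.1;AES256. is_vault_encrypted is a hypothetical helper name, and what the validator does after detection is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: is_vault_encrypted is hypothetical.
# Returns 0 when the first line of the file is an ansible-vault header.
is_vault_encrypted() {
  local first_line=""
  # read returns non-zero on a final line without a newline; the value is
  # still captured, so the status is ignored.
  IFS= read -r first_line < "$1" || true
  [[ "$first_line" == '$ANSIBLE_VAULT;'* ]]
}

# Illustrative use against a throwaway file.
sample="$(mktemp)"
printf '%s\n' '$ANSIBLE_VAULT;1.1;AES256' '6162636465' > "$sample"
if is_vault_encrypted "$sample"; then
  echo "encrypted vault detected: skipping plaintext key scan"
fi
rm -f "$sample"
```

Skipping the scan defers validation to Ansible, which can decrypt the file. A stricter variant might decrypt into memory first and then run the same key checks.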

Comment on lines +55 to +56
run_sudo docker compose -f /home/efra/openclaw-control-plane/efra-core/docker-compose.yml -p ocp-efra-core ps >/dev/null
run_sudo docker compose -f /home/efra/openclaw-control-plane/andrea/docker-compose.yml -p ocp-andrea ps >/dev/null

P2: Parameterize smoke checks instead of fixed dev stack paths

The smoke workflow is hardcoded to efra-core/andrea compose files under /home/efra, so make smoke fails on hosts where control-plane project directories or profile names differ (including non-dev inventories). This breaks the advertised ENV/INVENTORY-driven operations flow by coupling smoke validation to one specific machine layout instead of deployed inventory data.
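
The two hardcoded lines share one shape (base directory, profile name, and an ocp- project prefix), so they could be generated from a profile list. The sketch below only prints the commands rather than running them. smoke_commands is a hypothetical helper, and where the base directory and profile names come from (for example, inventory data) is left as an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: smoke_commands is hypothetical and performs a dry run.
# Usage: smoke_commands <control-plane-base-dir> <profile>...
smoke_commands() {
  local base_dir="$1"
  shift
  local profile
  for profile in "$@"; do
    echo "docker compose -f ${base_dir}/${profile}/docker-compose.yml -p ocp-${profile} ps"
  done
}

# Reproduces the two checks from the diff when given the dev values.
smoke_commands /home/efra/openclaw-control-plane efra-core andrea
```

A real smoke script would execute each command through run_sudo instead of echoing it, with the arguments supplied per environment.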

environment:
  METRICS_PORT: 9413
  WORKER_AGENT_ID: browser-login
  NATS_URL: nats://{{ profile.nats_user | default('queue') }}:{{ profile.nats_password }}@127.0.0.1:{{ profile.nats_host_port | default(14222) }}

P2: Use service-network NATS endpoint for browser worker

This worker always overrides NATS_URL to 127.0.0.1:<host-port>, but host networking is enabled only when worker_exec_mode is openclaw. With the default stub mode the container stays on the compose bridge, where 127.0.0.1 refers to the container itself, not the NATS service. In that configuration worker-browser-login cannot connect to NATS and browser-login intents will not be consumed.
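
The selection logic the comment implies can be sketched as a small function: use the loopback host port only in host-networking mode, and otherwise address NATS by its compose service name. The service name nats and the nats_url helper are assumptions. 4222 is the default NATS client port, but the actual in-network port depends on the compose file.

```bash
#!/usr/bin/env bash
# Sketch only: nats_url is hypothetical, and the "nats" service name is an
# assumption about the compose file.
# Usage: nats_url <worker_exec_mode> <user> <password> [host_port]
nats_url() {
  local mode="$1" user="$2" pass="$3" host_port="${4:-14222}"
  if [ "$mode" = "openclaw" ]; then
    # Host networking: the published host port is reachable on loopback.
    echo "nats://${user}:${pass}@127.0.0.1:${host_port}"
  else
    # Compose bridge: reach NATS through its service name instead.
    echo "nats://${user}:${pass}@${NATS_SERVICE:-nats}:4222"
  fi
}

nats_url stub queue example-password
```

In the template itself the same branch would be a Jinja conditional on worker_exec_mode; the shell form is only meant to make the two cases explicit.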

@EfrainGaray
Author

Follow-up after permission hardening commit 71d4bfc.

What was fixed

  • ops/auth-sync.sh: ensure profile skeleton ownership before model config.
  • roles/openclaw_control_plane/tasks/main.yml: resolve worker UID/GID dynamically from openclaw account.
  • roles/openclaw/tasks/openclaw.yml: guard recursive pnpm ownership task in ci_test to keep idempotency green.

Full recovery validation (fresh cycle)

  • make purge CONFIRM=1 ENV=dev LIMIT=zennook
  • make install ENV=dev LIMIT=zennook
  • make auth-sync ENV=dev LIMIT=zennook ...
  • make smoke ENV=dev LIMIT=zennook ✅ (efra-core + andrea queue flow DONE)

Telegram E2E evidence

  • POST /telegram/webhook -> HTTP 202 accepted with taskId
  • task terminal status via control API -> DONE
  • Telegram outbound sendMessage -> ok=true (message_id=4737)

Regression

  • bash tests/run-tests.sh ubuntu2404
    • convergence PASS
    • verification PASS
    • idempotency PASS (0 changed)

@alauppe
Member

alauppe commented Mar 11, 2026

Thanks for the substantial contribution here. There’s clearly a lot of thoughtful work in this PR, and the intent is well documented. The multi-profile gateway setup, agent workspace seeding, control-plane role, and operator automation all move toward a more full-featured deployment framework for an advanced multi-agent environment.

That said, I don’t think this is a fit for this repository’s current goals. The purpose of openclaw-ansible as documented in the README is to provide a straightforward, hardened Ansible installer for standard OpenClaw deployments: one-command install, sensible defaults, release/development install modes, and secure host setup. This PR goes well beyond that scope and starts turning the project into a specialized orchestration/deployment framework with opinionated profile layouts, agent personas, Codex OAuth sync, and control-plane components tailored to a particular operating model.

None of that is inherently bad, and it may be genuinely useful for some users, but it’s a different product direction than the one this repo is trying to serve. Because of that, I don’t think we should merge it into main here.

I appreciate the effort and the detail that went into this. If you want to keep developing this direction, it may make sense as a separate repo or an extension layered on top of this installer.

@alauppe alauppe closed this Mar 11, 2026