Feat/vpn interface#2
Open
NamSupawan wants to merge 131 commits into
Open
Conversation
| config_file = params[:config_file] | ||
| output, status = Open3.capture2e("sudo wg-quick up #{config_file}") | ||
|
|
||
| output, status = Open3.capture2e("sudo wg-quick up #{config_file}") |
Check failure
Code scanning / CodeQL
Uncontrolled command line Critical
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 21 days ago
Use a non-shell invocation and strict input validation/allowlisting before execution.
Best fix here (without changing intended functionality):
- Validate
config_fileagainst a conservative allowlist (e.g., only letters, numbers,_,-,., and/) and reject unsafe values. - Invoke
Open3.capture2ewith argument separation ("sudo", "wg-quick", "up", config_file) instead of a single interpolated string, so the shell is not used. - Keep existing redirect behavior, adding an early return for invalid input.
Edit region: app/controllers/vpn_devices_controller.rb, inside def create (lines shown 107–115).
No new imports or dependencies are required (Open3 is already used in this file/snippet context).
Suggested changeset
1
app/controllers/vpn_devices_controller.rb
| @@ -105,8 +105,14 @@ | ||
|
|
||
| # POST /vpn_devices or /vpn_devices.json | ||
| def create | ||
| config_file = params[:config_file] | ||
| output, status = Open3.capture2e("sudo wg-quick up #{config_file}") | ||
| config_file = params[:config_file].to_s | ||
|
|
||
| unless config_file.match?(/\A[\w.\-\/]+\z/) | ||
| redirect_to new_vpn_device_path, alert: 'Invalid configuration file name.' | ||
| return | ||
| end | ||
|
|
||
| output, status = Open3.capture2e('sudo', 'wg-quick', 'up', config_file) | ||
| if status.success? | ||
| redirect_to vpn_devices_path, notice: 'WireGuard interface created successfully.' | ||
| else |
Copilot is powered by AI and may make mistakes. Always verify output.
Captures the 6 files that have been modified directly on the running
production EC2 (product-wireguard-vpn / i-05caf7960a63760c6) but never
made it into any branch on GitHub. Some of these date back to early 2025.
Files included:
.env.sample (trailing-newline only — trivial)
config/credentials.yml.enc (encrypted Rails credentials)
config/initializers/omniauth.rb (removed email_verified: true
from Google OAuth options)
package.json (postcss-cli ^5 -> ^11,
sass ^1.77.6 -> ^1.79.1)
package-lock.json (matches package.json)
yarn.lock (matches package.json)
NOT included (intentionally):
.gitignore — reverted; PR #11 ships a strict superset
backup.rules, firewall.txt, rules.v5, config/credentials.yml.enc.backup
— operational artifacts, gitignored by PR #11
This branch is for preserving the diff so it can be reviewed and merged
into feat/vpn_interface as a normal PR, instead of getting silently
clobbered the next time someone runs git pull on the server.
No code review has happened on these changes — please review before merge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prod-drift snapshot 2026-05: omniauth + package updates from running production
* Hot-reload WireGuard on peer changes, race-safe IP allocation
Rebased onto feat/vpn_interface (the actual production branch) instead
of main, which is 127 commits behind production. Preserves Nam's
existing changes: the .140-.254 allocation range, the #add_with_user
admin flow, and the #create wg-quick-up flow.
Fixes three operational bugs surfaced during the product-wireguard-vpn
SSM investigation (5 users currently in DB but missing from wg0.conf,
conf last written 57 days ago, 4 manual `systemctl restart wg-quick@wg0`
cycles in 80 min today).
1. Restart needed per new user
- `after_action :update_wireguard_config` was only running on
`:update`/`:destroy`. Both `:new` (current_user signup) and
`:add_with_user` (admin-creates-for-user) PERSIST a VpnDevice but
were missing from the allowlist, so wg0.conf never got regenerated
for new signups.
- Replace `systemctl restart wg-quick@wg0` (which deletes the link
and drops every active session) with
`wg syncconf wg0 <(wg-quick strip wg0)` — atomic peer
reconciliation, no session drops. Implemented in
`WireguardConfigGenerator.reload_wireguard` and called automatically
from `write_server_configuration`.
2. Deleted IPs occasionally "skip"
- DB-level unique index on `ip_allocations.ip_address` so concurrent
signups can't both win the same IP via the model-level uniqueness
race.
- `IpAllocation.allocate_ip` runs in a per-attempt transaction and
retries on `RecordNotUnique`.
- FK `ip_allocations.vpn_device_id` re-added with `ON DELETE CASCADE`
so manual/raw deletes always free the IP.
- Migration de-dupes any existing duplicate IPs before adding the
unique index (no duplicates exist today, so this is a no-op).
- Allocation range `(140..254)` preserved from feat/vpn_interface and
extracted to `ALLOCATION_RANGE` constant with a comment explaining
why .2-.139 are reserved.
3. Users sometimes can't connect after restart
- Root cause = bug 1. With the `:new`/`:add_with_user` after_action
fix, every signup now writes the conf and syncs the kernel
atomically.
Other changes
- IP allocation moved from the controllers into `VpnDevice`'s
`after_create` callback. Pool exhaustion or uniqueness-race
exhaustion now rolls back the device instead of leaving an orphan
VpnDevice without an `ip_allocation`.
- Atomic conf write (`.tmp` + rename) so the kernel never reads a
half-written file via the
`/etc/wireguard/wg0.conf -> config/wireguard/wg0.conf` symlink.
- `lib/tasks/wireguard.rake`:
* `wireguard:audit` — DB vs conf vs kernel drift report
* `wireguard:regenerate` — rewrite conf + wg syncconf
* `wireguard:fix` — audit -> regenerate -> audit
Deploy (server is already on feat/vpn_interface)
git pull
bundle exec rake db:migrate
bundle exec rake wireguard:fix # onboards the 5 currently-stuck users
sudo systemctl reload puma
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Gitignore operational artifacts produced on production boxes
The production EC2 (product-wireguard-vpn) accumulates a handful of files
during normal operations that are not source code and should never be
committed:
*.rules iptables-save dumps
rules.v* iptables-save dumps with versioned suffix
firewall.txt iptables-save dumps (named variant)
*.enc.backup backup of config/credentials.yml.enc
*.backup generic backup suffix
.claude-crew* Claude Code agent session state
.fos/ local tooling state
Today these show up as 4 untracked files on the server (backup.rules,
config/credentials.yml.enc.backup, firewall.txt, rules.v5) which would
otherwise need ad-hoc cleanup before every `git pull`.
No actual files are added or removed — this only affects future `git
status` / `git add` behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Reopen ALLOCATION_RANGE to (2..254) now that delete propagation works
Background (per Nam, the original author of the (140..254) workaround):
> i reserve ip .2 ─ .139 because i cannot delete key on server
> wireguard but db delete key done
The narrowed allocation range was a workaround for a deeper bug:
destroying a VpnDevice in the UI deleted the DB row but never told the
WireGuard kernel to drop the peer (the conf file was rewritten but
`systemctl restart wg-quick@wg0` was the only way to push it into the
kernel, and restarts dropped every active session — so it was avoided).
That left "phantom" peers in the kernel with stale public keys bound to
.2-.139 IPs. To prevent IP-key collisions when an old IP got
re-allocated to a new user, Nam moved the allocator floor to .140.
This is no longer needed:
* The other commit in this PR ("Hot-reload WireGuard...") wires
`wg syncconf wg0 <(wg-quick strip wg0)` into the destroy after_action.
Deletes now atomically reconcile the kernel with the conf without
dropping any other peer's session.
* With reliable delete propagation, the full .2-.254 range is safe to
reopen. Effective pool size grows from 115 → 253 addresses, and the
~80 gap addresses below .139 (left behind by old deletes that never
freed their kernel peer) become reusable on the next signup.
Also: improve the wireguard:audit rake task to label each ghost peer
with the "# User: ..." comment line that precedes it in the conf. With
the range reopened, operators need to verify ghosts represent stale
state (safe to clean up) vs. real users (re-add via UI before running
`rake wireguard:fix`).
Before / after:
before: == Peers in conf but missing from DB (ghosts, ...): 2 ==
10.42.5.18
10.42.5.173
after: == Peers in conf but missing from DB ...: 2 ==
⚠ Review each one before running `rake wireguard:fix`.
If a peer here represents a real user, re-add their
device via the UI first.
10.42.5.18 # User: Pheeraphat (New) Prisan, Device: NewFlowAccountDevice
10.42.5.173 # User: <name>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Prevent duplicate VpnDevice creation from Turbo prefetch / double-submit
Root cause:
`VpnDevicesController#new` is a GET that persists a record (controller
pattern preserved on `feat/vpn_interface` and not refactored here).
Combined with Rails 7's Turbo Drive — which prefetches link targets on
hover for perceived speed — every mouse hover over the "Let's add a
device" button silently fires another GET, each of which creates
another VpnDevice. Browser pre-rendering and double-clicks have the
same effect.
Evidence (production scan today):
- Pheeraphat (user#18) created 4 devices within ~1 second after
re-onboarding, only 1 with the description he typed:
04:25:05 id=626 "PheeraphatPrisanMacbook" <- actual click
04:25:05 id=627 nil <- prefetch
04:25:06 id=628 nil <- prefetch
04:25:06 id=629 nil <- prefetch
- 16 other users have device pairs with sub-second creation deltas,
many ending in placeholder/nil descriptions. 7 of these are
confirmed never-handshook prefetch artifacts (kernel rx/tx = 0).
Fix:
1. View — `app/views/admin/index.html.erb`
Add `data-turbo-prefetch="false"` to the "Let's add a device" link
in the empty-state. Stops Turbo from issuing background prefetch
GETs on hover. Click behavior is unchanged. (The `_vpn_devices`
partial's admin-only "Add a VPN For YourDevice" link already had
`data: { turbo: false }`, so it wasn't a vector.)
2. Controller — `app/controllers/vpn_devices_controller.rb`
Add a 30-second idempotency guard in `#new` and `#add_with_user`.
If the target user created a device in the last 30s, short-circuit
and redirect with a notice instead of persisting another row.
Catches all the remaining vectors (double-click, refresh, browser
prefetch from other links, accidental form re-submit). Extracted
into a private `recent_device_for(user)` helper.
Out of scope:
Proper REST refactor (`GET /vpn_devices/new` → form only, no DB write;
`POST /vpn_devices` → actual create). The current controller still has
`#create` repurposed as a `wg-quick up` shell-out, so untangling it is
a bigger change. The prefetch fix + idempotency guard resolves the
user-visible bug without that refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix description-update form requiring page refresh to show download button
User-reported bug: when a user with a nil-description device (a prefetch
artifact) types a description into the "Please add description" form and
clicks "Update", the page appears to do nothing — the form stays
visible with the same nil-description device. They have to manually
refresh the browser to see the device move into the card section (which
has the Download Config button).
Root cause:
`form_with model: vpn_device` in Rails 7 defaults to `local: false`,
meaning the form submission goes through Turbo Drive as a fetch-style
request. The controller's `update` action responds with
`redirect_to root_path`, which Turbo SHOULD follow via a Drive
navigation that fully re-renders the page from the new URL. In
practice the page does not update — the user sees the same stale form
until they refresh.
The matching `add_with_user` form a few lines below in the same partial
already has `local: true`, presumably for the same reason — somebody
hit this before but only fixed one of the two forms.
Fix:
Add `local: true` to the per-device update form. Submission is now a
standard HTML POST/PATCH, the response 302 is followed by the browser
(not Turbo), and the new page renders correctly with the just-named
device shown in the card section, Download Config button visible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#13) After PR #11 deployed, the very first new signup (Lookplub at .5) hit the exact bug I called out at the end of the deploy: `wg syncconf` is failing silently every time Rails regenerates the conf. The user gets created in DB and conf, but is never added to the kernel. Operator has to manually `wg syncconf` after every signup — back to the original problem PR #11 was meant to solve. Two root causes in `WireguardConfigGenerator`, both my own from PR #11: 1. `File.write` + atomic rename in `write_server_configuration` leaves the new conf file at mode 0644 (default umask). `wg-quick strip` then warns "Warning: '/etc/wireguard/wg0.conf' is world accessible" — which is correct, the server's [Interface] private key was sitting in a world-readable file (real security issue, not just cosmetic). 2. `reload_wireguard` uses `Open3.capture2e` to run `wg-quick strip`, which captures stderr alongside stdout. The "Warning: ..." text from (1) ends up in the buffer we then pipe into `wg syncconf`. syncconf parses each line as a config directive, hits "Warning:" and aborts: [wg-reload] wg syncconf wg0 failed: Line unrecognized: `Warning:`/home/ubuntu/gate-wireguard/config/wireguard/wg0.conf`is world accessible' Configuration parsing error Fix: * Force mode 0600 on wg0.conf (and the private.key file, while we're there) right after writing them. This eliminates the warning AND closes the world-readable-private-key security gap. * Switch `capture2e` to `capture2` for `wg-quick strip` so any future stderr warnings can't poison the stripped output we feed to syncconf. Verified manually on production today: with 0600 perms, `wg-quick strip` emits no stderr, syncconf accepts the input, kernel is updated correctly for new peers. Lookplub (the user who hit this on the deploy) was unblocked by the same chmod + syncconf pattern this PR codifies. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.