Wifi v4 #327

volumio · 2025-12-11T17:31:20Z

No description provided.

foonerd · 2025-12-14T10:53:02Z

@volumio,

Thank you for the thorough review. I appreciate the attention to detail on such a critical component.

What was delivered:

The v4.0 refactoring addresses the core requirements we set out to solve: SNM compliance for AP2 licensing, deadlock elimination, USB WiFi adapter timing, DHCP lease management, and emergency recovery modes. The code has been tested against the agreed test cases and functions correctly in all scenarios known to me.

Regarding your review items:

Items 1-4 (notifyWirelessReady timing, regdomain behavior, variable declaration order, WiFi enable/disable in dual mode): These are valid observations. However, they require decisions about backend integration and system-wide behavior that the core team is better positioned to make. You have visibility into dependencies I do not.

Item 5 (12-hour reconnection): This is wpa_supplicant's responsibility. wireless.js monitors state changes but does not control the underlying reconnection behavior.

Item 6 (Hotspot with background reconnection): This is a feature request, not part of the original scope.

Style feedback: Noted for any future work.

Going forward:

I have completed the wireless.js prototype I committed myself to. The code is functional and documented. The atomic commit history provides clear traceability for review.

Further development decisions - what to merge, what to modify, what to prioritize - belong with the core team. I lack visibility into backend systems, product roadmap, and integration requirements to make those calls.

I remain available to assist with specific, well-defined tasks if needed. But ownership of wireless.js now sits with Volumio.

foonerd · 2025-12-14T11:04:17Z

@volumio,

Pi 3B+ CPU Consumption Report

An additional issue has been reported: wireless.js consuming 99% CPU on Pi 3B+, causing system-wide performance degradation including audio xruns and network latency.

Evidence:

Node.js process accumulated 5:33 CPU time in ~4 minutes wall-clock
Network latency 5252ms (vs 948ms on Pi 4 with same code)
Audio xruns from CPU starvation, not WiFi signal issues

Likely causes:

Tight polling loop without proper delays
fs.watch callback firing repeatedly
Retry mechanism without backoff or exit condition
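For the repeated fs.watch callback listed above, a common mitigation is to debounce the handler so that a burst of inotify events collapses into a single state read. Commit 50709fe in this PR debounces fs.watch; the sketch below is a generic illustration, not that commit's code. `debounce`, `watchCarrier` and the 200 ms window are assumptions.

```javascript
'use strict';

const fs = require('fs');

// Hypothetical helper: collapse a burst of calls into a single trailing
// call that runs `waitMs` after the last call in the burst.
function debounce(fn, waitMs) {
  let timer = null;
  return function debounced(...args) {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn(...args);
    }, waitMs);
  };
}

// Illustrative use (assumption): re-read the eth0 carrier only once
// events have been quiet for `waitMs`, then report the link state.
function watchCarrier(onChange, waitMs = 200) {
  const carrierPath = '/sys/class/net/eth0/carrier';
  const onEvent = debounce(() => {
    try {
      onChange(fs.readFileSync(carrierPath, 'utf8').trim() === '1');
    } catch (e) {
      // eth0 missing or carrier unreadable: skip this burst
    }
  }, waitMs);
  return fs.watch(carrierPath, onEvent); // caller should .close() the watcher
}
```

However many events arrive inside the window, the callback does one read instead of one read per event.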

My position:

The prototype code uses the same callback patterns and timing constants established in the original implementation. If there is a runaway loop, it may be triggered by a specific hardware or network condition not present in my test environment.

This is noted for core team investigation. A defensive fix would be to add loop counters and maximum-iteration guards to every polling or retry mechanism, but identifying the specific offending code path requires logs from the affected device.
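The defensive fix described above can be sketched as a generic bounded-retry helper: a hard attempt cap plus exponential backoff, so a probe that keeps failing can neither spin in a tight loop nor run forever. `retryWithBackoff` and its parameters are hypothetical, not taken from wireless.js.

```javascript
'use strict';

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry an async `probe` with exponential backoff.
// `probe` resolves to true on success. Gives up after `maxAttempts`,
// so the loop always terminates and never busy-waits.
async function retryWithBackoff(probe, {
  maxAttempts = 10,
  baseDelayMs = 500,
  maxDelayMs = 8000,
} = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await probe(attempt)) {
      return { ok: true, attempts: attempt };
    }
    if (attempt < maxAttempts) {
      // With the defaults: 500, 1000, 2000, 4000, 8000, 8000, ... ms
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      await sleep(delay);
    }
  }
  return { ok: false, attempts: maxAttempts };
}
```

A caller would wrap whatever it polls, for example `retryWithBackoff(() => checkWpaState(), { maxAttempts: 15 })`, where `checkWpaState` is a hypothetical async function resolving to true once the link is up.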

volumio · 2025-12-15T17:26:16Z

@foonerd

I think we are slightly misaligned on expectations, so I want to clarify them.

You did a great job pushing hard on this and driving the refactor forward, and that effort is appreciated. That said, in this case the push may have had the opposite effect. With large refactors of core systems like wireless.js, speed alone is not the goal. What we strive for is landing changes that work 100% of the time across hardware and real-world conditions.

For small PRs, it is fine to deliver a working change and let maintainers take it from there. For large core refactors, responsibility is shared until the PR is either merged or withdrawn, including working through review feedback and fixing regressions found during testing (and that's why I tried to discourage this approach...). A “thanks for QA, good luck” handoff is not appropriate for a change of this scope, and it ends up wasting time on both sides.

Right now we have users affected by performance issues, and this is exactly where your competence can be best used. We can fix this together by aligning on the blocking issues and iterating until it is solid and safe to merge.

If you are willing to continue, let us define a short list of fixes and move forward together. If not, we can pause or withdraw the PR and revisit parts of it later under core-team ownership.

Let me know how you would like to proceed.

Darmur · 2025-12-15T17:53:50Z

I tried to look at the PR code, but unfortunately it's well beyond my skills and knowledge of the system.
I also read the message exchange; here are my two cents.

The work looks solid, and initial feedback on the forum from the people testing it is very positive; this is potentially a big step forward for the system.
However, because it is such a big rework, it is very hard for other people to step in and add changes/fixes, at least in the short term.
It will take quite some time for the Core team to read and understand all the changes, and this will bring delays in making it the new default for stable releases.

If we want to be pragmatic and speed things up, my proposal is to identify the potential regressions and fix them. I think @foonerd is the best person to take care of the fixing, since this rework is one of his creations.
One of the regressions has already been identified (country code). In that case the easiest solution could be to copy/restore the old logic as-is: we know it works fine, including on corner cases and specific devices (MP1).
The other one is the high CPU consumption in some corner cases, which to me looks like a side effect of this rework: it was not reported with Volumio3, nor with previous images using the old wireless.js, since the very beginning of the Volumio4 alpha.

Of course, we can support this with extensive testing and QA.

foonerd · 2025-12-17T07:45:55Z

> It will take quite some time for the Core team to read and understand all the changes, and this will bring delays in making it the new default for stable releases.

This alone reveals a knowledge gap. If an overdue wireless overhaul is beyond grasp, I have serious concerns about Bookworm as such, where technological change is continuously underestimated. Perhaps the complete move to Bookworm should be canned.

foonerd · 2025-12-17T07:48:07Z

I am not the one blocking progress, and nothing has changed. My time is limited.
What are the tasks, in order of priority?
1...
2...
etc

volumio · 2025-12-17T11:51:39Z

Thanks. I updated the original description to add the new bug, high CPU usage. IMHO the priorities are (from most to least urgent):

5 High CPU Usage of wireless which creates buffer underruns
2 REGDOMAIN SETTING
1 ENABLE \ DISABLE WIFI IN DOUBLE NETWORK MODE
4 CHECK CAREFULLY WIFI RECONNECTION
3 POSSIBLE BUG IN VARIABLES USED BEFORE DECLARATIONS

If there is one you feel I am better suited for, let me know and we can split the tasks.

foonerd · 2025-12-17T14:44:29Z

1 ENABLE \ DISABLE WIFI IN DOUBLE NETWORK MODE

This should behave in line with the DISABLE/ENABLE boolean switch: 6c00d94

2 REGDOMAIN SETTING

Regdomain is now always checked on start: 82d688a

@volumio - FYI

foonerd · 2025-12-17T15:47:59Z

Startup race condition

On boot with ethernet connected, wireless.js starts two parallel flows. The variable isWiredNetworkActive is initialized to false at module load. The ethernet monitor (startWiredNetworkingMonitor) uses fs.watch() on /sys/class/net/eth0/carrier, whose callback fires asynchronously after the main flow has begun.

The sequence:

1. Module loads, isWiredNetworkActive = false
2. initializeWirelessFlow() called
3. stop() -> detectAndApplyRegdomain() -> startFlow()
4. startFlow() checks isWiredNetworkActive - still false
5. Goes to WiFi client mode, starts startAP()
6. Meanwhile, fs.watch() callback fires, sets isWiredNetworkActive = true
7. SNM transition triggers second initializeWirelessFlow()
8. Two flows now running in parallel
9. First flow completes and starts hotspot
10. Second flow reaches scan mode but gets overwritten

Result: Hotspot running when system should be in scan mode.

Fix:

Added refreshEthernetState() - synchronous read of /sys/class/net/eth0/carrier immediately before startFlow(). This ensures the ethernet state is current regardless of whether fs.watch() has fired yet.

```javascript
// fs, isWiredNetworkActive and loggerInfo are module-level in wireless.js.
function refreshEthernetState() {
    try {
        // sysfs reports carrier as "1" (link up) or "0" (link down)
        var carrier = fs.readFileSync('/sys/class/net/eth0/carrier', 'utf8').trim();
        var newState = (carrier === '1');
        if (newState !== isWiredNetworkActive) {
            loggerInfo('refreshEthernetState: Corrected ethernet state: ' +
                (newState ? 'connected' : 'disconnected'));
            isWiredNetworkActive = newState;
        }
    } catch (e) {
        // eth0 doesn't exist or carrier not readable: keep current state
    }
}
```

Called in initializeWirelessFlow() after detectAndApplyRegdomain() completes, before startFlow().

The async monitor remains for runtime cable plug/unplug events. The sync check handles startup timing only.
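The synchronous check closes the startup window, but the sequence above also shows a second initializeWirelessFlow() running in parallel with the first. A complementary guard is to serialize flows: a trigger that arrives while a flow is running is recorded and re-run once, with fresh state, after the current flow finishes. This is a hypothetical sketch (`createFlowRunner` and `runFlow` are illustrative names); a related `wirelessFlowInProgress` flag appears in the commit list, but this is not its implementation.

```javascript
'use strict';

// Hypothetical sketch: allow at most one wireless flow at a time.
// A trigger that arrives mid-flow is not started in parallel; only the
// latest such trigger is kept and run afterwards, so the follow-up flow
// reads fresh state instead of racing the current one.
function createFlowRunner(runFlow) {
  let running = false;
  let pending = null; // latest trigger received mid-flow, if any

  async function trigger(reason) {
    if (running) {
      pending = { reason };
      return;
    }
    running = true;
    let current = { reason };
    try {
      while (current !== null) {
        pending = null;
        await runFlow(current.reason);
        current = pending;
      }
    } finally {
      // If runFlow throws, any pending trigger is not run.
      running = false;
    }
  }

  return { trigger, isRunning: () => running };
}
```

In the startup sequence above, the SNM transition would call `trigger('eth-up')` instead of starting a second flow, and the follow-up run would see isWiredNetworkActive = true.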

foonerd · 2025-12-17T17:37:55Z

@volumio,

Commits to date should address priorities 1 to 4 inclusive.

As such, they are ready for review.

foonerd · 2025-12-18T05:48:59Z

Priority 5 - High CPU Usage of wireless which creates buffer underruns

@volumio - this is serious blame: 44c2573

Issue: Duplicate logging causing excessive sync I/O

The current implementation writes every log message twice:

console.log -> systemd journal (sync)
fs.appendFileSync -> /tmp/wireless.log (sync)

Measured impact:

Log statements in code: 315 total

loggerInfo: ~200 calls
loggerDebug: ~115 calls (only when debug=true)

During 30-second connection phase with 1-second polling:

Per poll iteration: ~8 log calls
30 iterations x 8 calls = 240 log events
240 x 2 sync writes = 480 blocking I/O operations

Each sync write:

Syscall overhead: ~50-100us on Pi 4, ~150-300us on Pi 3B+
Journal write includes serialization overhead

Estimated I/O time per connection attempt:

Pi 4: 480 x 75us = 36ms blocked
Pi 3B+: 480 x 225us = 108ms blocked
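The estimate above is straightforward multiplication; a small sketch makes the inputs explicit so the model can be re-run with other rates. The per-write costs are the midpoints of the ranges quoted above (75us and 225us), which are estimates rather than measurements, and `blockedMs` is an illustrative name.

```javascript
'use strict';

// Back-of-envelope model of blocking time per connection attempt.
// Inputs are the estimates quoted above, not measured values.
function blockedMs({ iterations, callsPerIteration, writesPerCall, usPerWrite }) {
  const syncWrites = iterations * callsPerIteration * writesPerCall;
  return (syncWrites * usPerWrite) / 1000; // microseconds -> milliseconds
}

const base = { iterations: 30, callsPerIteration: 8, writesPerCall: 2 };
console.log('Pi 4:  ', blockedMs({ ...base, usPerWrite: 75 }), 'ms');  // 36
console.log('Pi 3B+:', blockedMs({ ...base, usPerWrite: 225 }), 'ms'); // 108
```

With a single write per call (`writesPerCall: 1`), both figures halve, to 18 ms and 54 ms.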

With debug enabled (adds ~115 more paths):

Potential 2x increase in log volume during active operations

Options:

Option A: Journal only (remove file logging)

Pro: Single write, journal handles rotation/persistence
Pro: Timestamps added by journalctl
Con: Loses /tmp/wireless.log for manual inspection
Con: Journal can be harder to grep during field debug

Option B: File only (remove console.log)

Pro: Single write, dedicated log file
Pro: Easy to tail/grep during field debug
Con: No journal integration
Con: Manual rotation needed (file grows indefinitely)

Option C: File only when debug=true, journal always

Pro: Production runs lean (journal only)
Pro: Debug mode gets detailed file for analysis
Con: Still dual-write when debugging

Option D: Journal only, file only when debug=true

Pro: Production uses standard journal path
Pro: Debug gets supplementary file for detailed analysis
Pro: Zero file I/O overhead in production
Con: Slightly more complex logic

Recommendation: Option D

Production (debug=false):

loggerInfo: console.log only (journal)
loggerDebug: no output

Debug (debug=true):

loggerInfo: console.log (journal) + file
loggerDebug: console.log (journal) + file

This halves sync I/O in production and provides full logging when investigating issues.
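Option D could look roughly like the sketch below. It is not the wireless.js logger: `createLogger`, its options, the `[INFO]`/`[DEBUG]` prefixes and the injectable `journal`/`file` sinks are hypothetical, the sinks added so the routing can be checked without touching the journal or /tmp. In production only loggerInfo writes, and only to the journal via console.log; with debug enabled both functions also append to the file, asynchronously.

```javascript
'use strict';

const fs = require('fs');

// Hypothetical sketch of Option D routing.
//   debug=false: loggerInfo -> journal only; loggerDebug -> dropped
//   debug=true:  loggerInfo and loggerDebug -> journal + file
function createLogger({
  debug = false,
  logFile = '/tmp/wireless.log',
  journal = (line) => console.log(line), // systemd captures stdout
  file = (line) => fs.appendFile(logFile, line + '\n', () => {}), // async, errors ignored
} = {}) {
  function write(level, msg) {
    const line = `[${level}] ${msg}`;
    journal(line);
    if (debug) {
      // Timestamp only the file copy; journalctl adds its own.
      file(`${new Date().toISOString()} ${line}`);
    }
  }

  return {
    loggerInfo: (msg) => write('INFO', msg),
    loggerDebug: (msg) => {
      if (debug) write('DEBUG', msg);
    },
  };
}
```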

volumio · 2025-12-18T14:26:32Z

Started testing

foonerd · 2025-12-18T19:03:36Z

@volumio - triage needed as per my previous comment: Priority 5 - High CPU Usage of wireless which creates buffer underruns

volumio · 2025-12-19T14:10:56Z

@foonerd I did quite extensive testing and everything looks nominal now.

Regarding your performance remark about double logging: I agree that option D is the best one:
loggerInfo: console.log only (journal)
loggerDebug: no output

However, I suggest we keep it as it is for now until we have verified the behaviour 100%, as it might give us useful debug information. We can revisit this in one month by implementing option D.

FYI, in this regard I changed the file append from sync to async; this alone will improve performance, and given the logging frequency we should not have problems with the actual ordering of log lines.

As a general rule, given how Node.js works, we should limit sync operations as much as possible: they block the Node.js event loop and severely impact performance.
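On the ordering point: each fs.appendFile call is an independent operation dispatched to libuv's thread pool, so Node.js does not promise that concurrent calls complete in the order they were issued, although at this logging rate reordering is unlikely. If strict ordering matters, one option is to chain appends on a single promise, so each write starts only after the previous one finishes while the event loop is never blocked. A minimal sketch; `createOrderedAppender` is a hypothetical name.

```javascript
'use strict';

const fsp = require('fs').promises;

// Hypothetical sketch: non-blocking appends that land in call order.
// Each append is chained onto the previous one, so at most one write
// to the file is in flight at any time.
function createOrderedAppender(filePath) {
  let tail = Promise.resolve();

  function append(line) {
    tail = tail
      .then(() => fsp.appendFile(filePath, line + '\n'))
      .catch(() => {}); // a failed write must not stall later writes
    return tail;
  }

  // Resolves once every append issued so far has completed.
  function flush() {
    return tail;
  }

  return { append, flush };
}
```

The trade-off is throughput: writes are strictly sequential, which is acceptable for a log that sees a handful of lines per second.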

I think we can merge now, keep monitoring the behaviour, and keep improving this excellent code. Thanks.

Merging; you can do a build at your convenience!

volumio · 2025-12-19T14:11:38Z

@foonerd I need your approval (or anyone else's) before merging...

foonerd

Added to my to-do list for the end-of-January 2026 review of the logger options, namely the selected option D.

foonerd and others added 30 commits November 14, 2025 22:59

Fix critical wireless recovery and add emergency hotspot fallback

9584abc

Disarming - Disable IPv4 link-local address assignment

c559cdc

Trigger ip-changed notification on hotspot static IP assignment

a2322d7

Event-driven interface validation and WPA state monitoring - eliminates cold boot race conditions, reduces connection time to 8-20s and failure detection to 10-20s

2f1b203

Predictable exit from hotspot completion

070a92b

v4.0-rc1: Complete wireless daemon redevelopment with enhanced reliability, Single Network Mode, and event-driven interface monitoring

599fcff

v4.0-rc2: Fix ethernet plug/unplug deadlock, infinite loop, and WiFi reconnection blocking in SNM

354c9af

v4.0-rc3: Fix graceful dhcpd requests handliers

737705c

Disable IPv4 link-local address assignment (169.254.x.x)

6783ef4

Fixed regdomain log output

6ff31ef

Merge branch 'volumio:common' into common

92dd208

Merge branch 'volumio:master' into wifi-v4

6bb2cbe

Merge branch 'volumio:master' into wifi-v4

7b6dc80

Merge branch 'volumio:master' into wifi-v4

cf636a0

Merge branch 'volumio:master' into wifi-v4

ae71ee5

Merge branch 'volumio:master' into wifi-v4

c3a31a8

Merge pull request #312 from foonerd/wifi-v4

aa124c2

feat(wifi): Reworked single network mode and handling

fix(wireless): Fix hotspot not starting on fresh boot without ethernet

a7d351d

Merge pull request #313 from foonerd/wifi-v4

6a0bffa

fix(wireless): Fix hotspot not starting on fresh boot without ethernet

fix(wireless): Fix hotspot not starting on fresh boot without ethernet

a862ed4

Merge branch 'volumio:wifi-v4' into wifi-v4

b51ecd3

Merge pull request #314 from foonerd/wifi-v4

a147d82

fix(wireless): Rolling back previous commit

fix(wireless): Fix hotspot startup race condition

19f92c6

Merge pull request #315 from foonerd/wifi-v4

1b82ee0

fix(wireless): Fix hotspot startup race condition

fix(wireless): Fix hotspot on cold interface

077b1de

Merge pull request #316 from foonerd/wifi-v4

d660210

fix(wireless): Fix hotspot on cold interface

fix(wireless): Add interface readiness check for first boot hotspot

3a406db

Merge pull request #317 from foonerd/wifi-v4

0c6d448

fix(wireless): Add interface readiness check for first boot hotspot

fix(wireless): Add interface readiness check for first boot hotspot

b85dc77

Merge pull request #325 from foonerd/wifi-v4

140c154

fix(wireless): Add interface readiness check for first boot hotspot

foonerd added 3 commits December 17, 2025 14:12

fix(wireless): update header from review

8dcc659

fix(wireless): always scan regdomain on startup

82d688a

fix(wireless): ensure interface ready before iw commands on WiFi re-enable

6c00d94

fix(wireless): synchronous ethernet check before startup flow decision

6afb909

foonerd added 2 commits December 17, 2025 17:14

fix(wireless): remove dead wpasupp declaration and fix debug mode scoping

b953b95

fix(wireless): reset wirelessFlowInProgress flag after scan mode and reconnect

5a0af50

fix(wireless): debounce fs.watch to prevent CPU spike from inotify events

50709fe

foonerd requested a review from ashthespy December 18, 2025 12:02

perf: use async file append for wireless daemon logs

2305f2d

foonerd self-assigned this Dec 19, 2025

foonerd self-requested a review December 19, 2025 14:17

foonerd approved these changes Dec 19, 2025

foonerd merged commit 5250573 into master Dec 19, 2025
2 of 3 checks passed

foonerd deleted the wifi-v4 branch December 19, 2025 15:06

foonerd mentioned this pull request Dec 19, 2025

Detect and expose actual WiFi mode capabilities (STA / AP / AP+STA) during initial setup #225

Closed

4 participants