@volumio
Owner

@volumio volumio commented Dec 11, 2025

No description provided.

foonerd and others added 30 commits November 14, 2025 22:59
…es cold boot race conditions, reduces connection time to 8-20s and failure detection to 10-20s
…ility, Single Network Mode, and event-driven interface monitoring
feat(wifi): Reworked single network mode and handling
fix(wireless):  Fix hotspot not starting on fresh boot without ethernet
fix(wireless): Rolling back previous commit
fix(wireless): Fix hotspot startup race condition
fix(wireless): Fix hotspot on cold interface
fix(wireless): Add interface readiness check for first boot hotspot
fix(wireless): Add interface readiness check for first boot hotspot
@foonerd
Collaborator

foonerd commented Dec 14, 2025

@volumio,

Thank you for the thorough review. I appreciate the attention to detail on such a critical component.

What was delivered:

The v4.0 refactoring addresses the core requirements we set out to solve: SNM compliance for AP2 licensing, deadlock elimination, USB WiFi adapter timing, DHCP lease management, and emergency recovery modes. The code has been tested against the agreed test cases and functions correctly across the scenarios known to me.

Regarding your review items:

Items 1-4 (notifyWirelessReady timing, regdomain behavior, variable declaration order, WiFi enable/disable in dual mode): These are valid observations. However, they require decisions about backend integration and system-wide behavior that the core team is better positioned to make. You have visibility into dependencies I do not.

Item 5 (12-hour reconnection): This is wpa_supplicant's responsibility. wireless.js monitors state changes but does not control the underlying reconnection behavior.

Item 6 (Hotspot with background reconnection): This is a feature request, not part of the original scope.

Style feedback: Noted for any future work.

Going forward:

I have completed the wireless.js prototype I committed myself to. The code is functional and documented. The atomic commit history provides clear traceability for review.

Further development decisions - what to merge, what to modify, what to prioritize - belong with the core team. I lack visibility into backend systems, product roadmap, and integration requirements to make those calls.

I remain available to assist with specific, well-defined tasks if needed. But ownership of wireless.js now sits with Volumio.

@foonerd
Collaborator

foonerd commented Dec 14, 2025

@volumio,

Pi 3B+ CPU Consumption Report

An additional issue has been reported: wireless.js consuming 99% CPU on Pi 3B+, causing system-wide performance degradation including audio xruns and network latency.

Evidence:

  • Node.js process accumulated 5:33 CPU time in ~4 minutes wall-clock
  • Network latency 5252ms (vs 948ms on Pi 4 with same code)
  • Audio xruns from CPU starvation, not WiFi signal issues

Likely causes:

  • Tight polling loop without proper delays
  • fs.watch callback firing repeatedly
  • Retry mechanism without backoff or exit condition
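
If repeated fs.watch callbacks turn out to be one of the causes listed above, a common mitigation is to debounce the handler so that a burst of events triggers a single state check. The sketch below is illustrative only: `debounce`, `delayMs`, and the injectable `timers` argument are hypothetical names and do not come from wireless.js.

```javascript
// Hypothetical helper: collapse a burst of calls into one call made
// `delayMs` after the last event. `timers` is injectable for testing.
function debounce(fn, delayMs, timers) {
    const t = timers || {
        set: function (cb, ms) { return setTimeout(cb, ms); },
        clear: function (handle) { clearTimeout(handle); }
    };
    let handle = null;

    return function debounced() {
        const args = arguments;
        if (handle !== null) {
            t.clear(handle); // a newer event supersedes the pending one
        }
        handle = t.set(function () {
            handle = null;
            fn.apply(null, args);
        }, delayMs);
    };
}
```

A watcher could then be registered as `fs.watch(path, debounce(onCarrierChange, 200))`, where `onCarrierChange` and the 200 ms delay are placeholders.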

My position:

The prototype code uses the same callback patterns and timing constants established in the original implementation. If there is a runaway loop, it may be triggered by a specific hardware or network condition not present in my test environment.

This is noted for core team investigation. A defensive fix would be to add loop counters and maximum iteration guards to any polling or retry mechanism, but identifying the specific offending code path requires logs from the affected device.
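
A minimal sketch of such a guard, assuming a hypothetical `attempt` callback that returns true on success. `retryWithBackoff`, `maxAttempts`, and the delay options are illustrative names, not identifiers from wireless.js.

```javascript
// Hypothetical helper: retry `attempt` with exponential backoff and a hard cap.
// `attempt(n)` receives the attempt number and must return true on success.
// `done(err, attempts)` is called exactly once, on success or after giving up.
function retryWithBackoff(attempt, options, done) {
    const maxAttempts = options.maxAttempts;          // hard exit condition
    const baseDelayMs = options.baseDelayMs;
    const maxDelayMs = options.maxDelayMs;
    const schedule = options.schedule || setTimeout;  // injectable for tests
    let count = 0;

    function tryOnce() {
        count += 1;
        if (attempt(count)) {
            return done(null, count);
        }
        if (count >= maxAttempts) {
            return done(new Error('gave up after ' + count + ' attempts'), count);
        }
        // 1x, 2x, 4x ... the base delay, capped at maxDelayMs
        const delay = Math.min(baseDelayMs * Math.pow(2, count - 1), maxDelayMs);
        schedule(tryOnce, delay);
    }

    tryOnce();
}
```

The loop counter guarantees termination, and the growing delay keeps a persistently failing condition from turning into a busy loop.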

@volumio
Owner Author

volumio commented Dec 15, 2025

@foonerd

I think we are slightly misaligned on expectations, so I want to clarify this clearly.

You did a great job pushing hard on this and driving the refactor forward, and that effort is appreciated. That said, in this case the push may have had the opposite effect. With large refactors of core systems like wireless.js, speed alone is not the goal. What we strive for is landing changes that work 100% of the time across hardware and real-world conditions.

For small PRs, it is fine to deliver a working change and let maintainers take it from there. For large core refactors, responsibility is shared until the PR is either merged or withdrawn, including working through review feedback and fixing regressions found during testing (and that's why I tried to discourage this approach...). A “thanks for QA, good luck” handoff is not appropriate for a change of this scope, and it ends up wasting time on both sides.

Right now we have users affected by performance issues, and this is exactly where your competence can be best used. We can fix this together by aligning on the blocking issues and iterating until it is solid and safe to merge.

If you are willing to continue, let us define a short list of fixes and move forward together. If not, we can pause or withdraw the PR and revisit parts of it later under core-team ownership.

Let me know how you would like to proceed.

@Darmur
Collaborator

Darmur commented Dec 15, 2025

I tried to look at the PR code, but unfortunately it's way beyond my skills and knowledge of the system.
I also looked at the message exchange; here are my two cents.

The work looks solid, and initial feedback on the forum from the people testing it is very positive; this is potentially a big step forward for the system.
However, such a big rework makes it very hard for other people to step in and eventually add changes or fixes, at least in the short term.
It will take quite some time for the Core team to read and understand all the changes, and this will bring delays in making it the new default for stable releases.

If we want to be pragmatic and speed things up, my proposal is to identify the potential regressions and fix them. I think @foonerd is the best person to take care of the fixing part, since this rework is one of his creations.
One of the regressions has already been identified (country code); in that case the easiest solution could be to copy/restore the old logic as-is, since we know it works fine, including in corner cases and on specific devices (MP1).
The other one is the high CPU consumption in some corner cases, which to me looks like a side effect of this rework (it was not reported with Volumio3 or with previous images using the old wireless.js, since the very beginning of the Volumio4 alpha).

Of course we can support with extensive testing and QA.

@foonerd
Collaborator

foonerd commented Dec 17, 2025

It will take quite some time for the Core team to read and understand all the changes, and this will bring delays in making it the new default for stable releases.

This alone reveals a knowledge gap. If an overdue wireless overhaul is beyond grasp, I have serious concerns about Bookworm as such, where technological change is continuously underestimated. Perhaps the complete move to Bookworm should be canned.

@foonerd
Collaborator

foonerd commented Dec 17, 2025

I am not the one blocking progress and nothing has changed. My time is limited.
What are the tasks in order of priorities?
1...
2...
etc

@volumio
Owner Author

volumio commented Dec 17, 2025

Thanks. I updated the original description, adding the new bug (high CPU usage). IMHO the priorities are (from urgent to low):

5 High CPU Usage of wireless which creates buffer underruns
2 REGDOMAIN SETTING
1 ENABLE \ DISABLE WIFI IN DOUBLE NETWORK MODE
4 CHECK CAREFULLY WIFI RECONNECTION
3 POSSIBLE BUG IN VARIABLES USED BEFORE DECLARATIONS

If there is one you feel I would be better suited for, feel free to tell me and we can split the tasks.

@foonerd
Collaborator

foonerd commented Dec 17, 2025

1 ENABLE \ DISABLE WIFI IN DOUBLE NETWORK MODE

This should behave in line with the DISABLE/ENABLE boolean switch: 6c00d94

2 REGDOMAIN SETTING

Regdomain is now always checked on start: 82d688a

@volumio - FYI

@foonerd
Collaborator

foonerd commented Dec 17, 2025

Startup race condition

On boot with ethernet connected, wireless.js starts two parallel flows. The variable isWiredNetworkActive is initialized to false at module load. The ethernet monitor (startWiredNetworkingMonitor) uses fs.watch() on /sys/class/net/eth0/carrier which fires asynchronously after the main flow begins.

The sequence:

  1. Module loads, isWiredNetworkActive = false
  2. initializeWirelessFlow() called
  3. stop() -> detectAndApplyRegdomain() -> startFlow()
  4. startFlow() checks isWiredNetworkActive - still false
  5. Goes to WiFi client mode, starts startAP()
  6. Meanwhile, fs.watch() callback fires, sets isWiredNetworkActive = true
  7. SNM transition triggers second initializeWirelessFlow()
  8. Two flows now running in parallel
  9. First flow completes and starts hotspot
  10. Second flow reaches scan mode but gets overwritten

Result: Hotspot running when system should be in scan mode.

Fix:

Added refreshEthernetState() - synchronous read of /sys/class/net/eth0/carrier immediately before startFlow(). This ensures the ethernet state is current regardless of whether fs.watch() has fired yet.

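// Synchronously re-read the ethernet carrier flag so that startFlow() sees the
// current link state even if the fs.watch() callback has not fired yet.
// Relies on the module-level fs, isWiredNetworkActive and loggerInfo.
function refreshEthernetState() {
    try {
        // '1' means link up; any other value is treated as disconnected
        var carrier = fs.readFileSync('/sys/class/net/eth0/carrier', 'utf8').trim();
        var newState = (carrier === '1');
        if (newState !== isWiredNetworkActive) {
            loggerInfo('refreshEthernetState: Corrected ethernet state: ' +
                (newState ? 'connected' : 'disconnected'));
            isWiredNetworkActive = newState;
        }
    } catch (e) {
        // eth0 doesn't exist or carrier not readable; keep the current state
    }
}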

Called in initializeWirelessFlow() after detectAndApplyRegdomain() completes, before startFlow().

The async monitor remains for runtime cable plug/unplug events. The sync check handles startup timing only.
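
For anyone wanting to exercise the startup check without real hardware, the same synchronous read can be sketched as a pure helper that takes the path as a parameter. `readCarrier` and `reconcileWiredState` are hypothetical names used for illustration; they are not part of the commit.

```javascript
const fs = require('fs');

// Hypothetical helper: returns true (link up), false (link down),
// or null when the carrier file is missing or unreadable.
function readCarrier(carrierPath) {
    try {
        return fs.readFileSync(carrierPath, 'utf8').trim() === '1';
    } catch (e) {
        return null;
    }
}

// Prefer a conclusive fresh reading; otherwise keep the cached value,
// which mirrors the empty catch branch in the snippet above.
function reconcileWiredState(cached, carrierPath) {
    const fresh = readCarrier(carrierPath);
    return fresh === null ? cached : fresh;
}
```

Calling `reconcileWiredState(isWiredNetworkActive, '/sys/class/net/eth0/carrier')` immediately before the flow starts would give the same result as the inline version, under the assumption that eth0 is the only wired interface of interest.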

@foonerd
Collaborator

foonerd commented Dec 17, 2025

@volumio,

Commits to date should handle priorities 1 to 4 inclusive.

As such, they are ready for review.

@foonerd
Collaborator

foonerd commented Dec 18, 2025

Priority 5 - High CPU Usage of wireless which creates buffer underruns

@volumio - this is serious blame: 44c2573

Issue: Duplicate logging causing excessive sync I/O

Current implementation writes every log message twice:

  1. console.log -> systemd journal (sync)
  2. fs.appendFileSync -> /tmp/wireless.log (sync)

Measured impact:

Log statements in code: 315 total

  • loggerInfo: ~200 calls
  • loggerDebug: ~115 calls (only when debug=true)

During 30-second connection phase with 1-second polling:

  • Per poll iteration: ~8 log calls
  • 30 iterations x 8 calls = 240 log events
  • 240 x 2 sync writes = 480 blocking I/O operations

Each sync write:

  • Syscall overhead: ~50-100us on Pi 4, ~150-300us on Pi 3B+
  • Journal write includes serialization overhead

Estimated I/O time per connection attempt:

  • Pi 4: 480 x 75us = 36ms blocked
  • Pi 3B+: 480 x 225us = 108ms blocked

With debug enabled (adds ~115 more paths):

  • Potential 2x increase in log volume during active operations
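
The estimate above can be reproduced with a few lines of arithmetic. The per-write costs are the midpoints of the ranges quoted in this comment, not independent measurements, and `estimateBlockedMs` is a hypothetical helper name.

```javascript
// Estimate the time spent in blocking writes during one connection attempt.
function estimateBlockedMs(pollIterations, logCallsPerPoll, writesPerCall, microsPerWrite) {
    const writes = pollIterations * logCallsPerPoll * writesPerCall;
    return (writes * microsPerWrite) / 1000;
}

console.log(estimateBlockedMs(30, 8, 2, 75));  // Pi 4 midpoint, prints 36
console.log(estimateBlockedMs(30, 8, 2, 225)); // Pi 3B+ midpoint, prints 108
```

Passing 1 as `writesPerCall` models a single-sink logger and halves both figures.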

Options:

Option A: Journal only (remove file logging)

  • Pro: Single write, journal handles rotation/persistence
  • Pro: Timestamps added by journalctl
  • Con: Loses /tmp/wireless.log for manual inspection
  • Con: Journal can be harder to grep during field debug

Option B: File only (remove console.log)

  • Pro: Single write, dedicated log file
  • Pro: Easy to tail/grep during field debug
  • Con: No journal integration
  • Con: Manual rotation needed (file grows indefinitely)

Option C: File only when debug=true, journal always

  • Pro: Production runs lean (journal only)
  • Pro: Debug mode gets detailed file for analysis
  • Con: Still dual-write when debugging

Option D: Journal only, file only when debug=true

  • Pro: Production uses standard journal path
  • Pro: Debug gets supplementary file for detailed analysis
  • Pro: Zero file I/O overhead in production
  • Con: Slightly more complex logic

Recommendation: Option D

Production (debug=false):

  • loggerInfo: console.log only (journal)
  • loggerDebug: no output

Debug (debug=true):

  • loggerInfo: console.log (journal) + file
  • loggerDebug: console.log (journal) + file

This halves sync I/O in production and provides full logging when investigating issues.
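
A minimal sketch of how Option D could be wired, assuming a `debug` flag and the /tmp/wireless.log path mentioned above, and assuming the service's stdout is captured by the journal. `createLogger`, `toJournal`, and `toFile` are hypothetical names; the actual loggerInfo and loggerDebug in wireless.js may be structured differently.

```javascript
const fs = require('fs');

// Hypothetical factory implementing Option D:
//   debug=false: loggerInfo -> journal only, loggerDebug -> no output
//   debug=true:  both loggers -> journal + file
function createLogger(options) {
    const debug = Boolean(options.debug);
    const logFile = options.logFile || '/tmp/wireless.log';
    // Sinks are injectable so the routing can be tested without real I/O.
    const toJournal = options.toJournal || function (line) { console.log(line); };
    const toFile = options.toFile || function (line) {
        // Write errors are ignored so that logging cannot break the caller.
        fs.appendFile(logFile, line + '\n', function () {});
    };

    return {
        loggerInfo: function (msg) {
            toJournal(msg);
            if (debug) {
                toFile(msg);
            }
        },
        loggerDebug: function (msg) {
            if (!debug) {
                return;
            }
            toJournal(msg);
            toFile(msg);
        }
    };
}
```

With `createLogger({ debug: false })`, no file write is issued at all, which is the production behaviour described above.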

@foonerd foonerd requested a review from ashthespy December 18, 2025 12:02
@volumio
Owner Author

volumio commented Dec 18, 2025

Started testing

@foonerd
Collaborator

foonerd commented Dec 18, 2025

@volumio - triage needed as per my previous comment: Priority 5 - High CPU Usage of wireless which creates buffer underruns

@volumio
Owner Author

volumio commented Dec 19, 2025

@foonerd I did quite extensive testing and all looks nominal now.

Re your performance remark about double logging: I agree with you that option D is the best one:
loggerInfo: console.log only (journal)
loggerDebug: no output

However, I would suggest that we keep it as it is for now until we have verified the behaviour 100%, as this might give us useful debug information. We can revisit it one month from now by implementing option D.

FYI, in this regard I changed the append call from sync to async; this alone will improve performance, and considering the logging frequency we should not have problems with the effective sequence of the logs.

As a general rule, given how Node.js works, we should limit sync operations as much as possible, as they block the Node.js event loop quite substantially and severely impact performance.
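
Should log ordering ever become a concern with async appends, one option is to chain the writes so each starts only after the previous one has finished. This is a sketch under that assumption, not the change that was committed; `createOrderedAppender` is a hypothetical name.

```javascript
const fs = require('fs');

// Hypothetical helper: appends lines without blocking the event loop,
// while preserving the order in which append() was called.
function createOrderedAppender(filePath) {
    let tail = Promise.resolve();

    return function append(line) {
        tail = tail
            .then(function () {
                return fs.promises.appendFile(filePath, line + '\n');
            })
            .catch(function () {
                // Drop the line on error rather than break the chain.
            });
        return tail;
    };
}
```

Each call returns a promise that settles once its own line has been written, so callers that need a flush point can await the last one.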

I think we can merge now, keep monitoring the behaviour, and keep improving this excellent code. Thanks.

Merging; you can do a build at your convenience!

@volumio
Owner Author

volumio commented Dec 19, 2025

@foonerd I need your approval before merging, or anyone else...

@foonerd foonerd self-assigned this Dec 19, 2025
@foonerd foonerd self-requested a review December 19, 2025 14:17
Collaborator

@foonerd foonerd left a comment

Adding to my to-do list: end of January 2026 review of the logger options, namely the selected option D.

@foonerd foonerd merged commit 5250573 into master Dec 19, 2025
2 of 3 checks passed
@foonerd foonerd deleted the wifi-v4 branch December 19, 2025 15:06

4 participants