Skip to content

Conversation

@spagochitarra
Copy link

This PR introduces a consolidated epoll-based I/O model across core Volclava components, replacing legacy socket handling with scalable, event-driven mechanisms. The changes were squashed into commit 88193e8 and cherry-picked here for visibility and review.

Key Features:

  • Register/deregister epoll channels via static openSock(), with %m-style logging for diagnostics
  • epollChan_() implementation now active across lim, lstools, and API libraries
  • Listening socket cleanup on master restart during request processing
  • Fixed epoll timeout behavior for reliable responsiveness
  • Enabled epoll support in mbatchd and sbatchd network communication
  • epoll_fd creation deferred until post-fork to prevent descriptor reset in child processes
  • Non-blocking channel reads gated by readiness events

Code Quality Improvements:

  • Reformatted lsb.job.c for consistency and readability
  • Untabified and re-indented legacy sections for maintainability

Test Coverage:

  • Verified base and batch command functionality

  • Ran full job lifecycle: submit → dispatch → execution → termination

  • Observed correct channel readiness and socket cleanup

  • Caveats:
    This integration is functional but not yet fully hardened. Some edge cases remain under investigation, particularly around fork timing and descriptor lifecycle. Reviewers are encouraged to test under mixed OS conditions and high-load scenarios.

Lu

bokassa and others added 5 commits October 4, 2025 07:54
…onents

This commit consolidates comprehensive epoll integration across multiple Volclava
components, replacing traditional socket handling with efficient event-driven
I/O:

Key Features:
- Register/deregister epoll channels with static openSock() and %m logging
- epollChan_() implementation working across lim, lstools, and API libraries
- Proper listening socket cleanup on master restart during request processing
- Fixed epoll timeout handling for reliable operation
- Enabled epoll support in mbatchd and sbatchd network communication
- Create epoll_fd post-fork to prevent descriptor reset in child processes
- Non-blocking channel reads when events are ready

Code Quality:
- Reformatted lsb.job.c for consistency
- Cleaned up and untabified messy code sections for maintainability

The implementation provides scalable, non-blocking I/O handling that properly
manages file descriptors across process boundaries and ensures robust error
handling throughout the event loop.
Comment on lines +1284 to +1289
// Handle error conditions
if ((ev & EPOLLERR) || (ev & EPOLLHUP) || (ev & EPOLLRDHUP)) {
channels[ch].events |= EPOLL_EVENTS_ERROR;
channels[ch].chanerr = CHANE_SYSCALL;
continue;
}
Copy link

@tczz88888 tczz88888 Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be an issue here. When a peer writes data and immediately closes the socket, the first epoll event may trigger EPOLLIN. If the data isn't fully read in one read() call, the next epoll event will likely include both EPOLLIN (remaining data) and EPOLLRDHUP (peer closed the connection). However, this logic treats EPOLLRDHUP as an error unconditionally, marking the channel as errored and stopping further processing. This could prevent reading remaining data in the buffer, leading to data loss.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation, I think we saw this problem even before regardless of epoll(), basically the EOF on the connection arrives before the data, we saw this in testing on one hosts but also in production.

The current client does chanRpc_() inside call_server(), I think it waits for reply for the daemon about the status of the operation, so logically that should work.

Do you have a reproduce scenario or it is just a consideration? If we have real case we can investigate it closely.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I encountered this in a real case during MBD_PROBE handling. Here's the scenario:

  • MBD calls probe_slave, sending a MBD_PROBE request to SBD.
  • SBD processes it via do_probe, sends a response using chanWrite_, then closes the socket.
  • MBD uses epoll to monitor the socket,epoll triggers both EPOLLIN (remaining data) and EPOLLRDHUP (peer closed).

The current logic unconditionally treats EPOLLRDHUP as an error, marking the channel as errored before fully reading data. This causes doProbeReply to be called with exception == TRUE, leading to a false error.

@spagochitarra
Copy link
Author

spagochitarra commented Oct 21, 2025 via email

@tczz88888
Copy link

I understand, in the epoll loop the mbd handles the error first. Ok so this is the case in which the order of operations is inverted or perhaps not inverted but 2 operations are compressed in 1 since the tcp layer does some buffering and the two bits are on. What is the effect on mbd? It thinks the host is down?

On Tue, Oct 21, 2025 at 2:53 PM tczz88888 @.> wrote: @.* commented on this pull request. ------------------------------ In lsf/lib/lib.channel.c <#35 (comment)>: > + // Handle error conditions + if ((ev & EPOLLERR) || (ev & EPOLLHUP) || (ev & EPOLLRDHUP)) { + channels[ch].events |= EPOLL_EVENTS_ERROR; + channels[ch].chanerr = CHANE_SYSCALL; + continue; + } Yes, I encountered this in a real case during MBD_PROBE handling. Here's the scenario: - MBD calls probe_slave, sending a MBD_PROBE request to SBD. - SBD processes it via do_probe, sends a response using chanWrite_, then closes the socket. - MBD uses epoll to monitor the socket,epoll triggers both EPOLLIN (remaining data) and EPOLLRDHUP (peer closed). The current logic unconditionally treats EPOLLRDHUP as an error, marking the channel as errored before fully reading data. This causes doProbeReply to be called with exception == TRUE, leading to a false error. — Reply to this email directly, view it on GitHub <#35 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO77WUMMVOBTSUX2FB3FI3T3YYUEBAVCNFSM6AAAAACJMDRZGWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTGNRQGUYTKNJWGY . You are receiving this because you authored the thread.Message ID: @.***>

Actually, the impact is not significant. A failed doProbeReply will only output an ERR log. Additionally, it will set the flag through sbdPtr->hData->flags |= HOST_NEEDPOLL;. Later, when pollSbatchds is invoked, it will detect the HOST_NEEDPOLL flag and initiate another call to probe_slave.

@spagochitarra
Copy link
Author

spagochitarra commented Oct 21, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants