Skip to content

Conversation

@geunwoonoh
Copy link
Collaborator

@geunwoonoh geunwoonoh commented Dec 29, 2025

Summary

This PR merges the complete agent refactoring work from 131-alpamon-agent-refactoring branch into main. The refactoring addresses critical architectural issues, including the God Object anti-pattern, uncontrolled goroutine spawning, and circular dependencies.

Key Achievements

Metric Before After Improvement
command.go size 2,363 lines 121 lines 95% reduction
Handler separation 1 monolithic file 9 specialized handlers Clear SoC
Goroutine management Unlimited spawning Pool-limited (10 workers) Controlled
Context propagation Partial 100% coverage Complete
Circular dependencies Present None Eliminated

Architecture Changes

1. Handler Pattern Implementation

Decomposed the monolithic command.go into specialized handlers:

pkg/executor/handlers/
├── common/        # Shared interfaces and utilities
├── system/        # upgrade, restart, reboot, shutdown, quit
├── user/          # adduser, deluser, moduser
├── group/         # addgroup, delgroup
├── firewall/      # firewall, firewall-rollback, reorder operations
├── file/          # upload, download
├── shell/         # shell command execution
├── terminal/      # openpty, openftp, resizepty
├── tunnel/        # opentunnel, closetunnel (NEW)
└── info/          # ping, help, commit, sync

2. Goroutine Pool & Context Management

Worker Pool (internal/pool/pool.go):

  • Fixed worker count (default: 10)
  • Job queue with configurable capacity (default: 100)
  • Panic recovery and graceful shutdown
  • Non-blocking job submission

Context Manager (pkg/agent/context.go):

  • Centralized context creation
  • Hierarchical context tree from root
  • Coordinated shutdown across all components

3. Command Dispatcher System

pkg/executor/
├── dispatcher.go   # Routes commands to appropriate handlers
├── registry.go     # Handler registration and lookup
├── factory.go      # Dependency injection and handler creation
└── executor.go     # System command execution

4. Main Branch Features Integrated

Design Patterns Applied

Pattern Location Purpose
Handler pkg/executor/handlers/ Command execution isolation
Factory pkg/executor/factory.go Centralized handler creation
Registry pkg/executor/registry.go Handler lookup by command
Strategy handlers/firewall/backend.go iptables/nftables abstraction
Object Pool internal/pool/pool.go Goroutine management
Context pkg/agent/context.go Lifecycle management

Breaking Changes

API Changes

  • NewCommandRunner() now requires dispatcher parameter
  • NewWebsocketClient() now requires ctxManager and workerPool parameters

Configuration

New [pool] section in alpamon.conf:

[pool]
max_workers = 10
queue_size = 100
default_timeout = 300

Dependencies

  • Added: github.com/xtaci/smux (tunnel multiplexing)
  • Removed: gopkg.in/natefinch/lumberjack.v2 (log rotation)

Test Results

$ go build -v ./...
# Build successful

$ go test -v ./... -p 1
# All tests pass

Test Coverage

  • 38 test files across all packages
  • Unit tests for each handler
  • Integration tests for dispatcher system
  • Goroutine leak detection tests
  • Benchmark tests for pool and context manager

Files Changed (Key Files)

Core Refactoring

  • pkg/runner/command.go - Reduced from 2,363 to 121 lines
  • pkg/runner/client.go - Added pool integration and ping/pong
  • cmd/alpamon/command/root.go - Updated initialization sequence

New Components

  • internal/pool/pool.go - Worker pool implementation
  • pkg/agent/context.go - Context manager
  • pkg/executor/dispatcher.go - Command dispatcher
  • pkg/executor/factory.go - Handler factory
  • pkg/executor/handlers/* - 9 specialized handlers

Integrated from Main

  • pkg/runner/tunnel_client.go - smux tunnel client
  • pkg/runner/auth_manager.go - Sudo approval manager
  • pkg/runner/control_client.go - Control WebSocket client

Verification Checklist

  • Build passes on Linux and macOS
  • All existing tests pass
  • No goroutine leaks detected
  • No circular dependencies
  • Backward compatible with existing server API
  • Configuration changes documented

Deployment Notes

  1. Configuration Update: Add [pool] section to existing configs
  2. systemd: Service file updated for journald logging
  3. Monitoring: Watch goroutine count and memory usage post-deployment

Related Issues/PRs

geunwoonoh and others added 30 commits November 24, 2025 12:02
Add a Pool struct that implements the Goroutine
Worker Pool pattern to solve the problem of
potentially having an unlimited number of
goroutines.
There was a potential for goroutines to not shut down
gracefully upon alpamon termination because their contexts were not propagated.
Furthermore, the unrestrained use of contexts
resulted in poor context management.
To address these problems, add a ContextManager
to globally manage contexts within alpamon.
As the Pool was added, add test cases
for the functionalities it provides.
As the ContextManager was added, add test
cases for the functionalities it provides.
As the management of goroutines and contexts was changed
to be handled globally by the Pool and ContextManager, respectively,
refactor the following components:
- Collector(pkg/collector/collector.go)
- WebsocketClient(pkg/runner/client.go)
- CommandRunner(pkg/runner/command.go)
- CommitAsync()(pkg/runner/commit.go)
- Reporter(pkg/scheduler/reporter.go)
- LogServer(pkg/logger/server.go)
If multiple PtyClients attempt to use the
terminals variable simultaneously, a race
condition could occur.
To solve this, use sync.RWMutex to prevent
the race condition.
Add a pool section to alpamon.conf to allow users to
dynamically configure parameters such as the maximum
number of workers and the queue size for the Pool.
Refactore the newly added pool section in
alpamon.conf to be stored and used in
Settings, similar to the other existing
sections in alpamon.conf.
Delete unused NewContextManagerWithRoot().
Adjust golint to pool_test.go.
Add a ReporterManager to manage the goroutine
lifecycle of the Reporters.
Fix a context leak by modifying the code
to use the ReporterManager for managing
the goroutine and context lifecycle of the
Reporters.
As mentioned in the code review, fix the code to
use a cancel function to prevent context leaks
and memory leaks in the restart, quit, reboot,
and shutdown operation cases.
As mentioned in the code review, fix the code to
use a cancel function to prevent context leaks
and memory leaks in LogServer and
WebsocketClient.
Replace unconventional context timeout as sleep mechanism
with standard time.After + select pattern for better readability
and Go idioms.
Use pointer type for DefaultTimeout to distinguish between
"not configured" and "explicitly set to 0", enabling users to
disable timeout via config.
…ase-1-goroutine-pool-and-context-management-implementation

Resolve "[Alpamon Agent Refactoring] Phase 1: Goroutine Pool and Context Management Implementation"
Add executor components to resolve the God Object
Anti-pattern in pkg/runner/command.go
Add Handler, CommandExecutor, and WSClient interfaces
considering polymorphism and extensibility.
Executor is the component that executes the actual command,
responsible for receiving the command data from Handler,
managing privilege demotion, and executing the command.
Add Registry, which is responsible for registering
and unregistering handlers, and managing the
registered handlers.
Add CommandDispatcher component, which is
responsible for routing the Command received
from the CommandRunner to the correct handler.
Add BaseHandler to provide functionality
commonly used by all handlers.
The previous use of maps for passing CommandData
between the Dispatcher, Handler, and Executor resulted
in reduced type safety.
This was resolved by adding a separate CommandArgs
struct to manage the data used by these components.
Update the CommandDispatcher following the addition of
CommandArgs.
Update Interfaces following the addition of CommandArgs
at pkg/executor/handlers/common/interfaces.go
Integrate NewBaseHandlerWithTypes()
into NewBaseHandler().
Refactor pkg/runner/command.go and
pkg/runner/command\_types.go following the addition
of executor-related components to resolve the God Object
Anti-pattern in pkg/runner/command.go.
Refactor the CommandRunner so that its sole responsibility
is to parse the incoming CommandData into CommandArgs,
route the command to the Dispatcher for execution by the
Executor, and send the execution result back to the
WebsocketClient.
Firewall logic was kept for compatibility and is scheduled
for refactoring in Phase 3.
Add dispatcher field to WebsocketClient to handle command
with CommandRunner and CommandDispatcher.
Separate the logic for the command cases into a method
named handleCommand()
Add a SystemHandler to process system-related
commands.
This handler takes over the following cases previously
handled by CommandRunner:
upgrade, restart, quit, reboot, shutdown, update, byebye
geunwoonoh and others added 15 commits December 16, 2025 15:18
Update pty.go to import and use protocol.CommandData directly.
Update terminal.go to use protocol.CommandData instead of
runner.CommandData.
Update client.go, command.go to use types in
internal/protocol instead of types in pkg/runner.
…-phase4-transport-layer-separation

Resolve "[Alpamon Agent Refactoring] Phase4: Transport Layer Separation"
Add InfoHandler tests (ping, help, commit, sync)
Add ShellHandler tests (operators, user/group, env, timeout)
Add SystemHandler tests (upgrade, restart, quit, reboot, shutdown)
Add pool leak tests (normal ops, panic, context cancel, queue full).
Add context manager leak tests (create/cancel, child cleanup).
Verify zero goroutine leaks after operations.
Add pool benchmarks (worker scaling, queue throughput, concurrent submit).
Add context manager benchmarks (create, timeout, concurrent).
Add dispatcher/registry benchmarks (get, list, concurrent).
Fix benchmark deadlock by increasing queue size.
Add integration tests for registry and handler execution.
Add regression tests for all handler/command types.
Add resource usage tests (memory, startup time, overhead).
Apply golint.
Apply golint.
Apply golint.
…ase-5-testing-and-validation

Resolve "[Alpamon Agent Refactoring] Phase 5: Testing and Validation"
…actoring-integrate-main-branch-changes-into-refactored-architecture
Add TunnelHandler to handle the following commands:
opentunnel, closetunnel
Define types and structs related to TunnelHandler.
Add fields related to tunneling functionality
in conjunction with TunnelHandler.
…ation

Update RegisterAll() to include TunnelHandler registration.
…tegrate-main-branch-changes-into-refactored-architecture

Resolve "[Alpamon Agent Refactoring] Integrate Main Branch Changes into Refactored Architecture"
@geunwoonoh geunwoonoh self-assigned this Dec 29, 2025
@geunwoonoh geunwoonoh linked an issue Dec 29, 2025 that may be closed by this pull request
14 tasks
@geunwoonoh
Copy link
Collaborator Author

@junho226 , @jisung-02 First of all, I apologize for requesting a review on such a large amount of code.
However, I believe it should be manageable as the code has already undergone review and testing in the following PRs:

During the refactoring process, I have also incorporated the latest changes merged into the main branch.
Therefore, I would appreciate it if you could focus your review on the following areas:

  • @junho226 : pkg/runner/auth_manager.go, pkg/runner/control_client.go, cmd/alpamon/command/root.go
  • @jisung-02 : pkg/executor/handlers/tunnel

Inline suppressions added for intentional behaviors (9 locations):
- executor.go: CWE-78 command injection (core agent functionality)
- file.go: CWE-22 path traversal, CWE-918 SSRF (admin file operations)
- user.go: CWE-22 path traversal (home directory management)
- fs.go: CWE-22 path traversal (file existence check)
- http_client.go: CWE-918 SSRF (admin-specified URL requests)

Alpamon is a server management agent equivalent to SSH server.
Commands are received only from authenticated Alpacon administrators.
Add CodeQL security scanning for Go codebase with custom configuration.

New files:
- .github/workflows/codeql.yml: Weekly CodeQL analysis on main branch
- .github/codeql/codeql-config.yml: Security model documentation
@geunwoonoh geunwoonoh marked this pull request as draft December 29, 2025 10:25
Remove custom CodeQL configuration to resolve conflict with GitHub's
Default Setup which is already enabled on this repository.

Deleted files:
- .github/workflows/codeql.yml
- .github/codeql/codeql-config.yml

Inline suppression comments in source files (executor.go, file.go,
user.go, fs.go, http_client.go) remain intact and will work with
GitHub's Default Setup.
junho226
junho226 previously approved these changes Dec 29, 2025
Add // lgtm[...] inline comments to suppress CodeQL security alerts
that are false positives for this server management agent.

Alpamon is designed to execute admin commands from authenticated
Alpacon console, similar to an SSH server. These operations are
intentional features, not vulnerabilities:

- go/command-injection: executor.go (admin command execution)
- go/path-injection: file.go, user.go, fs.go (file operations)
- go/request-forgery: file.go, http_client.go (URL fetching)

Existing // codeql[...] comments retained for documentation.
@geunwoonoh geunwoonoh marked this pull request as ready for review December 29, 2025 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Alpamon Agent Refactoring

3 participants