@DajanaV DajanaV commented Nov 18, 2025

Mirrored from ggml-org/llama.cpp#17362

This PR splits server.cpp into smaller components:

  • Main: server.cpp contains server_context, server_slot and the handler functions
  • server-common, containing most of the functions from utils.hpp (some one-off functions are moved to server-task)
  • server-task, containing all server_task_* classes and subclasses. The main idea is to treat these classes as serializers/deserializers. In the future, most of the JSON handling will be done here (instead of being scattered across the code base)
  • server-queue, containing the implementation of the task queue and result queue. The goal is to group all of the mutex-related logic into one file, potentially reusing it for other things in the future (completely decoupled from server_context)
flowchart TD
    server_common -.- main
    server_common -.- server_task
    server_task -.- server_slot
    server_task -.- server_queue
    server_slot <-->|update slots| server_context
    server_queue -->|get task| server_context
    HTTP_handlers -->|post task| server_queue
    subgraph main
        server_slot
        server_context -.- HTTP_handlers
    end
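
To make the "post task" and "get task" edges of the flowchart concrete, here is a minimal sketch of a mutex-guarded task queue. It is not the code from this PR: `demo_task`, `demo_task_queue`, and `run_task_queue_demo` are hypothetical names, and the real queue handles much more (deferred tasks, cancellation, and so on). The point is only that all locking stays inside one class, so callers never touch the mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a server task; the real struct carries far more state.
struct demo_task {
    int id;
};

// Minimal task queue: every access to tasks_ happens under mutex_.
class demo_task_queue {
public:
    // Called by HTTP handlers ("post task" in the flowchart).
    void post(demo_task task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Called by the main loop ("get task"). Blocks until a task arrives.
    // Returns false once the queue has been terminated and drained.
    bool get(demo_task & out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty() || stopped_; });
        if (tasks_.empty()) {
            return false;
        }
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

    // Wakes up any waiting consumer so it can exit after draining.
    void terminate() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex              mutex_;
    std::condition_variable cv_;
    std::deque<demo_task>   tasks_;
    bool                    stopped_ = false;
};

// Posts n tasks from a "handler" thread and returns the ids in the order
// the consuming loop received them.
std::vector<int> run_task_queue_demo(int n) {
    demo_task_queue queue;
    std::thread handler([&queue, n] {
        for (int i = 0; i < n; ++i) {
            queue.post(demo_task{i});
        }
        queue.terminate();
    });

    std::vector<int> seen;
    demo_task task{};
    while (queue.get(task)) {
        seen.push_back(task.id);
    }
    handler.join();
    assert(seen.size() == static_cast<std::size_t>(n));
    return seen;
}
```

Calling `run_task_queue_demo(5)` from a `main` function exercises one producer and one consumer. With a single producer, the consumer sees the tasks in posting order.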

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Server Code Refactoring Analysis Summary

Project ID: 2621b8c0-b5ce-11f0-b333-453f42058aa1
Versions: d20ade69-bacf-489b-96f8-3de69880cc4f vs 78c4bc6d-4360-4b44-861b-55f67fa5a936

Change Overview

This PR implements a comprehensive architectural refactoring of the llama.cpp server code, splitting the monolithic server.cpp (5,000+ lines) into four focused modules:

server-common: Shared utilities, tokenization helpers, and data structures
server-task: Task management, parameter parsing, and result formatting
server-queue: Thread-safe queue operations and response handling
server.cpp: Core server logic and slot management (reduced to ~2,800 lines)
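
As an illustration of the "parameter parsing and result formatting" role described for server-task, the sketch below treats a params struct as a deserializer and a result struct as a serializer. Everything in it is an assumption made for illustration: the `demo_*` names, the field names, and the defaults are hypothetical, and a plain `std::map` stands in for a JSON object so that the example stays self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a parsed JSON object (assumption: real code would use a JSON library type).
using demo_fields = std::map<std::string, std::string>;

// Hypothetical task parameters; field names and defaults are illustrative only.
struct demo_task_params {
    int   n_predict   = -1;
    float temperature = 0.8f;

    // Deserialize: read known keys, keep the defaults for missing ones.
    static demo_task_params from_fields(const demo_fields & in) {
        demo_task_params params;
        auto it = in.find("n_predict");
        if (it != in.end()) {
            params.n_predict = std::stoi(it->second);
        }
        it = in.find("temperature");
        if (it != in.end()) {
            params.temperature = std::stof(it->second);
        }
        return params;
    }
};

// Hypothetical task result; the struct itself knows how to format its output.
struct demo_task_result {
    int         id;
    std::string content;
    bool        stop;

    // Serialize: produce the key/value view that a handler would send back.
    demo_fields to_fields() const {
        return {
            {"id",      std::to_string(id)},
            {"content", content},
            {"stop",    stop ? "true" : "false"},
        };
    }
};

// Usage: parse a request, then format a result, with no parsing logic in the caller.
void demo_round_trip() {
    demo_fields request;
    request["n_predict"] = "16";

    const demo_task_params params = demo_task_params::from_fields(request);
    assert(params.n_predict == 16);

    const demo_task_result result{1, "hello", true};
    assert(result.to_fields().at("stop") == "true");
}
```

Keeping both directions on the task types means a handler only moves data between the request, the queue, and the response, which is the separation this module split appears to aim for.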

Technical Implementation

Code Changes:
Modularization: 3,196 additions, 2,814 deletions across 9 files
Function Renaming: slot_params → task_params, format_infill → format_prompt_infill
Header Restructuring: Replaced utils.hpp with modular includes
Build System: Updated CMakeLists.txt to include new source files

Architectural Benefits:
Separation of Concerns: Each module handles distinct functionality
Maintainability: Smaller, focused files improve code navigation
Compilation: Smaller, modular headers can reduce incremental build times, since a change forces fewer translation units to recompile
Testability: Individual components can be unit tested independently

Performance Assessment

Analysis Limitations:
Performance metrics could not be analyzed because the required data was unavailable: the MCP tools returned status codes but no detailed performance data for the specified versions.

Expected Impact:
Runtime Performance: Neutral - no algorithmic changes to core inference paths
Memory Layout: Unchanged - data structures and processing logic preserved
Function Call Overhead: Minimal - modern compilers can inline across translation units when link-time optimization (LTO) is enabled
Core Functions: No modifications to critical inference functions (llama_decode, llama_encode, llama_tokenize)

Code Quality Impact

This refactoring significantly improves code organization and is not intended to introduce functional changes. The modular structure should improve maintainability and developer productivity while preserving existing performance characteristics. All HTTP endpoints are expected to remain unchanged and API-compatible, subject to the verification items below.

Verification Needed:
• Build system compatibility across all supported platforms
• HTTP endpoint functionality validation
• Thread safety verification in queue operations
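
One way to approach the thread-safety item is a small concurrent check on the result path, where several handlers each wait for the result that matches their own task id. The sketch below is an assumption-laden illustration, not the PR's code: `demo_result_queue` and `run_result_queue_check` are hypothetical names, and the payloads are plain strings.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result queue: each handler waits for the result with its own task id.
class demo_result_queue {
public:
    void send(int id, std::string payload) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            results_[id] = std::move(payload);
        }
        cv_.notify_all(); // several handlers may be waiting on different ids
    }

    // Blocks until a result for this id is available, then removes and returns it.
    std::string recv(int id) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this, id] { return results_.count(id) > 0; });
        std::string payload = std::move(results_[id]);
        results_.erase(id);
        return payload;
    }

private:
    std::mutex                 mutex_;
    std::condition_variable    cv_;
    std::map<int, std::string> results_;
};

// Starts one waiting thread per id, sends the results in reverse order,
// and returns how many handlers received the payload meant for them.
int run_result_queue_check(int n_handlers) {
    assert(n_handlers >= 0);
    demo_result_queue queue;
    std::vector<int> ok(n_handlers, 0); // one slot per thread, so no sharing
    std::vector<std::thread> handlers;

    for (int id = 0; id < n_handlers; ++id) {
        handlers.emplace_back([&queue, &ok, id] {
            ok[id] = (queue.recv(id) == "result-" + std::to_string(id)) ? 1 : 0;
        });
    }
    for (int id = n_handlers - 1; id >= 0; --id) {
        queue.send(id, "result-" + std::to_string(id));
    }
    for (auto & handler : handlers) {
        handler.join();
    }

    int matched = 0;
    for (int value : ok) {
        matched += value;
    }
    return matched;
}
```

A passing run only shows that this interleaving worked; it does not prove the absence of data races. Running such a check under a race detector such as ThreadSanitizer (`-fsanitize=thread` in GCC and Clang) gives stronger evidence.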

@loci-dev loci-dev force-pushed the main branch 21 times, most recently from 40cd625 to a18a56b Compare November 24, 2025 03:36