UPSTREAM PR #17362: server: split server code into main/common/task/queue #261
+3,196
−2,814
Mirrored from ggml-org/llama.cpp#17362
This PR splits `server.cpp` into small components:

- `server.cpp` contains the `server_context`, `server_slot` and handler functions
- `server-common` contains most of the functions from `utils.hpp` (some one-off functions are moved to `server-task`)
- `server-task` contains all `server_task_*` classes and subclasses. The main idea is to treat these classes as serializers/deserializers. In the future, most of the JSON handling will be done here (instead of being scattered across the code base)
- `server-queue` contains the implementation of the task queue and result queue. The goal is to group all of the mutex-related logic into one file, potentially reusing it for other things in the future (completely decoupled from `server_context`)

```mermaid
flowchart TD
    server_common -.- main
    server_common -.- server_task
    server_task -.- server_slot
    server_task -.- server_queue
    server_slot <-->|update slots| server_context
    server_queue -->|get task| server_context
    HTTP_handlers -->|post task| server_queue
    subgraph main
        server_slot
        server_context -.- HTTP_handlers
    end
```
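To illustrate the `server-queue` goal of keeping all mutex-related logic in one place, here is a minimal sketch of a blocking task queue. It is not the code from this PR: `toy_task`, `toy_queue`, `post`, `wait_and_pop` and `terminate` are hypothetical names chosen for illustration, and the real `server_task` and queue classes carry considerably more state.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical task type (assumption); stands in for a real server task.
struct toy_task {
    int id;
    std::string payload;
};

// Hypothetical queue (assumption): all locking lives inside this class,
// so producers and the consumer never touch the mutex directly.
class toy_queue {
public:
    // Called by a producer, e.g. an HTTP handler posting a task.
    void post(toy_task task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cond_.notify_one();
    }

    // Called by the consumer. Blocks until a task is available or
    // terminate() has been called. Returns false only when the queue
    // has been terminated and no tasks remain.
    bool wait_and_pop(toy_task & out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !tasks_.empty() || !running_; });
        if (tasks_.empty()) {
            return false;
        }
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

    // Wakes up any blocked consumer so it can exit its loop.
    void terminate() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        cond_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<toy_task> tasks_;
    bool running_ = true;
};
```

Because the queue knows nothing about who posts or consumes tasks, a class like this could in principle be reused outside the server context, which is the decoupling the description aims for.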