Extend Tau compute to support AI inference more efficiently. The Ollama plugin was integrated previously, but it introduced significant overhead and offered limited concurrency. See the existing implementation at ollama-cloud.
Objective: Develop a plugin that exports a model management and inference interface capable of handling concurrent requests efficiently.
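As a rough illustration of what that interface could look like (all names below are hypothetical placeholders, not an existing Tau or plugin API), the plugin might export something along these lines:

```go
package inference

import "context"

// ModelManager covers the model lifecycle: pulling, listing and removing
// models on the node. Names here are illustrative only.
type ModelManager interface {
	Pull(ctx context.Context, name string) error
	List(ctx context.Context) ([]string, error)
	Remove(ctx context.Context, name string) error
}

// Engine serves inference requests. Implementations are expected to be safe
// for concurrent use, e.g. by pooling llama.cpp contexts or queuing requests,
// so many callers can stream completions at once.
type Engine interface {
	Load(ctx context.Context, model string) error
	Infer(ctx context.Context, model, prompt string, onToken func(token string)) error
	Unload(model string) error
}
```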
Proposed Approach:
Start with llama.cpp. Consider using go-llama.cpp, or fork it and update it to the latest version of llama.cpp (a binding sketch follows this list).
Simplify the compilation process, as Ollama does: use builder to build shared objects from llama.cpp and embed them into the plugin with the embed package (an embedding sketch follows this list).
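A minimal sketch of the binding side, assuming the go-skynet/go-llama.cpp New/Predict API (exact option names differ between versions). The mutex around a single context is only a placeholder for a proper context pool or batching scheduler:

```go
package inference

import (
	"sync"

	llama "github.com/go-skynet/go-llama.cpp"
)

// llamaEngine wraps a single llama.cpp model. Serializing calls with a mutex
// is the simplest correct behavior; real concurrency would come from a pool
// of contexts in a later iteration.
type llamaEngine struct {
	mu    sync.Mutex
	model *llama.LLama
}

func newLlamaEngine(modelPath string) (*llamaEngine, error) {
	m, err := llama.New(modelPath, llama.SetContext(2048))
	if err != nil {
		return nil, err
	}
	return &llamaEngine{model: m}, nil
}

func (e *llamaEngine) Infer(prompt string, onToken func(string)) (string, error) {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.model.Predict(prompt,
		llama.SetTokens(256),
		llama.SetTokenCallback(func(t string) bool {
			onToken(t)
			return true
		}),
	)
}
```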
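And a sketch of the embedding step using the standard embed package; the lib/ layout and .so glob are assumptions about what builder would produce, not an existing convention:

```go
package inference

import (
	"embed"
	"os"
	"path/filepath"
)

// Shared objects built from llama.cpp would be copied into lib/ before the
// plugin is compiled; the glob below is a placeholder for those artifacts.
//
//go:embed lib/*.so
var libFS embed.FS

// extractLibs writes the embedded shared objects to dir so the runtime can
// locate them (e.g. via LD_LIBRARY_PATH) before loading a model.
func extractLibs(dir string) error {
	entries, err := libFS.ReadDir("lib")
	if err != nil {
		return err
	}
	for _, e := range entries {
		data, err := libFS.ReadFile("lib/" + e.Name())
		if err != nil {
			return err
		}
		if err := os.WriteFile(filepath.Join(dir, e.Name()), data, 0o755); err != nil {
			return err
		}
	}
	return nil
}
```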
Next Steps for Enhancement:
In a second iteration, integrate support for TensorRT-LLM, which offers better performance and is better suited to cloud environments.
Inspiration and Resources:
Draw inspiration from Ollama for embedding techniques.
Refer to LocalAI for effective llama.cpp integration.
Consider using cortex.llamacpp rather than llama.cpp directly. Build configurations can be found here.