Merge pull request #193 from rjmacarthy/development
Updates to support fully configurable api, enabled support for LiteLLM
rjmacarthy committed Mar 28, 2024
2 parents f560752 + f3b571c commit c1fc91c
Showing 23 changed files with 353 additions and 273 deletions.
38 changes: 18 additions & 20 deletions README.md
@@ -5,14 +5,13 @@ Are you fed up of all of those so called "free" Copilot alternatives with paywal
Twinny is the most no-nonsense locally hosted (or API hosted) AI code completion plugin for **Visual Studio Code** and any compatible editors (like VSCodium) designed to work seamlessly with:

- [Ollama](https://github.com/jmorganca/ollama)
- [Ollama Web UI](https://github.com/ollama-webui/ollama-webui)
- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui)
- [LM Studio](https://github.com/lmstudio-ai)

Like Github Copilot but 100% free and private.


- [LiteLLM](https://github.com/BerriAI/litellm)
- [Ollama Web UI](https://github.com/ollama-webui/ollama-webui)
Like GitHub Copilot but 100% free!

<div align="center">
<a href="https://marketplace.visualstudio.com/items?itemName=rjmacarthy.twinny">
@@ -41,6 +40,9 @@ Through the side bar, have a conversation with your model and get explanations a

#### Other features

- Works online or offline.
- Highly configurable API endpoints for FIM and chat
- Conforms to the OpenAI API standard
- Single or multiline fill-in-middle completions
- Customisable prompt templates to add context to completions
- Easy installation via vscode extensions marketplace or by downloading and running a binary directly
@@ -49,14 +51,14 @@ Through the side bar, have a conversation with your model and get explanations a
- Accept code solutions directly to editor
- Create new documents from code blocks
- Copy generated code solution blocks
- Chat history preserved per workspace
- Chat history preserved per workspace

## 🚀 Getting Started

### With Ollama

1. Install the VS Code extension [link](https://marketplace.visualstudio.com/items?itemName=rjmacarthy.twinny) (or, if using VSCodium, [this link](https://open-vsx.org/extension/rjmacarthy/twinny))
2. Install [ollama](https://ollama.com/)
2. Twinny uses Ollama as its default backend; you can install Ollama here: [ollama](https://ollama.com/)
3. Choose your model from the [library](https://ollama.com/library) (eg: `codellama:7b`); a typical pull and run command is sketched after these steps

@@ -69,35 +71,31 @@ You should see the 🤖 icon indicating that twinny is ready to use.

5. See [Keyboard shortcuts](#keyboard-shortcuts) to start using while coding 🎉
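
As a minimal sketch, assuming the standard Ollama CLI is installed, pulling and starting the example model from step 3 looks roughly like this:

```sh
# Pull the example model and start it with Ollama (the model name is just the example above).
ollama pull codellama:7b
ollama run codellama:7b
```

Ollama then serves its API on port `11434` by default, which matches twinny's default settings.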

### With llama.cpp / LM Studio / Oobabooga
### With llama.cpp / LM Studio / Oobabooga / LiteLLM or any other provider

1. Install the VS Code extension [link](https://marketplace.visualstudio.com/items?itemName=rjmacarthy.twinny) (or, if using VSCodium, [this link](https://open-vsx.org/extension/rjmacarthy/twinny))
2. Get [llama.cpp](https://github.com/ggerganov/llama.cpp) / LM Studio / Oobabooga
2. Get [llama.cpp](https://github.com/ggerganov/llama.cpp) / LM Studio / Oobabooga / LiteLLM
3. Download and run the model locally using the chosen provider (a llama.cpp example is sketched after these steps)

4. Open VS Code (if it is already open, a restart might be needed) and press `Ctrl + Shift + T` to open the side panel.

5. From the ⚙️ icon at the top, open the settings page and change the `Api Provider` option from `ollama` to `llamacpp` (or the relevant provider).
6. In the left panel you should see the 🤖 icon indicating that twinny is ready to use.

5. See [Keyboard shortcuts](#keyboard-shortcuts) to start using while coding 🎉
6. Update the settings for the chat provider, port, hostname, etc. as needed. Please adjust these carefully for other providers.
7. In the left panel you should see the 🤖 icon indicating that twinny is ready to use.
8. See [Keyboard shortcuts](#keyboard-shortcuts) to start using while coding 🎉
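
As a rough sketch only (the binary name, model path, and flags depend on your llama.cpp build), step 3 with llama.cpp could look like starting its built-in server on the port twinny expects:

```sh
# Start the llama.cpp HTTP server with a locally downloaded GGUF model on port 8080.
# The model file name here is illustrative; use whichever model you actually downloaded.
./server -m ./models/codellama-7b.Q5_K_M.gguf -c 2048 --host 0.0.0.0 --port 8080
```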

### With other providers

Twinny supports the OpenAI API specification so in theory any provider should work as long as it supports the specification.

If you find that isn't the case please [open an issue](https://github.com/rjmacarthy/twinny/issues/new/choose) with details of how you are having problems.
Twinny supports the OpenAI API specification so in theory any provider should work as long as it supports the specification.

The easiest way to use the OpenAI API through twinny is to run LiteLLM as your provider, acting as a local proxy; it works seamlessly if configured correctly.
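
As a hedged sketch (flags and the default port can vary between LiteLLM versions), running LiteLLM as a local OpenAI-compatible proxy in front of an Ollama model might look like:

```sh
# Install the LiteLLM proxy and put it in front of a local Ollama model.
# The model name and port are illustrative; check your LiteLLM version's documentation.
pip install 'litellm[proxy]'
litellm --model ollama/codellama:7b-instruct --port 8000
```

Twinny's hostname, port, and chat API path settings can then be pointed at this proxy, which exposes the OpenAI-style `/v1/chat/completions` endpoint.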

If you find that an OpenAI-compatible provider does not work, please [open an issue](https://github.com/rjmacarthy/twinny/issues/new/choose) with details of the problems you are having.

#### Note!

When choosing an API provider, the port and API path names are updated automatically based on the provider you choose. These options can also be set manually.

The options for chat model name and FIM model name are only applicable to the Ollama and Oobabooga providers.
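
For example, with the default Ollama values (port `11434`, chat path `/v1/chat/completions`), a quick manual request can confirm the configured endpoint is reachable; this is only an illustrative check, not something twinny requires:

```sh
# Sanity-check the chat completion endpoint using the default Ollama port and path.
# The model name is an assumption; use whichever model you have pulled.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "codellama:7b-instruct", "messages": [{"role": "user", "content": "Say hello"}]}'
```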



## Model support

Twinny works with any model as long as it can run on your machine and exposes an OpenAI API compliant endpoint.
@@ -114,7 +112,6 @@ All instruct models should work for chat generations, but the templates might ne

- For computers with a good GPU, use: `deepseek-coder:6.7b-base-q5_K_M` (or any other good instruct model).


### **Models for FIM (fill in the middle) completions**

For FIM completions, you need to use LLM models called "base models". Unlike instruct models, base models will only try to complete your prompt. They are not designed to answer questions.
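
To illustrate why base models are needed: a FIM request sends a raw prompt built from the code before and after the cursor, which the model simply continues. A sketch against Ollama's generate endpoint using CodeLlama-style infill tokens (the exact tokens vary by model family) might look like:

```sh
# Send a raw CodeLlama-style fill-in-the-middle prompt to the default FIM endpoint.
# "raw" bypasses Ollama's prompt templating so the infill tokens reach the model unchanged.
curl http://localhost:11434/api/generate \
  -d '{
    "model": "codellama:7b-code",
    "prompt": "<PRE> def add(a, b):\n    <SUF>\n    return result <MID>",
    "raw": true,
    "stream": false
  }'
```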
@@ -145,6 +142,7 @@ In the settings there is an option called `useFileContext` this will keep track
- Sometimes a restart of vscode is required for new settings to take effect; please open an issue if you are having problems with this.
- Using file context often causes unreliable completions for FIM because small models get confused when provided with more than one file context.
- See the open issues on GitHub for known issues that are not yet fixed.
- The LiteLLM FIM template needs investigation.


If you have a problem with Twinny or any suggestions, please report them on GitHub issues. Please include your VS Code version and OS details in your issue.
4 changes: 2 additions & 2 deletions package-lock.json


79 changes: 50 additions & 29 deletions package.json
@@ -2,7 +2,7 @@
"name": "twinny",
"displayName": "twinny - AI Code Completion and Chat",
"description": "Locally hosted AI code completion plugin for vscode",
"version": "3.9.3",
"version": "3.10.0",
"icon": "assets/icon.png",
"keywords": [
"code-inference",
@@ -228,105 +228,126 @@
"twinny.apiHostname": {
"order": 1,
"type": "string",
"default": "localhost",
"description": "Hostname for the completion API.",
"default": "0.0.0.0",
"description": "Hostname for chat completion API.",
"required": true
},
"twinny.apiProvider": {
"twinny.apiFimHostname": {
"order": 2,
"type": "string",
"default": "0.0.0.0",
"description": "Hostname for FIM completion API.",
"required": true
},
"twinny.apiProvider": {
"order": 3,
"type": "string",
"enum": [
"ollama",
"llamacpp",
"lmstudio",
"oobabooga",
"other"
"litellm"
],
"default": "ollama",
"description": "The API provider to use (sets the paths and port automatically to defaults)."
"description": "API Chat provider."
},
"twinny.apiProviderFim": {
"order": 4,
"type": "string",
"enum": [
"ollama",
"llamacpp",
"lmstudio",
"oobabooga",
"litellm"
],
"default": "ollama",
"description": "API FIM provider."
},
"twinny.chatApiPath": {
"order": 3,
"order": 5,
"type": "string",
"default": "/v1/chat/completions",
"description": "Endpoint path for chat completions.",
"required": true
},
"twinny.chatApiPort": {
"order": 4,
"order": 6,
"type": "number",
"default": 11434,
"description": "The API port usually `11434` for Ollama and `8080` for llama.cpp (May differ depending on API configuration)",
"required": true
},
"twinny.fimApiPort": {
"order": 5,
"order": 7,
"type": "number",
"default": 11434,
"description": "The API port usually `11434` for Ollama and `8080` for llama.cpp (May differ depending on API configuration)",
"required": true
},
"twinny.fimApiPath": {
"order": 6,
"order": 8,
"type": "string",
"default": "/api/generate",
"description": "Endpoint path for FIM completions.",
"required": true
},
"twinny.chatModelName": {
"order": 7,
"order": 9,
"type": "string",
"default": "codellama:7b-instruct",
"description": "Model identifier for chat completions. Applicable only for Ollama and Oobabooga API."
},
"twinny.fimModelName": {
"order": 8,
"order": 10,
"type": "string",
"default": "codellama:7b-code",
"description": "Model identifier for FIM completions. Applicable only for Ollama and Oobabooga API."
},
"twinny.fimTemplateFormat": {
"order": 9,
"order": 11,
"type": "string",
"enum": [
"automatic",
"stable-code",
"codellama",
"deepseek",
"starcoder"
"starcoder",
"custom-template"
],
"default": "automatic",
"description": "The prompt format to be used for FIM completions. Overrides automatic detection."
},
"twinny.disableAutoSuggest": {
"order": 10,
"order": 12,
"type": "boolean",
"default": false,
"description": "Disables automatic suggestions, manual trigger (default shortcut Alt+\\)."
},
"twinny.contextLength": {
"order": 11,
"order": 13,
"type": "number",
"default": 100,
"description": "Defines the number of lines before and after the current line to include in FIM prompts.",
"required": true
},
"twinny.debounceWait": {
"order": 12,
"order": 14,
"type": "number",
"default": 300,
"description": "Delay in milliseconds before triggering the next completion.",
"required": true
},
"twinny.temperature": {
"order": 13,
"order": 15,
"type": "number",
"default": 0.2,
"description": "Sets the model's creativity level (temperature) for generating completions.",
"required": true
},
"twinny.useMultiLineCompletions": {
"order": 14,
"order": 16,
"type": "boolean",
"default": true,
"description": "Use multiline completions"
@@ -335,63 +356,63 @@
"dependencies": {
"twinny.useMultiLineCompletions": true
},
"order": 15,
"order": 17,
"type": "number",
"default": 20,
"description": "Maximum number of lines to use for multi line completions. Applicable only when useMultiLineCompletions is enabled."
},
"twinny.useFileContext": {
"order": 16,
"order": 18,
"type": "boolean",
"default": false,
"description": "Enables scanning of neighbouring documents to enhance completion prompts. (Experimental)"
},
"twinny.enableCompletionCache": {
"order": 17,
"order": 19,
"type": "boolean",
"default": false,
"description": "Caches FIM completions for identical prompts to enhance performance."
},
"twinny.numPredictChat": {
"order": 18,
"order": 20,
"type": "number",
"default": 512,
"description": "Maximum token limit for chat completions.",
"required": true
},
"twinny.numPredictFim": {
"order": 19,
"order": 21,
"type": "number",
"default": 512,
"description": "Maximum token limit for FIM completions. Set to -1 for no limit. Twinny should stop at logical line breaks.",
"required": true
},
"twinny.enableSubsequentCompletions": {
"order": 20,
"order": 22,
"type": "boolean",
"default": true,
"description": "Enable this setting to allow twinny to keep making subsequent completion requests to the API after the last completion request was accepted."
},
"twinny.keepAlive": {
"order": 21,
"order": 23,
"type": "string",
"default": "5m",
"description": "Keep models in memory by making requests with keep_alive=-1. Applicable only for Ollama API."
},
"twinny.useTls": {
"order": 22,
"order": 24,
"type": "boolean",
"default": false,
"description": "Enables TLS encryption for API connections."
},
"twinny.apiBearerToken": {
"order": 23,
"order": 25,
"type": "string",
"default": "",
"description": "Bearer token for secure API authentication."
},
"twinny.enableLogging": {
"order": 24,
"order": 26,
"type": "boolean",
"default": true,
"description": "Enable twinny debug mode"