
server : fix format_infill #10724

Merged
merged 8 commits into ggerganov:master on Dec 8, 2024

Conversation

@ngxson (Collaborator) commented Dec 8, 2024

Should fix #10691 (comment). I removed format_chat/_infill/_rerank from handle_completions_generic but forgot to put it back in handle_infill.

@ngxson ngxson requested a review from ggerganov December 8, 2024 20:10
    @@ -3509,6 +3516,21 @@ int main(int argc, char ** argv) {
        }
        data["input_extra"] = input_extra; // default to empty array if it's not exist

        std::string prompt = json_value(data, "prompt", std::string());
        std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.ctx, prompt, true, true);
@ggerganov (Owner):

We should probably return an error if there is more than 1 resulting prompt?

@ngxson (Collaborator, Author):

Because we already checked above that data["prompt"] is a string, here we can be sure that we only have a single prompt to deal with. Probably a GGML_ASSERT here makes more sense?

(The expected behavior of tokenize_input_prompts is that if prompt is a string, then the output vector size is == 1.)
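
For illustration, here is roughly what the suggested assertion would look like if added after the tokenization call in the hunk quoted above (a sketch only, assuming the surrounding server.cpp context shown in the diff; not code from this PR):

    std::string prompt = json_value(data, "prompt", std::string());
    std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.ctx, prompt, true, true);
    // data["prompt"] was already validated to be a string above, so exactly
    // one tokenized prompt is expected here
    GGML_ASSERT(tokenized_prompts.size() == 1);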

@ggerganov (Owner):

Got it. It's fine as it is.

@ngxson (Collaborator, Author) commented Dec 8, 2024

@ggerganov I added a test using Qwen2.5-Coder-1.5B-Instruct-GGUF; you can run it locally with:

    SLOW_TESTS=1 ./tests.sh ./unit/test_infill.py -v -x

Feel free to add more complicated test cases if you need to!

@github-actions github-actions bot added the python python script changes label Dec 8, 2024
@ngxson (Collaborator, Author) commented Dec 8, 2024

Btw, please note that adding "prompt": "Complete this" to the request makes the Qwen model hallucinate the answer. Looking at the technical report, I don't think the model is trained to follow instructions when doing infill:

[image: excerpt from the model's technical report]

@ggerganov (Owner):

Ah, the infill endpoint should be used only with the Coder models, not the Coder-Instruct ones. The authors confirmed that: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/discussions/2#6731a45e0e39be0605a0df20

So the tests should be updated to use the Coder variant.

@ngxson (Collaborator, Author) commented Dec 8, 2024

OK, so I've tried the non-instruct model, but I think the problem is related to the placement of the prompt in the formatted version. What I'm getting is:

    <|repo_name|>myproject\n<|file_sep|>llama.h\nLLAMA_API int32_t llama_n_threads();\n<|file_sep|>filename\n<|fim_prefix|>#include <cstdio>\n#include \"llama.h\"\n\nint main() {\n int n_threads = llama_Complete this<|fim_suffix|>}\n<|fim_middle|>

If the prompt is placed at the beginning, it should work:

    Complete this<|repo_name|>myproject\n<|file_sep|>llama.h\nLLAMA_API int32_t llama_n_threads();\n<|file_sep|>filename\n<|fim_prefix|>#include <cstdio>\n#include \"llama.h\"\n\nint main() {\n int n_threads = llama_<|fim_suffix|>}\n<|fim_middle|>

We can fix this in another PR; for now I'm gonna comment out the "prompt": "Complete this" and change the model to the non-instruct one.

@ggerganov (Owner) commented Dec 8, 2024

FIM should not add instructions such as "Complete this", as can be seen in the technical report. The "prompt" field in the /infill requests is designed to contain the prefix of the current line on which the cursor is located. This is appended to the "input_prefix" (which contains the previous lines) to obtain the final fim_prefix. So it is working as intended.
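
As an illustration of that assembly (a sketch with made-up variable names, not the actual format_infill implementation), the effective FIM prefix is the request's "input_prefix" followed by its "prompt":

    #include <string>

    int main() {
        // values taken from the test below; "prompt" is the current line up to the cursor
        std::string input_prefix = "#include <cstdio>\n#include \"llama.h\"\n\nint main() {\n";
        std::string prompt       = "    int n_threads = llama_";
        std::string input_suffix = "}\n";

        // conceptually: <|fim_prefix|> fim_prefix <|fim_suffix|> input_suffix <|fim_middle|>
        std::string fim_prefix = input_prefix + prompt;
        return 0;
    }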

@ngxson (Collaborator, Author) commented Dec 8, 2024

The "prompt" field in the /infill requests is designed to contain the prefix of the current line on which the cursor is located.

Ok thanks for the clarification. I updated the test to reflect this. The "prompt" now contains the current line where the cursor is located:

    res = server.make_request("POST", "/infill", data={
        "input_extra": [{
            "filename": "llama.h",
            "text": "LLAMA_API int32_t llama_n_threads();\n"
        }],
        "input_prefix": "#include <cstdio>\n#include \"llama.h\"\n\nint main() {\n",
        "prompt": "    int n_threads = llama_",
        "input_suffix": "}\n",
    })

@ggerganov (Owner):

Yes, perfect. The test_invalid_input_extra_req also needs to be updated like this.

@ngxson ngxson merged commit ce8784b into ggerganov:master Dec 8, 2024
48 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* server : fix format_infill

* fix

* rename

* update test

* use another model

* update test

* update test

* test_invalid_input_extra_req