
feat(forge/llm): Add LlamafileProvider #7091

Open
wants to merge 43 commits into
base: master

Conversation

k8si

@k8si k8si commented Apr 19, 2024

Background

This draft PR is a step toward enabling the use of local models in AutoGPT by adding llamafile as an LLM provider.

Implementation notes are included in forge/forge/llm/providers/llamafile/README.md

Related issues:

Depends on:

Changes 🏗️

  • Add minimal implementation of LlamafileProvider, a new ChatModelProvider for llamafiles. It extends BaseOpenAIProvider and only overrides methods that are necessary to get the system to work at a basic level.

  • Add support for mistral-7b-instruct-v0.2. This is the only model currently supported by LlamafileProvider because this is the only model I tested anything with.

  • Misc changes to app configuration to enable switching between openai/llamafile providers. In particular, added config field LLM_PROVIDER that, when set to 'llamafile', will use LlamafileProvider in agents rather than OpenAIProvider.

  • Add instructions to use AutoGPT with llamafile in the docs at autogpt/setup/index.md
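
The provider switch described in the list above could look roughly like the following sketch. It is only an illustration: the two classes are empty stand-ins for the real providers, `select_provider` and the `PROVIDERS` registry are hypothetical names, and falling back to 'openai' when LLM_PROVIDER is unset is an assumption rather than something the PR states.

```python
import os
from typing import Dict, Mapping


class OpenAIProvider:
    """Stand-in for the real OpenAI provider (illustration only)."""


class LlamafileProvider:
    """Stand-in for the provider added in this PR (illustration only)."""


# Hypothetical registry mapping LLM_PROVIDER values to provider classes.
PROVIDERS: Dict[str, type] = {
    "openai": OpenAIProvider,
    "llamafile": LlamafileProvider,
}


def select_provider(env: Mapping[str, str] = os.environ) -> type:
    """Pick a provider class from the LLM_PROVIDER setting.

    Assumption: 'openai' is used when the variable is not set.
    """
    key = env.get("LLM_PROVIDER", "openai").strip().lower()
    if key not in PROVIDERS:
        raise ValueError(
            f"Unknown LLM_PROVIDER {key!r}; expected one of {sorted(PROVIDERS)}"
        )
    return PROVIDERS[key]


if __name__ == "__main__":
    print(select_provider({"LLM_PROVIDER": "llamafile"}).__name__)
```

Passing the environment as a mapping keeps the lookup easy to exercise without touching the real process environment.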

Limitations:

  • Only tested with (quantized) Mistral-7B-Instruct-v0.2
  • Only tested with a single AutoGPT 'task' ("Tell me about Roman dodecahedrons")
  • Did not attempt extensive refactoring of existing components; I just added special cases as necessary
  • Haven't added any tests for new classes/methods

PR Quality Scorecard ✨

  • Have you used the PR description template?   +2 pts
  • Is your pull request atomic, focusing on a single change?   +5 pts
  • Have you linked the GitHub issue(s) that this PR addresses?   +5 pts
  • Have you documented your changes clearly and comprehensively?   +5 pts
  • Have you changed or added a feature?   -4 pts
    • Have you added/updated corresponding documentation?   +4 pts
    • Have you added/updated corresponding integration tests?   +5 pts
  • Have you changed the behavior of AutoGPT?   -5 pts
    • Have you also run agbenchmark to verify that these changes do not regress performance?   +10 pts

…der for llamafiles. Currently it just extends OpenAIProvider and only overrides methods that are necessary to get the system to work at a basic level.

Update ModelProviderName schema and config/configurator so that app startup using this provider is handled correctly.
Add 'mistral-7b-instruct-v0' to OpenAIModelName/OPEN_AI_CHAT_MODELS registries.
…-Instruct chat template, which supports the 'user' & 'assistant' roles but does not support the 'system' role.
…kens`, and `get_tokenizer` from classmethods so I can override them in LlamafileProvider (and so I can access instance attributes from inside them). Implement class `LlamafileTokenizer` that calls the llamafile server's `/tokenize` API endpoint.
…tes on the integration; add helper scripts for downloading/running a llamafile + example env file.
…ange serve.sh to use model's full context size (this does not seem to cause OOM errors, surprisingly).
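
The commit notes above mention that the Mistral-Instruct chat template has no 'system' role. One way to cope with that is to relabel such messages before they are sent, as in the sketch below. This is an assumption about the approach, not the PR's code: `Message` and `adapt_roles_for_mistral` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Message:
    """Minimal chat message used only for this illustration."""
    role: str      # "system", "user" or "assistant"
    content: str


def adapt_roles_for_mistral(messages: List[Message]) -> List[Message]:
    """Return a copy of `messages` with every 'system' message relabeled 'user'.

    Other roles, message order and content are left untouched.
    """
    return [
        replace(m, role="user") if m.role == "system" else m
        for m in messages
    ]


if __name__ == "__main__":
    chat = [
        Message("system", "You are a helpful agent."),
        Message("user", "Tell me about Roman dodecahedrons"),
    ]
    for m in adapt_roles_for_mistral(chat):
        print(m.role, "->", m.content)
```

If the template also requires strictly alternating user and assistant turns, consecutive 'user' messages produced this way may need to be merged as well; that step is not shown here.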

netlify bot commented Apr 19, 2024

✅ Deploy Preview for auto-gpt-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 74923f1 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/auto-gpt-docs/deploys/6679f5aedfb8b100080f86b2 |

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Apr 22, 2024
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@Swiftyos
Contributor

@CodiumAI-Agent /review

@CodiumAI-Agent

PR Review

⏱️ Estimated effort to review [1-5]

4, due to the complexity and breadth of the changes introduced, including new model provider integrations, extensive modifications to configuration and provider logic, and the addition of new scripts and documentation. The PR touches multiple core components and introduces a new LLM provider, which requires careful review to ensure compatibility and correctness.

🧪 Relevant tests

No

🔍 Possible issues

Possible Bug: The method check_model_llamafile in configurator.py uses api_credentials.api_base.get_secret_value() which might expose sensitive information in error messages. This could lead to security risks if the error messages are logged or displayed in an environment where unauthorized users can view them.

Possible Bug: In LlamafileProvider, the method _create_chat_completion hard-codes the seed for reproducibility, which might not be desirable in all use cases and could limit the functionality of the model in generating diverse responses.

🔒 Security concerns

Sensitive information exposure: The method check_model_llamafile potentially exposes sensitive API base URLs in exception messages, which could be a security risk if these messages are logged or improperly handled.

Code feedback:
relevant file: autogpts/autogpt/autogpt/app/configurator.py
suggestion:

Consider removing or masking sensitive information such as api_base from error messages in check_model_llamafile to prevent potential leakage of sensitive data. [important]

relevant line: raise ValueError(f"llamafile server at {api_credentials.api_base.get_secret_value()} does not have access to {model_name}. Please configure {model_type} to use one of {available_model_ids} or use a different llamafile.")
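
A minimal sketch of the masking idea in this suggestion, assuming it is acceptable to show only the scheme, host and port of the API base. `redact_url` and `model_unavailable_message` are hypothetical helpers, not part of the PR.

```python
from typing import List
from urllib.parse import urlsplit


def redact_url(url: str) -> str:
    """Keep scheme://host[:port]; drop credentials, path, query and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or "<unknown-host>"
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{host}{port}"


def model_unavailable_message(
    api_base: str, model_name: str, available: List[str]
) -> str:
    """Build the error text without echoing the full API base URL."""
    return (
        f"llamafile server at {redact_url(api_base)} does not have access to "
        f"{model_name}. Available models: {', '.join(available)}."
    )


if __name__ == "__main__":
    print(
        model_unavailable_message(
            "http://user:secret@localhost:8080/v1?key=abc",
            "mistral-7b-instruct-v0.2",
            ["some-other-model"],
        )
    )
```

Whether the host itself counts as sensitive depends on the deployment; for a local llamafile server it is usually just localhost.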

relevant file: autogpts/autogpt/autogpt/core/resource/model_providers/llamafile.py
suggestion:

Remove the hard-coded seed in _create_chat_completion or make it configurable via method parameters or configuration settings to allow for more dynamic behavior. [important]

relevant line: kwargs["seed"] = 0
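
One possible shape for a configurable seed, sketched under the assumption that the request parameters are assembled as a plain dict before the API call. `completion_kwargs` is a hypothetical helper; only the `seed` key comes from the line quoted above.

```python
from typing import Any, Dict, Optional


def completion_kwargs(seed: Optional[int] = None, **kwargs: Any) -> Dict[str, Any]:
    """Return request kwargs, adding 'seed' only when the caller provides one.

    With seed=None the key is omitted, so the server is free to sample
    non-deterministically; passing seed=0 reproduces the current behavior.
    """
    if seed is not None:
        kwargs["seed"] = seed
    return kwargs


if __name__ == "__main__":
    print(completion_kwargs(temperature=0.2))          # no seed key
    print(completion_kwargs(seed=0, temperature=0.2))  # fixed seed, as in the PR
```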


✨ Review tool usage guide:

Overview:
The review tool scans the PR code changes and generates a PR review that includes several types of feedback, such as possible PR issues, security threats, and relevant tests in the PR. More feedback types can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

  • When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:
/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
[pr_reviewer]
some_config1=...
some_config2=...

See the review usage page for a comprehensive guide on using this tool.

@Pwuts
Member

Pwuts commented May 24, 2024

@k8si any chance you could enable maintainer write access on this PR?

@Pwuts Pwuts added the local llm Related to local llms label May 29, 2024
@k8si
Author

k8si commented May 29, 2024

@Pwuts it doesn't look like I have the ability to do that. I added you as a maintainer to the forked project, is that sufficient or do others need write access?

Alternatively, you could branch off my branch and I can just accept the changes via PR?

@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label May 31, 2024
Contributor

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@github-actions github-actions bot added conflicts Automatically applied to PRs with merge conflicts Forge labels May 31, 2024
…vider`, `GroqProvider` and `LlamafileProvider`

and rebase the latter three on `BaseOpenAIProvider`
@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label May 31, 2024
Contributor

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.


codecov bot commented May 31, 2024

Codecov Report

Attention: Patch coverage is 0% with 151 lines in your changes missing coverage. Please review.

Project coverage is 24.73%. Comparing base (c19ab2b) to head (74923f1).
Report is 3 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| forge/forge/llm/providers/llamafile/llamafile.py | 0.00% | 132 Missing ⚠️ |
| forge/forge/llm/providers/multi.py | 0.00% | 16 Missing ⚠️ |
| forge/forge/llm/providers/llamafile/__init__.py | 0.00% | 2 Missing ⚠️ |
| forge/forge/llm/providers/schema.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7091      +/-   ##
==========================================
- Coverage   25.50%   24.73%   -0.77%     
==========================================
  Files          80       82       +2     
  Lines        4661     4806     +145     
  Branches      631      659      +28     
==========================================
  Hits         1189     1189              
- Misses       3402     3547     +145     
  Partials       70       70              
| Flag | Coverage Δ |
|---|---|
| Linux | 24.73% <0.00%> (-0.77%) ⬇️ |
| Windows | 24.72% <0.00%> (-0.78%) ⬇️ |
| autogpt-agent | 35.94% <ø> (ø) |
| forge | 20.74% <0.00%> (-0.89%) ⬇️ |
| macOS | 24.73% <0.00%> (-0.77%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Pwuts
Member

Pwuts commented Jun 21, 2024

I am currently experiencing these issues: Mozilla-Ocho/llamafile#356, Mozilla-Ocho/llamafile#100.

May need to amend llamafile/serve.py further to fix this for WSL.

Update: this isn't scriptable and not our problem. I'll amend the docs with a note that llamafiles can't be run from WSL, but can still be used by running them on Windows and then connecting to them in WSL.

@Pwuts Pwuts requested review from ntindle and kcze June 24, 2024 21:38
@Pwuts Pwuts requested a review from ntindle June 24, 2024 22:40
Comment on lines +14 to +15
## LLAMAFILE_API_BASE - Llamafile API base URL
# LLAMAFILE_API_BASE=http://localhost:8080/v1
Contributor

Env var not added to options.md

    "--port", type=int, help="Specify the port for the llamafile server to listen on"
)
@click.option(
    "--use-gpu", is_flag=True, help="Use an AMD or Nvidia GPU to speed up inference"
Contributor

Comment on lines +126 to +127
"--ctx-size",
"0",
Contributor

I think context size should be parametrizable; it has impact on performance so it's important to have a way of limiting it.
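
A sketch of what a parametrizable context size could look like when the server command line is assembled. `build_server_args` is a hypothetical helper; the `--ctx-size` string and the current value of 0 come from the diff above, while forwarding `--port` to the llamafile in this form is an assumption.

```python
from typing import List, Optional


def build_server_args(port: Optional[int] = None, ctx_size: int = 0) -> List[str]:
    """Assemble llamafile server arguments.

    ctx_size=0 mirrors the value currently hard-coded in the PR, which the
    commit notes describe as using the model's full context size.
    """
    if ctx_size < 0:
        raise ValueError("ctx_size must be 0 (model default) or a positive integer")
    args = ["--ctx-size", str(ctx_size)]
    if port is not None:
        args += ["--port", str(port)]
    return args


if __name__ == "__main__":
    print(build_server_args())                          # current behavior
    print(build_server_args(port=8080, ctx_size=4096))  # limited context
```

Exposing the value as a script option with 0 as the default would keep today's behavior while giving users a way to cap memory use.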

The first time this is run, it will download a file containing the model + runtime,
which may take a while and a few gigabytes of disk space.

To force GPU acceleration, add `--use-gpu` to the command.
Contributor

This sounds like it'll attempt to use GPU but use CPU if not possible

Comment on lines +175 to +178
# 5 tokens for [INST], [/INST], which actually get
# tokenized into "[, INST, ]" and "[, /, INST, ]"
# by the mistral tokenizer
prompt_added += 5
Contributor

That's 7? 🤔
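
The arithmetic behind this question can be checked by counting the pieces listed in the code comment. The sketch below takes the comment's split at face value; whether the Mistral tokenizer really splits the markers this way is an assumption carried over from that comment.

```python
# Pieces as listed in the code comment under review.
INST_OPEN_PIECES = ["[", "INST", "]"]         # "[INST]"
INST_CLOSE_PIECES = ["[", "/", "INST", "]"]   # "[/INST]"

# Tokens added per prompt message if the comment's split is accurate.
TOKENS_PER_PROMPT = len(INST_OPEN_PIECES) + len(INST_CLOSE_PIECES)

if __name__ == "__main__":
    print(len(INST_OPEN_PIECES), "+", len(INST_CLOSE_PIECES), "=", TOKENS_PER_PROMPT)
    # prints "3 + 4 = 7"
```

So the comment's own breakdown gives 7 added tokens per prompt, not the 5 used in the increment.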

Member

@ntindle ntindle left a comment

this "works" but the model isn't great. Can we constrain the output schema to our models like is an option in the llamafile UI?

if not click.prompt(
    click.style(
        "You seem to have specified a different URL for the default model "
        f"({llamafile.name}). Are you sure this is correct? "
Member

Suggested change
- f"({llamafile.name}). Are you sure this is correct? "
+ f"({llamafile}). Are you sure this is correct? "

llamafile.name doesn't exist

Member

Passing only --llamafile Mixtral-8x22B-Instruct-v0.1-llamafile causes a weird prompt input that can't be escaped and needs a reply like yes to continue, before crashing on the attempt to check llamafile.is_file()

Member

> Passing only --llamafile Mixtral-8x22B-Instruct-v0.1-llamafile causes a weird prompt input that can't be escaped and needs a reply like yes to continue, before crashing on the attempt to check llamafile.is_file()

That's how I intended it. Why don't you pass something with a .llamafile extension instead of -llamafile?


on_windows = platform.system() == "Windows"

if not llamafile.is_file():
Member

Running with --use-gpu --llamafile rocket-3b.Q5_K_M.llamafile --llamafile_url https://huggingface.co/Mozilla/rocket-3B-llamafile/resolve/main/rocket-3b.Q5_K_M.llamafile will crash here

Comment on lines +157 to +178
if model_name == LlamafileModelName.MISTRAL_7B_INSTRUCT:
    # For mistral-instruct, num added tokens depends on if the message
    # is a prompt/instruction or an assistant-generated message.
    # - prompt gets [INST], [/INST] added and the first instruction
    # begins with '<s>' ('beginning-of-sentence' token).
    # - assistant-generated messages get '</s>' added
    # see: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
    #
    prompt_added = 1  # one for '<s>' token
    assistant_num_added = 0
    ntokens = 0
    for message in messages:
        if (
            message.role == ChatMessage.Role.USER
            # note that 'system' messages will get converted
            # to 'user' messages before being sent to the model
            or message.role == ChatMessage.Role.SYSTEM
        ):
            # 5 tokens for [INST], [/INST], which actually get
            # tokenized into "[, INST, ]" and "[, /, INST, ]"
            # by the mistral tokenizer
            prompt_added += 5
Member

Sucks that this isn't consistent across models or there's not a global tokenizer
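
The commit notes describe a `LlamafileTokenizer` that calls the server's `/tokenize` endpoint, which moves the tokenization itself to whatever model the server has loaded. A rough sketch of that idea follows. The class name `ServerTokenizer`, the injectable `post` function, and the request/response shape (`{"content": ...}` in, `{"tokens": [...]}` out, modeled on the llama.cpp server) are assumptions, not the PR's actual code.

```python
import json
import urllib.request
from typing import Callable, List


def http_post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (stdlib only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class ServerTokenizer:
    """Tokenizer that delegates to a running server's /tokenize endpoint."""

    def __init__(
        self,
        server_root: str,
        post: Callable[[str, dict], dict] = http_post_json,
    ) -> None:
        # Assumption: /tokenize lives at the server root, not under /v1.
        self._url = server_root.rstrip("/") + "/tokenize"
        self._post = post

    def encode(self, text: str) -> List[int]:
        """Return the token ids the server reports for `text`."""
        return list(self._post(self._url, {"content": text})["tokens"])

    def count(self, text: str) -> int:
        """Return the number of tokens in `text`."""
        return len(self.encode(text))
```

This only covers raw text; the extra tokens a chat template adds around each message would still have to be accounted for per model.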

@Pwuts
Member

Pwuts commented Jun 26, 2024

> this "works" but the model isn't great. Can we constrain the output schema to our models like is an option in the llamafile UI?

I think we could implement something like that by allowing a model to be passed as the completion_parser. Sounds like a follow-up PR though.
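
A loose sketch of the follow-up idea, assuming a completion parser is simply a callable that turns the raw completion text into a validated object. `parser_for` and `Command` are hypothetical, and a stdlib dataclass stands in for whatever model type the project would actually use. Note that this validates the output after generation; it does not constrain the model while it is generating.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def parser_for(model: Type[T]) -> Callable[[str], T]:
    """Build a completion parser that loads JSON into the given dataclass."""
    expected = {f.name for f in fields(model)}

    def parse(raw: str) -> T:
        data = json.loads(raw)  # raises a ValueError subclass on invalid JSON
        if not isinstance(data, dict) or set(data) != expected:
            raise ValueError(f"expected an object with keys {sorted(expected)}")
        return model(**data)

    return parse


@dataclass
class Command:
    """Hypothetical output schema used only for this illustration."""
    name: str
    args: dict


if __name__ == "__main__":
    parse_command = parser_for(Command)
    print(parse_command('{"name": "web_search", "args": {"query": "dodecahedron"}}'))
```

A provider could retry the completion when the parser raises, which is one common way to cope with smaller models that drift from the requested format.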

Labels
AutoGPT Agent, documentation (Improvements or additions to documentation), Forge, local llm (Related to local llms), size/xl
Projects
Status: 🚧 Needs work
Development

Successfully merging this pull request may close these issues.

  • Instructions for local models
  • Support using other/local LLMs
8 participants