Support LLMs through Cloud Vendors #40
I'd be grateful for contributions to this! I think adding it similarly to the OpenAI and Anthropic support would be superb. I haven't prioritized adding additional providers so I could focus on building out the library's capabilities, so any help here would be appreciated. One thing that could help: for providers that support the OpenAI API format, we could reuse almost the entirety of the OpenAIAugmentedLLM class. We already support specifying a …
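As a sketch of that reuse idea: any provider exposing the OpenAI chat-completions schema can be reached with the standard OpenAI SDK just by overriding the base URL. The Together endpoint and model name below are illustrative assumptions, not something mcp-agent ships:

```python
# Sketch: pointing the standard OpenAI SDK at an OpenAI-compatible provider.
# The base_url and model are illustrative; any provider implementing the
# OpenAI chat-completions schema can be swapped in the same way.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # OpenAI-compatible endpoint (example)
    api_key="<provider-api-key>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model ID
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```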
Maybe integrate with this? https://www.litellm.ai/ supports 100+ LLMs.
I think that would be great, @hrishikeshio.
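As a point of reference, a rough sketch of what routing through LiteLLM looks like; the model strings are examples of LiteLLM's provider-prefixed naming, and none of this is wired into mcp-agent today:

```python
# Sketch: LiteLLM routes one call signature to many providers through
# provider-prefixed model strings. Model IDs below are examples only;
# credentials are picked up from the usual environment variables
# (OPENAI_API_KEY, AWS credentials, AZURE_API_KEY/AZURE_API_BASE, ...).
from litellm import completion

messages = [{"role": "user", "content": "Summarize MCP in one sentence."}]

openai_resp = completion(model="gpt-4o-mini", messages=messages)
bedrock_resp = completion(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0", messages=messages
)
azure_resp = completion(model="azure/<your-deployment-name>", messages=messages)

# All three responses share the OpenAI-style shape.
print(openai_resp.choices[0].message.content)
```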
AWS SageMaker would be great to have. Here is the code I have for a SageMaker client; how can I leverage it?

```python
# Assumes `import json`, `boto3`, and `from botocore.exceptions import ClientError`
# at module level.
def get_streaming_response(self, prompt):
    """
    Sends a structured prompt to the SageMaker LLM endpoint and streams the response.

    :param prompt: The structured prompt formatted as a JSON list.
    :return: The streamed response as a botocore EventStream.
    """
    try:
        # Define inference parameters with streaming enabled
        inference_params = {
            "do_sample": True,
            "temperature": 0.1,
            "top_k": 50,
            "max_new_tokens": 512,
            "repetition_penalty": 1.03,
            "stop": ["</s>", "<|system|>", "<|user|>", "<|assistant|>"],
            "return_full_text": False,
        }
        body = json.dumps({"inputs": prompt, "parameters": inference_params, "stream": True})

        # Invoke the SageMaker endpoint with response streaming
        response = self.client.invoke_endpoint_with_response_stream(
            EndpointName=self.endpoint_name,
            Body=body,
            ContentType="application/json",
        )
        event_stream = response["Body"]
        return event_stream
    except ClientError as err:
        # Handler added so the try block is valid; surface failures to the caller.
        print(f"SageMaker invocation failed: {err}")
        raise
```
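As a side note, the EventStream returned above still has to be drained by the caller. A hedged sketch of the usual boto3 pattern follows; the exact chunk format depends on the serving container (a TGI-style `data: {...}` SSE payload is assumed here):

```python
# Sketch: draining the EventStream returned by get_streaming_response().
# Each event carries a "PayloadPart" with raw bytes; how those bytes decode
# into tokens depends on the model container. A TGI-style "data: {...}" SSE
# payload is assumed below, and buffering of chunks that split a line across
# PayloadParts is omitted for brevity.
import json

def iter_tokens(event_stream):
    for event in event_stream:
        chunk = event.get("PayloadPart", {}).get("Bytes")
        if not chunk:
            continue
        for line in chunk.decode("utf-8").splitlines():
            if line.startswith("data:"):
                payload = json.loads(line[len("data:"):].strip())
                # TGI puts generated text under token.text (assumption).
                yield payload.get("token", {}).get("text", "")

# Usage (hypothetical wrapper instance):
# for token in iter_tokens(sm_client.get_streaming_response(prompt)):
#     print(token, end="", flush=True)
```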
Hi @aatish-shinde, I'm a little confused: are you fine-tuning an LLM with SageMaker? From initial Google searches, …
@MattMorgis No, I am not fine-tuning. But I think creating the Bedrock and SageMaker clients works the same way; you just have to mention which AWS resource you are calling. You can actually create the client using sagemaker-runtime too, like this:

```python
self.client = boto3.client("sagemaker-runtime", region_name="us-east-2")

# Invoke the SageMaker endpoint with response streaming
response = self.client.invoke_endpoint_with_response_stream(
    EndpointName=self.endpoint_name,
    Body=body,
    ContentType="application/json",
)
```
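To make that parallel concrete, here is a sketch of the two boto3 runtime clients side by side; the endpoint name, model ID, and region are placeholders:

```python
# Sketch: SageMaker and Bedrock runtime clients follow the same shape; only
# the service name, identifier field, and request-body schema differ.
# Endpoint name, model ID, and region are placeholders.
import json
import boto3

sagemaker = boto3.client("sagemaker-runtime", region_name="us-east-2")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-2")

# SageMaker: addressed by endpoint name; body schema is set by your container.
sm_response = sagemaker.invoke_endpoint_with_response_stream(
    EndpointName="my-endpoint",
    Body=json.dumps({"inputs": "Hello", "parameters": {"max_new_tokens": 64}}),
    ContentType="application/json",
)

# Bedrock: addressed by model ID; body schema is set by the model provider.
br_response = bedrock.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Hello"}],
    }),
    contentType="application/json",
)
```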
@saqadri I could use some input. There are two ways I could possibly take this: …

I guess it really boils down to whether the framework wants to support this by: …

Let me know if that makes sense.
@aatish-shinde What is …
@MattMorgis it is just …
@MattMorgis It is just a name; I could name it "abcd". You usually just create an endpoint with a name in AWS SageMaker and use its boto3 client to invoke it. I think for AWS Bedrock you do something like this, which is similar.
@MattMorgis I thought about this some more, and I think it makes sense to expose it by provider, even if the core implementations share the same base classes. Currently, I see providers as supporting multiple models via the same API schema (e.g. OpenAI has a bunch of models available via the same interface, and Anthropic and Together AI do too). Some providers have standardized on another provider's API schema (e.g. OpenAI-compatible API endpoints from Azure). So instead of having an LLaMAAugmentedLLM, it would be better to choose the provider (e.g. TogetherAugmentedLLM) and specify the model preferences or model ID (just like we do with RequestParams).
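To illustrate that provider-first shape, a purely hypothetical sketch (none of the class or parameter names below reflect mcp-agent's actual internals):

```python
# Hypothetical sketch only: class and field names here do not reflect
# mcp-agent's actual internals; they just illustrate "provider class +
# per-request model" versus "one class per model".
from dataclasses import dataclass

@dataclass
class Params:                     # stand-in for RequestParams-style options
    model: str                    # model ID chosen per request
    max_tokens: int = 1024

class OpenAICompatibleLLM:
    """Shared implementation for any endpoint speaking the OpenAI schema."""
    base_url = "https://api.openai.com/v1"

    def generate(self, prompt: str, params: Params) -> str:
        raise NotImplementedError  # would call chat completions at base_url

class TogetherAugmentedLLM(OpenAICompatibleLLM):
    # Provider-level class: same schema and implementation, different
    # endpoint; Llama vs. Mixtral vs. anything else is picked via params.model.
    base_url = "https://api.together.xyz/v1"
```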
Might be worth taking a look at: https://github.com/evalstate/fast-agent/blob/935e6c627158438c8df488541e63a20802b18720/src/mcp_agent/workflows/llm/model_factory.py#L4 This supports a dot notation for …
Just want to add some input that might be helpful on this topic. The various models in Bedrock do not have a unified API; Claude requires a different input structure than Llama, for example. AWS recently addressed this by releasing the Converse API, which works with most models (I believe it works with any model that supports tool use) but still leaves room for model-specific inference parameters. I would highly recommend implementing this feature using the Converse API, since everything is moving in that direction and it will be significantly less work because a single interface can prompt many models. If you use the invoke_model API, it will most likely require a custom implementation for every model.

@MattMorgis I'm very interested in this feature since I can't really use this framework until there is Bedrock support, and my company is about to rewrite one of our agentic apps very soon. I would really like to use this framework for that, but we are required to only use Bedrock. If you need any help or want to piece any of this feature out to speed it up, please let me know.
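For reference, a minimal sketch of the Converse API call shape; the model ID and region are placeholders, and the same call works across Converse-supported models:

```python
# Sketch: the Bedrock Converse API gives one request/response shape across
# Converse-capable models. Model ID and region are placeholders.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "List three MCP use cases."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.1},
    # Tool use goes through the same interface via toolConfig={"tools": [...]};
    # model-specific knobs go under additionalModelRequestFields.
)

print(response["output"]["message"]["content"][0]["text"])
```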
@BTripp1986 thanks for this suggestion! I can help prioritize this work to make sure you can use mcp-agent for the agent app rewrite. Please find me on Discord (@saqadri) and we can chat through it as well.
Sounds great! Discord request sent from btrippcode.
LiteLLM is an excellent choice; if we can support that, it opens up all the possibilities! Looking forward to it!
I could help you test LiteLLM with Azure.
I have started working on a PR to add support for AWS Bedrock to the Anthropic Augmented LLM and Azure to the OpenAI one.

I'm also looking for any thoughts / feedback on this before I get too far in. Just curious whether there is anything in the works or planned to be done a specific way before I take a pass at it. Otherwise I'll have at it.
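For reference, both vendor SDKs already ship near drop-in clients, which is roughly what such a PR can lean on; the region, endpoint, deployment, and model names below are placeholders:

```python
# Sketch: both vendor SDKs expose Bedrock/Azure variants of their clients,
# so the existing Anthropic/OpenAI augmented LLMs can largely be reused.
# Region, endpoint, deployment, and model names are placeholders.
from anthropic import AnthropicBedrock
from openai import AzureOpenAI

# Anthropic models served through AWS Bedrock.
bedrock_client = AnthropicBedrock(aws_region="us-east-1")
msg = bedrock_client.messages.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello"}],
)
print(msg.content[0].text)

# OpenAI models served through Azure OpenAI.
azure_client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<azure-api-key>",
    api_version="2024-06-01",
)
chat = azure_client.chat.completions.create(
    model="<your-deployment-name>",  # Azure deployment name, not model ID
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat.choices[0].message.content)
```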