Support LLMs through Cloud Vendors #40
I'd be grateful for contributions to this! I think adding it similarly to the OpenAI and Anthropic support would be superb. I haven't prioritized adding additional providers so I could focus on building out the library's capabilities, so any help here would be appreciated. One thing that could help: for providers that support the OpenAI API format, we could reuse almost the entirety of the OpenAIAugmentedLLM class. We already support specifying a …
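As a sketch of that reuse idea: any provider exposing the OpenAI chat-completions schema can be reached with the standard OpenAI SDK just by overriding the base URL. The Together endpoint and model name below are illustrative assumptions, not something mcp-agent ships:

```python
# Sketch: pointing the standard OpenAI SDK at an OpenAI-compatible provider.
# The base_url and model are illustrative; any provider implementing the
# OpenAI chat-completions schema can be swapped in the same way.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # OpenAI-compatible endpoint (example)
    api_key="<provider-api-key>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model ID
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```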
Maybe integrate with this? https://www.litellm.ai/ supports 100+ LLMs.
I think that would be great, @hrishikeshio.
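As a point of reference, a rough sketch of what routing through LiteLLM looks like; the model strings are examples of LiteLLM's provider-prefixed naming, and none of this is wired into mcp-agent today:

```python
# Sketch: LiteLLM routes one call signature to many providers through
# provider-prefixed model strings. Model IDs below are examples only;
# credentials are picked up from the usual environment variables
# (OPENAI_API_KEY, AWS credentials, AZURE_API_KEY/AZURE_API_BASE, ...).
from litellm import completion

messages = [{"role": "user", "content": "Summarize MCP in one sentence."}]

openai_resp = completion(model="gpt-4o-mini", messages=messages)
bedrock_resp = completion(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0", messages=messages
)
azure_resp = completion(model="azure/<your-deployment-name>", messages=messages)

# All three responses share the OpenAI-style shape.
print(openai_resp.choices[0].message.content)
```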
AWS SageMaker would be great to have. Here is the code I have for a SageMaker client; how can I leverage it?

```python
# Assumes `import json`, `boto3`, and `from botocore.exceptions import ClientError`
# at module level.
def get_streaming_response(self, prompt):
    """
    Sends a structured prompt to the SageMaker LLM endpoint and streams the response.

    :param prompt: The structured prompt formatted as a JSON list.
    :return: The streamed response as a botocore EventStream.
    """
    try:
        # Define inference parameters with streaming enabled
        inference_params = {
            "do_sample": True,
            "temperature": 0.1,
            "top_k": 50,
            "max_new_tokens": 512,
            "repetition_penalty": 1.03,
            "stop": ["</s>", "<|system|>", "<|user|>", "<|assistant|>"],
            "return_full_text": False,
        }
        body = json.dumps({"inputs": prompt, "parameters": inference_params, "stream": True})

        # Invoke the SageMaker endpoint with response streaming
        response = self.client.invoke_endpoint_with_response_stream(
            EndpointName=self.endpoint_name,
            Body=body,
            ContentType="application/json",
        )
        event_stream = response["Body"]
        return event_stream
    except ClientError as err:
        # Handler added so the try block is valid; surface failures to the caller.
        print(f"SageMaker invocation failed: {err}")
        raise
```
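As a side note, the EventStream returned above still has to be drained by the caller. A hedged sketch of the usual boto3 pattern follows; the exact chunk format depends on the serving container (a TGI-style `data: {...}` SSE payload is assumed here):

```python
# Sketch: draining the EventStream returned by get_streaming_response().
# Each event carries a "PayloadPart" with raw bytes; how those bytes decode
# into tokens depends on the model container. A TGI-style "data: {...}" SSE
# payload is assumed below, and buffering of chunks that split a line across
# PayloadParts is omitted for brevity.
import json

def iter_tokens(event_stream):
    for event in event_stream:
        chunk = event.get("PayloadPart", {}).get("Bytes")
        if not chunk:
            continue
        for line in chunk.decode("utf-8").splitlines():
            if line.startswith("data:"):
                payload = json.loads(line[len("data:"):].strip())
                # TGI puts generated text under token.text (assumption).
                yield payload.get("token", {}).get("text", "")

# Usage (hypothetical wrapper instance):
# for token in iter_tokens(sm_client.get_streaming_response(prompt)):
#     print(token, end="", flush=True)
```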
Hi @aatish-shinde, I'm a little confused: are you fine-tuning an LLM with SageMaker? From initial Google searches, …
@MattMorgis No, I am not fine-tuning. But I think creating the Bedrock and SageMaker clients works the same way; you just have to mention which AWS resource you are calling. You can actually create the client using sagemaker-runtime too, like this:

```python
self.client = boto3.client("sagemaker-runtime", region_name="us-east-2")

# Invoke the SageMaker endpoint with response streaming
response = self.client.invoke_endpoint_with_response_stream(
    EndpointName=self.endpoint_name,
    Body=body,
    ContentType="application/json",
)
```
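To make that parallel concrete, here is a sketch of the two boto3 runtime clients side by side; the endpoint name, model ID, and region are placeholders:

```python
# Sketch: SageMaker and Bedrock runtime clients follow the same shape; only
# the service name, identifier field, and request-body schema differ.
# Endpoint name, model ID, and region are placeholders.
import json
import boto3

sagemaker = boto3.client("sagemaker-runtime", region_name="us-east-2")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-2")

# SageMaker: addressed by endpoint name; body schema is set by your container.
sm_response = sagemaker.invoke_endpoint_with_response_stream(
    EndpointName="my-endpoint",
    Body=json.dumps({"inputs": "Hello", "parameters": {"max_new_tokens": 64}}),
    ContentType="application/json",
)

# Bedrock: addressed by model ID; body schema is set by the model provider.
br_response = bedrock.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Hello"}],
    }),
    contentType="application/json",
)
```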
@saqadri I could use some input. There are two ways I could possibly take this: …

I guess it really boils down to whether the framework wants to support this by: …

Let me know if that makes sense.
@aatish-shinde What is …
@MattMorgis it is just …
@MattMorgis It is just a name; I could name it "abcd". You usually just create an endpoint with a name in AWS SageMaker and use its boto3 client to invoke it. I think for AWS Bedrock you do something like this, which is similar.
@MattMorgis I thought about this some more, and I think it makes sense to expose it by provider, even if the core implementations share the same base classes. Currently, I see providers as supporting multiple models via the same API schema (e.g. OpenAI has a bunch of models available via the same interface, and Anthropic and Together AI do too). Some providers have standardized on another provider's API schema (e.g. OpenAI-compatible API endpoints from Azure). So instead of having an LLaMAAugmentedLLM, it would be better to choose the provider (e.g. TogetherAugmentedLLM) and specify the model preferences or model ID (just like we do with RequestParams).
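To illustrate that provider-first shape, a purely hypothetical sketch (none of the class or parameter names below reflect mcp-agent's actual internals):

```python
# Hypothetical sketch only: class and field names here do not reflect
# mcp-agent's actual internals; they just illustrate "provider class +
# per-request model" versus "one class per model".
from dataclasses import dataclass

@dataclass
class Params:                     # stand-in for RequestParams-style options
    model: str                    # model ID chosen per request
    max_tokens: int = 1024

class OpenAICompatibleLLM:
    """Shared implementation for any endpoint speaking the OpenAI schema."""
    base_url = "https://api.openai.com/v1"

    def generate(self, prompt: str, params: Params) -> str:
        raise NotImplementedError  # would call chat completions at base_url

class TogetherAugmentedLLM(OpenAICompatibleLLM):
    # Provider-level class: same schema and implementation, different
    # endpoint; Llama vs. Mixtral vs. anything else is picked via params.model.
    base_url = "https://api.together.xyz/v1"
```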
Might be worth taking a look at: https://github.com/evalstate/fast-agent/blob/935e6c627158438c8df488541e63a20802b18720/src/mcp_agent/workflows/llm/model_factory.py#L4 This supports a dot notation for …
Just want to add some input that might be helpful on this topic. The various models in Bedrock do not have a unified API; Claude requires a different input structure than Llama, for example. AWS recently addressed this by releasing the Converse API, which works with most models (I believe it works with any model that supports tool use) but still leaves room for model-specific inference parameters. I would highly recommend implementing this feature using the Converse API, since everything is moving in that direction and it will be significantly less work because a single interface can prompt many models. If you use the invoke_model API, it will most likely require a custom implementation for every model.

@MattMorgis I'm very interested in this feature since I can't really use this framework until there is Bedrock support, and my company is about to rewrite one of our agentic apps very soon. I would really like to use this framework for that, but we are required to only use Bedrock. If you need any help or want to piece any of this feature out to speed it up, please let me know.
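For reference, a minimal sketch of the Converse API call shape; the model ID and region are placeholders, and the same call works across Converse-supported models:

```python
# Sketch: the Bedrock Converse API gives one request/response shape across
# Converse-capable models. Model ID and region are placeholders.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "List three MCP use cases."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.1},
    # Tool use goes through the same interface via toolConfig={"tools": [...]};
    # model-specific knobs go under additionalModelRequestFields.
)

print(response["output"]["message"]["content"][0]["text"])
```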
@BTripp1986 thanks for this suggestion! I can help prioritize this work to make sure you can use mcp-agent for the agent app rewrite. Please find me on Discord (@saqadri) and we can chat through it as well.
Sounds great! Discord request sent from btrippcode.
LiteLLM is an excellent choice; if we can support that, it opens up all the possibilities! Looking forward to it!
I could help you test LiteLLM with Azure.
I have started working on a PR to add support for AWS Bedrock to the Anthropic Augmented LLM and Azure to the OpenAI one.

I'm also looking for any thoughts / feedback on this before I get too far in. Just curious whether there is anything in the works or planned to be done a specific way before I take a pass at it. Otherwise I'll have at it.
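For reference, both vendor SDKs already ship near drop-in clients, which is roughly what such a PR can lean on; the region, endpoint, deployment, and model names below are placeholders:

```python
# Sketch: both vendor SDKs expose Bedrock/Azure variants of their clients,
# so the existing Anthropic/OpenAI augmented LLMs can largely be reused.
# Region, endpoint, deployment, and model names are placeholders.
from anthropic import AnthropicBedrock
from openai import AzureOpenAI

# Anthropic models served through AWS Bedrock.
bedrock_client = AnthropicBedrock(aws_region="us-east-1")
msg = bedrock_client.messages.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello"}],
)
print(msg.content[0].text)

# OpenAI models served through Azure OpenAI.
azure_client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<azure-api-key>",
    api_version="2024-06-01",
)
chat = azure_client.chat.completions.create(
    model="<your-deployment-name>",  # Azure deployment name, not model ID
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat.choices[0].message.content)
```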