Support LLMs through Cloud Vendors #40

Open · MattMorgis opened this issue Mar 5, 2025 · 16 comments
Labels: enhancement (New feature or request)

@MattMorgis (Contributor)

I have started working on a PR to add support for AWS Bedrock to the Anthropic Augmented LLM and Azure for the OpenAI one.

Also looking for any thoughts/feedback on this before I get started. Just curious whether there is anything in the works or planned to be done a specific way before I take a pass at it. Otherwise I'll have at it.

@saqadri (Collaborator) commented Mar 5, 2025

> I have started working on a PR to add support for AWS Bedrock to the Anthropic Augmented LLM and Azure for the OpenAI one.
>
> Also looking for any thoughts/feedback on this before I get started. Just curious whether there is anything in the works or planned to be done a specific way before I take a pass at it. Otherwise I'll have at it.

I'd be grateful for contributions to this! I think adding it similarly to the existing OpenAI and Anthropic support would be superb. I haven't prioritized adding additional providers so that I could focus on building out the library's capabilities, so any help here would be appreciated.

One thing that could help: for providers that support the OpenAI API format, we could reuse almost the entirety of the OpenAIAugmentedLLM class. We already support specifying a base_url in the config, which is how the Ollama integration works, but it could be made more explicit as well. So have at it!
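
For anyone exploring the OpenAI-compatible route, here is a rough sketch of what that looks like with the official openai SDK; the Ollama URL and model name below are only examples, and the exact config keys mcp-agent expects may differ:

    from openai import OpenAI

    # Point the official SDK at any OpenAI-compatible endpoint via base_url.
    # (Ollama's local endpoint and a llama3.2 model shown purely as examples.)
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    response = client.chat.completions.create(
        model="llama3.2",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)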

@saqadri added the enhancement (New feature or request) label on Mar 5, 2025
@hrishikeshio

Maybe integrate with this? https://www.litellm.ai/ supports 100+ LLMs.
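
For context, LiteLLM exposes many providers behind a single completion() call; a minimal sketch, assuming provider credentials are set via environment variables and the Bedrock model ID is only an example:

    import litellm

    # One call signature across providers; the provider is encoded in the model string.
    response = litellm.completion(
        model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)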

@saqadri (Collaborator) commented Mar 6, 2025

> Maybe integrate with this? https://www.litellm.ai/ supports 100+ LLMs.

I think that would be great, @hrishikeshio.

@aatish-shinde

AWS SageMaker would be great to have. Here is the code I have for a SageMaker client. How can I leverage this code?

    def get_streaming_response(self, prompt):
        """
        Sends a structured prompt to the SageMaker LLM endpoint and streams the response.

        Assumes `import json` at module level, `self.client` created via
        boto3.client("sagemaker-runtime", ...), and `self.endpoint_name` set to the
        name of the deployed endpoint.

        :param prompt: The structured prompt formatted as a JSON list.
        :return: The streamed response as an iterable event stream.
        """
        try:
            # Inference parameters with streaming enabled
            # (Hugging Face text-generation style parameters)
            inference_params = {
                "do_sample": True,
                "temperature": 0.1,
                "top_k": 50,
                "max_new_tokens": 512,
                "repetition_penalty": 1.03,
                "stop": ["</s>", "<|system|>", "<|user|>", "<|assistant|>"],
                "return_full_text": False,
            }

            body = json.dumps({"inputs": prompt, "parameters": inference_params, "stream": True})

            # Invoke the SageMaker endpoint with response streaming
            response = self.client.invoke_endpoint_with_response_stream(
                EndpointName=self.endpoint_name,
                Body=body,
                ContentType="application/json",
            )

            # The "Body" key holds the iterable event stream
            return response["Body"]
        except Exception as exc:
            raise RuntimeError(
                f"Failed to invoke SageMaker endpoint {self.endpoint_name}"
            ) from exc
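
For reference, a rough sketch of how the returned event stream could be consumed; the exact chunk payload format depends on the container serving the model, so treat the decoding here as an assumption:

    def iter_stream_chunks(event_stream):
        """Yield decoded text chunks from a SageMaker response stream."""
        for event in event_stream:
            # Each streamed event carries its bytes under "PayloadPart" -> "Bytes"
            payload = event.get("PayloadPart", {}).get("Bytes")
            if payload:
                yield payload.decode("utf-8")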

@MattMorgis (Contributor, Author)

Hi @aatish-shinde,

I'm a little confused: are you fine-tuning an LLM with SageMaker?

From initial Google searches, invoke_endpoint_with_response_stream seems to be a method on the AWS Bedrock client?

@aatish-shinde

@MattMorgis No, I am not fine-tuning. But I think creating the Bedrock and SageMaker clients is about the same; you just have to mention which AWS resource you are calling. You can actually create the client using sagemaker-runtime too, like this:

    self.client = boto3.client("sagemaker-runtime", region_name="us-east-2")

    # Invoke the SageMaker endpoint with response streaming
    response = self.client.invoke_endpoint_with_response_stream(
        EndpointName=self.endpoint_name,
        Body=body,
        ContentType="application/json",
    )

@MattMorgis (Contributor, Author)

@saqadri I could use some input.

There are two ways I could possibly take this:

  1. An AWSBedrockAugmentedLLM and an AzureOpenAIAugmentedLLM. The Bedrock one, for example, could expose access to Claude, Llama, Titan, Mistral, or any other models available in Bedrock. It would use boto3 and the Bedrock client.

  2. Update the existing AnthropicAugmentedLLM and OpenAIAugmentedLLM. As you mentioned, they can be re-used completely; the only difference for both is authentication. AnthropicAugmentedLLM would continue to use the Anthropic library.

I guess it really boils down to whether the framework wants to organize support by:

  • model: AnthropicAugmentedLLM to use Claude via Anthropic directly, AWS Bedrock, or Google Vertex.
  • provider: choose your model based on your provider, e.g. AnthropicAugmentedLLM, AWSBedrockAugmentedLLM, GoogleVertexAugmentedLLM.

Let me know if that makes sense.
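
Purely for illustration, a toy sketch of the two shapes from the constructor's point of view; every class, parameter, and default below is hypothetical and not the library's current API:

    import boto3


    class AWSBedrockAugmentedLLM:
        """Option 1 (hypothetical): one class per cloud provider, any Bedrock-hosted model."""

        def __init__(self, model_id: str, region: str = "us-east-1"):
            self.model_id = model_id
            # A single bedrock-runtime client serves Claude, Llama, Titan, Mistral, etc.
            self.client = boto3.client("bedrock-runtime", region_name=region)


    class AnthropicAugmentedLLM:
        """Option 2 (hypothetical): keep the per-model class, swap only the auth/transport backend."""

        def __init__(self, backend: str = "anthropic", region: str | None = None):
            self.backend = backend  # "anthropic" for the Anthropic API, "bedrock" for AWS
            self.region = region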

@MattMorgis (Contributor, Author)

> @MattMorgis No, I am not fine-tuning. But I think creating the Bedrock and SageMaker clients is about the same; you just have to mention which AWS resource you are calling. […]

@aatish-shinde What is self.endpoint_name defined as? What model are you using?

@aatish-shinde commented Mar 13, 2025

> @aatish-shinde What is self.endpoint_name defined as? What model are you using?

@MattMorgis It is just a name; I could name it "abcd". You usually create an endpoint with a name in AWS SageMaker and use its boto3 client to invoke it:

    boto3.client("sagemaker-runtime", region_name="us-east-2")

There is also another way of invoking it, via the raw HTTP API:

    POST /endpoints/EndpointName/invocations-response-stream HTTP/1.1

https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithResponseStream.html

I think for AWS Bedrock you do something like this, which is similar:

    client = boto3.client(service_name="bedrock-runtime")
    response = client.invoke_model_with_response_stream(
        modelId="anthropic.claude-v2",
        body=body,
    )

@saqadri (Collaborator) commented Mar 14, 2025

> @saqadri I could use some input. There are two ways I could possibly take this: […]

@MattMorgis I thought about this some more, and I think it makes sense to expose it by provider, even if the implementations end up sharing the same base classes. Currently, I see providers as supporting multiple models via the same API schema (e.g. OpenAI has a bunch of models available via the same interface; Anthropic and Together AI do as well). Some providers have also standardized on another provider's API schema (e.g. OpenAI-compatible API endpoints from Azure).

So instead of having LLaMAAugmentedLLM, it would be better to choose the provider (e.g. TogetherAugmentedLLM), and specify the model preferences or model ID (just like we do with RequestParams).
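
As a rough illustration of the provider-oriented shape (the class name, import path, and constructor arguments here are assumptions, not the actual mcp-agent API), an OpenAI-compatible provider could be a thin subclass that only changes the endpoint and credentials:

    # Import path assumed from the workflows/llm layout referenced elsewhere in this thread;
    # adjust to the actual module.
    from mcp_agent.workflows.llm.augmented_llm_openai import OpenAIAugmentedLLM


    class TogetherAugmentedLLM(OpenAIAugmentedLLM):
        """Hypothetical provider class: Together AI exposes an OpenAI-compatible endpoint,
        so only the endpoint and credentials change; the model ID is chosen per request."""

        DEFAULT_BASE_URL = "https://api.together.xyz/v1"

        def __init__(self, *args, api_key: str, base_url: str = DEFAULT_BASE_URL, **kwargs):
            # Assumes the base class accepts api_key/base_url keyword arguments;
            # adjust to the real constructor signature.
            super().__init__(*args, api_key=api_key, base_url=base_url, **kwargs)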

@evalstate (Contributor) commented Mar 15, 2025

Might be worth taking a look at: https://github.com/evalstate/fast-agent/blob/935e6c627158438c8df488541e63a20802b18720/src/mcp_agent/workflows/llm/model_factory.py#L4

This supports a dot notation for provider.model.reasoning-level.
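
A small sketch of what parsing that dot notation could look like (simplified relative to fast-agent's actual factory, which also handles aliases and defaults):

    from typing import NamedTuple, Optional


    class ModelSpec(NamedTuple):
        provider: str
        model: str
        reasoning_level: Optional[str] = None


    def parse_model_spec(spec: str) -> ModelSpec:
        """Parse 'provider.model' or 'provider.model.reasoning-level' strings.

        Simplified sketch: assumes the model name itself contains no dots.
        """
        parts = spec.split(".")
        if len(parts) == 2:
            return ModelSpec(provider=parts[0], model=parts[1])
        if len(parts) == 3:
            return ModelSpec(provider=parts[0], model=parts[1], reasoning_level=parts[2])
        raise ValueError(f"Expected provider.model[.reasoning-level], got: {spec!r}")


    print(parse_model_spec("openai.o3-mini.high"))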

@BTripp1986

> So instead of having LLaMAAugmentedLLM, it would be better to choose the provider (e.g. TogetherAugmentedLLM), and specify the model preferences or model ID (just like we do with RequestParams).

Just want to add some input that might be helpful on this topic. The various models in Bedrock do not have a unified API; Claude requires a different input structure than Llama, for example. AWS recently addressed this by releasing the Converse API, which works with most models (I believe it works with any model that supports tool use) but still leaves room for model-specific inference parameters.

I would highly recommend implementing this feature using the Converse API, since everything is moving in that direction and it will be significantly less work: a single interface can prompt many models. If you use the invoke_model API, it will most likely require a custom implementation for every model.
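
For reference, a minimal sketch of the Converse call shape; the model ID and region are only examples, the same request/response structure applies across Bedrock-hosted models, and there is also converse_stream for streaming:

    import boto3

    # Credentials come from the usual AWS config/environment.
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    response = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [{"text": "Summarize MCP in one sentence."}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.1},
    )

    # The response shape is uniform regardless of the underlying model
    print(response["output"]["message"]["content"][0]["text"])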

@MattMorgis I'm very interested in this feature since I can't really use this framework until there is Bedrock support, and my company is about to rewrite one of our agentic apps very soon. I would really like to use this framework for that, but we are required to use only Bedrock. If you need any help or want to piece any of this feature out to speed it up, please let me know.

@saqadri (Collaborator) commented Mar 25, 2025

@BTripp1986 thanks for this suggestion! I can help prioritize this work to make sure you can use mcp-agent for the agent app rewrite. Please find me on discord (@saqadri) and we can chat through it as well.

@BTripp1986

> @BTripp1986 thanks for this suggestion! I can help prioritize this work to make sure you can use mcp-agent for the agent app rewrite. Please find me on discord (@saqadri) and we can chat through it as well.

Sounds great! Discord request sent from btrippcode

@rwang1987

LiteLLM is an excellent choice if we can support it; it opens up all the possibilities! Looking forward to it!

@rwang1987

I could help you test LiteLLM with Azure.

This was referenced Mar 30, 2025