@Fuzzwah Fuzzwah commented Nov 28, 2025

Summary

Fixes the Replicate API integration to properly support Gemini models and makes the embed() function more robust.

Problem

When using Gemini models (e.g., google/gemini-2.5-flash) on Replicate:

  1. The system_prompt field is ignored - Gemini needs the system message in the main prompt
  2. The stream() method returns empty results - Gemini needs run() instead
  3. The embed() function would incorrectly use the chat model for embeddings

Changes

Gemini Model Support

  • Detect Gemini models by checking if model name contains 'gemini'
  • Combine system message into the main prompt (since system_prompt is ignored)
  • Use run() instead of stream() for Gemini models
  • Handle various output formats (string, array, object)
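The Gemini path above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the helper names (isGeminiModel, buildGeminiPrompt, joinOutput, chat) are made up here, and the Replicate client (from the "replicate" npm package, which provides run() and stream()) is passed in as a parameter rather than constructed.

```javascript
// Gemini models are detected by name (hypothetical helper).
function isGeminiModel(model) {
  return model.toLowerCase().includes("gemini");
}

// Gemini on Replicate ignores the system_prompt field, so the system
// message is folded into the main prompt (hypothetical helper).
function buildGeminiPrompt(systemMessage, userPrompt) {
  return systemMessage ? `${systemMessage}\n\n${userPrompt}` : userPrompt;
}

// Gemini output may be a string, an array of chunks, or an object;
// the object shape here (.output) is an assumption (hypothetical helper).
function joinOutput(output) {
  if (typeof output === "string") return output;
  if (Array.isArray(output)) return output.join("");
  if (output && typeof output.output === "string") return output.output;
  return String(output ?? "");
}

async function chat(replicate, model, systemMessage, userPrompt) {
  if (isGeminiModel(model)) {
    // stream() returns empty results for Gemini, so use run() instead.
    const output = await replicate.run(model, {
      input: { prompt: buildGeminiPrompt(systemMessage, userPrompt) },
    });
    return joinOutput(output);
  }
  // Non-Gemini models keep the existing streaming path.
  let text = "";
  for await (const event of replicate.stream(model, {
    input: { prompt: userPrompt, system_prompt: systemMessage },
  })) {
    text += event.toString();
  }
  return text;
}
```

With this shape, only the dispatch in chat() needs to know about Gemini; everything else stays on the existing streaming path.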

Improved embed() Function

  • Always use a dedicated embedding model, not the chat model
  • Detect if configured model is an embedding model (contains 'embed', 'gte', 'e5-')
  • Fall back to mark3labs/embeddings-gte-base for chat models
  • Add input validation for text parameter
  • Handle multiple output formats (vectors, embedding, embeddings, array)
  • Better error messages
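A sketch of the hardened embed() along the lines described above. The helper names (isEmbeddingModel, normalizeEmbeddingOutput) and the injected client parameter are assumptions for illustration; the marker list and the fallback model come from this PR's description.

```javascript
// Fallback embedding model named in this PR.
const FALLBACK_EMBEDDING_MODEL = "mark3labs/embeddings-gte-base";

// Detect embedding models by name markers (hypothetical helper).
function isEmbeddingModel(model) {
  const name = model.toLowerCase();
  return ["embed", "gte", "e5-"].some((marker) => name.includes(marker));
}

// Embedding models on Replicate differ in output shape: a bare array,
// { vectors }, { embedding }, or { embeddings: [[...]] } (hypothetical helper).
function normalizeEmbeddingOutput(output) {
  if (Array.isArray(output)) return output;
  if (output && Array.isArray(output.vectors)) return output.vectors;
  if (output && Array.isArray(output.embedding)) return output.embedding;
  if (output && Array.isArray(output.embeddings)) return output.embeddings[0];
  throw new Error(`Unrecognized embedding output: ${JSON.stringify(output)}`);
}

async function embed(replicate, configuredModel, text) {
  // Input validation for the text parameter.
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("embed() requires a non-empty string");
  }
  // Never embed with a chat model; fall back to a dedicated embedding model.
  const model = isEmbeddingModel(configuredModel)
    ? configuredModel
    : FALLBACK_EMBEDDING_MODEL;
  const output = await replicate.run(model, { input: { text } });
  return normalizeEmbeddingOutput(output);
}
```

The name-marker check is deliberately loose; a model not matching any marker is treated as a chat model and silently redirected to the fallback, which is the safe default for embeddings.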

Usage

To use Gemini on Replicate, configure your profile:

{
  "model": "replicate/google/gemini-2.5-flash",
  "embedding": "replicate/mark3labs/embeddings-gte-base"
}

Testing

Tested with:

  • google/gemini-2.5-flash for chat
  • mark3labs/embeddings-gte-base for embeddings

Both work correctly with this fix.

These changes enable using Gemini models (e.g., google/gemini-2.5-flash)
on Replicate alongside existing Llama/Mistral models.
Contributor

@Sweaterdog Sweaterdog left a comment


Code looks good, I haven't tested
