Gemini Adapter: Handle Non-Data Stream Chunks and Clarify :map Type Limitations in json_schema Mode #104

@nshkrdotcom

The Gemini adapter currently has two problems: streaming in :json_schema mode can crash on non-data chunks, and Ecto's generic :map types cannot be expressed in a schema that Gemini accepts.

1. Stream Processing for :json_schema Mode:
When using stream: true with mode: :json_schema for the Gemini adapter, the Instructor.Adapters.Gemini.parse_stream_chunk_for_mode/2 function can raise a FunctionClauseError. This occurs because the stream from Gemini includes metadata or terminal chunks (e.g., indicating finishReason: "STOP") that do not match the expected data-bearing structure (candidates -> content -> parts -> text).

A more robust handling mechanism is needed to gracefully ignore these non-data chunks, allowing the stream processing to complete successfully.

Example Error (before fix):

** (FunctionClauseError) no function clause matching in Instructor.Adapters.Gemini.parse_stream_chunk_for_mode/2
   The following arguments were given to Instructor.Adapters.Gemini.parse_stream_chunk_for_mode/2:
       # 1
       :json_schema
       # 2
       %{"candidates" => [%{"content" => %{"role" => "model"}, "finishReason" => "STOP", ...}]}
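
A minimal sketch of the kind of fallback that would avoid this error. The module name GeminiChunkSketch and the function parse_chunk/2 are hypothetical stand-ins, not the adapter's actual code, and the exact pattern the adapter matches on is an assumption based on the candidates -> content -> parts -> text structure described above.

```elixir
defmodule GeminiChunkSketch do
  # Hypothetical stand-in for parse_stream_chunk_for_mode/2; not the adapter's code.

  # Data-bearing chunk: candidates -> content -> parts -> text.
  def parse_chunk(:json_schema, %{
        "candidates" => [%{"content" => %{"parts" => [%{"text" => text} | _]}} | _]
      })
      when is_binary(text),
      do: text

  # Fallback: metadata or terminal chunks (for example finishReason: "STOP")
  # carry no text, so they contribute an empty string instead of raising.
  def parse_chunk(:json_schema, _chunk), do: ""
end

# Illustrative chunks; the last one mirrors the shape from the error above.
chunks = [
  %{"candidates" => [%{"content" => %{"parts" => [%{"text" => "{\"name\":"}], "role" => "model"}}]},
  %{"candidates" => [%{"content" => %{"parts" => [%{"text" => " \"Ada\"}"}], "role" => "model"}}]},
  %{"candidates" => [%{"content" => %{"role" => "model"}, "finishReason" => "STOP"}]}
]

json = Enum.map_join(chunks, &GeminiChunkSketch.parse_chunk(:json_schema, &1))
IO.puts(json)
# prints {"name": "Ada"}
```

Returning an empty string keeps the caller's concatenation logic unchanged: the terminal chunk simply adds nothing to the accumulated JSON text.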

2. :map Type Incompatibility with Gemini json_schema Mode:
Gemini's responseSchema in json_schema mode has strict requirements for object definitions. Specifically, it appears that:
* It does not support additionalProperties for defining generic maps with arbitrary keys.
* It requires that any nested object defined within the schema must have a non-empty properties field.

This creates an incompatibility with Ecto's generic :map and {:map, type} types, which naturally translate to JSON schema objects that either use additionalProperties or might have an empty properties map (e.g., {"type": "object", "properties": {}}).

The Instructor.Adapters.Gemini.normalize_json_schema/1 function currently raises on the client side rather than sending a schema whose nested objects have empty properties, since the Gemini API rejects such schemas. The error message is informative:

Invalid JSON Schema: object with no properties at path: [...]. Gemini does not support empty objects. This is likely because it uses a naked :map type without any fields at [...]. Try switching to an embedded schema instead.
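
To illustrate what such a check looks for, here is a rough sketch that walks a JSON schema (as a plain Elixir map) and collects the paths of objects with no properties. EmptyObjectCheckSketch and empty_object_paths/2 are hypothetical names, not the adapter's normalize_json_schema/1. As a simplification, this sketch also flags an empty root object, whereas the issue only concerns nested ones.

```elixir
defmodule EmptyObjectCheckSketch do
  # Hypothetical helper; it only illustrates the shape of the problem.
  def empty_object_paths(schema, path \\ [])

  def empty_object_paths(%{"type" => "object"} = schema, path) do
    props = Map.get(schema, "properties", %{})

    if map_size(props) == 0 do
      # Covers both {"properties": {}} and additionalProperties-only objects.
      [path]
    else
      Enum.flat_map(props, fn {key, sub} -> empty_object_paths(sub, path ++ [key]) end)
    end
  end

  # Descend into array items so lists of generic maps are also found.
  def empty_object_paths(%{"type" => "array", "items" => items}, path),
    do: empty_object_paths(items, path ++ ["items"])

  def empty_object_paths(_schema, _path), do: []
end

# Roughly what a schema with a generic :map field might translate to.
schema = %{
  "type" => "object",
  "properties" => %{
    "name" => %{"type" => "string"},
    "settings" => %{"type" => "object", "properties" => %{}}
  }
}

IO.inspect(EmptyObjectCheckSketch.empty_object_paths(schema))
# prints [["settings"]]
```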

Expected Behavior / Solution:

  1. For stream processing, the Gemini adapter should add a fallback clause to parse_stream_chunk_for_mode/2 for :json_schema mode that returns an empty string for any chunk that does not match the data-bearing structure.
  2. The client-side raise in normalize_json_schema for empty properties should be maintained as it correctly identifies an incompatibility and guides the user.
  3. Documentation for the Gemini adapter should clearly state the limitation regarding generic :map types in :json_schema mode and recommend using Ecto's embeds_one/embeds_many with explicit schemas for nested objects.
  4. The test suite should be updated to use embeds_one/embeds_many for map-like structures specifically when testing the Gemini adapter in json_schema mode, while allowing other adapters to be tested with generic :map types if they support them.
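
As a sketch of the schema change recommended in points 3 and 4, the script below contrasts a generic :map field with an explicit embedded schema. The module names (Address, UserWithMap, UserWithEmbed) and the Ecto version requirement are illustrative assumptions. It uses only Ecto.Schema, so any Instructor-specific setup a real response model needs is omitted.

```elixir
# Standalone script; Mix.install fetches Ecto (version chosen for illustration).
Mix.install([{:ecto, "~> 3.11"}])

defmodule Address do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :street, :string
    field :city, :string
  end
end

# Generic map: no field list to translate into a non-empty "properties",
# which is what Gemini's json_schema mode rejects.
defmodule UserWithMap do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :name, :string
    field :address, :map
  end
end

# Explicit embed: the nested object has a known, non-empty set of fields.
defmodule UserWithEmbed do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :name, :string
    embeds_one :address, Address
  end
end

IO.inspect(UserWithMap.__schema__(:type, :address))
# prints :map
IO.inspect(UserWithEmbed.__schema__(:embeds))
# prints [:address]
IO.inspect(Address.__schema__(:fields))
# prints [:street, :city]
```

The same pattern applies to lists of map-like values via embeds_many.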

Impact:
These changes will improve the reliability of streaming with Gemini in json_schema mode and provide clearer guidance to users about schema design for compatibility with Gemini's structured output features.
