Batch Prediction with Long Context and Context Caching using Google Gemini API in Colab - #550 #602

Open · wants to merge 1 commit into main

Conversation

william-Dic

Description of the feature request:

This feature request aims to develop a robust, production-ready code sample that demonstrates how to perform batch prediction with the Google Gemini API using long context and context caching. The primary use case is extracting information from large video content—such as lectures or documentaries—by asking multiple, potentially interconnected questions.

Key aspects of the feature include:

  • Batch Prediction: Efficiently submitting a batch of questions in a way that minimizes API calls and handles rate limits, possibly by dividing the questions into smaller batches.
  • Long Context Handling: Leveraging Gemini’s long context capabilities to provide the entire video transcript or segmented summaries as context. This includes strategies to segment and summarize transcripts that exceed maximum context limits.
  • Context Caching: Implementing persistent context caching (using, for example, a JSON file) to store and reuse previous summarizations and conversation history, thereby reducing redundant API calls and improving response times (a rough sketch follows this list).
  • Interconnected Questions: Supporting conversational history so that each question can build upon previous answers, leading to more accurate and relevant responses.
  • Output Formatting: Delivering clear, structured, and user-friendly outputs, with potential enhancements like clickable links to relevant video timestamps.
  • Robust Error Handling: Ensuring the solution gracefully handles network errors, API failures, and invalid inputs through retries and exponential backoff.
  • Multi-Language Support: Allowing the user to specify the transcript language, accommodating videos in different languages.
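
To make the proposal concrete, here is a minimal sketch of the caching-plus-batching flow, assuming the `google-genai` Python SDK. The model name, TTL, cache file path, and helper names are illustrative choices rather than part of this PR, and cache expiry is not handled:

```python
import json
import os
import time

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
MODEL = "models/gemini-1.5-flash-001"   # assumed: a model version that supports caching
CACHE_FILE = "context_cache.json"       # assumed location of the persistent cache record


def get_or_create_cache(transcript: str) -> str:
    """Reuse a previously created cached-content name if one was stored on disk."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)["cache_name"]   # note: cache expiry is not handled here
    cache = client.caches.create(
        model=MODEL,
        config=types.CreateCachedContentConfig(
            system_instruction="Answer questions about the provided video transcript.",
            contents=[transcript],
            ttl="3600s",
        ),
    )
    with open(CACHE_FILE, "w") as f:
        json.dump({"cache_name": cache.name}, f)
    return cache.name


def ask_batch(questions: list[str], cache_name: str, max_retries: int = 3) -> list[str]:
    """Ask each question against the cached context, retrying with exponential backoff."""
    answers = []
    for question in questions:
        for attempt in range(max_retries):
            try:
                response = client.models.generate_content(
                    model=MODEL,
                    contents=question,
                    config=types.GenerateContentConfig(cached_content=cache_name),
                )
                answers.append(response.text)
                break
            except Exception:
                time.sleep(2 ** attempt)  # back off on transient/rate-limit errors
    return answers
```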

What problem are you trying to solve with this feature?

The feature addresses the challenge of extracting meaningful insights from lengthy video transcripts. When dealing with large amounts of text, it's difficult to efficiently process and query the information without running into API context limits or making redundant calls. This solution tackles that problem by segmenting and summarizing the transcript, caching context to reduce unnecessary API usage, and maintaining conversation history to answer interconnected questions accurately.
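
For the segmentation step, one possible shape (reusing the `client` and `MODEL` names from the sketch above; the chunk size and prompt wording are arbitrary assumptions) is:

```python
def summarize_long_transcript(transcript: str, chunk_chars: int = 100_000) -> str:
    """Split an oversized transcript into chunks, summarize each, and join the summaries."""
    chunks = [transcript[i:i + chunk_chars] for i in range(0, len(transcript), chunk_chars)]
    summaries = []
    for chunk in chunks:
        response = client.models.generate_content(
            model=MODEL,
            contents="Summarize this transcript segment, keeping any timestamps:\n\n" + chunk,
        )
        summaries.append(response.text)
    return "\n\n".join(summaries)
```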

Demonstration of the Current Gemini Video Analysis Solution

In this demonstration, I use Gemini to analyze an almost two-hour-long video and then ask it questions. The system returns responses asynchronously in under one second.

Demo.mp4

@Giom-V (Collaborator) commented on Mar 24, 2025

Thanks a lot @william-Dic. I'll check internally whether everything is fine on our side and will get back to you.

Giom-V self-assigned this on Mar 31, 2025
@Giom-V (Collaborator) commented on Mar 31, 2025

Hello @william-Dic, I think the example is interesting, but now that we have the YT integration, wouldn't it be easier to just use chat mode, load the video, and ask questions? What's the added value of your workflow?

@JonathanMShaw commented on Apr 3, 2025

How does the billing on this play out? Say I have a 1M-token db and I want to make 1k tiny requests against it, each of which generates a tiny answer. The naive approach is to include the db in every input prompt, for a total of 1M * 1k * input_token_cost. If instead I cache it, I would expect to be charged 1M * 1k * cached_token_cost, which is 1/4 of the naive cost (plus a few dollars an hour for keeping the cache alive). If I cache the db and batch the 1k prompts, does that halve the cost again, making the whole thing 1/8th as expensive as the naive implementation?
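
For reference, here is the arithmetic implied by the question. The relative multipliers (cached tokens at 1/4 the standard input rate, batch mode at half price) are the questioner's assumptions rather than confirmed pricing, whether the two discounts actually stack is exactly what is being asked, and cache storage fees are ignored:

```python
# Back-of-the-envelope cost comparison for the scenario above.
DB_TOKENS = 1_000_000
NUM_REQUESTS = 1_000
INPUT_TOKEN_COST = 1.0                    # arbitrary unit price per input token

naive = DB_TOKENS * NUM_REQUESTS * INPUT_TOKEN_COST
cached = naive * 0.25                     # cached tokens billed at ~1/4 the input rate
cached_and_batched = cached * 0.5         # if batch mode also halves the per-token price

print(cached / naive)                     # 0.25  -> a quarter of the naive cost
print(cached_and_batched / naive)         # 0.125 -> an eighth, if the discounts stack
```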

Labels: component:examples (Issues/PRs referencing the examples folder), status:awaiting review (PR awaiting review from a maintainer)