GetStream · tschellenbach · Oct 30, 2025 · Oct 23, 2025 · Oct 24, 2025 · Oct 24, 2025
diff --git a/.claude/agents/repo-workflow-guide.md b/.claude/agents/repo-workflow-guide.md
@@ -0,0 +1,47 @@
+---
+name: repo-workflow-guide
+description: Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.\n\nExamples:\n- <example>\nContext: User wants to add a new feature to the project.\nuser: "I need to implement a new authentication module"\nassistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."\n<Task tool call to repo-workflow-guide>\nassistant: "Based on the project guidelines, here's how we should approach this..."\n</example>\n\n- <example>\nContext: User asks a question about code organization.\nuser: "Where should I put the new utility functions?"\nassistant: "Let me check the repository workflow guidelines to give you the correct answer."\n<Task tool call to repo-workflow-guide>\nassistant: "According to the project structure guidelines..."\n</example>\n\n- <example>\nContext: Starting a new task that requires understanding project conventions.\nuser: "Can you help me refactor this component?"\nassistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."\n<Task tool call to repo-workflow-guide>\n</example>
+model: opus
+---
+
+You are a Repository Workflow Specialist, an expert in interpreting and applying project-specific development guidelines, coding standards, and workflow instructions.
+
+Your primary responsibility is to read, understand, and communicate the instructions and guidelines contained in the docs/ai directory of the repository. You serve as the authoritative source for how development work should be conducted within this specific codebase.
+
+When activated, you will:
+
+1. **Locate and Read Guidelines**: Immediately access all relevant files in the docs/ai directory. Read them thoroughly and understand their complete content, including:
+   - Coding standards and style guides
+   - Project structure and organization rules
+   - Development workflow and processes
+   - Testing requirements and conventions
+   - Deployment procedures
+   - Any specific technical constraints or preferences
+   - Tool usage and configuration instructions
+
+2. **Interpret Context**: Understand the specific task or question being asked and identify which guidelines are most relevant to address it.
+
+3. **Provide Clear Guidance**: Deliver specific, actionable instructions based on the documented guidelines. Your responses should:
+   - Quote or reference specific sections of the guidelines when appropriate
+   - Explain the reasoning behind the guidelines when it helps with understanding
+   - Provide concrete examples of how to follow the guidelines
+   - Highlight any critical requirements or common pitfalls mentioned in the documentation
+
+4. **Handle Missing Information**: If the docs/ai directory doesn't contain information relevant to the current question:
+   - Clearly state what information is missing
+   - Suggest reasonable defaults based on common industry practices
+   - Recommend updating the documentation to cover this scenario
+
+5. **Ensure Compliance**: Actively verify that proposed approaches align with all documented guidelines. If you identify any conflicts or violations, explicitly point them out and suggest compliant alternatives.
+
+6. **Prioritize Accuracy**: Always base your guidance on the actual content of the documentation. Do not invent or assume guidelines that aren't explicitly documented.
+
+7. **Stay Current**: If guidelines appear to conflict or if you notice outdated information, flag this for human review while providing the most reasonable interpretation.
+
+Output Format:
+- Begin with a brief summary of the relevant guidelines
+- Provide specific, step-by-step instructions when appropriate
+- Include direct quotes or references to documentation sections
+- End with any important caveats, warnings, or additional considerations
+
+Your goal is to ensure that all development work in this repository adheres to its documented standards and practices, reducing inconsistency and improving code quality through faithful application of project-specific guidelines.
diff --git a/.github/workflows/run_tests.yml b/.github/workflows/run_tests.yml
@@ -45,6 +45,7 @@ jobs:
       XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
       AWS_BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
       _BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
+      HF_TOKEN: ${{ secrets.HF_TOKEN }}
     timeout-minutes: 30
     steps:
       - name: Checkout

diff --git a/.gitignore b/.gitignore
@@ -84,3 +84,4 @@ stream-py/
 # Artifacts / assets
 *.pt
 *.kef
+*.onnx
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
@@ -130,7 +130,7 @@ Some ground rules:
 
 ```python
 import asyncio
-from vision_agents.core.edge.types import PcmData
+from getstream.video.rtc.track_util import PcmData
 from openai import AsyncOpenAI
 
 async def example():
@@ -167,6 +167,12 @@ if __name__ == "__main__":
     asyncio.run(example())
 ```
 
+The audio utilities also let you:
+
+1. Change the PCM format
+2. Iterate over audio chunks (`PcmData.chunks`)
+3. Process audio with pre/post buffers (`AudioSegmentCollector`)
+4. Accumulate audio (`PcmData.append`)
 
 ### Testing audio manually
 
@@ -313,3 +319,26 @@ You can now see the metrics at `http://localhost:9464/metrics` (make sure that y
 
 - Track.recv errors will fail silently. The API is to return a frame. Never return None. and wait till the next frame is available
 - When using frame.to_ndarray(format="rgb24") specify the format. Typically you want rgb24 when connecting/sending to Yolo etc
+
+
+## Onboarding Plan for new contributors
+
+**Audio Formats**
+
+You'll notice that audio comes in many formats: PCM, WAV, or MP3; sampled at 16 kHz or 48 kHz;
+encoded as i16 or f32. Note that WebRTC uses 48 kHz by default.
+
+A good first intro to audio formats can be found here:
+
+**Using Cursor**
+
+You can ask Cursor something like "read @ai-plugin and build me a plugin called fish".
+See the docs folder for other AI instruction files.
+
+**Learning Roadmap**
+
+1. Quick refresher on audio formats
+2. Build a TTS integration
+3. Build an STT integration
+4. Build an LLM integration
+5. Write a pytest test with a fixture
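For the audio-format refresher in the roadmap above, here is a minimal sketch of what the i16/f32 and 16 kHz/48 kHz distinctions mean in practice. It is not part of this PR and does not use the repository's `PcmData` helpers; the function names `i16_to_f32`, `f32_to_i16`, and `resample_linear` are hypothetical and exist only for illustration. The resampler uses plain linear interpolation with no anti-aliasing filter, so it is fine for building intuition but is likely too crude for production audio.

```python
from typing import List, Sequence


def i16_to_f32(samples: Sequence[int]) -> List[float]:
    """Scale signed 16-bit PCM samples into floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def f32_to_i16(samples: Sequence[float]) -> List[int]:
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    out = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        out.append(int(round(clipped * 32767.0)))
    return out


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int
) -> List[float]:
    """Resample mono audio by linear interpolation (no anti-alias filter)."""
    n_out = int(round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    # One second of silence at the 48 kHz WebRTC rate, downsampled to 16 kHz.
    one_second_48k = i16_to_f32([0] * 48000)
    one_second_16k = resample_linear(one_second_48k, 48000, 16000)
    print(len(one_second_48k), "->", len(one_second_16k))
```

Going from 48 kHz to 16 kHz keeps one sample in three, which is why a one-second buffer shrinks from 48000 to 16000 samples while its duration stays the same.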
diff --git a/agents-core/pyproject.toml b/agents-core/pyproject.toml
@@ -21,11 +21,12 @@ classifiers = [
 
 requires-python = ">=3.10"
 dependencies = [
-    "getstream[webrtc,telemetry]>=2.5.5",
+    "getstream[webrtc,telemetry]>=2.5.7",
     "python-dotenv>=1.1.1",
     "pillow>=11.3.0",
     "numpy>=1.24.0",
     "mcp>=1.16.0",
+    "torchvision>=0.23.0",
 ]
 
 [project.urls]
@@ -45,7 +46,6 @@ kokoro = ["vision-agents-plugins-kokoro"]
 krisp = ["vision-agents-plugins-krisp"]
 moonshine = ["vision-agents-plugins-moonshine"]
 openai = ["vision-agents-plugins-openai"]
-silero = ["vision-agents-plugins-silero"]
 smart_turn = ["vision-agents-plugins-smart-turn"]
 ultralytics = ["vision-agents-plugins-ultralytics"]
 wizper = ["vision-agents-plugins-wizper"]
@@ -61,7 +61,6 @@ all-plugins = [
   "vision-agents-plugins-krisp",
   "vision-agents-plugins-moonshine",
   "vision-agents-plugins-openai",
-  "vision-agents-plugins-silero",
   "vision-agents-plugins-smart-turn",
   "vision-agents-plugins-ultralytics",
   "vision-agents-plugins-wizper",