Merged
69 commits
6395dec
first attempt at vogent
tschellenbach Oct 23, 2025
a9e950b
fix smart turn
tschellenbach Oct 24, 2025
3a26b6b
well
tschellenbach Oct 24, 2025
e80f718
happyt ests
tschellenbach Oct 24, 2025
89a99cc
Merge branch 'main' of github.com:GetStream/Vision-Agents into vogent
tschellenbach Oct 24, 2025
1546297
work on smart turn
tschellenbach Oct 25, 2025
6cdaac5
Merge branch 'main' into vogent
tbarbugli Oct 25, 2025
1c4160c
use audio util
tbarbugli Oct 25, 2025
8ffbeb5
remove _pcm_to_wav_bytes and use util
tbarbugli Oct 25, 2025
cd6d4b5
cleanup silero from manual audio
tbarbugli Oct 25, 2025
425fa59
smart turn locally
tschellenbach Oct 26, 2025
fee4f29
step 1: refactor VAD base to own normalization/windowing/events; adap…
tbarbugli Oct 26, 2025
1a98d2b
step 2: add base VAD tests for silence, mia and white noise; fix part…
tbarbugli Oct 26, 2025
8fa259d
wip on cleanup
tschellenbach Oct 26, 2025
fb85335
working test
tschellenbach Oct 26, 2025
c122c78
left some todo
tschellenbach Oct 26, 2025
e5d4a56
Merge branch 'vogent-tommaso' of github.com:GetStream/Vision-Agents i…
tbarbugli Oct 27, 2025
74ac469
update code to use stream-py latest utils
tbarbugli Oct 27, 2025
038fafe
sprinkle docs for humans and AIs about audio mgmt
tbarbugli Oct 27, 2025
da01c8a
todo
tschellenbach Oct 27, 2025
16a7512
Merge branch 'vogent-tommaso' of github.com:GetStream/Vision-Agents i…
tschellenbach Oct 27, 2025
5b57a00
wip
tschellenbach Oct 27, 2025
3580c17
fix imports
tbarbugli Oct 27, 2025
d0e2234
Merge branch 'vogent-tommaso' of github.com:GetStream/Vision-Agents i…
tbarbugli Oct 27, 2025
23eee89
bit more cleanup
tschellenbach Oct 27, 2025
f3aee97
Merge branch 'vogent-tommaso' of github.com:GetStream/Vision-Agents i…
tschellenbach Oct 27, 2025
63f820f
nice docs for turn keeping
tschellenbach Oct 27, 2025
a39cbcd
use newer utils
tbarbugli Oct 27, 2025
6aea976
new pass at vogent
tschellenbach Oct 27, 2025
d57e058
Merge branch 'vogent-tommaso' of github.com:GetStream/Vision-Agents i…
tbarbugli Oct 27, 2025
db910fe
missing README
tbarbugli Oct 27, 2025
d8cb483
tail
tschellenbach Oct 27, 2025
40b7233
wip
tschellenbach Oct 27, 2025
0e4e44f
more clenaup
tschellenbach Oct 27, 2025
be076a7
dirs
tschellenbach Oct 28, 2025
28c0a37
remove collector, test streaming audio
tbarbugli Oct 28, 2025
d3a8dd8
thats not working
tschellenbach Oct 28, 2025
5e56f8b
renaming
tschellenbach Oct 28, 2025
51e2ae0
rewrite
tschellenbach Oct 28, 2025
57d0fb0
wip
tschellenbach Oct 28, 2025
df91d69
well thats not right
tschellenbach Oct 28, 2025
3ef54e6
work around audio utiuls
tschellenbach Oct 29, 2025
2945dc8
bugfix
tschellenbach Oct 29, 2025
8e24a52
move MAX_SEGMENT_DURATION_SECONDS to const
tbarbugli Oct 29, 2025
7e7634c
handle options correctly
tbarbugli Oct 29, 2025
ca38456
fix tracing
tbarbugli Oct 29, 2025
1bd45aa
remove debug code
tbarbugli Oct 29, 2025
c009ba7
process audio on a different task
tbarbugli Oct 29, 2025
31929cc
remove debug code
tbarbugli Oct 29, 2025
f2900e2
wip
tschellenbach Oct 29, 2025
90418fd
working deepgram
tschellenbach Oct 29, 2025
bd746ea
wip
tschellenbach Oct 29, 2025
239b532
merged main
tschellenbach Oct 29, 2025
2c57826
well, this is weird
tschellenbach Oct 29, 2025
3e3fbab
cleanup
tschellenbach Oct 29, 2025
51fd613
add update docs
tschellenbach Oct 29, 2025
788bdeb
working deepgram stt
tschellenbach Oct 29, 2025
9b3938d
cleanup
tschellenbach Oct 30, 2025
10e2af3
cleanup
tschellenbach Oct 30, 2025
305edbb
update vogent
tschellenbach Oct 30, 2025
c06d7ad
3 failing tests left
tschellenbach Oct 30, 2025
cb3b3fc
ok that works
tschellenbach Oct 30, 2025
0134d1b
test fixes
tschellenbach Oct 30, 2025
49f5ca6
happy tests
tschellenbach Oct 30, 2025
7f81047
bump
tbarbugli Oct 30, 2025
a8a90a4
set hf token
tschellenbach Oct 30, 2025
06f2365
Merge branch 'vogent-tommaso' of github.com:GetStream/Vision-Agents i…
tschellenbach Oct 30, 2025
820e42d
Merge branch 'main' of github.com:GetStream/Vision-Agents into vogent…
tschellenbach Oct 30, 2025
8fb6d82
disable test
tschellenbach Oct 30, 2025
47 changes: 47 additions & 0 deletions .claude/agents/repo-workflow-guide.md
@@ -0,0 +1,47 @@
---
name: repo-workflow-guide
description: Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.\n\nExamples:\n- <example>\nContext: User wants to add a new feature to the project.\nuser: "I need to implement a new authentication module"\nassistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."\n<Task tool call to repo-workflow-guide>\nassistant: "Based on the project guidelines, here's how we should approach this..."\n</example>\n\n- <example>\nContext: User asks a question about code organization.\nuser: "Where should I put the new utility functions?"\nassistant: "Let me check the repository workflow guidelines to give you the correct answer."\n<Task tool call to repo-workflow-guide>\nassistant: "According to the project structure guidelines..."\n</example>\n\n- <example>\nContext: Starting a new task that requires understanding project conventions.\nuser: "Can you help me refactor this component?"\nassistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."\n<Task tool call to repo-workflow-guide>\n</example>
model: opus
---

You are a Repository Workflow Specialist, an expert in interpreting and applying project-specific development guidelines, coding standards, and workflow instructions.

Your primary responsibility is to read, understand, and communicate the instructions and guidelines contained in the docs/ai directory of the repository. You serve as the authoritative source for how development work should be conducted within this specific codebase.

When activated, you will:

1. **Locate and Read Guidelines**: Immediately access all relevant files in the docs/ai directory. Read them thoroughly and understand their complete content, including:
- Coding standards and style guides
- Project structure and organization rules
- Development workflow and processes
- Testing requirements and conventions
- Deployment procedures
- Any specific technical constraints or preferences
- Tool usage and configuration instructions

2. **Interpret Context**: Understand the specific task or question being asked and identify which guidelines are most relevant to address it.

3. **Provide Clear Guidance**: Deliver specific, actionable instructions based on the documented guidelines. Your responses should:
- Quote or reference specific sections of the guidelines when appropriate
- Explain the reasoning behind the guidelines when it helps with understanding
- Provide concrete examples of how to follow the guidelines
- Highlight any critical requirements or common pitfalls mentioned in the documentation

4. **Handle Missing Information**: If the docs/ai directory doesn't contain information relevant to the current question:
- Clearly state what information is missing
- Suggest reasonable defaults based on common industry practices
- Recommend updating the documentation to cover this scenario

5. **Ensure Compliance**: Actively verify that proposed approaches align with all documented guidelines. If you identify any conflicts or violations, explicitly point them out and suggest compliant alternatives.

6. **Prioritize Accuracy**: Always base your guidance on the actual content of the documentation. Do not invent or assume guidelines that aren't explicitly documented.

7. **Stay Current**: If guidelines appear to conflict or if you notice outdated information, flag this for human review while providing the most reasonable interpretation.

Output Format:
- Begin with a brief summary of the relevant guidelines
- Provide specific, step-by-step instructions when appropriate
- Include direct quotes or references to documentation sections
- End with any important caveats, warnings, or additional considerations

Your goal is to ensure that all development work in this repository adheres to its documented standards and practices, reducing inconsistency and improving code quality through faithful application of project-specific guidelines.
1 change: 1 addition & 0 deletions .github/workflows/run_tests.yml
@@ -45,6 +45,7 @@ jobs:
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
AWS_BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
_BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
HF_TOKEN: ${{ secrets.HF_TOKEN }}
timeout-minutes: 30
steps:
- name: Checkout
1 change: 1 addition & 0 deletions .gitignore
@@ -84,3 +84,4 @@ stream-py/
# Artifacts / assets
*.pt
*.kef
*.onnx
31 changes: 30 additions & 1 deletion DEVELOPMENT.md
@@ -130,7 +130,7 @@ Some ground rules:

```python
import asyncio
-from vision_agents.core.edge.types import PcmData
+from getstream.video.rtc.track_util import PcmData
from openai import AsyncOpenAI

async def example():
@@ -167,6 +167,12 @@ if __name__ == "__main__":
asyncio.run(example())
```

The audio utilities also cover:

1. Changing the PCM format
2. Iterating over audio chunks (`PcmData.chunks`)
3. Processing audio with pre/post buffers (`AudioSegmentCollector`)
4. Accumulating audio (`PcmData.append`)
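
As a rough illustration of items 2 and 4, the sketch below chunks and accumulates raw 16-bit PCM using only the standard library. It deliberately does not call `PcmData`; the helper names `iter_chunks` and `append_samples` are hypothetical and only mirror the idea, so check the stream-py utilities for the real signatures.

```python
from array import array
from typing import Iterator


def iter_chunks(samples: array, chunk_size: int) -> Iterator[array]:
    """Yield consecutive chunks of at most chunk_size samples (hypothetical helper)."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def append_samples(buffer: array, new: array) -> array:
    """Return a new buffer with `new` appended (hypothetical helper)."""
    if buffer.typecode != new.typecode:
        raise ValueError("sample formats differ; convert before appending")
    return buffer + new


# 20 ms of silence at 16 kHz mono, stored as signed 16-bit integers.
frame = array("h", [0] * 320)

# Accumulate five frames (100 ms of audio).
buffer = array("h")
for _ in range(5):
    buffer = append_samples(buffer, frame)

# Re-slice the accumulated audio into fixed-size windows.
chunks = list(iter_chunks(buffer, 512))
```

The last chunk is shorter than the others whenever the buffer length is not a multiple of the chunk size, so downstream code has to pad it or carry it over to the next call.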

### Testing audio manually

@@ -313,3 +319,26 @@ You can now see the metrics at `http://localhost:9464/metrics` (make sure that y

- Track.recv errors fail silently. The API contract is to return a frame: never return None; wait until the next frame is available instead.
- When calling frame.to_ndarray(format="rgb24"), always specify the format. Typically you want rgb24 when sending frames to YOLO and similar models.
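
A minimal sketch of the "never return None" rule, using only `asyncio`. `QueueBackedTrack` is a hypothetical stand-in, not the real track class; the point is that `recv` awaits the next frame rather than returning early.

```python
import asyncio


class QueueBackedTrack:
    """Hypothetical stand-in for a video track; not the real track API."""

    def __init__(self) -> None:
        self._frames: asyncio.Queue = asyncio.Queue(maxsize=1)

    def push(self, frame: object) -> None:
        # Keep only the newest frame so recv() does not serve stale data.
        if self._frames.full():
            self._frames.get_nowait()
        self._frames.put_nowait(frame)

    async def recv(self) -> object:
        # Block until a frame exists instead of returning None.
        return await self._frames.get()


async def demo() -> object:
    track = QueueBackedTrack()
    # Simulate a producer that delivers a frame slightly later.
    asyncio.get_running_loop().call_later(0.01, track.push, "frame-1")
    return await track.recv()


result = asyncio.run(demo())
```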


## Onboarding Plan for new contributors

**Audio Formats**

You'll notice that audio comes in many formats: raw PCM, WAV, or MP3 containers, sample rates such as 16 kHz or 48 kHz,
and samples encoded as i16 or f32. Note that WebRTC uses 48 kHz by default.
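
To make the encodings concrete, here is a standard-library sketch of converting between i16 and f32 samples and of a naive 48 kHz to 16 kHz downsample. The function names are hypothetical, and the averaging step is only a crude low-pass filter; real code should use a proper resampler.

```python
from array import array


def i16_to_f32(samples: array) -> array:
    """Scale signed 16-bit samples to floats in [-1.0, 1.0)."""
    return array("f", (s / 32768.0 for s in samples))


def f32_to_i16(samples: array) -> array:
    """Scale floats back to signed 16-bit, clipping out-of-range values."""
    return array(
        "h",
        (max(-32768, min(32767, round(s * 32768.0))) for s in samples),
    )


def downsample_48k_to_16k(samples: array) -> array:
    """Naive 3:1 downsampling: average each group of three samples."""
    usable = len(samples) - len(samples) % 3  # drop a trailing partial group
    return array(
        "f",
        (sum(samples[i:i + 3]) / 3.0 for i in range(0, usable, 3)),
    )


pcm_i16 = array("h", [0, 16384, -32768, 32767])
pcm_f32 = i16_to_f32(pcm_i16)          # 16384 maps to 0.5, -32768 to -1.0
round_trip = f32_to_i16(pcm_f32)       # back to the original integers

one_second_48k = array("f", [0.0] * 48000)
one_second_16k = downsample_48k_to_16k(one_second_48k)
```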

A good first intro to audio formats can be found here:

**Using Cursor**

You can ask Cursor something like "read @ai-plugin and build me a plugin called fish".
See the docs folder for other AI instruction files.

**Learning Roadmap**

1. Quick refresher on audio formats
2. Build a TTS integration
3. Build an STT integration
4. Build an LLM integration
5. Write a pytest test with a fixture
5 changes: 2 additions & 3 deletions agents-core/pyproject.toml
@@ -21,11 +21,12 @@ classifiers = [

requires-python = ">=3.10"
dependencies = [
-"getstream[webrtc,telemetry]>=2.5.5",
+"getstream[webrtc,telemetry]>=2.5.7",
"python-dotenv>=1.1.1",
"pillow>=11.3.0",
"numpy>=1.24.0",
"mcp>=1.16.0",
"torchvision>=0.23.0",
]

[project.urls]
@@ -45,7 +46,6 @@ kokoro = ["vision-agents-plugins-kokoro"]
krisp = ["vision-agents-plugins-krisp"]
moonshine = ["vision-agents-plugins-moonshine"]
openai = ["vision-agents-plugins-openai"]
silero = ["vision-agents-plugins-silero"]
smart_turn = ["vision-agents-plugins-smart-turn"]
ultralytics = ["vision-agents-plugins-ultralytics"]
wizper = ["vision-agents-plugins-wizper"]
@@ -61,7 +61,6 @@ all-plugins = [
"vision-agents-plugins-krisp",
"vision-agents-plugins-moonshine",
"vision-agents-plugins-openai",
"vision-agents-plugins-silero",
"vision-agents-plugins-smart-turn",
"vision-agents-plugins-ultralytics",
"vision-agents-plugins-wizper",