Moondream Detection API #136
Merged: Nash0x7E2 merged 30 commits into GetStream:main from Nash0x7E2:feat/moondream-experiment on Nov 3, 2025.
Commits (30):
- 72c0130 Basic structure setup and stubbing (Nash0x7E2)
- c6d0d66 Basic person detection and test (Nash0x7E2)
- 4a7866a multi-object detection (Nash0x7E2)
- 45f219f Clean up and focus in on only detection (Nash0x7E2)
- efaa529 Further simplification (Nash0x7E2)
- 1b484d5 Rebase latest main (Nash0x7E2)
- ea2ffa8 remove extra av (Nash0x7E2)
- 0c73436 Okay detection (Nash0x7E2)
- 68d7d6e further cleanup for detection (Nash0x7E2)
- 4e5b057 Merge branch 'main' into feat/moondream-experiment (Nash0x7E2)
- 22bd5ab Merge branch 'main' into feat/moondream-experiment (Nash0x7E2)
- fe120dc Merge branch 'main' into feat/moondream-experiment (Nash0x7E2)
- b43dd08 Experimenting with HF version (Nash0x7E2)
- 7e93949 Move processing to CPU for MPS (CUDA/Model limit) (Nash0x7E2)
- 45ded3d Basic test for inference, device selction and model load (Nash0x7E2)
- 1865173 Rename public detection classes (Nash0x7E2)
- 476bb88 Extract moondream video track to a common file (Nash0x7E2)
- e934ffa Use util video track instead (Nash0x7E2)
- 62a2552 Update plugins/moondream/vision_agents/plugins/moondream/moondream_cl… (Nash0x7E2)
- 434adb7 avoid swallowing too many exceptions (Nash0x7E2)
- a5f5468 clean up (Nash0x7E2)
- a5af460 Extract detection logic to utils (Nash0x7E2)
- 864eb4f ruff and mypy clean up (Nash0x7E2)
- 0d8a9b7 Update public exports (Nash0x7E2)
- f58d0cd Fix test imports (Nash0x7E2)
- a065085 Clean up remaining issues (Nash0x7E2)
- 6f54577 Doc string clean up (Nash0x7E2)
- 8dac772 Clean up readme (Nash0x7E2)
- 2240ac1 Update plugins/moondream/README.md (dangusev)
- 4707351 Update plugins/moondream/README.md (dangusev)
New file: plugins/moondream/README.md (+172 lines)
# Moondream Plugin

This plugin provides Moondream 3 detection capabilities for vision-agents, enabling real-time zero-shot object detection on video streams. Choose between cloud-hosted and local processing depending on your needs.

## Installation

```bash
uv add vision-agents-plugins-moondream
```
## Choosing the Right Processor

### CloudDetectionProcessor (Recommended for Most Users)
- **Use when:** You want a simple setup with no infrastructure management
- **Pros:** No model download, no GPU required, automatic updates
- **Cons:** Requires an API key; 2 RPS rate limit by default (can be increased)
- **Best for:** Development, testing, and low-to-medium volume applications

### LocalDetectionProcessor (For Advanced Users)
- **Use when:** You need higher throughput, have your own GPU infrastructure, or want to avoid rate limits
- **Pros:** No rate limits, no API costs, full control over hardware
- **Cons:** Requires a GPU for best performance, model download on first use, and infrastructure management
- **Best for:** Production deployments, high-volume applications, Digital Ocean Gradient AI GPUs, or custom infrastructure
## Quick Start

### Using CloudDetectionProcessor (Hosted)

The `CloudDetectionProcessor` uses Moondream's hosted API. It requires an API key and is rate-limited to 2 RPS (requests per second) by default; contact the Moondream team to request a higher limit.

```python
from vision_agents.plugins import moondream
from vision_agents.core import Agent

# Create a cloud processor with detection
processor = moondream.CloudDetectionProcessor(
    api_key="your-api-key",  # or set MOONDREAM_API_KEY env var
    detect_objects="person",  # or ["person", "car", "dog"] for multiple
    fps=30
)

# Use in an agent
agent = Agent(
    processors=[processor],
    llm=your_llm,
    # ... other components
)
```
### Using LocalDetectionProcessor (On-Device)

If you are running on your own infrastructure, or on a service such as Digital Ocean's Gradient AI GPUs, you can use the `LocalDetectionProcessor`, which downloads the model from HuggingFace and runs it on device. By default it uses CUDA for best performance; actual throughput depends on your hardware configuration.

**Note:** The moondream3-preview model is gated and requires HuggingFace authentication:
- Request access at https://huggingface.co/moondream/moondream3-preview
- Set the `HF_TOKEN` environment variable: `export HF_TOKEN=your_token_here`
- Or run: `huggingface-cli login`

```python
from vision_agents.plugins import moondream
from vision_agents.core import Agent

# Create a local processor (no API key needed)
processor = moondream.LocalDetectionProcessor(
    detect_objects=["person", "car", "dog"],
    conf_threshold=0.3,
    device="cuda",  # Auto-detects CUDA, MPS, or CPU
    fps=30
)

# Use in an agent
agent = Agent(
    processors=[processor],
    llm=your_llm,
    # ... other components
)
```
### Detect Multiple Objects

```python
# Detect multiple object types with zero-shot detection
processor = moondream.CloudDetectionProcessor(
    api_key="your-api-key",
    detect_objects=["person", "car", "dog", "basketball"],
    conf_threshold=0.3
)

# Access results for the LLM
state = processor.state()
print(state["detections_summary"])  # "Detected: 2 persons, 1 car"
print(state["detections_count"])    # Total number of detections
print(state["last_image"])          # PIL Image for vision models
```
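A summary string like `"Detected: 2 persons, 1 car"` can be built from a list of detection labels with a simple counter. This is an illustrative sketch of the idea, not the plugin's internal code:

```python
from collections import Counter


def summarize_detections(labels: list[str]) -> str:
    """Build a human-readable summary from detection labels."""
    if not labels:
        return "No objects detected"
    counts = Counter(labels)
    parts = [f"{n} {label}{'s' if n > 1 else ''}" for label, n in counts.items()]
    return "Detected: " + ", ".join(parts)


print(summarize_detections(["person", "person", "car"]))
# → Detected: 2 persons, 1 car
```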
## Configuration

### CloudDetectionProcessor Parameters

- `api_key`: str - API key for the Moondream Cloud API. If not provided, it is read from the `MOONDREAM_API_KEY` environment variable.
- `detect_objects`: str | List[str] - Object(s) to detect using zero-shot detection. Can be any object name, such as "person", "car", or "basketball". Default: `"person"`
- `conf_threshold`: float - Confidence threshold for detections (default: 0.3)
- `fps`: int - Frame processing rate (default: 30)
- `interval`: int - Processing interval in seconds (default: 0)
- `max_workers`: int - Thread pool size for CPU-intensive operations (default: 10)

**Rate Limits:** By default, the Moondream Cloud API has a 2 RPS (requests per second) rate limit. Contact the Moondream team to request a higher limit.
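With a 2 RPS cap, it can help to throttle calls on the client side. Below is a minimal, hypothetical throttle (`SimpleThrottle` is not part of the plugin) that spaces successive calls at least `1 / rps` seconds apart:

```python
import time


class SimpleThrottle:
    """Sleep just long enough to stay under a requests-per-second cap."""

    def __init__(self, rps: float = 2.0):
        self.min_interval = 1.0 / rps
        self._last = 0.0

    def wait(self) -> None:
        # Sleep off any remaining fraction of the minimum interval
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()


throttle = SimpleThrottle(rps=2.0)
for _ in range(3):
    throttle.wait()  # each detection request would follow this call
```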
### LocalDetectionProcessor Parameters

- `detect_objects`: str | List[str] - Object(s) to detect using zero-shot detection. Can be any object name, such as "person", "car", or "basketball". Default: `"person"`
- `conf_threshold`: float - Confidence threshold for detections (default: 0.3)
- `fps`: int - Frame processing rate (default: 30)
- `interval`: int - Processing interval in seconds (default: 0)
- `max_workers`: int - Thread pool size for CPU-intensive operations (default: 10)
- `device`: str | None - Device to run inference on (`"cuda"`, `"mps"`, or `"cpu"`). Default: `None` (auto-detect: CUDA first, then MPS on Apple Silicon, then CPU)
- `model_name`: str - Hugging Face model identifier (default: `"moondream/moondream3-preview"`)
- `options`: AgentOptions - Model directory configuration. If not provided, the model directory defaults to `tempfile.gettempdir()`.

**Performance:** Throughput varies with your hardware configuration; CUDA on an NVIDIA GPU is recommended for best performance. The model is downloaded from HuggingFace on first use.
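The auto-detection order described above (CUDA, then MPS, then CPU) can be expressed as a small pure function. This is a sketch of the documented behaviour, not the plugin's actual implementation:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the documented fallback order: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags would come from
# torch.cuda.is_available() and torch.backends.mps.is_available()
print(pick_device(False, True))  # → mps
```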
## Video Publishing

The processor publishes annotated video frames with bounding boxes drawn on detected objects:

```python
processor = moondream.CloudDetectionProcessor(
    api_key="your-api-key",
    detect_objects=["person", "car"]
)

# The published track shows:
# - Green bounding boxes around detected objects
# - Labels with confidence scores
# - A real-time annotation overlay
```
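To illustrate what the overlay does, here is a minimal sketch that draws a box outline into a frame represented as a nested list of RGB tuples. The plugin itself annotates frames with OpenCV; this pure-Python version only demonstrates the idea:

```python
def draw_box(frame, x1, y1, x2, y2, color=(0, 255, 0), thickness=2):
    """Draw a rectangle outline on frame[row][col]; green by default."""
    for t in range(thickness):
        for x in range(x1, x2):        # top and bottom edges
            frame[y1 + t][x] = color
            frame[y2 - 1 - t][x] = color
        for y in range(y1, y2):        # left and right edges
            frame[y][x1 + t] = color
            frame[y][x2 - 1 - t] = color
    return frame


# A tiny 8x8 black "frame" with a box drawn from (1, 1) to (7, 7)
frame = [[(0, 0, 0)] * 8 for _ in range(8)]
draw_box(frame, 1, 1, 7, 7, thickness=1)
```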
## Testing

The plugin includes comprehensive tests:

```bash
# Run all tests
pytest plugins/moondream/tests/ -v

# Run specific test categories
pytest plugins/moondream/tests/ -k "inference" -v
pytest plugins/moondream/tests/ -k "annotation" -v
pytest plugins/moondream/tests/ -k "state" -v
```
## Dependencies

### Required
- `vision-agents` - Core framework
- `moondream` - Moondream SDK for the cloud API (CloudDetectionProcessor only)
- `numpy>=2.0.0` - Array operations
- `pillow>=10.0.0` - Image processing
- `opencv-python>=4.8.0` - Video annotation
- `aiortc` - WebRTC support

### LocalDetectionProcessor Additional Dependencies
- `torch` - PyTorch for model inference
- `transformers` - HuggingFace transformers library for model loading

## Links

- [Moondream Documentation](https://docs.moondream.ai/)
- [Vision Agents Documentation](https://visionagents.ai/)
- [GitHub Repository](https://github.com/GetStream/Vision-Agents)
New file: pyproject.toml (+43 lines)
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[project]
name = "vision-agents-plugins-moondream"
dynamic = ["version"]
description = "Moondream 3 vision processor plugin for Vision Agents"
readme = "README.md"
requires-python = ">=3.10"
license = "MIT"
dependencies = [
    "vision-agents",
    "numpy>=2.0.0",
    "pillow>=10.4.0",
    "opencv-python>=4.8.0",
    "moondream>=0.1.1",  # Now compatible with vision-agents pillow>=10.4.0
    "transformers>=4.40.0",  # For local model loading
    "torch>=2.0.0",  # PyTorch for model inference
    "accelerate>=0.20.0",  # Required for device_map and device management
]

[project.urls]
Documentation = "https://visionagents.ai/"
Website = "https://visionagents.ai/"
Source = "https://github.com/GetStream/Vision-Agents"

[tool.hatch.version]
source = "vcs"
raw-options = { root = "..", search_parent_directories = true, fallback_version = "0.0.0" }

[tool.hatch.build.targets.wheel]
packages = [".", "vision_agents"]

[tool.uv.sources]
vision-agents = { workspace = true }

[dependency-groups]
dev = [
    "pytest>=8.4.1",
    "pytest-asyncio>=1.0.0",
]