AI-Based AprilTag Pipeline Acceleration #2410
DoctorFogarty wants to merge 27 commits into PhotonVision:main from
public final FrameStaticProperties frameStaticProperties;

/** Optional ML detection ROI bounding boxes for visualization. Set by ML-assisted pipelines. */
public List<RotatedRect> mlDetectionRois = List.of();
Frame isn't the right place to maintain this state. Can it move to the pipeline result?
}

/** Result container for ML hybrid detection */
private static class MLDetectionResult {
Let's refactor this so it doesn't live as an inner class.
/**
 * Performs ML-assisted hybrid AprilTag detection. Stage 1: ML model detects ROIs. Stage 2:
 * traditional detector decodes tags within ROIs.
 */
private MLDetectionResult processMLHybrid(Frame frame) {
This logic feels like it wants to be a Pipe.
<pv-slider
  v-if="
    (currentPipelineSettings.pipelineType === PipelineType.AprilTag ||
      currentPipelineSettings.pipelineType === PipelineType.Aruco) &&
    useCameraSettingsStore().isCurrentVideoFormatCalibrated &&
    useCameraSettingsStore().currentPipelineSettings.solvePNPEnabled &&
    currentPipelineSettings.doMultiTarget
  "
  v-model="currentPipelineSettings.multiTagAmbiguityThreshold"
  label="Max Allowed Ambiguity"
  tooltip="Tags with pose ambiguity above this value are excluded from multi-tag estimation. Lower = stricter. 0 = only unambiguous tags. 1 = include all (disabled)."
  :min="0"
  :max="1"
  :step="0.05"
  :switch-cols="interactiveCols"
  @update:modelValue="
    (value) => useCameraSettingsStore().changeCurrentPipelineSetting({ multiTagAmbiguityThreshold: value }, false)
  "
/>
This feature should be split into a separate PR.
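For reference, the rule the slider's tooltip describes (tags whose pose ambiguity exceeds the threshold are excluded from multi-tag estimation; 0 keeps only unambiguous tags, 1 keeps all) can be sketched as a plain filter, assuming ambiguity lies in [0, 1]. `TagObservation` and `AmbiguityFilter` are hypothetical names for illustration, not PhotonVision classes.

```java
import java.util.List;

/** Hypothetical per-tag observation; only the field this sketch needs. */
record TagObservation(int id, double poseAmbiguity) {}

final class AmbiguityFilter {
    private AmbiguityFilter() {}

    /**
     * Keeps tags whose pose ambiguity is at or below the threshold. Assuming ambiguity lies in
     * [0, 1], a threshold of 0 keeps only unambiguous tags and a threshold of 1 keeps every tag.
     */
    static List<TagObservation> filterForMultiTag(List<TagObservation> tags, double threshold) {
        return tags.stream().filter(t -> t.poseAmbiguity() <= threshold).toList();
    }
}
```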
What are the performance benefits like?
Ditto, I'm curious to see the performance benefits of doing this. I'd want to compare quad fitting against ROI cropping, which still requires either a DMA transfer or a memcpy to the NPU. I'd also want to see the benefit of being able to reduce decimation in just those areas, given that fewer pixels are being searched to begin with (i.e., more range for the same baseline latency added by the ML stage).
@me-it-is @srimanachanta see above.
Insanely cool. Good work.
I think this is worth a design doc in the developer section of our website, plus some extra words in our normal user docs, before we merge. There's a lot of thinking going on here, and I want to support both future devs and users who are confused about why the tags have a bounding box now.
@DoctorFogarty I'm sure you're busy now that CMP is over and you're heading back. When you have time, I'd love to see that docs section written up. We should also think about whether we want to keep this as part of the AprilTag pipeline or make it a separate pipeline. Last thing: the branding we've been using for CMP conferences and whatnot has been YOLOtag. Shall we change the naming to reflect that? If you had a different name in mind, that's perfectly fine too.

One other thing: I want to wait until CPU-OD lands (paging @spacey-sooty) so we can test this properly. This will also have to wait until all the 2027 work is merged and cleaned up.
Setup Basic Tests Included Roboflow model tflite yolov8n trained
…ix type is cited from
This reverts commit e40f174.
…ackaged V8 model as I will replace it soon.
Removed old V8 model for AprilTags. Added entries for current V11 AprilTag Models for Rubik and OPi5
I'd be concerned about using that name because the YoloTag research paper (https://arxiv.org/abs/2409.02334) that I read before starting down this path isn't quite the same idea as what we're doing here. I view what we have here as just an AI accelerator for the AprilTag pipeline. There's no reason it couldn't use a different type of model/classifier in the future. I think of this as adding a turbo/supercharger to an engine. TurboTag 😆? Or how about giving it a brand-coded name like PhotonTag?
My vote for the name is something like "ML Accelerated AprilTag Detection". I want a name that clearly tells me what the thing does; I shouldn't have to read docs to have any idea what it is. Flashy names are cool for Chief posts but are worse UX, IMO.
We could abbreviate it to MLTag? I can live with that.
I don't see why an abbreviation is needed.
😄
I mean, IDC if we use some flashy name in a Chief post but call it something different in the UI.
Description
Adds a two-stage hybrid ML/traditional AprilTag detection pipeline that uses NPU hardware to accelerate tag detection. A YOLO v11 model identifies AprilTag regions of interest (ROIs) on the NPU, and the traditional WPILib AprilTag detector then decodes only the cropped sub-images to produce accurate tag IDs and poses. Narrowing the search space this way reduces the per-frame CPU load. When the ML model finds no tags, the pipeline can fall back to full-frame traditional detection, controlled by the `mlFallbackToTraditional` setting.
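The control flow described above can be sketched in plain Java, assuming the two stages and the fallback are injected as functions. `Roi`, `DecodedTag`, and `HybridAprilTagDetector` are hypothetical names, not the PR's actual pipes, and the frame type is left generic so the sketch does not depend on OpenCV.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical axis-aligned ROI in full-frame pixel coordinates. */
record Roi(int x, int y, int width, int height) {}

/** Hypothetical decoded tag, trimmed to its ID for this sketch. */
record DecodedTag(int id) {}

/** Two-stage hybrid detection over a generic frame type F. */
final class HybridAprilTagDetector<F> {
    private final Function<F, List<Roi>> roiDetector;                 // stage 1: ML model (NPU)
    private final BiFunction<F, Roi, List<DecodedTag>> regionDecoder; // stage 2: decoder on one ROI (CPU)
    private final Function<F, List<DecodedTag>> fullFrameDecoder;     // fallback: decoder on whole frame
    private final boolean fallbackToTraditional;

    HybridAprilTagDetector(
            Function<F, List<Roi>> roiDetector,
            BiFunction<F, Roi, List<DecodedTag>> regionDecoder,
            Function<F, List<DecodedTag>> fullFrameDecoder,
            boolean fallbackToTraditional) {
        this.roiDetector = roiDetector;
        this.regionDecoder = regionDecoder;
        this.fullFrameDecoder = fullFrameDecoder;
        this.fallbackToTraditional = fallbackToTraditional;
    }

    List<DecodedTag> process(F frame) {
        List<Roi> rois = roiDetector.apply(frame);
        if (rois.isEmpty()) {
            // ML found no candidates: optionally pay for a full-frame search.
            return fallbackToTraditional ? fullFrameDecoder.apply(frame) : List.of();
        }
        List<DecodedTag> tags = new ArrayList<>();
        for (Roi roi : rois) {
            // The CPU detector only ever sees the cropped regions.
            tags.addAll(regionDecoder.apply(frame, roi));
        }
        return tags;
    }
}
```

Injecting the stages as functions also keeps the control flow testable without an NPU attached.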
Two-Stage Hybrid Pipeline
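Stage 1's raw YOLO output is typically reduced to a handful of ROIs with a confidence cut-off followed by non-maximum suppression (NMS), which is what the `mlConfidenceThreshold` (0.5) and `mlNmsThreshold` (0.45) settings suggest. A minimal sketch of standard greedy NMS over axis-aligned boxes; where the PR actually performs this step (Java, native code, or the model runtime) is not stated here, and `Detection`/`DetectionFilter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical candidate box: top-left corner, size in pixels, and model confidence. */
record Detection(double x, double y, double w, double h, double confidence) {}

final class DetectionFilter {
    private DetectionFilter() {}

    /** Intersection-over-union of two axis-aligned boxes. */
    static double iou(Detection a, Detection b) {
        double ix = Math.max(0, Math.min(a.x() + a.w(), b.x() + b.w()) - Math.max(a.x(), b.x()));
        double iy = Math.max(0, Math.min(a.y() + a.h(), b.y() + b.h()) - Math.max(a.y(), b.y()));
        double intersection = ix * iy;
        double union = a.w() * a.h() + b.w() * b.h() - intersection;
        return union <= 0 ? 0 : intersection / union;
    }

    /** Drops low-confidence boxes, then applies greedy NMS in descending confidence order. */
    static List<Detection> filter(List<Detection> candidates, double confThreshold, double nmsThreshold) {
        List<Detection> sorted = new ArrayList<>();
        for (Detection d : candidates) {
            if (d.confidence() >= confThreshold) {
                sorted.add(d);
            }
        }
        sorted.sort(Comparator.comparingDouble(Detection::confidence).reversed());

        List<Detection> kept = new ArrayList<>();
        for (Detection d : sorted) {
            // Keep d only if it does not overlap an already-kept, higher-confidence box too much.
            boolean suppressed = kept.stream().anyMatch(k -> iou(d, k) > nmsThreshold);
            if (!suppressed) {
                kept.add(d);
            }
        }
        return kept;
    }
}
```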
- `AprilTagROIDetectionPipe` runs a YOLO v11 model on the NPU to produce bounding boxes around candidate tags
- `AprilTagROIDecodePipe` extracts each ROI sub-image, runs the WPILib AprilTag detector on it, and maps the corners and homography back to full-frame coordinates
- When ML finds no tags and `mlFallbackToTraditional` is enabled (default), the pipeline falls back to full-frame traditional detection
- `DrawMLROIPipe` renders cyan bounding boxes around ML-detected ROIs on the output stream to aid tuning
Homography Coordinate Transformation
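Both mappings listed in this section amount to left-multiplying the ROI-space homography by a 3×3 matrix: a translation T by the crop offset, preceded for ATR-resized ROIs by an inverse scale S⁻¹, i.e. H_full = T · S⁻¹ · H_roi. A sketch under those assumptions, with homographies stored as row-major 3×3 arrays; `HomographyMapping`, `toFullFrame`, and `toFullFrameWithScale` are hypothetical names that mirror what the text says `transformHomography()` and `transformHomographyWithScale()` do, not their actual code.

```java
final class HomographyMapping {
    private HomographyMapping() {}

    /** Row-major 3x3 matrix product a * b. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] out = new double[3][3];
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                for (int k = 0; k < 3; k++) {
                    out[r][c] += a[r][k] * b[k][c];
                }
            }
        }
        return out;
    }

    /** Translation only: H_full = T(offsetX, offsetY) * H_roi. */
    static double[][] toFullFrame(double[][] hRoi, double offsetX, double offsetY) {
        double[][] t = {{1, 0, offsetX}, {0, 1, offsetY}, {0, 0, 1}};
        return multiply(t, hRoi);
    }

    /**
     * For an ROI resized by {@code scale} (resized = scale * original) before decoding:
     * H_full = T(offsetX, offsetY) * S(1 / scale) * H_resized.
     */
    static double[][] toFullFrameWithScale(double[][] hResized, double scale, double offsetX, double offsetY) {
        double inv = 1.0 / scale;
        double[][] tTimesInvS = {{inv, 0, offsetX}, {0, inv, offsetY}, {0, 0, 1}};
        return multiply(tTimesInvS, hResized);
    }

    /** Maps a point through a homography, including the perspective divide. */
    static double[] apply(double[][] h, double x, double y) {
        double w = h[2][0] * x + h[2][1] * y + h[2][2];
        return new double[] {
            (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w
        };
    }
}
```

Because T and S⁻¹ are affine, corners can be mapped even more simply: divide the ROI-space corner by the scale, then add the crop offset.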
- `transformHomography()` applies a translation-only mapping (ROI offset to full frame)
- `transformHomographyWithScale()` applies combined inverse scaling and translation for ATR-resized ROIs
Adaptive Tag Resizing (ATR)
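One way the ATR scale factor could be derived from the three settings listed here, assuming ATR shrinks an ROI so its longest side approaches `atrTargetDimension`, never upscales, and never goes below `atrMinScaleFactor` (so 0.25 means at most 4× downscaling). `AdaptiveTagResize.scaleFactor` is a hypothetical helper. For example, a 400 × 300 px ROI with a 200 px target gets a factor of 0.5.

```java
final class AdaptiveTagResize {
    private AdaptiveTagResize() {}

    /**
     * Scale factor applied to an ROI before decoding (resized = factor * original).
     * Assumptions: ROIs already at or below the target are left alone (no upscaling),
     * and the factor is floored at minScaleFactor (0.25 means at most 4x downscaling).
     */
    static double scaleFactor(int roiWidth, int roiHeight, int targetDimension, double minScaleFactor) {
        int longestSide = Math.max(roiWidth, roiHeight);
        if (longestSide <= targetDimension) {
            return 1.0;
        }
        return Math.max(minScaleFactor, (double) targetDimension / longestSide);
    }
}
```

Whatever factor is applied here is the one the inverse scaling in the homography mapping has to undo.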
- `atrEnabled` (default: `true`), `atrTargetDimension` (default: 200 px), `atrMinScaleFactor` (default: 0.25, which caps downscaling at 4×)
New AprilTag Pipeline Settings
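Each ML box is grown by `mlRoiPaddingPixels` (40 by default, listed here) before cropping, presumably so the decoder sees the tag's full border even when the box is tight. A minimal sketch, assuming the padding is applied equally on every side and the crop is clamped to the frame; `CropRect` and `RoiPadding` are hypothetical names. The crop's top-left corner is the offset the homography transformation has to add back.

```java
/** Hypothetical integer crop rectangle in full-frame pixel coordinates. */
record CropRect(int x, int y, int width, int height) {}

final class RoiPadding {
    private RoiPadding() {}

    /**
     * Grows a detector box by {@code padding} pixels on every side and clamps it to the frame.
     * The returned (x, y) is the offset to add back when mapping results to the full frame.
     */
    static CropRect padAndClamp(double boxX, double boxY, double boxW, double boxH,
                                int padding, int frameWidth, int frameHeight) {
        int left = Math.max(0, (int) Math.floor(boxX) - padding);
        int top = Math.max(0, (int) Math.floor(boxY) - padding);
        int right = Math.min(frameWidth, (int) Math.ceil(boxX + boxW) + padding);
        int bottom = Math.min(frameHeight, (int) Math.ceil(boxY + boxH) + padding);
        // A zero-size rectangle signals a box that lies entirely outside the frame.
        return new CropRect(left, top, Math.max(0, right - left), Math.max(0, bottom - top));
    }
}
```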
- `useMLDetection`, `mlConfidenceThreshold` (0.5), `mlNmsThreshold` (0.45), `mlRoiPaddingPixels` (40), `mlFallbackToTraditional` (true), `mlModelName`, `showDetectionBoxes` (true)
- `atrEnabled` (true), `atrTargetDimension` (200), `atrMinScaleFactor` (0.25)
- `multiTagAmbiguityThreshold` (0.2): filters high-ambiguity single-tag poses before multi-tag PnP estimation
- Replaced `outputShowMultipleTargets` with a numeric `outputMaximumTargets` (default: 20, max: 127), with backward-compatible deserialization via a `@JsonAnySetter` migration
Model Management
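Platform-aware model selection could be as simple as matching each NPU's model format: `.rknn` for RK3588 and `.tflite` for QCS6490, as in the two bundled files listed here. The sketch assumes that mapping; `NpuPlatform` and `ModelSelector` are hypothetical and do not reflect `NeuralNetworkModelManager`'s actual API.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical NPU families, each tied to the model format its runtime loads. */
enum NpuPlatform {
    RK3588(".rknn"),    // Rockchip NPU (e.g. Orange Pi 5)
    QCS6490(".tflite"), // Qualcomm NPU (e.g. Rubik Pi 3)
    NONE(null);         // no supported NPU: ML acceleration unavailable

    final String modelExtension;

    NpuPlatform(String modelExtension) {
        this.modelExtension = modelExtension;
    }
}

final class ModelSelector {
    private ModelSelector() {}

    /** Picks the alphabetically first available model file whose extension matches the platform. */
    static Optional<String> pickModel(NpuPlatform platform, List<String> availableModels) {
        if (platform.modelExtension == null) {
            return Optional.empty();
        }
        return availableModels.stream()
                .filter(name -> name.endsWith(platform.modelExtension))
                .sorted()
                .findFirst();
    }
}
```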
- `apriltagV4-yolo11.rknn` (RK3588) and `apriltagV4-yolo11.tflite` (QCS6490/Rubik Pi 3)
- `NeuralNetworkModelManager` handles platform-aware model discovery and loading
Frontend / UI
- `supportedBackends` is non-empty
- `PipelineTypes.ts` and `SettingTypes.ts`
Bug Fixes & Improvements
- `Frame.java`: Added `mlDetectionRois` field to carry ROI bounding boxes through the pipeline for visualization
Meta
Merge checklist: