AI-Based AprilTag Pipeline Acceleration by DoctorFogarty · Pull Request #2410 · PhotonVision/photonvision

DoctorFogarty · 2026-03-26T04:49:56Z

Description

Adds a two-stage hybrid ML/traditional AprilTag detection pipeline that uses NPU hardware to accelerate tag detection. A YOLO v11 model identifies AprilTag regions of interest (ROIs) on the NPU, then the traditional WPILib AprilTag detector decodes only the cropped sub-images for accurate tag ID and pose. This reduces the per-frame computational load on the CPU by narrowing the search space. When ML finds no tags, the pipeline falls back to full-frame traditional detection if the fallback setting is enabled.

Two-Stage Hybrid Pipeline

Stage 1 (ML ROI detection): AprilTagROIDetectionPipe runs a YOLO v11 model on the NPU to produce bounding boxes around candidate tags
Stage 2 (Traditional decode): AprilTagROIDecodePipe extracts each ROI sub-image, runs the WPILib AprilTag detector on it, and maps corners + homography back to full-frame coordinates
Fallback: When ML detection finds zero tags and mlFallbackToTraditional is enabled (default), the pipeline falls back to full-frame traditional detection
Visualization: DrawMLROIPipe renders cyan bounding boxes around ML-detected ROIs on the output stream for tuning
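
For reviewers, the control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class HybridFlowSketch, the Roi and Tag records, and the detect signature are all hypothetical stand-ins, and the two decode stages are passed in as plain functions. It also folds in the deduplication by tag ID mentioned later in this description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch of the two-stage control flow; not the PR's actual classes. */
final class HybridFlowSketch {
    /** Axis-aligned ROI in full-frame pixels (hypothetical type). */
    record Roi(int x, int y, int width, int height) {}

    /** Minimal stand-in for a decoded tag (hypothetical type). */
    record Tag(int id, double decisionMargin) {}

    /**
     * @param rois ROIs produced by stage 1 (the ML model)
     * @param decodeRoi stage 2: decodes the tags found inside one ROI
     * @param decodeFullFrame traditional full-frame detection, used only as a fallback
     * @param fallbackEnabled stands in for the mlFallbackToTraditional setting
     */
    static List<Tag> detect(
            List<Roi> rois,
            Function<Roi, List<Tag>> decodeRoi,
            Supplier<List<Tag>> decodeFullFrame,
            boolean fallbackEnabled) {
        // Fallback path: the ML stage found nothing.
        if (rois.isEmpty()) {
            return fallbackEnabled ? decodeFullFrame.get() : List.of();
        }
        // Overlapping ROIs can decode the same tag twice; keep the higher decision margin.
        Map<Integer, Tag> bestById = new LinkedHashMap<>();
        for (Roi roi : rois) {
            for (Tag tag : decodeRoi.apply(roi)) {
                bestById.merge(
                        tag.id(), tag, (a, b) -> a.decisionMargin() >= b.decisionMargin() ? a : b);
            }
        }
        return new ArrayList<>(bestById.values());
    }
}
```

Passing the stages in as functions keeps the sketch independent of any NPU or detector API, so only the branching and merging logic is shown.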

Homography Coordinate Transformation

transformHomography() applies translation-only mapping (ROI offset to full frame)
transformHomographyWithScale() applies combined inverse-scaling and translation for ATR-resized ROIs
Mathematics derived from UMich AprilTag library homography conventions (row-major 3×3)
Deduplication by tag ID (keeps highest decision margin) when overlapping ROIs detect the same tag
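
To make the coordinate mapping above concrete, here is a small sketch of the underlying matrix arithmetic. It assumes a row-major 3×3 homography stored as a double[9], an ROI whose top-left corner sits at (ox, oy) in the full frame, and a uniform resize factor scale applied to the ROI before detection (scale < 1 means the ROI was shrunk). The class and method names are hypothetical, and sub-pixel center conventions are ignored, so treat it as an illustration of the idea rather than the PR's transformHomography() code.

```java
/** Hypothetical sketch: maps an ROI-space homography back to full-frame pixels. */
final class HomographySketch {
    /** Translation-only case: the ROI was cropped but not resized. */
    static double[] toFullFrame(double[] h, double ox, double oy) {
        return toFullFrame(h, 1.0, ox, oy);
    }

    /**
     * Computes T(ox, oy) * S(1/scale) * h for a row-major 3x3 matrix h.
     * A point detected in the resized ROI is first divided by scale (undoing the
     * resize) and then shifted by the ROI offset.
     */
    static double[] toFullFrame(double[] h, double scale, double ox, double oy) {
        if (h.length != 9) {
            throw new IllegalArgumentException("expected a row-major 3x3 matrix");
        }
        if (!(scale > 0)) {
            throw new IllegalArgumentException("scale must be positive");
        }
        double inv = 1.0 / scale;
        return new double[] {
            inv * h[0] + ox * h[6], inv * h[1] + ox * h[7], inv * h[2] + ox * h[8],
            inv * h[3] + oy * h[6], inv * h[4] + oy * h[7], inv * h[5] + oy * h[8],
            h[6], h[7], h[8]
        };
    }

    /** Applies a homography to (x, y) and performs the perspective divide. */
    static double[] project(double[] h, double x, double y) {
        double w = h[6] * x + h[7] * y + h[8];
        return new double[] {
            (h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w
        };
    }
}
```

The bottom row is left untouched because translation and uniform scaling only change where a projected point lands in the image, not its perspective divisor.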

Adaptive Tag Resizing (ATR)

Downscales large/close tags within each ROI to a target pixel dimension before detection, improving decode performance for near-field tags
atrEnabled (default: true), atrTargetDimension (default: 200 px), atrMinScaleFactor (default: 0.25, caps at 4× downscale)
Coordinates and homography are correctly inverse-scaled back to full-frame space after detection
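
As a rough illustration of how the three ATR settings could interact, the sketch below computes a resize factor for one ROI. It assumes the factor is driven by the ROI's larger side and that ROIs already at or below the target are never upscaled; both are assumptions for illustration, and AtrSketch.scaleFactor is a hypothetical helper, not a method from this PR.

```java
/** Hypothetical sketch of an ATR scale-factor calculation. */
final class AtrSketch {
    /**
     * @param targetDimension stands in for atrTargetDimension (pixels)
     * @param minScaleFactor stands in for atrMinScaleFactor (0.25 means at most a 4x downscale)
     * @return factor in [minScaleFactor, 1.0] to multiply the ROI size by
     */
    static double scaleFactor(
            int roiWidth, int roiHeight, int targetDimension, double minScaleFactor) {
        int largest = Math.max(roiWidth, roiHeight);
        if (largest <= targetDimension) {
            return 1.0; // small or far tag: leave the ROI untouched
        }
        double ideal = (double) targetDimension / largest;
        return Math.max(ideal, minScaleFactor); // never shrink below the floor
    }
}
```

With the defaults listed above, a 400 px ROI would be halved to 200 px, while a 1000 px ROI would hit the 0.25 floor and be reduced to 250 px instead of 200 px.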

New AprilTag Pipeline Settings

ML detection: useMLDetection, mlConfidenceThreshold (0.5), mlNmsThreshold (0.45), mlRoiPaddingPixels (40), mlFallbackToTraditional (true), mlModelName, showDetectionBoxes (true)
ATR: atrEnabled (true), atrTargetDimension (200), atrMinScaleFactor (0.25)
Multi-tag ambiguity filtering: multiTagAmbiguityThreshold (0.2) — filters high-ambiguity single-tag poses before multi-tag PNP estimation
Configurable max targets: Replaced boolean outputShowMultipleTargets with numeric outputMaximumTargets (default: 20, max: 127). Backward-compatible deserialization via @JsonAnySetter migration
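
The multi-tag ambiguity filter described above amounts to a threshold check on each single-tag pose before the multi-tag solve. A minimal sketch follows, assuming tags whose ambiguity is at or below the threshold are kept; the TagPose record and filterForMultiTag method are hypothetical names used only for this example.

```java
import java.util.List;

/** Hypothetical sketch of ambiguity filtering ahead of multi-tag estimation. */
final class AmbiguityFilterSketch {
    /** Minimal stand-in for a single-tag pose result (hypothetical type). */
    record TagPose(int id, double ambiguity) {}

    /**
     * Keeps only tags whose pose ambiguity does not exceed the threshold.
     *
     * @param threshold stands in for multiTagAmbiguityThreshold (0 to 1)
     */
    static List<TagPose> filterForMultiTag(List<TagPose> tags, double threshold) {
        return tags.stream().filter(t -> t.ambiguity() <= threshold).toList();
    }
}
```

A threshold of 1 keeps every tag, which effectively disables the filter, while lower values are stricter.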

Model Management

Added apriltagV4-yolo11.rknn (RK3588) and apriltagV4-yolo11.tflite (QCS6490/Rubik Pi 3)
NeuralNetworkModelManager handles platform-aware model discovery and loading

Frontend / UI

AprilTagTab.vue: New "AI-Assisted Detection (NPU)" section with model selector, confidence/NMS/padding sliders, fallback toggle, and ROI box visualization toggle. Conditionally rendered only when supportedBackends is non-empty
OutputTab.vue: New "Max Allowed Ambiguity" slider for multi-tag filtering. Replaced "Show Multiple Targets" toggle with "Maximum Targets" numeric slider
TypeScript type updates in PipelineTypes.ts and SettingTypes.ts

Bug Fixes & Improvements

Fixed stale dashboard UI components persisting when activating cameras
CVMat memory management improvements (release processed Focus mat, fix refcounting)
Additive pixel padding strategy for ROI expansion (naturally adaptive: small/far tags get proportionally more expansion)
Thread pooling configuration defaults (4 threads)
Frame.java: Added mlDetectionRois field to carry ROI bounding boxes through the pipeline for visualization
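
The additive padding strategy listed above can be illustrated with a short sketch. It assumes the padding is applied to every side of the ROI and that the result is clamped to the frame bounds; the Box record and pad method are hypothetical and exist only for this example.

```java
/** Hypothetical sketch of additive ROI padding with clamping to the frame. */
final class RoiPaddingSketch {
    /** Axis-aligned box in full-frame pixels (hypothetical type). */
    record Box(int x, int y, int width, int height) {}

    /**
     * Grows the ROI by a fixed number of pixels on each side.
     *
     * @param paddingPx stands in for mlRoiPaddingPixels
     */
    static Box pad(Box roi, int paddingPx, int frameWidth, int frameHeight) {
        int left = Math.max(0, roi.x() - paddingPx);
        int top = Math.max(0, roi.y() - paddingPx);
        int right = Math.min(frameWidth, roi.x() + roi.width() + paddingPx);
        int bottom = Math.min(frameHeight, roi.y() + roi.height() + paddingPx);
        return new Box(left, top, right - left, bottom - top);
    }
}
```

Under these assumptions, a 20 px ROI with 40 px of padding grows to 100 px (5× its size), whereas a 400 px ROI grows to 480 px (1.2×), which is why small or far tags receive proportionally more context.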

Meta

Merge checklist:

Pull Request title is short, imperative summary of proposed changes
The description documents the what and why, including events that led to this PR
If this PR changes behavior or adds a feature, user documentation is updated
If this PR touches photon-serde, all messages have been regenerated and hashes have not changed unexpectedly
If this PR touches configuration, this is backwards compatible with all settings going back to the previous season's last release (seasons end after champs ends)
If this PR touches pipeline settings or anything related to data exchange, the frontend typing is updated
If this PR addresses a bug, a regression test for it is added
If this PR adds a dependency, the license has been checked for compatibility and steps taken to follow it

mcm001 · 2026-03-26T04:57:40Z

    public final FrameStaticProperties frameStaticProperties;

+    /** Optional ML detection ROI bounding boxes for visualization. Set by ML-assisted pipelines. */
+    public List<RotatedRect> mlDetectionRois = List.of();


Frame isn't the right place to maintain this state. Can it move to the pipeline result?

mcm001 · 2026-03-26T04:58:29Z

    }

+    /** Result container for ML hybrid detection */
+    private static class MLDetectionResult {


Let's refactor this to not live as an inner class

Should be resolved in DoctorFogarty@0dac7b3

mcm001 · 2026-03-26T04:58:48Z

+     * Performs ML-assisted hybrid AprilTag detection. Stage 1: ML model detects ROIs Stage 2:
+     * Traditional detector decodes tags within ROIs
+     */
+    private MLDetectionResult processMLHybrid(Frame frame) {


This logic feels like it wants to be a Pipe

spacey-sooty · 2026-03-27T17:05:07Z

+    <pv-slider
+      v-if="
+        (currentPipelineSettings.pipelineType === PipelineType.AprilTag ||
+          currentPipelineSettings.pipelineType === PipelineType.Aruco) &&
+        useCameraSettingsStore().isCurrentVideoFormatCalibrated &&
+        useCameraSettingsStore().currentPipelineSettings.solvePNPEnabled &&
+        currentPipelineSettings.doMultiTarget
+      "
+      v-model="currentPipelineSettings.multiTagAmbiguityThreshold"
+      label="Max Allowed Ambiguity"
+      tooltip="Tags with pose ambiguity above this value are excluded from multi-tag estimation. Lower = stricter. 0 = only unambiguous tags. 1 = include all (disabled)."
+      :min="0"
+      :max="1"
+      :step="0.05"
+      :switch-cols="interactiveCols"
+      @update:modelValue="
+        (value) => useCameraSettingsStore().changeCurrentPipelineSetting({ multiTagAmbiguityThreshold: value }, false)
+      "
+    />


This feature should be split to a separate PR

me-it-is · 2026-04-17T00:26:54Z

What are the performance benefits of this like?

srimanachanta · 2026-04-18T03:07:14Z

What are the performance benefits of this like?

Ditto, I'm curious to see performance benefits from doing this. Quad fitting versus ROI cropping which still requires either a DMA transfer or a mem-copy to the NPU.

I'd also want to see the performance benefits of being able to reduce decimate in just those areas, given fewer pixels are being searched to begin with (increased range for the same baseline latency addition from using ML).

DoctorFogarty · 2026-04-19T01:28:21Z

OV9281
1280x800
AI DETECTION OFF - Decimate 1

DoctorFogarty · 2026-04-19T01:28:53Z

OV9281
1280x800
Decimate 1 Hardcoded for ROI Frames
AI DETECTOR ON

DoctorFogarty · 2026-04-19T01:32:39Z

ThriftyCam 2MP Camera
1600x1304 YUYV
AI DETECTOR OFF
Decimate 1

DoctorFogarty · 2026-04-19T01:33:23Z

ThriftyCam 2MP Camera
1600x1304 YUYV
AI DETECTOR ON
Decimate 1 Hardcoded for ROI Frames

DoctorFogarty · 2026-04-19T01:34:31Z

@me-it-is @srimanachanta see above.

srimanachanta · 2026-04-19T08:17:06Z

@me-it-is @srimanachanta see above.

Insanely cool. Good work.

mcm001 · 2026-04-23T04:39:28Z

I think this is worth a design doc in the developer section of our website + some extra words added to our normal user docs as well before we merge. There's a lotta brains and thinking going on here and I want to support both future devs and users confused about why the tags have a bounding box now

samfreund · 2026-05-03T03:03:13Z

@DoctorFogarty I'm sure you're busy now that CMP is over and you're heading back. When you have time, I'd love to see that docs section get written up. We should also think about whether we want to keep this as a part of the atag pipeline, or make it a new separate pipeline. Last thing, the branding we've been using for CMP conferences and whatnot has been YOLOtag, shall we change naming to reflect that? If you had a different name in mind or whatever, that's perfectly fine too.

samfreund · 2026-05-03T17:37:29Z

Other things, I want to wait until after CPU-OD hits (paging @spacey-sooty) so we can test this properly. This is also gonna have to wait until all the 2027 stuff gets merged and pretty.

DoctorFogarty · 2026-05-06T02:01:53Z

the branding we've been using for CMP conferences and whatnot has been YOLOtag, shall we change naming to reflect that? If you had a different name in mind or whatever, that's perfectly fine too.

I'd be concerned by using that name because the YoloTag research paper (https://arxiv.org/abs/2409.02334) I read before starting out on this path is not quite the same idea as what we are doing here.

I view what we have here as just an AI accelerator for the AprilTag pipeline.
We are just more efficiently selecting the frames that then get handed off to the traditional AprilTag detector pipeline, which is why I didn't make them separate.

There is no reason that in the future it can't be done using a different type of model/classifier as well.

I think of this as like adding a Turbo/Supercharger to an Engine. TurboTag 😆 or how about just give it a brand coded name like PhotonTag.

spacey-sooty · 2026-05-06T02:32:14Z

My vote for name is something like "ML Accelerated AprilTag Detection", I want a name that actually tells me clearly what the thing does, I shouldn't have to read docs to have any idea what it is. Flashy names are cool for chief posts but are worse UX IMO

samfreund · 2026-05-06T02:33:32Z

My vote for name is something like "ML Accelerated AprilTag Detection", I want a name that actually tells me clearly what the thing does, I shouldn't have to read docs to have any idea what it is. Flashy names are cool for chief posts but are worse UX IMO

We can abbreviate to MLTag? I can live with that

spacey-sooty · 2026-05-06T02:34:05Z

I don't see where an abbreviation is needed?

samfreund · 2026-05-06T02:34:39Z

I don't see where an abbreviation is needed?

Flashy names are cool for chief posts

😄

spacey-sooty · 2026-05-06T02:35:33Z

I mean IDC if we use some flashy name in a chief post but call it something different in the UI

DoctorFogarty requested a review from a team as a code owner March 26, 2026 04:49

github-actions Bot added the frontend and backend labels Mar 26, 2026

mcm001 reviewed Mar 26, 2026

DoctorFogarty force-pushed the apriltag-ml-experimental-sync branch from 5f60a17 to 93c2d80 on March 26, 2026 05:12

github-actions Bot added the documentation and photonlib labels Mar 26, 2026

DoctorFogarty force-pushed the apriltag-ml-experimental-sync branch from 93c2d80 to ca0e47b on March 26, 2026 05:18

github-actions Bot removed the documentation and photonlib labels Mar 26, 2026

spacey-sooty reviewed Mar 27, 2026

samfreund force-pushed the apriltag-ml-experimental-sync branch 2 times, most recently from fd791a8 to 076b6ba on March 30, 2026 16:04

DoctorFogarty added 6 commits May 5, 2026 20:37

Created Apriltag ML assisted Apriltag settings

3a5dd04

Setup Basic Tests Included Roboflow model tflite yolov8n trained

Small change to stop spamming logs every time a pipe setting check occurs

d05a07c

remove duplicated .tflite, remove tool-versions.yaml

d5834a7

Homography transform added to requirements

f839059

Match existing pattern for settingsStore

385efbe

Minor comment cleanup, added source umich documentation to where matrix type is cited from

bb6226e

DoctorFogarty and others added 19 commits May 5, 2026 20:37

Unit testing homography transformation

7242710

ROI Decimate should always be 1

dd405a2

testing enhancements to ROI size fallback

c1d3558

Thread pooling

bab0fdd

Revert "testing enhancements to ROI size fallback"

fcb648e

This reverts commit e40f174.

Adaptive Tag Resizing to fix poor near field performance

caf1ddb

AprilTags are Y down, UI selector

0a7a283

Removed Subpix refinement as it was not well informed. Removed auto packaged V8 model as I will replace it soon.

032a635

feat: Model selector on AprilTag screen after choosing AI Acceleration

6188bc1

Slightly higher atr

da50228

AprilTag Pipeline ROI box viewer

33d48bf

Additive Pixel Padding strategy change

b5d40e7

Synced Changes from 2026.3.2

e0a72ca

Add TFLite and RKNN models to source for review

ff5f586

Removed old model weights.

6fb5e95

Removed old V8 model for AprilTags. Added entries for current V11 AprilTag Models for Rubik and OPi5

Added a configurable multi-tag estimate ambiguity filter

3ff9b45

lint

fd40986

Move MLDetectionResult to standalone record

f2565b3

WPIFormat on MLDetectionResult

bd602f3

samfreund force-pushed the apriltag-ml-experimental-sync branch from 00a6270 to bd602f3 on May 6, 2026 01:38

backend 27 migratoin

a609bcf

samfreund force-pushed the apriltag-ml-experimental-sync branch from 481c186 to a609bcf on May 6, 2026 05:07

migrate to proper model references

588de64
