Fregata's CoreML detection on the Apple Neural Engine

Fregata runs object detection on Apple Silicon’s Neural Engine (ANE) via Apple’s CoreML framework — the same hardware path that powers Face ID, Siri’s on-device speech recognition, and photo-library scene classification. The detector is what makes real-time multi-camera detection viable on an energy-efficient Mac mini, and it’s the single most important reason Fregata exists as a macOS port.

If you just want detection to work, leave the defaults alone. This page is for the curious — what performance to expect, how to confirm the ANE is actually being used, and what to do when a custom model lands on the CPU instead.

For the user-facing config knobs (model swapping, mask & zone editing, per-object thresholds, ANE-vs-GPU override), see Detection tuning.

What you should expect

For the bundled YOLOv9-tiny model at 320×320, on a current Apple Silicon Mac:

Compute path	Per-frame inference	Power
Apple Neural Engine (default)	1–2 ms	very low
GPU (Metal)	5–12 ms	~5–10× ANE
CPU fallback	30–80 ms	highest

The ANE numbers are why Fregata exists. The same model run via Frigate-in-Docker on a Mac ends up on the CPU — Docker’s Linux VM has no access to the ANE — and lands somewhere in that 30–80 ms band. The gap is the difference between real-time detection on 8+ cameras and a stuttery 1-camera demo.
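
To make the gap concrete, the following minimal sketch estimates how busy a single detector would be at each latency in the table. It assumes one inference per camera frame and a detection rate of 5 frames per second per camera; both are illustrative assumptions rather than figures from this page, and `detector_load` is a hypothetical helper, not part of Fregata.

```python
def detector_load(cameras: int, inference_ms: float, detect_fps: float = 5.0) -> float:
    """Fraction of each second one detector spends on inference.

    Assumes one inference per camera frame (an illustrative simplification).
    A value above 1.0 means the detector cannot keep up and must drop frames.
    """
    inferences_per_second = cameras * detect_fps
    return inferences_per_second * inference_ms / 1000.0


# Representative latencies picked from inside the bands in the table above.
for path, ms in [("ANE", 2.0), ("GPU", 8.0), ("CPU", 50.0)]:
    print(f"{path}: {detector_load(cameras=8, inference_ms=ms):.0%} busy")
```

Under these assumptions, eight cameras keep the ANE about 8% busy and the GPU about 32% busy, while the CPU path would need roughly 200% of the available time, so it falls behind.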

Why this performs the way it does

The Apple Neural Engine is a dedicated, extremely power-efficient inference accelerator built into the Apple Silicon SoC. It works best running one model at a time on fixed-shape, low-precision tensors, which is exactly the shape a YOLO object detector takes. Fregata’s detector pins object detection to the ANE; enrichment models (face recognition, semantic search, LPR, audio detection) route to the GPU or CPU instead, so a burst of search indexing can never stall the detector. That breakdown is documented at AI Models for Frigate Enrichments.

Two more performance properties fall out of this:

No low-resolution detection sub-streams. Detection on the ANE is fast enough that Fregata runs it on the full-resolution stream from each camera. Frigate-on-Linux and most other NVRs rely on a low-resolution sub-stream to keep CPU detection affordable; Fregata doesn’t need one. Detection accuracy improves accordingly (small / distant objects don’t get pixel-soup’d before the model sees them).
The dedicated media engine handles decode and encode. H.264 and HEVC streams go through VideoToolbox on Apple’s separate media-engine silicon, not the CPU, so even an 8-camera 4K install leaves the CPU effectively idle.
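
The routing described at the start of this section maps naturally onto CoreML’s compute-unit setting. Fregata’s own loading code is not shown on this page, so the following is only a sketch of the idea using the coremltools Python package: the role names, the `COMPUTE_UNIT_FOR_ROLE` table and `load_model_for` are hypothetical, while `ComputeUnit.CPU_AND_NE` and `ComputeUnit.CPU_AND_GPU` are real coremltools values.

```python
# Hypothetical role -> CoreML compute-unit table (not Fregata's actual code).
COMPUTE_UNIT_FOR_ROLE = {
    "detector": "CPU_AND_NE",     # Neural Engine, with CPU for unsupported ops
    "enrichment": "CPU_AND_GPU",  # keeps enrichment work off the Neural Engine
}


def load_model_for(role: str, model_path: str):
    """Load a CoreML model restricted to the compute units for `role`."""
    # Imported lazily: coremltools can only run predictions on macOS.
    import coremltools as ct

    unit = getattr(ct.ComputeUnit, COMPUTE_UNIT_FOR_ROLE[role])
    return ct.models.MLModel(model_path, compute_units=unit)
```

CoreML treats the compute-unit setting as a set of allowed devices, not a guarantee: with `CPU_AND_NE`, layers the Neural Engine cannot run are placed on the CPU, which is one way a model can end up on the CPU tier without any error.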

How to tell what’s actually running

After the model loads, Fregata times a warmup inference and classifies the result. You see it on the Detector row in the menu-bar tray, and as a line in the log. Three tiers:

~1–4 ms per frame — detection is on the Apple Neural Engine. This is the correct, intended path.
5–15 ms per frame — detection is on the GPU (Metal). Still real-time across many cameras; about 5–10× the power draw of the ANE. Happens when the model uses operations the ANE doesn’t support, or when inference_backend: gpu is set in config.
30+ ms per frame — detection has fallen back to the CPU. Not viable for a real install. The log line includes a warning with a pointer to this page.

The web UI’s System tab graphs inference time over the last hour, so you can confirm the warmup tier holds in steady state.
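
The warmup check can be sketched as below. This is a minimal illustration, not Fregata’s implementation: `time_warmup_ms` and `classify_tier` are hypothetical names, the sketch takes the median of a few runs where Fregata is described as timing a warmup inference, and because the page does not say how latencies between the bands (4–5 ms, 15–30 ms) are treated, the sketch assigns them to the slower neighbouring tier.

```python
import statistics
import time
from typing import Callable


def time_warmup_ms(run_once: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of `run_once` in milliseconds."""
    run_once()  # discard the first call, which may include one-time setup
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def classify_tier(ms: float) -> str:
    """Map a per-frame latency to one of the three tiers listed above."""
    if ms <= 4.0:
        return "ane"
    if ms <= 15.0:
        return "gpu"
    return "cpu"  # assumption: anything slower than the GPU band counts as CPU


# Stand-in workload; a real check would call the loaded model here.
tier = classify_tier(time_warmup_ms(lambda: sum(range(1000))))
print(tier)
```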

When a custom model lands on the CPU

If you’ve swapped the bundled model for your own ONNX export (see Bringing your own model), the most common failure mode is the warmup landing on the CPU tier. Three things to try, in order:

Re-export with a fixed input shape. The ANE strongly prefers fixed 1×3×H×W tensors (batch, channels, height, width). Dynamic shapes typically push the whole graph onto the GPU or CPU.
Lower the ONNX opset. Newer exporters default to opset 18+, which can include operations CoreML hasn’t fully lowered yet. Re-export with opset_version=17 (or 16) and try again.
Switch to FP16 weights. Most YOLO exports default to FP32. The ANE prefers FP16 or INT8; FP32 models often run on GPU but not ANE.
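
As a rough illustration of the first two steps, here is how a fixed-shape, opset-17 export might look with PyTorch’s `torch.onnx.export`. It assumes the model is a PyTorch module; the helper names and the tensor names `images` and `output` are hypothetical, and the FP16 step is left out because how it is done depends on the export toolchain. The dummy input is laid out as batch, channels, height, width.

```python
from typing import Any, Dict


def fixed_shape_export_kwargs(opset: int = 17) -> Dict[str, Any]:
    """Keyword arguments for torch.onnx.export that avoid dynamic shapes."""
    return {
        "opset_version": opset,
        "input_names": ["images"],   # hypothetical tensor name
        "output_names": ["output"],  # hypothetical tensor name
        "dynamic_axes": None,        # no dynamic axes: every dimension is fixed
    }


def export_fixed_onnx(model, path: str, height: int = 320, width: int = 320) -> None:
    """Export `model` to ONNX with a fixed 1x3xHxW input."""
    # Imported lazily so the helper above can be used without PyTorch installed.
    import torch

    model.eval()
    dummy = torch.zeros(1, 3, height, width)  # batch, channels, height, width
    torch.onnx.export(model, (dummy,), path, **fixed_shape_export_kwargs())


# Usage, assuming `my_yolo` is an already-loaded PyTorch module:
#     export_fixed_onnx(my_yolo, "custom-320.onnx")
```

Leaving `dynamic_axes` unset is what bakes the 1×3×320×320 shape into the graph; passing a `dynamic_axes` mapping would reintroduce the dynamic shapes the first step is meant to remove.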

If you’ve done all three and the warmup still lands on the CPU, the model most likely uses an operation the ANE doesn’t support. Set inference_backend: gpu to force the GPU path. The GPU is slower than the ANE and draws more power, but is still faster than most other Frigate setups. Troubleshooting covers the specific log-line shapes.

How this compares to other Frigate detectors

Frigate-on-Linux supports a long list of detector backends (edgetpu, tensorrt, openvino, rknn, hailo8l, etc.). None of the accelerators behind them is a practical option on a Mac: Apple Silicon has no NVIDIA GPU or Intel iGPU, and outside the Mac Pro there is no PCIe slot for a Hailo or M.2 Coral card. The coreml detector covered here is the only first-party hardware-accelerated path on a Mac.

For the YOLOv9-tiny model at 320×320:

Frigate detector	Hardware	Per-frame inference
`edgetpu` (Coral)	USB / M.2 Coral TPU	~6–10 ms
`tensorrt` (NVIDIA)	discrete GPU	1–3 ms
`openvino` (Intel)	iGPU / NPU	5–15 ms
`cpu` (anywhere)	x86 / ARM CPU	30–100 ms
`coreml` (Fregata)	Apple Neural Engine	1–4 ms

Fregata’s ANE path is competitive with the power-hungry discrete-GPU TensorRT path and beats every other detector option in the table on per-frame inference time, all in a cheap, energy-efficient Mac mini. That’s the specific niche Fregata is built for.

For the wider Fregata-vs-Frigate-vs-Linux picture, see Fregata vs Frigate and the enrichment-model breakdown at AI Models for Frigate Enrichments.