How Apple VisionKit Works: On-Device Document Scanning Explained
A technical look at VNDocumentCameraViewController, the Neural Engine, and why Apple's approach keeps your documents private.
Apple's VisionKit framework is the technology behind document scanning on iOS. What makes it special? Every scan, every OCR request, every AI operation happens entirely on your iPhone—no cloud servers involved.
Here's a technical look at how VisionKit keeps your documents private.
What is VisionKit?
VisionKit is Apple's framework for computer vision tasks. First introduced in iOS 13, it provides developers with tools for:
- Document scanning
- Text recognition (OCR)
- Barcode scanning
- Data scanning (iOS 16+)
The key difference from other scanning solutions: VisionKit runs entirely on-device using Apple's Neural Engine.
Key Components
VNDocumentCameraViewController
This is the scanner UI you see when scanning documents. According to Apple's documentation, it provides:
- Automatic edge detection
- Perspective correction
- Shadow removal
- Multi-page capture
The camera uses machine learning to detect document boundaries in real time—all processed locally.
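Presenting the scanner takes only a few lines. The sketch below shows the standard delegate flow; `ScanViewController` and its method names are illustrative, but the VisionKit calls are the real API:

```swift
import UIKit
import VisionKit

final class ScanViewController: UIViewController, VNDocumentCameraViewControllerDelegate {

    // Present the system scanner UI. Check availability first: the document
    // camera is unsupported on some devices and in the simulator.
    func startScanning() {
        guard VNDocumentCameraViewController.isSupported else { return }
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        present(scanner, animated: true)
    }

    // Called when the user taps Save; the scan object holds each captured,
    // perspective-corrected page as a UIImage.
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        for pageIndex in 0..<scan.pageCount {
            let pageImage = scan.imageOfPage(at: pageIndex)
            // Hand pageImage to OCR or save it locally—nothing leaves the device.
            _ = pageImage
        }
        controller.dismiss(animated: true)
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }
}
```

Edge detection, perspective correction, and shadow removal all happen inside the system view controller before the delegate receives the pages.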
Vision Framework (VNRecognizeTextRequest)
The Vision framework handles OCR (text recognition). The VNRecognizeTextRequest class:
- Runs on-device, using the Neural Engine where available
- Supports 20+ languages
- Reports confidence scores and bounding boxes for recognized text
- Works completely offline
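A minimal OCR pass over a scanned page looks like this. The function name is illustrative; the Vision calls are the framework's actual API:

```swift
import UIKit
import Vision

// Run offline OCR on a scanned page. recognitionLevel .accurate uses the
// slower, higher-quality model; .fast trades accuracy for latency.
func recognizeText(in image: UIImage, completion: @escaping ([String]) -> Void) {
    guard let cgImage = image.cgImage else { return completion([]) }

    let request = VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        // Each observation offers ranked candidates; take the top one per line.
        let lines = observations.compactMap { $0.topCandidates(1).first?.string }
        completion(lines)
    }
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```

No network call appears anywhere in this pipeline—the request, the model, and the results all stay in the process.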
Neural Engine
Apple's Neural Engine is dedicated hardware for machine learning. Starting with the A11 Bionic chip (iPhone 8 and iPhone X), every iPhone has included ML-specific processing hardware.
Current Neural Engines can perform:
- A15: 15.8 trillion operations per second
- A16: about 17 trillion operations per second
- A17 Pro: 35 trillion operations per second
This hardware acceleration makes on-device processing fast enough to compete with cloud solutions.
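VisionKit and Vision choose the execution hardware automatically, but apps that ship their own Core ML models can state a preference. A short sketch (`MyModel` is a placeholder for any compiled Core ML model class):

```swift
import CoreML

// .all permits the Neural Engine, GPU, and CPU; .cpuOnly forces the CPU,
// which can be useful when debugging numerical differences.
let config = MLModelConfiguration()
config.computeUnits = .all
// let model = try MyModel(configuration: config)  // "MyModel" is hypothetical
```

Either way, inference stays on the device—the compute-units setting only decides which local silicon does the work.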
Why On-Device Matters for Privacy
When VisionKit processes a document:
- Camera captures image → stays on device
- ML detects document edges → processed on Neural Engine
- OCR extracts text → processed on Neural Engine
- Result returned → stored locally
At no point does the document leave your iPhone. Compare this to cloud-based scanners where:
- Camera captures image
- Image uploaded to company servers
- Server processes document
- Result sent back
- Copy potentially stored on server
How Apps Use VisionKit
Apps like ScanDash use VisionKit APIs to provide scanning without cloud dependencies:
- Scanning: VNDocumentCameraViewController
- Text extraction: VNRecognizeTextRequest
- Data extraction: Using Vision for pattern matching
The app never needs internet connectivity for these features because the frameworks run locally.
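For the pattern-matching step, one common approach is to run Foundation's `NSDataDetector` over the OCR output to pull out dates, phone numbers, and links. This is a generic sketch of that technique, not a description of any specific app's internals:

```swift
import Foundation

// Scan recognized text for structured data. The checking types can be
// narrowed to whatever the app actually needs.
func extractData(from text: String) -> [String] {
    let types: NSTextCheckingResult.CheckingType = [.date, .phoneNumber, .link]
    guard let detector = try? NSDataDetector(types: types.rawValue) else { return [] }

    let range = NSRange(text.startIndex..., in: text)
    return detector.matches(in: text, options: [], range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}
```

Like the OCR step, the detector runs entirely in-process, so extracted dates and numbers never touch a server.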
Recent Improvements
iOS 16: DataScannerViewController
iOS 16 added DataScannerViewController for live camera text recognition. This enables:
- Real-time OCR in the viewfinder
- Automatic data type detection (dates, amounts, phone numbers)
- Interactive text selection
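Setting up the live scanner is similarly compact. A minimal sketch using the real initializer (the wrapper function is illustrative):

```swift
import VisionKit

// Live text and barcode scanning in the camera feed (iOS 16+).
@MainActor
func makeDataScanner() -> DataScannerViewController? {
    // Support depends on the device; availability depends on camera permission.
    guard DataScannerViewController.isSupported,
          DataScannerViewController.isAvailable else { return nil }

    return DataScannerViewController(
        recognizedDataTypes: [.text(), .barcode()],
        qualityLevel: .balanced,
        recognizesMultipleItems: true,
        isHighlightingEnabled: true
    )
}
// After presenting the controller, call `try startScanning()` on it.
```

Passing `.text(textContentType:)` variants lets the scanner target specific data types such as dates or phone numbers directly in the viewfinder.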
iOS 17: Enhanced Detection
iOS 17 improved document detection accuracy and added support for more document types.
Limitations
On-device processing has some constraints:
- Device age: Older devices have slower Neural Engines
- Model size: On-device models are smaller than cloud models
- Language support: Some languages may have lower accuracy
That said, Apple's on-device ML has improved dramatically. According to Apple's WWDC presentations, results are "very accurate most of the time, despite using on-device machine learning."
The Bottom Line
VisionKit enables document scanning that's both powerful and private. By running everything on the Neural Engine, apps using VisionKit can offer features comparable to cloud solutions—without ever uploading your documents to someone else's server.
Try ScanDash Free
The document scanner that never sees your data. 100% on-device processing.
Download for iPhone