Overview
I built an iOS app that performs real-time OCR through Meta Ray-Ban smart glasses, with AI text simplification designed for dyslexia support. Using the Meta DAT SDK, the app streams camera frames from the glasses, runs on-device text recognition, and delivers simplified audio through parallel local and cloud processing paths.
The Challenge
Students with dyslexia need instant, seamless reading help — not another app to fiddle with. The solution had to:
- Work through smart glasses with minimal latency for natural reading flow
- Process text on-device to keep response times under 100ms
- Support offline use for school environments with limited connectivity
- Meet COPPA and FERPA compliance for use with minors in educational settings
- Provide audio-guided navigation accessible to users with reading difficulties
What I Built
1. Real-Time OCR Pipeline
A high-performance text recognition system optimized for smart glasses:
- Apple Vision on-device OCR — Sub-100ms text recognition without network dependency
- Frame selection optimization — filters the incoming stream so OCR runs only on high-quality frames rather than on every frame
- 24fps streaming — Real-time camera feed from Meta Ray-Ban glasses via DAT SDK
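The frame-selection step can be sketched as a simple gate in front of OCR. The sharpness score and thresholds below are illustrative stand-ins for whatever per-frame quality metrics the capture pipeline actually computes:

```swift
import Foundation

// Frame-selection gate: only pass frames worth spending OCR time on.
// A frame is accepted when it is sharp enough and enough time has passed
// since the last accepted frame (avoids re-OCRing near-duplicate frames).
struct FrameGate {
    let sharpnessThreshold: Double
    let minInterval: TimeInterval
    private var lastAccepted: TimeInterval = -.infinity

    init(sharpnessThreshold: Double, minInterval: TimeInterval) {
        self.sharpnessThreshold = sharpnessThreshold
        self.minInterval = minInterval
    }

    mutating func shouldProcess(sharpness: Double, timestamp: TimeInterval) -> Bool {
        guard sharpness >= sharpnessThreshold,
              timestamp - lastAccepted >= minInterval else { return false }
        lastAccepted = timestamp
        return true
    }
}
```

At 24fps a new frame arrives roughly every 42ms; with a 0.25s minimum interval, OCR runs at most four times per second regardless of the stream rate, which keeps the recognition path well inside the latency budget.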
2. Dual-Path Audio System
Two parallel audio processing paths to balance speed and quality:
- Instant local TTS — Sub-400ms response using on-device text-to-speech
- Enhanced cloud processing — AI-simplified text with ElevenLabs high-quality voice synthesis
- Seamless handoff — the local path plays immediately while cloud processing completes in the background, so the enhanced audio takes over without a silent wait
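A condensed sketch of the dual-path handoff. `CloudTTSClient` is a hypothetical wrapper around the simplification call and ElevenLabs synthesis; the real pipeline has more states (e.g. cancelling stale requests when new text arrives):

```swift
import AVFoundation

// Hypothetical wrapper around AI simplification + ElevenLabs synthesis;
// returns a local file URL for the rendered audio.
protocol CloudTTSClient {
    func simplifiedAudio(for text: String) async throws -> URL
}

final class DualPathSpeaker {
    private let synthesizer = AVSpeechSynthesizer()
    private let player = AVQueuePlayer()
    private let cloud: CloudTTSClient

    init(cloud: CloudTTSClient) { self.cloud = cloud }

    func speak(_ text: String) {
        // Path 1: on-device voice starts within ~400ms.
        synthesizer.speak(AVSpeechUtterance(string: text))

        // Path 2: enhanced audio takes over once the cloud round-trip finishes.
        Task { [weak self] in
            guard let self,
                  let url = try? await self.cloud.simplifiedAudio(for: text) else { return }
            self.synthesizer.stopSpeaking(at: .word)
            self.player.removeAllItems()
            self.player.insert(AVPlayerItem(url: url), after: nil)
            self.player.play()
        }
    }
}
```

Stopping the local voice at a word boundary (`.word`) rather than mid-syllable keeps the switchover from sounding like a glitch.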
3. Gamified Learning Experience
Engagement features designed to make reading practice rewarding:
- XP system with levels and achievement tracking
- Reading streaks to encourage daily practice
- Character-driven practice sessions
- Vocabulary building with personalized word lists
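To give a flavor of the progression logic, here is a minimal sketch of the XP/level mechanic. The quadratic thresholds are illustrative, not the app's actual tuning:

```swift
// Illustrative XP/level curve: reaching level n+1 requires 100 * n^2
// total XP, so each level costs progressively more than the last.
struct Progress {
    var totalXP: Int = 0

    var level: Int {
        // Largest n with 100 * n * n <= totalXP, plus 1 (levels start at 1).
        var n = 0
        while 100 * (n + 1) * (n + 1) <= totalXP { n += 1 }
        return n + 1
    }

    var xpToNextLevel: Int {
        // Next threshold is 100 * level^2 total XP.
        100 * level * level - totalXP
    }

    mutating func award(_ xp: Int) { totalXP += xp }
}
```

A fresh profile starts at level 1 needing 100 XP; level 3 requires 400 XP total, level 4 requires 900, and so on.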
Technical Architecture
Built as a modular iOS application with 14 feature modules:
- Device Layer: Meta DAT SDK integration for glasses camera streaming and control
- Vision Layer: Apple Vision framework for on-device OCR processing
- Audio Layer: Dual-path system with local AVSpeechSynthesizer and cloud ElevenLabs TTS
- Backend: Firebase for user data, progress tracking, and analytics
- Offline-First: SwiftData for local persistence with background sync
- AI Services: Google Gemini and OpenAI for text simplification and comprehension
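The offline-first piece hinges on queuing mutations locally and draining them when connectivity returns. A minimal in-memory sketch of that flow (in the app, the pending queue is backed by SwiftData so it survives restarts):

```swift
// Offline-first sync sketch: writes are recorded locally and uploaded
// later; anything that fails to upload stays queued for the next pass.
struct SyncQueue {
    private(set) var pending: [String] = []   // serialized mutations, in order

    init() {}

    mutating func record(_ mutation: String) {
        pending.append(mutation)
    }

    // Attempt every pending mutation; keep only the ones that failed,
    // preserving order so retries replay in the original sequence.
    mutating func drain(upload: (String) -> Bool) {
        var failed: [String] = []
        for m in pending where !upload(m) { failed.append(m) }
        pending = failed
    }
}
```

Because failures stay queued in order, a classroom device that loses Wi-Fi mid-session simply replays its progress updates the next time sync runs.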
Security & Quality
Privacy and accessibility are foundational requirements for an education tool targeting minors:
- COPPA/FERPA compliant — No personal data collected without consent; parental controls built in
- Camera privacy — Frames are processed on-device and never persisted to storage or cloud
- Offline-first architecture — Core reading features work without internet connectivity
- Audio-guided navigation — Full app usability without reading the screen
Outcome
- Production-ready iOS application, tested on physical Meta Ray-Ban glasses and in the simulator
- Sub-100ms on-device OCR with dual-path audio for instant and enhanced responses
- 14 feature modules with gamified learning experience
- Full Statement of Work delivered with COPPA/FERPA compliance documentation