AutoDrive - GenEdu

Overview

I designed and built a multi-tenant document ingestion and RAG system on AWS. The platform handles automatic classification, deduplication, review workflows, and privacy-safe analytics, all accessible through a 12-page web interface built with SvelteKit.

The Challenge

An enterprise needed automated document processing at scale. The system had to handle:

Automatic document classification across 10+ document types
Intelligent deduplication to prevent redundant storage and processing
Multi-tenant data isolation with strict access controls
Cost visibility and tracking per tenant
Full auditability of every document action and system decision

What I Built

1. Intelligent Document Pipeline

A complete ingestion-to-storage pipeline that processes documents through multiple stages:

Ingestion — Multi-format upload with automatic type detection
Parsing — Text extraction via Amazon Textract with OCR fallback
Classification — AI-powered document categorization with confidence scoring
Canonical Storage — Organized S3 storage with multi-tenant isolation
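
As an illustration of the ingestion stage's automatic type detection, here is a minimal sketch that sniffs a file's leading "magic" bytes. The write-up does not say how detection was actually implemented, so the function name `detect_type`, the label strings, and the choice of magic-byte sniffing are all assumptions made for this example.

```python
# Hypothetical sketch: detect an upload's format from its leading bytes.
# The signatures are the standard file headers for each format; the
# labels and function name are illustrative, not taken from the project.

MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),  # DOCX and XLSX are ZIP containers too
]


def detect_type(data: bytes) -> str:
    """Return a coarse format label for the uploaded bytes."""
    if not data:
        return "unknown"
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    # No binary signature matched: treat valid UTF-8 as plain text.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "text"
```

A detector like this would run before parsing, so that image uploads can be sent to OCR while text-based formats take a cheaper extraction path.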

2. RAG Chat System

A retrieval-augmented generation (RAG) system that pairs hybrid retrieval with Claude-generated answers. Features include:

Hybrid search combining keyword matching and semantic retrieval
Claude-generated answers with source citations
Streaming SSE responses for real-time chat experience
Context-aware conversation management
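
One common way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below shows that technique under stated assumptions: the write-up does not say how the two result sets were combined, and the function name, the constant `k=60`, and the sample document IDs are all illustrative.

```python
# Hypothetical sketch: reciprocal rank fusion over two ranked result lists.
# RRF is a stand-in here; the project's actual merge strategy is not stated.
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]    # e.g. from keyword matching
semantic_hits = ["b", "d", "a"]   # e.g. from embedding similarity
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

In this example `fused` places "b" and "a" first because both retrievers returned them. The top fused results would then be passed to the model as context and cited in the answer.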

3. Enterprise Web Interface

A comprehensive 12-page SvelteKit application providing full operational control:

Dashboard with system health and processing metrics
Chat interface for RAG-powered document querying
Document browser with filtering and search
Cost tracking dashboard per tenant
Audit log viewer with full action history
Review queue for low-confidence classifications

Technical Architecture

The system is built on a serverless-first AWS architecture designed for scale and cost efficiency:

Storage: S3 multi-tenant layout with per-tenant prefixes and lifecycle policies
Database: DynamoDB single-table design for fast access patterns
Processing: SQS async message queues for decoupled document processing
OCR: Amazon Textract with intelligent fallback for complex documents
AI: Amazon Bedrock with Claude for classification and RAG responses
API: FastAPI backend with async handlers and streaming support
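
To make the per-tenant S3 prefixes and the DynamoDB single-table design more concrete, here is a minimal sketch of how keys might be built. Every name in it is an assumption: the `tenants/` prefix, the `PK`/`SK` attribute names, and the `TENANT#`, `DOC#`, and `AUDIT#` key formats are illustrative conventions, not the project's actual schema.

```python
# Hypothetical sketch: tenant-scoped S3 object keys and single-table
# DynamoDB keys. All formats below are assumed for illustration.


def _check_id(value: str) -> str:
    """Reject IDs that could escape a tenant prefix or break key parsing."""
    if not value or "/" in value or "#" in value:
        raise ValueError(f"invalid identifier: {value!r}")
    return value


def s3_object_key(tenant_id: str, doc_type: str, doc_id: str) -> str:
    """Place every object under its tenant's prefix."""
    return (
        f"tenants/{_check_id(tenant_id)}/"
        f"{_check_id(doc_type)}/{_check_id(doc_id)}"
    )


def document_keys(tenant_id: str, doc_id: str) -> dict:
    """Primary key for a document item: one partition per tenant."""
    return {
        "PK": f"TENANT#{_check_id(tenant_id)}",
        "SK": f"DOC#{_check_id(doc_id)}",
    }


def audit_keys(tenant_id: str, timestamp_iso: str, doc_id: str) -> dict:
    """Audit items share the tenant partition and sort by timestamp."""
    return {
        "PK": f"TENANT#{_check_id(tenant_id)}",
        "SK": f"AUDIT#{timestamp_iso}#{_check_id(doc_id)}",
    }
```

With a layout like this, a single query on a tenant's partition with a sort-key prefix of `DOC#` or `AUDIT#` returns only that tenant's documents or audit entries, and IAM or lifecycle policies can be scoped to the matching S3 prefix.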

Security & Quality

Enterprise-grade security and data integrity are built into every layer:

SHA-256 deduplication — Content-based hashing prevents redundant storage
Multi-tenant isolation — Strict data boundaries with per-tenant access controls
Privacy-safe analytics — No raw text stored in analytics; only metadata and aggregates
Audit logging — Every document action recorded with timestamps and actor IDs
Confidence scoring — Low-confidence classifications routed to human review workflow
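
The deduplication and confidence-routing controls above can be sketched as follows. This is a simplified in-memory illustration, not the production implementation: the `DedupIndex` class, the per-tenant scoping of hashes, the 0.80 threshold, and the route labels are all assumptions made for the example.

```python
# Hypothetical sketch: SHA-256 content deduplication plus routing of
# low-confidence classifications to review. Names and the threshold
# are illustrative; the real system would persist hashes in a database.
import hashlib

REVIEW_THRESHOLD = 0.80  # assumed cut-off, not taken from the project


class DedupIndex:
    """Tracks content hashes per tenant so isolation is preserved."""

    def __init__(self):
        self._seen = {}  # (tenant_id, sha256 hex digest) -> doc_id

    def register(self, tenant_id: str, doc_id: str, content: bytes):
        """Return (canonical_doc_id, is_duplicate) for the upload."""
        digest = hashlib.sha256(content).hexdigest()
        key = (tenant_id, digest)
        existing = self._seen.get(key)
        if existing is not None:
            return existing, True
        self._seen[key] = doc_id
        return doc_id, False


def route_classification(confidence: float,
                         threshold: float = REVIEW_THRESHOLD) -> str:
    """Send classifications below the threshold to human review."""
    return "auto_accept" if confidence >= threshold else "human_review"
```

Scoping the hash lookup by tenant means two tenants uploading identical files are never linked to each other's records, at the cost of storing such files once per tenant.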

Outcome

97% feature-complete with production-ready multi-tenant architecture
85%+ classification accuracy across 10+ document types
Full RAG pipeline with streaming responses and source citations
12-page enterprise web interface for complete operational control
