JM.

Automated Invoice Processing System

Full Stack / Backend FocusFlask · Celery · MariaDB · Redis · OCR · OpenAI · Next.jsActive
ocraiflaskceleryredisopenapi
GitHub ↗

OCR + AI extraction pipeline to structure data from scanned invoices.


This project started after repeatedly seeing small teams waste hours manually re-typing invoice data. The idea: treat invoices like streaming events—ingest, extract, validate, confirm. I chose a decoupled architecture: Flask API handles fast request/response; Celery workers perform OCR (Tesseract) and structured field extraction using OpenAI; Redis acts as broker + ephemeral cache; MariaDB stores canonical invoice states and transitions. WebSockets broadcast status changes so users get immediate feedback. The challenge was balancing determinism (for auditing) with probabilistic extraction (LLM). The solution: store raw OCR, model prompts, and extracted fields with version markers so improvements never silently mutate historical data. OpenAPI-first design kept clients stable while internals evolved. Result: a pipeline that shortens validation loops and creates a foundation for analytics (processing latency, field accuracy, retry rates).

Goal

Automate ingestion, normalization and validation of invoices with async processing & real‑time updates.

Back to projects