Intelligent OCR & Computer Vision Pipeline

May 15, 2024 · 4 min read

A serverless Azure Function that uses LLM vision models to extract, categorize, and normalize data from customer receipt images — replacing manual extraction with 85–90% accuracy and reducing analyst workload by 70%.

Project Overview

The company needed to validate customer receipts (tickets) from various merchants as part of its credit-line evaluation process. The challenge was that every ticket has a different structure — different layouts, fonts, fields, and formats — making traditional OCR or rule-based extraction unreliable. Analysts were extracting information manually, which was slow, error-prone, and couldn’t scale.

With the emergence of LLM vision capabilities, the project leveraged Azure OpenAI to intelligently read receipt images, categorize the merchant, extract the relevant fields, and normalize the output into structured JSON — all within a single serverless function call.

The structured output is then forwarded to an external credit-line calculator and persisted in CosmosDB as part of the customer’s evaluation profile.


Architecture

High-Level Flow

graph TD A["🏢 Internal Services\n(Client Application)"] -->|"Up to 3 ticket images"| B["⚡ Azure Function"] B --> C["🤖 Azure OpenAI\n(Vision Model)"] C -->|"Structured JSON\nper ticket"| B B --> D["🧮 External\nCredit-Line Calculator"] D -->|"Credit result"| B B --> E[("🗄️ CosmosDB\n(Customer Summary)")] B -->|"Final result"| A

Component Breakdown

  • Internal Services — The client-facing applications send up to 3 receipt images per request along with the customer context to trigger the extraction process.
  • Azure Function — The core serverless compute unit that orchestrates the entire flow: receives images, calls the vision model, processes results, forwards to the calculator, assembles the customer summary, and persists everything.
  • Azure OpenAI (Vision Model) — Processes each receipt image and returns structured data. The model handles merchant identification, field extraction, and data normalization in a single inference call. Hosted on Azure to ensure customer data confidentiality never leaves the company’s cloud boundary.
  • External Credit-Line Calculator — A separate service that receives the normalized ticket data and computes the credit-line recommendation.
  • CosmosDB — Stores the assembled customer summary including extracted ticket data, calculator results, and metadata for downstream consumption.

LLM-Powered Extraction Process

The vision model performs three tasks in a single prompt:

1. Merchant Categorization

Determines whether the ticket belongs to a valid merchant category and classifies it accordingly. Invalid or unrecognizable tickets are flagged and excluded from the credit calculation.

2. Field Extraction

Extracts key information from each receipt regardless of its layout:

  • Merchant name and category
  • Date and time of purchase
  • Individual line items and amounts
  • Total amount
  • Payment method (when visible)

3. Data Normalization

The raw extracted data is normalized into a consistent JSON schema — standardizing date formats, currency values, merchant categories, and field names — so that every ticket, regardless of its original format, produces a uniform output for the calculator.


Processing Pipeline

Within the Azure Function, the flow after LLM extraction follows a straightforward pipeline:

StepAction
1. ReceiveAccept up to 3 ticket images from the requesting service
2. ExtractSend each image to Azure OpenAI vision model, receive structured JSON per ticket
3. ValidateVerify extracted fields for completeness and merchant validity
4. CalculateForward normalized data to the external credit-line calculator
5. AssembleBuild a customer summary combining ticket data, calculator output, and metadata
6. PersistSave the customer summary to CosmosDB
7. RespondReturn the final result to the calling service

Validation & Accuracy

To ensure the model was performing reliably before full production rollout, a manual validation process was conducted:

  • The call center team was asked to manually extract the same receipt information using their existing process.
  • The manual extractions were compared against the LLM-generated outputs to measure accuracy across categories, amounts, dates, and merchant identification.
  • This parallel validation confirmed the model’s production readiness.

Production Results

MetricResult
Content & category accuracy85–90%
Workload reduction~70% less manual extraction effort
Processing timeNear real-time (seconds per request vs. minutes manually)

Tech Stack

LayerTechnology
ComputeAzure Functions (Serverless)
AI / VisionAzure OpenAI (GPT Vision Model)
StorageAzure CosmosDB
IntegrationExternal Credit-Line Calculator API
LanguagePython

Conclusion

This project proved that LLM vision models can effectively replace manual data extraction from unstructured document images, even when formats vary significantly across sources. By keeping the architecture simple — a single Azure Function orchestrating the vision model and downstream services — the solution achieved fast deployment, low operational cost, and a measurable 70% reduction in analyst workload. The 85–90% accuracy rate validated the approach for production use, with the parallel manual validation providing confidence in the model’s reliability.