Audience
Enterprise AI and data teams that need multilingual document extraction, structured OCR, RAG ingestion, and self-hostable document intelligence for sensitive workflows
About Mistral OCR 4
Mistral OCR 4 is a document extraction and understanding model built for enterprise search, RAG, domain-specific retrieval pipelines, and production-grade document intelligence. It extracts and structures content from a wide range of documents, moving beyond clean text and tables to return a structured representation of each page. Alongside extracted text, OCR 4 provides bounding boxes, typed-block classification, and inline confidence scores, helping downstream systems understand not only what the document says, but where each element sits, what role it plays, and how confident the model is in each region. Bounding boxes make in-context highlighting and reliable data pipelines possible, while block types and confidence scores support source-grounded citations, redactions, and human-in-the-loop verification. OCR 4 accepts common enterprise formats, including PDF, DOC, PPT, and OpenDocument, and supports 170 languages across 10 language groups.
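To make the role of bounding boxes, block types, and confidence scores concrete, here is a minimal sketch of how a downstream pipeline might consume this kind of output. It is not the OCR 4 response schema. The `Block` class, its field names, the block-type labels, the coordinate convention, the per-block 0 to 1 confidence scale, and the 0.85 threshold are all assumptions made for illustration, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical shape of one extracted region (not the real OCR 4 schema)."""
    page: int
    block_type: str                           # assumed labels, e.g. "heading", "paragraph", "table"
    text: str
    bbox: Tuple[float, float, float, float]   # assumed (x0, y0, x1, y1) page coordinates
    confidence: float                         # assumed to lie in the range 0.0 to 1.0


def split_for_review(blocks: List[Block], threshold: float = 0.85):
    """Separate blocks that can be ingested directly from those a human should check."""
    accepted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return accepted, needs_review


def cite(block: Block) -> str:
    """Build a source-grounded citation string from the page, block type, and bounding box."""
    x0, y0, x1, y1 = block.bbox
    return f"p.{block.page} [{block.block_type}] @ ({x0:.0f}, {y0:.0f}, {x1:.0f}, {y1:.0f})"


# Invented sample data, standing in for parsed model output.
sample_blocks = [
    Block(1, "heading", "Master Services Agreement", (72.0, 60.0, 540.0, 90.0), 0.99),
    Block(1, "paragraph", "This agreement is entered into by the parties.", (72.0, 110.0, 540.0, 180.0), 0.95),
    Block(2, "table", "Fee schedule", (72.0, 140.0, 540.0, 320.0), 0.62),
]

accepted, needs_review = split_for_review(sample_blocks)

for block in accepted:
    print("ingest:", cite(block))
for block in needs_review:
    print("review:", cite(block), f"(confidence {block.confidence:.2f})")
```

In a sketch like this, the accepted blocks would flow into RAG ingestion with their citation strings attached, while the low-confidence table would be queued for human verification, with its bounding box used to highlight the region on the page.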
