RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that integrates deep document understanding with agentic workflow capabilities. It is engineered to transform complex, unstructured data into high-fidelity, production-ready AI systems by combining intelligent document parsing, embedding, and retrieval with configurable LLM-powered agents and workflows. README.md76-78
Core Characteristics: RAGFlow specializes in parsing complex document formats such as PDF, DOCX, Excel, and PPT, using deep learning models for document layout analysis and table extraction. It segments documents into meaningful chunks using multiple template-driven chunking strategies, generates embeddings for these chunks, and stores them in a hybrid index that supports both vector and keyword search. It supports conversational AI applications whose answers are backed by traceable, grounded citations. RAGFlow also includes a visual agent workflow system called Canvas for building multi-step AI agents with persistent memory. README.md112-122 pyproject.toml4
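As a rough illustration of template-driven chunking, the sketch below packs whole sentences greedily into chunks under a per-template token budget. The template names ("general", "compact"), their budgets, the regex sentence splitter, and the whitespace tokenizer are all hypothetical simplifications; RAGFlow's real chunkers work on layout-aware parser output and use their own tokenization.

```python
import re
from typing import Dict, List

# Hypothetical templates: each maps a template name to a per-chunk token budget.
TEMPLATES: Dict[str, int] = {"general": 128, "compact": 8}


def split_sentences(text: str) -> List[str]:
    """Split after sentence-ending punctuation (or on newlines), dropping empty pieces."""
    return [s.strip() for s in re.split(r"(?<=[.!?;])\s+|\n+", text) if s.strip()]


def chunk_text(text: str, max_tokens: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens whitespace tokens.

    A single sentence longer than max_tokens becomes a chunk on its own.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_with_template(text: str, template: str = "general") -> List[str]:
    """Look up the (hypothetical) template's token budget and chunk the text with it."""
    return chunk_text(text, max_tokens=TEMPLATES[template])
```

Packing whole sentences, rather than cutting at a fixed character offset, keeps each chunk readable when it is later quoted back to the user as a citation.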
Key Value Statement: "Quality in, quality out." The system prioritizes deep document understanding so that it can reliably find relevant information even in corpora containing a virtually unlimited number of tokens. README.md114-118
Sources: README.md76-122 pyproject.toml4
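To make the idea of a hybrid vector and keyword index concrete, the following is a minimal sketch of score fusion: each chunk is ranked by a weighted sum of embedding cosine similarity and a keyword-match score. The 0.3 vector weight, the term-overlap score (standing in for a real full-text scorer such as BM25), and the toy two-dimensional embeddings are assumptions for illustration, not RAGFlow's retrieval code.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms present in the chunk (a stand-in for BM25)."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_rank(
    query: str,
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    vector_weight: float = 0.3,  # hypothetical weighting between the two signals
) -> List[Tuple[float, str]]:
    """Score each (text, embedding) chunk by a weighted sum of both signals, best first."""
    scored = [
        (vector_weight * cosine(query_vec, vec) + (1 - vector_weight) * keyword_score(query, text), text)
        for text, vec in chunks
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Combining the two signals lets exact terminology (part numbers, field names) and paraphrased meaning both contribute to the ranking, which neither index provides on its own.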
RAGFlow is built on a multi-tier microservices architecture that improves scalability by decoupling synchronous API operations from compute-intensive background processing. The architecture follows a producer-consumer pattern in which Redis Streams manage the task queues. docker/.env140-146
Sources: README.md141-145 docker/.env13-159 docker/docker-compose-base.yml1-230
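The producer-consumer pattern relies on Redis Streams consumer groups: a producer appends an entry, the group hands each entry to exactly one consumer, and the entry stays pending until that consumer acknowledges it. The sketch below is a hypothetical in-memory stand-in that mimics those XADD / XREADGROUP / XACK semantics so the flow can be run without a Redis server; the class, the task fields ("doc_id", "task"), and the worker names are illustrative only. Against a real server, redis-py exposes the corresponding `xadd`, `xreadgroup`, and `xack` methods.

```python
from typing import Dict, List, Tuple


class InMemoryStream:
    """Toy stand-in for one Redis stream with a single consumer group (hypothetical)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, dict]] = []  # (entry_id, fields) in append order
        self._next_index = 0                        # group cursor: next undelivered entry
        self.pending: Dict[str, str] = {}           # entry_id -> consumer holding it

    def xadd(self, fields: dict) -> str:
        """Append an entry and return its id, like XADD with an auto-generated id."""
        entry_id = f"{len(self._entries) + 1}-0"
        self._entries.append((entry_id, dict(fields)))
        return entry_id

    def xreadgroup(self, consumer: str, count: int = 1) -> List[Tuple[str, dict]]:
        """Deliver up to `count` never-delivered entries (like reading with id '>')."""
        batch = self._entries[self._next_index:self._next_index + count]
        self._next_index += len(batch)
        for entry_id, _ in batch:
            self.pending[entry_id] = consumer  # stays pending until acknowledged
        return batch

    def xack(self, entry_id: str) -> int:
        """Acknowledge an entry; returns 1 if it was pending, else 0."""
        return 1 if self.pending.pop(entry_id, None) is not None else 0


def produce(stream: InMemoryStream, doc_ids: List[str]) -> List[str]:
    """Producer side: enqueue one parse task per document."""
    return [stream.xadd({"doc_id": d, "task": "parse"}) for d in doc_ids]


def consume_once(stream: InMemoryStream, consumer: str, handled: List[Tuple[str, str]]) -> None:
    """Consumer side: take one task, 'process' it, then acknowledge it."""
    for entry_id, fields in stream.xreadgroup(consumer, count=1):
        handled.append((consumer, fields["doc_id"]))  # placeholder for real work
        stream.xack(entry_id)
```

Because unacknowledged entries remain in the pending list, a task claimed by a worker that crashes mid-job is not silently lost; it can be inspected and reassigned.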
RAGFlow is logically divided into three tiers:
Frontend/API Tier:
- Go server (cmd/server_main.go) for native components and advanced search handling. docker/.env161-162
- … (API_PROXY_SCHEME). docker/.env161-162

Asynchronous Task Tier:
- Background ingestion workers, running either the Python TaskExecutor or the Go-based Ingestor. .github/workflows/tests.yml132-139

Persistence Tier:
Sources: pyproject.toml9-169 docker/.env13-159 docker/docker-compose-base.yml1-230
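The split between the API tier and the asynchronous task tier can be sketched with asyncio: the API handler only records a "queued" status and enqueues the job, while a separate worker performs the slow work and updates the stored status. Everything here is hypothetical: the function names, the in-process asyncio.Queue (standing in for Redis Streams), and the plain dict (standing in for the persistence tier).

```python
import asyncio
import contextlib
import uuid
from typing import Dict, List, Tuple


async def api_submit(queue: asyncio.Queue, status: Dict[str, str], doc_name: str) -> str:
    """API tier: persist a 'queued' record, enqueue the task, return its id at once."""
    task_id = uuid.uuid4().hex
    status[task_id] = "queued"
    await queue.put((task_id, doc_name))
    return task_id


async def worker(queue: asyncio.Queue, status: Dict[str, str]) -> None:
    """Task tier: pull tasks forever and record progress in the shared status store."""
    while True:
        task_id, _doc_name = await queue.get()
        status[task_id] = "running"
        await asyncio.sleep(0.01)  # placeholder for parsing, chunking and embedding
        status[task_id] = "done"
        queue.task_done()


async def run_demo(doc_names: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Submit documents through the 'API', then wait for the worker to drain the queue."""
    queue: asyncio.Queue = asyncio.Queue()
    status: Dict[str, str] = {}  # stand-in for the persistence tier
    worker_task = asyncio.create_task(worker(queue, status))
    ids = [await api_submit(queue, status, name) for name in doc_names]
    await queue.join()  # only the demo waits; a real API would have returned already
    worker_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker_task
    return ids, status
```

A client would poll the stored status by task id rather than hold a request open, which is what lets the API stay responsive while large documents are processed.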
The major conceptual components of the system, described in terms of what they do, map to their main code locations as follows:
Sources: .github/workflows/tests.yml132-139 README.md93-101
| Service/Subsystem | Main File(s) | Description |
|---|---|---|
| Python REST API Server | api/ragflow_server.py | Quart-based async API server handling REST API and SDK requests. |
| Go Server | cmd/server_main.go | High-performance backend server layer with native components. |
| Task Executor Worker | rag/svr/task_executor.py | Python background task worker for document ingestion. |
| Go Ingestor | cmd/ingestor.go | Go-based ingestion worker for high-throughput processing. |
| MCP Server | mcp/server/server.py | Model Context Protocol server for agent communication. |
| Admin Server | cmd/admin_server.go | Go-based administration service. |
Sources: .github/workflows/tests.yml132-139 docker/.env156-159 pyproject.toml68
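For the MCP Server row: the Model Context Protocol frames its messages as JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request carrying the tool's name and arguments. The sketch below only builds such a message; the tool name "search_chunks" and its "question" argument are hypothetical (a client would discover the real tools with a `tools/list` request), and the transport and session handshake are not shown.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tools_call_request(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })
```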
Sources: README.md75-140 docker/.env13-20 docker/docker-compose-base.yml158-184
Sources: pyproject.toml1-180 Dockerfile1-154 docker/.env1-165