This document describes RAGFlow's system health monitoring and status reporting infrastructure. It covers the health check endpoints exposed for operational monitoring, component status reporting, task executor heartbeat monitoring, and the Model Context Protocol (MCP) server integration. Together, these give operators real-time visibility into the backend services, document engines, and task processing layers, which supports running RAGFlow in high-availability deployments.
RAGFlow exposes multiple health check endpoints to support different monitoring use cases, from simple liveness probes to detailed component status reporting. The system uses a multi-layered approach involving the core Python backend (Quart/Flask), the Go-based administrative layer, and specialized storage engine probes.
| Endpoint | Authentication | Implementation | Purpose |
|---|---|---|---|
| GET /api/v1/admin/ping | None | admin/server/routes.py | Admin connectivity check admin/server/routes.py38-40 |
| GET /api/v1/admin/auth | Required | admin/server/routes.py | Verify admin session validity admin/server/routes.py67-73 |
| GET /oceanbase/status | Required | api/utils/health_utils.py | OceanBase specific metrics api/utils/health_utils.py104-133 |
| GET /api/v1/mcp/list | Required | internal/handler/mcp.go | List and monitor registered MCP servers internal/handler/mcp.go98-140 |
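As a rough illustration of how an external monitor might consume the unauthenticated ping endpoint from the table, the sketch below builds the URL and treats an HTTP 200 as "alive". The helper names `http_status` and `is_alive` and the injectable `fetch` parameter are assumptions made for this example; nothing is assumed about the response body.

```python
import urllib.error
import urllib.request
from typing import Callable

ADMIN_PING_PATH = "/api/v1/admin/ping"  # path taken from the endpoint table


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status


def is_alive(base_url: str, fetch: Callable[[str], int] = http_status) -> bool:
    """Liveness probe: True only if the ping endpoint answers with 200.

    Connection failures (refused connections, timeouts) count as not alive.
    `fetch` is injectable so the probe can be exercised without a server.
    """
    url = base_url.rstrip("/") + ADMIN_PING_PATH
    try:
        return fetch(url) == 200
    except (urllib.error.URLError, OSError):
        return False
```

Against a local deployment this might be called as `is_alive("http://localhost:9381")`, where the host is hypothetical and 9381 is the admin service's default port.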
[Diagram: System Health Data Flow. Maps the health monitoring flow from external requests to internal code entities and storage systems.]
Sources: api/utils/health_utils.py34-69 admin/server/routes.py38-40
The comprehensive health check logic resides in api/utils/health_utils.py. It uses a series of probes to verify the entire stack:

- check_db() executes a lightweight SELECT 1 query via the DB model api/utils/health_utils.py34-41
- check_redis() uses the REDIS_CONN.health() method to verify connectivity api/utils/health_utils.py44-51
- check_doc_engine() calls the health() method of the configured docStoreConn (Elasticsearch, Infinity, or OceanBase) api/utils/health_utils.py53-61
- check_storage() verifies the health of the STORAGE_IMPL (MinIO or S3) api/utils/health_utils.py63-69

The system also monitors the state of document metadata storage. During user creation or deletion, it ensures that tenant-specific metadata indices (e.g., ragflow_doc_meta_<tenant_id>) are properly managed api/db/services/doc_metadata_service.py70-80. Failure to initialize or clean up these indices results in logged exceptions during the user account lifecycle api/db/joint_services/user_account_service.py109-113.
Sources: api/utils/health_utils.py34-69 api/db/services/doc_metadata_service.py70-80 api/db/joint_services/user_account_service.py106-113
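The probe functions listed above lend themselves to a simple aggregation pattern: run each one, isolate failures, and report a per-component status. The following is a minimal sketch of that pattern, not the code in health_utils.py. The `run_probes` name, the boolean probe signature, and the report layout are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple

Probe = Callable[[], bool]  # hypothetical signature: True means healthy


def run_probes(probes: Dict[str, Probe]) -> Tuple[bool, Dict[str, dict]]:
    """Run every probe, never letting one failure abort the others."""
    report: Dict[str, dict] = {}
    for name, probe in probes.items():
        started = time.perf_counter()
        try:
            ok = bool(probe())
            error = None
        except Exception as exc:  # a crashing probe counts as unhealthy
            ok = False
            error = str(exc)
        entry = {
            "status": "ok" if ok else "fail",
            "elapsed_ms": round((time.perf_counter() - started) * 1000, 1),
        }
        if error is not None:
            entry["error"] = error
        report[name] = entry
    # The stack is healthy only if every component reported "ok".
    all_ok = all(e["status"] == "ok" for e in report.values())
    return all_ok, report
```

In this shape, the four checks would be registered under names such as "db", "redis", "doc_engine", and "storage", and a failing component would not prevent the remaining ones from being reported.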
RAGFlow monitors the background task processing layer through a distributed lock and progress update mechanism within the main server.
ragflow_server.py starts a dedicated update_progress thread that runs periodically api/ragflow_server.py53-69. On each run, the thread:

- Acquires a RedisDistributedLock named update_progress so that only one server instance updates progress at a time api/ragflow_server.py55-59
- Calls DocumentService.update_progress() to synchronize document parsing states and handle heartbeats api/ragflow_server.py60

Sources: api/ragflow_server.py53-69 api/ragflow_server.py139-143
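The lock-then-update cycle described above can be sketched with standard-library primitives. In this example a `threading.Lock` stands in for the Redis-backed distributed lock, and `progress_loop`, its parameters, and the default interval are hypothetical; the real thread's timing and lock API may differ.

```python
import threading
from typing import Callable


def progress_loop(
    lock: threading.Lock,
    update: Callable[[], None],
    stop: threading.Event,
    interval: float = 6.0,  # assumed value, for illustration only
) -> None:
    """Call `update` every `interval` seconds, but only while holding `lock`."""
    while not stop.is_set():
        # Non-blocking acquire: if another holder has the lock, skip this round.
        if lock.acquire(blocking=False):
            try:
                update()
            except Exception as exc:
                # Keep the loop alive; a real server would log this instead.
                print(f"progress update failed: {exc}")
            finally:
                lock.release()
        # Event.wait doubles as a sleep that ends early when `stop` is set.
        stop.wait(interval)
```

Skipping a round when the lock is busy, rather than blocking, keeps instances from queueing up behind each other: whichever instance holds the lock does the work, and the others simply try again on the next tick.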
The Model Context Protocol (MCP) server integration allows RAGFlow to connect to external tool providers. This integration is managed across the Python backend and the Go server layer.
The RAGFlow server manages the lifecycle of MCP sessions. Upon receiving interrupt signals (SIGINT, SIGTERM), the server triggers shutdown_all_mcp_sessions() to cleanly close connections to MCP servers api/ragflow_server.py71-76
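A minimal sketch of this shutdown pattern is shown below. The cleanup callable stands in for shutdown_all_mcp_sessions(); `stop_event`, `make_signal_handler`, and `install` are names invented for the example and are not taken from ragflow_server.py.

```python
import signal
import threading
from typing import Callable

stop_event = threading.Event()  # hypothetical flag the server loop would watch


def make_signal_handler(shutdown_sessions: Callable[[], None]):
    """Build a handler that closes sessions, then flags the server to stop."""

    def handler(signum, frame):
        if stop_event.is_set():
            return  # cleanup already ran for an earlier signal
        try:
            shutdown_sessions()
        finally:
            # Stop the server even if closing a session raised.
            stop_event.set()

    return handler


def install(shutdown_sessions: Callable[[], None]) -> None:
    """Register one handler for both SIGINT and SIGTERM (main thread only)."""
    handler = make_signal_handler(shutdown_sessions)
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
```

Using the same handler for both signals means Ctrl+C in a terminal and a termination request from a process manager follow the same cleanup path.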
The Go server handles MCP server registration and health testing:

- CreateMCPServer in internal/service/mcp.go validates server types (e.g., sse, streamable-http) and URLs before persisting to the database via MCPServerDAO internal/service/mcp.go102-157
- mcpErrorResponse in internal/handler/mcp.go maps internal Go errors (like ErrMCPInvalidURL or ErrMCPTestFailed) to standardized API responses for the frontend internal/handler/mcp.go193-207

[Diagram: MCP Integration Mapping]
Sources: api/ragflow_server.py44 api/ragflow_server.py71-76 internal/service/mcp.go102-157 internal/handler/mcp.go193-207 docs/develop/mcp/launch_mcp_server.md63-66
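The validate-then-map-errors flow can be illustrated in Python, even though the actual implementation lives in Go. This is a sketch of the idea only: the exception classes, the HTTP status codes, and the response shape are assumptions, and the allowed-type set contains just the two types named above, which the source gives as examples rather than a complete list.

```python
from urllib.parse import urlparse

ALLOWED_SERVER_TYPES = {"sse", "streamable-http"}  # only the types named above


class MCPInvalidType(ValueError):
    """Hypothetical error for an unsupported server type."""


class MCPInvalidURL(ValueError):
    """Python stand-in for the Go error ErrMCPInvalidURL."""


def validate_mcp_server(server_type: str, url: str) -> None:
    """Raise a typed error if the registration request is malformed."""
    if server_type not in ALLOWED_SERVER_TYPES:
        raise MCPInvalidType(f"unsupported server type: {server_type!r}")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise MCPInvalidURL(f"invalid MCP server URL: {url!r}")


# Assumed mapping from internal errors to API responses.
ERROR_RESPONSES = {
    MCPInvalidType: (400, "invalid MCP server type"),
    MCPInvalidURL: (400, "invalid MCP server URL"),
}


def to_api_response(exc: Exception) -> dict:
    """Translate an internal error into a uniform response body."""
    for err_type, (status, message) in ERROR_RESPONSES.items():
        if isinstance(exc, err_type):
            return {"status": status, "message": message}
    # Unknown errors are not leaked to the client.
    return {"status": 500, "message": "internal error"}
```

Keeping the mapping in one table means the frontend sees a small, stable set of messages regardless of which internal check failed.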
For high-scale deployments using OceanBase as the document engine, RAGFlow provides deep visibility into performance metrics through specialized health functions.
The get_oceanbase_status() and check_oceanbase_health() functions collect comprehensive telemetry api/utils/health_utils.py104-200:

- Whether the OBConnection is "healthy" or "timeout" api/utils/health_utils.py120

Sources: api/utils/health_utils.py104-200
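One way to arrive at a two-state "healthy" / "timeout" label is to run a connectivity probe under a deadline. The sketch below shows that general technique and is not the logic in health_utils.py; `connection_status`, the probe callable, and the default deadline are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def connection_status(probe: Callable[[], object], timeout_s: float = 2.0) -> str:
    """Return "healthy" if `probe` finishes within `timeout_s`, else "timeout".

    Any other exception raised by the probe propagates to the caller.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(probe)
    try:
        future.result(timeout=timeout_s)
        return "healthy"
    except FutureTimeout:
        return "timeout"
    finally:
        # Do not block on a probe that is still running in the background.
        pool.shutdown(wait=False)
```

In practice the probe would be a cheap query against the database connection; the worker thread keeps running after a timeout, so the probe itself should also carry a driver-level timeout.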
The Admin Service (running on port 9381 by default) provides a dedicated management layer admin/server/admin_server.py73
The admin layer provides endpoints for managing users and monitoring system-wide configurations. It is initialized via admin_server.py, which sets up a Flask application and registers the admin_bp blueprint admin/server/admin_server.py52-53
The Admin service uses flask_login and a custom setup_auth to monitor and validate administrative access admin/server/auth.py39-85. It verifies JWT tokens against the UserService and checks the is_superuser status via the check_admin_auth decorator to ensure only authorized operators can access system status routes admin/server/auth.py144-157.
Sources: admin/server/admin_server.py41-84 admin/server/auth.py39-85 admin/server/auth.py144-157 admin/server/routes.py35
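The superuser gate described above follows a common decorator pattern, sketched here without any web framework. This is not the check_admin_auth implementation: the `User` dataclass, the `Forbidden` exception, and the `get_current_user` callable are placeholders for whatever the admin service derives from the verified token.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Optional


@dataclass
class User:
    """Hypothetical user record; only is_superuser matters here."""
    email: str
    is_superuser: bool = False


class Forbidden(Exception):
    """Raised when the caller is missing or is not a superuser."""


def require_superuser(get_current_user: Callable[[], Optional[User]]):
    """Build a decorator that lets only superusers reach the wrapped view."""

    def decorator(view):
        @wraps(view)
        def wrapped(*args, **kwargs):
            user = get_current_user()
            if user is None or not user.is_superuser:
                raise Forbidden("admin privileges required")
            return view(*args, **kwargs)

        return wrapped

    return decorator
```

Resolving the user inside `wrapped`, rather than when the decorator is applied, ensures the check runs on every request instead of once at import time.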