# ai-reliability

Here are 82 public repositories matching this topic...

relai-ai / relai-sdk

A platform for building reliable AI agents

ai-agents ai-reliability

Updated Apr 3, 2026
Python

hermes-labs-ai / zer0dex

zer0dex is a local dual-layer memory pattern for AI agents: a compressed, human-readable markdown index plus a vector store queried automatically before each message. Built for cross-project recall and cross-reference where flat memory files or vector-only RAG fall short. Local-first, low-latency. Reference implementation by Hermes Labs.

persistent-memory ai-agents rag vector-search local-first semantic-memory llm llm-memory ai-memory agent-memory claude-code ai-reliability hermes-labs dual-layer-memory

Updated Jun 7, 2026
Python

hermes-labs-ai / lintlang

lintlang is a static linter for AI agent configs, tool descriptions, and system prompts that runs zero-LLM quality gating in CI. Catches language-level failures (vague tool descriptions, missing stop conditions, schema gaps) before they reach runtime, with deterministic regex + structural detectors and no model calls.

Updated Jun 2, 2026
Python

najeed / ai-agent-eval-harness

The open-source MultiAgentOps evaluation and verification harness for any industry business workflow.

Updated Jun 7, 2026
Python

hermes-labs-ai / fidelis

fidelis is zero-LLM agent memory for Claude Code and AI agents: a local-first memory layer whose default retrieval path uses BM25, dense vectors, and reciprocal rank fusion with no LLM call. It returns your original passages verbatim instead of paraphrasing and runs fully local. Benchmarked on LongMemEval-S. MIT, by Hermes Labs.

retrieval mcp bm25 fidelity ai-agents rag local-first llm llm-memory agent-memory claude-code longmemeval ai-reliability zero-llm hermes-labs

Updated Jun 7, 2026
Python
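
The fidelis entry above names reciprocal rank fusion (RRF) as the step that combines its BM25 and dense-vector results. As a rough illustration of that general technique only, and not of fidelis's actual code or API, here is a minimal sketch. The function name, the passage ids, and the constant `k=60` (a commonly used default for RRF) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage ids into one ranking.

    Each passage scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Hypothetical helper, not a fidelis function.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Made-up rankings standing in for a lexical (BM25) and a dense retriever.
bm25_ranking = ["p3", "p1", "p2"]
dense_ranking = ["p1", "p4", "p3"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF uses only rank positions, it needs no score normalization between the two retrievers and no model call, which fits the "zero-LLM" retrieval path the entry describes.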

TaimoorKhan10 / replayd

Turn failed AI agent runs into replayable regression tests. Catch regressions before you ship.

python open-source sdk regression-testing ai-agents release-control prompt-testing llm-ops llm-testing ai-infrastructure agent-ops agent-testing ai-reliability replay-testing

Updated Jun 4, 2026
Python

Harshit-J004 / toolguard

The "Cloudflare for AI Agents". 7-layer security interceptor, real-time observability dashboard, and automated reliability testing for MCP and AI tool chains. Prevent hallucinations, prompt injection, and destructive tool calls.

Updated May 4, 2026
Python

elsium-ai / elsium-ai

Production-grade TypeScript AI runtime focused on reliability, governance, and reproducible LLM systems. Multi-provider gateway, agents, RAG, workflows, policy engine, audit trails, and deterministic testing — built for teams shipping AI in production.

typescript ai-framework rag agent-framework ai-compliance llm ai-governance ai-runtime open-source-ai ai-infrastructure llm-gateway reproducible-ai llm-runtime ai-reliability deterministic-ai ai-production

Updated Jun 4, 2026
TypeScript

ejentum / ejentum-mcp

MCP server for the Ejentum API. 8 cognitive operations across 4 harnesses (reasoning, code, anti-deception, memory) in dynamic and adaptive modes.

typescript mcp code-review claude anthropic llm-tools agentic-ai model-context-protocol mcp-server ai-reliability reasoning-harness ejentum anti-deception cognitive-scaffold

Updated May 31, 2026
JavaScript

vishwanathakuthota / openvals

Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

machine-learning gemini openai ai-safety ai-agents ai-evaluation ai-testing ai-quality llm-tools ollama llm-benchmarking ai-evaluation-framework calude ai-reliability vishwanath-akuthota

Updated Jun 4, 2026
Python

FailproofAI / ai-reliability-standards

Architectural standards and best practices for building reliable AI Agents and LLM workflows. Defining the framework for AI Reliability Engineering (AIRE).

enterprise ai reliability-engineering evaluation sre observability ai-agents aiops evals durable-execution ai-reliability

Updated Feb 14, 2026
Dockerfile

hermes-labs-ai / hermes-blind

Context-compensation scaffold for LLM evaluation prompts. A short language prefix you prepend so the model discloses prior exposure, scores on quoted evidence only, and hedges on thin evidence — for scorers that can see your CLAUDE.md, memory, or session context. Backend-agnostic. Experimental: variance-reduction effect not yet measured.

evaluation scaffold ai-safety ai-agents rubric multi-turn debiasing llm prompt-engineering evals llm-evaluation ai-reliability lpci hermes-labs context-compensation language-as-state agent-scaffold drift-recovery recovery-scaffold

Updated May 27, 2026
Python

hermes-labs-ai / quick-gate-js

quick-gate-js (npm: quick-gate) is a deterministic JS/TS CI quality gate that unifies ESLint, TypeScript, build, and Lighthouse checks into one fail-fast result, with bounded auto-repair and structured escalation evidence for humans or agents. Works with Next.js, React, Vue, Svelte, or any Node project. A gate-and-escalate wrapper, not a dashboard.

eslint frontend linting ci static-analysis devtools ci-cd code-quality agents lighthouse quality-gate auto-repair ai-reliability

Updated Jun 1, 2026
JavaScript

vbepipe / vmrrb-benchmark

Benchmark for evaluating advanced reasoning, recursive dependency resolution, and robustness capabilities of large language models in dynamic, noisy, and structurally challenging environments.

benchmark dependency-resolution ai-accuracy multistep-reasoning ai-evaluation large-language-models ai-reasoning llm-evaluation reasoning-benchmark llm-benchmark ai-reliability recursive-reasoning ai-stability long-chain-reasoning

Updated May 15, 2026
Python

AionSystem / VERITAS

Sheldon K. Salmon — AI Reliability Architect. Creator of the AION Constitutional Stack and the CERTUS certainty‑engineering methodology. He designed, directed, and red‑teamed VERITAS — applying epistemic scoring, Uncertainty Mass, and permanent STP seals to community crisis data. Code is open source. The judgment is not.

community ai humanitarian photo undp crisis-support disaster-response disaster-relief damage-assessment crisis-response ai-audit ai-reliability aion-system sheldon-k-salmon fsve certus-engine

Updated May 16, 2026
JavaScript

Nefza99 / Rebis-AI-auditing-Architecture

Orchestration runtime for AI agent workflows that preserves task-state fidelity, prevents reasoning drift, and reduces wasted computation in long-horizon pipelines.

Updated Mar 19, 2026
JavaScript

aditikhare007 / ai-decision-intelligence-system

Enterprise AI system for decision intelligence — transforming research into scalable, context-aware insights at production scale | AditiKhare.com — AI Product Ecosystem

ai mlops ai-systems inference-optimization ai-platform ai-product decision-intelligence ai-evaluation llm generative-ai context-engineering ai-reliability

Updated Apr 20, 2026

AionSystem / AION-SCAFFOLDING

AION Scaffold — Intelligent tree-to-filesystem generator. Built by Sheldon K. Salmon, AI Reliability Architect. Part of the AION Constitutional Stack. Free forever. No tracking.

tools scaffold dev developer-tools scaffolder scaffolding dev-tools ai-architect ai-audit ai-reliability aion-system

Updated May 6, 2026
HTML

UAICP / uaicp

UAICP (Universal Agentic Interoperability Control Protocol): open reliability contract for AI agent workflows with evidence gating, policy controls, and auditability.

python rust typescript orchestration interoperability compliance risk-management policy-engine ai-agents audit-trail autogen guardrails ai-governance llmops langgraph agentic-ai ai-reliability uaicp rig-rs

Updated Feb 27, 2026
TypeScript

Yuchi-Wang02 / bizhallu

Span-level hallucination detection for LLM-generated business analysis on Online Retail transaction data.

business-analytics retail-analytics qwen llm-evaluation hallucination-detection ai-reliability

Updated May 26, 2026
Python
