Hermes Labs

Epistemic engineering for production AI.

Reliability infrastructure, evals, audits, and open-source tools for teams shipping LLM applications and agents. EU AI Act, ISO/IEC 42001, and NIST AI RMF readiness across the stack.

What Hermes is

An independent research lab building the audit, runtime, and evidence layer for AI systems that can't afford to fail silently. We study how language models fail structurally (sycophancy, null-result bias, hermeneutic drift, intent exceptionalism), then ship tools and audits that surface those failures before production does.

Engagement tracks

AI Assurance Audit — pre-deployment prompt, tool, and scaffold audit; adversarial testing; written findings with prioritized fixes
Runtime Assurance & Evidence — input-side prompt-injection sensing, runtime policy enforcement, signed receipts and transcript evidence
AI Compliance & Audit Readiness — technical readiness scoring and evidence packaging mapped across EU AI Act (Annex IV), ISO/IEC 42001, and NIST AI RMF

Start a conversation →

Research

The Asymmetric Burden of Proof. 14-page report. LLMs systematically discount negative findings across matched scientific vignettes. 19.6 to 56.7pp probability gaps across 3 models, directionally consistent in 23 of 24 conditions.
A Taxonomy of Epistemic Failure Modes in LLMs. Seven structural failure modes: null-result asymmetry, source-status credibility bias, agency dissolution, performative hedging, constraint evasion, silent instruction relaxation, controversy-truth conflation.

1,500+ controlled epistemic evaluations. 5 US patent filings (1 non-provisional pending, 4 provisional).

Open-source contributions

20+ PRs merged upstream. Four land in AI frameworks themselves:

Repo	PR	Fix
langchain-ai/langchain	#35544	Drop forced `tool_choice` when extended thinking is on
microsoft/semantic-kernel	#13610	Fix truncation reducer silently deleting system prompts
pytorch/ignite	#3591	Typing modernization in `tqdm_logger`
optuna/optuna	#6478	Simplify `Union` under `TYPE_CHECKING`

The rest ship with production AI stacks: React Router, Nuxt, Cloudflare Workers, Sentry, Meta jscodeshift, MobX, ngrx, Microsoft TSDoc/Griffel, and more.

Reliability stack (flagships)

The current set, all open-source, all installable in seconds.

Tool	What it does	Install
hermes-rubric	Evidence-first structured scoring. Synthesize rubric, collect citations, hedge on thin evidence.	`pip install hermes-rubric`
fidelis	Zero-LLM agent memory. 73.0% end-to-end QA on LongMemEval-S, $0/query, fully local.	`pip install fidelis`
hermes-blind	Context-compensation scaffold for LLM evaluation prompts. Disclose, gate on evidence, hedge on thin.	`pip install hermes-blind`
hermeneutic	Mine corrections from chat logs; gate the next response before drift ships.	`pip install hermeneutic`
hermes-prime	Bootstrap a fresh Claude Code session with conventions and grounding triggers already loaded. Stop re-deriving the same rules at minute 30.	`pip install hermes-prime`

Adjacent tools

Tool	What it does	Install
lintlang	Static linter for AI agent configs, tool descriptions, system prompts. Zero LLM calls.	`pip install lintlang`
little-canary	Input-side prompt injection detection via sacrificial canary-model probes.	`pip install little-canary`
claude-router	Routes prompts to the right Claude tier via local embeddings.	`pip install claude-router`
langquant	Stateless LLM coherence via refreshing language scaffold (LPCI).	`pip install langquant`
quickthink	Local-first inference control layer for small LLMs.	`pip install quickthink`
agent-gorgon	Stop AI agents from fabricating tool output when a registered tool exists.	`pip install agent-gorgon`
suy-sideguy	Runtime policy guard for autonomous AI agents.	`pip install suy-sideguy`
zer0dex	Dual-layer memory for AI agents (compressed index plus vector retrieval).	`pip install zer0dex`
agent-convergence-scorer	Score how similar N agent outputs are.	`pip install agent-convergence-scorer`
hermes-jailbench	Jailbreak regression benchmark for LLM endpoints.	`pip install hermes-jailbench`
rule-audit	Static prompt audit CLI for LLM system prompts.	`pip install rule-audit`
colony-probe	Defensive prompt-confidentiality audit.	`pip install colony-probe`
quick-gate-js / quick-gate-python	CI quality gate with bounded auto-repair.	`npm i quick-gate` · `pip install quick-gate-python`
csv-quality-gate	CSV preflight validation for pipeline inputs.	`pip install csv-quality-gate`
intent-verify	Repo intent verification and spec drift checks.	`pip install intent-verify`
forgetted	Mid-conversation incognito mode for AI agents.	`pip install forgetted`
zer0lint	Memory extraction diagnostics for `mem0` configs.	`pip install zer0lint`

Founded by Roli Bosch (Rolando Bosch on LinkedIn / academic publications). roli@hermes-labs.ai · hermes-labs.ai

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hermes Labs

Hermes Labs

What Hermes is

Engagement tracks

Research

Open-source contributions

Reliability stack (flagships)

Adjacent tools

Popular repositories Loading

Repositories

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

People

Top languages

Uh oh!

Most used topics

Uh oh!