open-scaffold

Your AI agent's work belongs in your repo, not its chat history.

Ambient work records, compact handoffs, and bounded review/gate checks from cheap and local models — for AI-assisted work that needs evidence, recovery, and human gates, with pilot-grade proof boundaries.

The problem

You pay frontier prices for review, status checks, and "where were we" because nothing cheaper can be trusted. Cheaper models guess; when a chat ends, the work's memory dies with it; the next session reconstructs from a scrollback buffer and invents what it can't recover.

What it does

Open Scaffold keeps a repo-native work record — git-tracked, observed-fact files about what your agents did — and turns it into three things:

Record (ambient). Extracted from observed facts — transcripts, receipts, test results — costing the working model nothing. osc capture --from claude-code|codex reads a finished session into a record with no worker cooperation. Add a plan and evidence files to check claims against intent; feedback and lessons carry forward instead of being relearned.
Handoff. osc handoff compiles the record into a budgeted, secret-redacted packet so the next reader — a fresh session, a smaller model, another vendor's agent, or a teammate — resumes from truth instead of re-deriving or inventing it.
Review and gate. osc review reports plateaus, failing criteria, and requirements worth questioning; osc gate turns that into a retry authorization with stop authority outside the worker. Any file-reading model can be the judge. Fails closed: no parseable verdict means no authorization.
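As a rough illustration of what "budgeted, secret-redacted" can mean for a handoff packet, here is a minimal Python sketch. It is not the `osc handoff` implementation: the `build_packet` function, the `SECRET_RE` pattern, and the character-based budget are all assumptions made for this example, and a real packet may budget by tokens and redact by other rules.

```python
import re

# Hypothetical pattern: NAME=value assignments whose name looks secret-bearing.
SECRET_RE = re.compile(
    r"(?i)\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|API_KEY)[A-Z0-9_]*)\s*=\s*\S+"
)


def build_packet(facts: list[str], budget_chars: int) -> str:
    """Redact secret-looking values, then keep whole facts until the budget is spent."""
    kept: list[str] = []
    used = 0
    for fact in facts:
        line = SECRET_RE.sub(r"\1=[REDACTED]", fact)
        cost = len(line) + 1  # +1 for the newline that joins facts
        if used + cost > budget_chars:
            break  # never emit a partial fact
        kept.append(line)
        used += cost
    return "\n".join(kept)


if __name__ == "__main__":
    facts = [
        "tests: 12 passed",
        "deploy used API_KEY=abc123 on staging",
        "x" * 500,  # too large for the budget, so it is dropped
    ]
    print(build_packet(facts, budget_chars=80))
```

Redaction runs before the budget check so that the size being counted is the size the next reader actually receives.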

| Command | Meaning |
| --- | --- |
| `osc handoff` | Compile the work record into a resume packet for the next session or model. |
| `osc review` | Review recorded attempts: plateaus, failing criteria, question-the-requirement signals. |
| `osc gate` | Authorize or block the next attempt from the analysis plus an optional independent judge. |
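The fail-closed rule ("no parseable verdict means no authorization") can be sketched in a few lines of Python. This is an illustration of the principle only: the JSON verdict shape, the `retry`/`stop` values, and the `gate` function are hypothetical and are not taken from the `osc gate` source.

```python
import json
from dataclasses import dataclass

# Hypothetical verdict vocabulary for this sketch.
ALLOWED_VERDICTS = {"retry", "stop"}


@dataclass(frozen=True)
class GateDecision:
    authorized: bool
    reason: str


def gate(judge_output: str) -> GateDecision:
    """Authorize a retry only when the judge output parses and says 'retry'."""
    try:
        data = json.loads(judge_output)
    except (json.JSONDecodeError, TypeError):
        return GateDecision(False, "unparseable verdict")
    if not isinstance(data, dict):
        return GateDecision(False, "unparseable verdict")
    verdict = data.get("verdict")
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        return GateDecision(False, "missing or unknown verdict")
    if verdict == "retry":
        return GateDecision(True, "judge authorized retry")
    return GateDecision(False, "judge blocked retry")


if __name__ == "__main__":
    for raw in ('{"verdict": "retry"}', '{"verdict": "stop"}', "looks fine to me"):
        print(repr(raw), "->", gate(raw))
```

Every path other than an explicit, well-formed `retry` returns `authorized=False`, so a confused or truncated judge response cannot accidentally grant another attempt.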

What's measured

The interesting result is not "the scaffold makes your agent smarter" — measured, it does not. A naked frontier model matched or beat every scaffolded arm on in-session task quality. That dead result is published at equal weight with the wins.

What the record fixes is amnesia and memory errors. In preregistered trials, a mid-tier reviewer model answering factual questions about finished work — graded against answer keys committed before it ran — hit 94% accuracy with the record vs 30% without, zero confident wrong-history answers, at half the review cost. The record turns a cheap model into a trustworthy auditor. Boundaries in docs/PROOF_HARNESS.md.

In one bounded Codex cold-resume fixture, a 1,557-byte resume capsule was compared against 419,233 bytes of raw transcript, with three replicates per arm. Decision quality tied at 6/6 on a deterministic, human-facing reader-usability rubric, and the capsule arm used about 4.33x fewer tokens (median 137,327 → 31,715). This covers one cold-resume decision; it is not a universal claim or a production-readiness claim.
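The ratios in the cold-resume result can be rechecked directly from the figures quoted above. The short Python snippet below only does that arithmetic; the variable names are ours, and the byte ratio is a derived number that the original text does not state.

```python
# Figures quoted in the cold-resume fixture description.
raw_bytes, capsule_bytes = 419_233, 1_557
tokens_raw, tokens_capsule = 137_327, 31_715  # medians per arm

byte_ratio = raw_bytes / capsule_bytes
token_ratio = tokens_raw / tokens_capsule

print(f"{byte_ratio:.1f}x smaller input, {token_ratio:.2f}x fewer tokens")
# prints "269.3x smaller input, 4.33x fewer tokens"
```

The token ratio is much smaller than the byte ratio, which is expected when a session spends most of its tokens on things other than reading the resume input.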

Audit every receipt, zero spend: REPRODUCE.md.

Start in 60 seconds

npx open-scaffold@latest first-run

Three questions produce MISSION.md, an active plan with acceptance criteria, and an evidence skeleton. Work however you already work; the record accumulates as files:

osc verify
osc evidence new first-slice
osc close first-slice --message "verified first slice"

Scope changed? osc amend first-slice --message "what changed" — plans are immutable, learnings appended. More plans: osc plan new <slug> --stage active. Fresh session: osc handoff.

Want the discipline without the CLI? SKILL.md — the methodology works with plain files.

It is just files

MISSION.md                     why this repo exists
.osc/plans/                    scoped work with acceptance criteria (active/backlog/done/blocked)
.osc/runs/<run>/run.json       handoff package for a worker or reviewer
.osc/releases/                 evidence notes and release records

No daemon, no database, no SaaS — reviewable in a PR, survives any tool change.
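Because the record is plain files, ordinary tooling can inspect it without the CLI. The Python sketch below counts files at the locations listed above. It assumes nothing about file contents or sub-layout beyond those four paths, and `summarize_record` is a name invented for this example rather than part of Open Scaffold.

```python
from pathlib import Path

# Locations taken from the layout above; what lives beneath them is not
# specified there, so this sketch counts files and does not parse them.
RECORD_PATHS = ("MISSION.md", ".osc/plans", ".osc/runs", ".osc/releases")


def summarize_record(repo_root: str) -> dict[str, int]:
    """Count the files present at each record location (0 if absent)."""
    root = Path(repo_root)
    summary: dict[str, int] = {}
    for rel in RECORD_PATHS:
        target = root / rel
        if target.is_file():
            summary[rel] = 1
        elif target.is_dir():
            summary[rel] = sum(1 for p in target.rglob("*") if p.is_file())
        else:
            summary[rel] = 0
    return summary


if __name__ == "__main__":
    for location, count in summarize_record(".").items():
        print(f"{location}: {count} file(s)")
```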

Mental model

You own the goal, taste, risk, merge, and publish gates.
Your agent does the work — Open Scaffold never runs or disciplines it.
Open Scaffold keeps the record: what was asked, what happened, what was claimed versus verified, what the next session needs to know.

Use it for multi-session AI work, PRs that need intent and evidence, and audit-sensitive delivery. Skip it for one-off scripts and prototypes that die in one session.

Honest limits

Pre-1.0 (v0.33.x). Does not make your model smarter — it makes the loop around the model disciplined. Maturity contract: docs/STABILITY.md. Claim ledger: docs/PROOF_HARNESS.md. Raw receipts: harness-bench.

Key docs

docs/START_HERE.md — the single entry point.
docs/PROOF_HARNESS.md — measured claims, raw pointers, proof boundaries.
docs/STABILITY.md — command maturity, version truth, honest limits.
REPRODUCE.md — audit every receipt, zero spend.
SKILL.md — the methodology as a portable agent skill.
docs/FAQ.md — deeper questions.
docs/GLOSSARY.md — the vocabulary.

Dogfooded

Open Scaffold is built with Open Scaffold. This repo carries its own mission, plans, run records, evidence notes, decisions, and releases — inspect the method instead of taking it on faith.
