Aquileo | feat(v1,sdk): rerun eval suites against their saved server selection by chelojimenez · Pull Request #2609 · MCPJam/inspector

chelojimenez · 2026-06-11T18:35:04Z

Summary

Follow-up to #2603, closing out the Codex P2 finding there (confirmed: "implement the API fix"). Pairs with MCPJam/mcpjam-backend#514, which must deploy first.

The v1 rerun path connected its MCP manager from the caller's serverIds, while startTestSuiteRun snapshots the run's server set from the suite itself (attachments / suite host config / environment bindings). Any caller-side guess that missed the suite's frozen selection — including run_eval_suite's previous all-enabled-HTTP default — produced a 202 followed by execution-time failures on never-connected servers.

`POST /projects/{p}/eval-runs`

serverIds is now optional on reruns: when omitted with a suiteId, the route derives the suite's saved selection via the new testSuites:getSuiteRunServerSelection query (backend PR) and connects exactly the set the run snapshot references. Explicit serverIds keeps overriding.
New-suite creation (suiteName + tests) still requires serverIds — there is no saved selection to derive.
A suite with no saved selection rejects with VALIDATION_ERROR + details.reason: "NO_SAVED_SERVER_SELECTION"; a suite the caller can't see maps to 404; a backend without the query yet degrades to an explicit-serverIds instruction instead of a 500 (deploy-order tolerance).
The 202 response now lists the resolved servers ({id, name?}) — additive — so callers that omitted serverIds can see what the run targets.
OpenAPI (EvalRunCreateRequest anyOf/required, EvalRunCreated.servers) and public-api.mdx updated.

`@mcpjam/sdk/platform` — `run_eval_suite`

Drops the client-side all-enabled-HTTP default: when servers is omitted, the request omits serverIds and the platform resolves the suite's saved selection; the result's servers comes from the API response. Explicit servers overrides keep their client-side resolution and validation (id/unique-name, stdio/URL-less rejection, disabled-selectable semantics). PlatformEvalRunCreated gains the optional servers field. Changeset: @mcpjam/sdk minor.

Testing

routes/v1/__tests__/write-routes.test.ts: derivation happy path (query called, manager gets derived ids+names, 202 lists servers), NO_SAVED_SERVER_SELECTION, not-visible → 404, older-backend degradation, new-suite-requires-serverIds. 73 v1 route tests green.
sdk/tests/platform/: omitted-servers sends no serverIds and performs no project-server listing; explicit-override tests unchanged. Full SDK suite 87 files / 1446 passed; worker tests + typecheck and both runtime-safety guards green.
Server tsc error count identical to main (285 pre-existing, none in routes/v1).

Deploy order

MCPJam/mcpjam-backend#514 (the Convex query)
This PR (route + SDK + docs)

Until 1 deploys, omitted-serverIds requests answer with the actionable VALIDATION_ERROR rather than failing opaquely.

https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

Generated by Claude Code

Note

Medium Risk
Changes eval-run connection semantics and API contract (optional serverIds on reruns); depends on backend deploy order but includes graceful degradation for skew.

Overview
Eval suite reruns no longer require callers to guess which MCP servers to connect. POST /eval-runs treats serverIds as optional when suiteId is present: the route loads the suite’s saved selection via testSuites:getSuiteRunServerSelection, wires the manager to that set, and returns the resolved servers on the 202. New suites (suiteName + tests) still require serverIds. Missing selection yields NO_SAVED_SERVER_SELECTION; older backends without the Convex query get a VALIDATION_ERROR telling callers to pass serverIds.

run_eval_suite in @mcpjam/sdk drops the client-side “all enabled HTTP servers” default: omitted servers means the request omits serverIds, and the result’s servers come from the API. Explicit servers overrides keep client resolution/validation. OpenAPI, public API docs, MCP README, and tests are updated accordingly.

^{Reviewed by Cursor Bugbot for commit b54d367. Bugbot is set up for automated code reviews on this repo. Configure here.}

Extends the @mcpjam/sdk/platform catalog (PR #2589) over the public API's eval and chatbox surface, and registers every operation as a tool on the remote MCP worker. PlatformApiClient gains listChatboxes/getChatbox, createEvalRun, getEvalRun, listEvalRunIterations, getEvalIterationTrace, and listEvalSuiteRuns, with matching wire DTOs. Nine new curated operations: list_eval_suites, list_eval_suite_runs, run_eval_suite (async rerun: suite and servers resolve by name or ID, servers default to the project's enabled HTTP servers), get_eval_run, list_eval_run_iterations, get_eval_iteration_trace, list_chatboxes, get_chatbox, and list_chat_sessions. Operations now carry a readOnly flag; the worker maps it to MCP tool annotations (run_eval_suite is the only non-read tool: readOnlyHint false, destructiveHint false). Worker tool registration is now catalog-driven (platformTools.ts); showServers.ts keeps only the MCP Apps widget variant. README Status, Auth, and Architecture sections refreshed (they still described the pre-#2589 Convex path and a single tool). https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

…ed-runnable Explicitly selected stdio or URL-less servers now fail at resolution with an error naming the selector, instead of shipping a serverId the hosted runner can never connect to. The default branch makes the stdio exclusion explicit alongside the URL check. Disabled servers remain explicitly selectable by design: naming one is a deliberate choice and the default selection already excludes them. Also clarifies the readOnly-flag phrasing in the changeset. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

get_eval_run, list_eval_run_iterations, and get_eval_iteration_trace no longer default the project to the most recently updated one. A run is an existing resource in one specific project; guessing the project made a run created anywhere else read as NOT_FOUND when polled with only its runId. run_eval_suite and list_eval_suite_runs return the resolved project, so callers address the polls exactly. Listing operations keep their default-project ergonomics, where a wrong default produces an actionable, enumerated error instead of a phantom-missing resource. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

A whitespace-only selector passed min(1), then resolveProject trimmed it to empty and fell back to the most recently updated project. For list_chat_sessions that silently narrowed the unfiltered listing to one project; for the eval polling reads it would have reintroduced the default-project guess their required project exists to prevent. Selector inputs now trim before validating, so blank values fail loudly; list_chat_sessions also trims in execute() so raw SDK callers bypassing the schema get the unfiltered listing rather than a silent filter. Pagination cursors stay untrimmed: they're opaque. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

Explicitly naming a disabled server runs it; the default selection excludes disabled servers. This mirrors the platform's own contract: eval-run authorization (convex webAuthorize, behind /web/authorize-batch) is project-membership-based and never consults the enabled toggle, which only shapes the default connection set. Stated in the tool's input schema, where agents read it before spending credits, plus the resolver comment and worker README. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

…tkgt

POST /eval-runs no longer requires serverIds on a rerun: when omitted with a suiteId, the route derives the suite's saved server selection via the new backend query testSuites:getSuiteRunServerSelection and connects exactly the set the run snapshot references. Previously the manager connected whatever the caller guessed, while startTestSuiteRun froze the suite's own server set — a guess that missed it produced a 202 followed by execution-time failures. New-suite creation still requires serverIds (there is no saved selection to derive). A suite with no saved selection rejects with VALIDATION_ERROR (NO_SAVED_SERVER_SELECTION), and older backends without the query degrade to an explicit-serverIds instruction instead of a 500. The 202 now lists the resolved servers. The SDK's run_eval_suite drops its client-side all-enabled-HTTP default in favor of the platform-derived selection; explicit servers overrides keep their resolution and validation. OpenAPI and the public-api docs updated to match. Deploy order: mcpjam-backend (getSuiteRunServerSelection) first. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

chatgpt-codex-connector · 2026-06-11T18:35:09Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

chelojimenez · 2026-06-11T18:35:21Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scan Engine	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

… deploy) https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

github-actions · 2026-06-11T18:39:10Z

Internal preview

Preview URL: https://mcp-inspector-pr-2609.up.railway.app
Deployed commit: 5c5bf37
PR head commit: b54d367
Backend target: staging fallback.
Health: ✅ Convex reachable
Access is employee-only in non-production environments.

github-actions · 2026-06-11T18:40:45Z

MCP worker preview

Preview worker mcpjam-mcp-pr-2609 deleted — the preview URL no longer resolves.
Merged changes are live on mcpjam-mcp-staging via deploy-mcp-staging.yml.

coderabbitai · 2026-06-11T18:41:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9043f314-f73a-46b0-934f-a677ac1e34cc

📥 Commits

Reviewing files that changed from the base of the PR and between b08e053 and b54d367.

📒 Files selected for processing (2)

mcpjam-inspector/server/routes/v1/__tests__/write-routes.test.ts
mcpjam-inspector/server/routes/v1/evals.ts

Walkthrough

This PR shifts eval run server selection from the SDK's client-side defaulting to the Platform API's saved suite selection. When a rerun request omits servers, the platform resolves the suite's saved configuration and returns the server set. The SDK now sends serverIds only for explicit overrides; a new backend Convex helper validates the saved selection and propagates it through the response.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mcpjam-inspector/server/routes/v1/evals.ts`:
- Around line 228-268: The query result `selection` from
convex.query("testSuites:getSuiteRunServerSelection") can be null and should
yield a 404, not a 400; after the try/catch and before you derive
serverIds/serverNames, add a null-check: if selection === null then throw new
WebRouteError(404, ErrorCode.NOT_FOUND, "Eval suite not found") (matching other
read-not-found behavior in this file). This ensures the nullable `selection` is
treated as not found rather than falling through to the
NO_SAVED_SERVER_SELECTION validation error.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 40530527-157d-4850-bb3f-78ceb6578930

📥 Commits

Reviewing files that changed from the base of the PR and between a8ae274 and b08e053.

📒 Files selected for processing (9)

.changeset/run-eval-suite-saved-selection.md
docs/reference/openapi.json
docs/reference/public-api.mdx
mcp/README.md
mcpjam-inspector/server/routes/v1/__tests__/write-routes.test.ts
mcpjam-inspector/server/routes/v1/evals.ts
sdk/src/platform/operations.ts
sdk/src/platform/types.ts
sdk/tests/platform/operations.test.ts

A null result from getSuiteRunServerSelection means the suite itself was not found; surface it like every other Convex read-not-found in this file instead of misreporting it as NO_SAVED_SERVER_SELECTION. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

mintlify · 2026-06-11T18:52:48Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
mcpjam	🟢 Ready	View Preview	Jun 11, 2026, 6:53 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

claude added 7 commits June 11, 2026 08:08

Merge remote-tracking branch 'origin/main' into claude/sharp-gauss-ao…

3169b51

…tkgt

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 11, 2026

chelojimenez had a problem deploying to preview-pr-2609 June 11, 2026 18:35 — with GitHub Actions Error

dosubot Bot added the enhancement New feature or request label Jun 11, 2026

chore: retrigger CI (transient Cloudflare auth failure on mcp preview…

47ae13f

… deploy) https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY

chelojimenez had a problem deploying to preview-pr-2609 June 11, 2026 18:38 — with GitHub Actions Error

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread mcpjam-inspector/server/routes/v1/evals.ts

chelojimenez deployed to preview-pr-2609 June 11, 2026 18:43 — with GitHub Actions View deployment

mintlify Bot deployed to staging - docs June 11, 2026 18:53 View deployment

chelojimenez merged commit 9180118 into main Jun 11, 2026
16 checks passed

chelojimenez deleted the claude/sharp-gauss-aotkgt branch June 11, 2026 19:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(v1,sdk): rerun eval suites against their saved server selection#2609

feat(v1,sdk): rerun eval suites against their saved server selection#2609
chelojimenez merged 9 commits into
mainfrom
claude/sharp-gauss-aotkgt

chelojimenez commented Jun 11, 2026 •
edited by cursor Bot

Loading

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

chelojimenez commented Jun 11, 2026 •
edited

Loading

Uh oh!

github-actions Bot commented Jun 11, 2026 •
edited

Loading

Uh oh!

github-actions Bot commented Jun 11, 2026 •
edited

Loading

Uh oh!

coderabbitai Bot commented Jun 11, 2026 •
edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

mintlify Bot commented Jun 11, 2026 •
edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

chelojimenez commented Jun 11, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

POST /projects/{p}/eval-runs

@mcpjam/sdk/platform — run_eval_suite

Testing

Deploy order

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

chelojimenez commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Snyk checks have passed. No issues have been found so far.

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Internal preview

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

MCP worker preview

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mintlify Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chelojimenez commented Jun 11, 2026 •
edited by cursor Bot

Loading

`POST /projects/{p}/eval-runs`

`@mcpjam/sdk/platform` — `run_eval_suite`

chelojimenez commented Jun 11, 2026 •
edited

Loading

github-actions Bot commented Jun 11, 2026 •
edited

Loading

github-actions Bot commented Jun 11, 2026 •
edited

Loading

coderabbitai Bot commented Jun 11, 2026 •
edited

Loading

mintlify Bot commented Jun 11, 2026 •
edited

Loading