feat(v1,sdk): rerun eval suites against their saved server selection#2609
Conversation
Extends the @mcpjam/sdk/platform catalog (PR #2589) over the public API's eval and chatbox surface, and registers every operation as a tool on the remote MCP worker. PlatformApiClient gains listChatboxes/getChatbox, createEvalRun, getEvalRun, listEvalRunIterations, getEvalIterationTrace, and listEvalSuiteRuns, with matching wire DTOs. Nine new curated operations: list_eval_suites, list_eval_suite_runs, run_eval_suite (async rerun: suite and servers resolve by name or ID, servers default to the project's enabled HTTP servers), get_eval_run, list_eval_run_iterations, get_eval_iteration_trace, list_chatboxes, get_chatbox, and list_chat_sessions. Operations now carry a readOnly flag; the worker maps it to MCP tool annotations (run_eval_suite is the only non-read tool: readOnlyHint false, destructiveHint false). Worker tool registration is now catalog-driven (platformTools.ts); showServers.ts keeps only the MCP Apps widget variant. README Status, Auth, and Architecture sections refreshed (they still described the pre-#2589 Convex path and a single tool). https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
…ed-runnable Explicitly selected stdio or URL-less servers now fail at resolution with an error naming the selector, instead of shipping a serverId the hosted runner can never connect to. The default branch makes the stdio exclusion explicit alongside the URL check. Disabled servers remain explicitly selectable by design: naming one is a deliberate choice and the default selection already excludes them. Also clarifies the readOnly-flag phrasing in the changeset. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
get_eval_run, list_eval_run_iterations, and get_eval_iteration_trace no longer default the project to the most recently updated one. A run is an existing resource in one specific project; guessing the project made a run created anywhere else read as NOT_FOUND when polled with only its runId. run_eval_suite and list_eval_suite_runs return the resolved project, so callers address the polls exactly. Listing operations keep their default-project ergonomics, where a wrong default produces an actionable, enumerated error instead of a phantom-missing resource. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
A whitespace-only selector passed min(1), then resolveProject trimmed it to empty and fell back to the most recently updated project. For list_chat_sessions that silently narrowed the unfiltered listing to one project; for the eval polling reads it would have reintroduced the default-project guess their required project exists to prevent. Selector inputs now trim before validating, so blank values fail loudly; list_chat_sessions also trims in execute() so raw SDK callers bypassing the schema get the unfiltered listing rather than a silent filter. Pagination cursors stay untrimmed: they're opaque. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
Explicitly naming a disabled server runs it; the default selection excludes disabled servers. This mirrors the platform's own contract: eval-run authorization (convex webAuthorize, behind /web/authorize-batch) is project-membership-based and never consults the enabled toggle, which only shapes the default connection set. Stated in the tool's input schema, where agents read it before spending credits, plus the resolver comment and worker README. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
POST /eval-runs no longer requires serverIds on a rerun: when omitted with a suiteId, the route derives the suite's saved server selection via the new backend query testSuites:getSuiteRunServerSelection and connects exactly the set the run snapshot references. Previously the manager connected whatever the caller guessed, while startTestSuiteRun froze the suite's own server set — a guess that missed it produced a 202 followed by execution-time failures. New-suite creation still requires serverIds (there is no saved selection to derive). A suite with no saved selection rejects with VALIDATION_ERROR (NO_SAVED_SERVER_SELECTION), and older backends without the query degrade to an explicit-serverIds instruction instead of a 500. The 202 now lists the resolved servers. The SDK's run_eval_suite drops its client-side all-enabled-HTTP default in favor of the platform-derived selection; explicit servers overrides keep their resolution and validation. OpenAPI and the public-api docs updated to match. Deploy order: mcpjam-backend (getSuiteRunServerSelection) first. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
Internal previewPreview URL: https://mcp-inspector-pr-2609.up.railway.app |
MCP worker previewPreview worker |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR shifts eval run server selection from the SDK's client-side defaulting to the Platform API's saved suite selection. When a rerun request omits Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@mcpjam-inspector/server/routes/v1/evals.ts`:
- Around line 228-268: The query result `selection` from
convex.query("testSuites:getSuiteRunServerSelection") can be null and should
yield a 404, not a 400; after the try/catch and before you derive
serverIds/serverNames, add a null-check: if selection === null then throw new
WebRouteError(404, ErrorCode.NOT_FOUND, "Eval suite not found") (matching other
read-not-found behavior in this file). This ensures the nullable `selection` is
treated as not found rather than falling through to the
NO_SAVED_SERVER_SELECTION validation error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 40530527-157d-4850-bb3f-78ceb6578930
📒 Files selected for processing (9)
.changeset/run-eval-suite-saved-selection.mddocs/reference/openapi.jsondocs/reference/public-api.mdxmcp/README.mdmcpjam-inspector/server/routes/v1/__tests__/write-routes.test.tsmcpjam-inspector/server/routes/v1/evals.tssdk/src/platform/operations.tssdk/src/platform/types.tssdk/tests/platform/operations.test.ts
A null result from getSuiteRunServerSelection means the suite itself was not found; surface it like every other Convex read-not-found in this file instead of misreporting it as NO_SAVED_SERVER_SELECTION. https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
Summary
Follow-up to #2603, closing out the Codex P2 finding there (confirmed: "implement the API fix"). Pairs with MCPJam/mcpjam-backend#514, which must deploy first.
The v1 rerun path connected its MCP manager from the caller's
serverIds, whilestartTestSuiteRunsnapshots the run's server set from the suite itself (attachments / suite host config / environment bindings). Any caller-side guess that missed the suite's frozen selection — includingrun_eval_suite's previous all-enabled-HTTP default — produced a202followed by execution-time failures on never-connected servers.POST /projects/{p}/eval-runsserverIdsis now optional on reruns: when omitted with asuiteId, the route derives the suite's saved selection via the newtestSuites:getSuiteRunServerSelectionquery (backend PR) and connects exactly the set the run snapshot references. ExplicitserverIdskeeps overriding.suiteName+tests) still requiresserverIds— there is no saved selection to derive.VALIDATION_ERROR+details.reason: "NO_SAVED_SERVER_SELECTION"; a suite the caller can't see maps to404; a backend without the query yet degrades to an explicit-serverIds instruction instead of a 500 (deploy-order tolerance).202response now lists the resolvedservers({id, name?}) — additive — so callers that omittedserverIdscan see what the run targets.EvalRunCreateRequestanyOf/required,EvalRunCreated.servers) andpublic-api.mdxupdated.@mcpjam/sdk/platform—run_eval_suiteDrops the client-side all-enabled-HTTP default: when
serversis omitted, the request omitsserverIdsand the platform resolves the suite's saved selection; the result'sserverscomes from the API response. Explicitserversoverrides keep their client-side resolution and validation (id/unique-name, stdio/URL-less rejection, disabled-selectable semantics).PlatformEvalRunCreatedgains the optionalserversfield. Changeset:@mcpjam/sdkminor.Testing
routes/v1/__tests__/write-routes.test.ts: derivation happy path (query called, manager gets derived ids+names,202lists servers),NO_SAVED_SERVER_SELECTION, not-visible → 404, older-backend degradation, new-suite-requires-serverIds. 73 v1 route tests green.sdk/tests/platform/: omitted-serverssends noserverIdsand performs no project-server listing; explicit-override tests unchanged. Full SDK suite 87 files / 1446 passed; worker tests + typecheck and both runtime-safety guards green.tscerror count identical to main (285 pre-existing, none inroutes/v1).Deploy order
Until 1 deploys, omitted-
serverIdsrequests answer with the actionable VALIDATION_ERROR rather than failing opaquely.https://claude.ai/code/session_01UQSNMVcYZrJd9vabfXk1nY
Generated by Claude Code
Note
Medium Risk
Changes eval-run connection semantics and API contract (optional serverIds on reruns); depends on backend deploy order but includes graceful degradation for skew.
Overview
Eval suite reruns no longer require callers to guess which MCP servers to connect.
POST /eval-runstreatsserverIdsas optional whensuiteIdis present: the route loads the suite’s saved selection viatestSuites:getSuiteRunServerSelection, wires the manager to that set, and returns the resolvedserverson the202. New suites (suiteName+tests) still requireserverIds. Missing selection yieldsNO_SAVED_SERVER_SELECTION; older backends without the Convex query get aVALIDATION_ERRORtelling callers to passserverIds.run_eval_suitein@mcpjam/sdkdrops the client-side “all enabled HTTP servers” default: omittedserversmeans the request omitsserverIds, and the result’sserverscome from the API. Explicitserversoverrides keep client resolution/validation. OpenAPI, public API docs, MCP README, and tests are updated accordingly.Reviewed by Cursor Bugbot for commit b54d367. Bugbot is set up for automated code reviews on this repo. Configure here.