Changelog

Sitaraman Subramanian edited this page Apr 4, 2026 · 3 revisions

April 2026

Swarm Mode & Racer Models

  • Introduced swarm mode for parallel multi-model attack runs. The orchestrator spawns a swarm of racers, each running a different model, that attack the same task simultaneously. The race either ends as soon as one racer succeeds (first_success) or continues until every racer has finished (all_complete).
  • Added the spawn_swarm agent tool (available only in swarm-agent context). The orchestrator uses it to launch sub-swarms with per-agent task specs, optional timeout, and a configurable win condition.
  • Orchestrator model: the main model is now explicitly configured as the orchestrator (ORCHESTRATOR_* env vars). It drives high-level planning and swarm coordination.
  • Racer models: up to 8 independent racer models can be configured (RACER_1_* through RACER_8_*), each with its own provider, model, API key, base URL, and reasoning mode. Racers are assigned round-robin when a swarm spawns more agents than configured racers.
  • Environment variable cleanup: MODEL_API_KEY, MODEL_PROVIDER, MODEL_BASE_PATH, and REASONING_MODE are removed. All model config now lives under ORCHESTRATOR_*. Existing .env files that set only the legacy MODEL_* variables continue to work through the fallback in readSwarmModelsFromEnv.
  • run.sh config gains a dedicated Racers option (option 2). The guided startup now prompts to configure racers as an optional step after the main model.
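
The environment fallback and round-robin assignment described above can be pictured with a minimal Python sketch. It is not the project's readSwarmModelsFromEnv; the suffixes ORCHESTRATOR_PROVIDER, ORCHESTRATOR_API_KEY, RACER_<n>_PROVIDER, and RACER_<n>_API_KEY, and all function names, are assumptions chosen for illustration. Only the ORCHESTRATOR_*, RACER_1_* through RACER_8_*, and legacy MODEL_* prefixes come from the notes.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    api_key: str


def read_orchestrator(env: Mapping[str, str]) -> Optional[ModelConfig]:
    """Prefer ORCHESTRATOR_* and fall back to the legacy MODEL_* names.

    The exact ORCHESTRATOR_* suffixes are assumed, not taken from the project.
    """
    provider = env.get("ORCHESTRATOR_PROVIDER") or env.get("MODEL_PROVIDER")
    api_key = env.get("ORCHESTRATOR_API_KEY") or env.get("MODEL_API_KEY")
    if not provider or not api_key:
        return None
    return ModelConfig(provider, api_key)


def read_racers(env: Mapping[str, str], max_racers: int = 8) -> List[ModelConfig]:
    """Collect every fully configured RACER_<n>_* slot, in slot order."""
    racers = []
    for n in range(1, max_racers + 1):
        provider = env.get(f"RACER_{n}_PROVIDER")  # assumed suffix
        api_key = env.get(f"RACER_{n}_API_KEY")    # assumed suffix
        if provider and api_key:
            racers.append(ModelConfig(provider, api_key))
    return racers


def assign_round_robin(racers: List[ModelConfig], agent_count: int) -> List[ModelConfig]:
    """Give agent i the racer at index i modulo the number of racers."""
    if not racers:
        raise ValueError("no racers configured")
    return [racers[i % len(racers)] for i in range(agent_count)]
```

With two configured racers and five requested agents, this sketch assigns the racers in the order 1, 2, 1, 2, 1.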

CTF Solver

  • Swarm mode integrates with CTF solving: use /solve <challenge> to focus the orchestrator, then let it spawn a racer swarm to attack the challenge with multiple models in parallel.
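
The two win conditions named in the swarm notes, first_success and all_complete, can be sketched with asyncio. This is an illustration under stated assumptions, not the project's spawn_swarm implementation: run_swarm and its parameters are hypothetical, and each racer is modelled as a coroutine that returns a result string on success or None on failure.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Racer = Callable[[], Awaitable[Optional[str]]]


async def run_swarm(
    racers: Dict[str, Racer],
    win_condition: str = "first_success",
    timeout: Optional[float] = None,
) -> Dict[str, Optional[str]]:
    """Run all racers concurrently and return the results gathered so far."""
    tasks = {asyncio.create_task(fn()): name for name, fn in racers.items()}
    results: Dict[str, Optional[str]] = {}
    pending = set(tasks)
    loop = asyncio.get_running_loop()
    deadline = None if timeout is None else loop.time() + timeout

    while pending:
        remaining = None if deadline is None else max(0.0, deadline - loop.time())
        done, pending = await asyncio.wait(
            pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
        )
        if not done:  # overall timeout expired with racers still running
            break
        for task in done:
            # A racer that raised is recorded as a failure (None).
            results[tasks[task]] = None if task.exception() else task.result()
        if win_condition == "first_success" and any(results.values()):
            break  # one racer succeeded, so the race ends early

    # Cancel whatever is still running and wait for the cancellations to settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return results
```

Under first_success the returned mapping contains only the racers that finished before the winner; under all_complete it contains every racer unless the timeout expires first.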

Agentic Architecture

  • The system now operates as an autonomous agent. Previously, the workflow was strictly human-in-the-loop with multiple stops where the user had to nudge the assistant to continue. Now the agent loops on its own for up to 25 iterations per turn, calling tools, analyzing output, and deciding next steps independently.
  • Added a consent model with two modes: "Auto run" (agent executes freely, with safety checks on dangerous commands) and "Ask consent" (agent pauses for approval before tool execution).
  • Manual execution fallback: when SSH connectivity drops, the agent presents commands for the user to run manually and accepts pasted output.
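
A bounded agent loop with the two consent modes might look roughly like the following. The names run_turn, decide, run_tool, and approve are hypothetical; only the 25-iteration cap and the two consent modes come from the notes above.

```python
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 25  # per-turn cap stated in the changelog

# A step is either ("tool", command) or ("final", answer).
Step = Tuple[str, str]


def run_turn(
    decide: Callable[[List[str]], Step],
    run_tool: Callable[[str], str],
    approve: Optional[Callable[[str], bool]] = None,
    max_iterations: int = MAX_ITERATIONS,
) -> str:
    """Loop until the model gives a final answer or the iteration cap is hit.

    approve=None models "Auto run"; passing a callable models "Ask consent".
    """
    history: List[str] = []
    for _ in range(max_iterations):
        kind, payload = decide(history)
        if kind == "final":
            return payload
        if approve is not None and not approve(payload):
            # The user declined; record it so the model can pick another step.
            history.append(f"denied: {payload}")
            continue
        history.append(f"{payload} -> {run_tool(payload)}")
    return "stopped: iteration limit reached"
```

Denied steps still count toward the cap in this sketch, which keeps a turn from looping indefinitely on repeated refusals.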

Tools (formerly Plugins)

  • Renamed "plugins" to "tools" throughout the codebase and UI.
  • 16 agent tools available, organized into Core, Intelligence, Burp Suite, and Browser groups.
  • Tools can be toggled on/off per session from the sidebar.
  • Tools that depend on unconfigured integrations are automatically hidden.

Capabilities System

  • Introduced a capability registry with 100+ security tools and Python packages organized into 7 buckets: Core, Network & Recon, Reverse Engineering, Binary Exploitation, Cryptography, Forensics, and Steganography.
  • Capabilities are exposed to the agent's system prompt so it knows what's available.
  • Added "Detect Installed" to scan the exploit box and determine which tools are already present.
  • Agent can auto-install capabilities on the fly via the run_install_tool tool.
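
As a rough illustration of what a "Detect Installed" scan could do, the sketch below checks a small registry against the local PATH using shutil.which. The registry excerpt and the detect_installed function are hypothetical, and the real feature scans the exploit box rather than the local machine.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Hypothetical excerpt of a registry: bucket name -> executable names.
REGISTRY: Dict[str, List[str]] = {
    "Network & Recon": ["nmap", "masscan"],
    "Reverse Engineering": ["gdb", "radare2"],
}


def detect_installed(
    registry: Dict[str, List[str]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Dict[str, bool]]:
    """Map each bucket to {tool: True if an executable was found on PATH}."""
    return {
        bucket: {tool: which(tool) is not None for tool in tools}
        for bucket, tools in registry.items()
    }
```

Passing a different `which` callable (for example one that runs the lookup over SSH) would adapt the same shape to a remote host.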

Checklist / Todo Removed

  • Removed the copilot checklist/todo list feature. The agentic workflow replaces the need for a step-by-step checklist, as the agent manages its own task progression.

Burp Suite Integration

  • Added integration with Burp Suite Professional via the burp-rpc gRPC extension.
  • Dedicated proxy history page in the UI with filtering, request/response inspection, and intercept toggle.
  • Agent tools: search_burp_proxy_history, send_to_burp_repeater, send_to_burp_intruder, burp_collaborator.
  • "Send to Workspace" action to attach captured requests to the agent's chat context.

Browser Agent (Magnitude)

  • Introduced browser automation via Magnitude. The agent can control a real browser to interact with web applications: navigate, fill forms, click buttons, extract data.
  • Configurable proxy URL to route browser traffic through Burp Suite, combining browser-based testing with Burp's analysis tools.
  • Headless and visible modes (visible mode uses the VNC display).
  • Separate model configuration for the browser agent.

VPN Management

  • Added a VPN management page for handling OpenVPN connections on the exploit box.
  • Upload .ovpn / .conf profiles from the browser.
  • Connect and disconnect VPN profiles with one click.
  • Support for multiple simultaneous VPN connections.
  • Status display showing PID, tunnel interface, and assigned IP.

Subagents

  • The agent can spawn subagents to work on tasks in parallel using the spawn_subagent tool.
  • Each subagent gets its own iteration loop (up to 15 iterations, 10-minute timeout) with access to the same tools.
  • Results are merged back into the main conversation.
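
The subagent limits above (an iteration cap plus a wall-clock timeout) can be sketched with asyncio.wait_for. The function names and the shape of a step are assumptions; this is not the spawn_subagent implementation.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# A step receives the iteration index and returns a result, or None to keep going.
StepFn = Callable[[int], Awaitable[Optional[str]]]


async def run_subagent(
    step: StepFn,
    max_iterations: int = 15,   # cap stated in the changelog
    timeout_s: float = 600.0,   # 10-minute timeout stated in the changelog
) -> str:
    async def loop() -> str:
        for i in range(max_iterations):
            result = await step(i)
            if result is not None:
                return result
        return "iteration limit reached"

    try:
        return await asyncio.wait_for(loop(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timed out"


async def spawn_all(steps: List[StepFn], **limits) -> List[str]:
    """Run subagents in parallel and return their results in input order."""
    return await asyncio.gather(*(run_subagent(s, **limits) for s in steps))
```

Because asyncio.gather preserves input order, the caller can merge results back into the main conversation deterministically.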

Setup and Configuration

  • Introduced run.sh as the primary setup and launcher script. Handles configuration, Docker builds, and container orchestration in a guided flow.
  • Three modes: Core Docker, Docker + Kali, Developer.
  • Split configuration into config.toml (static, requires restart) and backend/.env (dynamic, hot-reloadable).
  • Settings are configurable from both run.sh and the in-app Settings overlay.

Safety

  • Dangerous command detection: commands matching patterns like recursive forced deletion of system paths, writes to block devices, fork bombs, and system shutdown are flagged and require explicit approval.
  • Global consent override toggle in Settings.
  • Commands run in ~/pentest-workspace sandbox by default.
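
A pattern-based check of the kind described above could be sketched as follows. The regular expressions are illustrative guesses covering the four categories named in the notes, not the project's actual rules, and they are deliberately narrow.

```python
import re
from typing import List

# Illustrative patterns only; a production list would need to be broader.
DANGEROUS_PATTERNS = [
    ("recursive forced delete of a system path",
     re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+/(?:\*|\s|$|(?:etc|usr|bin|boot|var|lib)\b)")),
    ("write to a block device",
     re.compile(r"(?:\bof=|>\s*)/dev/(?:sd|hd|vd|nvme)\w*")),
    ("fork bomb",
     re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:")),
    ("system shutdown",
     re.compile(r"\b(?:shutdown|reboot|halt|poweroff)\b")),
]


def flag_dangerous(command: str) -> List[str]:
    """Return the label of every pattern the command matches (empty if none)."""
    return [label for label, pattern in DANGEROUS_PATTERNS if pattern.search(command)]
```

A non-empty result would be the signal to pause and request explicit approval, even in "Auto run" mode.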

Slash Commands

  • Added 8 slash commands: /help, /status, /summarize, /targets, /export, /shells, /clear, /reset.
  • Run outside the agent loop for quick session management.
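
Routing slash commands outside the agent loop can be pictured as a small dispatcher. The route function and its return shape are assumptions for illustration; the command names are the eight listed above.

```python
from typing import Tuple

SLASH_COMMANDS = {
    "/help", "/status", "/summarize", "/targets",
    "/export", "/shells", "/clear", "/reset",
}


def route(message: str) -> Tuple[str, str, str]:
    """Return ("command", name, args) for a known slash command,
    otherwise ("agent", "", message) so the text goes to the agent loop."""
    name, _, args = message.strip().partition(" ")
    if name.lower() in SLASH_COMMANDS:
        return ("command", name.lower(), args.strip())
    return ("agent", "", message)
```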

Models

  • Support for multiple LLM providers: OpenAI, Anthropic, Google, Mistral, and any OpenAI-compatible API.
  • Anthropic OAuth authentication as an alternative to API keys.
  • Reasoning mode support for models that offer extended thinking.
  • Configurable from the Settings overlay.

Context Management

  • Automatic summarization of older messages when context gets large.
  • Context usage indicator in the sidebar showing token consumption.
  • /summarize and /clear slash commands for manual context management.
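
Automatic summarization of older messages could work along these lines. This is a sketch: the 4-characters-per-token estimate, the keep_recent parameter, and the function names are assumptions, and the summarize callable stands in for a model call.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def compact_history(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    max_tokens: int,
    keep_recent: int = 4,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> List[str]:
    """Replace older messages with one summary once the history exceeds max_tokens."""
    total = sum(count_tokens(m) for m in messages)
    if total <= max_tokens or len(messages) <= keep_recent:
        return list(messages)
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent
```

Keeping the most recent messages verbatim preserves the details the agent is most likely to need for its next step.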

Shell Management

  • Shell sessions on the exploit box with PTY (interactive) and exec (non-interactive) modes.
  • Multiple named shell tabs in the UI.
  • Agent can spawn, write to, read from, and close shells programmatically.
  • Automatic SSH reconnection on connection drops.

VNC / GUI

  • Auto-setup for VNC on the exploit box.
  • Built-in VNC diagnostic and repair tools.
  • noVNC-based browser access to the Kali desktop.
  • The built-in Kali container includes XFCE, Firefox ESR, and noVNC pre-configured.

Developer Mode

  • run.sh dev starts only infrastructure (MongoDB, Redis, optionally Kali) so you can run the frontend and backend locally for development.
  • Dev mode sets connection strings to localhost automatically.
