Aquileo | feat(ci): test slicing across GH actions jobs by ethernet8023 · Pull Request #30575 · NousResearch/hermes-agent

ethernet8023 · 2026-05-22T19:34:13Z

Summary

Splits the CI test suite into 6 parallel slices using LPT (Longest Processing Time first) duration-balanced distribution, with per-file subprocess isolation.

Before: single job, ~7-8min wall time, flaky from cross-file state leakage
After: 6 parallel jobs, ~3.3min wall time, zero shared state between files
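
For illustration only, a minimal sketch of what per-file isolation can look like. It is not the script from this PR: the function names, the `cmd_prefix` parameter, and the use of `subprocess.run` (the description mentions multiprocessing) are all assumptions.

```python
import subprocess
import sys
import time
from pathlib import Path


def discover(root="tests", excluded=("integration", "e2e")):
    """Find test_*.py files under root, skipping excluded directories."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("test_*.py")
        if not set(p.relative_to(root).parts[:-1]) & set(excluded)
    )


def run_file_isolated(path, cmd_prefix=(sys.executable, "-m", "pytest", "-q")):
    """Run one test file in its own interpreter; return (exit_code, seconds)."""
    start = time.perf_counter()
    # A new process per file means module globals, monkeypatches, and
    # environment mutations cannot leak into the next file.
    proc = subprocess.run([*cmd_prefix, str(path)], capture_output=True, text=True)
    return proc.returncode, time.perf_counter() - start
```

The trade-off is interpreter startup and pytest collection cost per file, in exchange for removing cross-file state leakage.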

What changed

`scripts/run_tests_parallel.py` (new)

- Discovers all test_*.py files under tests/ (excludes integration/ + e2e/)
- Runs each file in a freshly-spawned subprocess (multiprocessing), so no interpreter state is shared
- --slice I/N flag: partitions files across N jobs using the LPT algorithm on cached durations
- HERMES_TEST_SLICE=I/N env var alternative (for CI)
- Writes test_durations.json after each run; reads it before slicing for balanced distribution
- Per-file timing display + end-of-run distribution summary
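
A rough sketch of the I/N parsing and the greedy LPT partition, under stated assumptions: `parse_slice`, `lpt_partition`, and `DEFAULT_DURATION` are hypothetical names, and the path tie-break is my own choice to keep the result deterministic across jobs. The real script may differ.

```python
import heapq

DEFAULT_DURATION = 2.0  # seconds; fallback for files with no cached timing


def parse_slice(spec):
    """Parse 'I/N' (1-based) into (index, total), validating the range."""
    index_str, total_str = spec.split("/")
    index, total = int(index_str), int(total_str)
    if total < 1 or not 1 <= index <= total:
        raise ValueError(f"invalid slice spec: {spec!r}")
    return index, total


def lpt_partition(files, durations, n_slices):
    """Greedy LPT: longest files first, each to the currently lightest slice."""
    # Sort by descending duration; the path tie-break keeps every CI job
    # computing the identical partition.
    ordered = sorted(files, key=lambda f: (-durations.get(f, DEFAULT_DURATION), f))
    heap = [(0.0, i) for i in range(n_slices)]  # (total seconds, slice index)
    slices = [[] for _ in range(n_slices)]
    for f in ordered:
        load, i = heapq.heappop(heap)
        slices[i].append(f)
        heapq.heappush(heap, (load + durations.get(f, DEFAULT_DURATION), i))
    return slices
```

Each job would compute the full partition and then run only `slices[index - 1]`, so no coordination between jobs is needed as long as they all read the same duration cache.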

`.github/workflows/tests.yml`

- 6-way matrix: slice: [1, 2, 3, 4, 5, 6]
- Each slice restores test_durations.json from actions/cache (key: test-durations)
- Each slice uploads partial durations as an artifact
- New save-durations job (main branch only) merges all slice artifacts into the cache
- PRs get the cache from main for balanced slicing from the first run

Flake fixes

- test_allows_normal_url (test_browser_secret_exfil.py): mocked _run_browser_command to avoid launching Chrome in CI while still testing the secret-exfiltration check
- test_pub_broadcasts_to_events_subscribers (test_web_server.py): wrapped receive_text() in a thread with a 10s timeout so the test fails fast instead of hanging for 30s; added a small yield (time.sleep(0.05)) to let the TestClient's event loop process the broadcast before receiving
- _broadcast_event (web_server.py): added _log.warning(...) in the except Exception block so silently-dropped broadcasts are visible in CI logs
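
The "thread with a timeout" pattern from the second fix can be sketched as below. This is a generic helper, not the code from the PR; `call_with_timeout` is a hypothetical name.

```python
import threading


def call_with_timeout(fn, timeout=10.0):
    """Run a blocking call in a daemon thread; raise TimeoutError if it stalls."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The daemon thread is abandoned; it will not block interpreter exit.
        raise TimeoutError(f"call did not finish within {timeout}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

In a test this would be used roughly as `message = call_with_timeout(ws.receive_text, timeout=10.0)`, where `ws` stands in for the websocket session object.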

Benchmark results

Tested 4/5/6/7/8 slices with LPT-balanced distribution (cached durations from prior runs):

| Slices | Wall time | Spread (max−min) |
|--------|-----------|------------------|
| 4      | 4.8m      | 135s             |
| 5      | 3.4m      | 46s              |
| 6      | 3.3m      | 26s              |
| 7      | 3.9m      | 109s             |
| 8      | 3.7m      | 96s              |

6 is the sweet spot: lowest wall time, tightest spread. At 7 and 8 slices wall time rises again (3.9m and 3.7m) because per-slice startup overhead (pytest collection + subprocess isolation) dominates when slices get too thin.

How the duration cache works

- Main branch: after every run, save-durations merges all slice artifacts into actions/cache with key test-durations
- PRs: restore the cache from main, so LPT splits files based on real timings (not the 2s default)
- First PR run may be slightly unbalanced if the cache is stale; subsequent main runs refresh it
- Stale entries are fine: better to have approximate durations for unrun files than no data at all
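
A minimal sketch of the merge step, assuming each slice artifact is a JSON object mapping relative file paths to seconds. `merge_durations` is a hypothetical name, not the job's real implementation.

```python
import json
from pathlib import Path


def merge_durations(cache_path, partial_paths):
    """Merge per-slice duration files into one cache, keeping stale entries."""
    cache_path = Path(cache_path)
    merged = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    for partial in partial_paths:
        # Newer observations overwrite older ones; files not run this time
        # keep their previous (approximate) timing.
        merged.update(json.loads(Path(partial).read_text()))
    cache_path.write_text(json.dumps(merged, indent=2, sort_keys=True))
    return merged
```

Because the merge only updates keys, entries for files that did not run in the current slices survive, which matches the "stale entries are fine" behaviour described above.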

Infographic

github-actions · 2026-05-22T19:34:57Z

🔎 Lint report: `ethie/faster-tests` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8991 on HEAD, 8991 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4770 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

run_tests_parallel.py:
- --slice I/N flag (also HERMES_TEST_SLICE env var) runs only the I-th slice of N, distributing files across slices by cached duration using LPT (Longest Processing Time first) greedy algorithm so each slice gets roughly equal wall time
- Duration cache (test_durations.json): maps relative file paths to last-observed subprocess wall time. _save_durations merges with existing cache so entries from other slices are preserved.
- Per-file subprocess timing in progress output + end-of-run distribution summary (percentiles, top-10 slowest, <1s/<2s counts)
- Unknown files default to 2.0s estimate (~P50), spread evenly by LPT

.github/workflows/tests.yml:
- Matrix strategy: slice [1, 2, 3, 4] with fail-fast: false
- Each slice restores duration cache from main (stable key, no SHA), runs its portion, uploads per-slice durations as artifacts
- save-durations job (main only, if: always()) downloads all 4 artifacts, merges into single cache entry for future PRs
- Timeout reduced from 60min to 30min per slice (~1/4 the work)

Cache design:
- Stable key (test-durations) not keyed by commit SHA: durations are about files, not commits, and SHA-keyed caches miss on every new commit and on PR merge commits
- actions/cache scoping: main's cache is visible to all PRs targeting main; feature branches without a cache still work (default 2.0s)
- No dotfile prefix (upload-artifact v7 skips hidden files)
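
As a hedged illustration of the end-of-run distribution summary mentioned in this commit message (percentiles, top-10 slowest, <1s/<2s counts), something along these lines would work. The `summarize` function and its output keys are assumptions, and the choice of P50/P90 is arbitrary.

```python
import statistics


def summarize(durations):
    """Summarize per-file timings: percentiles, slowest files, fast-file counts."""
    values = sorted(durations.values())
    # quantiles(n=100) returns 99 cut points; index 49 is P50, 89 is P90.
    # It needs at least two data points on older Python versions.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p90": cuts[89],
        "under_1s": sum(v < 1.0 for v in values),
        "under_2s": sum(v < 2.0 for v in values),
        "slowest": sorted(durations.items(), key=lambda kv: -kv[1])[:10],
    }
```

A summary like this is also a cheap way to sanity-check a default estimate for unknown files against the observed median.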

- test_browser_secret_exfil: mock _run_browser_command instead of launching real Chrome (secret check is pre-launch, browser is irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to yield the event loop before receive_text(). TestClient's sync mode can race the broadcast handler otherwise, hanging the test.

Benchmarked 4/5/6/7/8 slices with LPT duration-balanced distribution:
- 4 slices: 4.8m wall, 135s spread
- 5 slices: 3.4m wall, 46s spread
- 6 slices: 3.3m wall, 26s spread ← optimal
- 7 slices: 3.9m wall, 109s spread
- 8 slices: 3.7m wall, 96s spread

6 slices is the sweet spot: lowest wall time, tightest spread. 7+ gets slower due to per-slice startup overhead dominating.

Also removes benchmark branch markers from save-durations condition.

ethernet8023 changed the title ~~feat(ci): 4-way matrix slicing with LPT duration-balanced distribution~~ feat(ci): test slicing across GH actions jobs May 22, 2026

ethernet8023 force-pushed the ethie/faster-tests branch from c95517f to ed1a2f6 Compare May 22, 2026 19:36

alt-glitch added the type/perf (Performance improvement or optimization) and P3 (Low: cosmetic, nice to have) labels May 22, 2026

ethernet8023 added 5 commits May 22, 2026 18:04

test: 4-way slice benchmark (with cache save)

0a20c97

fix: clean push triggers

8c45ee7

ethernet8023 force-pushed the ethie/faster-tests branch from 58d4b17 to ecbb1ab Compare May 22, 2026 22:04

teknium1 merged commit dc4b046 into main May 23, 2026
26 checks passed

teknium1 deleted the ethie/faster-tests branch May 23, 2026 02:46

teknium1 merged 5 commits into main from ethie/faster-tests
