feat(ci): test slicing across GH actions jobs #30575

Merged: teknium1 merged 5 commits into main from ethie/faster-tests on May 23, 2026

ethernet8023 (Collaborator) commented May 22, 2026

Summary

Splits the CI test suite into 6 parallel slices using LPT (Longest Processing Time first) duration-balanced distribution, with per-file subprocess isolation.

Before: single job, ~7-8min wall time, flaky from cross-file state leakage
After: 6 parallel jobs, ~3.3min wall time, zero shared state between files

What changed

scripts/run_tests_parallel.py (new)

  • Discovers all test_*.py files under tests/ (excludes integration/ + e2e/)
  • Runs each file in a freshly-spawned subprocess (multiprocessing) — no shared interpreter state
  • --slice I/N flag: partitions files across N jobs using LPT algorithm on cached durations
  • HERMES_TEST_SLICE=I/N env var alternative (for CI)
  • Writes test_durations.json after each run; reads it before slicing for balanced distribution
  • Per-file timing display + end-of-run distribution summary
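The slicing described above can be sketched roughly as follows. This is a minimal illustration and not the code in scripts/run_tests_parallel.py: the function names (parse_slice, lpt_partition, select_slice), the 1-based slice index, and the tie-breaking by path are assumptions made for the example. Only the I/N format, the LPT strategy, and the 2.0s default for unknown files come from the PR text.

```python
import heapq

# Fallback estimate for files missing from the duration cache (2.0s per the PR).
DEFAULT_DURATION = 2.0


def parse_slice(spec):
    """Parse an 'I/N' spec into (index, total); index is assumed to be 1-based."""
    index_text, _, total_text = spec.partition("/")
    index, total = int(index_text), int(total_text)
    if total < 1 or not 1 <= index <= total:
        raise ValueError(f"invalid slice spec: {spec!r}")
    return index, total


def lpt_partition(files, durations, total):
    """Greedy LPT: take files longest-first, give each to the lightest slice."""
    # Ties are broken by path so every CI job computes the same partition.
    ordered = sorted(files, key=lambda f: (-durations.get(f, DEFAULT_DURATION), f))
    heap = [(0.0, i) for i in range(total)]  # (accumulated seconds, slice number)
    heapq.heapify(heap)
    slices = [[] for _ in range(total)]
    for path in ordered:
        load, i = heapq.heappop(heap)  # slice with the smallest load so far
        slices[i].append(path)
        heapq.heappush(heap, (load + durations.get(path, DEFAULT_DURATION), i))
    return slices


def select_slice(files, durations, spec):
    """Return only the files belonging to slice I of N."""
    index, total = parse_slice(spec)
    return lpt_partition(files, durations, total)[index - 1]
```

Because each job recomputes the full partition from the same cached durations and keeps only its own slice, no coordination between jobs is needed, provided the sort order is deterministic.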

.github/workflows/tests.yml

  • 6-way matrix: slice: [1, 2, 3, 4, 5, 6]
  • Each slice restores test_durations.json from actions/cache (key: test-durations)
  • Each slice uploads partial durations as an artifact
  • New save-durations job (main branch only) merges all slice artifacts into the cache
  • PRs get the cache from main for balanced slicing from the first run

Flake fixes

  • test_allows_normal_url (test_browser_secret_exfil.py): mocked _run_browser_command to avoid launching Chrome in CI while still testing the secret-exfiltration check
  • test_pub_broadcasts_to_events_subscribers (test_web_server.py): wrapped receive_text() in a thread with 10s timeout so the test fails fast instead of hanging for 30s; added a small yield (time.sleep(0.05)) to let the TestClient's event loop process the broadcast before receiving
  • _broadcast_event (web_server.py): added _log.warning(...) in the except Exception block so silently-dropped broadcasts are visible in CI logs
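The fail-fast pattern used for receive_text() can be sketched in general form as below. This is an illustrative helper under stated assumptions, not the code in test_web_server.py: call_with_timeout is a hypothetical name, and the real test may structure the thread differently.

```python
import threading


def call_with_timeout(fn, timeout):
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish in time."""
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The blocking call is still pending; fail now instead of hanging.
        raise TimeoutError(f"call did not finish within {timeout}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


# Hypothetical usage inside a test, where ws is a websocket test session:
#     message = call_with_timeout(ws.receive_text, 10.0)
```

The thread is a daemon so that a receive call that never returns cannot keep the test process alive after the timeout has been reported.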

Benchmark results

Tested 4/5/6/7/8 slices with LPT-balanced distribution (cached durations from prior runs):

| Slices | Wall time | Spread (max − min) |
|--------|-----------|--------------------|
| 4      | 4.8m      | 135s               |
| 5      | 3.4m      | 46s                |
| 6      | 3.3m      | 26s                |
| 7      | 3.9m      | 109s               |
| 8      | 3.7m      | 96s                |

6 is the sweet spot: lowest wall time, tightest spread. 7+ gets slower because per-slice startup overhead (pytest collection + subprocess isolation) dominates when slices get too thin.

How the duration cache works

  1. Main branch: after every run, save-durations merges all slice artifacts into actions/cache with key test-durations
  2. PRs: restore the cache from main, so LPT splits files based on real timings (not the 2s default)
  3. First PR run may be slightly unbalanced if cache is stale; subsequent main runs refresh it
  4. Stale entries are fine — better to have approximate durations for unrun files than no data at all
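The merge step in the list above might look roughly like this. It is a sketch under assumptions: merge_durations is a hypothetical name, and the cache is assumed to be a flat JSON object mapping relative file paths to seconds. The PR states only that the cache maps file paths to wall times and that existing entries are preserved on merge.

```python
import json
from pathlib import Path


def merge_durations(cache_path, partial_paths):
    """Merge per-slice duration files into the cache, keeping entries for unrun files."""
    cache_path = Path(cache_path)
    merged = {}
    if cache_path.exists():
        # Start from the existing cache so stale entries survive the merge.
        merged.update(json.loads(cache_path.read_text()))
    for partial in partial_paths:
        # Fresh timings from each slice overwrite older values for the same file.
        merged.update(json.loads(Path(partial).read_text()))
    cache_path.write_text(json.dumps(merged, indent=2, sort_keys=True))
    return merged
```

Starting from the old cache rather than an empty dict is what makes point 4 hold: a file that no slice ran this time keeps its last known duration instead of falling back to the default.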

Infographic (image): pr-30575-ci-slicing

github-actions (Bot, Contributor) commented May 22, 2026

🔎 Lint report: ethie/faster-tests vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8991 on HEAD, 8991 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4770 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

ethernet8023 changed the title from "feat(ci): 4-way matrix slicing with LPT duration-balanced distribution" to "feat(ci): test slicing across GH actions jobs" May 22, 2026
alt-glitch added the labels "type/perf" (Performance improvement or optimization) and "P3" (Low: cosmetic, nice to have) May 22, 2026
run_tests_parallel.py:
  - --slice I/N flag (also HERMES_TEST_SLICE env var) runs only the
    I-th slice of N, distributing files across slices by cached
    duration using LPT (Longest Processing Time first) greedy
    algorithm so each slice gets roughly equal wall time
  - Duration cache (test_durations.json): maps relative file paths to
    last-observed subprocess wall time. _save_durations merges with
    existing cache so entries from other slices are preserved.
  - Per-file subprocess timing in progress output + end-of-run
    distribution summary (percentiles, top-10 slowest, <1s/<2s counts)
  - Unknown files default to 2.0s estimate (~P50), spread evenly by LPT

.github/workflows/tests.yml:
  - Matrix strategy: slice [1, 2, 3, 4] with fail-fast: false
  - Each slice restores duration cache from main (stable key, no SHA),
    runs its portion, uploads per-slice durations as artifacts
  - save-durations job (main only, if: always()) downloads all 4
    artifacts, merges into single cache entry for future PRs
  - Timeout reduced from 60min to 30min per slice (~1/4 the work)

Cache design:
  - Stable key (test-durations) not keyed by commit SHA — durations
    are about files, not commits, and SHA-keyed caches miss on every
    new commit and on PR merge commits
  - actions/cache scoping: main's cache is visible to all PRs targeting
    main; feature branches without a cache still work (default 2.0s)
  - No dotfile prefix (upload-artifact v7 skips hidden files)
- test_browser_secret_exfil: mock _run_browser_command instead of
  launching real Chrome (secret check is pre-launch, browser is
  irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to
  yield the event loop before receive_text(). TestClient's sync mode
  can race the broadcast handler otherwise, hanging the test.
Benchmarked 4/5/6/7/8 slices with LPT duration-balanced distribution:
- 4 slices: 4.8m wall, 135s spread
- 5 slices: 3.4m wall, 46s spread
- 6 slices: 3.3m wall, 26s spread ← optimal
- 7 slices: 3.9m wall, 109s spread
- 8 slices: 3.7m wall, 96s spread

6 slices is the sweet spot: lowest wall time, tightest spread.
7+ gets slower due to per-slice startup overhead dominating.

Also removes benchmark branch markers from save-durations condition.
teknium1 merged commit dc4b046 into main May 23, 2026
26 checks passed
teknium1 deleted the ethie/faster-tests branch May 23, 2026 02:46