feat(ci): test slicing across GH actions jobs #30575
Merged
run_tests_parallel.py:
- --slice I/N flag (also HERMES_TEST_SLICE env var) runs only the
I-th slice of N, distributing files across slices by cached
duration with the LPT (Longest Processing Time first) greedy
algorithm so each slice gets roughly equal wall time
- Duration cache (test_durations.json): maps relative file paths to
last-observed subprocess wall time. _save_durations merges with
existing cache so entries from other slices are preserved.
- Per-file subprocess timing in progress output + end-of-run
distribution summary (percentiles, top-10 slowest, <1s/<2s counts)
- Unknown files default to 2.0s estimate (~P50), spread evenly by LPT
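A minimal sketch of the `--slice I/N` selection described above, assuming 1-based slice indices and the 2.0s default for uncached files. The names `parse_slice`, `lpt_partition` and `select_slice` are hypothetical, not taken from run_tests_parallel.py. The property that matters is determinism: every matrix job computes the full partition on its own from the same cache, so ties are broken by path to make all jobs agree on which files belong to which slice.

```python
import os
from typing import Dict, List, Optional, Sequence, Tuple

# Fallback estimate for files missing from the duration cache (~P50 per the commit message).
DEFAULT_ESTIMATE_S = 2.0


def parse_slice(spec: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse a 1-based 'I/N' spec into (I, N); an empty spec means 'run everything'."""
    if not spec:
        return None
    i_str, sep, n_str = spec.partition("/")
    if not sep:
        raise ValueError(f"expected I/N, got {spec!r}")
    i, n = int(i_str), int(n_str)
    if not 1 <= i <= n:
        raise ValueError(f"need 1 <= I <= N, got {spec!r}")
    return i, n


def lpt_partition(
    files: Sequence[str], durations: Dict[str, float], n: int
) -> List[List[str]]:
    """Greedy LPT: longest files first, each assigned to the currently lightest slice."""
    est = {f: durations.get(f, DEFAULT_ESTIMATE_S) for f in files}
    # Descending by estimate; the path tie-break keeps the order identical in every job.
    ordered = sorted(files, key=lambda f: (-est[f], f))
    slices: List[List[str]] = [[] for _ in range(n)]
    totals = [0.0] * n
    for f in ordered:
        # min() returns the first minimum, so ties go to the lowest slice index.
        k = min(range(n), key=totals.__getitem__)
        slices[k].append(f)
        totals[k] += est[f]
    return slices


def select_slice(
    files: Sequence[str], durations: Dict[str, float], spec: Optional[str]
) -> List[str]:
    """Return the files this job should run for the given 'I/N' spec."""
    parsed = parse_slice(spec)
    if parsed is None:
        return list(files)
    i, n = parsed
    return lpt_partition(files, durations, n)[i - 1]


if __name__ == "__main__":
    # Hypothetical inputs; the real script discovers files and loads test_durations.json.
    demo_files = ["tests/test_a.py", "tests/test_b.py", "tests/test_c.py", "tests/test_new.py"]
    demo_cache = {"tests/test_a.py": 5.0, "tests/test_b.py": 4.0, "tests/test_c.py": 3.0}
    spec = os.environ.get("HERMES_TEST_SLICE", "1/2")
    print(spec, select_slice(demo_files, demo_cache, spec))
```

With HERMES_TEST_SLICE=2/6, for example, a job would run only the second of six partitions; each partition's estimated total lands near one sixth of the suite unless a single file is longer than that on its own.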
.github/workflows/tests.yml:
- Matrix strategy: slice [1, 2, 3, 4] with fail-fast: false
- Each slice restores duration cache from main (stable key, no SHA),
runs its portion, uploads per-slice durations as artifacts
- save-durations job (main only, if: always()) downloads all 4
artifacts, merges into single cache entry for future PRs
- Timeout reduced from 60min to 30min per slice (~1/4 the work)
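The save-durations merge step could look roughly like the sketch below. It assumes each slice uploads its own test_durations.json and that the downloaded artifacts land in one sub-directory per artifact, which is the default layout of actions/download-artifact when several artifacts are downloaded without merge-multiple. The function names and the optional `previous` argument are hypothetical. Because the slices partition the files, their entries never collide; seeding with the previous cache only matters for files no slice ran, at the cost of keeping entries for deleted files around.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


def merge_slice_artifacts(
    download_dir: Path, previous: Optional[Dict[str, float]] = None
) -> Dict[str, float]:
    """Combine every per-slice durations file under download_dir into one mapping."""
    merged: Dict[str, float] = dict(previous or {})
    # Assumed layout: <download_dir>/<artifact-name>/test_durations.json
    for path in sorted(download_dir.glob("*/test_durations.json")):
        with path.open(encoding="utf-8") as fh:
            for rel_path, seconds in json.load(fh).items():
                merged[rel_path] = float(seconds)  # fresh timings overwrite older ones
    return merged


def write_cache(path: Path, durations: Dict[str, float]) -> None:
    """Write the merged mapping with sorted keys so the cache file diffs cleanly."""
    path.write_text(json.dumps(durations, indent=2, sort_keys=True) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical artifact names and timings, just to exercise the merge.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, data in {"durations-1": {"tests/test_a.py": 1.5},
                           "durations-2": {"tests/test_b.py": 3.0}}.items():
            (root / name).mkdir()
            (root / name / "test_durations.json").write_text(json.dumps(data), encoding="utf-8")
        merged = merge_slice_artifacts(root)
        write_cache(root / "test_durations.json", merged)
        print(merged)
```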
Cache design:
- Stable key (test-durations) not keyed by commit SHA — durations
are about files, not commits, and SHA-keyed caches miss on every
new commit and on PR merge commits
- actions/cache scoping: main's cache is visible to all PRs targeting
main; feature branches without a cache still work (default 2.0s)
- No dotfile prefix (upload-artifact v7 skips hidden files)
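The no-cache fallback (every file estimated at 2.0s) and the merge-on-save behaviour attributed to _save_durations can be sketched as follows. This is a hypothetical reimplementation, not the script's code: `load_durations`, `save_durations` and `estimate` are illustrative names, and the atomic write (temp file, then os.replace) is an added assumption that the PR does not mention.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict

DEFAULT_ESTIMATE_S = 2.0  # estimate for files with no cached timing


def load_durations(path: Path) -> Dict[str, float]:
    """Return cached timings, or {} when the cache is absent or unreadable."""
    try:
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):  # missing file, bad JSON, bad encoding
        return {}  # e.g. a feature branch with no restored cache
    if not isinstance(data, dict):
        return {}
    return {str(k): float(v) for k, v in data.items()}


def estimate(durations: Dict[str, float], rel_path: str) -> float:
    """Cached wall time for a file, or the default for unseen files."""
    return durations.get(rel_path, DEFAULT_ESTIMATE_S)


def save_durations(path: Path, new: Dict[str, float]) -> None:
    """Merge this run's timings over the existing cache so other slices' entries survive."""
    merged = load_durations(path)
    merged.update(new)
    # Write beside the target, then atomically swap it in so readers never see a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(merged, fh, indent=2, sort_keys=True)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # don't leave a stray temp file behind
        raise
```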
- test_browser_secret_exfil: mock _run_browser_command instead of launching real Chrome (the secret check runs before launch, so the browser is irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to yield to the event loop before receive_text(). Otherwise TestClient's sync mode can race the broadcast handler and hang the test.
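The final version of the test_web_server fix also wraps receive_text() in a thread with a 10s timeout, so a lost broadcast fails fast instead of hanging. A generic sketch of that pattern follows; `call_with_timeout` is a hypothetical helper, and in the test it would wrap something like the websocket's receive_text bound method. It deliberately uses a daemon thread: if the call really hangs, the worker is abandoned and does not keep the per-file subprocess alive at interpreter exit.

```python
import threading
import time
from typing import Any, Callable, Dict


def call_with_timeout(fn: Callable[[], Any], timeout_s: float = 10.0) -> Any:
    """Run a blocking call in a daemon thread; raise TimeoutError if it takes too long."""
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand the error back to the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The worker is abandoned; being a daemon, it will not block interpreter exit.
        raise TimeoutError(f"call did not return within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


if __name__ == "__main__":
    print(call_with_timeout(lambda: "ok", timeout_s=1.0))
    try:
        call_with_timeout(lambda: time.sleep(5), timeout_s=0.1)
    except TimeoutError as exc:
        print("timed out:", exc)
```

Errors raised inside the call are re-raised in the test thread, so a failing receive still surfaces as a normal test failure rather than being swallowed by the worker.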
Benchmarked 4/5/6/7/8 slices with LPT duration-balanced distribution:
- 4 slices: 4.8m wall, 135s spread
- 5 slices: 3.4m wall, 46s spread
- 6 slices: 3.3m wall, 26s spread ← optimal
- 7 slices: 3.9m wall, 109s spread
- 8 slices: 3.7m wall, 96s spread
6 slices is the sweet spot: lowest wall time, tightest spread. 7+ gets slower due to per-slice startup overhead dominating.
Also removes benchmark branch markers from save-durations condition.
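The wall and spread figures are easier to read with explicit definitions. The sketch below assumes wall time is the slowest slice (the matrix finishes when its last job does) and spread is slowest minus fastest; the PR does not define either, so treat both as assumptions. It also computes the standard lower bound for any N-way partition, max(total / N, longest file): once the longest file dominates, extra slices cannot shorten the wall time and only add per-slice startup overhead. Function names are hypothetical.

```python
from typing import Dict, Sequence


def slice_stats(slice_seconds: Sequence[float]) -> Dict[str, float]:
    """Wall = slowest slice (jobs run in parallel); spread = slowest minus fastest."""
    if not slice_seconds:
        raise ValueError("need at least one slice")
    slowest, fastest = max(slice_seconds), min(slice_seconds)
    return {"wall_s": slowest, "wall_min": slowest / 60, "spread_s": slowest - fastest}


def wall_lower_bound(file_seconds: Sequence[float], n: int) -> float:
    """No N-way partition can finish faster than the average load or the longest single file."""
    if n < 1 or not file_seconds:
        raise ValueError("need n >= 1 and at least one file")
    return max(sum(file_seconds) / n, max(file_seconds))


if __name__ == "__main__":
    # Made-up per-slice and per-file times, only to show the arithmetic.
    print(slice_stats([120.0, 150.0, 135.0]))
    print(wall_lower_bound([30.0, 10.0, 5.0, 5.0], 2))
    print(wall_lower_bound([30.0, 10.0, 5.0, 5.0], 4))
```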
Summary
Splits the CI test suite into 6 parallel slices using LPT (Longest Processing Time first) duration-balanced distribution, with per-file subprocess isolation.
Before: single job, ~7-8min wall time, flaky from cross-file state leakage
After: 6 parallel jobs, ~3.3min wall time, zero shared state between files
What changed
scripts/run_tests_parallel.py (new)
- test_*.py files under tests/ (excludes integration/ + e2e/)
- … multiprocessing); no shared interpreter state
- --slice I/N flag: partitions files across N jobs using the LPT algorithm on cached durations
- HERMES_TEST_SLICE=I/N env var alternative (for CI)
- … test_durations.json after each run; reads it before slicing for balanced distribution
.github/workflows/tests.yml
- Matrix strategy with slice: [1, 2, 3, 4, 5, 6]
- Each slice restores test_durations.json from actions/cache (key: test-durations)
- save-durations job (main branch only) merges all slice artifacts into the cache
Flake fixes
- test_allows_normal_url (test_browser_secret_exfil.py): mocked _run_browser_command to avoid launching Chrome in CI while still testing the secret-exfiltration check
- test_pub_broadcasts_to_events_subscribers (test_web_server.py): wrapped receive_text() in a thread with a 10s timeout so the test fails fast instead of hanging for 30s; added a small yield (time.sleep(0.05)) to let the TestClient's event loop process the broadcast before receiving
- _broadcast_event (web_server.py): added _log.warning(...) in the except Exception block so silently dropped broadcasts are visible in CI logs
Benchmark results
Tested 4/5/6/7/8 slices with LPT-balanced distribution (cached durations from prior runs):
6 is the sweet spot: lowest wall time, tightest spread. 7+ gets slower because per-slice startup overhead (pytest collection + subprocess isolation) dominates when slices get too thin.
How the duration cache works
save-durations merges all slice artifacts into actions/cache with key test-durations