Skip to content

.pr_agent_accepted_suggestions

qodo-merge-bot edited this page Jun 22, 2026 · 387 revisions
                     PR 17690 (2026-06-19)                    
[maintainability] Import order not formatted
Import order not formatted Some changed BiDi classes place `import org.openqa.selenium.Beta;` before `java.*` imports (e.g., `HasBiDi`), which is inconsistent with the repository’s google-java-format output. The CI “Format” job runs `./go format` (google-java-format) and fails if it produces diffs, so this will block CI until imports are reformatted.

Issue description

Some modified Java files have import blocks that are not in the canonical order produced by the repo’s formatter (google-java-format). This causes CI’s format check to modify files and then fail the job because the working tree is dirty.

Issue Context

CI runs ./go format, which applies google-java-format to all java/**.java files, and then fails if git diff is non-empty.

Fix

Run ./go format and commit the resulting changes, or manually reorder imports to match google-java-format output (then re-run ./go format to confirm no changes).

Fix Focus Areas

  • java/src/org/openqa/selenium/bidi/HasBiDi.java[18-23]
  • java/src/org/openqa/selenium/bidi/browser/ClientWindowInfo.java[18-23]
  • java/src/org/openqa/selenium/bidi/browser/SetDownloadBehaviorParameters.java[18-27]


                     PR 17678 (2026-06-12)                    
[reliability] `clearAll` deletes completion marker
`clearAll` deletes completion marker `RedisBackedNewSessionQueue.addToQueue` now unconditionally calls `safelyClearRedisState(reqId)` in a `finally` block, and that helper deletes the Redis completion marker (`completedKey`) and stored result keys via `clearAll`. Removing these guard/result keys can let a later/retried `complete(reqId, ...)` “win” again and/or return `tracked=false`, which can lead to incorrect result overwrites or upstream teardown decisions in concurrent/failover scenarios.

Issue description

addToQueue now always runs Redis cleanup in a finally block via safelyClearRedisState(reqId), and that cleanup calls clearAll(reqId) which deletes completedKey(reqId) and the result:* keys. These keys are required for the “winner-takes-all”/idempotent completion semantics used by complete() and for communicating the terminal result across retries/replicas; deleting them can allow late/duplicate complete() calls to succeed again or return tracked=false, triggering incorrect result overwrites or upstream teardown decisions.

Issue Context

The completion marker (completedKey) is intended to ensure only one terminal completion wins when timeouts race with success and to keep complete() idempotent across retries/replicas via a SET-NX guard. The recent change makes addToQueue exception-safe by always cleaning up Redis state, but the cleanup is now too aggressive because it removes the completion marker and stored result keys, effectively re-opening the request for a second completion in concurrent/failover scenarios.

Fix Focus Areas

  • java/src/org/openqa/selenium/grid/sessionqueue/redis/RedisBackedNewSessionQueue.java[245-269]
  • java/src/org/openqa/selenium/grid/sessionqueue/redis/RedisBackedNewSessionQueue.java[608-636]

[reliability] `addToQueue` not exception-safe
`addToQueue` not exception-safe `RedisBackedNewSessionQueue.addToQueue` registers per-request state in in-memory registries (`waiters`, `contexts`) and performs multiple Redis side effects, but the cleanup of those registries/keys only happens on the normal path after `awaitResult` returns. If any Redis write or other runtime error occurs before that point, the method can leak per-request registrations and leave partial/stale Redis queue state behind, especially during transient Redis failures/outages.

Issue description

RedisBackedNewSessionQueue.addToQueue performs multi-step side effects (registering request state in contexts/waiters, writing Redis keys, pushing a queue entry) but only cleans up that state on the happy path after awaitResult returns. If a Redis operation (or any runtime exception) occurs before the normal cleanup block, the method can exit without removing the in-memory tracking entries and without best-effort cleanup of partially-created Redis keys/list entries, leaking memory and leaving stale queue state.

Issue Context

This is concurrent/resource-heavy, Redis-backed queue code expected to run in failure-prone environments (timeouts, connection resets, closed connections), so cleanup must be bounded and exception-safe across all terminal paths, including partial failures while enqueuing. The current implementation lacks a try/finally to guarantee removal from waiters/contexts, and does not ensure best-effort Redis cleanup when failures happen early.

Fix Focus Areas

  • java/src/org/openqa/selenium/grid/sessionqueue/redis/RedisBackedNewSessionQueue.java[227-256]


                     PR 17657 (2026-06-08)                    
[reliability] Artifact outputs can collide
Artifact outputs can collide generate_bidi_library hardcodes the shared artifact output filenames (bidi-ast.json / bidi-model.json) independent of the macro instance name. A second invocation of this macro in the same Bazel package would attempt to declare the same output paths and fail Bazel analysis due to output collisions.

Issue description

generate_bidi_library always emits bidi-ast.json and bidi-model.json as output filenames. If the macro is ever instantiated more than once in the same Bazel package, Bazel will report output path collisions.

Issue Context

This PR introduces shared artifacts via js_run_binary (*_ast and *_json targets). These outputs are currently fixed strings instead of being scoped by the macro instance name.

Fix Focus Areas

  • javascript/selenium-webdriver/private/generate_bidi.bzl[157-191]

Suggested fix

  • Scope outputs by the macro instance name, e.g.:
    • ast_out = name + "-bidi-ast.json" (or name + "_ast.json")
    • model_out = name + "-bidi-model.json"
  • Keep the public artifact names stable if needed by adding a thin filegroup (or alias target) that re-exports the uniquely named file under a stable label, rather than a stable filename.


                     PR 17650 (2026-06-06)                    
[reliability] Empty driver path crash
Empty driver path crash delete-browsers-drivers.ps1 now builds $paths only from WebDriver env vars and calls Remove-Item unconditionally. If none of those env vars are set on the Windows runner, $paths is empty and PowerShell parameter binding fails before -ErrorAction applies, breaking the workflow step and leaving preinstalled drivers in place.

Issue description

On Windows, Remove-Item -Path $paths ... is called even when $paths is empty (because it’s filtered from env vars only). PowerShell throws a parameter-binding error for an empty collection before -ErrorAction SilentlyContinue can suppress anything, causing the step to fail.

Issue Context

The workflow does not set these env vars explicitly for Windows; the step relies on the runner to provide them.

Fix Focus Areas

  • scripts/github-actions/delete-browsers-drivers.ps1[12-20]
  • .github/workflows/bazel.yml[202-208]

Suggested fix approach

  • Add a guard:
    • If $paths.Count -eq 0, print a message and exit 0 (or discover driver locations via Get-Command chromedriver/msedgedriver/geckodriver and delete those paths).
  • Optionally: set these env vars in the workflow step (if known stable locations exist), or add fallback known locations used by GitHub Windows runners.

[maintainability] Misleading script filename
Misleading script filename The Windows step and script comments now indicate only drivers are deleted, but the script filename remains delete-browsers-drivers.ps1. This mismatch can mislead future changes/reviews into assuming browsers are still deleted on Windows.

Issue description

The Windows deletion behavior was narrowed to drivers-only, but the script name still suggests it deletes browsers too.

Issue Context

This is a maintainability/clarity issue (not functional) that could cause confusion later.

Fix Focus Areas

  • .github/workflows/bazel.yml[205-208]
  • scripts/github-actions/delete-browsers-drivers.ps1[1-20]

Suggested fix approach

  • Rename the script to delete-drivers.ps1 (or similar) and update the workflow invocation accordingly; or
  • Keep the filename but add a short comment in the workflow step noting that the script is drivers-only on Windows.


                     PR 17645 (2026-06-05)                    
[reliability] Quoted driver path breaks
Quoted driver path breaks The driver_finder fixture strips quotes from --browser-binary but does not strip quotes from --driver-binary before passing it into the Service. If the driver path is single-quoted (a pattern already handled elsewhere in this repo), DriverFinder will treat it as a literal path including quotes and fail file validation.

Issue description

driver_finder normalizes --browser-binary by stripping surrounding single quotes, but it passes --driver-binary through without stripping. When Service.path contains quotes, DriverFinder validates it with Path(path).is_file() and will fail.

Issue Context

The existing test harness already anticipates quoted executable paths and strips them in other code paths.

Fix Focus Areas

  • py/test/selenium/webdriver/common/driver_finder_tests.py[46-53]

Suggested change

Normalize the executable path the same way as the browser binary:

executable = request.config.option.executable
exe_path = _resolve_bazel_path(executable).strip("'") if executable else None
service = module.service.Service(executable_path=exe_path)

(Optionally, consider a small helper to avoid duplicating quote-stripping logic.)



                     PR 17644 (2026-06-05)                    
[reliability] `xdpyinfo` loop never fails
`xdpyinfo` loop never fails The workflow’s X server readiness wait loop can time out without failing, so the step still proceeds to start `fluxbox` (backgrounded) even when `xdpyinfo` never succeeds and no display is actually available. This violates the requirement to validate preconditions and to have deterministic timeout failure paths for async waits, and can preserve the same intermittent/silent failure mode the wait was meant to eliminate.

Issue description

The X server (Xvfb) readiness polling loop does not have a deterministic failure path: if xdpyinfo never succeeds within the timeout, the workflow still starts fluxbox in the background and continues, allowing silent/intermittent failures when no display is actually ready.

Issue Context

This step is intended to eliminate a race where fluxbox starts before Xvfb is ready. CI/workflow scripts must validate preconditions and fail safe on errors, and async waits must have explicit timeouts with actionable failure when the timeout is exceeded; without a post-loop assertion, the race is only reduced, not eliminated.

Fix Focus Areas

  • .github/workflows/bazel.yml[171-182]


                     PR 17627 (2026-06-04)                    
[correctness] Null transport factory result
Null transport factory result BiDiOptionsBuilder.UseTransport(Func) does not validate that the factory returns a non-null ITransport, so a null return will flow into BiDi.ConnectAsync and crash later when Broker calls SendAsync/ReceiveAsync on the transport.

Issue description

UseTransport(Func<ITransport>) validates the factory delegate itself, but not the object it returns. If factory() returns null, BiDi.ConnectAsync will pass it to Broker, leading to a NullReferenceException at runtime.

Issue Context

BiDi.ConnectAsync awaits TransportFactory and immediately constructs Broker(transport, bidi).

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/BiDiOptionsBuilder.cs[56-63]

Suggested fix

Update the wrapper delegate to validate the result:

  • Call var transport = factory();
  • ArgumentNullException.ThrowIfNull(transport);
  • Return Task.FromResult(transport)

Optionally, wrap exceptions from factory() into a faulted Task<ITransport> for consistency with async factory patterns.



                     PR 17626 (2026-06-04)                    
[correctness] Java rerun blocks other releases
Java rerun blocks other releases The publish matrix exits with an error on any rerun for the Java entry before it checks whether Java is actually being released, so rerunning a ruby/dotnet/javascript-only release will still fail due to Java. This prevents reruns for non-Java patch releases (and any rerun attempt >1) even when Java would otherwise be skipped.

Issue description

The publish matrix currently fails on rerun attempts (github.run_attempt != '1') for the Java matrix entry even when the workflow is releasing a different language (e.g., selenium-4.28.1-ruby). This happens because the Java rerun guard runs before the “is this language selected?” check.

Issue Context

  • publish runs a matrix over [java, ruby, dotnet, javascript].
  • For patch releases, parse-tag sets outputs.language to the tag suffix (e.g., ruby).
  • On reruns, the Java matrix entry should skip when outputs.language != 'java', but it currently fails early.

Fix

Move or gate the Java rerun guard so it only triggers when Java is actually being released:

Option A (recommended): nest the Java guard inside the selected-language branch:

if [ "${{ needs.parse-tag.outputs.language == 'all' || needs.parse-tag.outputs.language == matrix.language }}" = "true" ]; then
  if [ "${{ matrix.language == 'java' && github.run_attempt != '1' }}" = "true" ]; then
    echo "::error::Java release is not yet rerun-safe — ..."
    exit 1
  fi
  ./go ${{ matrix.language }}:release
else
  echo skipping
fi

Option B: add the selected-language predicate directly into the Java guard condition.

Fix Focus Areas

  • .github/workflows/release.yml[144-153]


                     PR 17625 (2026-06-04)                    
[reliability] `UseTransport` ignores cancellationToken
`UseTransport` ignores cancellationToken `BiDiOptionsBuilder.UseTransport(ITransport transport)` sets `TransportFactory` to return `Task.FromResult(transport)` and discards the provided `CancellationToken`, so `BiDi.ConnectAsync(..., cancellationToken)` cannot be cancelled (and may even succeed when the token is already cancelled) when this injected-transport path is used. This breaks expected .NET cancellation semantics, is inconsistent with the default WebSocket transport path that honors cancellation, and can lead to non-deterministic hangs or delayed cancellation during connection setup.

Issue description

BiDiOptionsBuilder.UseTransport(ITransport transport) creates a TransportFactory that ignores the CancellationToken parameter (returns a completed task via Task.FromResult(transport)), so BiDi.ConnectAsync(..., cancellationToken) cannot cancel transport acquisition and may even proceed successfully when the token is already cancelled.

Issue Context

BiDi.ConnectAsync forwards the caller’s cancellationToken into builder.TransportFactory(...). The default WebSocket transport factory honors cancellation during connection (e.g., via ClientWebSocket.ConnectAsync), so callers expect consistent .NET cancellation semantics across transport configuration paths; however, the injected transport-instance path currently discards the token.

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/BiDiOptionsBuilder.cs[53-56]

Suggested fix

Update the instance-based factory to observe cancellation before returning the provided transport instance, e.g.:

  • implement UseTransport((_, ct) => ct.IsCancellationRequested ? Task.FromCanceled<ITransport>(ct) : Task.FromResult(transport)), or equivalently call ct.ThrowIfCancellationRequested() before returning the transport.

[correctness] No transport null-check
No transport null-check BiDiOptionsBuilder.UseTransport captures the provided transport without validating it, so passing null makes TransportFactory return null and later causes a NullReferenceException when Broker uses the transport. This turns a simple caller mistake into a late, non-actionable runtime crash during connection setup.

Issue description

BiDiOptionsBuilder.UseTransport(ITransport transport) does not validate transport and will happily set TransportFactory to return null when the caller passes null, leading to a runtime NullReferenceException downstream.

Issue Context

BiDi.ConnectAsync calls builder.TransportFactory(...) and passes the resulting transport into Broker, which unconditionally calls SendAsync, ReceiveAsync, and DisposeAsync on that instance.

Fix

  • Add ArgumentNullException.ThrowIfNull(transport); at the start of UseTransport.

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/BiDiOptionsBuilder.cs[49-58]


                     PR 17620 (2026-06-03)                    
[reliability] `delete-browsers-drivers.sh` ignores errors
`delete-browsers-drivers.sh` ignores errors The browser/driver deletion script does not validate inputs or command results, and it can pass empty path arguments to `rm`, producing noisy/undefined behavior while still continuing. This violates the requirement to fail safely and handle external command failures explicitly in CI automation.

Issue description

scripts/github-actions/delete-browsers-drivers.sh does not use strict mode and runs sudo rm -rf with possibly-empty env-derived paths, without validating that deletions succeeded.

Issue Context

In CI, silent cleanup failures can cause later steps to use unintended system browsers/drivers or run out of disk space without a clear signal.

Fix Focus Areas

  • scripts/github-actions/delete-browsers-drivers.sh[7-21]

[security] rm option injection risk
rm option injection risk `delete-browsers-drivers.sh` passes env-var-derived paths directly to `sudo rm -rf` without a `--` terminator or safety checks, so values beginning with `-` can be parsed as options and lead to unintended deletions. This is a defensive-scripting regression versus the prior cleanup pattern and can break CI runners/jobs in hard-to-debug ways.

Issue description

sudo rm -rf is invoked with arguments sourced from environment variables (CHROMEWEBDRIVER, EDGEWEBDRIVER, GECKOWEBDRIVER) without a -- end-of-options marker or validation. If any value begins with -, rm can treat it as an option; combined with another argument like /, this can delete unintended paths.

Issue Context

This script is intended to delete specific preinstalled browser/driver binaries on CI runners, so it should be robust to unexpected env var contents and avoid option injection.

Fix

  • Build a list/array of paths to delete.
  • Filter out empty values.
  • Reject dangerous values (/, ., ..) and any value starting with - (or force them to be treated as operands via --).
  • Invoke rm as: sudo rm -rf -- "${paths[@]}".

Fix Focus Areas

  • scripts/github-actions/delete-browsers-drivers.sh[11-21]


                     PR 17617 (2026-06-03)                    
[maintainability] Format guidance mismatches CI
Format guidance mismatches CI AGENTS.md now claims running `./scripts/format.sh` before pushing avoids CI formatter failures, but the CI format job still runs `./go format` and fails on any diff produced by that task. Since `./go format` runs the full Rake formatting suite unconditionally while `scripts/format.sh` is change-scoped, contributors may follow the new guidance yet still fail CI or receive conflicting instructions.

Issue description

AGENTS.md was updated to recommend ./scripts/format.sh / --pre-push to prevent CI formatter failures, but CI currently runs ./go format and reports failures based on that command. This creates inconsistent guidance and a real risk that contributors follow the documented workflow yet still hit CI failures.

Issue Context

  • ./go format executes the Rake :format task, which formats across all languages unconditionally.
  • ./scripts/format.sh is change-scoped (runs some language formatters only when matching paths changed).
  • CI currently enforces formatting by running ./go format (and tells contributors to run it).

Fix Focus Areas

Pick one clear source-of-truth and make both CI + docs consistent:

  • Update CI workflows to run ./scripts/format.sh (optionally --pre-push) and update error messages to reference it, OR
  • Update AGENTS.md to explicitly state CI runs ./go format and recommend ./go format as the fallback/source-of-truth if CI fails.

References:

  • AGENTS.md[69-78]
  • .github/workflows/ci-lint.yml[47-74]
  • Rakefile[128-144]
  • scripts/format.sh[1-8], scripts/format.sh[35-137]


                     PR 17615 (2026-06-02)                    
[reliability] Retry catches too narrowly
Retry catches too narrowly Driver._start_local_driver retries only on WebDriverException, so any other exception type raised during session creation (e.g., from the HTTP layer) will bypass the retry and fail immediately. This undermines the intended stabilization for intermittent startup/connectivity failures.

Issue description

Driver._start_local_driver only catches WebDriverException, but driver startup can raise other exception types during the initial NEW_SESSION request path (the HTTP request call site is not wrapped here, and RemoteWebDriver.start_session re-raises the original exception type). As a result, retries may not trigger for common startup-connect failures.

Issue Context

RemoteConnection._request directly calls urllib3 without catching exceptions, and RemoteWebDriver.start_session catches generic Exception only to stop the service and then re-raises the same exception.

Fix Focus Areas

  • py/conftest.py[467-484]

Suggested approach

  • Expand the retry handler to catch broader exceptions for local startup (e.g., Exception or a targeted tuple that includes WebDriverException plus networking errors), while still re-raising immediately on the final attempt.
  • Keep the warning log including exception type/message to preserve debuggability.


                     PR 17614 (2026-06-02)                    
[reliability] Wrong `AdditionalMessageData` JSON assertion
Wrong `AdditionalMessageData` JSON assertion `CommandAdditionalMessageDataIsSerializedAsTopLevelFields` asserts a substring that cannot appear in the produced command JSON because it assumes `AdditionalMessageData` fields are the first properties and followed by a trailing comma, while the broker writes `id`, `method`, and `params` first and appends `AdditionalMessageData` after `params` (often as the final property). This makes the test fail (or become ordering-dependent) even when the implementation is correct, blocking CI and undermining the required coverage for the new behavior.

Issue description

CommandAdditionalMessageDataIsSerializedAsTopLevelFields currently asserts the sent JSON contains the substring {"baz":"qux",, which assumes property ordering (that baz is first) and a trailing comma. In reality, the message writer serializes id, method, and params first, then appends CommandOptions.AdditionalMessageData fields after params, so baz will typically not be at the start and may be the last property (no trailing comma), making the test fail even when serialization is correct.

Issue Context

This is a new/updated unit test intended to validate that CommandOptions.AdditionalMessageData is serialized into the top-level command envelope. To satisfy the required test coverage (PR Compliance ID 18) without brittleness, the assertion should validate the semantic behavior (a top-level property baz with value qux) in an order/format-independent way, e.g., by parsing the outgoing JSON or by using a substring match that doesn’t depend on ordering or commas.

Fix Focus Areas

  • dotnet/test/webdriver/BiDi/SessionUnitTests.cs[158-170]

[correctness] Non-object JSON silently dropped
Non-object JSON silently dropped `AdditionalData(string json)` accepts any JSON root (including arrays/null/primitives), but the rest of the type treats non-object roots as “empty”, so the provided data is silently ignored and won’t be merged into outgoing commands. This can hide user mistakes and make the new extensibility feature unreliable to debug.

Issue description

AdditionalData(string json) currently accepts any valid JSON, but AdditionalData is used as a property-bag (object) for extension properties. When callers provide a valid non-object JSON (e.g., "null", "[]", "123"), IsEmpty becomes true and the data is silently dropped (never merged/sent).

Issue Context

The API surface suggests the string overload is a convenience for providing additional properties. Silent dropping is surprising and makes debugging extensions difficult.

Fix

In AdditionalData(string json), validate doc.RootElement.ValueKind == JsonValueKind.Object.

  • If not an object, throw ArgumentException (or JsonException) with a clear message (e.g., "AdditionalData JSON must be an object").

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/AdditionalData.cs[39-43]

[correctness] TryGetValue throws on empty
TryGetValue throws on empty `AdditionalData.TryGetValue` calls `JsonElement.TryGetProperty` without guarding for non-object/empty state, so calling it on `AdditionalData.Empty` (the default) can throw instead of returning `false`.

Issue description

AdditionalData.TryGetValue should follow the .NET Try-pattern and never throw for an empty value. Currently, AdditionalData.Empty is default (non-object JsonElement), but TryGetValue calls _data.TryGetProperty(...) unconditionally.

Issue Context

IsEmpty is defined as _data.ValueKind != JsonValueKind.Object, meaning Empty/default instances are explicitly non-object, yet TryGetValue does not check IsEmpty (or ValueKind) before accessing object-only APIs.

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/AdditionalData.cs[62-66]

Suggested change

Implement:

  • if (IsEmpty) { value = default; return false; }
  • otherwise call _data.TryGetProperty(key, out value)

Optionally also guard the indexer similarly (or leave it throwing by design, but ensure TryGetValue is safe).


[correctness] Options AdditionalData dropped
Options AdditionalData dropped CommandOptions.AdditionalData is never merged into the serialized "params" object, so callers setting options.AdditionalData will silently lose those fields in outgoing commands. Since command parameter objects are internal and constructed without copying options.AdditionalData, public APIs effectively cannot send params-level extension fields.

Issue description

CommandOptions.AdditionalData is defined as part of the public options surface, but it is never applied to the serialized command params. As a result, options-based additional params data is silently dropped, and callers cannot practically use params-level extension data because the concrete *Parameters types are internal and created inside modules.

Issue Context

Broker.ExecuteAsync serializes only @params as "params", and only uses options.AdditionalMessageData for envelope-level additions.

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/Broker.cs[68-109]
  • dotnet/src/webdriver/BiDi/Command.cs[50-59]

Implementation guidance

  • In Broker.ExecuteAsync (centralized fix): if options?.AdditionalData is non-empty, merge it into @params.RawAdditionalData (or set @params.AdditionalData via the backing RawAdditionalData) before calling JsonSerializer.Serialize.
  • Define merge behavior for key collisions (preferably throw a clear exception if the same key already exists in @params.RawAdditionalData).
  • Add/adjust tests to assert that options.AdditionalData appears inside the serialized params object.

[reliability] Reserved keys overridable
Reserved keys overridable Broker.ExecuteAsync writes options.AdditionalMessageData properties at the message root without filtering reserved keys, allowing duplicates of "id", "method", or "params" in the emitted JSON. If the remote end honors the last duplicate name, command correlation can break (timeouts or completing the wrong command).

Issue description

AdditionalMessageData is appended to the outgoing command envelope as arbitrary root-level JSON properties. Without validation, callers can provide keys like id, method, or params, producing duplicate JSON property names and potentially breaking protocol semantics and command correlation.

Issue Context

Outgoing envelope already writes id, method, and params before iterating AdditionalMessageData.

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/Broker.cs[95-109]

Implementation guidance

  • Add a reserved-name check before writing each AdditionalMessageData entry.
    • At minimum reject: id, method, params.
    • Consider also rejecting other envelope keys that appear in incoming messages: type, result, error, message.
  • On collision, throw an ArgumentException (or similar) with a clear message so failures are deterministic and debuggable.
  • Add unit/serialization tests ensuring reserved keys are rejected and non-reserved keys serialize correctly.

[maintainability] Whitespace-only line in `AdditionalData`
Whitespace-only line in `AdditionalData` `AdditionalData(string json)` contains a whitespace-only blank line, which can violate repository formatting standards and trigger formatter/CI churn. This should be normalized (empty line with no trailing spaces).

Issue description

A whitespace-only blank line was introduced, which may violate repo formatting standards and be modified by the formatter.

Issue Context

The new line in AdditionalData(string json) appears to contain indentation/trailing whitespace on an otherwise blank line.

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/AdditionalData.cs[47-47]

[reliability] `CanGetStatusWithAdditionalData` lacks assertions
`CanGetStatusWithAdditionalData` lacks assertions The new `CanGetStatusWithAdditionalData` test exercises the API but only asserts that `status` is non-null, so it does not actually validate that `AdditionalData`/`AdditionalMessageData` are serialized/propagated and handled as intended. This leaves the new behavior effectively untested and risks regressions where additional data is silently dropped while the test still passes.

Issue description

CanGetStatusWithAdditionalData() currently acts as a smoke test (it only asserts status is non-null) and does not verify the PR’s intended behavior that AdditionalData/AdditionalMessageData are actually serialized onto the wire and/or propagated, so the feature it claims to test can regress without failing.

Issue Context

PR Compliance ID 19 expects meaningful coverage for changed behavior; this PR adds support for serializing/propagating extra JSON properties, but the test would still pass even if the additional data is ignored/dropped. Since StatusAsync talks to a real remote end, the response may not echo extension fields, so validating serialization typically requires inspecting the outgoing message (e.g., via a stub/mock transport or a broker-level unit test), or alternatively renaming the test to reflect that it only verifies the call succeeds without error.

Fix Focus Areas

  • dotnet/test/webdriver/BiDi/Session/SessionTests.cs[46-59]

[correctness] ExecuteAsync mutates passed parameters
ExecuteAsync mutates passed parameters `Broker.ExecuteAsync` merges `options.AdditionalData` by mutating the caller-provided `@params.RawAdditionalData` before serialization, introducing observable side effects and breaking immutability expectations for command inputs. Because `Parameters.RawAdditionalData` is publicly readable and exposes a mutable `Dictionary`, consumers can also mutate parameter state after construction/passing to execution, leading to races and non-deterministic serialization if instances are reused concurrently.

Issue description

Broker.ExecuteAsync currently applies options.AdditionalData by mutating the caller-provided @params.RawAdditionalData prior to serialization, creating observable side effects and potential races if parameters are reused or accessed concurrently. Separately, Parameters.RawAdditionalData is publicly readable and exposes a mutable Dictionary<string, JsonElement>, allowing callers to mutate command inputs after construction, undermining the compliance requirement for immutable command option/input surfaces.

Issue Context

PR Compliance ID 8 requires command option/input types to be immutable from the public API surface to prevent post-creation mutation from affecting execution/serialization. Even if CommandOptions is init-only, mutating the provided parameters during send makes behavior non-deterministic when parameter instances are cached, reused, or concurrently accessed; exposing a mutable dictionary via a public getter similarly enables external mutation after the command parameters have been created or handed off.

Fix Focus Areas

  • dotnet/src/webdriver/BiDi/Broker.cs[100-107]
  • dotnet/src/webdriver/BiDi/Command.cs[38-40]


                     PR 17613 (2026-06-02)                    
[reliability] Skipped tests now run
Skipped tests now run scripts/github-actions/ci-build.sh now runs `bazel test --config=rbe-ci //...` without the prior explicit deselection mechanism, so targets not tagged `skip-rbe` will start executing on RBE CI. Many Selenium browser tests/macros add `requires-network` but do not add `skip-rbe`, while the repo documents RBE network limitations, so this change can reintroduce CI flakes/failures.

Issue description

ci-build.sh now runs all tests (//...) under --config=rbe-ci without excluding the targets that were previously being deselected for RBE CI. Because test:rbe only filters -skip-rbe, any RBE-incompatible targets that are not tagged skip-rbe will now execute on RBE.

Issue Context

  • test:rbe uses --test_tag_filters=-skip-rbe and does not exclude requires-network.
  • Selenium browser test rules commonly include requires-network (via COMMON_TAGS) and are frequently generated by macros (Java selenium_test, Ruby rb_integration_test), so the safest replacement for the removed deselection list is to tag the formerly-deselected targets (or split suites) with skip-rbe.

Fix Focus Areas

  • scripts/github-actions/ci-build.sh[15-20]
  • .bazelrc.remote[35-37]
  • common/browsers.bzl[1-8]
  • java/private/selenium_test.bzl[22-56]
  • java/test/org/openqa/selenium/chrome/BUILD.bazel[26-47]
  • rb/spec/tests.bzl[181-257]
  • py/BUILD.bazel[64-76]

Acceptable fixes (pick one)

  1. Preferred: Add skip-rbe tags to the specific targets that were previously deselected (may require splitting macros/suites so only those targets get the tag).
  2. Temporarily restore an explicit exclusion mechanism in CI until tagging is completed.
  3. If intentional, update RBE filtering logic to exclude the appropriate tag(s) (e.g., requires-network)—but only if that matches desired CI coverage.


                     PR 17599 (2026-05-30)                    
[security] Unmasked WebSocket URI
Unmasked WebSocket URI JdkHttpClient.openSocket() puts the raw URI into ConnectionFailedException messages without applying the existing credential-masking helper. If the configured base URI contains userinfo (or other sensitive components), those values can be exposed via exception text/logging.

Issue description

openSocket() now embeds the raw URI in several ConnectionFailedException messages, but does not apply the existing maskUrlCredentials(...) logic used elsewhere in this class. This can leak embedded credentials when URIs contain userInfo.

Issue Context

  • maskUrlCredentials is already implemented and used in other error paths.
  • messages.getRawUri(req) returns URI.create(rawUrl) with userInfo intact.

Fix Focus Areas

  • Wrap URIs in maskUrlCredentials(...) before interpolating into exception messages.

  • Avoid calling messages.getRawUri(request) inside the URISyntaxException catch in a way that could throw a new exception; use a safe fallback (e.g., request.getUri()), then mask.

  • java/src/org/openqa/selenium/remote/http/jdk/JdkHttpClient.java[157-262]

  • java/src/org/openqa/selenium/remote/http/jdk/JdkHttpClient.java[516-546]

  • java/src/org/openqa/selenium/remote/http/jdk/JdkHttpMessages.java[123-147]



                     PR 17597 (2026-05-30)                    
[reliability] os-sensitive tag filtered
os-sensitive tag filtered The new driver_finder_spec target sets `tags = ["os-sensitive"]`, but rb_integration_test filters that tag to only apply to specific browsers, so chrome/firefox/ie variants of this test will not actually be tagged os-sensitive. This breaks tag-based selection/exclusion consistency (some browser variants will be included/excluded while others won’t).

Issue description

driver_finder_spec is configured with tags = ["os-sensitive"], but the rb_integration_test macro treats os-sensitive as a browser-filtered tag. As a result, the driver_finder_spec-{chrome,firefox,ie} targets won’t actually receive the os-sensitive tag, making tag filtering inconsistent across browser variants.

Issue Context

  • rb/spec/integration/selenium/webdriver/BUILD.bazel adds a dedicated rb_integration_test stanza for _NO_GRID with tags = ["os-sensitive"].
  • rb/spec/tests.bzl filters os-sensitive to only apply to chrome-beta, edge, firefox-beta, and safari.

Fix Focus Areas

Choose one of these approaches:

  1. If driver_finder_spec should be os-sensitive for all browsers, use a universal tag (one that is not filtered) or adjust tag filtering to include all browsers this target runs on.
  2. If it is only os-sensitive for some browsers, set tags in a way that matches the intended browsers (e.g., update _BROWSER_TAG_FILTERS["os-sensitive"] or use a new tag name).
  • rb/spec/integration/selenium/webdriver/BUILD.bazel[65-76]
  • rb/spec/tests.bzl[169-179]

[maintainability] Edge cache test excluded
Edge cache test excluded The "downloads the browser into the Selenium cache" example is guarded with `except: {browser: %i[safari ie edge]...}`, so it will never run for Edge even though this spec is executed in an Edge browser matrix. This removes coverage for validating Edge browser downloads into `SE_CACHE_PATH` under `SE_FORCE_BROWSER_DOWNLOAD`.

Issue description

driver_finder_spec.rb excludes browser: :edge from the "downloads the browser into the Selenium cache" example. Since the spec target is executed for Edge via the Bazel browser matrix, this prevents validating browser-download caching behavior for Edge at all.

Issue Context

The guard system supports combining multiple conditions (e.g., browser + platform), and the suite already provides :platform and :browser conditions.

Fix Focus Areas

  • rb/spec/integration/selenium/webdriver/driver_finder_spec.rb[54-56]

Suggested change

Decide the intended behavior:

  • If Edge should only be excluded on specific platforms (e.g., Windows where Edge may already be present), make the exclusion conditional, e.g. add platform: :windows (or the relevant platform(s)).
  • Otherwise, remove :edge from the exclusion list and/or update the guard reason to reflect the actual rationale for excluding Edge from browser-download caching verification.

[maintainability] Unconditional debug logging
Unconditional debug logging `SpecSupport::Helpers#includes_path?` unconditionally calls `warn`, emitting debug output (including full paths) to STDERR on every invocation. This will spam CI logs during the browser-matrix integration tests and can obscure real warnings or contribute to log-size related flakiness.

Issue description

includes_path? prints a warn "DEBUG ..." line unconditionally, which pollutes integration test output.

Issue Context

This helper is used by the new DriverFinder integration spec and will be called repeatedly across the browser matrix, so the debug output will be amplified in CI.

Fix Focus Areas

  • rb/spec/integration/selenium/webdriver/spec_support/helpers.rb[149-151]

Suggested fix

Remove the warn line entirely, or guard it behind an opt-in flag (e.g., if ENV['SE_DEBUG_INCLUDES_PATH'] == 'true') so normal CI runs are silent.


[correctness] Cache path check fails
Cache path check fails `SpecSupport::Helpers#includes_path?` converts paths using `Platform.unix_path` and then checks for a `"#{root}/"` prefix, which will be false for Windows-style paths after `unix_path` turns `/` into `\`. This will cause the new DriverFinder cache assertions to fail on Windows even when Selenium Manager correctly downloads into the cache.

Issue description

includes_path? currently mixes Windows normalization (via Platform.unix_path, which normalizes to OS separators) with a hard-coded '/' prefix check. On Windows this makes the check always return false.

Issue Context

  • Platform.unix_path normalizes by converting File::ALT_SEPARATOR to File::SEPARATOR. On Windows, that converts / to \\.
  • includes_path? then assumes / separators when doing start_with?("#{root}/").

Fix Focus Areas

  • rb/spec/integration/selenium/webdriver/spec_support/helpers.rb[146-150]

Suggested fix

Update includes_path? to normalize both path and root to a consistent separator before comparing, e.g.:

  • Convert backslashes to forward slashes using tr('\\', '/') (works cross-platform).
  • File.expand_path both values to avoid relative-path surprises.
  • On Windows, consider case-insensitive comparison for drive letters (downcase) before start_with?.

Example implementation:

def includes_path?(path, root)
  path = File.expand_path(path).tr('\\', '/')
  root = File.expand_path(root).tr('\\', '/').chomp('/')
  if WebDriver::Platform.windows?
    path = path.downcase
    root = root.downcase
  end
  path.start_with?("#{root}/")
end


                     PR 17594 (2026-05-30)                    
[correctness] pkg_archive deletes extracted tree
pkg_archive deletes extracted tree `_pkg_archive_impl` unconditionally deletes `pkg_name` after the (optional) `move` loop, so when `move` is empty the rule removes the only extracted payload and leaves the repository with no usable contents. This is a functional regression because `move` is not a mandatory attribute.

Issue description

pkg_archive always deletes the expanded package directory (pkg_name) at the end. Since move is optional, a caller can legally omit it, in which case nothing is moved out and the extracted tree is deleted, leaving an empty external repository.

Issue Context

The rule’s attrs define move as a non-mandatory attr.string_dict(), so an empty dict is a valid configuration.

Fix Focus Areas

  • common/private/pkg_archive.bzl[50-59]
  • common/private/pkg_archive.bzl[60-71]

Suggested fix

  • Always delete the downloaded archive (download_name).
  • Only delete the expanded directory (pkg_name) if move is non-empty (i.e., you actually moved desired artifacts out), or make move mandatory / explicitly fail() when move is empty so the rule can’t silently delete everything.


                     PR 17576 (2026-05-27)                    
[correctness] `check_error_with_driver_in_path` now errors
`check_error_with_driver_in_path` now errors `check_error_with_driver_in_path` now suppresses errors only when `*is_driver_in_path` is true and `self.get_browser_path().is_empty()`, whereas it previously suppressed errors whenever `*is_driver_in_path` was true. Because `browser_path` can be populated during auto-detection, this changes user-visible behavior for driver-in-PATH fallback and can cause previously-successful setups to abort, breaking upgrades for consumers relying on the prior non-failing behavior.

Issue description

check_error_with_driver_in_path() has changed fallback semantics: it now propagates errors when *is_driver_in_path is true but get_browser_path() is non-empty, whereas previously driver-in-PATH mode would swallow these errors and continue. Because browser_path is frequently populated by auto-detection (not just explicit user input like --browser-path), this additional predicate can cause setup flows that intend to fall back to a PATH driver to abort during later failures (e.g., driver version discovery), creating a breaking behavior change for upgrades.

Issue Context

  • setup() computes use_driver_in_path based on detecting a driver in PATH and the original browser version being empty, then routes errors from discovery/download/version steps through check_error_with_driver_in_path().
  • Browser discovery/version logic calls detect_browser_path() when browser_path is empty; detect_browser_path() updates configuration via set_browser_path(), so get_browser_path() reflects derived/auto-detected state, not solely user-provided state.
  • The PR’s main goal is switching command execution to argv, but this extra gating changes error/fallback semantics relied on by use_driver_in_path flows and may violate the “no breaking changes on upgrade” requirement.

Fix Focus Areas

  • rust/src/lib.rs[804-854]
  • rust/src/lib.rs[950-964]

[observability] Trace logs empty browser path
Trace logs empty browser path The trace log at line 1177 fires before `detect_browser_path()` is called (lines 1178-1181), so when `get_browser_path()` returns an empty string the trace message shows an empty path, giving a misleading diagnostic when `--trace` is used to debug browser detection failures.

Issue description

The trace log *** Browser path {} is emitted before detect_browser_path() is called, so it always shows the raw (often empty) value from get_browser_path() rather than the resolved path that will actually be used.

Issue Context

In general_discover_browser_version, browser_path starts as self.get_browser_path().to_string() (empty when no --browser-path flag is given). The trace is logged immediately, then detect_browser_path() may populate it. The trace should reflect the final resolved path.

Fix Focus Areas

  • rust/src/lib.rs[1175-1181]: Move the .trace(...) call to after the if browser_path.is_empty() block (i.e., after line 1181), so it logs the path that will actually be used for version probing.

[correctness] Broken version argv flag
Broken version argv flag On non-Windows, `general_discover_browser_version` passes `cmd_version_arg` as a literal argv element, but Chrome/Edge/Firefox callers pass the format templates `DASH_DASH_VERSION`/`DASH_VERSION` (e.g. "{}{}{} --version"), so the browser is invoked with an invalid argument and version discovery will fail.

Issue description

general_discover_browser_version now builds a direct std::process::Command with args = [cmd_version_arg]. However, current callers pass DASH_VERSION / DASH_DASH_VERSION, which are format templates (contain {} placeholders) and were previously only safe because the old code formatted them into a full command string before going through sh -c.

This makes non-Windows browser version discovery invoke the browser with an invalid literal argument like "{}{}{} --version".

Issue Context

  • DASH_VERSION / DASH_DASH_VERSION are currently defined as format templates.
  • Chrome/Edge/Firefox pass these constants into general_discover_browser_version.
  • The non-Windows path now directly uses cmd_version_arg.to_string() as an argv value.

Fix Focus Areas

  • rust/src/lib.rs[87-88]
  • rust/src/lib.rs[1169-1211]
  • rust/src/chrome.rs[281-287]
  • rust/src/edge.rs[165-189]
  • rust/src/firefox.rs[194-200]

Suggested fix

Option A (minimal): Change the constants to be actual flags:

  • DASH_VERSION = "-v"
  • DASH_DASH_VERSION = "--version"

Option B (API cleanup): Change general_discover_browser_version to accept Vec<String> (or &[&str]) for version args, and update each manager to pass vec!["--version".into()] (or vec!["-v".into()]).



                     PR 17575 (2026-05-27)                    
[reliability] Cache triggers miss lockfiles
Cache triggers miss lockfiles The gh-cache workflow’s push path filter omits pnpm-lock.yaml and multitool.lock.json, so updates to these Bazel dependency inputs won’t trigger cache population for macOS/Windows. Because MODULE.bazel uses these files to generate external repositories, CI may still need to download new artifacts (and hit the same transient 502/cert failures) until the next scheduled run.

Issue description

.github/workflows/gh-cache.yml uses a push.paths filter to decide when to repopulate the GitHub cache, but it does not include key dependency inputs (pnpm-lock.yaml, multitool.lock.json) that are used by Bazel to generate external repositories. This means macOS/Windows cache population will not run when those files change, leaving caches stale.

Issue Context

MODULE.bazel references both //:multitool.lock.json (rules_multitool hub) and //:pnpm-lock.yaml (npm_translate_lock). These files directly affect what external assets Bazel will download.

Fix Focus Areas

  • .github/workflows/gh-cache.yml[4-15]

Proposed fix

Add the missing files to the on.push.paths list (at minimum pnpm-lock.yaml and multitool.lock.json). Consider also including other npm-related inputs referenced by npm_translate_lock (e.g. package.json, pnpm-workspace.yaml, .npmrc, and relevant javascript/**/package.json) if you want cache population to run immediately when those change.



                     PR 17571 (2026-05-25)                    
[maintainability] RBS missing browser_name=
RBS missing browser_name= `Safari::Options` now defines `browser_name=` at runtime, but `rb/sig/.../safari/options.rbs` only declares `browser_name` and not the setter, so typed consumers cannot call `options.browser_name = ...` without type errors.

Issue description

Selenium::WebDriver::Safari::Options defines a browser_name= setter in Ruby, but the RBS signature file does not declare this method. This creates a mismatch between runtime API and the typed interface.

Issue Context

The PR introduced/updated Safari::Options#browser_name= in Ruby, and updated the RBS to include browser_name, but the setter signature was not added.

Fix Focus Areas

  • rb/lib/selenium/webdriver/safari/options.rb[39-45]
  • rb/sig/lib/selenium/webdriver/safari/options.rbs[13-16]

Suggested fix

Add a setter signature in rb/sig/lib/selenium/webdriver/safari/options.rbs, e.g.:

  • def browser_name=: (String value) -> void (or return String if you want to model Ruby’s assignment return value).

[reliability] Test clobbers `SE_CHROMEDRIVER`
Test clobbers `SE_CHROMEDRIVER` The new DriverFinder env-precedence unit test overwrites `ENV['SE_CHROMEDRIVER']` and then unconditionally deletes it in `ensure`, which can clobber a pre-existing value from a developer/CI environment and leak side effects into later examples. This makes the suite potentially order-dependent and flaky across environments, undermining the expectation that tests remain isolated and reliable.

Issue description

The DriverFinder env-precedence spec sets ENV['SE_CHROMEDRIVER'] and then unconditionally calls ENV.delete('SE_CHROMEDRIVER') in ensure, which can wipe a pre-existing value from the parent environment and leak state across examples.

Issue Context

This test is intended to validate env-var precedence, but to comply with the expectation that test changes are reliable across environments (PR Compliance ID 13) and to avoid order-dependent failures, it should not permanently mutate global process environment. Other unit specs in this repo follow a safer pattern by preserving/restoring prior env values rather than always deleting them.

Fix Focus Areas

  • rb/spec/unit/selenium/webdriver/common/driver_finder_spec.rb[36-48]

[correctness] `Safari.path` now nil
`Safari.path` now nil `Selenium::WebDriver::Safari.path` now defaults to `nil`, removing the previous behavior of providing/validating a default Safari binary path and raising actionable errors. This is a user-visible behavior change that can break callers relying on a non-nil default or on the earlier deterministic exceptions.

Issue description

Safari.path now memoizes nil, changing the public method’s behavior so it no longer provides a default Safari path nor performs early validation/OS checks. This is a backward-incompatible, user-visible behavior change.

Issue Context

Safari.path is a public singleton method. With the new implementation, any caller expecting a default path (or early, actionable errors) will instead receive nil and may fail later with less clear errors.

Fix Focus Areas

  • rb/lib/selenium/webdriver/safari.rb[48-50]

Suggested approach

  • Reintroduce the previous default path and validation/error behavior (e.g., default to the Safari binary on macOS and raise a clear WebDriverError when unsupported/unavailable).
  • If the new nil default is intentional, add a deprecation path and update callers/documentation accordingly so upgrades do not silently change behavior.

[correctness] `Service#env_path` removed
`Service#env_path` removed `Selenium::WebDriver::Service#env_path` was removed without a deprecation path, which is a breaking change for callers and leaves the RBI/RBS contract inconsistent with the runtime implementation. This can trigger runtime `NoMethodError` for existing consumers and also break RBS/Steep validation during upgrades.

Issue description

Selenium::WebDriver::Service#env_path was removed from the Ruby implementation without a deprecation path, but the corresponding RBS signature still declares it, creating an API break for callers and leaving the type/signature contract out of sync with runtime behavior.

Issue Context

  • The Ruby signature file still exposes env_path as a public method (def env_path).
  • The Service implementation no longer defines env_path, so existing callers will hit NoMethodError at runtime.
  • The mismatch will also fail RBS/Steep (or other signature validation) because the method is declared but not implemented.
  • Compliance requires maintaining API compatibility and deprecating public functionality before removal.

Fix Focus Areas

  • rb/lib/selenium/webdriver/common/service.rb[89-105]
  • rb/sig/lib/selenium/webdriver/common/service.rbs[48-60]

[reliability] No test for `env_path`
No test for `env_path` The new behavior that prioritizes the driver environment variable in `DriverFinder#paths` is not covered by unit tests. This risks regressions and CI behavior differences across environments.

Issue description

DriverFinder now checks an environment variable (DRIVER_PATH_ENV_KEY) when resolving the driver path, but there is no unit test asserting this precedence and behavior.

Issue Context

There are existing unit tests for DriverFinder behavior (class path, proc path, Selenium Manager calls), so adding a small/unit test for ENV precedence is feasible and keeps tests aligned with the changed behavior.

Fix Focus Areas

  • rb/lib/selenium/webdriver/common/driver_finder.rb[50-68]
  • rb/spec/unit/selenium/webdriver/common/driver_finder_spec.rb[24-99]

[correctness] Env var Service init regression
Env var Service init regression Service#initialize no longer applies the DRIVER_PATH_ENV_KEY environment variable, but existing unit specs expect Service.new.executable_path to reflect that env var immediately. This will fail multiple unit tests and is a behavioral break for any caller reading executable_path before DriverFinder/launch runs.

Issue description

Service#initialize no longer reads the driver-path environment variable (e.g., SE_CHROMEDRIVER) to populate @executable_path. The repo’s unit specs currently assert that Service.new.executable_path is derived from the env var, so the change will cause test failures and breaks the previously-tested behavior.

Issue Context

The env-var lookup was removed from Service#initialize and moved into DriverFinder. That makes env vars apply later (when DriverFinder runs), but it also means service.executable_path stays nil after construction.

Fix Focus Areas

Choose one consistent behavior and align code + tests:

  • Preserve existing behavior (least breaking): make Service#executable_path (reader) fall back to ENV when @executable_path is nil (or re-introduce env assignment in initialize).
  • If behavior change is intended: update the affected unit specs to assert env usage via DriverFinder or Service#launch, not via Service#executable_path immediately after new.

Relevant locations:

  • rb/lib/selenium/webdriver/common/service.rb[69-92]
  • rb/spec/unit/selenium/webdriver/chrome/service_spec.rb[153-173]
  • rb/spec/unit/selenium/webdriver/safari/service_spec.rb[118-138]
  • rb/spec/unit/selenium/webdriver/edge/service_spec.rb[163-183]
  • rb/spec/unit/selenium/webdriver/firefox/service_spec.rb[215-236]
  • rb/spec/unit/selenium/webdriver/ie/service_spec.rb[153-173]
  • rb/spec/unit/selenium/webdriver/common/driver_finder_spec.rb[22-99]


                     PR 17570 (2026-05-25)                    
[reliability] Cache listing not paginated
Cache listing not paginated `prune-codeql-caches.sh` fetches only 100 caches (`gh cache list --limit 100`), so any matching caches beyond that are never considered for deletion. Since the script itself notes CodeQL creates a new cache per commit, older caches can accumulate beyond 100 and remain unpruned.

Issue description

The script only lists up to 100 caches and prunes within that partial set, which can leave most stale caches untouched once the repository accumulates more than 100 CodeQL caches.

Issue Context

The script comments say CodeQL creates a new overlay-base-database cache per commit; on an active repository, that can exceed 100 entries quickly.

Fix Focus Areas

  • scripts/github-actions/prune-codeql-caches.sh[3-6]
  • scripts/github-actions/prune-codeql-caches.sh[24-28]

Suggested fix

  • Replace the single gh cache list ... --limit 100 call with logic that retrieves all matching caches.
    • Option A: increase --limit to the maximum supported by gh (if sufficiently high for this repo’s scale).
    • Option B (more robust): implement pagination via the GitHub API (gh api) to iterate through all pages of caches, building the rows array from the full result set.
  • Keep the rest of the grouping/deletion logic unchanged so behavior stays “keep newest per group”.


                     PR 17569 (2026-05-25)                    
[reliability] Disk status step can fail
Disk status step can fail `disk-status.sh` runs `du -sh "$path"/* | sort -h` when the directory exists, which returns a non-zero status when the glob doesn’t match (e.g., empty directory), and this script is sourced directly inside multiple workflow steps. That non-zero can abort those steps, turning an observability checkpoint into a CI failure.

Issue description

scripts/github-actions/disk-status.sh is sourced from workflow steps. Inside measure(), the du pipelines can exit non-zero (e.g., when $path/* doesn’t match because the directory is empty, or due to permissions). When sourced, that failure can propagate and fail the workflow step.

Issue Context

This script is intended for diagnostics only; it should be best-effort and never fail CI.

Fix Focus Areas

  • scripts/github-actions/disk-status.sh[29-37]
  • .github/workflows/bazel.yml[206-214]
  • .github/workflows/bazel.yml[239-247]
  • .github/workflows/bazel.yml[293-300]


                     PR 17565 (2026-05-24)                    
[reliability] `with_env` deletes preexisting ENV
`with_env` deletes preexisting ENV The new tests call `with_env`, which unconditionally deletes ENV keys on cleanup instead of restoring any pre-existing values. This can clobber global process state (e.g., `HTTP_PROXY`/`NO_PROXY`) and cause cross-test side effects or flaky behavior.

Issue description

with_env mutates ENV but its cleanup unconditionally deletes keys, which can remove pre-existing environment variables and leak global state changes across tests.

Issue Context

New tests added in default_spec.rb rely on with_env to set http_proxy and no_proxy/NO_PROXY. The helper currently deletes those keys in ensure rather than restoring prior values (or leaving them unset only if they were originally absent).

Fix Focus Areas

  • rb/spec/unit/selenium/webdriver/spec_helper.rb[38-43]
  • rb/spec/unit/selenium/webdriver/remote/http/default_spec.rb[127-139]


                     PR 17559 (2026-05-23)                    
[reliability] Unsafe cast of `se:browserName`
Unsafe cast of `se:browserName` `DriverFinder.toArguments()` directly casts `options.getCapability("se:browserName")` to `String`, which can throw `ClassCastException` if the capability is non-string or protocol-derived and thereby break Selenium Manager-based driver discovery. This violates the requirement to validate external/protocol inputs early and fail with deterministic, actionable exceptions (or safely fall back).

Issue description

DriverFinder.toArguments() reads the se:browserName capability via an unchecked cast to String. Since capability values are Object and may be protocol-/user-provided (including the new Electron override), a non-String value can trigger a ClassCastException, breaking Selenium Manager-based driver discovery instead of producing a deterministic, actionable validation failure or a safe fallback.

Issue Context

Capabilities are externally influenced (protocol/user) and getCapability() returns Object, so DriverFinder should validate type and null/emptiness before using se:browserName. When the value is absent/invalid, it should either ignore it and fall back to options.getBrowserName(), or throw a clear IllegalArgumentException describing the expected type/value; optionally, if the value is present but not a String, log a warning and ignore it.

Fix Focus Areas

  • java/src/org/openqa/selenium/remote/service/DriverFinder.java[133-138]

[correctness] Brittle Electron detection
Brittle Electron detection DriverFinder infers Electron mode by substring-matching the Capabilities implementation class name, which can silently break Electron driver discovery when options are subclassed or wrapped (and can misclassify unrelated capabilities). This can cause Selenium Manager to be invoked with the wrong `--browser` value and return an incompatible driver/browser pairing.

Issue description

DriverFinder.toArguments() uses options.getClass().getName().toLowerCase().contains("electron") to decide whether to pass --browser electron to Selenium Manager. This is brittle: it fails for subclasses/wrappers (false negatives) and can misclassify unrelated capabilities classes (false positives).

Issue Context

DriverFinder accepts a generic Capabilities instance, so relying on the implementation type name is not a stable contract.

Fix Focus Areas

  • java/src/org/openqa/selenium/remote/service/DriverFinder.java[133-138]
  • java/src/org/openqa/selenium/chrome/ElectronOptions.java[46-64]
  • java/test/org/openqa/selenium/remote/service/DriverFinderTest.java[198-218]

Suggested fix

  1. Add an explicit, Selenium-namespaced marker capability in ElectronOptions (e.g., se:electron = true) and ensure it survives merge().
  2. Update DriverFinder.toArguments() to check that marker (e.g., Boolean.TRUE.equals(options.getCapability("se:electron"))) instead of class-name matching.
  3. Update/add tests:
    • Existing ElectronOptions test should assert the marker capability is present.
    • DriverFinderTest should assert --browser electron is sent when the marker is present, and that wrapping in ImmutableCapabilities still works.


                     PR 17550 (2026-05-22)                    
[security] Unvalidated `targets` shell injection
Unvalidated `targets` shell injection `workflow_dispatch` introduces a user-provided `targets` input that is later interpolated into a shell command without validation/quoting, enabling command injection and unsafe argument splitting. This violates the CI/script hardening requirement and can lead to unexpected command execution or incorrect Bazel invocation.

Issue description

A workflow_dispatch string input (inputs.targets) is user-controlled and later inserted directly into a bazel test shell command, allowing shell metacharacters (e.g., ;, &&) to be interpreted as commands.

Issue Context

This workflow now accepts targets via manual dispatch. The value is used in the os-tests job’s run script, so it must be handled as data (arguments) rather than executable shell text.

Fix Focus Areas

  • .github/workflows/ci-ruby.yml[10-15]
  • .github/workflows/ci-ruby.yml[73-80]

Suggested approach

  • Read inputs.targets into a variable inside quotes.
  • Split into a bash array safely (e.g., read -r -a targets_arr <<< "$targets").
  • Invoke Bazel with "${targets_arr[@]}" so each target is a separate argument and shell metacharacters are not executed.
  • Optionally validate that each entry matches expected Bazel target patterns (e.g., starts with //).

[correctness] IE remote tests included
IE remote tests included The Windows `os-tests` job uses `--test_tag_filters=edge,edge-remote,unit,skip-rbe,-ie`, but IE remote targets are tagged `ie-remote` (not `ie`) and also include `skip-rbe`, so they can still be selected and executed on Windows despite the `-ie` exclusion.

Issue description

The Windows os-tests matrix entry tries to exclude IE by adding -ie, but IE remote targets use the tag ie-remote and inherit skip-rbe, so they still match the positive filters.

Issue Context

rb_integration_test generates both local and *-remote test targets, and remote targets inherit the browser-specific tag list (including skip-rbe for IE).

Fix Focus Areas

  • .github/workflows/ci-ruby.yml[57-60]

Suggested fix

Update the Windows tag-filters to also exclude the remote IE tag, e.g.:

  • edge,edge-remote,unit,skip-rbe,-ie,-ie-remote

(Alternatively, remove skip-rbe from the Windows include list if it is not intended to pull in unrelated skip-rbe tests.)


[reliability] grep exits read-targets
grep exits read-targets In ci.yml, process_binding assigns lang_targets via a pipeline containing grep; when there are no matching Ruby targets, grep exits non-zero and can fail the entire “Read targets” step instead of simply producing no rb_targets output.

Issue description

process_binding() uses grep inside a command substitution without guarding for the “no matches” exit code. When no Ruby targets exist, this can terminate the step early and fail CI.

Issue Context

The workflow should continue and simply omit rb_targets when there are no matching targets.

Fix Focus Areas

  • .github/workflows/ci.yml[81-100]

Suggested fix

  • Make the pipeline tolerant of no matches, e.g. append || true to the whole pipeline, or use grep ... || :.
    • Example:
      • lang_targets=$(echo "$targets" | tr ' ' '\n' | grep -E "^${pattern}[:/]" | tr '\n' ' ' | sed 's/ *$//' || true)
  • Consider switching echo to printf '%s\n' "$targets" to avoid echo edge cases.

[correctness] Dispatch missing targets input
Dispatch missing targets input ci-ruby.yml uses `${{ inputs.targets }}` in the os-tests Bazel command, but `targets` is only defined under `workflow_call`; a manual `workflow_dispatch` run can expand to an empty target list and cause `bazel test` to fail.

Issue description

os-tests references inputs.targets, but targets is only declared for workflow_call. For workflow_dispatch runs, this input is not defined, so the Bazel command may receive no explicit targets.

Issue Context

The workflow declares workflow_dispatch: with no inputs, but still relies on inputs.targets.

Fix Focus Areas

  • .github/workflows/ci-ruby.yml[3-11]
  • .github/workflows/ci-ruby.yml[47-76]

Suggested fix

Add workflow_dispatch.inputs.targets mirroring the workflow_call input (same default //rb/...), or change the command to fall back to //rb/... when inputs.targets is empty.


[reliability] Bazel exit code 4 ignored
Bazel exit code 4 ignored The `os-tests` job treats `bazel test` exit code `4` as success, which can mask misconfiguration (e.g., filters producing “no tests found”) and allow CI to go green without running any Ruby tests. This violates the requirement to avoid error-swallowing constructs in CI/scripts that hide failures.

Issue description

The os-tests Bazel invocation converts exit code 4 into success (exit 0). This can hide “no tests found” situations caused by incorrect target selection or tag filters, resulting in false-green CI.

Issue Context

bazel test exit code 4 typically indicates that no tests were executed. Treating this as success in PR CI can allow Ruby changes to merge without test coverage.

Fix Focus Areas

  • .github/workflows/ci-ruby.yml[74-76]

[correctness] Boolean passed as browser
Boolean passed as browser ci-ruby.yml sets `matrix.browser: true` (boolean) and passes it into bazel.yml where `browser` is declared as a string input; this can break reusable-workflow input validation or lead to unexpected conditional behavior (`inputs.browser != ''`).

Issue description

matrix.browser is set to YAML boolean true for ubuntu/windows, then forwarded to reusable workflow input browser which is typed as string. This risks workflow-call validation errors and also changes behavior of steps gated on inputs.browser != ''.

Issue Context

The called workflow declares browser as type: string.

Fix Focus Areas

  • .github/workflows/ci-ruby.yml[52-66]
  • .github/workflows/bazel.yml[19-28]

Suggested fix

  • Change browser: true to a string value:
    • If you want “no specific browser but enable UI setup”, use browser: 'enabled' (string) and update bazel.yml docs accordingly.
    • If you actually intend Edge on Windows, set browser: 'edge'.
    • If Ubuntu is unit-only, set browser: '' and adjust tag filters accordingly.
  • Keep macOS as browser: safari for safaridriver enablement.


                     PR 17533 (2026-05-20)                    
[maintainability] Useless ImmutableArray null-check
Useless ImmutableArray null-check BiDi.SubscribeAsync/StreamAsync call ArgumentNullException.ThrowIfNull(descriptors) even though ImmutableArray is a value type and cannot be null, so this guard never triggers and can mislead about validation coverage. Callers passing default/empty arrays will fail later with a different exception path instead of a clear upfront argument validation.

Issue description

ArgumentNullException.ThrowIfNull(descriptors) is ineffective for ImmutableArray<T> parameters because ImmutableArray<T> is a struct (non-nullable value type). This can hide the real validation intention and doesn’t provide a clear exception for empty/default arrays.

Issue Context

The affected APIs are public entry points (IBiDi / BiDi) for multi-descriptor subscription/stream creation.

Fix Focus Areas

  • Replace ThrowIfNull(descriptors) with explicit validation for default/empty immutable arrays (e.g., descriptors.IsDefaultOrEmpty or descriptors.Length == 0) and throw ArgumentException with a clear message.

  • Apply the same validation to both SubscribeAsync(... ImmutableArray<EventDescriptor> ...) overloads and StreamAsync(... ImmutableArray<EventDescriptor> ...).

  • dotnet/src/webdriver/BiDi/BiDi.cs[104-132]



                     PR 17522 (2026-05-19)                    
[reliability] Shared driver disposed unsafely
Shared driver disposed unsafely DriverTestFixture disposes the shared EnvironmentManager-held driver without clearing EnvironmentManager's internal driver reference, so later GetCurrentDriver() calls can return a disposed driver. ResetOnError() also disposes before calling CreateFreshDriver(), which then calls CloseCurrentDriver()/Quit on an already-disposed instance and can throw, masking the real test failure.

Issue description

DriverTestFixture currently calls driver.Dispose() in [OneTimeTearDown] and in ResetOnError(). Because the driver is owned/tracked by EnvironmentManager (singleton), disposing the driver directly can leave EnvironmentManager holding a disposed instance and cause later calls to reuse or quit a disposed driver.

Issue Context

  • EnvironmentManager.GetCurrentDriver() returns the cached driver field if non-null.
  • EnvironmentManager.CloseCurrentDriver() is the method that both quits the driver and clears the cached reference (driver = null).

Fix Focus Areas

  • dotnet/test/webdriver/DriverTestFixture.cs[128-148]

Suggested fix

  1. In DriverTestFixture.TearDown() (the [OneTimeTearDown]), replace driver?.Dispose(); with EnvironmentManager.Instance.CloseCurrentDriver(); (and optionally set driver = null; for clarity).
  2. In ResetOnError(), remove the explicit driver?.Dispose(); and just call driver = EnvironmentManager.Instance.CreateFreshDriver(); (since CreateFreshDriver() already closes the current driver via CloseCurrentDriver()).
  3. Ensure there is no remaining path where EnvironmentManager.driver can remain non-null while the underlying driver is disposed.


                     PR 17519 (2026-05-19)                    
[performance] Repeated Bazel gen runs
Repeated Bazel gen runs Multi-targeting Selenium.Support causes Selenium.WebDriver to be built once per TFM via ProjectReference, and WebDriver’s GenerateBazelArtifacts runs an unconditional `bazel build ...` before every CoreCompile, multiplying the Bazel invocation count (and build time) for Support builds/packs.

Issue description

dotnet build/pack of Selenium.Support.csproj is now cross-targeting (net462/netstandard2.0/net8.0). Because it has a ProjectReference to Selenium.WebDriver.csproj, WebDriver is built per target framework, and WebDriver’s GenerateBazelArtifacts target executes bazel build ... before every CoreCompile.

This leads to redundant Bazel builds (previously Support was single-TFM), significantly increasing build time and increasing exposure to intermittent Bazel/env failures during normal MSBuild workflows.

Issue Context

  • Support is now multi-targeted and project-references WebDriver.
  • WebDriver runs Bazel generation unconditionally in a BeforeTargets="CoreCompile" target.

Fix Focus Areas

  • Gate Bazel generation so it runs once per multi-target build (outer build) while still running for single-TFM builds.
  • A typical approach is to:
    • Add a target that runs when $(IsCrossTargetingBuild) == 'true' (outer build) before inner builds dispatch, generating artifacts once.
    • Keep a separate path for single-target builds ($(IsCrossTargetingBuild) != 'true') that runs before CoreCompile.
  • Ensure generated files exist before compilation in all cases.

Fix focus areas (files/lines):

  • dotnet/src/support/Selenium.Support.csproj[3-41]
  • dotnet/src/webdriver/Selenium.WebDriver.csproj[62-69]


                     PR 17504 (2026-05-18)                    
[reliability] `git ls-remote origin` without checkout
`git ls-remote origin` without checkout The `detect-branch` job runs `git ls-remote ... origin ...` without checking out the repository or otherwise configuring an `origin` remote, so the step is likely to fail or produce an incorrect `exists` output that prevents downstream `evaluate`/`commit-pins`/`promote` jobs from running even when the target branch exists. This violates the requirement to harden CI/scripts so they behave deterministically and robustly.

Issue description

The detect-branch job calls git ls-remote ... origin ... even though no repository checkout (or equivalent remote setup) occurs, so origin is undefined and the branch-existence detection becomes unreliable; this can incorrectly skip downstream evaluate/commit-pins/promote jobs.

Issue Context

detect-branch runs on a fresh GitHub-hosted runner; without actions/checkout, there is no local git repository and no configured remote named origin, so git ls-remote origin ... can fail or yield incorrect results. Other jobs/workflows that successfully use git ls-remote ... origin do so after actions/checkout, which configures origin, indicating the missing prerequisite here.

Fix Focus Areas

  • .github/workflows/renovate-dependency-pr.yml[25-44]
  • .github/workflows/renovate-dependency-pr.yml[40-44]

[reliability] `gh api` output quoting broken
`gh api` output quoting broken In `renovate-dependency-pr.yml`, the `detect-branch` step uses nested, unescaped double-quotes inside an `echo "exists=$(...)"` command substitution when writing to `$GITHUB_OUTPUT`, which can break shell parsing and cause the workflow to fail before setting the `exists` output. Because downstream evaluation/pinning/promotion is gated on `steps.detect.outputs.exists`, this can lead to jobs being skipped or failing, violating the requirement to harden CI scripts with safe, deterministic shell handling.

Issue description

The detect-branch step writes exists=... to $GITHUB_OUTPUT using a command substitution containing nested, unescaped double-quotes (via echo "true" / echo "false" inside an outer echo "exists=$(...)"), which can break shell parsing and prevent the exists output from being set, causing downstream jobs gated on this value to skip or fail.

Issue Context

This is part of a CI workflow and must be robust and deterministic (per PR Compliance ID 17) to avoid silent failures or broken runs. The problematic pattern is effectively echo "exists=$( ... && echo "true" || echo "false")", where the inner quotes can prematurely terminate the outer string; downstream logic depends on needs.detect-branch.outputs.exists being 'true' for the evaluate job (and related evaluation/pinning/promotion) to run.

Fix Focus Areas

  • .github/workflows/renovate-dependency-pr.yml[35-46]

[reliability] PR promotion token missing
PR promotion token missing `renovate-dependency-pr.yml` declares `SELENIUM_CI_TOKEN` as optional but passes it as the `token` for `peter-evans/create-pull-request@v8` without fallback, which can cause promotion to fail authentication when the secret isn’t set/passed. This is inconsistent with other workflows in the repo that already use `secrets.SELENIUM_CI_TOKEN || github.token` for safe fallback.

Issue description

The promote step uses ${{ secrets.SELENIUM_CI_TOKEN }} directly even though the secret is declared required: false, so the token may be empty and PR creation will fail.

Issue Context

Other workflows (including this PR’s renovate job and the reusable commit-changes.yml) already use a || github.token fallback.

Fix Focus Areas

  • .github/workflows/renovate-dependency-pr.yml[14-17]
  • .github/workflows/renovate-dependency-pr.yml[78-83]

Suggested fix

Either:

  • Change to token: ${{ secrets.SELENIUM_CI_TOKEN || github.token }} in the create-pull-request step, or
  • Mark SELENIUM_CI_TOKEN as required: true if fallback is not acceptable.

[reliability] Temp branch not ensured
Temp branch not ensured `renovate-dependencies.yml` always checks out `temp/bazel-updates`, but the reusable `commit-changes.yml` workflow explicitly skips the push when `changes.patch` is missing/empty. When the Bazel update step produces no diff (and `bazel.yml` deletes `changes.patch`), the temp branch may not exist or may remain stale from a previous run, causing checkout failures or Renovate to run on old content.

Issue description

The workflow assumes temp/bazel-updates exists and reflects the current base-ref, but the branch is only pushed when changes.patch is non-empty. If there are no Bazel update changes, the branch is not pushed/created, yet the Renovate job still checks it out.

Issue Context

  • bazel.yml deletes changes.patch when empty.
  • commit-changes.yml skips commit/push when changes.patch is empty.
  • renovate-dependencies.yml always checks out temp/bazel-updates.

Fix Focus Areas

  • .github/workflows/renovate-dependencies.yml[36-63]
  • .github/workflows/commit-changes.yml[46-56]
  • .github/workflows/bazel.yml[274-287]

Suggested fix

Ensure temp/bazel-updates is force-updated to the current base-ref even when there are no patch changes, for example:

  • Update commit-changes.yml to still git push --force origin HEAD:"$PUSH_BRANCH" in the else branch (no patch), or
  • Add a dedicated step/job in renovate-dependencies.yml that force-pushes inputs.base-ref to temp/bazel-updates when no patch is present, and gate the Renovate checkout accordingly.


                     PR 17503 (2026-05-18)                    
[security] `PUSH_BRANCH` defaults to `github.ref_name`
`PUSH_BRANCH` defaults to `github.ref_name` The workflow checks out `${{ inputs.ref || github.ref }}` but defaults `PUSH_BRANCH` to `github.ref_name`, which can diverge when callers pass an explicit `inputs.ref` (notably on `pull_request`, where `github.ref_name` can be a merge ref). This mismatch can cause `git push --force origin HEAD:"$PUSH_BRANCH"` to push the checked-out HEAD into an unintended branch, risking accidental branch overwrite.

Issue description

The reusable workflow checks out inputs.ref || github.ref but defaults PUSH_BRANCH to inputs.push-branch || github.ref_name, which may not match the ref actually checked out (notably on pull_request, where github.ref_name can be a merge ref name). This can cause the workflow to run git push --force origin HEAD:"$PUSH_BRANCH" against the wrong branch.

Issue Context

  • The workflow checks out ${{ inputs.ref || github.ref }} but sets PUSH_BRANCH: ${{ inputs.push-branch || github.ref_name }}; if a caller provides inputs.ref that differs from the called workflow run’s github.ref_name, the push target becomes incorrect.
  • ci-lint.yml calls commit-changes.yml on PRs with ref: ${{ github.event.pull_request.head.ref }} and does not set push-branch, so the called workflow uses the problematic default.

Fix Focus Areas

  • .github/workflows/commit-changes.yml[39-40]
  • .github/workflows/commit-changes.yml[53-53]
  • .github/workflows/commit-changes.yml[59-59]
  • .github/workflows/commit-changes.yml[39-59]

[reliability] Missing patch treated as ok
Missing patch treated as ok The new guard `if [ ! -s changes.patch ]; then` treats a missing `changes.patch` the same as an intentionally empty patch, so an artifact download failure/misconfiguration can be silently skipped as “no changes”. Because the download step is `continue-on-error: true`, this can produce a green job without indicating that the expected artifact wasn’t applied.

Issue description

The workflow currently checks only file size (-s) and exits 0 when the patch is missing/empty. With continue-on-error: true on the download step, an actual artifact download failure can be silently treated as a successful no-op.

Issue Context

This reusable workflow is invoked from release/CI workflows where a missing patch may indicate an upstream failure or wrong artifact name.

Fix Focus Areas

  • .github/workflows/commit-changes.yml[41-59]

Suggested fix

  • Emit a ::notice:: before exiting when no patch is present.
  • Optionally, distinguish:
    • if [ ! -f changes.patch ]; then ... (artifact missing)
    • elif [ ! -s changes.patch ]; then ... (empty patch) and decide whether “missing” should fail or remain a no-op based on intended caller behavior.
  • If you want to keep it as a no-op, at minimum log clearly that commit/push is being skipped due to missing/empty patch.

[correctness] Output misrepresents push
Output misrepresents push The reusable workflow output `changes-committed` is documented as "committed and pushed", but the job can push (when `push-branch` is set) while still emitting `committed=false`, causing callers to make incorrect decisions.

Issue description

The workflow output changes-committed is documented as representing whether changes were "committed and pushed", but the step output is derived solely from whether a commit occurred. With the current logic, a push can occur while committed=false, violating the output contract.

Issue Context

  • Output documentation indicates both commit and push.
  • Push can happen even without a commit when PUSH_BRANCH is set.

Fix Focus Areas

  • .github/workflows/commit-changes.yml[24-27]
  • .github/workflows/commit-changes.yml[58-69]

Suggested fix

  • Either:
    1. Change logic so committed becomes true only after a successful push (and ensure push is only attempted when appropriate), or
    2. Update the output description/name (and optionally add a separate pushed output) so consumers can reliably detect what happened.

[security] Unconditional `git push --force`
Unconditional `git push --force` The workflow force-pushes `HEAD` to `PUSH_BRANCH` whenever that input is set, even when `changes.patch` is missing/empty and no commit was created. This can reset/overwrite the remote branch history (accidental data loss), especially because the artifact download is allowed to fail without stopping the job.

Issue description

The workflow currently runs git push --force origin HEAD:"$PUSH_BRANCH" whenever PUSH_BRANCH is set, even if changes.patch is missing/empty and no commit was created. This can rewrite the remote branch pointer and clobber remote history (accidental data loss), particularly when the patch artifact download fails but the job continues.

Issue Context

  • The workflow tracks whether it created a commit via a committed=false/true flag, but the push-to-PUSH_BRANCH path is not conditioned on committed.
  • The patch/artifact download is allowed to fail (continue-on-error: true), and commits occur only when changes.patch exists/is non-empty.
  • As a result, setting inputs.push-branch can cause a force-push of the current HEAD even when this run produced no changes (or the artifact was missing), effectively resetting/overwriting the target branch.

Fix Focus Areas

  • .github/workflows/commit-changes.yml[47-69]


                     PR 17499 (2026-05-18)                    
[reliability] `pkg_dir` unquoted on Windows
`pkg_dir` unquoted on Windows The new `pnpm` invocation passes an absolute `--dir` path that may contain spaces, but `Bazel.execute` builds a Windows command string via `cmd.join(' ')` without quoting. This can break the `node:install` task on Windows and violates the safe-argument-handling requirement for scripts.

Issue description

node:install now computes pkg_dir via File.expand_path(...) and passes it to Bazel.execute as --dir <path>. On Windows, Bazel.execute assembles a single command line with cmd.join(' ') (no quoting), so any space-containing path will break argument parsing.

Issue Context

This change introduces a new space-sensitive argument (--dir) that was not present previously.

Fix Focus Areas

  • rake_tasks/node.rake[117-118]
  • rake_tasks/bazel.rb[26-32]


                     PR 17497 (2026-05-18)                    
[correctness] Wrong exec_root mapping
Wrong exec_root mapping `exec_root` is derived from `ctx.bin_dir.path` using `split("/bin/")`, which can leave it pointing at the output bin directory (or include `/bin/` at the end) instead of the Bazel execution root, so `/pathmap` may not match and builds remain non-deterministic. This also breaks when the path ends with `/bin` (no trailing slash), since the split won’t remove the bin segment at all.

Issue description

The rule claims to map the Bazel execution root, but computes exec_root from ctx.bin_dir.path using split("/bin/"). This is fragile (won’t strip when the string ends with /bin) and does not reliably represent the execution root, so the /pathmap prefix may not match the paths that end up embedded in PDBs.

Issue Context

Other code in this repo that truly needs the execution root obtains it at runtime via bazel info execution_root and handles Windows path formatting explicitly.

Fix Focus Areas

  • dotnet/private/sourcelink.bzl[29-81]
  • dotnet/private/docfx.bzl[35-49]

Suggested direction

  • Stop using ctx.bin_dir.path.split("/bin/").
  • Either:
    • Derive the execroot/workspace root by stripping from the /bazel-out/ segment (as other repo code does), using a robust operation like rsplit/rpartition (and handle both .../bin and .../bin/... cases), or
    • Move execroot detection to execution time (similar to dotnet/private/docfx.bzl using bazel info execution_root) by introducing a small wrapper that expands the correct /pathmap argument before invoking the compiler.

[reliability] Windows separator mismatch
Windows separator mismatch The new `/pathmap` uses POSIX separators and is passed even when `is_windows` selects the `.bat` wrapper, but other Windows rules in this repo explicitly convert `/` to `\` before invoking tools. If the compiler wrapper or compiler sees `\`-style paths, the `/pathmap` prefix won’t match, leaving Windows builds non-deterministic.

Issue description

On Windows, the rule still constructs exec_root/pathmap using forward slashes and the POSIX-only split("/bin/"), but the rest of the repo treats Windows tool invocations as requiring backslash paths. This can prevent /pathmap from applying on Windows.

Issue Context

Existing Windows actions convert Bazel-provided paths (which often contain /) into \ before invoking cmd.exe or writing .bat files.

Fix Focus Areas

  • dotnet/private/sourcelink.bzl[31-43]
  • dotnet/private/sourcelink.bzl[78-81]
  • dotnet/private/copy_files.bzl[13-31]
  • dotnet/private/docfx.bzl[18-24]

Suggested direction

  • When is_windows is true, build a Windows-normalized pathmap (e.g., convert the computed prefix to \ consistently) or ensure the wrapper uses consistent separator normalization for both the paths being mapped and the /pathmap prefix.


                     PR 17493 (2026-05-17)                    
[reliability] CI may skip all tests
CI may skip all tests `ci.yml` downloads the `check-targets` artifact with `digest-mismatch: warn` and then treats a missing/unreadable `bazel-targets.txt` as an empty string, which can cause every language job to be skipped. The final `ci-success` gate only fails on failures/cancellations, so a run with all test jobs skipped can still pass.

Issue description

The Read Targets job can succeed while effectively selecting zero test jobs if bazel-targets.txt is missing/unreadable (it falls back to an empty string). Combined with digest-mismatch: warn, this makes it more likely for a corrupted/invalid artifact to allow the workflow to proceed and still report overall success.

Issue Context

Downstream test jobs are guarded by if: needs.read-targets.outputs.<lang> != '', and ci-success only checks for failures/cancellations (skipped jobs do not fail the workflow).

Fix Focus Areas

  • .github/workflows/ci.yml[68-92]
  • .github/workflows/ci.yml[130-141]

Suggested fix

Make the Read Targets job fail (or fall back to running a safe default like the full test suite) when bazel-targets.txt is missing/empty/unparseable. Also consider making digest mismatch fatal for the check-targets artifact specifically, since it drives which tests run.


[security] `digest-mismatch` set to `warn`
`digest-mismatch` set to `warn` Multiple CI workflows configure `actions/download-artifact@v8` with `digest-mismatch: warn`, allowing artifact integrity mismatches to proceed without failing the job and weakening determinism. In release/publish flows this is especially risky because the workflows can immediately publish or upload the downloaded artifacts, potentially distributing corrupted outputs despite a detected mismatch.

Issue description

Workflows set digest-mismatch: warn when downloading artifacts via actions/download-artifact@v8, allowing digest mismatches to pass without failing the workflow; in release/publish jobs, this can result in publishing or releasing artifacts even when an integrity mismatch was detected.

Issue Context

The PR updates actions/download-artifact to v8 and explicitly configures digest mismatch handling to warning; per build hardening guidance (PR Compliance ID 14), integrity mismatches should fail deterministically rather than be downgraded. This is particularly important for jobs that publish to PyPI or upload assets to GitHub releases (and other release-like flows), because they use the downloaded artifacts for distribution without any subsequent integrity check.

Fix Focus Areas

  • .github/workflows/ci-build-index.yml[27-30]
  • .github/workflows/ci-rust.yml[238-277]
  • .github/workflows/ci-rust.yml[245-248]
  • .github/workflows/ci.yml[69-72]
  • .github/workflows/commit-changes.yml[49-52]
  • .github/workflows/nightly.yml[80-102]
  • .github/workflows/nightly.yml[89-93]
  • .github/workflows/pin-browsers.yml[35-38]
  • .github/workflows/pre-release.yml[260-264]
  • .github/workflows/release.yml[160-229]
  • .github/workflows/release.yml[171-175]
  • .github/workflows/release.yml[198-203]
  • .github/workflows/release.yml[206-210]
  • .github/workflows/update-documentation.yml[99-103]


                     PR 17490 (2026-05-16)                    
[reliability] Tox groups now missing
Tox groups now missing `py/tox.ini` still configures `dependency_groups = lint/validate`, but this PR removes the corresponding `[dependency-groups]` definitions from `py/pyproject.toml`. As a result, `tox -e linting` and `tox -e validate` will fail (and won’t install/run `ruff` / `validate-pyproject`).

Issue description

py/tox.ini references dependency groups (dependency_groups = lint / validate) that no longer exist because [dependency-groups] was removed from py/pyproject.toml. This breaks tox -e linting and tox -e validate, and also removes the only declarations of ruff and validate-pyproject used by those envs.

Issue Context

  • tox uses dependency_groups to install tools like ruff and validate-pyproject.
  • This PR deletes the group definitions from py/pyproject.toml, and the new py/requirements.txt does not include those tools either.

Fix approach options (pick one)

  1. Re-add [dependency-groups] to py/pyproject.toml with lint and validate entries (include ruff and validate-pyproject, and any other needed pins/ranges).

  2. Stop using dependency groups in tox:

    • Update py/tox.ini validate and linting envs to install from -r requirements_lock.txt (or a dedicated dev-tools requirements file).
    • Add ruff and validate-pyproject (and any required companions like packaging if needed) to py/requirements.txt, then regenerate py/requirements_lock.txt.

Fix Focus Areas

  • py/pyproject.toml[37-56]
  • py/tox.ini[8-13]
  • py/tox.ini[38-43]
  • py/requirements.txt[6-22]


                     PR 17483 (2026-05-16)                    
[correctness] Dotnet format args reordered
Dotnet format args reordered The wrapper now calls `dotnet format $@`, but repo tooling invokes `//dotnet:format` with positional commands like `style`/`whitespace`/`analyzers` as the first arguments; this changes the resulting CLI shape from `dotnet format style ... ` to `dotnet format style ...`. This is likely to cause parsing failures and break formatting/lint workflows that call `bazel run //dotnet:format -- style ...`.

Issue description

dotnet/private/dotnet_format.bzl changed the invocation from passing user-provided args before the workspace (... format "$@" "$proj") to passing the workspace first (... format "$SOLUTION" "$@"). In this repo, //dotnet:format is invoked with positional commands like style, whitespace, and analyzers (without leading dashes), which are expected to be the first tokens after format.

Issue Context

Existing callers (rake tasks and format scripts) run bazel run //dotnet:format -- style ... and -- whitespace, so the wrapper must preserve the previous ordering where $@ comes before the workspace.

Fix Focus Areas

  • dotnet/private/dotnet_format.bzl[61-63]
  • dotnet/private/dotnet_format.bzl[102-104]

Suggested change

  • Unix: change to "$DOTNET" format "$@" "$SOLUTION" (and adjust the echo if desired)
  • Windows: change to "%DOTNET%" format %* "%SOLUTION%" (keeping quoting correct)


                     PR 17469 (2026-05-15)                    
[reliability] No HTTP timeouts
No HTTP timeouts maven_stable_release uses Net::HTTP.get without open/read timeouts, so `rake java:update` can hang indefinitely on network stalls. This makes dependency updates flaky and can block CI/developer workflows.

Issue description

maven_stable_release fetches maven-metadata.xml via Net::HTTP.get without any timeouts, which can cause rake java:update to hang indefinitely if Maven Central is slow or the connection stalls.

Issue Context

This rake file already uses Net::HTTP.start with explicit open_timeout and read_timeout for other HTTP calls, so timeouts are an established reliability pattern here.

Fix Focus Areas

  • rake_tasks/java.rake[310-323]
  • rake_tasks/java.rake[101-114]

Suggested fix

Replace Net::HTTP.get(URI(url)) with a Net::HTTP.start call that sets use_ssl: true plus open_timeout / read_timeout, and read the response body via Net::HTTP::Get so the call fails fast instead of hanging.



                     PR 17466 (2026-05-15)                    
[correctness] Lockfile pins old multer
Lockfile pins old multer javascript/selenium-webdriver/package.json now requires multer ^2.1.1, but pnpm-lock.yaml still resolves javascript/selenium-webdriver’s devDependency multer to 2.0.2, so Bazel/npm_translate_lock will keep using the old version (or fail due to mismatch). This defeats the intended dependency upgrade and can break CI/reproducible builds.

Issue description

javascript/selenium-webdriver/package.json was updated to require multer: ^2.1.1, but the workspace lockfile (pnpm-lock.yaml) still pins javascript/selenium-webdriver to multer 2.0.2. Because Bazel uses npm_translate_lock with pnpm_lock, the dependency upgrade will not be realized (or lock verification may fail due to specifier mismatch).

Issue Context

Bazel’s npm_translate_lock consumes javascript/selenium-webdriver/package.json together with pnpm-lock.yaml. The lockfile must be updated whenever a workspace package.json changes dependency specifiers.

Fix Focus Areas

  • pnpm-lock.yaml[179-181]
  • pnpm-lock.yaml[3005-3008]
  • MODULE.bazel[90-104]
  • javascript/selenium-webdriver/package.json[35-50]

What to do

  1. Regenerate/update pnpm-lock.yaml so the importers.javascript/selenium-webdriver.devDependencies.multer specifier matches ^2.1.1 and the resolved version is >=2.1.1.
  2. Ensure the packages: section no longer pins only multer@2.0.2 and includes the updated resolved version entry.
  3. Re-run the Bazel JS dependency translation step (if part of your workflow) to confirm it succeeds with the updated lock.


                     PR 17463 (2026-05-15)                    
[reliability] Missing report validation
Missing report validation scripts/update_py_deps.py treats missing packages in pip’s JSON report as “no change”, so it can print “All repo tooling requirements are up to date!” even when version resolution didn’t return entries for some/all requested requirements. This can silently leave py/requirements.txt stale while the command exits successfully.

Issue description

scripts/update_py_deps.py builds installed from report.get("install", []) and then uses installed.get(name_normalized); missing entries are treated as unchanged, and if updates stays empty the script prints a success message and exits. This can mask cases where the report is empty/partial and no versions were actually resolved.

Issue Context

This script is used to update pinned versions in py/requirements.txt and is invoked via Bazel (bazel run //scripts:update_py_deps). A silent success with no updates when resolution data is missing makes dependency maintenance unreliable.

Fix Focus Areas

  • scripts/update_py_deps.py[54-73]

Suggested fix

  • After parsing the report, validate that every name_normalized from packages exists in installed.
    • If any are missing, raise a RuntimeError that includes a helpful diagnostic (e.g., mention missing names and include result.stderr and a truncated result.stdout).
  • Optionally wrap json.loads(result.stdout) in try/except json.JSONDecodeError and raise a clearer error including stderr/stdout context.

[reliability] `pip --report` JSON unvalidated
`pip --report` JSON unvalidated The new dependency resolution flow parses `pip --report -` output with `json.loads(result.stdout)` and then assumes specific keys exist, which can crash with non-actionable `JSONDecodeError`/`KeyError` if pip emits non-JSON output or the report schema differs. This violates the requirement to validate external/protocol inputs and raise deterministic, descriptive exceptions.

Issue description

scripts/update_py_deps.py parses pip --report - output without validating that the output is valid JSON or that expected keys exist, which can produce unclear crashes (e.g., JSONDecodeError, KeyError).

Issue Context

This script consumes external/protocol-derived data (pip report JSON). Per compliance, failures should be deterministic and actionable (explicit exceptions indicating what was wrong and how to fix).

Fix Focus Areas

  • scripts/update_py_deps.py[54-56]

Suggested implementation notes

  • Wrap json.loads(result.stdout) in try/except json.JSONDecodeError and raise a RuntimeError (or custom exception) that includes a short message and relevant stderr/stdout context.
  • Validate report is a dict and that report.get("install") is a list.
  • When building installed, use .get() accessors (or explicit checks) and raise a descriptive exception if required fields like metadata.name/metadata.version are missing.

[reliability] py:update breaks on Windows
py:update breaks on Windows The py:update rake task now runs //scripts:update_py_deps, but update_py_deps.py hardcodes the venv pip location as /bin/pip, which does not exist on Windows (venv uses Scripts/). This will make ./go py:update fail on Windows environments with a FileNotFoundError when trying to execute pip.

Issue description

py:update now invokes //scripts:update_py_deps, but scripts/update_py_deps.py assumes a POSIX virtualenv layout (venv/bin/pip). On Windows the venv executables live under venv\Scripts\ (e.g., pip.exe), so the script will fail when it tries to run pip.

Issue Context

This regression is introduced by changing py:update from an alias of py:pin to executing //scripts:update_py_deps.

Fix Focus Areas

  • scripts/update_py_deps.py[34-71]
  • rake_tasks/python.rake[127-136]

Suggested fix

Update scripts/update_py_deps.py to invoke pip via the venv’s python in a cross-platform way:

  • Determine the venv python path with Scripts/python.exe on Windows and bin/python on POSIX, then run python -m pip ....
  • Avoid directly calling venv_dir / "bin" / "pip".

Example sketch:

import os
bin_dir = "Scripts" if os.name == "nt" else "bin"
python = venv_dir / bin_dir / ("python.exe" if os.name == "nt" else "python")

subprocess.run([str(python), "-m", "pip", "install", "-q", "--upgrade", "pip"], ...)
subprocess.run([str(python), "-m", "pip", "install", "-q", *install_names], ...)
subprocess.run([str(python), "-m", "pip", "list", "--format=json"], ...)


                     PR 17452 (2026-05-13)                    
[maintainability] Docs install wrong requirements
Docs install wrong requirements README.md and py/docs/.readthedocs.yaml still instruct `pip install -r py/requirements.txt`, but after this PR that file no longer includes selenium runtime dependencies. Users (and the ReadTheDocs build) can end up with missing imports when generating docs or running selenium from source.

Issue description

The PR changes py/requirements.txt to be dev-tooling only, but documentation still instructs installing only that file. This omits selenium runtime dependencies and can break doc generation and local-from-source usage.

Issue Context

py/requirements_lock.txt already contains the full resolved set (including runtime deps coming from py/pyproject.toml). Updating docs to install from the lock file (or otherwise install runtime deps as well) keeps a single workflow aligned with the Bazel lock generation.

Fix Focus Areas

  • README.md[235-242]
  • py/docs/.readthedocs.yaml[10-18]

[reliability] Tox missing runtime deps
Tox missing runtime deps py/tox.ini installs only py/requirements.txt, but this PR removes runtime dependencies (e.g., urllib3[socks], websocket-client, typing_extensions) from that file, so tox docs/typecheck will run with selenium on PYTHONPATH but without its runtime imports installed. This will cause sphinx/mypy failures due to missing modules (e.g., urllib3, websocket).

Issue description

py/tox.ini currently installs dependencies via -r {toxinidir}/requirements.txt for docs and typecheck, but py/requirements.txt is now dev-tooling only and no longer includes selenium runtime deps (urllib3, websocket-client, typing-extensions, etc.). Since tox runs with PYTHONPATH={toxinidir}/. and Sphinx autodoc/mypy will import selenium modules, these environments will fail with missing-module errors.

Issue Context

The combined pinned set (dev + runtime) is already available in py/requirements_lock.txt (generated from both pyproject.toml and requirements.txt via //py:requirements.update).

Fix Focus Areas

  • py/tox.ini[15-36]


                     PR 17450 (2026-05-13)                    
[reliability] Rust version leaves stale lock
Rust version leaves stale lock `rust:version` no longer invokes `rust:update`, but callers like `release_updates` still call `rust:version` without subsequently regenerating `rust/Cargo.lock`. This leaves `rust/Cargo.lock`’s `selenium-manager` version entry unchanged even after updating `rust/Cargo.toml`, making the version bump incomplete.

Issue description

The PR removes the implicit rust:update invocation from rust:version. At least one existing caller (release_updates) invokes rust:version but does not invoke rust:update, so rust/Cargo.lock will no longer be regenerated/synced as part of that flow.

Issue Context

  • rust:update is explicitly described/implemented as regenerating rust/Cargo.lock.
  • rust/Cargo.lock contains a selenium-manager package entry with an explicit version.
  • release_updates calls rust:version but does not call rust:update.

Fix Focus Areas

  • Rakefile[104-125]
  • rake_tasks/rust.rake[26-33]
  • rust/Cargo.lock[1858-1861]

What to change

  • Update release_updates (and any other flows that require a synced Cargo.lock) to explicitly invoke rust:update after rust:version.
  • If repeated invocations are expected in the same process, mirror the previous approach by re-enabling the task before invoking it.


                     PR 17442 (2026-05-12)                    
[correctness] Approval workflow skipped
Approval workflow skipped For whitelisted `github.triggering_actor` values, `get-approval.yml` skips `notify`, and `authorize` is gated on `needs.notify.result == 'success'`, so both jobs are skipped and the reusable workflow can conclude as `skipped` instead of `success`. `restrict-trunk.yml` only proceeds with trunk management when the reusable `get-approval` job result is `success` (unless `skip_approval` is set), which can block trunk restriction at the start of pre-release/release workflows.

Issue description

Whitelisted actors cause the Get Approval reusable workflow to run zero jobs (notify is skipped; authorize is skipped because it requires notify success). This can make the reusable workflow job conclude as skipped, which downstream workflows treat as not-approved.

Issue Context

  • restrict-trunk.yml requires needs.get-approval.result == 'success' to run manage-trunk when restrict: true.
  • pre-release.yml (and release flows) call restrict-trunk.yml with restrict: true, so a skipped approval job can prevent trunk restriction and cascade into the rest of the workflow being skipped.

Fix Focus Areas

  • .github/workflows/get-approval.yml[20-46]
  • .github/workflows/restrict-trunk.yml[38-55]
  • .github/workflows/pre-release.yml[76-89]

Suggested change

Create an explicit success path for whitelisted actors (e.g., an auto-authorize job without an environment) so the called workflow concludes success when the actor is in the whitelist.

Example structure:

  • Keep notify only for non-whitelisted actors.
  • Keep authorize (with environment: production) only for non-whitelisted actors.
  • Add auto-authorize (no environment) for whitelisted actors.

This ensures exactly one of authorize / auto-authorize runs, and the reusable workflow result is success in both cases.


[reliability] No unlock on cancel
No unlock on cancel `unlock-trunk-on-failure` only runs when `failure()` is true, so if the pre-release workflow is cancelled after `restrict-trunk` succeeds, the auto-unlock path will not trigger and trunk can remain restricted. This can leave the repo locked until someone manually runs the Unlock Trunk workflow.

Issue description

The auto-unlock job in pre-release only triggers on failures (failure()), not on cancellations, so a cancelled run after trunk is restricted can leave trunk locked.

Issue Context

Trunk restriction is applied early in the workflow; cancellation mid-run is a plausible operational scenario (manual cancel, timeouts, concurrency cancellations), and leaving trunk restricted requires manual intervention.

Fix Focus Areas

  • .github/workflows/pre-release.yml[339-349]

Suggested change

Adjust the condition to also cover cancellations, e.g.:

  • if: (failure() || cancelled()) && needs.restrict-trunk.result == 'success'

If you find job-level cancellation prevents this job from starting in your environment, consider moving unlock into a separate workflow_run cleanup workflow that triggers on completion of the pre-release workflow and unlocks when the conclusion is failure or cancelled and trunk had been restricted.



                     PR 17439 (2026-05-11)                    
[reliability] Release check not fail-safe
Release check not fail-safe The nightly workflow only sets `release-in-progress=true` when the rulesets query succeeds and returns true; if `gh api` fails (e.g., auth/API error), the condition evaluates false and the output is left unset, so nightly releases proceed despite an unknown release state. This can cause the nightly pipeline to run during an active release or when the ruleset check is unavailable.

Issue description

The prepare job’s “Check if release ruleset is active” step only writes release-in-progress=true in the success path. If the gh api call fails, the step produces no output, and downstream logic treats it as not in progress.

Issue Context

needs.prepare.outputs.release-in-progress gates the nightly release job. An unset output is not equal to 'true', so the nightly release will run even when the ruleset check failed.

Fix Focus Areas

  • .github/workflows/nightly.yml[52-61]

Suggested fix

Update the step to be fail-safe and to always write an explicit output value.

For example:

  • Call gh api and check its exit status.
  • If the API call fails, emit a warning and set release-in-progress=true (safe default).
  • Otherwise set release-in-progress=true|false based on the jq result.

(Any equivalent approach that distinguishes “false” from “error” and always sets the output is fine.)


[reliability] `jq any(.[])` assumes array
`jq any(.[])` assumes array The ruleset existence check uses `jq -e any(.[]; .name == $n)` against `$existing` without ensuring `$existing` is a single JSON array, so API errors or multi-document output (e.g., from `gh api --paginate`) can cause parse/iteration failures or make `jq` decide based only on the last page, potentially missing an existing ruleset and creating a duplicate that breaks or complicates the trunk restrict/unrestrict workflow.

Issue description

The workflow’s ruleset existence check uses jq -e --arg n "$name" 'any(.[]; .name == $n)' <<<"$existing", which assumes $existing is a single valid JSON array. When gh api ... --paginate emits multiple JSON documents (one per page), or when the API returns unexpected/error output, this can cause parse/iteration failures or make the check effectively depend on only the last page, potentially missing an existing ruleset and creating a duplicate, disrupting trunk restrict/unrestrict behavior.

Issue Context

This job locks/unlocks trunk during release; failures can block releases or leave trunk in the wrong state. Missing an existing ruleset due to pagination can create duplicate rulesets with the same .name, making subsequent deletion/unlock behavior ambiguous or incomplete.

Fix Focus Areas

  • .github/workflows/restrict-trunk.yml[61-67]

[reliability] `gh api` missing pagination
`gh api` missing pagination The workflow captures `existing` rulesets via `gh api` without `--paginate`, which can silently omit rulesets beyond the first page. This can cause non-deterministic behavior (e.g., failing to detect or delete some release rulesets), potentially leaving trunk incorrectly restricted/unrestricted.

Issue description

The workflow lists existing rulesets using gh api "repos/$GH_REPO/rulesets" but does not use --paginate, so it may only read the first page of results.

Issue Context

The create/delete logic relies on the existing JSON to determine whether to POST new rulesets and which IDs to DELETE. If the repository has more rulesets than fit in one API page, some relevant rulesets may be missed.

Fix Focus Areas

  • .github/workflows/restrict-trunk.yml[62-63]
  • .github/workflows/restrict-trunk.yml[77-78]

[reliability] Unvalidated `$id` in delete
Unvalidated `$id` in delete The workflow deletes rulesets by using a `$id` derived from `gh api` name matching without validating that exactly one ruleset ID was found, so empty or multiple matches can yield an invalid DELETE URL and non-deterministic failures that can block trunk unlock. This violates the requirement to harden scripts by validating preconditions and handling external/config inputs deterministically.

Issue description

Harden the ruleset deletion logic so it does not blindly construct and call DELETE .../rulesets/$id from gh api output; instead, it must validate that the name lookup yields a safe, deterministic set of IDs (and handle empty/multiple matches explicitly) to avoid invalid URLs and non-deterministic failures during unlock.

Issue Context

This is release automation: the unlock path (Delete release rulesets) runs when unlock-trunk.yml calls the workflow with restrict: false. Rulesets are created via POST in a loop and can be recreated on reruns, so duplicate names/IDs are possible; if the lookup returns empty output (not found) or multiple newline-separated IDs (duplicates), the current deletion step can fail unpredictably, produce unclear errors, and block trunk unlock. Consider also validating that the glob .github/rulesets/release-*.json matches at least one file before looping, and fail with a clear message if none are found.

Fix Focus Areas

  • .github/workflows/restrict-trunk.yml[58-64]
  • .github/workflows/restrict-trunk.yml[65-73]
  • .github/workflows/unlock-trunk.yml[8-16]

[reliability] Ruleset creation duplicates
Ruleset creation duplicates The restrict path always POSTs new rulesets from JSON files without checking whether rulesets with the same name already exist, so reruns can create duplicates and make later deletion-by-name ambiguous.

Issue description

The Create release rulesets step uses POST /rulesets for every JSON file every time restriction is enabled. This can create duplicate rulesets on reruns and interacts badly with name-based deletion.

Issue Context

The delete step selects by .name, so duplicates will either break deletion (multi-id) or leave extra rulesets behind.

Fix Focus Areas

  • .github/workflows/restrict-trunk.yml[58-64]
  • .github/workflows/restrict-trunk.yml[65-73]

Implementation notes

Choose one:

  • Upsert approach: before POST, query existing rulesets by name; if found, PATCH/PUT that ID (or delete then recreate).
  • Delete-then-create approach: on restrict, delete any existing matching rulesets by name first, then create exactly one. In all cases, ensure the script handles 0/1/N matches deterministically.


                     PR 17436 (2026-05-11)                    
[reliability] Hanging stream LINQ query
Hanging stream LINQ query NetworkTests.CanDisownData awaits FirstAsync() on an event stream without any cancellation/timeout, so if the matching ResponseCompleted event never arrives the test can hang indefinitely and stall the suite.

Issue description

CanDisownData awaits an async stream LINQ query ending in FirstAsync() without providing a CancellationToken or any timeout. If no matching event is produced, the test can hang indefinitely.

Issue Context

Other tests in this repo bound async stream waits using a CancellationTokenSource(TimeSpan...) (e.g., sub.FirstAsync(cts.Token)) or WaitAsync(TimeSpan...) to avoid CI stalls.

Fix Focus Areas

  • dotnet/test/webdriver/BiDi/Network/NetworkTests.cs[290-304]

Suggested change

Wrap the FirstAsync with a timeout, e.g.:

  • create using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
  • call await stream.Where(...).Select(...).FirstAsync(cts.Token); (or equivalently use WaitAsync(TimeSpan.FromSeconds(5)) on the returned task/value-task).


                     PR 17435 (2026-05-11)                    
[reliability] Pending frames leak onClose
Pending frames leak onClose On a pre-handshake upstream `onClose`, the listener records the close and closes the `HttpClient` but intentionally keeps buffered `WebSocketFrame`s for a future `onUpgrade` drain. If the client-side upgrade never completes (e.g., client disconnects mid-handshake), those buffered ref-counted frames are never released, risking a Netty buffer leak.

Issue description

DirectForwardingListener keeps pre-handshake buffered WebSocketFrames on upstream onClose so they can be drained during onUpgrade. If the client-side upgrade never happens, those buffered (ref-counted) frames never get released, creating a potential memory/leak-detector issue.

Issue Context

The code already treats buffered frames as ref-counted (it calls frame.release() on other terminal paths like overflow), but the pre-handshake onClose path retains the buffer without any guaranteed later release.

Fix Focus Areas

  • java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[333-353]
  • java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[295-313]

[reliability] Pre-upgrade close not forwarded
Pre-upgrade close not forwarded If upstream onClose/onError happens before the client-side upgrade completes, the listener sets a terminal "closed" state without retaining close details, and onUpgrade later publishes the channel without closing it—so the client WebSocket can remain open even though upstream is already gone. This can lead to downstream clients hanging until external timeouts/keepalives rather than receiving a close handshake promptly.

Issue description

When the upstream WebSocket closes/errors before the downstream WebSocket upgrade completes, DirectForwardingListener latches closed=true but drops the close details and onUpgrade(Channel) does not react to the terminal state. If the downstream upgrade completes later, the client channel may be published and left open without a close handshake.

Issue Context

  • Upstream connection is initiated before the Netty handshake completes, so pre-upgrade upstream terminal events are possible.
  • closeClient() only closes the upstream HttpClient, not the downstream Netty channel.

Fix approach

  • Record a terminal state for pre-upgrade close/error (e.g., store closeCode/closeReason, or store a CloseWebSocketFrame to be written post-upgrade).
  • In onUpgrade(Channel ch), if a terminal state was observed (closed==true), immediately write a close frame (if applicable) and close the channel (or close without sending if protocol requires), instead of publishing the channel for normal forwarding.
  • Add a unit test that calls listener.onClose(...) (or onError(...)) before listener.onUpgrade(channel) and asserts the channel is closed and/or a CloseWebSocketFrame is emitted.

Fix Focus Areas

  • java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[241-306]
  • java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[308-345]
  • java/test/org/openqa/selenium/grid/router/DirectForwardingListenerTest.java[1-106]

[reliability] Unbounded frame buffer
Unbounded frame buffer DirectForwardingListener buffers pre-upgrade frames in an unbounded deque with no cap, so if the client upgrade is delayed/stalled while upstream is producing frames, memory can grow without bound (including retaining ref-counted frame buffers). This risks router memory pressure or OOM under pathological conditions.

Issue description

DirectForwardingListener uses an unbounded Deque<WebSocketFrame> to buffer frames arriving before onUpgrade(Channel) is called. If the downstream upgrade stalls while upstream continues sending, the deque can grow without limit, retaining ref-counted frames and risking memory exhaustion.

Issue Context

  • Buffering is correct for ordering, but needs a safety valve.

Fix approach

  • Add a hard limit (by frame count and/or total bytes) for pending.
  • When the limit is exceeded, choose a deterministic policy: drop newest/oldest with release(), and/or close the downstream channel once available; also log at WARNING with session context.
  • Consider a time-based cutoff (e.g., if upgrade hasn’t completed within N seconds, discard pending and mark closed).

Fix Focus Areas

  • java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[220-285]
  • java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[331-336]


                     PR 17433 (2026-05-10)                    
[correctness] Mypy devtools override mismatch
Mypy devtools override mismatch `tool.mypy.overrides.module` contains a slash-separated pattern (`selenium/webdriver/common/devtools.*`) that won’t match real Python module names, so the override won’t apply and mypy will still typecheck/flag generated DevTools modules when they exist in-tree.

Issue description

The mypy override intended to ignore generated DevTools modules uses a slash-separated path pattern (selenium/webdriver/common/devtools.*) which does not match Python module names, so ignore_errors will not take effect.

Issue Context

  • py/run_mypy.py runs mypy against the selenium package.
  • Generated devtools code lives under selenium/webdriver/common/devtools/... and is imported as selenium.webdriver.common.devtools....

Fix

Update the override module pattern to dotted form, e.g. selenium.webdriver.common.devtools.* (and optionally also include selenium.webdriver.common.devtools to cover the package __init__.py).

Fix Focus Areas

  • py/pyproject.toml[139-145]


                     PR 17432 (2026-05-10)                    
[security] PAT used for checkout
PAT used for checkout The pin-browsers create-pr job checks out the repository using the write-scoped SELENIUM_CI_TOKEN even though checkout itself only needs read access, increasing the exposure surface of that secret in the job workspace/git config. This violates least-privilege and makes accidental leakage more impactful.

Issue description

pin-browsers.yml uses the write-scoped secrets.SELENIUM_CI_TOKEN for actions/checkout, which is broader than necessary for a read-only clone and increases secret exposure.

Issue Context

The same job already provides SELENIUM_CI_TOKEN to peter-evans/create-pull-request, so checkout does not need to be performed with the PAT.

Fix

  • Change the checkout step to use the default github.token (or omit token: entirely).
  • Optionally set persist-credentials: false on checkout to avoid leaving credentials in the local git config.

Fix Focus Areas

  • .github/workflows/pin-browsers.yml[29-32]


                     PR 17426 (2026-05-08)                    
[correctness] `assumeThat` not imported
`assumeThat` not imported The newly added `assumeThat(...)` calls will not compile because there is no static import (or qualified reference) for `assumeThat`. This breaks the test build and prevents the intended Chrome-version gating from executing.

Issue description

assumeThat(...) is used in ContentEditableTest but is not imported or qualified, causing compilation failures and preventing the intended Chrome-version test gating.

Issue Context

Other tests in this repo use import static org.assertj.core.api.Assumptions.assumeThat; (or qualify the call via Assumptions.assumeThat(...)). This file currently only imports assertThat and JUnit's assumeFalse.

Fix Focus Areas

  • java/test/org/openqa/selenium/ContentEditableTest.java[20-33]
  • java/test/org/openqa/selenium/ContentEditableTest.java[101-106]
  • java/test/org/openqa/selenium/ContentEditableTest.java[115-123]


                     PR 17423 (2026-05-07)                    
[reliability] Connected true after close
Connected true after close The WebSocket 'open' handler unconditionally sets connected=true even if close()/_failPending() already marked the connection closed, which can leave the instance in an inconsistent state (_closed===true but isConnected===true). This can happen when close() is called during CONNECTING and the handshake completes anyway before the close event fires.

Issue description

The WebSocket open handler sets this.connected = true even if _closed was already set by close()/_failPending(), allowing isConnected to become true after the connection is logically closed.

Issue Context

close() calls _failPending() immediately (setting _closed=true), but does not remove/guard the constructor open listener. If open fires after close() is initiated, the state becomes inconsistent.

Fix Focus Areas

  • javascript/selenium-webdriver/bidi/index.js[39-46]
  • javascript/selenium-webdriver/bidi/index.js[96-113]
  • javascript/selenium-webdriver/bidi/index.js[275-295]

Suggested change

In the open handler, early-return if this._closed is true (and optionally immediately this._ws.close()/terminate()), so connected cannot be re-enabled after closure.


[maintainability] `waitForConnection()` now rejects
`waitForConnection()` now rejects `waitForConnection()` previously only resolved (or could hang) but now rejects when the socket is closed or closes before opening, which can break callers that do not handle promise rejection. This changes the public API contract/behavior for a user-facing interface.

Issue description

waitForConnection() now rejects on closed/failed connections, which is a user-visible behavioral change that may break downstream code that previously awaited it without handling rejections.

Issue Context

This class is exported from selenium-webdriver/bidi, so changes to promise rejection behavior can be compatibility-impacting.

Fix Focus Areas

  • javascript/selenium-webdriver/bidi/index.js[134-155]

[reliability] close() can hang waiters
close() can hang waiters Index.waitForConnection() depends on a WebSocket 'close'/'error' listener to reject when the socket dies before opening, but Index.close() removes all 'close' listeners before initiating ws.close(), which can remove waitForConnection()’s onFailure handler. If close() is called while a send() is awaiting waitForConnection(), that wait can remain pending indefinitely.

Issue description

waitForConnection() uses this._ws.once('close'|'error', ...) to reject when the socket fails before opening. But close() calls this._ws.removeAllListeners('close') before triggering this._ws.close(), which can remove waitForConnection()’s failure handler if a caller closes while another caller is awaiting connection.

Issue Context

This can lead to an indefinitely pending waitForConnection() (and therefore send()) promise during connection setup/cancellation scenarios.

Fix Focus Areas

  • javascript/selenium-webdriver/bidi/index.js[133-155]
  • javascript/selenium-webdriver/bidi/index.js[272-291]

Suggested approach

  • Avoid removeAllListeners('close') before the socket actually closes. Instead:
    • Remove only the internal close handler(s) you own (store them as named/bound functions so they can be removed explicitly), or
    • Defer removeAllListeners() until inside the once('close', ...) callback (after other listeners like waitForConnection() have had a chance to run).
  • Optionally, make _failPending() also reject/resolve a tracked “connect promise” if you introduce one, so close() can reliably unblock waitForConnection() without relying on WS listener ordering.

[observability] Generic error on connect
Generic error on connect waitForConnection() rejects with "BiDi connection closed before opening" even when the WebSocket emits an 'error' event before 'open', discarding the underlying error message. This makes connection failures harder to diagnose for callers awaiting waitForConnection().

Issue description

waitForConnection() uses a single onFailure handler for both 'close' and 'error' before the socket opens, but it always rejects with BiDi connection closed before opening, losing the real connection error detail.

Issue Context

This only affects the pre-open wait path (callers awaiting waitForConnection() / send() while not connected). Including the underlying error message improves diagnosability.

Fix Focus Areas

  • javascript/selenium-webdriver/bidi/index.js[143-154]

Suggested change

Split the handlers:

  • onClose -> reject with the current close message
  • onError(err) -> reject with something like BiDi connection error before opening: ${err.message} Also keep the listener cleanup symmetric (remove the other listeners on success/failure).

[reliability] send() can hang forever
send() can hang forever After a socket 'close'/'error', _failPending() sets connected=false, but send() will then await waitForConnection(), which only resolves on a future 'open' event; on a closed/failed WebSocket this can leave new send() calls pending indefinitely. This can stall higher-level BiDi helpers that call bidi.send() without checking connection state.

Issue description

After _failPending() runs, connected becomes false, and future send() calls can await waitForConnection() forever because there will be no future 'open' event on a closed/failed socket.

Issue Context

_failPending() is triggered on the underlying WebSocket 'close'/'error' events and marks the connection closed. Many BiDi helper classes call bidi.send() directly; if the connection drops and a command is attempted afterward, the call can hang indefinitely.

Fix Focus Areas

  • javascript/selenium-webdriver/bidi/index.js[91-101]

Suggested fix

Add a fast-fail guard in send() (or waitForConnection()):

  • If this._closed is true, immediately reject/throw a clear error.
  • Also consider checking this._ws.readyState and rejecting when CLOSING/CLOSED.
  • Optionally, have waitForConnection() reject on 'close'/'error' if it is waiting for 'open'.

[correctness] Invalid BiDi messages ignored
Invalid BiDi messages ignored The new shared `message` dispatcher silently drops JSON parse failures and messages without a numeric `id`, which can turn protocol/input issues into misleading request timeouts and makes diagnosis non-actionable. Protocol-derived inputs should be validated with deterministic, actionable errors rather than being accepted/dropped silently.

Issue description

The shared WebSocket message handler swallows protocol/input problems (JSON parse failures or unexpected payload shapes) by returning early, which can cause callers to see unrelated timeouts and makes failures hard to diagnose.

Issue Context

Messages received over the BiDi WebSocket are protocol-derived external inputs. When they are malformed/unexpected, we should surface a deterministic, actionable error (e.g., emit an error event and/or reject impacted pending requests) rather than silently dropping the message.

Fix Focus Areas

  • javascript/selenium-webdriver/bidi/index.js[45-54]

[reliability] Unexpected close stalls sends
Unexpected close stalls sends Index only rejects pending send() calls on timeout or explicit close(); it does not handle underlying WebSocket 'close'/'error' events. If the peer disconnects mid-request, callers can hang until RESPONSE_TIMEOUT (30s) instead of failing immediately.

Issue description

Index tracks in-flight requests in _pending, but it does not subscribe to the underlying WebSocket 'close' / 'error' events. If the connection drops without an explicit Index.close() call, pending send() promises will only fail after RESPONSE_TIMEOUT, delaying failure propagation.

Issue Context

  • close() now correctly rejects _pending when invoked, but unexpected disconnects still leave callers waiting up to 30 seconds.
  • Since there is now a single dispatcher and centralized _pending, the most reliable place to fail outstanding requests is in a single socket-level close/error handler.

Fix Focus Areas

  • javascript/selenium-webdriver/bidi/index.js[33-63]
  • javascript/selenium-webdriver/bidi/index.js[217-241]

Suggested approach

  • Add _ws.on('close', ...) and _ws.on('error', ...) handlers in the constructor.
  • In those handlers:
    • set this.connected = false
    • iterate _pending.values() and clearTimeout(timeoutId) then reject(new Error(...))
    • _pending.clear()
  • Ensure handlers are idempotent (e.g., guard with a boolean like this._closed = true) so close() and 'close' event don’t double-reject.
  • (Optional) extend tests to simulate server-side disconnect and assert pending sends reject promptly (without waiting for the timeout).


                     PR 17421 (2026-05-07)                    
[reliability] Missing artifact download perms
Missing artifact download perms The publish-python job sets job-level permissions to only `id-token: write`, overriding the workflow default and stripping all GITHUB_TOKEN scopes. As a result, `actions/download-artifact@v4` may be unable to download `pypi-distributions`, causing the PyPI publish step to fail.

Issue description

publish-python sets permissions to only id-token: write, which overrides the workflow defaults and leaves the job’s GITHUB_TOKEN without the scopes that actions/download-artifact@v4 may require to fetch artifacts. This can break Python publishing because the pypi-distributions artifact may not download.

Issue Context

This workflow already uses a broader permission set in other jobs that perform authenticated GitHub API operations (including artifact downloads).

Fix Focus Areas

  • .github/workflows/release.yml[96-110]

Suggested change

Update publish-python permissions to include the minimal required scopes for artifact download, e.g.:

permissions:
  id-token: write
  contents: read
  actions: read

(Keep id-token: write for PyPI trusted publishing.)



Clone this wiki locally