
Hash64: hybrid inline/getBytes UTF-8 path in updateString #1236

Merged
brharrington merged 1 commit into Netflix:main from brharrington:h64-alloc on May 28, 2026

@brharrington (Contributor)

Replaces the previous single-path updateString with a length-based hybrid:

  • For length ≤ 128 (typical metric tag values) or any non-String CharSequence: inline UTF-8 char loop, no allocation, no per-call encoder setup overhead.
  • For longer String inputs: fall back to getBytes(UTF_8) plus the Unsafe-based, long-stride updateBytes, where the intrinsified UTF-8 encoder and 8-bytes-per-iteration writes win on throughput.
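A minimal sketch of the length-based dispatch, assuming a hypothetical `ByteSink` in place of the real hash state. The class, `ByteSink`, `encode`, and `INLINE_THRESHOLD` are illustrative names, not Spectator's `Hash64` code. The point it shows is that the inline loop must reproduce `getBytes(UTF_8)` exactly, including the substitution of `?` for unpaired surrogates. Otherwise the two sides of the threshold would disagree.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: ByteSink stands in for the hash state and is not
// Spectator's Hash64 API. Only the dispatch and encoding logic are of interest.
public class HybridUtf8Sketch {

  // Threshold taken from the PR description (inline path for length <= 128).
  static final int INLINE_THRESHOLD = 128;

  // Hypothetical byte consumer; a real hasher would mix bytes instead of storing them.
  static final class ByteSink {
    private byte[] buf = new byte[16];
    private int size;

    void writeByte(byte b) {
      if (size == buf.length) {
        buf = Arrays.copyOf(buf, size * 2);
      }
      buf[size++] = b;
    }

    void updateBytes(byte[] bytes) {
      for (byte b : bytes) {
        writeByte(b);
      }
    }

    byte[] toByteArray() {
      return Arrays.copyOf(buf, size);
    }
  }

  static void updateString(ByteSink sink, CharSequence cs) {
    int n = cs.length();
    if (cs instanceof String && n > INLINE_THRESHOLD) {
      // Long String: allocate and let the JDK's UTF-8 encoder do the work.
      sink.updateBytes(((String) cs).getBytes(StandardCharsets.UTF_8));
      return;
    }
    // Short input or any non-String CharSequence: encode char by char, no allocation.
    for (int i = 0; i < n; ++i) {
      char c = cs.charAt(i);
      if (c < 0x80) {
        sink.writeByte((byte) c);
      } else if (c < 0x800) {
        sink.writeByte((byte) (0xC0 | (c >> 6)));
        sink.writeByte((byte) (0x80 | (c & 0x3F)));
      } else if (Character.isHighSurrogate(c)
          && i + 1 < n && Character.isLowSurrogate(cs.charAt(i + 1))) {
        // Valid surrogate pair: one 4-byte sequence for the supplementary code point.
        int cp = Character.toCodePoint(c, cs.charAt(++i));
        sink.writeByte((byte) (0xF0 | (cp >> 18)));
        sink.writeByte((byte) (0x80 | ((cp >> 12) & 0x3F)));
        sink.writeByte((byte) (0x80 | ((cp >> 6) & 0x3F)));
        sink.writeByte((byte) (0x80 | (cp & 0x3F)));
      } else if (Character.isSurrogate(c)) {
        // Unpaired surrogate: getBytes(UTF_8) substitutes '?', so do the same.
        sink.writeByte((byte) '?');
      } else {
        sink.writeByte((byte) (0xE0 | (c >> 12)));
        sink.writeByte((byte) (0x80 | ((c >> 6) & 0x3F)));
        sink.writeByte((byte) (0x80 | (c & 0x3F)));
      }
    }
  }

  static byte[] encode(CharSequence cs) {
    ByteSink sink = new ByteSink();
    updateString(sink, cs);
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    StringBuilder longer = new StringBuilder();
    for (int i = 0; i < 50; ++i) {
      longer.append("tag-\u00e9\uD83D\uDE00"); // 350 chars total, above the threshold
    }
    String[] samples = {"", "cpu.utilization", "caf\u00e9", "\uD83D\uDE00", "a\uD800b", longer.toString()};
    for (String s : samples) {
      byte[] expected = s.getBytes(StandardCharsets.UTF_8);
      // Check both the String path and the CharSequence (always inline) path.
      boolean same = Arrays.equals(encode(s), expected)
          && Arrays.equals(encode(new StringBuilder(s)), expected);
      System.out.println(s.length() + " chars: " + (same ? "match" : "MISMATCH"));
    }
  }
}
```

Passing a `StringBuilder` always takes the inline loop, whatever its length. This mirrors the "any non-String CharSequence" rule and makes it easy to compare the inline loop against `getBytes` on long inputs too.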

Output is bit-identical to updateBytes(s.getBytes(UTF_8)) regardless of which path is taken; the existing Hash64Test.checkString assertions cover both sides of the threshold (random strings of length 0-999).
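One corner of the bit-identical requirement is easy to miss. `String.getBytes(UTF_8)` does not reject malformed UTF-16. Instead it replaces each unpaired surrogate with `?` (0x3F), so an inline encoder has to make the same substitution. The small demonstration below shows the JDK behavior; the class name and the `hex`/`utf8Hex` helpers are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how the JDK's String.getBytes(UTF_8) handles
// well-formed and malformed UTF-16 input.
public class Utf8ReplacementDemo {

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  static String utf8Hex(String s) {
    return hex(s.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // Well-formed input: 2-byte, 3-byte and 4-byte (surrogate pair) sequences.
    System.out.println(utf8Hex("\u00e9"));        // C3 A9
    System.out.println(utf8Hex("\u20ac"));        // E2 82 AC
    System.out.println(utf8Hex("\uD83D\uDE00"));  // F0 9F 98 80
    // Unpaired surrogates are not encoded; each becomes '?' (0x3F).
    System.out.println(utf8Hex("\uD800"));        // 3F
    System.out.println(utf8Hex("a\uDC00b"));      // 61 3F 62
  }
}
```

Random strings built from arbitrary `char` values, as in a randomized test over lengths 0-999, will regularly contain such unpaired surrogates. That gives both paths a chance to exercise this case.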

Other changes:

  • New private writeByte helper handles per-byte stripe writes and rollover; updateByte delegates to it to avoid duplicate logic.
  • build.gradle: pin the jmh task to JDK 25 so benchmarks reflect the modern JIT, independent of the JDK 8 source toolchain.
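The writeByte/updateByte split and the long-stride bulk path can be illustrated with a toy hasher. This is a hypothetical sketch, not Spectator's Hash64. The 8-byte stripe width, `mix`, and `compute` are invented for illustration, and `ByteBuffer.getLong` in little-endian order stands in for the `Unsafe` reads. The sketch shows the property a hybrid design needs: feeding bytes one at a time through `writeByte` and feeding them in 8-byte strides reach the same state. This holds only if the bulk path first drains any partially filled stripe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy hasher, NOT Spectator's Hash64: the 8-byte stripe, mix() and
// compute() are invented to show the writeByte/updateBytes structure only.
public class StripeHasherSketch {

  private long state = 0x9E3779B97F4A7C15L; // arbitrary seed
  private long stripe;      // little-endian accumulation of the current stripe
  private int stripeBytes;  // bytes currently held in stripe (0..7)
  private long totalBytes;  // total input length, folded in by compute()

  // Toy mixing step for one full 8-byte stripe.
  private void mix(long lane) {
    state ^= lane * 0xC2B2AE3D27D4EB4FL;
    state = Long.rotateLeft(state, 31) * 0x9E3779B97F4A7C15L;
  }

  // Single place that handles per-byte stripe writes and rollover.
  private void writeByte(byte b) {
    stripe |= (b & 0xFFL) << (8 * stripeBytes);
    ++totalBytes;
    if (++stripeBytes == 8) {
      mix(stripe);
      stripe = 0L;
      stripeBytes = 0;
    }
  }

  public StripeHasherSketch updateByte(byte b) {
    writeByte(b); // delegate instead of duplicating the rollover logic
    return this;
  }

  public StripeHasherSketch updateBytes(byte[] bytes) {
    int i = 0;
    // Drain a partially filled stripe so the strides below start aligned.
    while (i < bytes.length && stripeBytes != 0) {
      writeByte(bytes[i++]);
    }
    // Long-stride loop: one 8-byte little-endian read per iteration.
    // ByteBuffer stands in here for the Unsafe reads described in the PR.
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    for (; i + 8 <= bytes.length; i += 8) {
      mix(buf.getLong(i));
      totalBytes += 8;
    }
    // Remaining tail bytes go through the per-byte path.
    while (i < bytes.length) {
      writeByte(bytes[i++]);
    }
    return this;
  }

  // Toy finalizer; does not modify the running state.
  public long compute() {
    long h = state ^ totalBytes;
    if (stripeBytes > 0) {
      h ^= stripe * 0x165667B19E3779F9L;
    }
    h ^= h >>> 33;
    h *= 0xFF51AFD7ED558CCDL;
    h ^= h >>> 33;
    return h;
  }

  public static void main(String[] args) {
    byte[] data = "nf.app=atlas,name=requests,statistic=count".getBytes(StandardCharsets.UTF_8);
    StripeHasherSketch perByte = new StripeHasherSketch();
    for (byte b : data) {
      perByte.updateByte(b);
    }
    // Start with 3 single bytes so the bulk call begins mid-stripe.
    StripeHasherSketch bulk = new StripeHasherSketch()
        .updateByte(data[0]).updateByte(data[1]).updateByte(data[2])
        .updateBytes(Arrays.copyOfRange(data, 3, data.length));
    System.out.println(perByte.compute() == bulk.compute()); // prints true
  }
}
```

Routing updateByte through writeByte keeps the stripe/rollover bookkeeping in one method. That way, the per-byte and bulk paths cannot drift apart.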

JMH on JDK 25 (M1; see IdHash.java javadoc for the full table), throughput in ops/s:

  Benchmark                 Before -> After    Change
  IdHash.hash64             1.25M  -> 2.05M    +64%
  IdHash.shortAsciiString   20M    -> 22M      +13%
  IdHash.unicodeString      12M    -> 14M      +19%
  IdHash.longAsciiString    4.15M  -> 4.20M    +1%
  IdHash.hash64_2           2.98M  -> 2.73M    -9%

brharrington added this to the 1.9.7 milestone on May 28, 2026
brharrington merged commit df1a8d6 into Netflix:main on May 28, 2026
1 check passed
brharrington deleted the h64-alloc branch on May 28, 2026 at 03:07