Per entry overhead? #1894
javafanboy
started this conversation in
General
-
I have a small program that does something similar; see the Memory Overhead docs. It uses an older estimation technique, whereas a more accurate approach would be to use Java Object Layout (which is used via build tasks), and the most accurate estimate would come with JEP-8249196. A few important things to note:
-
Thanks for the reference to this documentation; I was not aware it existed.
You really have excellent and detailed docs for this project!
On Fri, Aug 29, 2025 at 8:18 PM Ben Manes wrote:
I have a small program that does something similar; see the Memory Overhead docs <https://github.com/ben-manes/caffeine/wiki/Memory-overhead>. It uses an older estimation technique, whereas a more accurate approach would be to use Java Object Layout <https://github.com/openjdk/jol> (which is used via build tasks <https://github.com/ben-manes/caffeine/blob/master/gradle/plugins/src/main/kotlin/analyze/object-layout.caffeine.gradle.kts>). The most accurate estimate would come with JEP-8249196 <https://openjdk.org/jeps/8249196>.
A few important things to note:
1. Java memory layout is aligned to machine word boundaries.
2. Caffeine uses code generation of specialized classes per configuration to avoid unused fields (e.g. a timestamp if TTL is not used).
3. Caffeine maintains lazily initialized secondary data structures (timer wheel, CountMin sketch, ring buffers). These are separate from the entry but might be included in your overhead counts.
4. Many caches use customized hash tables to inline their metadata onto the map's entry, but concurrent hash tables have become very complex because they are so performance sensitive. Caffeine instead wraps the value to benefit from hash table improvements and to avoid bugs or being pinned to a buggy fork, at the cost of a little extra overhead.
5. An on-heap cache is not often massive since there is object bloat, GC penalty, etc. That leads to layered caches (heap, off-heap, remote), which can trade off serialization, compression, etc. costs by applying them at the proper tier. That means the Caffeine cache might be moderately sized, where its hit rate and performance are most critical to avoid falling through to L2/L3/SOR loads, but small enough that the object/GC penalties are not as important. On the flip side, large remote caches like memcached are less concerned with hit rates, as they often require 99% hits for SLAs (e.g. Twitter), and the network/serialization costs dominate any data structure performance, so they care most about capacity planning to minimize infrastructure costs (maximize entries per MB, aggressively expire to reclaim space, scale enough to saturate the network link). All of this is to say the cache's design has to optimize for its target audience since the tradeoffs change significantly based on the layer it is being used at.
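
To make points 1 and 4 a bit more concrete, here is a rough back-of-the-envelope sketch of how alignment padding and a wrapper object add up. It assumes a 64-bit HotSpot JVM with compressed references (12-byte object header, 4-byte references, 8-byte alignment) and ignores any padding between fields. The "wrapper" shape is purely hypothetical and is not Caffeine's actual node layout; `LayoutEstimate` and its methods are names made up for this illustration.

```java
final class LayoutEstimate {
  // Assumptions (not taken from the thread): 64-bit HotSpot with compressed oops.
  static final int HEADER_BYTES = 12;
  static final int REFERENCE_BYTES = 4;
  static final int ALIGNMENT_BYTES = 8;

  /** Rounds a raw size up to the next multiple of the object alignment. */
  static long alignUp(long size) {
    return (size + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
  }

  /** Estimates the shallow size of an object from its reference count and primitive bytes. */
  static long shallowSize(int references, int primitiveBytes) {
    return alignUp(HEADER_BYTES + (long) references * REFERENCE_BYTES + primitiveBytes);
  }

  public static void main(String[] args) {
    // Shaped like java.util.HashMap.Node: int hash plus key, value, and next references.
    long mapNode = shallowSize(3, Integer.BYTES);

    // The same shape with one extra long (e.g. a timestamp) crosses an alignment boundary.
    long mapNodeWithTimestamp = shallowSize(3, Integer.BYTES + Long.BYTES);

    // Hypothetical wrapper stored as the map's value: key, value, and two ordering links.
    long wrapper = shallowSize(4, 0);

    System.out.println("map node:                " + mapNode);
    System.out.println("map node with timestamp: " + mapNodeWithTimestamp);
    System.out.println("map node plus wrapper:   " + (mapNode + wrapper));
  }
}
```

Under these assumptions a single 8-byte field costs a full extra alignment step, and wrapping the value roughly doubles the fixed per-entry cost before any policy metadata is counted. Java Object Layout would give the real figures for a specific JVM and class.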
-
Thanks for the updated metrics - looks like Caffeine is doing quite well!
On Sun, Aug 31, 2025, 03:32 Ben Manes wrote:
Since that documentation was from 2016, I updated it against the latest version and added JOL into the comparison. A small fix was to take the baseline from the cleared cache after it had been populated once, in order to ensure the per-entry overhead wasn't miscalculated due to lazily initialized side structures. The results are probably reasonable guesses.
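
The baseline idea can be sketched roughly as follows. This is not the program from the wiki; it is a minimal illustration that uses heap deltas from `Runtime` on a plain `HashMap`, and `PerEntryEstimate`, `usedHeap`, `perEntry`, and `measure` are made-up names. Heap deltas are noisy and `System.gc()` is only a hint, so treat the printed number as a rough guess.

```java
import java.util.HashMap;
import java.util.Map;

final class PerEntryEstimate {
  /** Requests a few collections and returns the heap currently in use (best effort). */
  static long usedHeap() {
    Runtime runtime = Runtime.getRuntime();
    for (int i = 0; i < 3; i++) {
      System.gc();
    }
    return runtime.totalMemory() - runtime.freeMemory();
  }

  /** Average bytes per entry between a baseline and a populated measurement. */
  static double perEntry(long baselineBytes, long populatedBytes, int entries) {
    return (double) (populatedBytes - baselineBytes) / entries;
  }

  static double measure(Map<Integer, Integer> map, Integer[] keys) {
    // First fill: forces any lazily created internal structures into existence.
    for (Integer key : keys) {
      map.put(key, key);
    }
    map.clear();

    // Baseline is taken from the cleared map, not a brand new one.
    long baseline = usedHeap();
    for (Integer key : keys) {
      map.put(key, key);
    }
    long populated = usedHeap();

    // Touch the map after measuring so it stays reachable until here.
    if (map.size() != keys.length) {
      throw new IllegalStateException("unexpected size: " + map.size());
    }
    return perEntry(baseline, populated, keys.length);
  }

  public static void main(String[] args) {
    // Keys and values are allocated up front so they are not attributed to the map.
    Integer[] keys = new Integer[1_000_000];
    for (int i = 0; i < keys.length; i++) {
      keys[i] = i;
    }
    System.out.printf("HashMap: %.1f bytes per entry%n", measure(new HashMap<>(), keys));
  }
}
```

One side effect to keep in mind: `HashMap.clear()` keeps its table array, so with this baseline the table slots are counted in the baseline rather than in the per-entry figure.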
-
I developed a small program that tries to measure the per key-value pair overhead of some caches, and it seems that Caffeine's is quite high: almost 110 bytes!
This sounds high to me. In particular, given what metadata is kept, I would have thought it would need less than other caches. Am I measuring this wrong, or could it be correct that it is this high?
Coherence LocalCache ≈ 81 bytes
CaffeineCache ≈ 108 bytes
Guava Cache ≈ 70 bytes
HashMap ≈ 38 bytes