Describe the bug
@mo.persistent_cache generates inconsistent cache keys across separate notebook sessions when the decorated function's body contains a set literal (e.g., {"A", "B", "C", "D", "E"}). This causes frequent cache misses across runs instead of reuse of the existing cache, defeating the purpose of persistent caching.
Crucially, the set does not need to be executed; its mere presence in the function body is sufficient to trigger the bug. This suggests the issue lies in how the function body is hashed (e.g., via pickling or bytecode inspection), which is affected by Python's non-deterministic set ordering across processes (controlled by PYTHONHASHSEED).
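A minimal sketch of the suspected mechanism, independent of marimo. It assumes CPython 3.9 or later, where a set literal of three or more constants appears to be stored in the function's code object as a frozenset constant. The helper name constant_order and the child script are hypothetical and only illustrate the idea; this is not how marimo computes its cache key.

```python
import os
import subprocess
import sys

# Child program: compile a function containing a set literal, then print the
# iteration order of each frozenset constant stored in its code object.
CHILD = r'''
src = "def myfunc(p=False):\n    if p:\n        print({'A', 'B', 'C', 'D', 'E'})\n"
module_code = compile(src, "<sketch>", "exec")
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_consts"))
for const in func_code.co_consts:
    if isinstance(const, frozenset):
        print("".join(const))
'''


def constant_order(seed: int) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The set is never executed (p is never passed), yet the order of the
    # stored constant can change from one seed to the next.
    for seed in range(5):
        print(seed, constant_order(seed))
```

Any key derived from the iteration order of that constant (for example via repr or pickle) would then change between processes even though the source code is identical.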
To reproduce
Save the "Code to reproduce" as notebook.py, then run
rm -rf __marimo__/cache/myfunc/ && for i in $(seq 1 10); do python notebook.py; done
Expected behavior
The cache is created on the first run. All subsequent runs reuse the cache and return the same datetime value.
Actual behavior
New caches are frequently created across runs, producing different datetime values:
2026-06-09 09:17:47.313718
2026-06-09 09:17:47.781402
2026-06-09 09:17:48.182414
...
Setting p=True reveals the cause. The set is printed with a different ordering on each run:
{'A', 'C', 'D', 'E', 'B'}
{'E', 'B', 'D', 'C', 'A'}
{'D', 'A', 'C', 'B', 'E'}
...
Workaround
Removing the set literal (or replacing it with an ordered structure like a list or tuple) restores deterministic caching behavior.
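One way to check why the workaround might help, again outside marimo and assuming CPython 3.9 or later: the sketch below compiles three variants of the function body and lists the frozenset constants the compiler stored for each. The helper frozenset_constants and the three source strings are hypothetical names used only for this illustration.

```python
def frozenset_constants(source: str) -> list:
    """Return every frozenset found among the constants of the compiled code."""
    found = []
    pending = [compile(source, "<sketch>", "exec")]
    while pending:
        code = pending.pop()
        for const in code.co_consts:
            if isinstance(const, frozenset):
                found.append(const)
            elif hasattr(const, "co_consts"):  # nested function body
                pending.append(const)
    return found


# Original form: a set literal inside the function body.
SET_LITERAL = '''
def myfunc(p=False):
    if p:
        print({"A", "B", "C", "D", "E"})
'''

# Workaround 1: an ordered tuple instead of a set.
TUPLE_LITERAL = '''
def myfunc(p=False):
    if p:
        print(("A", "B", "C", "D", "E"))
'''

# Workaround 2: keep set semantics, but build the set at call time from a tuple.
RUNTIME_SET = '''
def myfunc(p=False):
    if p:
        print(set(("A", "B", "C", "D", "E")))
'''

if __name__ == "__main__":
    for name, src in [
        ("set literal", SET_LITERAL),
        ("tuple literal", TUPLE_LITERAL),
        ("runtime set", RUNTIME_SET),
    ]:
        print(name, len(frozenset_constants(src)))
```

In the two workaround variants the only stored constant is a tuple, whose order is fixed by the source text, so nothing in the code object depends on the hash seed.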
Additional context
The non-determinism of set iteration order across Python processes (due to PYTHONHASHSEED) is known Python behavior. The fix likely involves normalizing or sorting set contents before hashing, or using a hash-seed-independent serialization strategy (e.g., AST-based hashing rather than bytecode/pickle-based hashing).
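Both ideas can be sketched in a few lines. The functions stable_repr and ast_digest below are hypothetical helpers, not part of marimo, and are meant only to illustrate the direction of a fix: the first serializes set contents in sorted order, the second hashes the parsed source, which keeps set elements in the order they were written.

```python
import ast
import hashlib


def stable_repr(value) -> str:
    """repr-like text whose output does not depend on set iteration order."""
    if isinstance(value, (set, frozenset)):
        inner = ", ".join(sorted(stable_repr(v) for v in value))
        return f"{type(value).__name__}({{{inner}}})"
    if isinstance(value, tuple):
        return "(" + ", ".join(stable_repr(v) for v in value) + ")"
    return repr(value)


def ast_digest(source: str) -> str:
    """Hash the AST dump of the source; comments and spacing are ignored."""
    tree = ast.parse(source)
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(stable_repr(frozenset({"A", "B", "C", "D", "E"})))
    print(ast_digest('print({"A", "B", "C", "D", "E"})'))
```

Sorting by the serialized form rather than by the elements themselves avoids comparison errors when a set mixes types that cannot be ordered against each other.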
Related issues: #3259 (caching robustness and non-determinism), #5542 (non-deterministic pickling of Pydantic models in cache keys).
Will you submit a PR?
Environment
{
  "marimo": "0.23.9",
  "editable": false,
  "location": "/home/horstl/projects/persistent_cache_debug/.venv/lib/python3.12/site-packages/marimo",
  "OS": "Linux",
  "OS Version": "5.15.153.1-microsoft-standard-WSL2",
  "Processor": "x86_64",
  "Python Version": "3.12.3",
  "Locale": "--",
  "Binaries": {
    "Browser": "--",
    "Node": "v18.19.1",
    "uv": "--"
  },
  "Dependencies": {
    "click": "8.4.1",
    "docutils": "0.23",
    "itsdangerous": "2.2.0",
    "jedi": "0.19.2",
    "markdown": "3.10.2",
    "narwhals": "2.22.1",
    "packaging": "26.2",
    "psutil": "7.2.2",
    "pygments": "2.20.0",
    "pymdown-extensions": "10.21.3",
    "pyyaml": "6.0.3",
    "starlette": "1.2.1",
    "tomlkit": "0.15.0",
    "typing-extensions": "4.15.0",
    "uvicorn": "0.49.0",
    "websockets": "16.0"
  },
  "Optional Dependencies": {
    "loro": "1.10.3"
  },
  "Experimental Flags": {}
}
Code to reproduce
import marimo

__generated_with = "0.23.9"
app = marimo.App(width="medium")

with app.setup:
    import marimo as mo
    import datetime


@app.function
@mo.persistent_cache
def myfunc(p=False):
    if p:
        print({"A", "B", "C", "D", "E"})
    return datetime.datetime.now()


@app.cell
def _():
    print(myfunc())
    return


@app.cell
def _():
    return


if __name__ == "__main__":
    app.run()