
@mo.persistent_cache produces non-deterministic hash keys across sessions when function body contains set literals #9829

@Lucas-vdr-Horst


Describe the bug

@mo.persistent_cache generates inconsistent cache keys across separate notebook sessions when the decorated function's body contains a set literal (e.g., {"A", "B", "C", "D", "E"}). This causes cache misses on every run instead of reusing the existing cache, defeating the purpose of persistent caching.

Crucially, the set does not need to be executed; its mere presence in the function body is sufficient to trigger the bug. This suggests the issue lies in how the function body is hashed (e.g., via pickling or bytecode inspection): any such representation that embeds the set inherits Python's set iteration order, which varies across processes because string hashes are randomized per process (controlled by PYTHONHASHSEED).
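
To illustrate the suspected mechanism outside marimo: on CPython 3.12 (the version in this report), a set literal of constant strings is typically compiled into a frozenset constant stored in the function's code object, so it exists even when the branch never runs. The sketch below is an assumption about the cause, not marimo's actual hashing code; `CHILD`, `digest_for_seed`, and the `repr`-based key are hypothetical stand-ins. It hashes the code constants in fresh interpreters started with different PYTHONHASHSEED values.

```python
import os
import subprocess
import sys

# Child script: the set literal sits in a branch that never runs, as in the report.
CHILD = '''
import hashlib

def f(p=False):
    if p:
        print({"A", "B", "C", "D", "E"})
    return 1

# Naive key (hypothetical): hash the textual form of the code constants.
# If a frozenset constant is present, its iteration order leaks into the key.
print(hashlib.sha256(repr(f.__code__.co_consts).encode()).hexdigest())
'''


def digest_for_seed(seed):
    """Run CHILD in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# One digest per seed; more than one distinct value means the key is seed-dependent.
naive_keys = {digest_for_seed(seed) for seed in range(1, 6)}
print("distinct naive keys:", len(naive_keys))
```

If the printed count is greater than 1, the key changed with the hash seed alone, without the set ever being evaluated. Running with the same seed twice should always give the same digest.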

To reproduce
Save the code under "Code to reproduce" as notebook.py, then run:

rm -rf __marimo__/cache/myfunc/ && for i in $(seq 1 10); do python notebook.py; done

Expected behavior
The cache is created on the first run. All subsequent runs reuse the cache and return the same datetime value.
Actual behavior
New caches are frequently created across runs, producing different datetime values:

2026-06-09 09:17:47.313718
2026-06-09 09:17:47.781402
2026-06-09 09:17:48.182414
...

Setting p=True reveals the cause: the set is printed with a different ordering on each run:

{'A', 'C', 'D', 'E', 'B'}
{'E', 'B', 'D', 'C', 'A'}
{'D', 'A', 'C', 'B', 'E'}
...

Workaround
Removing the set literal (or replacing it with an ordered structure such as a list or tuple) restores deterministic caching behavior.
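
A minimal sketch of the workaround, with the marimo decorators omitted so it runs on its own. The check on `co_consts` is only an illustration of the assumed mechanism (no set-like constant left in the code object); `set_like` is a hypothetical helper name.

```python
import datetime


def myfunc(p=False):
    if p:
        # A tuple literal keeps the order written in the source; a set can
        # still be built from it at run time if set semantics are needed.
        print(set(("A", "B", "C", "D", "E")))
    return datetime.datetime.now()


# Collect any set-like constants the compiler stored for this function.
set_like = [c for c in myfunc.__code__.co_consts if isinstance(c, (set, frozenset))]
print(set_like)
```

With no set literal in the body, the list of set-like constants is empty, so nothing seed-dependent remains among the function's constants.
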
Additional context
Set iteration order differing across Python processes is known, documented Python behavior: string hashes are randomized per process unless PYTHONHASHSEED is fixed. The fix likely involves normalizing or sorting set contents before hashing, or using a hash-seed-independent serialization strategy (e.g., AST-based hashing rather than bytecode/pickle-based hashing).
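
Both ideas can be sketched in a few lines. This is a sketch under assumptions, not a patch against marimo: `canonical`, `stable_digest`, and `ast_digest` are hypothetical names, and `repr` stands in for whatever serialization the cache actually uses.

```python
import ast
import hashlib


def canonical(value):
    """Return a form of value whose repr does not depend on set iteration order."""
    if isinstance(value, (set, frozenset)):
        # Sort by repr so elements of mixed types can still be ordered.
        return ("set", sorted((canonical(v) for v in value), key=repr))
    if isinstance(value, (list, tuple)):
        return (type(value).__name__, [canonical(v) for v in value])
    if isinstance(value, dict):
        # Dicts iterate in insertion order, so only the contents are normalized.
        return ("dict", [(canonical(k), canonical(v)) for k, v in value.items()])
    return value


def stable_digest(value):
    """Hash the normalized form instead of the raw object."""
    return hashlib.sha256(repr(canonical(value)).encode()).hexdigest()


def ast_digest(source):
    """Hash the parsed source; the AST keeps set elements in source order."""
    return hashlib.sha256(ast.dump(ast.parse(source)).encode()).hexdigest()


print(stable_digest({"A", "B", "C"}) == stable_digest({"C", "B", "A"}))
print(ast_digest('x = {"A", "B"}') == ast_digest('x = {"A",   "B"}'))
```

One trade-off of the AST approach: `{"A", "B"}` and `{"B", "A"}` hash differently because the source text differs. That only costs an extra cache miss and never returns a wrong hit.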

Related issues: #3259 (caching robustness and non-determinism), #5542 (non-deterministic pickling of Pydantic models in cache keys).

Will you submit a PR?

  • Yes

Environment

{
  "marimo": "0.23.9",
  "editable": false,
  "location": "/home/horstl/projects/persistent_cache_debug/.venv/lib/python3.12/site-packages/marimo",
  "OS": "Linux",
  "OS Version": "5.15.153.1-microsoft-standard-WSL2",
  "Processor": "x86_64",
  "Python Version": "3.12.3",
  "Locale": "--",
  "Binaries": {
    "Browser": "--",
    "Node": "v18.19.1",
    "uv": "--"
  },
  "Dependencies": {
    "click": "8.4.1",
    "docutils": "0.23",
    "itsdangerous": "2.2.0",
    "jedi": "0.19.2",
    "markdown": "3.10.2",
    "narwhals": "2.22.1",
    "packaging": "26.2",
    "psutil": "7.2.2",
    "pygments": "2.20.0",
    "pymdown-extensions": "10.21.3",
    "pyyaml": "6.0.3",
    "starlette": "1.2.1",
    "tomlkit": "0.15.0",
    "typing-extensions": "4.15.0",
    "uvicorn": "0.49.0",
    "websockets": "16.0"
  },
  "Optional Dependencies": {
    "loro": "1.10.3"
  },
  "Experimental Flags": {}
}

Code to reproduce

import marimo

__generated_with = "0.23.9"
app = marimo.App(width="medium")

with app.setup:
    import marimo as mo
    import datetime


@app.function
@mo.persistent_cache
def myfunc(p=False):
    if p:
        print({"A", "B", "C", "D", "E"})

    return datetime.datetime.now()


@app.cell
def _():
    print(myfunc())
    return


@app.cell
def _():
    return


if __name__ == "__main__":
    app.run()

Labels: bug