Cache always updates clusters even if not needed anymore #38

Description

@shiosai

During fast_match, Drain always iterates over all candidate clusters and updates their access time in the cache. This causes two problems:

  • The updates slow down matching.
  • Clusters that will never match again are never evicted from the cache, because every lookup refreshes them.

Expected behavior:

Clusters should only be updated/touched in the cache after they were actually used/chosen. There is already a comment to this effect in the source code:

Try to retrieve cluster from cache with bypassing eviction algorithm as we are only testing candidates for a match.
https://github.com/IBM/Drain3/blob/15470e391caed9a9ef5038cdd1dbd373bd2386a8/drain3/drain.py#L217
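To illustrate the expected behavior, here is a minimal sketch of an LRU cache that separates a read-only lookup from a recency update. It is not Drain3's code: `PeekableLRUCache`, `peek`, `touch`, and `fast_match_sketch` are hypothetical names, and the scoring function is a stand-in for the real similarity check.

```python
from collections import OrderedDict


class PeekableLRUCache:
    """Hypothetical LRU cache; the least recently used entry sits first."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def put(self, key, value):
        # Insert or overwrite, mark as most recently used, evict if over capacity.
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)

    def peek(self, key):
        # Read without changing recency (bypasses the eviction bookkeeping).
        return self._data.get(key)

    def touch(self, key):
        # Mark an entry as most recently used.
        if key in self._data:
            self._data.move_to_end(key)

    def keys(self):
        return list(self._data)


def fast_match_sketch(cache, candidate_ids, score, threshold):
    """Test all candidates with peek(); touch only the one that is chosen."""
    best_id, best_score = None, -1.0
    for cluster_id in candidate_ids:
        cluster = cache.peek(cluster_id)
        if cluster is None:
            continue  # already evicted
        current = score(cluster)
        if current > best_score:
            best_id, best_score = cluster_id, current
    if best_id is None or best_score < threshold:
        return None
    cache.touch(best_id)
    return cache.peek(best_id)
```

With this split, candidates that are only tested keep their old position in the eviction order, so clusters that stop matching eventually fall out of the cache.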

Labels: bug, performance