Does that mean the memory consumption [from computing all buckets] will be reduced in Splink v4 by using a threshold weight?

Yes. At the very least, the size of the output dataset should be far smaller, especially when using a very loose blocking rule such as blocking on dob. It still has to do the same number of calculations; it just materialises only a small percentage of the results.
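To make that distinction concrete, here is a rough sketch in plain Python. It is not Splink's API: `score_pair`, `predict`, the toy records, and the threshold value are all made up for illustration. Every pair that passes the blocking rule is still scored, but only pairs at or above the threshold are kept in the output.

```python
from itertools import combinations

# Hypothetical toy records: (id, dob, surname). Not real data.
records = [
    (1, "1990-01-01", "smith"),
    (2, "1990-01-01", "smith"),
    (3, "1990-01-01", "jones"),
    (4, "1990-01-01", "smyth"),
]


def score_pair(a, b):
    """Hypothetical stand-in for a match weight: +5 if surnames agree, else -5."""
    return 5.0 if a[2] == b[2] else -5.0


def predict(records, threshold):
    """Score every pair sharing a dob; keep only pairs at or above threshold."""
    scored = 0
    kept = []
    for a, b in combinations(records, 2):
        if a[1] != b[1]:  # blocking rule: dob must be equal
            continue
        scored += 1  # the comparison is computed regardless of the threshold
        weight = score_pair(a, b)
        if weight >= threshold:  # only these rows are materialised
            kept.append((a[0], b[0], weight))
    return scored, kept


scored, kept = predict(records, threshold=0.0)
print(f"pairs scored: {scored}, pairs kept: {len(kept)}")
```

Raising or lowering the threshold changes how many rows end up in `kept`, but `scored` stays the same for a given blocking rule.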

Are rules operated on sequentially? Could I explode the rule set to do something that has a similar memory reduction effect?

No - they're computed in parallel. If you want the behaviour you're implying, you'd want to have an outer loop over MonthOfBirth, something like (pseudocode):

for each m in MonthOfBirth:
   df_f…
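The outer-loop idea can be sketched in plain Python as follows. This is not a completion of the truncated pseudocode and does not use Splink's API; `score_pair`, `predict_chunk`, `predict_by_month`, and the toy records are hypothetical. It assumes the blocking rule is equality on dob, in which case two records that share a dob always share a month of birth, so splitting by month loses no candidate pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (id, dob as YYYY-MM-DD, surname). Not real data.
records = [
    (1, "1990-01-15", "smith"),
    (2, "1990-01-15", "smith"),
    (3, "1990-01-15", "jones"),
    (4, "1985-02-03", "brown"),
    (5, "1985-02-03", "brown"),
]


def score_pair(a, b):
    """Hypothetical stand-in for a match weight: +5 if surnames agree, else -5."""
    return 5.0 if a[2] == b[2] else -5.0


def month_of_birth(rec):
    """Extract the month (1-12) from a YYYY-MM-DD dob string."""
    return int(rec[1][5:7])


def predict_chunk(chunk, threshold):
    """Score pairs blocked on dob within one chunk; keep those above threshold."""
    out = []
    for a, b in combinations(chunk, 2):
        if a[1] == b[1]:
            weight = score_pair(a, b)
            if weight >= threshold:
                out.append((a[0], b[0], weight))
    return out


def predict_by_month(records, threshold):
    """Outer loop over month of birth, so only one month is in flight at a time."""
    by_month = defaultdict(list)
    for rec in records:
        by_month[month_of_birth(rec)].append(rec)

    results = []
    for month in sorted(by_month):
        # In a real run, each chunk's output would likely be written to disk
        # here rather than accumulated in memory.
        results.extend(predict_chunk(by_month[month], threshold))
    return results


results = predict_by_month(records, threshold=0.0)
print(results)
```

The total number of comparisons is unchanged; what shrinks is the amount of work held in memory at any one moment, since each iteration only deals with one month's records.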

RobinL Apr 6, 2026
Maintainer

Answer selected by gringer