Is there any rule of thumb difference between settings blocking rule and "training" rules? #1543

jkginfinite · 2023-08-18T00:18:18Z

I am wondering whether there is a logical reason to choose different rules for the "blocking_rules_to_generate_predictions" setting:

settings = {
    "link_type": "dedupe_only",
    "comparisons": [
        ctl.name_comparison("first_name"),
        ctl.name_comparison("surname"),
        ctl.date_comparison("dob", cast_strings_to_date=True),
        cl.exact_match("city", term_frequency_adjustments=True),
        ctl.email_comparison("email", include_username_fuzzy_level=False),
    ],
    "blocking_rules_to_generate_predictions": [
        block_on("first_name"),
        block_on("surname"),
    ],
    "retain_matching_columns": True,
    "retain_intermediate_calculation_columns": True,
}

linker = DuckDBLinker(df, settings)

and for the rules used in training:

training_blocking_rule = block_on(["first_name", "surname"])
training_session_fname_sname = linker.estimate_parameters_using_expectation_maximisation(training_blocking_rule)

gringer · 2026-03-04T00:06:18Z

For training rules, I've been trying to optimise for fewer false positives among the linked pairs:

Unlike blocking rules for prediction, it does not matter if Training Rules excludes some true matches - it just needs to generate examples of matches and non-matches.

https://moj-analytical-services.github.io/splink/topic_guides/blocking/model_training.html
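
As a rough illustration of that point, here is a toy sketch in plain Python (not Splink itself) that counts the matches and non-matches a rule generates. The records, the `true_entity_id` ground-truth column, and the helper names `blocked_pairs` and `summarise` are all made up for the example; real data would not come with ground truth.

```python
from itertools import combinations

# Hypothetical toy records: (record_id, first_name, surname, true_entity_id).
# true_entity_id is ground truth that real data would not have.
RECORDS = [
    (0, "john", "smith", 1),
    (1, "john", "smith", 1),
    (2, "jon", "smith", 1),   # typo in first_name
    (3, "mary", "jones", 2),
    (4, "mary", "jones", 2),
    (5, "mary", "jonse", 2),  # typo in surname
    (6, "john", "smith", 3),  # a different person with the same name
    (7, "john", "jones", 4),
    (8, "mary", "smith", 5),
]


def blocked_pairs(records, key):
    """Return the set of record-id pairs whose blocking keys are equal."""
    return {
        (a[0], b[0])
        for a, b in combinations(records, 2)
        if key(a) == key(b)
    }


def summarise(records, key):
    """Count matches and non-matches among the pairs a rule generates."""
    entity = {r[0]: r[3] for r in records}
    pairs = blocked_pairs(records, key)
    matches = sum(entity[i] == entity[j] for i, j in pairs)
    return {
        "pairs": len(pairs),
        "matches": matches,
        "non_matches": len(pairs) - matches,
    }


strict = summarise(RECORDS, key=lambda r: (r[1], r[2]))  # first_name AND surname
loose = summarise(RECORDS, key=lambda r: r[1])           # first_name only

print("strict:", strict)
print("loose: ", loose)
```

On this toy data the strict rule produces 4 pairs, half of them matches, while the loose rule produces 12 pairs, a third of them matches. The strict rule misses the typo'd records entirely, which is acceptable for training as long as both matches and non-matches still appear.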

For the blocking rules for prediction, I try to optimise to reduce false negatives, which means thinking through every edge case I can that corresponds to an "almost-there" link, and creating rule sets that are as small as possible while still capturing those edge cases:

The aim of our blocking rules are to:

Capture as many true matches as possible

Reduce the total number of comparisons being generated

There is a tension between these aims, because by choosing loose blocking rules which generate more comparisons, you have a greater chance of capturing all true matches.

https://moj-analytical-services.github.io/splink/topic_guides/blocking/blocking_rules.html
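
To make that tension concrete, here is a small plain-Python sketch (again not Splink itself) that measures the number of comparisons and the share of true matches captured for a few rules. The data, the ground-truth entity ids, and the helpers `pairs_for` and `recall` are assumptions for illustration only. The union of two rule outputs stands in for listing two rules under "blocking_rules_to_generate_predictions", where a pair is compared if it satisfies any of the rules.

```python
from itertools import combinations

# Hypothetical toy records: (record_id, first_name, surname, true_entity_id).
PEOPLE = [
    (0, "john", "smith", 1),
    (1, "john", "smith", 1),
    (2, "jon", "smith", 1),   # typo in first_name
    (3, "mary", "jones", 2),
    (4, "mary", "jonse", 2),  # typo in surname
    (5, "john", "jones", 3),
    (6, "mary", "smith", 4),
]

ENTITY = {r[0]: r[3] for r in PEOPLE}
ALL_PAIRS = {(a[0], b[0]) for a, b in combinations(PEOPLE, 2)}
TRUE_MATCHES = {p for p in ALL_PAIRS if ENTITY[p[0]] == ENTITY[p[1]]}


def pairs_for(key):
    """Return the record-id pairs that share the same blocking key."""
    return {
        (a[0], b[0])
        for a, b in combinations(PEOPLE, 2)
        if key(a) == key(b)
    }


def recall(pairs):
    """Fraction of true matches that survive blocking."""
    return len(pairs & TRUE_MATCHES) / len(TRUE_MATCHES)


on_both = pairs_for(lambda r: (r[1], r[2]))  # first_name AND surname
on_first = pairs_for(lambda r: r[1])
on_surname = pairs_for(lambda r: r[2])
either = on_first | on_surname               # first_name OR surname

rules = [
    ("first_name AND surname", on_both),
    ("first_name", on_first),
    ("surname", on_surname),
    ("first_name OR surname", either),
]
for name, pairs in rules:
    print(f"{name:24} comparisons={len(pairs):2} recall={recall(pairs):.2f}")
```

Out of 21 possible pairs, the strictest rule generates 1 comparison and captures a quarter of the true matches, while the union of the two single-column rules generates 12 comparisons and captures all of them. Each typo is missed by one rule but caught by the other, which is the edge-case reasoning described above.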
