
Are you after https://moj-analytical-services.github.io/splink/api_docs/settings_dict_guide.html#additional_columns_to_retain? This setting lets you specify which additional input columns are carried through to the outputs.
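As a rough illustration, here is a minimal sketch of a Splink 3-style settings dictionary using that key. Only the keys relevant here are shown: a working model also needs `comparisons`, blocking rules and so on. The column names in the list are hypothetical placeholders for whatever extra input columns you want to keep.

```python
# Minimal sketch of a Splink 3-style settings dict. Only the keys relevant
# to this question are shown; a real model also needs "comparisons",
# "blocking_rules_to_generate_predictions", etc.
settings = {
    "link_type": "link_and_dedupe",
    # Hypothetical column names: any input columns you want carried
    # through to the df_predict / df_cluster outputs.
    "additional_columns_to_retain": ["postcode", "dob"],
}

print(settings["additional_columns_to_retain"])
```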

Alternatively, depending on what you want to do with the outputs, you could select only the minimum subset of columns needed to describe each table, then reconstruct the full table later, once you have combined the pieces with UNION / UNION ALL.
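Here's a small sketch of that pattern using Python's built-in `sqlite3` rather than Splink itself. The table names, column choices and data are all hypothetical: two slim prediction batches are combined with UNION ALL, and the wide view is rebuilt afterwards by joining the IDs back to the input records.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical input records (the full "wide" table).
cur.execute("CREATE TABLE records (unique_id INTEGER PRIMARY KEY, first_name TEXT, surname TEXT)")
cur.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Anne", "Lee"), (3, "Bob", "Ray"), (4, "Rob", "Ray")],
)

# Two hypothetical prediction batches, already reduced to the minimal
# columns needed to identify a pair and its score.
for name in ("pred_batch_1", "pred_batch_2"):
    cur.execute(
        f"CREATE TABLE {name} (unique_id_l INTEGER, unique_id_r INTEGER, match_probability REAL)"
    )
cur.execute("INSERT INTO pred_batch_1 VALUES (1, 2, 0.97)")
cur.execute("INSERT INTO pred_batch_2 VALUES (3, 4, 0.88)")

# Combine the slim batches; UNION ALL keeps every row and skips de-duplication.
cur.execute(
    """
    CREATE TABLE all_preds AS
    SELECT * FROM pred_batch_1
    UNION ALL
    SELECT * FROM pred_batch_2
    """
)

# Reconstruct the wide view by joining both sides back to the input records.
rows = cur.execute(
    """
    SELECT p.match_probability,
           l.first_name AS first_name_l, r.first_name AS first_name_r,
           l.surname AS surname_l, r.surname AS surname_r
    FROM all_preds AS p
    JOIN records AS l ON l.unique_id = p.unique_id_l
    JOIN records AS r ON r.unique_id = p.unique_id_r
    ORDER BY p.match_probability DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The trade-off is storage versus convenience: the slim tables are cheap to keep and combine, and the join only has to be paid when you actually need the full columns.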

For df_predict and df_cluster I save a concise/compressed version which only contains the following fields:

  • df_predict:

    • match_weight
    • match_probability
    • match_key
    • unique_id_l
    • unique_id_r
    • source_dataset_l
    • source_dataset_r
  • df_cluster:

    • cluster_id
    • unique_id
    • source_dat…
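A minimal sketch of producing the compact df_predict export, assuming you already have the predictions as a list of row dicts (for example via `df_predict.as_record_dict()`, if your Splink version provides it). The sample row and its extra columns are made up for illustration. The df_cluster field list above is cut off, so the same `compress_rows` helper would apply there with whatever columns you choose.

```python
import csv
import io

# Columns kept in the compact df_predict export (from the list above).
PREDICT_COLUMNS = [
    "match_weight",
    "match_probability",
    "match_key",
    "unique_id_l",
    "unique_id_r",
    "source_dataset_l",
    "source_dataset_r",
]


def compress_rows(rows, columns):
    """Keep only `columns` from each row dict, dropping everything else."""
    return [{col: row[col] for col in columns} for row in rows]


def to_csv(rows, columns):
    """Serialise the compact rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical wide prediction row, standing in for real df_predict output.
wide_rows = [
    {
        "match_weight": 9.5,
        "match_probability": 0.9986,
        "match_key": "0",
        "unique_id_l": 1,
        "unique_id_r": 2,
        "source_dataset_l": "a",
        "source_dataset_r": "b",
        "first_name_l": "Ann",
        "first_name_r": "Anne",
        "gamma_first_name": 2,
    }
]

compact = compress_rows(wide_rows, PREDICT_COLUMNS)
print(to_csv(compact, PREDICT_COLUMNS))
```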

Answer selected by gringer
Category: Q&A