moj-analytical-services/splink_speed_testing


Splink Speed Testing

A benchmarking suite for testing the performance of Splink's comparison functions across different database backends.

Overview

This repository contains benchmarking tests to measure and compare the execution speed of Splink's comparison functions (like exact matching, Jaro similarity, and Jaro-Winkler similarity) across different database backends including DuckDB and Apache Spark.
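For readers unfamiliar with the functions being timed, the sketch below shows what Jaro and Jaro-Winkler similarity compute. It is a plain-Python reference written for illustration only: it is not taken from this repository or from Splink, which runs these comparisons inside the database backend. The function names and the default prefix scale of 0.1 are assumptions for the example.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Illustrative reference implementation of Jaro similarity."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and no further
    # apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order; each pair that
    # differs is half a transposition.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler_similarity(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)
```

Exact matching is a simple equality test, whereas the two functions above do per-character work on every pair of strings, which is why their cost can differ noticeably between backends.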

Setup

  1. Install dependencies using Poetry:
poetry install

Running Tests

Execute the benchmarks using pytest-benchmark:

poetry run pytest benchmarks/

Features

  • Benchmarks common string comparison functions used in record linkage
  • Tests against multiple database backends (DuckDB, Spark)
  • Generates test datasets of configurable sizes
  • Uses pytest-benchmark for reliable performance measurements

Requirements

  • Python 3.11 (Spark 3.5 does not officially support Python 3.12)
