Benchmarking and Performance Tuning

This page documents qsv's benchmarking infrastructure, results tracking, and performance tuning guidelines. It explains how the project measures performance, interprets results, and optimizes the toolkit for diverse data-wrangling workloads using standardized datasets and automated tooling.

For implementation details of specific performance features, see:

Indexing System — 4.1
Stats Caching Architecture — 4.2
Memory Management — 4.3
Parallel Processing with Rayon — 4.4
Compiler Optimizations — 4.5

Benchmark Infrastructure

qsv's standardized benchmarking suite is driven by scripts/benchmarks.sh and tracks performance across versions and hardware platforms. Each command is timed with hyperfine, which uses warmup runs and multiple timed iterations to reduce measurement noise scripts/benchmarks.sh33-40 scripts/benchmarks.sh72-76
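To make the warmup-plus-iterations idea concrete, here is a minimal sketch of how a single hyperfine invocation could be assembled. The `--warmup`, `--runs`, and `--export-csv` options are hyperfine's own; the helper name, the default counts, and the file names are assumptions for illustration and are not taken from scripts/benchmarks.sh.

```python
import shlex


def hyperfine_args(command, warmup=2, runs=3, export_csv=None):
    """Build the argument list for one hyperfine invocation.

    The warmup/runs defaults are placeholders, not the values the
    benchmark script actually uses.
    """
    args = ["hyperfine", "--warmup", str(warmup), "--runs", str(runs)]
    if export_csv is not None:
        # Ask hyperfine to write its timing statistics to a CSV file.
        args += ["--export-csv", export_csv]
    # hyperfine receives the benchmarked command as a single string.
    args.append(command)
    return args


# Hypothetical file names, used only to show the resulting command line.
args = hyperfine_args("qsv count data.csv", export_csv="count.csv")
print(shlex.join(args))
```

Building the list separately from running it keeps the sketch usable on machines without hyperfine installed; passing `args` to `subprocess.run` would execute it.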

Benchmark Architecture and Data Flow

The benchmarking process involves downloading a standardized 1M row NYC 311 dataset (approx. 520MB), preparing indices and stats caches, and executing a battery of commands scripts/benchmarks.sh17-23 scripts/benchmarks.sh66-70

[Diagram: Benchmark Data Orchestration]

Sources: scripts/benchmarks.sh17-40 scripts/benchmarks.sh54-63 scripts/results/latest_run_info.tsv1-2

Benchmark Metrics

For every command variation, the suite records timing statistics plus a derived throughput figure. Results are stored as CSV so they can be used for regression analysis scripts/results/benchmark_results.csv1-2

Metric	Code Entity / Column	Description
Throughput	`recs_per_sec`	Calculated as `1,000,000 / mean`, i.e. dataset rows divided by mean run time in seconds; gives one comparable figure per command scripts/results/benchmark_results_display.csv1-2
Central Tendency	`mean`, `median`	Mean and median execution times in seconds scripts/results/benchmark_results.csv1-2
Variability	`stddev`	Standard deviation of run times, used to spot unstable measurements scripts/results/benchmark_results.csv1-2
Resource Usage	`user`, `system`	CPU time spent in user space vs. kernel space scripts/results/benchmark_results.csv1-2
Bounds	`min`, `max`	The fastest and slowest runs recorded scripts/results/benchmark_results.csv1-2
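The throughput column follows directly from the mean run time. A small sketch of that arithmetic, assuming the 1,000,000-row dataset described above; the function name and the sample means are illustrative and not read from the results files.

```python
DATASET_ROWS = 1_000_000  # row count of the benchmark dataset, per the text


def recs_per_sec(mean_secs, rows=DATASET_ROWS):
    """Return throughput in records per second, rounded to a whole number."""
    if mean_secs <= 0:
        raise ValueError("mean_secs must be positive")
    return round(rows / mean_secs)


# Illustrative means only: a half-second run and an 11 ms run.
for mean in (0.5, 0.011):
    print(f"mean={mean}s -> {recs_per_sec(mean):,} recs/sec")
```

Because throughput is inversely proportional to the mean, very short run times produce very large figures, which is why index-backed operations dominate the throughput rankings.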

The Benchmarking Tool (`scripts/benchmarks.sh`)

The scripts/benchmarks.sh script is the primary entry point for performance measurement. It is designed to be portable across Unix-like systems and supports targeted benchmarking scripts/benchmarks.sh1-14

Core Commands and Functions

The script provides several operational modes through arguments scripts/benchmarks.sh42-126:

Pattern Matching: ./benchmarks.sh sort runs only benchmarks containing the string "sort" scripts/benchmarks.sh4-5
Setup: ./benchmarks.sh setup installs dependencies like hyperfine and 7-Zip scripts/benchmarks.sh12 scripts/benchmarks.sh33-35
Reset: Re-downloads and prepares the 520MB sample dataset while preserving historical logs scripts/benchmarks.sh8-10
Clean: Deletes temporary files and indices scripts/benchmarks.sh11
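The pattern-matching mode amounts to a substring filter over benchmark names. The sketch below only illustrates that selection rule; it is not the script's shell implementation, and `select_benchmarks` is a hypothetical helper. The names in the list are benchmark names mentioned elsewhere on this page.

```python
def select_benchmarks(names, pattern=None):
    """Return the benchmarks whose name contains `pattern`.

    With no pattern, every benchmark is selected.
    """
    if not pattern:
        return list(names)
    return [name for name in names if pattern in name]


names = ["count_index", "behead", "apply_op_similarity", "frequency_index", "excel"]
selected = select_benchmarks(names, "index")
print(selected)
```

Substring matching means a short pattern can select more than one benchmark, so a more specific pattern is needed to isolate a single one.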

Dogfooding and Environment Validation

The script "dogfoods" qsv: a stable qsv_benchmarker_bin processes the results produced by the qsv_bin under test scripts/benchmarks.sh54-65. Before running, it validates that the benchmarker binary was built with the features it needs:

apply: For formatting results scripts/benchmarks.sh130-135
luau: For aggregating multi-run data scripts/benchmarks.sh137-142
to: For Excel spreadsheet generation scripts/benchmarks.sh144-152
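A feature check of this kind can be sketched as a token lookup in the binary's version text. Everything below is an assumption for illustration: the helper name, the sample version string, and its delimiter layout are hypothetical and may not match what qsv prints or how the script performs the check.

```python
import re

REQUIRED_FEATURES = ("apply", "luau", "to")


def missing_features(version_text, required=REQUIRED_FEATURES):
    """Return the required features that do not appear as whole tokens."""
    # Split on anything that is not a letter, digit, underscore, or dot,
    # so feature names are compared as whole tokens.
    tokens = set(re.split(r"[^A-Za-z0-9_.]+", version_text))
    return [feature for feature in required if feature not in tokens]


# Hypothetical version string; the real format may differ.
sample = "qsv 0.0.0-mimalloc-apply;luau;polars"
print(missing_features(sample))
```

Matching whole tokens rather than substrings matters here because a short feature name like `to` would otherwise match inside unrelated words.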

Sources: scripts/benchmarks.sh42-118 scripts/benchmarks.sh130-152

Performance Tuning Guidelines

Hardware and Compilation Optimizations

To reproduce results comparable to the published benchmarks, compile qsv with CPU-specific optimizations.

Target CPU: Using target-cpu=native lets the compiler emit the SIMD instructions available on the build machine (for example, AVX2 on x86-64) scripts/benchmarks.sh27-28
Feature Flags: Performance-critical features like polars, apply, geocode, luau, and to should be enabled for full capability scripts/benchmarks.sh26-28
Memory Allocator: Recent qsv builds use a high-performance allocator such as jemalloc or mimalloc, which lowers allocation overhead and contention in multithreaded workloads scripts/results/run_info_history.tsv2-7

Indexing and Caching Strategy

The most significant performance gains in qsv come from leveraging its indexing and metadata caching subsystems. Indexing allows for nearly instantaneous record counts and faster random access, while the stats cache allows analytical commands to skip re-calculating univariate statistics.
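Why an index makes counting and random access cheap can be shown with a toy byte-offset index. This is a simplified sketch, not qsv's index format: it assumes one record per line (no quoted newlines), and the function names are hypothetical.

```python
import io


def build_index(f):
    """Return the byte offset of every data record (header excluded)."""
    f.seek(0)
    f.readline()  # skip the header row
    offsets = []
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        offsets.append(pos)
    return offsets


def read_record(f, offsets, i):
    """Jump straight to record i instead of scanning from the start."""
    f.seek(offsets[i])
    return f.readline().rstrip(b"\r\n").decode("utf-8")


data = io.BytesIO(b"id,city\n1,NYC\n2,Boston\n3,Austin\n")
offsets = build_index(data)
record_count = len(offsets)            # no rescan of the file needed
third = read_record(data, offsets, 2)  # one seek, one read
print(record_count, third)
```

Once the offsets exist, the record count is just the length of the list and any record is one seek away, which is the kind of saving an index-backed count or slice relies on. A stats cache applies the same idea to computed statistics: pay the cost once, then reuse the stored result.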

[Diagram: Performance Gain Visualization]

Sources: scripts/results/benchmark_results_display.csv16-18 scripts/results/benchmark_results_display.csv65-68

Throughput Benchmarks (NYC 311 Dataset)

Based on the latest recorded metrics (Version 21.0.0 on Apple M2 Pro), the following throughputs are typical for core operations scripts/results/latest_results.csv2-50:

Command Category	Specific Benchmark	Records/Sec
Instantaneous	`count_index`	~90,909,091 scripts/results/latest_results.csv18
High Speed	`behead`	~766,871 scripts/results/latest_results.csv10
Transformation	`apply_op_similarity`	~530,223 scripts/results/latest_results.csv7
Analysis	`frequency_index`	~1,742,160 scripts/results/latest_results.csv68
Complex	`excel` (conversion)	~108,413 scripts/results/latest_results.csv37

Result Tracking and History

qsv maintains a detailed history of every benchmark run in scripts/results/run_info_history.tsv. This allows developers to correlate performance changes with specific commits or environment changes scripts/results/run_info_history.tsv1-2

Run Info Schema

Each entry in the history log contains:

Environment: platform, cores, mem (total system memory) scripts/results/run_info_history.tsv1
Binary Info: version, binary variant, and version_info (features and compiler version) scripts/results/run_info_history.tsv1-2
Execution Meta: elapsed_secs (total suite time), warmup_runs, and benchmark_runs scripts/results/run_info_history.tsv1
Environment State: qsv_env (captures any performance-impacting environment variables set during the run) scripts/results/run_info_history.tsv1-2
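A run-history file with this schema can be read with any TSV parser. The sketch below uses column names listed above (`version`, `platform`, `cores`, `mem`, `elapsed_secs`, `warmup_runs`, `benchmark_runs`); the column order and every value in the sample are made up for illustration, and the helper names are hypothetical.

```python
import csv
import io


def load_run_history(tsv_text):
    """Parse tab-separated run info into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def runs_for_version(rows, version):
    """Select the runs recorded for one qsv version."""
    return [row for row in rows if row["version"] == version]


# Fabricated sample rows; real files carry more columns and real values.
sample_tsv = (
    "version\tplatform\tcores\tmem\telapsed_secs\twarmup_runs\tbenchmark_runs\n"
    "1.0.0\texample-os\t8\t16\t900\t2\t3\n"
    "1.1.0\texample-os\t8\t16\t850\t2\t3\n"
)
rows = load_run_history(sample_tsv)
latest = runs_for_version(rows, "1.1.0")
print(latest[0]["elapsed_secs"])
```

Note that `csv.DictReader` returns every field as a string, so numeric columns such as `elapsed_secs` need an explicit conversion before comparison.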

Sources: scripts/results/run_info_history.tsv1-21 scripts/results/latest_run_info.tsv1-2

Performance Profiling Workflow

For developers looking to tune specific commands:

Baseline: Run the command through hyperfine using the benchmark dataset scripts/benchmarks.sh72-76
Profiling: Use a profiling tool (like samply or perf) to generate flamegraphs and identify hotspots.
Optimization: Focus on reducing the number and cost of allocations (an allocator such as mimalloc or jemalloc helps with the latter) or on increasing parallelism via Rayon.
Verification: Re-run ./benchmarks.sh <command_name> and compare the mean and recs_per_sec against the historical values in scripts/results/benchmark_results.csv scripts/results/benchmark_results.csv1-10
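The verification step comes down to comparing a new mean against a historical one. A minimal sketch of that comparison follows; the 5% tolerance and the function names are assumptions, not project policy, and a sensible threshold depends on the `stddev` observed for the benchmark in question.

```python
def percent_change(baseline_mean, new_mean):
    """Relative change in mean run time, in percent (positive = slower)."""
    if baseline_mean <= 0:
        raise ValueError("baseline_mean must be positive")
    return (new_mean - baseline_mean) / baseline_mean * 100.0


def classify(baseline_mean, new_mean, tolerance_pct=5.0):
    """Label a result relative to the baseline, using an assumed tolerance."""
    change = percent_change(baseline_mean, new_mean)
    if change > tolerance_pct:
        return "regression"
    if change < -tolerance_pct:
        return "improvement"
    return "within noise"


# Illustrative means in seconds, not taken from the results files.
print(classify(1.00, 1.20))
print(classify(1.00, 0.80))
print(classify(1.00, 1.02))
```

Comparing means rather than `recs_per_sec` keeps the sign intuitive: a larger mean is slower, whereas a larger throughput is faster.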

Sources: scripts/benchmarks.sh1-40 scripts/results/benchmark_results.csv1-10