How Much Faster is DuckDB 1.5 vs 1.0? A lot

I benchmarked every DuckDB minor release from 1.0.0 through the 1.5.0 dev build on TPC-H, TPC-DS, ClickBench, and SSB. The results tell a clear story of DuckDB quickly improving.

Feb 27, 2026

TL;DR: Is upgrading from DuckDB v1.0.0 worth it? Yes. v1.5.0-dev is 1.67× faster on TPC-H, 1.84× faster on ClickBench, and 1.45× faster on SSB, with a 1.73× higher TPC-DS Power@Size score.

Introduction

\(\begin{array}{|l|l|l|l|l|l|} \hline \textbf{Benchmark} & \textbf{Metric} & \textbf{v1.0.0} & \textbf{v1.5.0-dev} & \textbf{Improvement} & \textbf{Highlight} \\ \hline \text{TPC-H} & \text{Total runtime} & \text{15,228ms} & \text{9,142ms} & \text{1.67× faster} & \text{Q7: 4.3× single-query improvement} \\ \hline \text{TPC-DS} & \text{Power@Size} & \text{385,807} & \text{669,328} & \text{1.73× higher} & \text{v1.2.2 regression hump, full recovery} \\ \hline \text{ClickBench} & \text{Total runtime} & \text{729ms} & \text{396ms} & \text{1.84× faster} & \text{LIKE queries +73\%} \\ \hline \text{SSB} & \text{Total runtime} & \text{1,356ms} & \text{938ms} & \text{1.45× faster} & \text{v1.3/v1.4 regressed, v1.5 full recovery} \\ \hline \end{array}\)

This post evaluates the last six DuckDB versions (1.0.0 through 1.5.0-dev) on the TPC-H, TPC-DS, ClickBench, and Star Schema (SSB) benchmarks.

DuckDB is now common infrastructure for SQL analytics work. It runs in-process with no server, handles billion-row workloads on a laptop, and embeds into Python, R, and dozens of other runtimes. Since v1.0.0 shipped in June 2024, it has become a standard tool for data engineering, data science, and ad hoc analytics.

DuckDB is typically embedded in applications, notebooks, and scripts, with the version pinned to ensure consistent behavior. So it's worth highlighting the version-level performance improvements that version-locked deployments may be missing.

MotherDuck estimated a cumulative 2× improvement since v1.0.0[^1]. Because DuckDB is open source, I can compare the six minor versions from 1.0 onward at the query level and trace performance shifts to specific PRs and issues: not just "it's faster," but which execution changes produced which gains.

Query time distributions across all four benchmarks and six versions

Versions Tested

\(\begin{array}{|l|l|l|l|} \hline \textbf{Version} & \textbf{Codename} & \textbf{Release Date} & \textbf{Key Performance Features} \\ \hline \textbf{1.0.0} & \text{-} & \text{June 2024} & \text{First stable release (baseline)} \\ \hline \textbf{1.1.3} & \text{Eatoni} & \text{October 2024} & \text{Filter pushdown, join optimizations} \\ \hline \textbf{1.2.2} & \text{Histrionicus} & \text{February 2025} & \text{CSV parser rewrite (+15\%)} \\ \hline \textbf{1.3.2} & \text{Ossivalis} & \text{June 2025} & \text{Parquet reader/writer rewrite} \\ \hline \textbf{1.4.4} & \text{Andium (1.4.x LTS line)} & \text{January 2026} & \text{Sorting rewrite (2x+), 1.4.x patch line} \\ \hline \textbf{1.5.0-dev} & \text{Variegata} & \text{Pre-GA build tested} & \text{Pre-release build (1.5.0.dev311)} \\ \hline \end{array}\)

I tested the last patch release of each minor version to capture cumulative improvements. For v1.5, I used a pre-release dev build (1.5.0.dev311). DuckDB's release calendar lists 1.5.0 as upcoming on March 2, 2026, and GitHub Releases still shows v1.4.4 as the latest GA release[^2].

DuckDB's Performance Evolution

Versions 1.1 through 1.3: Execution Engine, I/O, and Parquet

The first three post-1.0 releases moved from core execution to storage. v1.1[^3] shipped optimizer work (filter and join improvements) and produced the biggest single-version TPC-H jump in this matrix (+39.4% Power@Size). v1.2[^4] focused on I/O (CSV parser rewrite, Parquet bloom filters) and cut ClickBench total runtime by 17.7% relative to v1.1.3 (651ms to 536ms). v1.3[^5] rewrote Parquet reader/writer paths (deferred column fetching and stronger pushdown), changes that matter most in Parquet-heavy workloads rather than these in-memory runs.

Version 1.4: Sorting Rewrite

The v1.4[^5] sorting rewrite (PR-17584) replaced DuckDB's sort implementation with a K-way merge sort, delivering 1.7-2.7× improvement on random data and up to 10× on pre-sorted data in isolated ORDER BY benchmarks[^8]. TPC-H shows modest gains (+1.4% on my sorting proxy queries) because its ORDER BY operations are one component of multi-join queries, not isolated sorts. DuckDB 1.4 also changed CTE behavior to materialize by default (PR-17459), with the release notes reporting performance and correctness improvements for repeated CTE references[^6]. In this matrix, TPC-DS Power@Size increased from 614,882 (v1.3.2) to 630,854 (v1.4.4).

Version 1.5.0-dev: Continued Acceleration

After major rewrites in v1.3 and v1.4, v1.5.0-dev shows another round of gains: +4.7% TPC-H Power@Size, +6.1% TPC-DS Power@Size, 0.5% faster ClickBench, and SSB recovering fully from the v1.3/v1.4 regression to become the fastest version (938ms). Because GA release notes aren't final, I focus on observed deltas rather than attributed features. Concurrent upstream work mapped from v1.5-variegata[^7] includes join memory improvements (PR-21022), window optimizer extensions (PR-21021), and plan correctness tightening (PR-21014). These are the most active performance-relevant threads on the branch at test time. The per-query delta table in the TPC-H results section shows where the gains landed.

Results: TPC-H (SF=10)

Overall Version Progression

TPC-H Power@Size is the TPC standard metric for single-stream query performance: 3600 × Scale_Factor / geometric_mean(per-query times). Higher is better. The geometric mean weights all queries equally, so a 2× improvement on any single query contributes the same to the score regardless of that query's absolute runtime.
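A minimal sketch of that formula, assuming per-query times are expressed in seconds. The function name and the sample timings are illustrative; this is not BenchBox's implementation.

```python
import math

def power_at_size(query_seconds, scale_factor):
    """Power@Size as described above: 3600 * SF / geometric mean of per-query times."""
    if not query_seconds or any(t <= 0 for t in query_seconds):
        raise ValueError("need at least one strictly positive query time")
    log_mean = sum(math.log(t) for t in query_seconds) / len(query_seconds)
    geo_mean = math.exp(log_mean)
    return 3600 * scale_factor / geo_mean

# Hypothetical timings in seconds, not taken from the benchmark runs.
baseline = [0.4, 0.1, 0.9]
fast_q1 = [0.2, 0.1, 0.9]   # halve the first query
fast_q3 = [0.4, 0.1, 0.45]  # halve the third (slowest) query

# With three queries, halving any one of them raises the score by 2 ** (1/3).
print(power_at_size(fast_q1, 10) / power_at_size(baseline, 10))
print(power_at_size(fast_q3, 10) / power_at_size(baseline, 10))
```

Both ratios come out the same, which is the "weights all queries equally" property: the score rewards relative improvement, not milliseconds saved.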

\(\begin{array}{|l|l|l|l|} \hline \textbf{Version} & \textbf{Power@Size} & \textbf{vs. v1.0.0} & \textbf{vs. Previous} \\ \hline \text{v1.0.0} & \text{194,231} & \text{-} & \text{-} \\ \hline \text{v1.1.3} & \text{270,754} & \text{+39.4\% higher} & \text{+39.4\% higher} \\ \hline \text{v1.2.2} & \text{274,287} & \text{+41.2\% higher} & \text{+1.3\% higher} \\ \hline \text{v1.3.2} & \text{277,920} & \text{+43.1\% higher} & \text{+1.3\% higher} \\ \hline \text{v1.4.4} & \text{282,440} & \text{+45.4\% higher} & \text{+1.6\% higher} \\ \hline \text{v1.5.0-dev} & \text{295,792} & \text{+52.3\% higher} & \text{+4.7\% higher} \\ \hline \end{array}\)

Total Runtime (all 22 queries):

\(\begin{array}{|l|l|l|} \hline \textbf{Version} & \textbf{Total (ms)} & \textbf{vs. v1.0.0} \\ \hline \text{v1.0.0} & \text{15,228} & \text{-} \\ \hline \text{v1.1.3} & \text{10,273} & \text{32.5\% faster} \\ \hline \text{v1.2.2} & \text{10,085} & \text{33.8\% faster} \\ \hline \text{v1.3.2} & \text{9,906} & \text{34.9\% faster} \\ \hline \text{v1.4.4} & \text{9,683} & \text{36.4\% faster} \\ \hline \text{v1.5.0-dev} & \text{9,142} & \text{40.0\% faster} \\ \hline \end{array}\)
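This post quotes the same change two ways: as a speedup factor (1.67× in the TL;DR) and as a runtime reduction (40.0% faster in the table above). A short check of how the two relate, using the v1.0.0 and v1.5.0-dev totals; the helper names are illustrative.

```python
def speedup(old_ms, new_ms):
    """How many times faster the new runtime is (old / new)."""
    return old_ms / new_ms

def percent_faster(old_ms, new_ms):
    """Percent reduction in runtime relative to the old value."""
    return (1 - new_ms / old_ms) * 100

old_ms, new_ms = 15_228, 9_142  # TPC-H totals for v1.0.0 and v1.5.0-dev
print(f"{speedup(old_ms, new_ms):.2f}x faster")
print(f"{percent_faster(old_ms, new_ms):.1f}% less runtime")
```

The two numbers describe the same measurement: a 40% runtime reduction is 1 / (1 - 0.40) ≈ 1.67×, not 1.40×.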

Per-Query Analysis

Biggest winners (largest improvement v1.0.0 to v1.5.0-dev):

\(\begin{array}{|l|l|l|l|l|} \hline \textbf{Query} & \textbf{v1.0.0 (ms)} & \textbf{v1.5.0-dev (ms)} & \textbf{Speedup} & \textbf{Notes} \\ \hline \text{Q7} & \text{429} & \text{99} & \text{4.3×} & \text{Join + aggregation heavy} \\ \hline \text{Q18} & \text{828} & \text{247} & \text{3.4×} & \text{GROUP BY with large aggregation} \\ \hline \text{Q17} & \text{192} & \text{96} & \text{2.0×} & \text{Subquery-heavy} \\ \hline \text{Q15} & \text{159} & \text{87} & \text{1.8×} & \text{View + aggregation} \\ \hline \text{Q5} & \text{184} & \text{102} & \text{1.8×} & \text{Multi-table join + aggregation} \\ \hline \end{array}\)

Q7's 4.3× improvement is the headline number from this analysis, and it surprised me. A 4.3× speedup from iterative algorithmic improvements alone (no schema changes, no index tricks, same hardware) is unusually large for a query that was already completing successfully.

Regressions

I found no query slower in v1.5.0-dev than in v1.0.0 across the full 22-query TPC-H suite. Adjacent-version regressions do appear (see Analysis section), but the cumulative direction is consistently positive.

Query Category Breakdown

\(\begin{array}{|l|l|l|l|l|} \hline \textbf{Category} & \textbf{Queries} & \textbf{v1.0.0 Mean (ms)} & \textbf{v1.5.0-dev Mean (ms)} & \textbf{Improvement} \\ \hline \text{Full scan} & \text{Q1, Q6} & \text{169.5} & \text{149.0} & \text{+12.1\%} \\ \hline \text{Join-heavy} & \text{Q9, Q21} & \text{384.5} & \text{284.0} & \text{+26.1\%} \\ \hline \text{Aggregation} & \text{Q5, Q18} & \text{506.0} & \text{174.5} & \text{+65.5\%} \\ \hline \text{Sorting} & \text{Q3, Q4, Q10, Q16} & \text{175.5} & \text{116.5} & \text{+33.6\%} \\ \hline \text{Subquery} & \text{Q17, Q20} & \text{158.5} & \text{103.5} & \text{+34.7\%} \\ \hline \end{array}\)

Key finding: Aggregation queries improved the most (65.5%), driven primarily by Q18's dramatic improvement (3.4×). Full-scan queries improved the least (+12.1%): if your workload is dominated by Q1-style full table scans, the cumulative v1.0→v1.5 improvement is real but not a compelling reason to rush an upgrade.
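The category figure is an improvement in the arithmetic mean, so the slowest query in each bucket dominates it. A quick check using the Q5 and Q18 medians from the per-query table earlier in this section; the helper name is illustrative.

```python
from statistics import mean

# Per-query medians (ms) from the "Biggest winners" table
v100 = {"Q5": 184, "Q18": 828}
v150 = {"Q5": 102, "Q18": 247}

def pct_improvement(old_ms, new_ms):
    """Percent reduction in runtime from old to new."""
    return (old_ms - new_ms) / old_ms * 100

# Category-level number: compare the arithmetic means of the two queries
category = pct_improvement(mean(list(v100.values())), mean(list(v150.values())))
# Per-query numbers for comparison
per_query = {q: pct_improvement(v100[q], v150[q]) for q in v100}

print(f"Aggregation category: {category:.1f}%")
for query, pct in per_query.items():
    print(f"{query}: {pct:.1f}%")
```

Q18 improves by about 70% and Q5 by about 45%; because Q18 is more than four times slower than Q5 in v1.0.0, the category mean lands much closer to Q18's figure.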

The v1.4 sorting rewrite (PR-17584) measured 1.7-2.7× on random data in isolated benchmarks[^7], but its TPC-H impact is modest. Using Q3, Q4, Q10, and Q16 as a sorting proxy (the four queries most dominated by ORDER BY):

\(\begin{array}{|l|l|l|} \hline \textbf{Version} & \textbf{Proxy Mean (ms)} & \textbf{vs. v1.3.2} \\ \hline \text{v1.3.2} & \text{126.50} & \text{- (baseline)} \\ \hline \text{v1.4.4} & \text{124.75} & \text{1.4\% faster} \\ \hline \text{v1.5.0-dev} & \text{116.50} & \text{7.9\% faster} \\ \hline \end{array}\)

That's expected: TPC-H sorting queries are multi-join queries where ORDER BY is one component, not isolated sorts where the rewrite's full gains apply.

v1.4.4 to v1.5.0-dev: Per-Query Deltas

Query-level movement from v1.4.4 to v1.5.0-dev is mixed but net-positive. The most-moved queries (both directions):

\(\begin{array}{|l|l|l|l|} \hline \textbf{Query} & \textbf{v1.4.4 (ms)} & \textbf{v1.5.0-dev (ms)} & \textbf{Change} \\ \hline \text{Q8} & \text{141} & \text{95} & \text{32.6\% faster} \\ \hline \text{Q7} & \text{119} & \text{99} & \text{16.8\% faster} \\ \hline \text{Q17} & \text{112} & \text{96} & \text{14.3\% faster} \\ \hline \text{Q5} & \text{117} & \text{102} & \text{12.8\% faster} \\ \hline \text{Q9} & \text{343} & \text{300} & \text{12.5\% faster} \\ \hline \text{Q14} & \text{94} & \text{100} & \text{6.4\% slower} \\ \hline \text{Q18} & \text{234} & \text{247} & \text{5.6\% slower} \\ \hline \text{Q6} & \text{59} & \text{62} & \text{5.1\% slower} \\ \hline \end{array}\)

The full six-version heatmap shows the cumulative per-query trajectory. Darker cells are faster; look for the v1.1 row (the biggest single jump) and the Q7/Q18 columns (the steepest per-query improvement over the full range).

Results: TPC-DS (SF=10)

TPC-DS has 99 queries (run as 103 individual variants), testing window functions, CTEs, correlated subqueries, and other advanced SQL features that TPC-H doesn't cover.

Overall Version Progression

TPC-DS Power Score (Power@Size):

\(\begin{array}{|l|l|l|l|} \hline \textbf{Version} & \textbf{Power@Size} & \textbf{Query Records Passed} & \textbf{vs. v1.0.0} \\ \hline \text{v1.0.0} & \text{385,807} & \text{308/309} & \text{-} \\ \hline \text{v1.1.3} & \text{594,666} & \text{309/309} & \text{+54.1\% higher} \\ \hline \text{v1.2.2} & \text{500,512} & \text{309/309} & \text{+29.7\% higher} \\ \hline \text{v1.3.2} & \text{614,882} & \text{309/309} & \text{+59.4\% higher} \\ \hline \text{v1.4.4} & \text{630,854} & \text{309/309} & \text{+63.5\% higher} \\ \hline \text{v1.5.0-dev} & \text{669,328} & \text{309/309} & \text{+73.5\% higher} \\ \hline \end{array}\)

v1.2.2 Dip: Power@Size dropped 16% from v1.1.3 before recovering in v1.3.2. The largest regression was Query 22, a GROUP BY ROLLUP over inventory data, which went from 881ms to 10,032ms, an 11× slowdown. Queries 67, 23A, 14A, 49, and 27 also regressed (43-136%). By v1.3.2, overall TPC-DS Power@Size exceeded v1.1.3 levels. But Q22 itself never came back. My plan/profiling evidence supports two compounding causes: details in the Q22 deep-dive below.

In my matrix summary artifacts, v1.0.0 shows 308/309 query records passed while v1.1.3+ shows 309/309; all versions report zero timeouts. Across versions, the dominant change is execution speed, not broad query-correctness drift.

The v1.2 Regression: What Happened to Query 22?

TPC-DS Query 22 runs a four-column GROUP BY ROLLUP over inventory, one of the largest tables in the schema. ROLLUP expands to five grouping sets: the full combination plus four progressively coarser subtotals. It's one of the most aggregation-intensive queries in TPC-DS, and it's where the v1.2 regression hit hardest.
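To make the "five grouping sets" concrete, here is a small sketch that enumerates what a ROLLUP over an ordered column list expands to. The column names follow the published TPC-DS Q22 text; treat them as an assumption about the query kit used here. The function is illustrative and says nothing about how DuckDB plans ROLLUP internally.

```python
def rollup_grouping_sets(columns):
    """Return the grouping sets that GROUP BY ROLLUP(columns) expands to.

    ROLLUP drops columns from the right one at a time, ending with the
    empty set (the grand total), so n columns give n + 1 grouping sets.
    """
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

# Assumed Q22 rollup columns (from the public TPC-DS query text)
q22_columns = ["i_product_name", "i_brand", "i_class", "i_category"]

for grouping_set in rollup_grouping_sets(q22_columns):
    print(grouping_set)
```

Each grouping set is aggregated over the same joined input, with the dropped columns reported as NULL in the output rows.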

\(\begin{array}{|l|l|l|l|} \hline \textbf{Version} & \textbf{Q22 median (ms)} & \textbf{vs. v1.1.3} & \textbf{Root cause state} \\ \hline \text{v1.1.3} & \text{881} & \text{baseline} & \text{Good hash table sizing, column pruning working} \\ \hline \text{v1.2.2} & \text{10,032} & \text{11.4× slower} & \text{Hash aggregation reworked for high-cardinality single-group workloads} \\ \hline \text{v1.3.2} & \text{1,319} & \text{50\% slower} & \text{HLL-based hash table sizing fixes aggregation; column pruning now disabled under ROLLUP} \\ \hline \text{v1.4.4} & \text{1,414} & \text{60\% slower} & \text{Same two-factor state} \\ \hline \text{v1.5.0-dev} & \text{1,593} & \text{81\% slower} & \text{Column pruning fix merged ([PR-20781](https://github.com/duckdb/duckdb/pull/20781)) but may not be in this build} \\ \hline \end{array}\)

The strongest hypothesis from plan/profiling evidence is two independent changes that compound against each other.

v1.2: hash aggregation rework

Between v1.1.3 and v1.2.0, DuckDB made hash aggregation performance improvements (PR-15251, PR-15321) targeting high-cardinality single-group-set workloads. These changes added row-width-aware partitioning thresholds and a "skip lookups if mostly unique" heuristic, both good for single-GROUP-BY queries, both bad for ROLLUP. ROLLUP produces NULL-padded rows across multiple grouping sets, which inflates the apparent uniqueness rate and triggers wider partitioning for the wider tuples. On Q22, this created a perfect storm: high base cardinality × five grouping sets × heuristics tuned for a different data pattern.

v1.3: one step forward, one step back

DuckDB v1.3.0 added HyperLogLog-based adaptive hash table sizing (PR-17236), which improves hash table cardinality estimates. DuckDB's own benchmarks showed TPC-DS Q67, another ROLLUP query, running ~2× faster with this optimization. But the same release also added a correctness fix (PR-17259) that disabled all column pruning below ROLLUP and CUBE operators. Instead of scanning only the two inventory columns Q22 actually needs, it scanned all of them[^8].

These two forces (fixed hash tables, broken column pruning) net out to ~50% slower than the v1.1.3 baseline. An improved column pruning fix landed in PR-20781, which re-enabled column pruning while keeping a targeted guard only in RemoveDuplicateGroups. That fix ships with v1.5.0 GA. When it does, Q22 should return toward its v1.1.3 speed, or better. To my knowledge, this is the only public, per-query, multi-version TPC-DS benchmark of DuckDB; if you've seen another, I'd like to know about it.

Results: ClickBench

ClickBench tests scan-heavy web analytics patterns on a single 100M-row table.

Overall Version Progression

ClickBench Total Runtime (ms):

\(\begin{array}{|l|l|l|} \hline \textbf{Version} & \textbf{Total (ms)} & \textbf{vs. v1.0.0} \\ \hline \text{v1.0.0} & \text{729} & \text{-} \\ \hline \text{v1.1.3} & \text{651} & \text{10.7\% faster} \\ \hline \text{v1.2.2} & \text{536} & \text{26.5\% faster} \\ \hline \text{v1.3.2} & \text{555} & \text{23.9\% faster} \\ \hline \text{v1.4.4} & \text{398} & \text{45.4\% faster} \\ \hline \text{v1.5.0-dev} & \text{396} & \text{45.7\% faster} \\ \hline \end{array}\)

Note that v1.3.2 shows a slight regression vs. v1.2.2 in total ClickBench runtime (555ms vs 536ms). Given observed ClickBench variance in this matrix, I treat this as directional rather than a strong signal.

Query Pattern Analysis

ClickBench queries are categorized by pattern:

\(\begin{array}{|l|l|l|l|} \hline \textbf{Pattern} & \textbf{v1.0.0 Mean (ms)} & \textbf{v1.5.0-dev Mean (ms)} & \textbf{Improvement} \\ \hline \text{COUNT(*)} & \text{0.67} & \text{∼0} & \text{+100\%} \\ \hline \text{GROUP BY (low cardinality)} & \text{4.63} & \text{3.25} & \text{+29.7\%} \\ \hline \text{GROUP BY (high cardinality)} & \text{7.68} & \text{4.42} & \text{+42.5\%} \\ \hline \text{String matching (LIKE)} & \text{5.86} & \text{1.57} & \text{+73.2\%} \\ \hline \text{ORDER BY with LIMIT} & \text{6.90} & \text{3.77} & \text{+45.4\%} \\ \hline \end{array}\)

Key finding: String matching (LIKE) is the biggest winner at 73.2%, and high-cardinality GROUP BY improved more than low-cardinality (42.5% vs 29.7%). String hash caching (PR-18580) is the most direct match for the LIKE gains: string processing changes have an obvious path to LIKE query performance. Dictionary-aware insertion (PR-15152) and fewer aggregation allocations (PR-16849) align with the high-cardinality GROUP BY gains concentrated in v1.2-v1.3. Higher-load-factor probing (PR-17718) fits the continued ORDER BY improvement through v1.4. I haven't run micro-benchmarks to isolate each contribution, but the per-pattern distribution matches the change history well.

Results: SSB (SF=10)

The Star Schema Benchmark tests classic dimensional model queries.

Overall Version Progression

SSB Total Runtime (ms):

\(\begin{array}{|l|l|l|} \hline \textbf{Version} & \textbf{Total (ms)} & \textbf{vs. v1.0.0} \\ \hline \text{v1.0.0} & \text{1,356} & \text{-} \\ \hline \text{v1.1.3} & \text{1,005} & \text{25.9\% faster} \\ \hline \text{v1.2.2} & \text{1,006} & \text{25.8\% faster} \\ \hline \text{v1.3.2} & \text{1,054} & \text{22.3\% faster} \\ \hline \text{v1.4.4} & \text{1,071} & \text{21.0\% faster} \\ \hline \text{v1.5.0-dev} & \text{938} & \text{30.8\% faster} \\ \hline \end{array}\)

The per-query heatmap makes the uneven progression visible. v1.1/v1.2 show broad improvement (bluer cells), v1.3/v1.4 regress on Flight 3-4 joins (warmer cells), and v1.5 recovers to the best result across most queries.

SSB is the one benchmark where improvement isn't consistent across every version. v1.3.2 and v1.4.4 are both slower than v1.1.3/v1.2.2, but v1.5.0-dev fully recovers and is the fastest version overall (938ms vs 1,005ms for v1.1.3). Per-subquery plan and profiling diffs show that the v1.3/v1.4 slowdown is concentrated in specific Flight 3-4 join queries rather than spread evenly.

One outlier worth noting: Q2.2 jumps from 4ms to 53ms in v1.4.4 (and 44ms in v1.5.0-dev), a large percentage regression but small in absolute terms. Because Q2.2 contributes at most 53ms to total runtime in any version, I focus on the Flight 3-4 joins, where the macro story plays out.

The v1.3.2 regression is mostly Flights 3 and 4, with Q4.1 moving the most:

\(\begin{array}{|l|l|l|l|l|l|} \hline \textbf{Query} & \textbf{v1.2.2 (ms)} & \textbf{v1.3.2 (ms)} & \textbf{v1.4.4 (ms)} & \textbf{v1.5.0-dev (ms)} & \textbf{Key movement} \\ \hline \text{Q3.1} & \text{60} & \text{64} & \text{58} & \text{39} & \text{Temporary +6.7\% in v1.3.2, then -32.8\% vs. v1.4.4 in v1.5} \\ \hline \text{Q3.2} & \text{60} & \text{64} & \text{56} & \text{37} & \text{Temporary +6.7\% in v1.3.2, then -33.9\% vs. v1.4.4 in v1.5} \\ \hline \text{Q4.1} & \text{77} & \text{89} & \text{78} & \text{64} & \text{Regression in v1.3.2 (+15.6\%), recovery in v1.4.4, below v1.2.2 in v1.5} \\ \hline \text{Q4.2} & \text{56} & \text{58} & \text{54} & \text{54} & \text{Modest regression in v1.3.2 (+3.6\%), stabilizes from v1.4 onward} \\ \hline \end{array}\)

I traced the SSB slowdown to a two-stage execution story, similar to the Q22 analysis approach.

Stage 1, v1.2.2 -> v1.3.2/v1.4.4: regression without plan-shape change

For representative regressors (Q3.2, Q4.1), the EXPLAIN plans keep the same join skeleton, join predicates, and lineorder full-scan shape across v1.2.2, v1.3.2, and v1.4.4. That makes a structural join-plan rewrite, including PR #16443, an unlikely primary cause of this regression pattern.

Stage 2, v1.4.4 -> v1.5.0-dev: targeted recovery from scan-time filtering

In v1.5.0-dev, EXPLAIN ANALYZE for Q3.2 and Q4.1 shows Dynamic Filters on lineorder scan keys, consistent with the Bloom/SIP work in PR #19502. This aligns with the strong recovery in Q3.1 and Q3.2 and the smaller gain in Q4.1.

The scan-time filtering recovery is strong enough that total SSB runtime in v1.5.0-dev (938ms) beats the previous best (v1.1.3 at 1,005ms). Q4.2 is roughly flat against v1.2.2 (54ms vs. 56ms in the table above); the gains in Flight 1 and Flight 3 queries drive the improvement.

Analysis and Insights

Regression Analysis

Suite-level improvements don't tell the whole story. Optimizations for one pattern can regress another, and the per-query data shows where.

Adjacent-version TPC-H regressions (>10% slower):

\(\begin{array}{|l|l|l|l|l|} \hline \textbf{Version Jump} & \textbf{Query} & \textbf{Older (ms)} & \textbf{Newer (ms)} & \textbf{Slower by} \\ \hline \text{v1.3.2 → v1.4.4} & \text{Q12} & \text{98} & \text{115} & \text{+17.3\%} \\ \hline \text{v1.3.2 → v1.4.4} & \text{Q20} & \text{95} & \text{110} & \text{+15.8\%} \\ \hline \text{v1.2.2 → v1.3.2} & \text{Q19} & \text{140} & \text{156} & \text{+11.4\%} \\ \hline \end{array}\)

All three are adjacent-version regressions; none accumulates through to v1.5.0-dev, where every TPC-H query is at least as fast as in v1.0.0.
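The >10% filter behind the regression table can be sketched as below. The function name and the dict-of-medians layout are illustrative assumptions, not BenchBox's API; the sample values are the v1.3.2 and v1.4.4 medians from the table.

```python
def adjacent_regressions(older_ms, newer_ms, threshold=0.10):
    """Return {query: fractional slowdown} for queries slower by more than threshold."""
    flagged = {}
    for query, old in older_ms.items():
        new = newer_ms.get(query)
        if new is None or old <= 0:
            continue  # skip queries missing from the newer run or with unusable timings
        slowdown = (new - old) / old
        if slowdown > threshold:
            flagged[query] = slowdown
    return flagged

# Median runtimes in ms, taken from the regression table
v132 = {"Q12": 98, "Q20": 95}
v144 = {"Q12": 115, "Q20": 110}

for query, slowdown in adjacent_regressions(v132, v144).items():
    print(f"{query}: +{slowdown * 100:.1f}%")
```

Running the same function against the v1.0.0 and v1.5.0-dev medians is one way to confirm that no regression persists over the full range.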

How I Ran These Benchmarks

All benchmarks used BenchBox for reproducibility.

Environment details:

Hardware: Mac Mini (M4, 10 cores, 16 GB unified memory)
OS captured in benchmark artifacts: Darwin 25.3.0
Python runtime: 3.10.17
BenchBox CLI version in this environment: 0.1.3
DuckDB config: threads=10, memory_limit='12GB', enable_progress_bar=false, result_cache_enabled=false

Run protocol:

Data generation ran once per benchmark.
Load phase ran once per version and benchmark.
Power phase ran 3 times per version and benchmark; median reported in all tables.
No explicit OS page-cache flush between power runs, so measurements reflect warm filesystem cache behavior with DuckDB result cache disabled.

Aggregation method: BenchBox computes per-run per-query medians and per-run Power@Size. I report the median of three runs for each published metric (per-query time, total runtime, Power@Size). I do not recompute Power@Size from cross-run per-query medians.
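The distinction in that last sentence matters because the two aggregation orders can give different scores. A small sketch with made-up timings (two queries, three runs, scale factor 1); none of these numbers come from the benchmark runs, and the helper is illustrative.

```python
import statistics

def power_at_size(query_seconds, scale_factor):
    """3600 * SF / geometric mean of per-query times (seconds)."""
    return 3600 * scale_factor / statistics.geometric_mean(query_seconds)

# Hypothetical per-query times in seconds for three power runs
runs = [
    {"Q1": 1.0, "Q2": 1.0},
    {"Q1": 2.0, "Q2": 4.0},
    {"Q1": 4.0, "Q2": 2.0},
]
scale_factor = 1

# Reported approach: score each run, then take the median of the scores
per_run_scores = [power_at_size(list(run.values()), scale_factor) for run in runs]
reported = statistics.median(per_run_scores)

# Alternative not used here: take per-query medians across runs, then score once
query_medians = [statistics.median(run[q] for run in runs) for q in runs[0]]
recomputed = power_at_size(query_medians, scale_factor)

print(f"median of per-run scores: {reported:.1f}")
print(f"score from per-query medians: {recomputed:.1f}")
```

In this toy data the two approaches disagree noticeably, so it is worth knowing which one a published Power@Size figure uses before comparing it with per-query tables.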

Representative commands:

# TPC-H SF=10
uv run benchbox run --platform duckdb \
  --benchmark tpch --scale 10 \
  --phases generate,load,power \
  --output results/duckdb-v150dev-tpch-sf10

Each version was tested using isolated Python environments:

uv pip install duckdb==1.0.0  # Baseline
uv pip install duckdb==1.1.3  # Last v1.1
uv pip install duckdb==1.2.2  # Last v1.2
uv pip install duckdb==1.3.2  # Last v1.3
uv pip install duckdb==1.4.4  # Current LTS
uv pip install duckdb==1.5.0.dev311  # Pre-GA v1.5 dev build

Three runs per benchmark per version, median reported.

Run-to-run spread (max within-version spread across the matrix, non-zero power runtimes): TPC-H 2.7%, TPC-DS 72.7%, ClickBench 54.1%, SSB 13.0%. TPC-DS v1.0.0 is the primary outlier (one run at 734s vs median 441s), and ClickBench v1.1.3 had one anomalous run at 999ms vs median 651ms.
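The post does not spell out the spread formula. One plausible definition, labeled here as an assumption, is the range of the three power-run totals divided by their median. The run values below are hypothetical.

```python
import statistics

def spread_pct(run_totals):
    """Assumed definition: (max - min) / median of the run totals, in percent."""
    if len(run_totals) < 2:
        raise ValueError("need at least two runs to compute a spread")
    return (max(run_totals) - min(run_totals)) / statistics.median(run_totals) * 100

# Hypothetical total runtimes (ms) for three power runs of one version
runs_ms = [100.0, 102.0, 110.0]
print(f"spread: {spread_pct(runs_ms):.1f}%")
print(f"reported value (median): {statistics.median(runs_ms):.0f} ms")
```

Under this definition a single slow run inflates the spread while leaving the reported median untouched, which is the behavior described for the TPC-DS v1.0.0 and ClickBench v1.1.3 outliers.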

Interpretation threshold used in this post:

<2% runtime deltas are directional unless variance is low for that suite/version.
>5% shifts are treated as stronger signals when also visible in per-query tables.

Reproducibility artifacts:

Conclusions: v1.0.0 to v1.5.0-dev

Cumulative improvement is real: 1.67× TPC-H, 1.73× TPC-DS, 1.84× ClickBench, 1.45× SSB
Major rewrites delivered: v1.1 lifted TPC-H Power@Size by 39%; overall I/O and runtime improvements reduced ClickBench runtime 46%
Regressions are contained: TPC-H adjacent-version regressions exist but are modest (11-17%) and don't accumulate. No TPC-H query is slower in v1.5.0-dev than v1.0.0.
Workload differences matter: SSB regressed in v1.3/v1.4, but v1.5.0-dev fully recovers to the best result overall (1.45×)
Open source enables attribution: specific PRs can be mapped to specific benchmark shifts, which is rare in database benchmarking

The bottom line: upgrade. DuckDB has earned your trust and, if you're on older versions, you're likely leaving meaningful performance on the table. In this matrix, v1.5.0-dev improves runtime by 40.0% on TPC-H, 45.7% on ClickBench, and 30.8% on SSB versus v1.0.0, with a 73.5% higher TPC-DS Power@Size score. v1.4.4 is the safe GA choice today. v1.5.0 ships March 2, 2026; run your critical queries against the pre-release build now so you know what to expect before upgrading.


Limitations and Caveats

Direct evidence is cited inline. Plausible hypotheses are explicitly noted as such.
This uses a pre-GA DuckDB 1.5 build (1.5.0.dev311), so final GA behavior may differ.
Results are from one hardware profile and may not extrapolate to x86 server environments.
Power-phase timings were collected without forced OS cache eviction, so this is a warm-cache profile.
TPC-DS v1.0.0 shows high run-to-run variance (one run at 734s vs median 441s), driven by Q23A/Q23B instability. Later versions are stable (<5% spread). Median aggregation absorbs these outliers.

References & Resources

Footnotes

This post is part of the DuckDB Performance series at Oxbow Research. I track DuckDB's evolution with systematic benchmarks and technical analysis across each release.

Faster Ducks - MotherDuck Blog, 2025. Performance analysis of DuckDB evolution.

Announcing DuckDB 1.1.0 "Eatoni" - DuckDB Blog, September 2024.

Announcing DuckDB 1.2.0 "Histrionicus" - DuckDB Blog, February 2025.

Announcing DuckDB 1.3.0 "Ossivalis" - DuckDB Blog, May 2025.

Announcing DuckDB 1.4.0 "Andium" - DuckDB Blog, September 2025.

DuckDB Release Calendar and DuckDB GitHub Releases - accessed February 25, 2026. Release calendar lists 1.5.0 as upcoming on March 2, 2026; latest GA listed in releases is v1.4.4 (January 26, 2026).

Redesigning DuckDB's Sort, Again - DuckDB Blog, September 2025. Benchmarks on M1 Max MacBook Pro (10 cores, 64 GB RAM): 1.7-2.7× on random data, up to 10× on pre-sorted data, with wide-table sorting 2-3.4× faster at SF10-SF100.

The column pruning issue under ROLLUP/CUBE was independently documented by GitHub user heldeo, who measured 9.3× more columns scanned on TPC-DS Q36 (another ROLLUP query) when running on S3-backed Parquet. See Column pruning disabled for GROUP BY ROLLUP/CUBE/GROUPING SETS. The underlying correctness fix was PR-17259; the targeted performance fix is PR-20781, shipping in v1.5.0.
