Introducing Oxbow Research
Independent data platform analysis using open-source methodology.
TL;DR: What is Oxbow Research? Independent analysis of data platform performance and pricing. I also cover market trends, vendor strategy, and deep dives on historical companies and technologies. Performance analysis is based on benchmarks run with BenchBox, the open-source benchmarking framework I created to make this effort transparent and reproducible.
The problem with benchmarks
No one trusts data platform benchmarks. Data platform vendors don’t exactly mislead with their benchmarking efforts but, understandably, they only publish benchmarks if they win. Snowflake publishes benchmarks where they look best, Databricks publishes benchmarks where they look best, and data practitioners are left comparing apples to oranges with no easy way to verify either claim. The industry calls this “benchmarketing” and it’s been the norm for decades.
The same dynamic explains why there are so few official TPC results. TPC certification costs $100k+[1]. It only makes sense to publish a result if you are sure your competitors won’t beat it quickly. If you know a competitor could publish a better (or even close) result then publishing your TPC result is shooting yourself in the foot. That’s why most vendors never publish or publish once, claim the crown, and never update it. The incentives guarantee you won’t see an apples-to-apples comparison unless someone outside the vendor ecosystem creates one.
For data practitioners this creates a real problem: you need to justify your data platform choice and budget. You have a few outdated vendor benchmarks (apples to oranges), analyst quadrants (expensive and vague), and your own experience (limited to the platforms you know). You can run your own benchmarks, but that soaks up engineering time researching configs, debugging drivers, and fighting with cloud permissions, when you should be shipping useful data products for your business.
What about independent benchmarks?
They often suffer from a few common problems:
Conflict of Interest: Fivetran’s cloud data warehouse benchmark was very useful but Fivetran’s business requires close relationships with cloud data warehouse vendors. The conclusion “they’re all pretty good” might be accurate but it reads differently when their revenue depends on not offending anyone on the list.
Single Platform Experts: Benchmarks are often run by practitioners who are experts in one platform but have limited experience with the competing platforms. Doing this well takes considerable effort because it's hard to produce best-case tunings for platforms you don't know well. It's all too easy to write off an unfamiliar platform as "slow" when it's just misconfigured.
TPC "Inspired": A common category of problematic benchmarks is the TPC-H or TPC-DS "inspired" test. These skip the complex official TPC methodology, which requires specific data generation, query validation, query ordering, concurrent testing, refresh operations, and measurement logic. "Inspired" results can be directionally useful, but they're not directly comparable to other TPC-H or TPC-DS "inspired" results because none of them adhere to the spec.
And finally there’s governance. Who decides if a benchmark was run fairly? Who handles complaints? Who updates results when new versions ship? Usually nobody. The benchmark gets published, gets shared on Hacker News, and sits there, frozen in time, increasingly outdated, with no process for correction or update.
What I built
Vendor Benchmarks vs Oxbow Research:
Funding: Vendor funded vs Subscriber funded
Methodology: Custom scripts vs Versioned toolkit
Reproducibility: Good luck vs pip install benchbox
Governance: Trust us vs Documented process
Analysis: “We’re the fastest!” vs “Fastest at what, exactly?”
BenchBox is the foundation, an easy-to-use open-source benchmarking toolkit with an MIT license.
uv pip install benchbox
uv run benchbox run --platform duckdb --benchmark tpch --scale 1 --phase power

BenchBox will run a spec-compliant TPC-H Power test: generating data with dbgen, executing all 22 queries (1 warmup run + 3 measurement runs) in the correct order for each run with proper parameterization, and reporting the geometric-mean performance metric (Power@Size). Scale factor 10? Same methodology, larger dataset, correct parameterization. Scale factor 100? Depends on your hardware, but you'll know exactly what configuration produced those numbers, because you ran it.
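To make the Power@Size metric concrete: the TPC-H spec defines it as 3600 × SF divided by the geometric mean of 24 timings (the 22 query timings plus the 2 refresh-function timings). Here is a minimal sketch of that calculation; the function name and shape are illustrative, not BenchBox's actual API:

```python
import math

def power_at_size(query_secs, refresh_secs, scale_factor):
    """TPC-H Power@Size: 3600 * SF divided by the geometric mean of
    the 22 query timings plus the 2 refresh timings (24 values)."""
    timings = list(query_secs) + list(refresh_secs)
    assert len(timings) == 24, "TPC-H Power uses 22 queries + 2 refresh functions"
    # Geometric mean computed in log space to avoid overflow on the product
    geo_mean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600 * scale_factor / geo_mean

# Sanity check: if every query and refresh took exactly 1 second at SF=1,
# the metric is simply 3600.
print(power_at_size([1.0] * 22, [1.0, 1.0], 1))  # → 3600.0
```

Because it's a geometric mean, one pathologically slow query drags the score down proportionally rather than dominating it the way an arithmetic mean would, which is why the spec chose it.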
Oxbow Research is independent and self-funded. I have no outside investors or employer to keep happy. Every benchmark I publish uses BenchBox, the same open-source tool anyone can run. If you disagree with my results, reproduce them and show me. Methodology debates happen on GitHub - if something is wrong (or missing) we’ll fix it for everyone in public.
What I’ll write about
Benchmark results with full methodology, TPC-H power tests, TPC-DS, ClickBench. Industry economics and vendor analysis. Technical deep-dives on data platform internals. Historical perspectives on analytics technology. I have opinions. I’ll tell you what they are and why.
Why “Oxbow”?
An oxbow lake forms when a river carves a new channel and abandons an old meander. The data industry works the same way: a technology or approach seems to completely dominate the market (or the mindshare) for a few years, then the market cuts a new channel and moves on. That's the metaphor for Oxbow Research: understanding the speed and course of the current path for data platforms, and thinking about where and why the previous paths diverged.
Here are a few “oxbows” that I’ve seen in my career:
Rowstore + Indexing - Oracle
MPP Rowstores - Teradata
DW Appliances - Netezza
Early Columnstores - Vertica
Data Lakes - Hadoop, S3
Cloud Data Warehouses - Redshift, Snowflake
Lakehouse + Open Table formats - Databricks, Delta Lake, Iceberg
The current path
Composable Data Stacks - DuckDB, DataFusion, Polars
The next path?
What’s next
Subscribe to the Oxbow Research newsletter to stay informed on upcoming research, analysis, and deep dive posts. BenchBox is freely available today.
[1] TPC Policies - TPC, accessed January 2026. Full benchmark certification requires third-party auditing and TPC membership. ↩