Today, I'm going to tell you about one of the most dramatic "divorces" in database history — the story of how a team of Chinese engineers had a massive falling out, only to end up creating the two best OLAP systems on the market.
Spoiler alert: this story involves legal wars over trademarks, philosophical debates about the "true path" of a developer, and a lesson on why competition is sometimes a good thing (at least for us, the users).
If you think corporate drama only happens in HBO series — welcome to the world of Open Source.
Part 1: How It All Began (And Why Baidu Is More Than Just a Search Engine)
In the early 2010s, the big data world was obsessed with Hadoop. Everyone was building their "data lakes" and rejoicing that they could store petabytes of information. There was just one problem: when it came to analytics, the speed was roughly equivalent to dial-up internet.
Engineers at the Chinese giant Baidu (think Google, but Chinese — and yes, very smart people work there too) decided they couldn't live like this. They started building their own analytical database under the codename Palo.
By 2017, the thing worked so well that Baidu was processing petabytes of advertising data on it. And then someone upstairs made a decision that would change everything: "Let's open source the code!"
In 2018, the project was donated to the Apache Software Foundation under a new name — Apache Doris. It was textbook: open source, a community of contributors, democracy, and consensus.
It seemed like a happily ever after. But that's when the fun began…
Part 2: The Split (Or Evolution vs. Revolution)
Picture the situation: you're a talented engineer, you see the real-time analytics market growing by 50% a year, but your project is stuck in the bureaucracy of the Apache Foundation, where every architectural decision has to be debated in public mailing lists for weeks.
By 2020, two camps had formed within the Apache Doris team:
The "Evolutionist" Camp (future VeloDB): "Guys, let's not break what works. Baidu, Meituan, Xiaomi — they all run on our code. Let's improve gradually while maintaining compatibility."
The "Revolutionary" Camp (future StarRocks): "The old code is technical debt that will bury us. We need to rewrite the engine from scratch! Only a clean slate will allow us to achieve true performance!"
It's basically the debate of "renovating the house room by room" vs. "bulldozing it and building a mansion." Both options have merit, but finding a compromise is nearly impossible.
The Revolutionaries left and founded their own company. And that's when the circus really started…
Part 3: The Trademark Scandal (Or Why Apache Lawyers Never Sleep)
The breakaway team released their product under the name… DorisDB.
Yes, you read that right. People forked the Apache Doris project and named their commercial product almost exactly the same thing.
It's like quitting McDonald's and opening a burger joint called "McDonaldz" across the street. Technically a different name, but everyone knows what's going on.
The Apache Software Foundation reacted predictably: "Are you guys serious?" When a project is donated to the foundation, all trademark rights go to Apache. Using the name "Doris" in a commercial product was a direct violation.
Moreover, the dispute blocked the original Apache Doris from graduating from the Incubator to Top-Level Project status. The Foundation could not vouch for a clean, unambiguous brand while a product called DorisDB was on the market.
Eventually, under pressure and the threat of lawsuits, the company rebranded. DorisDB became StarRocks.
The divorce was official: different names, different code, different licenses, different paths.
Part 4: The Battle of Technologies (For Those Who Like It Hot)
Okay, enough corporate drama — let's figure out how these systems actually differ. I'll try to explain without requiring a PhD in Computer Science.
The StarRocks Approach: "Speed at Any Cost"
The StarRocks team rebuilt the execution core in modern C++ around a vectorized engine, betting on extreme performance in complex scenarios.
The Ace up the Sleeve: A powerful optimizer for complex JOINs. If you have a specific scenario where you need to join dozens of tables on the fly without preparation — StarRocks does this brilliantly.
The Price: This solution requires more resources (especially memory) and is tailored for one specific task: fast analytics. It's a "Formula 1 car": incredibly fast on the track, but you wouldn't drive it to the grocery store or take it off-roading.
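To make the "optimizer for complex JOINs" point a bit more concrete, here is a toy sketch of what a cost-based choice of join order means. It is not StarRocks code: the table names, row counts, selectivities, and the cost formula (sum of intermediate result sizes) are all made-up assumptions for illustration, and a real optimizer uses table statistics and smarter search than brute force.

```python
from itertools import permutations

# Hypothetical row counts (illustrative numbers only).
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical join selectivities: the fraction of the cross product
# that survives the join predicate between two tables.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 20,
}


def plan_cost(order):
    """Cost of a left-deep join plan = sum of intermediate result sizes."""
    joined = {order[0]}
    size = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Apply every predicate linking the new table to what is already
        # joined; with no predicate this degrades to a cross join (1.0).
        sel = 1.0
        for other in joined:
            sel *= SELECTIVITY.get(frozenset({other, table}), 1.0)
        size = size * ROWS[table] * sel
        cost += size
        joined.add(table)
    return cost


def best_plan(tables):
    """Brute-force search over all join orders (fine for a toy example)."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    plan = best_plan(list(ROWS))
    # The cheapest plans join the small tables first and "orders" last.
    print(plan, plan_cost(plan))
```

With dozens of tables the number of possible orders explodes, which is why the quality of the optimizer matters so much in this scenario.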
The VeloDB (Apache Doris) Approach: "The Universal Platform"
The Doris team took the path of creating a universal Swiss Army knife for data. Instead of focusing solely on JOIN speed, they expanded the range of tasks the database could solve.
Key Differences:
- Inverted Indexes. This is the "killer feature" that turns the analytical database into a search engine.
  - What it gives you: You can store and analyze logs in the same database where your business data lives.
  - Value: No need to maintain a separate (and expensive) Elasticsearch cluster. Infrastructure savings can reach 5–10x.
- Advanced Partial Updates.
  - What it gives you: The ability to update only specific columns in wide tables without rewriting the entire row.
  - Value: Critical for AI systems (Feature Stores), where data updates frequently and granularly.
- Apache Governance.
  - Value: Stability and a lower risk of vendor lock-in. Apache Doris itself belongs to the Foundation, not to a single commercial company, which is a decisive safety factor for many enterprises.
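If "inverted index" sounds abstract, the idea fits in a few lines. The sketch below is a generic textbook version, not the Doris implementation: the whitespace tokenizer, the function names, and the sample log lines are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query token (AND search)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical log lines keyed by row id.
logs = {
    1: "ERROR payment timeout",
    2: "INFO payment accepted",
    3: "ERROR disk full",
}
index = build_inverted_index(logs)
print(search(index, "error payment"))  # {1}
```

The point is that a keyword lookup touches only the posting sets for the query tokens instead of scanning every log line, which is what makes log search viable inside an analytical database.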
In essence, StarRocks positions itself as a high-performance engine for complex queries, while VeloDB positions itself as a unified data platform covering analytics, search, and AI tasks.
Part 5: The Red Queen Effect
Remember the Red Queen in Through the Looking-Glass? "Now, here, you see, it takes all the running you can do, to keep in the same place."
This is the perfect description of what has been happening between VeloDB and StarRocks for the last 5 years:
- StarRocks released the Primary Key Model → Doris responded with Merge-on-Write.
- StarRocks implemented a cool CBO (Cost-Based Optimizer) → Doris rolled out Nereids.
- Doris added inverted indexes → StarRocks started improving its indexing mechanisms.
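The first round of that race is about where you pay for deduplicating updates to the same key: at read time or at write time. Here is a deliberately simplified model of the two strategies. The class names and data layout are my own assumptions; neither engine stores data this way, but the trade-off is the same in spirit.

```python
class MergeOnReadTable:
    """Writes are cheap appends; every read has to resolve duplicates."""

    def __init__(self):
        self.log = []  # append-only list of (key, row) versions

    def upsert(self, key, row):
        self.log.append((key, row))

    def scan(self):
        latest = {}
        for key, row in self.log:  # later versions overwrite earlier ones
            latest[key] = row
        return latest


class MergeOnWriteTable:
    """Writes mark the old version as deleted; reads just skip dead rows."""

    def __init__(self):
        self.rows = []        # append-only row storage
        self.position = {}    # key -> index of the live row
        self.deleted = set()  # indexes of superseded rows

    def upsert(self, key, row):
        old = self.position.get(key)
        if old is not None:
            self.deleted.add(old)  # the write pays the dedup cost
        self.position[key] = len(self.rows)
        self.rows.append((key, row))

    def scan(self):
        return {
            key: row
            for i, (key, row) in enumerate(self.rows)
            if i not in self.deleted
        }


if __name__ == "__main__":
    for table in (MergeOnReadTable(), MergeOnWriteTable()):
        table.upsert("u1", {"clicks": 1})
        table.upsert("u2", {"clicks": 7})
        table.upsert("u1", {"clicks": 2})
        print(type(table).__name__, table.scan())
```

Both tables return the same answer; the difference is that the second one does no per-key merging during a scan, which is what analytical queries care about.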
By 2026, the systems reached parity in basic functionality. But the key difference remained: VeloDB evolved as a universal platform, while StarRocks evolved as a highly specialized engine for one type of task.
This is how a split and competition, which seemed like a disaster in 2020, led to both systems becoming better. But VeloDB's path turned out to be more practical for real business.
Part 6: So, Which One Should You Choose?
After all this analysis, you probably have a logical question: "Okay, smart guy, but what should I use?"
The choice depends on the problem you are solving.
The StarRocks Profile: "Speed for Complex Queries"
Choose StarRocks if:
- You have a specific scenario with JOINs of dozens of tables on the fly, and you don't want to deal with preparing data marts.
- You are building a pure Lakehouse and want to query data from S3/Iceberg as fast as possible.
- Infrastructure budget (memory) is secondary to you compared to the performance of complex queries.
The VeloDB Profile: "Universality and Efficiency"
Choose VeloDB if:
- You need consolidation. You want to combine analytics and log search in one system, ditching the complex zoo of solutions (ELK + DWH).
- TCO (Total Cost of Ownership) matters. Replacing a separate search cluster with built-in inverted indexes saves memory and disk.
- You work with AI/ML. You need a fast Feature Store with support for partial updates.
- You value independence. It is important for you to use a product managed by the Apache Foundation to minimize corporate risks.
Bottom line: StarRocks wins in narrow scenarios of extreme load on JOINs, while VeloDB offers a more balanced solution for building a unified enterprise data platform.
Epilogue: What's Next?
Both systems have now rushed into AI. Vector search, RAG architectures, integration with LLMs — this is the new battlefield.
VeloDB looks more interesting here again: they are integrating vector search with traditional full-text search (Hybrid Search). This means you can search by meaning (semantically) and by keywords simultaneously — exactly what is needed for modern AI applications.
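To show what "by meaning and by keywords simultaneously" can look like, here is a minimal sketch of hybrid ranking as a weighted sum of a vector similarity and a keyword match score. This is one common way to fuse the two signals, not a description of how VeloDB or Doris does it. The two-dimensional "embeddings" are hand-made stand-ins for what a real model would produce, and the function names and the `alpha` weight are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank doc ids by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = []
    for doc_id, (text, vec) in docs.items():
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical documents with hand-made 2-d "embeddings".
docs = {
    "a": ("reset your password", (1.0, 0.0)),
    "b": ("recover account access", (0.9, 0.1)),
    "c": ("quarterly sales report", (0.0, 1.0)),
}
print(hybrid_rank("password reset", (1.0, 0.0), docs))  # ['a', 'b', 'c']
```

Document "b" shares no keywords with the query but still ranks above "c" because its vector is close in meaning. That is exactly the case pure keyword search misses.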
The story of two twin brothers who quarrelled and became competitors continues. But if you need to pick a side right now — don't look at benchmarks (they change every month), but at which architecture fits your business tasks better: narrow specialization or universality.