14 minute read · June 28, 2024
Dremio vs. Starburst Data: The Truth of Why Companies Choose Dremio
· Principal Product Marketing Manager
Two prominent solutions have emerged in the on-prem, cloud, and hybrid-cloud lakehouse space: Dremio and Starburst Data. Both platforms offer unique features and benefits. On the surface, the platforms look fairly similar, with federated query capability, object store connectivity, SQL on Hadoop functionality, Iceberg support, and support for hybrid cloud environments. A deeper dive reveals they are extremely different, which is why many companies are choosing Dremio over Starburst Data. Here, we explore some of the key reasons behind this preference.
Superior Performance
Raw Performance
When evaluating a platform's performance, you can look at both raw performance and accelerated performance. Raw performance is simply the performance of a platform without using mechanisms to speed up query performance, such as indexing or materializing data sets. Accelerated performance is when those performance accelerators are leveraged. It is important to look at both performance types to truly understand the overall performance profile of an analytical system.
On raw performance, Dremio is twice as fast as Starburst Data in real-world benchmarks. This speed advantage not only lowers total cost of ownership (TCO) but is also critical for companies that rely on rapid data processing to drive their operations.
Technological Edge
Dremio’s raw performance is powered by its advanced technological components, which include:
- Based on Apache Arrow: Ensures efficient in-memory analytics.
- C3 Columnar Cache: Enhances data retrieval speeds.
- Streamlined Executors: Optimizes execution paths for faster processing.
- Advanced Vectorized Processing: Maximizes CPU utilization for complex computations.
Starburst Data, on the other hand, falls short of Dremio’s raw performance for several reasons. Starburst Data is based on Trino (a fork of Presto), which does not deliver the efficiency and performance that Dremio gains from being built on Apache Arrow. Other factors, such as Starburst Data’s limited Java-based vector processing, add to the overall performance deficit. For organizations that want to leverage caching, Starburst Data’s cache services are complex to configure and get running, whereas Dremio’s caching just works.
Performance Accelerators
Analytical platforms can speed up query performance in numerous ways. What really matters when it comes to performance accelerators is how easy they are to set up and use, their scope (i.e., what data and workloads they accelerate), and their overall performance.
Dremio Reflections
In a Dremio environment, Reflections are the #1 mechanism used to accelerate query performance because they are simple to set up, transparent to users, and provide sub-second response times to SQL queries.
Dremio Reflections are robust performance accelerators that enhance all data sources. This means that Dremio can accelerate performance not only within the lakehouse environment but also when federating queries to foreign data, such as databases, data lakes, or other data sources.
The great thing about Reflections is that the user does not need to know about them to take advantage of the performance acceleration. Dremio's query optimizer will automatically leverage one or more Reflections to partially or entirely fulfill queries, bypassing the need to process raw data from the underlying data source. Queries do not need to explicitly reference Reflections. Instead, Dremio automatically rewrites queries on the fly to leverage the needed Reflections, providing lightning-fast response times for SQL workloads.
When it comes to speeding up BI workloads, aggregations are paramount. Reflections in Dremio perform pre-aggregation and pre-sorting (indexing), significantly enhancing query performance. This approach results in BI workloads running 10-100 times faster, enabling quicker insights and more efficient data analysis.
Reflections are:
- Integrated: Part of the core Dremio solution.
- Available: Present in all versions of Dremio.
- User-Friendly: Simple to create and manage.
- Resilient: Can withstand node failures without performance degradation.
- Fast: Pre-aggregation and pre-sorting enhance query performance.
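To make the idea of pre-aggregation with transparent query rewrite more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Dremio's implementation: the names `RAW_ROWS`, `ROLLUP_DIMS`, `build_rollup`, and `total_by` are hypothetical, and the "optimizer" is a single check of whether the pre-aggregated rollup covers the dimensions a query groups by.

```python
from collections import defaultdict

# Hypothetical raw fact rows (illustrative data only).
RAW_ROWS = [
    {"region": "east", "product": "widget", "day": "day1", "amount": 10.0},
    {"region": "east", "product": "gadget", "day": "day1", "amount": 5.0},
    {"region": "west", "product": "widget", "day": "day2", "amount": 7.5},
    {"region": "west", "product": "widget", "day": "day2", "amount": 2.5},
]

# Dimensions the pre-aggregated rollup is grouped by (an assumption).
ROLLUP_DIMS = ("region", "product")


def build_rollup(rows):
    """Pre-aggregate SUM(amount) grouped by ROLLUP_DIMS."""
    rollup = defaultdict(float)
    for row in rows:
        rollup[tuple(row[d] for d in ROLLUP_DIMS)] += row["amount"]
    return dict(rollup)


ROLLUP = build_rollup(RAW_ROWS)


def total_by(dims, rows=RAW_ROWS, rollup=ROLLUP):
    """Answer SUM(amount) GROUP BY dims.

    The caller never names the rollup. It is used automatically when it
    covers every requested dimension; otherwise the raw rows are scanned.
    """
    if set(dims) <= set(ROLLUP_DIMS):
        source = "rollup"
        items = [(dict(zip(ROLLUP_DIMS, key)), total) for key, total in rollup.items()]
    else:
        source = "raw"
        items = [(row, row["amount"]) for row in rows]

    totals = defaultdict(float)
    for record, amount in items:
        totals[tuple(record[d] for d in dims)] += amount
    return source, dict(totals)


source, totals = total_by(("region",))      # served from the rollup
raw_source, by_day = total_by(("day",))     # rollup lacks "day", so raw rows are scanned
```

The caller writes the same query either way; only the data it is answered from changes. That is the sense in which an accelerator of this kind is transparent to users.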
Starburst Data Warp Speed
Starburst Data’s Warp Speed is a highly touted query acceleration technology. It adds an indexing and caching layer to enhance performance. While it offers some performance gains over Starburst Data’s raw capabilities, in real-world applications, its effectiveness is limited by its scope. Comprising caching and indexing rather than aggregations, Warp Speed is extremely limited in accelerating typical analytical workloads. Business intelligence relies heavily on aggregations, whereas indexing is primarily beneficial for point queries and highly selective filters.
The truth is that Starburst Data's Warp Speed accelerator is more limited:
- Sources: Only accelerates lakehouse sources (Hive, Delta Lake, and Iceberg only).
- Scope: Only good for point queries, whereas BI queries need aggregations.
- Complexity: Requires Kubernetes and an additional database, and imposes limits on node sizing.
- Availability: Only available in the enterprise edition and only for S3, GCS, and ADLS (so it does not work for on-prem data lakes).
- Maintenance: Involves extensive initial setup and configuration.
- Reliability: Indexes and cache are not resilient to node failures.
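The distinction between point queries and aggregations can be illustrated with a small Python sketch. It is a toy model, not a representation of Warp Speed internals: `ORDERS`, `build_index`, `point_lookup`, and `total_per_customer` are hypothetical names, and "rows read" is simply a counter showing how much data each query shape has to touch.

```python
from collections import defaultdict

# Hypothetical order rows (illustrative data only).
ORDERS = [
    {"order_id": 1, "customer": "acme", "amount": 40.0},
    {"order_id": 2, "customer": "birch", "amount": 15.0},
    {"order_id": 3, "customer": "acme", "amount": 5.0},
    {"order_id": 4, "customer": "cedar", "amount": 20.0},
]


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index


CUSTOMER_INDEX = build_index(ORDERS, "customer")


def point_lookup(customer):
    """Selective filter: the index narrows the read to matching rows only."""
    positions = CUSTOMER_INDEX.get(customer, [])
    return [ORDERS[p] for p in positions], len(positions)


def total_per_customer():
    """Aggregation: every row contributes, so the index cannot skip any."""
    totals = defaultdict(float)
    rows_read = 0
    for row in ORDERS:
        totals[row["customer"]] += row["amount"]
        rows_read += 1
    return dict(totals), rows_read


acme_rows, lookup_reads = point_lookup("acme")
totals, aggregate_reads = total_per_customer()
```

In this sketch the point lookup reads only the rows for one customer, while the grouped total still reads every row. An index alone does not shrink the work an aggregation has to do.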
In real-world benchmarking, Dremio with Reflections consistently outperforms Starburst Data with Warp Speed.
Materialized Views vs. Reflections
Starburst Data also offers the ability to leverage materialized views as a query accelerator. A materialized view stores the results of a specific query as a physical table in the database. The data in the materialized view is precomputed and stored, meaning that the results are already available without the need to recompute the query each time the view is accessed. Though Starburst Data’s materialized views and Dremio’s Reflections seem similar on the surface, a deeper look reveals they are extremely different. This is why Dremio’s accelerated performance is superior to that of Starburst Data.
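The general materialized-view pattern described here can be sketched with Python's built-in `sqlite3` module. SQLite has no native materialized views, so this sketch emulates one with `CREATE TABLE ... AS SELECT`. The table names `sales` and `sales_by_region_mv` and the helper functions are hypothetical, and none of this is Starburst Data or Dremio syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10.0), ('east', 5.0), ('west', 7.5);
    -- Emulated materialized view: query results stored as a physical table.
    CREATE TABLE sales_by_region_mv AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")


def mv_totals(conn):
    """Read the precomputed results; the caller must name the stored table."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region_mv"))


def refresh_mv(conn):
    """Recompute the stored results from the base table (a full refresh)."""
    conn.executescript("""
        DELETE FROM sales_by_region_mv;
        INSERT INTO sales_by_region_mv
            SELECT region, SUM(amount) FROM sales GROUP BY region;
    """)


before = mv_totals(conn)
conn.execute("INSERT INTO sales VALUES ('west', 2.5)")
stale = mv_totals(conn)   # unchanged: the stored results do not see the new row
refresh_mv(conn)
fresh = mv_totals(conn)   # reflects the new row only after the refresh
```

Two properties of the pattern show up in this sketch. The precomputed results are only used when a query names the stored table explicitly, and they go stale until something refreshes them.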
Starburst Data’s materialized views present several challenges:
- Complex Setup: Difficult to configure and manage.
- Direct Querying: Must be directly queried by users.
- No Query Rewrite: No automatic query rewrite capabilities.
- Update Issues: Limited auto-update and scheduling for materialized views.
- Lack of Recommendations: No materialized view recommendations.
In contrast, Dremio Reflections offer:
- Transparency: Optimizer is auto-aware and usage is transparent to users.
- Query Rewrite: Supports partial query rewrite.
- Incremental Updates: Automatic incremental reflection refresh and updates.
- Recommendations: Provides reflection recommendations.
- Scheduling: Includes flexible reflection scheduling tasks.
When it comes to overall performance, both raw and accelerated, with real-world BI workloads, Dremio consistently outperforms Starburst Data. The combination of superior scalability, performance, and ease of setup and management is one of the key reasons companies continue to choose Dremio over Starburst Data.
Enhanced Self-Service and Ease of Use
A major reason companies like our Dremio Unified Lakehouse Platform and choose us over the competition is our ability to provide self-service analytics to a broad user base. Coupled with the overall ease of use of the Dremio solution, this allows companies to rapidly deliver powerful analytics to their users, driving business value. We consistently hear from our customers, who have users ranging from business analysts, data analysts, and power users to data scientists, that Dremio is intuitive, simple, and easy to use.
Dremio excels in providing a user-friendly, self-service experience through:
- Intuitive Semantic Layer: Simplifies data discovery and interaction.
- AI-Driven Tools: AI-driven wiki generation and data tagging.
- Simple Integrations: One-click integration with BI tools.
- Unified Access: Easy and seamless integration with various data sources.
Starburst Data, unfortunately, offers less intuitive versions of these self-service features or does not offer them at all. Even when a feature is offered, it can be extremely difficult for users to leverage. For example, in an on-prem environment, a user who simply wants to add object storage as a data source cannot add it directly. Connecting requires a metastore such as Hive Metastore or Glue, and a table must be created in the metastore before the data can actually be queried. In Dremio, by contrast, a user simply navigates to a folder, promotes files to tables, and begins querying. It’s as simple as that. Features like this highlight our ease of use and the superior self-service capabilities we deliver. This makes Dremio a more appealing choice for companies looking to broadly enable their users to drive business value and insight through analytics.
Superior Lakehouse DataOps
A key component of a successful Lakehouse environment is robust DataOps. Dremio’s Lakehouse Management capability focuses on collaboration, automation, and integration between data engineering, data integration, and data analytics teams. We provide the only enterprise Iceberg lakehouse catalog across hybrid cloud environments in the market. We look to streamline and accelerate the entire data lifecycle, from data acquisition and preparation to analysis and insights generation. Dremio provides a flexible and user-centric DataOps solution through features like:
- Automatic Data Versioning: Simplifies data management.
- Git-Like Branching: Facilitates experimentation and testing.
- Zero Copy Clones: Enables efficient data sharing without duplication.
- Governance: Track all changes to data and metadata.
- Data Maintenance: Automatic data optimization and data cleanup.
These capabilities make Dremio the preferred choice for companies seeking powerful, efficient, and user-friendly data management capabilities in their lakehouse environment. Dremio delivers these in an integrated, out-of-the-box manner that allows companies to take full advantage of advanced DataOps capabilities. Starburst Data, on the other hand, delivers some Iceberg data maintenance capabilities but lacks the broader DataOps capabilities, either integrated or out-of-the-box. The absence of these capabilities makes the overall lakehouse environment more difficult to manage and reduces its potential value. This is another key reason why companies prefer Dremio over Starburst Data.
While both Dremio and Starburst Data offer tools for data management and analytics, Dremio's superior performance, self-service capabilities, user-friendly design, and robust DataOps functionality make it the clear winner. Dremio is the choice for companies looking to drive business value and insight with a next-generation lakehouse solution.