9 minute read · September 13, 2023
What’s New in Dremio: A Leap Forward in Accelerating SQL Query Performance on the Lakehouse
Dremio, the easy and open lakehouse, delivers analytics at the lowest cost. We're excited to show you some new things we've been working on to help you analyze your data faster and easier than ever, no matter where it's stored.
Catalyzing Performance with Next Generation Reflections
Let’s start with Reflections, Dremio’s innovative SQL query acceleration technology. Queries using Reflections often run 10 to 100 times faster than unaccelerated queries.
A Reflection is an optimized relational cache of source data that can be used to speed up data processing. The Dremio query engine uses algebraic matching to accelerate queries or segments of a query that match an existing Reflection. Reflections are a major innovation in query acceleration and data operations:
- They’re built using business logic to abstract the data and reduce the number of datasets users need to be familiar with
- They enable rapid development of new analytics projects since you can add Reflections at any point
- They make data more manageable since Dremio takes care of managing the Reflections caches
- And the best part? You save time and money because Reflections don’t need to analyze all of your raw data to complete the query
All of this adds speed, agility, and efficiency when you're analyzing data.
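To make the idea of a pre-computed cache concrete, here is a minimal Python sketch. It is a toy model and not Dremio's implementation: the `RAW_SALES` data and all function names are hypothetical, and it does not attempt algebraic matching. It only shows why a query answered from a smaller pre-aggregated cache can return the same result without scanning every raw row.

```python
from collections import defaultdict

# Hypothetical raw "source" rows: (region, product, amount).
RAW_SALES = [
    ("EMEA", "widget", 120.0),
    ("EMEA", "gadget", 80.0),
    ("APAC", "widget", 50.0),
    ("EMEA", "widget", 30.0),
    ("APAC", "gadget", 70.0),
]


def build_aggregate_cache(rows):
    """Pre-aggregate raw rows by (region, product), keeping sum and count."""
    cache = defaultdict(lambda: [0.0, 0])
    for region, product, amount in rows:
        entry = cache[(region, product)]
        entry[0] += amount
        entry[1] += 1
    return dict(cache)


def total_by_region_from_raw(rows):
    """Unaccelerated path: scan every raw row."""
    totals = defaultdict(float)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


def total_by_region_from_cache(cache):
    """Accelerated path: roll up the smaller pre-aggregated cache."""
    totals = defaultdict(float)
    for (region, _product), (amount_sum, _count) in cache.items():
        totals[region] += amount_sum
    return dict(totals)


cache = build_aggregate_cache(RAW_SALES)
raw_totals = total_by_region_from_raw(RAW_SALES)
cached_totals = total_by_region_from_cache(cache)
```

Both paths produce the same totals per region, but the cached path reads four pre-aggregated entries instead of five raw rows. On real data volumes the gap between the two is far larger.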
Accelerate BI workloads in seconds with Reflection Recommender
We've got something even smarter for you: Reflection Recommender. Reflection Recommender evaluates an organization's SQL queries and generates precise suggested Reflections to turbocharge query acceleration.
Reflection Recommender eliminates manual data and workload analysis, ensuring the fastest, most intelligent query accelerations are effortless and only a few keystrokes away. Reflection Recommender is easy to use and puts advanced query acceleration technology into the hands of all users, saving significant time and cost.
Before, you had to be a data expert to make your queries faster with Reflections. You needed to have a detailed understanding of your data and workloads to know how best to create a Reflection. You had to identify expensive queries from your job history. Then, you examined the SQL to determine patterns to define a Reflection that would satisfy as many queries as possible.
Finally, and most critically, you created a new view (also known as a supporting anchor), and then created one or more aggregate Reflections on this view.
Another challenge was that unnecessary Reflections could be created. The impact was two-fold: you would overuse system resources, and you would devote extra effort to create and manage unneeded accelerations.
Now, we've made it simple. Just identify your most common, slowest queries in Dremio, and Reflection Recommender automatically generates an accelerating Reflection in seconds. Dremio analyzes the identified queries and recommends new Reflections. The resulting SQL can be used to create the new recommended Reflections. Once you’ve created a Reflection for a query, it is transparent to all of your data users. Their queries are accelerated automatically with no additional steps.
And, you’ll never spend time creating an unnecessary Reflection, since Dremio will only recommend beneficial Reflections that drive the queries you select.
Staying Fresh: Keeping Data Up to Date in Reflections is even easier!
We’ve also optimized how Reflections are refreshed to put the freshest data in your hands faster!
As your data changes, it’s necessary to refresh associated Reflections to ensure that the data they serve is fresh and accurate. We’re excited to announce our improved intelligent Reflections Refresh for Apache Iceberg tables. Now, Reflections use Iceberg manifests to track data changes and update your Reflection caches. These instant and incremental data updates make refreshes faster and less expensive - and give you faster access to up-to-date data.
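The incremental idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Dremio's refresh logic or the Iceberg API: `Snapshot` and `ReflectionCache` are hypothetical classes, snapshots are assumed to be append-only, and each snapshot simply lists the files it added, standing in for what table manifests record.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical stand-in for a table snapshot and the files it added."""
    snapshot_id: int
    added_files: dict  # file name -> list of numeric row values


@dataclass
class ReflectionCache:
    """Toy cache that remembers the last snapshot it has processed."""
    last_snapshot_id: int = 0
    row_count: int = 0
    total: float = 0.0
    files_scanned: int = 0

    def refresh(self, snapshots):
        """Fold in only the files added since the last refresh (append-only)."""
        for snap in sorted(snapshots, key=lambda s: s.snapshot_id):
            if snap.snapshot_id <= self.last_snapshot_id:
                continue  # already reflected in the cache, skip its files
            for rows in snap.added_files.values():
                self.row_count += len(rows)
                self.total += sum(rows)
                self.files_scanned += 1
            self.last_snapshot_id = snap.snapshot_id


history = [
    Snapshot(1, {"part-0001.parquet": [10.0, 20.0]}),
    Snapshot(2, {"part-0002.parquet": [5.0]}),
]
cache = ReflectionCache()
cache.refresh(history)  # first refresh reads both files

history.append(Snapshot(3, {"part-0003.parquet": [1.0, 2.0, 3.0]}))
scanned_before = cache.files_scanned
cache.refresh(history)  # second refresh reads only the newly added file
newly_scanned = cache.files_scanned - scanned_before
```

The second refresh touches one file rather than three, which is the source of the time and cost savings. Deletes and updates would need extra handling that this sketch leaves out.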
We’ve also launched Reflections Hints. Hints let you further fine-tune your use of Reflections by guiding the query optimizer to make prioritized planning decisions that match your business needs. Hints are most helpful when you have multiple Reflections for the same view, and want to accelerate the query planning process even more using a preferred Reflection. Learn more about how to create and manage Reflections with these new features in our latest Next Gen Reflections blog post.
Getting Better All the Time: Improving Performance
No matter how you query your data in Dremio, it’s fast. We've been working hard to make things faster. With our latest updates, Dremio performance is now about 8% faster than before. That means you can get your answers quicker to drive your business.
To deliver this improved performance, we further optimized query planning time and runtime filters. We also sped up our Parquet writer/reader by reducing I/O and optimally writing files.
Dremio acts as a unified access and analytics layer over all of your data, with no data movement - so you can query all your data on-demand, wherever it lives. Those rapid SQL queries work across all your data and massive data volumes, whether on premises or in the cloud.
And, a recent TCO analysis comparing Dremio to other lakehouse vendors shows that Dremio can reduce your analytics costs by 53%.
Travel Back in Time with Delta Lake
Open table formats are what allow Dremio to bring data warehouse functionality to the data lake to create a data lakehouse.
Dremio is designed to be the best SQL analytics engine for Apache Iceberg. But, we know that our customers need choice and flexibility when it comes to table formats. Our goal is to work seamlessly with other table formats. Under the hood, we want to make things simple so you don’t have to think about which table format to use.
That’s why we’ve expanded our support for Delta Lake to include time-travel. You can now run and compare historical point-in-time analyses, simplifying time-series analytics in Dremio using both Apache Iceberg and Delta Lake. Users can examine historical data to identify trends, seasonality, and other time-dependent patterns to improve their understanding of the data and enhance predictive models.
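As a rough illustration of what a point-in-time read means, the Python sketch below models a table as a list of committed versions and returns the version that was current at a requested timestamp. The `VersionedTable` class is hypothetical and stores a full copy of the rows per commit for simplicity. Real table formats record changes in metadata or a transaction log instead, and this is not how Dremio or Delta Lake implement the feature.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedTable:
    """Toy table that keeps every committed version with its commit time."""

    def __init__(self):
        self._commit_times = []  # sorted commit timestamps
        self._versions = []      # rows as of each commit, same order

    def commit(self, committed_at, rows):
        """Record a new version; commits must arrive in time order."""
        if self._commit_times and committed_at <= self._commit_times[-1]:
            raise ValueError("commits must be strictly increasing in time")
        self._commit_times.append(committed_at)
        self._versions.append(tuple(rows))

    def as_of(self, timestamp):
        """Return the rows of the latest version committed at or before timestamp."""
        idx = bisect_right(self._commit_times, timestamp)
        if idx == 0:
            raise LookupError("no version exists at or before that time")
        return self._versions[idx - 1]


table = VersionedTable()
table.commit(datetime(2023, 1, 1), [("widget", 10)])
table.commit(datetime(2023, 2, 1), [("widget", 10), ("gadget", 4)])

january_view = table.as_of(datetime(2023, 1, 15))  # sees only the first commit
latest_view = table.as_of(datetime(2023, 3, 1))    # sees both commits
```

Comparing `january_view` with `latest_view` is the kind of before-and-after analysis that time-travel queries make possible without keeping manual copies of the data.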
More Ways to Ask Questions with Expanded SQL
Dremio continues to expand our SQL coverage to deliver the most comprehensive and fastest SQL query engine for all your data. SQL developers can code natively in Dremio. Business analysts can query data using a no-code, drag-and-drop interface. And, text-to-SQL Generative AI capabilities mean that anyone can create a SQL query by simply asking a question.
We’ve added six new SQL functions, including multiple ARRAY functions, UNNEST, and support for literal syntax for array data type. You can find full details of our SQL coverage in our documentation.
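For readers new to UNNEST, the Python sketch below shows the general idea of expanding an array column into one row per element. The `unnest` helper and the `orders` data are hypothetical, and the behavior shown (rows with empty arrays produce no output) is an assumption based on common SQL semantics. Check the Dremio documentation for the exact syntax and edge cases.

```python
def unnest(rows, array_key, element_key):
    """Expand each row into one output row per element of row[array_key]."""
    out = []
    for row in rows:
        for element in row[array_key]:
            # Copy every column except the array, then add the single element.
            flat = {k: v for k, v in row.items() if k != array_key}
            flat[element_key] = element
            out.append(flat)
    return out


# Hypothetical input: one row per order, with an array of item names.
orders = [
    {"order_id": 1, "items": ["pen", "ink"]},
    {"order_id": 2, "items": ["paper"]},
    {"order_id": 3, "items": []},
]

flat_orders = unnest(orders, "items", "item")
```

After flattening, each item sits in its own row next to its `order_id`, so it can be filtered, joined, or aggregated like any other column.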
Connect to More: Meet the Apache Druid Connector
We've also made it possible to connect to even more data sources. Dremio offers dozens of connectors to common data sources and platforms - and we’re always adding more. Our new Apache Druid Connector is the latest addition. Apache Druid is a column-oriented, open-source, distributed data store written in Java. Druid is designed to quickly ingest massive quantities of event data and provide low latency queries on top of the data. With the Druid connector, you can seamlessly connect your Druid datastore to Dremio for analytics.
Get started now!
All of these capabilities are available now! If you’re already a Dremio Software customer, it’s easy to upgrade. Contact your account team to get started. If you aren’t already using Dremio, you can try it for free using Dremio Cloud or the Self-Managed Community Edition.