6 minute read · September 10, 2024
Dremio Live Reflections on Iceberg
Principal Product Manager, Dremio
Several of the world's largest data-driven organizations use Dremio to facilitate rapid analytics and achieve sub-second query response times directly on the lakehouse.
Reflections are one of the primary technologies in Dremio's query acceleration toolkit. Reflections are materializations that are aggregated, sorted, and partitioned in a variety of ways, and transparently accelerate queries irrespective of the size, format, or source of the dataset. Reflections are stored in your data lakehouse as Iceberg tables.
We are delighted to announce two new innovations that make query acceleration with Dremio reflections faster and easier to manage: Automatic Reflection Recommendations and Dremio Live Reflections.
Automatic Reflection Recommendations:
Data engineers who are responsible for performance management are tasked with creating reflections that speed up frequently executed, low-latency queries. To help with this, Dremio offers a REST API and a table function that recommend reflections for a given set of job IDs whose queries exhibit similar patterns. Even so, it can be challenging for data engineers to create an optimal set of reflections that provides a good return on investment, because doing so can involve identifying common query patterns across hundreds of thousands of queries.
We are introducing Automatic Reflection Recommendations to simplify the workload of data engineers. This feature will enhance the performance of frequently executed queries by recommending reflections that offer a high return on investment, based on historical query usage patterns.
Automatic reflection recommendations are generated daily, and are based on the query history of the preceding seven days. Dremio now recommends the top 10 reflections to create, based on the number of prior jobs that could be accelerated with a recommendation and the expected average improvement in performance. Data engineers can then easily create these reflections with the click of a button.
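To make the ranking idea concrete, here is a minimal sketch of how candidates could be ordered by the two signals mentioned above. The `Candidate` class, the `top_recommendations` function, and the scoring formula (accelerated job count multiplied by expected average speedup) are all assumptions made for illustration; this post does not describe how Dremio actually combines the two signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical reflection candidate; not a Dremio type."""
    name: str
    accelerated_jobs: int  # prior jobs in the history window that could be accelerated
    avg_speedup: float     # expected average speedup factor (4.0 means 4x faster)


def top_recommendations(candidates, limit=10):
    """Return at most `limit` candidates, highest assumed score first."""
    def score(c):
        # Assumed score: jobs helped multiplied by expected speedup.
        return c.accelerated_jobs * c.avg_speedup

    return sorted(candidates, key=score, reverse=True)[:limit]


candidates = [
    Candidate("agg_orders", accelerated_jobs=120, avg_speedup=3.0),
    Candidate("raw_events", accelerated_jobs=40, avg_speedup=12.0),
    Candidate("agg_sessions", accelerated_jobs=300, avg_speedup=1.1),
]
ranked = top_recommendations(candidates)
print([c.name for c in ranked])
```

Under this assumed score, a candidate that helps fewer jobs can still rank first if its expected speedup is large enough.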
To access the list of recommendations for your project and add new reflections:
- Click the Project Settings icon in the side navigation bar.
- On your Project Settings page, click Reflections.
- To access your list of recommendations, click
Dremio will generate up to 10 recommendations. For each recommendation, Dremio will display additional information, such as reflection type, dataset, accelerated job count (estimate), query speedup (estimate), and query time saving (estimate).
- To review and edit a recommendation, click on the recommendation name.
- To add a reflection to your reflections list, click at the end of the row.
Click here to find more information on Automatic Reflection Recommendations.
Live Reflections:
Companies ingest data into their lakehouse quickly and frequently to ensure their data analysts can make business decisions on fresh, accurate data. To help end users analyze that data quickly, reflections need to be refreshed either manually or on an automated schedule as part of a data pipeline.
We’re excited to introduce Live Reflections, which simplify the process of reflection management and reduce administrative burden on data engineers. Reflections based on Iceberg tables are now automatically updated when their underlying tables are updated using the Dremio engine or other engines, such as Spark and Snowflake.
After upgrading to the latest version of Dremio, which supports Live Reflections, customers can enable them for Iceberg tables at the source level or at the table level. Reflections that are not based on Iceberg tables are not affected. Any redundant reflection refresh on an Iceberg table, whether scheduled or manually requested, is detected and treated as a no-op (NOOP).
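As an illustration of what a redundant refresh might look like, the sketch below treats a refresh as a NOOP when the base table has not changed since the reflection was last built. It assumes that change is detected by comparing the Iceberg table's current snapshot ID with the snapshot ID recorded at the last refresh. The post does not state how Dremio detects redundancy, and `ReflectionState` and `refresh` are hypothetical names, not Dremio APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReflectionState:
    """Hypothetical bookkeeping for one reflection; not a Dremio type."""
    name: str
    built_from_snapshot: Optional[int] = None  # snapshot ID seen at last refresh


def refresh(reflection: ReflectionState, current_snapshot_id: int) -> str:
    """Refresh only if the base table has a new snapshot (assumed rule)."""
    if reflection.built_from_snapshot == current_snapshot_id:
        # Nothing changed since the last build, so skip the work.
        return "NOOP"
    # A real engine would rebuild or incrementally update the materialization here.
    reflection.built_from_snapshot = current_snapshot_id
    return "REFRESHED"


state = ReflectionState("agg_orders")
results = [
    refresh(state, 101),  # first build
    refresh(state, 101),  # scheduled refresh, table unchanged
    refresh(state, 102),  # table was updated by another engine
]
print(results)
```

The same check applies whether the refresh was triggered by a schedule or requested manually, since only the state of the base table matters in this sketch.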
Click here to find more information on Dremio Live Reflections.
Conclusion:
Dremio's own internal data lakehouse, which is built on Iceberg tables, was the first to adopt the aforementioned reflection innovations. Our data team observed a higher percentage of queries accelerated per day using reflections, resulting in exceptional outcomes in terms of improved price performance and management efficiency.
These two innovations are available today in Dremio Cloud, Dremio's fully managed offering, and in Dremio Software version 25.1.