13 minute read · May 7, 2024
How to use Dremio’s Reflections to Reduce Your Snowflake Costs Within 60 minutes.
· Senior Tech Evangelist, Dremio
In previous articles, we explored the factors that drive cloud data warehouse costs and how Dremio’s Lakehouse Platform can mitigate these expenses, particularly with Snowflake. We also referenced a white paper highlighting actual savings experienced by Dremio customers. Also, during the keynote of the Subsurface 2024 conference, Dremio CEO Sendur Sellakumar showed a recent performance benchmark where Dremio was 70% faster than Snowflake. This article will focus on a straightforward approach to rapidly decrease your Snowflake expenses by using Dremio's superior price-performance, further bolstered by Dremio's performance, meaning less overall compute is required.
The most straightforward area to address in terms of reducing costs is your BI Dashboards. Whenever someone interacts with a BI dashboard that uses Snowflake as the data source, queries are sent to Snowflake, increasing your expenditure. Imagine if you could significantly cut the costs of serving dashboards from your Snowflake data by drastically reducing the amount of Snowflake compute resources needed.
This reduction is achievable thanks to Dremio's Snowflake connector, which allows access to the data in your Snowflake account and supports several use cases:
- Enriching your Snowflake data with additional datasets from other locations, such as object storage or other databases.
- Enriching your data in other locations with datasets purchased from the Snowflake marketplace.
- Reducing Snowflake costs by shifting SQL workloads to Dremio where feasible.
Next, let's explore how simple it is to connect Snowflake with Dremio and review different strategies for quickly offloading workloads and reducing costs.
Connecting Snowflake to Dremio
Use an existing table in your Snowflake account or use the SQL below to create an example table for this exercise.
CREATE TABLE sales ( sale_id INTEGER AUTOINCREMENT, date TIMESTAMP, product_id INTEGER, product_name VARCHAR(255), category VARCHAR(100), price DECIMAL(10, 2), quantity INTEGER, customer_id INTEGER, customer_name VARCHAR(255), region VARCHAR(100), sales_rep_id INTEGER, sales_rep_name VARCHAR(255) ); INSERT INTO sales (date, product_id, product_name, category, price, quantity, customer_id, customer_name, region, sales_rep_id, sales_rep_name) VALUES ('2021-01-01 00:00:00', 101, 'UltraWidget', 'Widgets', 19.99, 3, 501, 'John Doe', 'North', 701, 'Jane Smith'), ('2021-01-01 00:00:00', 102, 'MegaWidget', 'Widgets', 29.99, 2, 502, 'Alice Johnson', 'South', 702, 'Rick Brown'), ('2021-01-01 00:00:00', 103, 'WonderWidget', 'Gadgets', 39.99, 1, 503, 'Steve Smith', 'East', 703, 'Larry Wilson'), ('2021-01-02 00:00:00', 104, 'SuperWidget', 'Widgets', 25.50, 5, 504, 'Barbara White', 'West', 704, 'Emily Carter'), ('2021-01-02 00:00:00', 105, 'HyperWidget', 'Widgets', 15.75, 2, 505, 'James Wilson', 'North', 705, 'Donna Martin'), ('2021-01-02 00:00:00', 106, 'MiniWidget', 'Gadgets', 45.00, 1, 506, 'Patricia Brown', 'South', 706, 'Matthew Lee'), ('2021-01-02 00:00:00', 107, 'MaxWidget', 'Widgets', 22.95, 3, 507, 'Robert Harris', 'East', 707, 'Sandra Lewis'), ('2021-01-03 00:00:00', 108, 'MegaWidget', 'Widgets', 29.99, 2, 508, 'Jennifer Clark', 'West', 708, 'Kenneth White'), ('2021-01-03 00:00:00', 109, 'NanoWidget', 'Gadgets', 10.00, 10, 509, 'William Rodriguez', 'North', 709, 'Brian Walker'), ('2021-01-03 00:00:00', 110, 'PicoWidget', 'Widgets', 20.20, 2, 510, 'Mary Lewis', 'South', 710, 'Nancy Allen'), ('2021-01-03 00:00:00', 111, 'KiloWidget', 'Gadgets', 75.50, 1, 511, 'David Lee', 'East', 711, 'Lisa Young'), ('2021-01-04 00:00:00', 112, 'GigaWidget', 'Widgets', 55.25, 3, 512, 'Susan Wilson', 'West', 712, 'Joseph Harris'), ('2021-01-04 00:00:00', 113, 'TeraWidget', 'Widgets', 85.75, 2, 513, 'Michael King', 'North', 713, 'Sara Martin'), ('2021-01-04 00:00:00', 114, 'UltraWidget', 'Gadgets', 19.99, 4, 514, 'Jessica Moore', 'South', 714, 'Charles Thompson'), ('2021-01-04 00:00:00', 115, 'MegaWidget', 'Widgets', 29.99, 1, 515, 'Daniel Garcia', 'East', 715, 'Angela Martinez'), ('2021-01-05 00:00:00', 116, 'HyperWidget', 'Widgets', 15.75, 3, 516, 'Lisa Martinez', 'West', 716, 'Thomas Robinson'), ('2021-01-05 00:00:00', 117, 'MiniWidget', 'Gadgets', 45.00, 2, 517, 'Christopher Lee', 'North', 717, 'Barbara White'), ('2021-01-05 00:00:00', 118, 'MaxWidget', 'Widgets', 22.95, 4, 518, 'Sarah Walker', 'South', 718, 'James Clark'), ('2021-01-05 00:00:00', 119, 'NanoWidget', 'Gadgets', 10.00, 5, 519, 'Frank Wright', 'East', 719, 'Patricia Rodriguez'), ('2021-01-06 00:00:00', 120, 'PicoWidget', 'Widgets', 20.20, 3, 520, 'Jason Scott', 'West', 720, 'Michelle Lopez'), ('2021-01-06 00:00:00', 121, 'KiloWidget', 'Gadgets', 75.50, 1, 521, 'Laura Anderson', 'North', 721, 'Steven Clark'), ('2021-01-06 00:00:00', 122, 'GigaWidget', 'Widgets', 55.25, 2, 522, 'Kevin Thomas', 'South', 722, 'Elizabeth Martinez'), ('2021-01-06 00:00:00', 123, 'TeraWidget', 'Widgets', 85.75, 1, 523, 'Dorothy Taylor', 'East', 723, 'Jeffrey Lee'), ('2021-01-07 00:00:00', 124, 'MicroWidget', 'Gadgets', 17.50, 6, 524, 'Brian Hall', 'West', 724, 'Christina Wilson'), ('2021-01-07 00:00:00', 125, 'QuantumWidget', 'Widgets', 99.99, 1, 525, 'Rachel Johnson', 'North', 725, 'Edward Brown'), ('2021-01-07 00:00:00', 126, 'ElectroWidget', 'Widgets', 45.45, 2, 526, 'Nathan Evans', 'South', 726, 'Michelle Davis'), ('2021-01-07 00:00:00', 127, 'NanoWidget', 'Gadgets', 10.00, 4, 527, 'Julie Torres', 'East', 727, 'William Wilson'), ('2021-01-08 00:00:00', 128, 'MacroWidget', 'Widgets', 20.20, 3, 528, 'Andrew Smith', 'West', 728, 'Sarah Moore'), ('2021-01-08 00:00:00', 129, 'ExoWidget', 'Gadgets', 30.30, 5, 529, 'Alice Brown', 'North', 729, 'James Johnson'), ('2021-01-08 00:00:00', 130, 'HyperWidget', 'Widgets', 15.75, 6, 530, 'John Williams', 'South', 730, 'Patricia Taylor'), ('2021-01-08 00:00:00', 131, 'MegaWidget', 'Widgets', 29.99, 2, 531, 'Marie Wilson', 'East', 731, 'Robert Anderson'), ('2021-01-09 00:00:00', 132, 'TeraWidget', 'Widgets', 85.75, 1, 532, 'Joseph Miller', 'West', 732, 'Lisa Jones'), ('2021-01-09 00:00:00', 133, 'PicoWidget', 'Widgets', 20.20, 3, 533, 'Margaret Davis', 'North', 733, 'Mark Taylor'), ('2021-01-09 00:00:00', 134, 'NanoWidget', 'Gadgets', 10.00, 8, 534, 'Charles Martinez', 'South', 734, 'Elizabeth Garcia'), ('2021-01-09 00:00:00', 135, 'UltraWidget', 'Gadgets', 19.99, 4, 535, 'Susan Lopez', 'East', 735, 'Michael White'), ('2021-01-10 00:00:00', 136, 'MiniWidget', 'Gadgets', 45.00, 2, 536, 'Paul Walker', 'West', 736, 'Carol Harris'), ('2021-01-10 00:00:00', 137, 'MaxWidget', 'Widgets', 22.95, 3, 537, 'Laura Young', 'North', 737, 'Nancy Allen'), ('2021-01-10 00:00:00', 138, 'HyperWidget', 'Widgets', 15.75, 6, 538, 'Steven King', 'South', 738, 'Lisa Robinson'), ('2021-01-10 00:00:00', 139, 'GigaWidget', 'Widgets', 55.25, 1, 539, 'Dorothy Scott', 'East', 739, 'Mark Lee'), ('2021-01-11 00:00:00', 140, 'SuperWidget', 'Widgets', 25.50, 5, 540, 'Larry Martinez', 'West', 740, 'Steven Wright'), ('2021-01-11 00:00:00', 141, 'HyperWidget', 'Widgets', 15.75, 4, 541, 'Jennifer Hernandez', 'North', 741, 'Betty Phillips'), ('2021-01-11 00:00:00', 142, 'MiniWidget', 'Gadgets', 45.00, 2, 542, 'Benjamin Nelson', 'South', 742, 'Jessica Clark'), ('2021-01-11 00:00:00', 143, 'MaxWidget', 'Widgets', 22.95, 3, 543, 'Samantha Moore', 'East', 743, 'Sandra Lewis'), ('2021-01-12 00:00:00', 144, 'QuantumWidget', 'Widgets', 99.99, 1, 544, 'Edward Mitchell', 'West', 744, 'Karen Young'), ('2021-01-12 00:00:00', 145, 'ElectroWidget', 'Widgets', 45.45, 3, 545, 'Melissa Taylor', 'North', 745, 'Brian Clark'), ('2021-01-12 00:00:00', 146, 'MicroWidget', 'Gadgets', 17.50, 7, 546, 'Christopher Davis', 'South', 746, 'Rebecca Martinez'), ('2021-01-12 00:00:00', 147, 'ExoWidget', 'Gadgets', 30.30, 4, 547, 'Patricia Wilson', 'East', 747, 'Jason Thompson'), ('2021-01-13 00:00:00', 148, 'MacroWidget', 'Widgets', 20.20, 5, 548, 'James Anderson', 'West', 748, 'Jessica White'), ('2021-01-13 00:00:00', 149, 'NanoWidget', 'Gadgets', 10.00, 9, 549, 'John Smith', 'North', 749, 'Charles Harris'), ('2021-01-13 00:00:00', 150, 'PicoWidget', 'Widgets', 20.20, 2, 550, 'Robert Johnson', 'South', 750, 'Sandra Martin'), ('2021-01-13 00:00:00', 151, 'TeraWidget', 'Widgets', 85.75, 1, 551, 'Michael Lee', 'East', 751, 'Daniel Jackson'), ('2021-01-14 00:00:00', 152, 'GigaWidget', 'Widgets', 55.25, 2, 552, 'Elizabeth Brown', 'West', 752, 'Paula Taylor'), ('2021-01-14 00:00:00', 153, 'HyperWidget', 'Widgets', 15.75, 8, 553, 'Jennifer Garcia', 'North', 753, 'George White'), ('2021-01-14 00:00:00', 154, 'MaxWidget', 'Widgets', 22.95, 3, 554, 'William Martinez', 'South', 754, 'Lisa Wilson'), ('2021-01-14 00:00:00', 155, 'UltraWidget', 'Gadgets', 19.99, 5, 555, 'David Rodriguez', 'East', 755, 'Susan Brown'), ('2021-01-15 00:00:00', 156, 'MiniWidget', 'Gadgets', 45.00, 2, 556, 'Richard Miller', 'West', 756, 'Nancy Garcia'), ('2021-01-15 00:00:00', 157, 'SuperWidget', 'Widgets', 25.50, 6, 557, 'Joseph Taylor', 'North', 757, 'Carol Anderson'), ('2021-01-15 00:00:00', 158, 'MegaWidget', 'Widgets', 29.99, 1, 558, 'Thomas Moore', 'South', 758, 'Gary Martin'), ('2021-01-15 00:00:00', 159, 'NanoWidget', 'Gadgets', 10.00, 10, 559, 'Charles Wilson', 'East', 759, 'Angela Robinson'), ('2021-01-16 00:00:00', 160, 'PicoWidget', 'Widgets', 20.20, 3, 560, 'Mary Harris', 'West', 760, 'Betty Lewis'), ('2021-01-16 00:00:00', 161, 'KiloWidget', 'Gadgets', 75.50, 1, 561, 'Margaret Clark', 'North', 761, 'Diane Walker'), ('2021-01-16 00:00:00', 162, 'HyperWidget', 'Widgets', 15.75, 7, 562, 'James Lewis', 'South', 762, 'Steven Hall'), ('2021-01-16 00:00:00', 163, 'GigaWidget', 'Widgets', 55.25, 1, 563, 'Barbara Robinson', 'East', 763, 'Rachel Martinez');
You can use an existing Dremio account, or if you are just evaluating, you can run Dremio off your laptop for free with the following command:
docker run -p 9047:9047 -p 31010:31010 -p 45678:45678 -p 32010:32010 -e DREMIO_JAVA_SERVER_EXTRA_OPTS=-Dpaths.dist=file:///opt/dremio/data/dist --name try-dremio dremio/dremio-oss
Within a few minutes, you’ll see Dremio running at localhost:9047 in your browser. Once you are in your Dremio environment, click on “Add Source” in the bottom left corner select Snowflake as your data source.
Then, enter your Snowflake Computing URL (get this under accounts on the Snowflake Dashboard). The only other required info will be your username and password.
When you first make the connection, Dremio will retrieve a list of tables you have on Snowflake. In the monitoring section of Snowflake, you’ll see these queries listed. These metadata queries don’t require the use of a Snowflake Warehouse. Let’s query a Snowflake dataset out of the box with Dremio and see what happens.
We can now easily query our Snowflake table from Dremio. The initial benefit is that we can now federate queries using our Snowflake datasets with data lakes, databases, and other data warehouses. However, at this point, the query was pushed down to Snowflake, so it did use your Snowflake warehouse, so this doesn’t save you on compute costs.
We can address this using two options:
- Convert the dataset into an Apache Iceberg table on our data lake using a simple CTAS statement. Any queries on the new Apache Iceberg table don’t require the Snowflake Warehouse, this is a great option if you are moving to an Apache Iceberg lakehouse and plan to ingest directly to the Apache Iceberg tables in the future. Since Dremio and Snowflake can both access Apache Iceberg tables, the tables can be used by both engines without having to push down queries to the other; as long as the tables are in a catalog they both know how to access.
- Enable Dremio’s Reflections on the dataset within Dremio. Dremio will create an Apache Iceberg version of the raw data set (raw reflections) or pre-computed aggregates (aggregate reflections) on your data lake behind the scenes. Any future queries on the Snowflake table will substitute the version on the data lake, eliminating the need for push-downs and resulting in fewer Snowflake compute costs. Dremio will periodically refresh this reflection to keep it fresh. For BI Dashboards, the aggregate reflections will be used not only to save the cost of the pushdown but also of the time it’d take to do the aggregate computations.
Let’s focus on the second option.
Enabling Reflections
Open up the dataset in Dremio and click the button towards the right to open up the detailed view of the dataset.
From here, you’ll see a reflections tab, click on it, enable raw and aggregate reflections, then click save.
If you refresh your Snowflake monitoring, you’ll notice two more queries were added to it, one for the raw data and one for the aggregate data. Dremio used the results of these queries to generate Apache Iceberg tables behind the scenes. Now run another query on the dataset, and if you go to Dremio’s jobs page, which is accessible on the side bar on the left, you’ll see something like this.
We’ll notice a few things:
- The lightning icon means that the reflection was substituted to fulfill the query.
- The reflections query took about 700ms, about 74% less time than without the reflection (not are we using cheaper compute, but the better performance means we use less of that compute for further cost reduction). Remember that this was just using my laptop for Dremio compute; using production clusters would be even faster.
- If I refresh the Snowflake monitoring page, no new queries are processed by Snowflake, meaning no Snowflake compute costs for the query.
Conclusion
Essentially, any raw and aggregate queries to this dataset Dremio will now be able to execute significantly faster without the higher cost of Snowflake compute, so you can serve your Dashboards off Dremio without needing a full-scale migration off Snowflake to start experiencing cost savings.