August 29, 2022

Melt Away Your Cloud Data Warehouse Costs

Ben Hudson · Principal Product Manager, Dremio

Companies can choose between two high-level architectures to support analytics: data warehouses and data lakehouses. Cloud data warehouses have become popular because they’re relatively easy to get started with. However, once a company ramps up its usage, the warehouse quickly becomes one of its largest expenses, often costing 10 times more than anticipated.

It’s easy to see how data warehouse costs can quickly get out of hand:

  • The compute and personnel costs of maintaining all the ETL/ELT flows into warehouses can easily run well into six figures per month
  • The data copies created in the warehouse, such as staging tables, pre-aggregated tables, and materialized views, multiply storage costs
  • Cloud data warehouses charge for the automated clustering operations required to keep tables performing well, and they make it incredibly easy for users to try to fix slow workloads by simply throwing more compute resources at them

The market seems to agree that cloud data warehouses are overwhelmingly expensive. Countless blog posts have been written, pre-built dashboards have been developed, and entire companies have emerged to help organizations better manage their cloud data warehouse spend. Capital One, one of Snowflake’s largest customers (and an investor in Snowflake’s Series D), even released a tool, Slingshot, to help “[s]eamlessly manage your Snowflake data costs.”

The price isn’t purely monetary either. By requiring you to load data before usage*, data warehouses delay time to insight and introduce vendor lock-in:

  • Data engineers spend over half their time managing ETL/ELT flows into warehouses and creating derived datasets for consumers
  • Data consumers need to wait up to several weeks for data to reach the warehouse before they can run dashboards or notebooks
  • Once data reaches the warehouse, companies get locked into its functionality, and it’s difficult to use other compute engines on that data

Migrating to a new warehouse that promises to charge less per unit of consumption won’t solve these problems either; these issues are common across all warehouses.

*Here, “usage” specifically refers to running production workloads that fully benefit from the performance and other features/optimizations that warehouses provide.

Data lakehouses: an open alternative

Data lakehouses combine the scalability and openness of data lakes with the performance and functionality of data warehouses. Lakehouses enable companies to run all their analytical workloads on a single copy of data as it lives in cloud object storage (where most of their data already is), rather than needing to copy it into different proprietary systems before it can be analyzed. With a lakehouse architecture, data consumers can use their favorite tools to analyze data immediately, and data engineers save the time and money needed to load data into a warehouse for others to use (and to maintain the associated infrastructure).

In addition, data lakehouses eliminate the vendor lock-in and lock-out that cloud data warehouses are notorious for. Data in cloud object storage is stored in open, vendor-agnostic formats like Apache Parquet and Apache Iceberg, so no vendor has leverage over the data. And, companies benefit from competition and innovation in the lakehouse space, so they can use best-of-breed compute engines to process this data today, and easily try new compute engines as they emerge.

If companies want to fully experience the many benefits of cloud storage and computing, they’ll see the most success by selecting vendors that embrace open architectures and understand the need for other vendors to be at the table.

Why companies choose Dremio for their lakehouse journey

Dremio is the easy and open lakehouse platform. Data teams use Dremio to deliver self-service analytics on the lakehouse, while enjoying the flexibility to use Dremio’s lightning-fast SQL query service and any other processing engine on the same data. Companies beginning their lakehouse journey choose Dremio for several reasons.

First, Dremio makes it easy for companies to immediately start running all their SQL workloads directly on data lake storage, from ad-hoc queries to mission-critical BI dashboards. Dremio supports full DML on Apache Iceberg tables (including inserts, updates, deletes, merges, and truncations), which means companies no longer need to load data into a warehouse to tackle workloads that require data mutations. And, with data stored in open formats on the lake, companies get the flexibility to use any other processing engine on that same data.
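
As a minimal sketch, here is how those mutation workloads look in plain SQL. The source, table, and column names below (lake.sales.orders and friends) are purely illustrative, and exact MERGE syntax varies slightly across engines, so treat this as a standard-style example rather than a verbatim recipe:

    -- Insert late-arriving records (all names are illustrative):
    INSERT INTO lake.sales.orders (order_id, customer_id, amount, order_date)
    VALUES (1001, 42, 199.90, DATE '2022-08-01');

    -- Correct a mis-keyed amount in place:
    UPDATE lake.sales.orders
    SET amount = 209.90
    WHERE order_id = 1001;

    -- Upsert a batch of changes from a staging table:
    MERGE INTO lake.sales.orders AS t
    USING lake.sales.orders_changes AS c
      ON t.order_id = c.order_id
    WHEN MATCHED THEN UPDATE SET amount = c.amount
    WHEN NOT MATCHED THEN INSERT (order_id, customer_id, amount, order_date)
      VALUES (c.order_id, c.customer_id, c.amount, c.order_date);

    -- Clear the staging table once changes are applied:
    TRUNCATE TABLE lake.sales.orders_changes;

    -- Age out old records per a retention policy:
    DELETE FROM lake.sales.orders
    WHERE order_date < DATE '2015-01-01';

Because these statements mutate Iceberg tables directly in object storage, the resulting tables remain readable by any other Iceberg-aware engine.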

Companies can also use Dremio to combine their lakehouse data with data that resides in external sources. We know companies won’t have all their data in data lake storage on day one, so Dremio supports a variety of connectors to external systems like relational databases. Connectors, combined with Dremio’s DML support, enable teams to run a wide range of analytics workloads across a variety of sources, and to get started with a lakehouse architecture with less up-front work.
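
For example, a single federated query can join an Iceberg table on the lake with a table that still lives in an external relational database. This sketch assumes a hypothetical PostgreSQL source named pg_crm has been connected alongside the lake source from the previous example; all names are illustrative:

    -- Join lakehouse order history with customer records still in PostgreSQL:
    SELECT c.customer_name,
           SUM(o.amount) AS lifetime_value
    FROM lake.sales.orders AS o
    JOIN pg_crm.public.customers AS c
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_name
    ORDER BY lifetime_value DESC
    LIMIT 10;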

On top of connectivity, companies choose Dremio because it gives data consumers a self-service experience to explore, analyze, curate, and share data through a modern, intuitive UI. With Dremio, data analysts and data scientists can analyze and experiment with data immediately, without needing help from data engineers. In addition, Dremio’s transparent query acceleration enables teams to use any SQL client to work directly with their logical data model, without worrying about physically optimized tables or creating BI extracts.
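
As a sketch of that experience, a data curator might publish a business-friendly view over raw data; consumers then query the view from any SQL client while acceleration happens transparently underneath. This assumes standard CREATE VIEW support, a hypothetical space named analytics, and the illustrative orders table from above:

    -- Publish a logical, business-friendly dataset; consumers never need
    -- to know how the underlying data is physically laid out.
    CREATE VIEW analytics.monthly_revenue AS
    SELECT DATE_TRUNC('MONTH', order_date) AS order_month,
           SUM(amount)                     AS revenue,
           COUNT(DISTINCT customer_id)     AS active_customers
    FROM lake.sales.orders
    GROUP BY DATE_TRUNC('MONTH', order_date);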

Companies across the world have already used Dremio to melt away their cloud data warehouse costs as part of their lakehouse journey:

  • A leading global e-commerce marketplace provider reduced their data architecture TCO by over 85% by moving their analytical workloads off their cloud data warehouse and running them directly on their lakehouse with Dremio.
  • A multinational consumer goods company saved over $2.5M USD within their first year of moving from their cloud data warehouse to a lakehouse architecture powered by Dremio. Beyond drastically simplifying their architecture and reducing the TCO of supporting analytics, their data consumers now drive better business outcomes with immediate, self-service access to datasets, without help from IT.

Get started for free today

Companies have become interested in data lakehouse architectures because they eliminate the operational costs and vendor lock-in associated with data warehouses, and enable their teams to use best-of-breed tooling on their data. You can get started with your lakehouse architecture for free today with Dremio! If you have any questions or want to discuss cost optimization strategies when thinking about a lakehouse architecture, feel free to reach out to our experts.
