6 minute read · June 3, 2020
Getting Locked-In and Locked-Out With Snowflake
Founder & Chief Product Officer, Dremio
Yesterday, Snowflake announced what it calls Snowflake Data Cloud. The announcement positioned Snowflake as the place where the world should store its data. In the same way that AWS, Microsoft Azure and Google Cloud Platform are infrastructure clouds and Salesforce is an application cloud, Snowflake wants you to think about its platform as a “data cloud”.
According to Snowflake, the more data companies put in the Data Cloud, the more valuable it gets for everyone. This is the result of a network effect, a phenomenon whereby a growing number of users improves the value of a product or service. Snowflake accomplishes this by making it easy to share data with other Snowflake users and customers without copying it. This data sharing feature has been available for a couple of years via their Data Marketplace.
Surrendering data to a data warehouse creates big challenges, however. For decades, enterprises have been fighting vendor lock-in and lock-out with their data warehouses—a reality in which companies are:
- Charged exorbitant fees for their data warehouse
- Locked in to the data warehouse, with almost no ability to migrate off
- Locked out of innovation outside the data warehouse, with no ability to leverage other technologies and services to process that same data
Whether in the cloud or on-premises, the problem doesn’t change. Like all data warehouse architectures, the Snowflake data cloud model creates inherent risks that get bigger as more of an enterprise’s data is moved there. Beyond the data itself, the monolithic approach Snowflake is advocating limits one of the strongest appeals of cloud architectures: the ability to natively use best-of-breed technologies and services against a single source of data.
But there’s a better data architecture. One that’s built around an open cloud data lake (e.g., S3 or ADLS) instead of a proprietary data warehouse. One that provides full and complete control of your data at all times and enables you to quickly and efficiently access, process and query data directly on the data lake.
It’s time to think about an open, modern data architecture.
An Open Data Architecture for Analytics
A key advantage of cloud data lakes over data warehouses is their open architecture. Data lakes minimize the risk of vendor lock-in as well as the risk of being locked out of future industry innovation. Werner Vogels describes this well in a recent blog post:
“With a data lake, data is stored in an open format, which makes it easier to work with different analytic services. Open format also makes it more likely for the data to be compatible with tools that don’t even exist yet. Various roles in your organization, like data scientists, data engineers, application developers, and business analysts, can access data with their choice of analytic tools and frameworks. You’re not locked in to a small set of tools, and a broader group of people can make sense of the data.”
An open data architecture also means that data should be stored in open source formats in the cloud, not in proprietary formats in a vendor’s software or cloud platform. You should be able to access and analyze the data where it lands, right in your data lake. And you should have the flexibility to choose any existing and future technologies to access, process and query your data. For example, you could use Amazon S3 and ADLS to store the data, Dremio and Databricks to process it, and Tableau and Power BI to visualize it. You’ll need that flexibility, not just today but also in the future.
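To make the idea of analyzing data where it lands concrete, here is a minimal sketch, not a description of how Dremio or any other engine works. It assumes a local temporary directory standing in for S3 or ADLS and plain CSV standing in for an open format such as Parquet. The file names and the functions `write_lake_files`, `revenue_by_region` and `order_count` are hypothetical, chosen only for illustration. Two independent consumers read the same files in place, and neither one loads the data into a store of its own first.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_lake_files(lake_dir: Path) -> None:
    """Land two CSV files in a directory that stands in for S3 or ADLS."""
    batches = {
        "orders_2020_05.csv": [("emea", 120.0), ("amer", 80.0)],
        "orders_2020_06.csv": [("emea", 30.0), ("apac", 50.0)],
    }
    for name, rows in batches.items():
        with open(lake_dir / name, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["region", "amount"])
            writer.writerows(rows)


def revenue_by_region(lake_dir: Path) -> dict:
    """First consumer: aggregate directly over the files, with no load step."""
    totals = defaultdict(float)
    for path in sorted(lake_dir.glob("orders_*.csv")):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[row["region"]] += float(row["amount"])
    return dict(totals)


def order_count(lake_dir: Path) -> int:
    """Second, independent consumer reading the same files in place."""
    count = 0
    for path in lake_dir.glob("orders_*.csv"):
        with open(path, newline="") as f:
            count += sum(1 for _ in csv.DictReader(f))
    return count


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    write_lake_files(lake)
    totals = revenue_by_region(lake)
    count = order_count(lake)
    print(totals)
    print(count)
```

Because the files are in a format that any tool can parse, adding a third consumer later requires no migration or export, only read access to the same storage location.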
In comparison, Snowflake believes that all of your data should be in Snowflake. Yesterday they unveiled Snowsight, an integrated BI tool with charts and dashboards. However, you can only take advantage of these features by locking yourself into Snowflake.
At Dremio, our vision for data and analytics is fundamentally different. We believe in a more modern, heterogeneous architecture, one in which companies will continue to use best-of-breed tools like Tableau, Power BI, Looker, Superset and Jupyter with their various data sources. We believe in investing in integrations with best-in-class technologies to give our customers a wide array of choices to satisfy all their analytics needs.
Unlocking the Last Piece of the Puzzle
If this open data architecture is so compelling, why do people still use data warehouses to the extent they do? Because of one significant problem: the inability to quickly access and query data directly in data lake storage, particularly for analysts who just want to deliver insights, not complexity and code. This missing puzzle piece has reinforced the traditional approach of ETLing data into a monolithic data warehouse like Teradata or Snowflake. Until now.
Here at Dremio we’re unlocking the full potential of an open data architecture for analytics. The Dremio data lake engine creates and defines a new category of analytics query infrastructure that connects data teams with their data in a much more performant, efficient and open way than ever before.
Our data lake engine enables high-performance queries directly on data lake storage like S3 and ADLS, eliminating the need to ETL data from the data lake into a separate data warehouse. Dremio’s query acceleration capabilities also eliminate the time and expense required to create and manage cubes, extracts and aggregation tables.
Dremio has eliminated the last hurdle that stands between you and an open data architecture. You now have a powerful alternative to the legacy, data warehouse-centric model for analytics.
Ensuring an Open Data Lake Future
At Dremio, we believe there are myriad advantages to this open data architecture today, and that its best days of innovation are still to come. To that end, in this paper, we’ve outlined several of Dremio’s innovations that will drive the open cloud data lake architecture into the future while avoiding vendor lock-in and data silos. Please take a look and let us know what you think. You can contact us via the Dremio community forum or at @dremio on Twitter.
The open data architecture is no longer the future. It’s here, and it’s going to get even better.