With the launch of Dremio 4.0.0, I’m delighted to share the Dremio Snowflake Connector that is built with the ARP framework. In this article, let’s see what Dremio is and how the new connector can help you achieve faster time to insight for your organization.
What is Dremio?
Faster time to insight enables organizations to move rapidly to action and accelerate the speed with which they can bring new product and service capabilities to market.
To enable faster time to insight, organizations across the world have spent considerable money on data collection, storage and analytics. For the most part, data collection and storage have been addressed over the past decade. Hadoop and S3-compatible object storage became the de facto standard for storing big data on-prem, and S3, GCS and ADLS became the de facto standard for cloud storage.
There is also a long list of data visualization vendors, such as Tableau, Looker, Qlik, Business Objects and SiSense.
However, the question of how to provide an enterprise-wide semantic layer that bridges BI tools and massive data sets to achieve faster time to insight has not yet been addressed well. To close this gap, we need a platform that provides all of the following, does each of them well, and does so without depending on other systems:
- Lightning fast query engine for ad-hoc analysis
- Sub-second query performance for frequently accessed data
- Self-service layer for data engineers and analysts
- No vendor lock-in
Data virtualization, federation, and OLAP-on-Hadoop solutions fall short in one or more of the above categories.
“Dremio is a data platform that enables businesses to be data-driven by providing faster time to insight with no vendor lock-in”
Dremio addresses the above problems as follows.
- Lightning fast query engine for ad-hoc analysis. Dremio’s scale-out architecture enables queries over large data sets.
- Sub-second query performance for frequently accessed data. Data Reflections provide the sub-second performance.
- Self-service layer for data engineers and analysts. A unified web UI provides many curation capabilities that empower data engineers and analysts.
- No vendor lock-in. Dremio works directly with the open source columnar data format Apache Parquet, whether it’s in the cloud or on-prem, without depending on another RDBMS or any other execution engine. Data Reflections are also stored in Parquet format in inexpensive storage such as S3 or ADLS.
Dremio’s contributions to the OSS community are also a big factor in how it is driving the ecosystem towards faster time to insight:
- Apache Arrow: The de facto in-memory representation of columnar data, now used by 50+ projects.
- Arrow Flight: A high-performance wire protocol for large-volume data transfer for analytics.
- Gandiva: A new LLVM-based execution kernel for Arrow.
The Snowflake Connector and ARP framework
Dremio’s Advanced Relational Pushdown (ARP) framework allows data consumers and developers to create custom relational connectors for those cases where data is stored in uncommon locations. Using the ARP framework not only allows you to create better connectors with improved push-down abilities but it also provides a way to develop connectors more efficiently and easily.
The ARP framework lets you build a connector for any data source that has a JDBC driver. Even the push-downs are simply mappings declared in YAML, which makes it easy for anyone to create a connector.
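To make the idea of a declarative push-down mapping more concrete, here is a toy Python sketch. It is not the ARP YAML schema and not Dremio's planner; the mapping table and the `can_push_down` and `render` helpers are hypothetical names I made up for illustration. It only shows the principle: an operation is sent to the source when a mapping is declared for its name and argument types, and otherwise it is left for the engine to evaluate.

```python
# Toy illustration of declarative push-down mappings.
# Hypothetical names throughout: this is NOT the ARP YAML schema
# and NOT how Dremio's planner is actually implemented.

# Each entry declares: (function name, argument types) -> source SQL template.
PUSHDOWN_MAPPINGS = {
    ("upper", ("varchar",)): "UPPER({0})",
    ("char_length", ("varchar",)): "LENGTH({0})",
    ("add", ("integer", "integer")): "({0} + {1})",
}


def can_push_down(name, arg_types):
    """Return True if a mapping is declared for this exact signature."""
    return (name, tuple(arg_types)) in PUSHDOWN_MAPPINGS


def render(name, arg_types, args):
    """Render source-side SQL for a call, or None if no mapping exists."""
    template = PUSHDOWN_MAPPINGS.get((name, tuple(arg_types)))
    if template is None:
        return None  # no mapping declared: the engine evaluates this itself
    return template.format(*args)


if __name__ == "__main__":
    # A mapped signature is rendered as source SQL.
    print(render("upper", ["varchar"], ['"C_NAME"']))
    # An unmapped function falls back to local evaluation (None).
    print(render("soundex", ["varchar"], ['"C_NAME"']))
```

The point of keeping the mappings in a plain data structure is the same one the YAML approach makes: supporting one more function means adding one more entry, not writing more connector code.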
With ARP, I started working on building a connector for Snowflake about a month ago. Today, I’m happy to announce a major release of this connector with the following features:
- Complete data type support
- Push-down of more than 50 functions
- Verified push-downs of all TPC-H queries supported by Snowflake
You can find more information about the connector here: https://github.com/narendrans/dremio-snowflake
Use Cases
- Join data in your data lake (S3/Azure Storage/Hadoop/other relational sources) with Snowflake. There is no need to move all your on-prem data to Snowflake, which saves you $$$.
- Accelerate Snowflake data with Reflections™ to provide interactive, sub-second SQL performance. This enables your business to make decisions faster.
- Use CTAS to export data from Snowflake in the open source Parquet format to your inexpensive data lake store. This avoids vendor lock-in.
- Use COPY INTO in Snowflake to export large data sets as Parquet to S3/ADLS, then query them with Dremio, taking advantage of Columnar Cloud Cache, and/or join them with data that remains in Snowflake.
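As a rough sketch of what these use cases look like in SQL, the Python snippet below builds three statements: a Dremio query joining a data lake table with a Snowflake table, a Dremio CTAS, and a Snowflake COPY INTO unload. Every source, schema, table, column and stage name here (`s3lake`, `snowflake_src`, `@my_stage` and so on) is a hypothetical placeholder, and the CTAS example assumes the target source is writable and that Dremio's default CTAS output format (Parquet) has not been changed.

```python
# Sketch of the SQL behind the use cases above.
# All source, schema, table, column, and stage names are hypothetical.


def quote_path(*parts):
    """Build a dotted, double-quoted identifier path such as "a"."b"."c"."""
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


def federated_join_sql():
    """Dremio query joining a data lake table with a Snowflake table."""
    lake = quote_path("s3lake", "sales", "orders")
    snow = quote_path("snowflake_src", "PUBLIC", "CUSTOMER")
    return (
        "SELECT c.C_NAME, SUM(o.total) AS revenue\n"
        f"FROM {lake} AS o\n"
        f"JOIN {snow} AS c ON o.customer_id = c.C_CUSTKEY\n"
        "GROUP BY c.C_NAME"
    )


def ctas_export_sql():
    """Dremio CTAS copying a Snowflake table into a data lake location."""
    target = quote_path("s3lake", "exports", "customer")
    source = quote_path("snowflake_src", "PUBLIC", "CUSTOMER")
    return f"CREATE TABLE {target} AS\nSELECT * FROM {source}"


def snowflake_unload_sql(stage="@my_stage/customer/"):
    """Snowflake-side COPY INTO that unloads a table as Parquet files."""
    return (
        f"COPY INTO {stage}\n"
        "FROM PUBLIC.CUSTOMER\n"
        "FILE_FORMAT = (TYPE = PARQUET)"
    )


if __name__ == "__main__":
    for statement in (federated_join_sql(), ctas_export_sql(), snowflake_unload_sql()):
        print(statement)
        print("---")
```

The first two statements would be run in Dremio; the third would be run in Snowflake itself, after which the unloaded files can be added to Dremio as a regular data lake source.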
Visit Dremio Hub to learn how you can leverage this and other connectors in your organization.