June 18, 2024
Data Sharing of Apache Iceberg tables and other data in the Dremio Lakehouse
Data sharing is becoming increasingly important in the data world. Not all the data we need can be generated in-house, and our own data can be a valuable asset for generating revenue or building strategic partnerships. Leveraging tools that enable data sharing can significantly increase the value of your data. In this blog, we clarify the different types of data sharing and explore how the Dremio Lakehouse Platform can enhance the data sharing capabilities of your data platform.
Types of Data Sharing
We can categorize data sharing into three primary categories:
- Data Marketplaces: These platforms allow you to list your datasets for others to subscribe to, either for free or for a fee. Popular examples include Snowflake Marketplace and AWS Data Exchange.
- Sharing Data with Compute: In this model, you provide access to both your data and the compute needed to query it. Users can work with the data, but you bear the cost of compute and storage.
- Sharing Data without Compute: Here, users are given access to the data without a preconfigured compute engine. Examples include using Delta Sharing for Delta Lake tables or bringing an Apache Iceberg catalog to your preferred compute engine.
Dremio offers features that support and enhance all three of these data-sharing pathways, enabling seamless and efficient data sharing within your data platform.
How Dremio Facilitates Data Sharing
Regarding data marketplaces, Dremio doesn't operate its own marketplace, but it can connect to platforms like S3, AWS Glue, and Snowflake. This allows you to maximize the value of datasets you've purchased by joining them with data you have elsewhere. Additionally, since Dremio's integrated catalog is powered by Nessie, and with Nessie's upcoming interoperability with Snowflake, it may soon be possible to list datasets curated in Dremio on the Snowflake Marketplace.
Using Dremio, you can create users and assign them different access levels within your Dremio organization. Users can be granted access to datasets and then query them by logging into Dremio or by using the REST API, Apache Arrow Flight, or JDBC/ODBC. In this scenario, queries run on your Dremio cluster, so you are sharing both your data and your compute resources. Alternatively, with the integrated Dremio catalogs powered by Nessie, you can grant a user access to the catalog itself, which they can bring to engines like Apache Spark, Apache Flink, Presto, Trino, and more, using their own compute resources to query its tables.
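To make the shared-compute path concrete, here is a minimal sketch of a consumer querying a Dremio cluster over Apache Arrow Flight with pyarrow. The hostname, credentials, and the dataset path my_space.my_view are placeholder assumptions to adapt to your deployment; self-hosted Dremio exposes its Flight endpoint on port 32010 by default.

```python
# Minimal sketch: query a shared Dremio dataset over Apache Arrow Flight.
# Hostname, credentials, and the dataset path below are placeholders.
from pyarrow import flight

# Self-hosted Dremio exposes Arrow Flight on port 32010 by default;
# use grpc+tls:// for a TLS-enabled endpoint.
client = flight.FlightClient("grpc+tcp://dremio.example.com:32010")

# Exchange username/password for a bearer-token header.
token = client.authenticate_basic_token("shared_user", "shared_password")
options = flight.FlightCallOptions(headers=[token])

# Ask Dremio to plan the query, then stream the results as Arrow batches.
descriptor = flight.FlightDescriptor.for_command(
    "SELECT * FROM my_space.my_view LIMIT 10"  # hypothetical shared view
)
info = client.get_flight_info(descriptor, options)
reader = client.do_get(info.endpoints[0].ticket, options)

print(reader.read_all())  # results as an Arrow Table
```

Because the query executes on the provider's Dremio cluster, the consumer needs only network access and credentials, no compute of their own.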
In summary, Dremio enables:
- Query Federation: Integrate data from multiple data marketplaces using Dremio’s query federation capabilities.
- Shared Compute: Grant users access to individual datasets and data sources, allowing them to use your Dremio cluster for queries.
- Catalog Access: Provide users access to a Dremio catalog, which they can then query with any supporting tool or library, bringing their own compute resources (sketched below).
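To make the catalog-access path concrete, the sketch below attaches a Nessie-backed Iceberg catalog to the consumer's own Spark session. The Nessie URI, bearer token, library versions, and the table name nessie.sales.orders are all assumptions to adapt to your environment.

```python
# Minimal sketch: bring a shared Nessie-backed Iceberg catalog to your own
# Spark compute. URI, token, versions, and table name are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,"
        "org.projectnessie.nessie-integrations:"
        "nessie-spark-extensions-3.5_2.12:0.90.4",  # illustrative versions
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
    )
    # Register the shared catalog under the name "nessie".
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl",
            "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "https://nessie.example.com/api/v2")
    .config("spark.sql.catalog.nessie.ref", "main")
    # Authenticate with the token the data provider granted you.
    .config("spark.sql.catalog.nessie.authentication.type", "BEARER")
    .config("spark.sql.catalog.nessie.authentication.token", "<access-token>")
    .getOrCreate()
)

spark.sql("SELECT * FROM nessie.sales.orders LIMIT 10").show()
```

All of the compute here belongs to the consumer; the provider shares only the catalog endpoint and the underlying table files.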
If you are using other catalogs, such as standalone Nessie, Apache Gravitino, or AWS Glue, each has its own mechanism for granting catalog access, so you can still share Apache Iceberg tables without providing compute.
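As one example of those differing mechanisms, here is a hedged sketch of the AWS Glue variant of the same pattern. Access here is governed by the caller's IAM permissions rather than a catalog token, and the warehouse bucket, table name, and versions are placeholder assumptions.

```python
# Minimal sketch: read a shared Iceberg table registered in AWS Glue using
# your own Spark compute. Bucket, table name, and versions are placeholders;
# access is controlled by the caller's AWS credentials / IAM role.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,"
        "org.apache.iceberg:iceberg-aws-bundle:1.5.2",
    )
    # Register the Glue Data Catalog under the name "glue".
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse/")
    .config("spark.sql.catalog.glue.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

spark.sql("SELECT * FROM glue.sales.orders LIMIT 10").show()
```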
Conclusion
Dremio offers a versatile and powerful platform for data sharing, whether through integrating with existing data marketplaces, providing shared compute resources, or enabling independent data access via catalogs. By leveraging these capabilities, you can maximize the value of your data, streamline collaboration, and create new opportunities for revenue and partnerships. Dremio’s comprehensive approach to data sharing ensures that you can meet your organization’s needs while maintaining control and governance over your data assets.
Want to explore how to unify, collaborate on, and share your data with Dremio? Contact Us
Here Are Some Exercises for You to See Dremio's Features at Work on Your Laptop
- Intro to Dremio, Nessie, and Apache Iceberg on Your Laptop
- From SQLServer -> Apache Iceberg -> BI Dashboard
- From MongoDB -> Apache Iceberg -> BI Dashboard
- From Postgres -> Apache Iceberg -> BI Dashboard
- From MySQL -> Apache Iceberg -> BI Dashboard
- From Elasticsearch -> Apache Iceberg -> BI Dashboard
- From Kafka -> Apache Iceberg -> Dremio
Explore Dremio University to learn more about data lakehouses, Apache Iceberg, and their associated enterprise use cases. You can even learn how to deploy Dremio via Docker and explore these technologies hands-on.