
6 minute read · June 18, 2024

Data Sharing of Apache Iceberg tables and other data in the Dremio Lakehouse

Data sharing is becoming increasingly important in the data world. Not all the data we need can be generated in-house, and our data can also be a valuable asset for generating revenue or building strategic partnerships. Leveraging tools that enable data sharing can significantly enhance the value of your data. In this blog, we aim to clarify the different types of data sharing and explore how the Dremio Lakehouse Platform can enhance data sharing capabilities within your data platform.

Types of Data Sharing

We can categorize data sharing into three primary categories:

  • Data marketplaces, where datasets are listed, purchased, and consumed
  • Sharing data along with compute, where consumers query your data on your engine
  • Sharing data without compute, where consumers access your catalog using their own engines

Dremio offers features that support and enhance all three of these data-sharing pathways, enabling seamless and efficient data sharing within your data platform.

How Dremio Facilitates Data Sharing

Regarding data sharing marketplaces, Dremio doesn’t manage its own marketplace but can connect to sources such as Amazon S3, AWS Glue, and Snowflake. This allows you to maximize the value of datasets you've purchased by joining them with data you have elsewhere. Additionally, since Dremio integrates Nessie, and with Nessie's upcoming interoperability with Snowflake, it may soon be possible to list datasets curated in Dremio on Snowflake’s Marketplace.
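As a concrete illustration (a minimal sketch; the source names, table paths, host, and credentials below are hypothetical placeholders, not from the original post), a federated query in Dremio is ordinary SQL that references tables from different connected sources, and it can be submitted from Python over Dremio's Arrow Flight endpoint:

```python
# Sketch: join a purchased marketplace dataset (landed in S3) with an
# internal table cataloged in AWS Glue -- both connected as Dremio sources.
# All source and table names here are hypothetical placeholders.

def federated_join_sql(marketplace_table: str, internal_table: str) -> str:
    """Build a cross-source join; Dremio federates it at query time."""
    return (
        f"SELECT m.region, SUM(i.revenue) AS revenue "
        f"FROM {marketplace_table} AS m "
        f"JOIN {internal_table} AS i ON m.region = i.region "
        f"GROUP BY m.region"
    )

def run_on_dremio(sql: str, host: str, user: str, password: str):
    """Submit the query over Arrow Flight (needs pyarrow and a live cluster)."""
    from pyarrow import flight  # deferred import: only needed at query time
    client = flight.FlightClient(f"grpc+tcp://{host}:32010")
    token = client.authenticate_basic_token(user, password)
    options = flight.FlightCallOptions(headers=[token])
    info = client.get_flight_info(
        flight.FlightDescriptor.for_command(sql), options
    )
    return client.do_get(info.endpoints[0].ticket, options).read_all()

sql = federated_join_sql('"s3_marketplace"."demographics"',
                         '"glue"."sales"."orders"')
print(sql)
```

The key point is that the join itself contains nothing source-specific; Dremio resolves each qualified table name to its underlying source when the query runs.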

Using Dremio, you can create users and assign them different access levels within your Dremio organization. Users can be granted access to datasets and then query them by logging into Dremio or by connecting through the REST API, Apache Arrow Flight, or JDBC/ODBC. In this scenario, queries against the data you’ve shared run on your Dremio cluster, so you are sharing both your data and your compute resources. Alternatively, with the integrated Dremio catalogs powered by Nessie, you can grant a user access to the catalog itself, which they can connect to engines like Apache Spark, Apache Flink, Presto, Trino, and more, using their own compute resources to query the tables in the catalog.
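For the catalog-access path, the consumer points their own engine at the shared catalog. As a rough sketch (the catalog URI, warehouse path, and table names below are hypothetical placeholders), configuring Apache Spark to read Iceberg tables from a Nessie-backed catalog with your own compute looks like this:

```python
# Sketch: Spark settings for querying Iceberg tables in a Nessie-backed
# catalog using your own compute. The URI and warehouse are placeholders.
nessie_conf = {
    "spark.sql.extensions": (
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
    ),
    "spark.sql.catalog.nessie": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.nessie.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
    "spark.sql.catalog.nessie.uri": "https://nessie.example.com/api/v2",
    "spark.sql.catalog.nessie.ref": "main",
    "spark.sql.catalog.nessie.warehouse": "s3a://my-warehouse/",
}

# With a live cluster you would apply these settings and query the shared
# tables directly (requires pyspark plus the Iceberg/Nessie runtime jars):
#
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("shared-catalog")
# for key, value in nessie_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
# spark.sql("SELECT * FROM nessie.sales.orders LIMIT 10").show()
```

Because the catalog, not your Dremio cluster, is the point of access, the consumer's Spark jobs run entirely on their own infrastructure.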

In summary, Dremio enables:

  • Query Federation: Integrate data from multiple data marketplaces using Dremio’s query federation capabilities.
  • Shared Compute: Grant users access to individual datasets and data sources, allowing them to use your Dremio cluster for queries.
  • Catalog Access: Provide users access to a Dremio catalog, which they can then query with any supporting tool or library, bringing their own compute resources.

If you are using other catalogs, such as standalone Nessie, Apache Gravitino, or AWS Glue, each has its own methods for granting access so you can share data without compute via Apache Iceberg catalog access.

Conclusion

Dremio offers a versatile and powerful platform for data sharing, whether through integrating with existing data marketplaces, providing shared compute resources, or enabling independent data access via catalogs. By leveraging these capabilities, you can maximize the value of your data, streamline collaboration, and create new opportunities for revenue and partnerships. Dremio’s comprehensive approach to data sharing ensures that you can meet your organization’s needs while maintaining control and governance over your data assets.

Want to explore how to unify, collaborate on, and share your data with Dremio? Contact Us

Here Are Some Exercises for You to See Dremio’s Features at Work on Your Laptop

Explore Dremio University to learn more about data lakehouses, Apache Iceberg, and the associated enterprise use cases. You can even learn how to deploy Dremio via Docker and explore these technologies hands-on.
