5 minute read · August 27, 2024

What’s New in Dremio: Automatic Iceberg Data Ingestion with Auto Ingest Pipelines 

Mark Shainman · Principal Product Marketing Manager

Dremio continues to enhance the capabilities of data lakehouse environments with its latest feature, Auto Ingest Pipelines for Iceberg tables. Available in both Dremio Enterprise Software and Dremio Cloud, this functionality changes the way organizations handle data ingestion from Amazon S3 into Iceberg tables in lakehouse environments.

What is Automatic Iceberg Data Ingestion?

Automatic Iceberg data ingestion with Auto Ingest Pipelines is Dremio's latest feature for loading data from Amazon S3 into Iceberg tables. It automates and streamlines the ingestion process, reducing the complexity of managing data pipelines and ensuring that Iceberg tables are continuously updated with fresh data.

Functionality and Features

The primary function of Auto Ingest Pipelines is to automate event-driven data ingestion from cloud storage sources such as Amazon S3. The feature handles diverse data types and scales with the volume of data, while providing near real-time processing. Key features of this new Dremio functionality include:

Notification-Based Auto Ingestion from S3:

Auto Ingest Pipelines follow a cloud-native, event-driven architecture. For Amazon S3, Dremio's first source for Auto Ingest Pipelines, Dremio leverages native S3 event notifications to load the latest files into Iceberg tables as they arrive. This ensures that any changes in the data are reflected in the Iceberg tables, maintaining up-to-date and accurate data for analysis.
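
To make the event-driven flow concrete, here is a minimal Python sketch of how a consumer might pull newly arrived file paths out of an S3 event notification. It is an illustration, not Dremio's implementation: the function name, the bucket, and the object keys are hypothetical, and only the notification layout (a Records list carrying eventName, bucket name, and URL-encoded object key) follows the format S3 publishes.

```python
from urllib.parse import unquote_plus


def new_files_from_event(event: dict) -> list[str]:
    """Return s3:// paths for objects created in one S3 event notification.

    Hypothetical helper for illustration; not part of any Dremio API.
    """
    paths = []
    for record in event.get("Records", []):
        # Only object-creation events represent newly arrived files.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths


# Hypothetical notification: one new file and one deletion.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "landing/orders+2024-08-27.parquet"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "landing/old.parquet"},
            },
        },
    ]
}

print(new_files_from_event(sample_event))
```

Only the created object is returned; the deletion event is ignored because it does not describe a file to load.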

File Deduplication: 

A common challenge with event-driven architectures is guaranteeing exactly-once write semantics: if an event is fired more than once, downstream systems must ensure that the data is still written only once. Dremio's Auto Ingest Pipelines handle this for users by checking the files about to be loaded against the set of files previously loaded, ensuring that a pipe loads each file only once.
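
The deduplication check described above can be sketched in a few lines of Python. This is a simplified, in-memory illustration under stated assumptions, not Dremio's actual mechanism: the class name, method name, and file paths are hypothetical, and a production system would persist the loaded-file history and record a file only after its load commits.

```python
class DedupingPipe:
    """Tracks which files a pipe has already loaded (illustrative only)."""

    def __init__(self) -> None:
        # Paths of files this pipe has already accepted for loading.
        self._loaded: set[str] = set()

    def select_new(self, candidates: list[str]) -> list[str]:
        """Return the candidates not seen before, preserving their order."""
        new_files = []
        for path in candidates:
            if path not in self._loaded:
                # Record the path so a repeated event does not reload it.
                self._loaded.add(path)
                new_files.append(path)
        return new_files


pipe = DedupingPipe()

# First notification: both files are new.
first = pipe.select_new(["s3://example-bucket/a.parquet",
                         "s3://example-bucket/b.parquet"])

# A duplicate event for a.parquet arrives together with a new file.
second = pipe.select_new(["s3://example-bucket/a.parquet",
                          "s3://example-bucket/c.parquet"])

print(first)
print(second)
```

The second call returns only c.parquet, so the repeated notification for a.parquet does not cause a second write.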

Benefits 

Auto Ingest Pipes offer numerous benefits that make the feature valuable for data-driven organizations.

Efficiency and Reliability:

The automation of the data ingestion process reduces the need for manual intervention, thereby lowering the risk of errors. This leads to more efficient and reliable data processing, saving time and resources.

Scalability:

This feature effortlessly handles large volumes of data, scaling according to the needs of the organization. Whether the data volume is small or enormous, the ingestion process adapts to ensure smooth operations.

Reduced Data Latency:

Automatic Iceberg Data Ingestion makes data available for analysis in near real time, shortly after it lands in cloud storage. This significantly reduces the time to insight, enabling organizations to make quicker, data-driven decisions.

Enhanced Data Analytics Performance:

By continuously appending fresh data to Iceberg tables, organizations can derive faster insights and make informed decisions. This leads to improved performance in data analytics, driving better outcomes across various business functions.

Reduced Cost:

As part of Dremio's high-performance, scalable data lakehouse engine, Auto Ingest Pipes eliminate the operational costs associated with building and maintaining event-driven pipelines. By automating the ingestion process, organizations can reduce the complexity and overhead of these tasks. Companies can now rapidly and effectively ingest data into the lakehouse, where it can be analyzed. Data scientists and engineers can then query data directly from the data lake, eliminating the need for extra data movement and data extracts. This further simplifies the management and maintenance of data pipelines in an analytical environment, resulting in faster, more reliable data access and a lower total cost of ownership (TCO).

Conclusion

Auto Ingest Pipes is a new feature from Dremio that simplifies and automates the data ingestion process. By ensuring continuous updates of Iceberg tables with fresh data, this feature enhances data analytics performance and reduces the complexity of managing data pipelines. With robust security measures and seamless integration with data lakehouse environments, Dremio's Auto Ingest Pipes will change the way organizations handle data ingestion, driving faster insights and more informed decision-making.

Ready to Get Started?

Enable the business to create and consume data products powered by Apache Iceberg, accelerating AI and analytics initiatives and dramatically reducing costs.