7 minute read · June 18, 2024

Introducing Auto Ingest Pipes: Event-Driven Ingestion Made Easy

Casey Karst · Principal Product Manager, Dremio

We are thrilled to announce the Public Preview of Auto Ingest Pipes, a new way to load data into Iceberg tables that simplifies the development and management of your data loading pipelines. In today's data-driven world, efficient, reliable, and scalable data ingestion has never been more critical. Auto Ingest Pipes meets that need with a suite of features built around event-driven ingestion.

Key Features of Auto Ingest Pipes

1. Simple Setup

We understand that setting up data pipelines can be daunting. That’s why Auto Ingest Pipes is designed with simplicity in mind. When you create an Auto Ingest Pipe, Dremio generates a source-specific CLI command that sets up the correct event notification mechanism in your cloud of choice. This CLI command is fully transparent and self-contained. Spend less time on setup and more time on what matters: analyzing and using your data.

2. File Deduplication

Data duplication can be a significant drain on resources, leading to increased storage costs, inefficiencies, and the risk of data inconsistencies. Auto Ingest Pipes tackles this challenge head-on with file deduplication. Each pipe has a default deduplication lookback period of 14 days, so a file that was already seen within that window is not loaded a second time. This ensures that only unique files are processed, eliminating redundancies and optimizing storage utilization, and it gives users peace of mind that their tables contain only clean data.
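
To make the lookback idea concrete, here is a minimal Python sketch of deduplicating files within a time window. It is an illustration only and not how Dremio implements pipes: the FileDeduplicator class, its should_load method, and the in-memory dictionary are all hypothetical names chosen for this example. Only the 14-day default comes from the feature description above.

```python
from datetime import datetime, timedelta


class FileDeduplicator:
    """Hypothetical sketch: remembers file paths seen within a lookback window."""

    def __init__(self, lookback=timedelta(days=14)):
        self.lookback = lookback
        self._seen = {}  # file path -> time the file was first accepted

    def should_load(self, path, now):
        # Forget entries that have aged out of the lookback window.
        cutoff = now - self.lookback
        self._seen = {p: t for p, t in self._seen.items() if t >= cutoff}
        if path in self._seen:
            return False  # same file seen again inside the window: skip it
        self._seen[path] = now
        return True


dedup = FileDeduplicator()
start = datetime(2024, 6, 1)
print(dedup.should_load("landing/orders.csv", start))                       # first sighting
print(dedup.should_load("landing/orders.csv", start + timedelta(days=1)))   # repeat inside window
print(dedup.should_load("landing/orders.csv", start + timedelta(days=15)))  # window has passed
```

In this sketch a file seen again after the window has expired is treated as new, which is why the length of the lookback period matters when the same path may be re-delivered late.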

3. Event-Driven Architecture

Auto Ingest Pipes follows cloud-provider-specific best practices to create an event-driven data ingestion architecture, making it highly responsive and scalable. This architecture allows the system to react to data events in near real-time, processing and ingesting data as it arrives. The event-driven design not only makes data handling more efficient but also enables seamless scalability, so Auto Ingest Pipes can grow with your data needs. Whether you're dealing with intermittent data bursts or continuous data streams, our event-driven approach ensures optimal performance and reliability.
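
As a rough illustration of what reacting to a data event can look like on S3, the Python sketch below extracts newly created object paths from an S3 event notification message. It follows the publicly documented S3 notification layout (a "Records" list with "eventName", "s3.bucket.name", and a URL-encoded "s3.object.key"). The function name new_object_paths is hypothetical, and nothing here describes Dremio's internal handling of these events.

```python
from urllib.parse import unquote_plus


def new_object_paths(event):
    """Return (bucket, key) pairs for object-created records in an S3 event."""
    paths = []
    for record in event.get("Records", []):
        # Ignore anything that is not an object-creation event (e.g. deletes).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        paths.append((s3["bucket"]["name"], key))
    return paths


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "landing/my+file.csv"}},
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "landing/old.csv"}},
        },
    ]
}
print(new_object_paths(sample_event))
```

The point of the pattern is that work starts when a notification arrives, rather than on a polling schedule, so ingestion latency tracks the arrival of files.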

4. Monitoring and Error Handling

Effective monitoring and error handling are crucial for maintaining the integrity of your data pipeline. Auto Ingest Pipes provides two new pipe-specific system tables, sys.project.copy_file_history and sys.project.pipes, which offer real-time visibility into your data ingestion processes. Track the state of your pipes, identify the load status of individual files, and ensure data flows smoothly. If a file fails to load, it is skipped so that the error does not fail the entire pipe. The error message and the path of the file are recorded in sys.project.copy_file_history for later debugging.
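
The skip-on-error behavior described above can be sketched in a few lines of Python. This is a simplified model, not Dremio's implementation: load_files, the load_one callback, and the history field names ("file_path", "state", "rows", "error") are hypothetical and are not the actual columns of sys.project.copy_file_history.

```python
def load_files(paths, load_one):
    """Load each file independently and record one history entry per file."""
    history = []
    for path in paths:
        try:
            rows = load_one(path)
            history.append({"file_path": path, "state": "LOADED", "rows": rows, "error": None})
        except Exception as exc:
            # A bad file is skipped and logged; the remaining files still load.
            history.append({"file_path": path, "state": "SKIPPED", "rows": 0, "error": str(exc)})
    return history


def fake_loader(path):
    # Stand-in for a real loader: one file is treated as malformed.
    if path.endswith("bad.csv"):
        raise ValueError("malformed header")
    return 100


for entry in load_files(["a.csv", "bad.csv", "c.csv"], fake_loader):
    print(entry)
```

Isolating failures per file means one malformed upload costs only that file, and the recorded path and message give you what you need to fix and re-deliver it.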

5. Efficient Batching for Resource Utilization

Optimizing resource utilization is key to efficient data processing and controlling costs. Auto Ingest Pipes batches incoming files into manageable chunks rather than loading each file on its own, maximizing processing efficiency and minimizing resource consumption. By intelligently grouping data, Auto Ingest Pipes keeps your systems operating at peak performance while improving overall throughput.
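
One simple way to picture batching is grouping files until a size cap is reached. The Python sketch below does exactly that; it is an assumption made for illustration, since this post does not say how Auto Ingest Pipes actually forms its batches. The batch_by_size function and the max_batch_bytes parameter are hypothetical.

```python
def batch_by_size(files, max_batch_bytes):
    """Group (path, size_bytes) pairs into batches capped at max_batch_bytes.

    A single file larger than the cap still gets a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for path, size in files:
        # Close the current batch if adding this file would exceed the cap.
        if current and current_bytes + size > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


incoming = [("a.parquet", 40), ("b.parquet", 50), ("c.parquet", 30), ("d.parquet", 200)]
print(batch_by_size(incoming, max_batch_bytes=100))
```

Fewer, larger load jobs amortize per-job overhead, which is the general trade-off any batching scheme makes against per-file latency.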

Why Choose Auto Ingest Pipes?

Auto Ingest Pipes is more than just a data ingestion tool; it's a comprehensive solution designed to enhance every aspect of your data processing pipeline. Here’s why it stands out:

  • Enhanced Efficiency: Do more with your limited time by using Auto Ingest Pipes to create, manage, and monitor your data ingestion pipelines.
  • Reliability: With exactly-once semantics and file deduplication, you can trust that your data is ingested correctly, without hidden errors or duplications.
  • Scalability: The event-driven architecture allows Auto Ingest Pipes to scale effortlessly, accommodating the growth of your data and the increasing complexity of your ingestion requirements.

Get Started with Auto Ingest Pipes Today

We invite you to experience the ease of use and power of Auto Ingest Pipes for yourself with S3 sources on Dremio Cloud. Whether you're a data engineer or a heroic business analyst, Auto Ingest Pipes provides the tools you need to manage your data ingestion efficiently and reliably.

Visit the documentation to learn more about Auto Ingest Pipes.

Thank you for choosing us as your trusted partner in data innovation. If you would like to provide feedback, please reach out at https://community.dremio.com/c/product-feedback. We look forward to seeing the incredible things you'll achieve with Auto Ingest Pipes.
