Dremio Blog: Open Data Insights
Governance in the Era of the Data Lakehouse
By leveraging modern tools like dbt, Great Expectations, and Dremio, organizations can implement robust governance frameworks that ensure data is accurate, secure, and accessible. These tools empower teams to enforce quality checks, manage sensitive data in compliance with regulations, secure decentralized data at multiple layers, and provide a centralized semantic layer for consistent access. At the heart of governance is transparency and trust, achieved through data lineage, metadata management, and accountability, enabling stakeholders to confidently rely on their data.
Adopting a Hybrid Lakehouse Strategy
A hybrid lakehouse strategy offers the best of both worlds—leveraging the scalability of the cloud and the control of on-premises infrastructures. By addressing the limitations of cloud-only solutions, hybrid lakehouses enable organizations to optimize costs, enhance performance, and ensure robust governance.
Understanding Dremio’s Architecture: A Game-Changing Approach to Data Lakes and Self-Service Analytics
Modern organizations face a common challenge: efficiently analyzing massive datasets stored in data lakes while maintaining performance, cost-effectiveness, and ease of use. The Dremio Architecture Guide provides a comprehensive look at how Dremio's innovative approach solves these challenges through its unified lakehouse platform. Let's explore the key architectural components that make Dremio a transformative solution for modern data analytics.
Maximizing Value: Lowering TCO and Accelerating Time to Insight with a Hybrid Iceberg Lakehouse
For enterprises seeking a smarter approach to data management, the Dremio Hybrid Iceberg Lakehouse provides the tools and architecture needed to succeed—offering both cost savings and faster time to insight in today’s rapidly changing business landscape.
Hands-on with Apache Iceberg Tables using PyIceberg with Nessie and Minio
By following this guide, you now have a local setup that allows you to experiment with Iceberg tables in a flexible and scalable way. Whether you're looking to build a data lakehouse, manage large analytics datasets, or explore the inner workings of Iceberg, this environment provides a solid foundation for further experimentation.
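As a quick taste of the setup this post walks through, here is a minimal PyIceberg configuration sketch. The endpoint URLs and credentials below are placeholder defaults for a local Nessie and MinIO deployment, not values from the post—adjust them to your own environment:

```python
# Sketch: pointing PyIceberg at a local Nessie REST catalog backed by MinIO.
# All endpoints and credentials are illustrative local defaults (assumptions),
# typical of a docker-compose setup with Nessie on 19120 and MinIO on 9000.
catalog_props = {
    "uri": "http://localhost:19120/iceberg/main",  # Nessie's Iceberg REST endpoint, 'main' branch
    "s3.endpoint": "http://localhost:9000",        # MinIO S3 API
    "s3.access-key-id": "admin",                   # placeholder credentials
    "s3.secret-access-key": "password",
}

# With Nessie and MinIO running, the catalog can then be loaded like this:
# from pyiceberg.catalog.rest import RestCatalog
# catalog = RestCatalog("nessie", **catalog_props)
# catalog.create_namespace("demo")
print(sorted(catalog_props))
```

Because the branch name is part of the REST URI, switching the same client to an experimental Nessie branch is just a matter of changing `main` to that branch's name.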
The Importance of Versioning in Modern Data Platforms: Catalog Versioning with Nessie vs. Code Versioning with dbt
Catalog versioning with Nessie and code versioning with dbt both serve distinct but complementary purposes. While catalog versioning ensures the integrity and traceability of your data, code versioning ensures the collaborative, flexible development of the SQL code that transforms your data into actionable insights. Using both techniques in tandem provides a robust framework for managing data operations and handling inevitable changes in your data landscape.
Introduction to Apache Polaris (incubating) Data Catalog
Incorporating the Polaris Data Catalog into your Data Lakehouse architecture offers a powerful way to enhance data management, improve performance, and streamline data governance. The combination of Polaris's robust metadata management and Iceberg's scalable, efficient table format makes it an ideal solution for organizations looking to optimize their data lakehouse environments.
Hybrid Data Lakehouse: Benefits and Architecture Overview
The hybrid data lakehouse represents a significant evolution in data architecture. It combines the strengths of cloud and on-premises environments to deliver a versatile, scalable, and efficient solution for modern data management. Throughout this article, we've explored the key features, benefits, and best practices for implementing a hybrid data lakehouse, highlighting Dremio's role as a central component of this architecture.
A Guide to Change Data Capture (CDC) with Apache Iceberg
We'll see that because of Iceberg's metadata, we can efficiently derive table changes, and thanks to its transaction and tool support, we can process those changes effectively. There are, however, several different CDC scenarios, so let's cover each of them.
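To make the idea of deriving table changes concrete, here is a small, engine-agnostic Python sketch. It is not Iceberg's actual API—just an illustration of the underlying comparison an engine can perform against snapshot metadata, producing the three kinds of changes a CDC pipeline has to handle:

```python
# Toy illustration of CDC semantics: given two snapshots of a table keyed by
# primary key, derive the row-level changes between them. This mimics what an
# engine can compute from Iceberg's snapshot metadata; it is NOT the Iceberg API.

def derive_changes(old: dict, new: dict) -> dict:
    """Each snapshot maps primary key -> row. Returns inserts/updates/deletes."""
    inserts = {k: v for k, v in new.items() if k not in old}
    deletes = {k: v for k, v in old.items() if k not in new}
    updates = {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"inserts": inserts, "updates": updates, "deletes": deletes}

snapshot_1 = {1: "alice", 2: "bob", 3: "carol"}
snapshot_2 = {2: "bob", 3: "carlotta", 4: "dave"}

changes = derive_changes(snapshot_1, snapshot_2)
print(changes)  # inserts: {4: 'dave'}, updates: {3: 'carlotta'}, deletes: {1: 'alice'}
```

The append-only scenario is the degenerate case where only `inserts` is ever non-empty; upsert and delete workloads are what make the full three-way diff necessary.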
Using Nessie’s REST Catalog Support for Working with Apache Iceberg Tables
With the introduction of the REST catalog, managing and interacting with Apache Iceberg catalogs has been greatly simplified. This shift from client-side configuration to server-side management offers many benefits, including better security, easier maintenance, and improved scalability.
How Dremio brings together Data Unification and Decentralization for Ease-of-Use and Performance in Analytics
By embracing both data unification and decentralization, organizations can achieve a harmonious balance that leverages the strengths of each approach. Centralized access ensures consistency, security, and ease of governance, while decentralized management allows for agility, domain-specific optimization, and innovation.
Leveraging Apache Iceberg Metadata Tables in Dremio for Effective Data Lakehouse Auditing
We'll delve into how querying Iceberg metadata tables in Dremio can provide invaluable insights for table auditing, ensuring data integrity and facilitating compliance.
Unifying Data Sources with Dremio to Power a Streamlit App
By leveraging Dremio's unified analytics capabilities and Streamlit's simplicity in app development, we can overcome the challenges of data unification.
Why Thinking about Apache Iceberg Catalogs Like Nessie and Apache Polaris (incubating) Matters
Iceberg catalogs are essential in the Iceberg lakehouse ecosystem, enabling core features such as table portability, concurrency control, governance, and versioning. As data lakehouse adoption grows, solutions like Nessie and Apache Polaris (incubating) provide the necessary tools to streamline data management across diverse environments. With innovations like catalog versioning and centralized governance, these catalogs ensure consistency and reliability, and empower organizations to manage their data more efficiently.
The Iceberg Lakehouse: Key Benefits for Your Business
Choosing an Iceberg Lakehouse for your business means investing in a data architecture that meets your current needs, scales and evolves with your organization, and delivers significant cost savings and enhanced analytics capabilities. As you consider the next steps for your data strategy, the Iceberg Lakehouse offers a compelling, forward-looking solution that will drive your business's success in the data-driven future.