Dremio Blog: Open Data Insights
Guide to Maintaining an Apache Iceberg Lakehouse
Maintaining an Apache Iceberg Lakehouse involves strategic optimization and vigilant governance across its core components: storage, data files, table formats, catalogs, and compute engines. Key tasks like partitioning, compaction, and clustering enhance performance, while regular maintenance such as expiring snapshots and removing orphan files manages storage growth and supports compliance. Effective catalog management, whether through open-source or managed solutions like Dremio's Enterprise Catalog, simplifies data organization and access. Security rests on Role-Based Access Control (RBAC) for broad protections and Fine-Grained Access Controls (FGAC) for more granular policies, with tools like Dremio enabling consistent enforcement across your data ecosystem. By following these practices, you can build a scalable, efficient, and secure Iceberg Lakehouse tailored to your organization's needs.
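To make those maintenance tasks concrete, here is a minimal PySpark sketch that calls Apache Iceberg's built-in Spark procedures for compaction, snapshot expiration, and orphan-file cleanup. The catalog name `lakehouse` and table `db.events` are assumptions for illustration, and the session is assumed to have the Iceberg runtime and SQL extensions configured.

```python
# Minimal sketch of routine Apache Iceberg maintenance from PySpark.
# Assumes a Spark session with the Iceberg runtime and SQL extensions,
# an Iceberg catalog named "lakehouse", and a table "db.events"
# (catalog and table names are hypothetical).
from datetime import datetime, timedelta

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact small files; rewrite_data_files is Iceberg's compaction procedure.
spark.sql("""
    CALL lakehouse.system.rewrite_data_files(
        table => 'db.events',
        strategy => 'binpack'
    )
""")

# Expire snapshots older than 7 days to reclaim storage and bound time travel.
cutoff = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
spark.sql(f"""
    CALL lakehouse.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '{cutoff}'
    )
""")

# Delete files that no snapshot references anymore.
spark.sql("CALL lakehouse.system.remove_orphan_files(table => 'db.events')")
```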
Apache XTable: Converting Between Apache Iceberg, Delta Lake, and Apache Hudi
Apache XTable offers a way to convert your existing data lakehouse tables to the format of your choice without having to rewrite all of your data. Combined with Dremio's robust Iceberg DML support, it provides an additional path for easily migrating to an Apache Iceberg data lakehouse, along with the catalog versioning benefits of the Dremio and Nessie catalogs.
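As a hedged sketch of what such a conversion can look like: XTable is typically driven by a small dataset config and a bundled utility jar, as below. The jar name, config schema, and table path are assumptions that vary by XTable release. Because XTable only writes new table-format metadata alongside the existing data files, nothing is rewritten.

```python
# Hedged sketch: exposing an existing Delta Lake table as Apache Iceberg
# with Apache XTable. The jar name and config schema are assumptions
# based on XTable's documentation and may differ by release.
import subprocess
import textwrap

config = textwrap.dedent("""\
    sourceFormat: DELTA
    targetFormats:
      - ICEBERG
    datasets:
      - tableBasePath: s3://my-bucket/warehouse/orders  # hypothetical path
        tableName: orders
""")

with open("xtable_config.yaml", "w") as f:
    f.write(config)

# XTable generates Iceberg metadata alongside the existing Parquet files,
# so the data itself is never copied or rewritten.
subprocess.run(
    ["java", "-jar", "xtable-utilities-bundled.jar",  # assumed jar name
     "--datasetConfig", "xtable_config.yaml"],
    check=True,
)
```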
Migration Guide for Apache Iceberg Lakehouses
Migrating to an Apache Iceberg Lakehouse enhances data infrastructure with cost-efficiency, ease of use, and business value, despite the inherent challenges. By adopting a data lakehouse architecture, you gain benefits like ACID guarantees, time travel, and schema evolution, with Apache Iceberg offering unique advantages. Selecting the right catalog and choosing between in-place or shadow migration approaches, supported by a blue/green strategy, ensures a smooth transition. Tools like Dremio simplify migration, providing a uniform interface between old and new systems, minimizing disruptions and easing change management. Leveraging Dremio's capabilities, such as CTAS and COPY INTO, alongside Apache XTable, ensures an optimized and seamless migration process, maintaining consistent user experience and robust data operations.
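For a feel of the two ingestion paths mentioned above, here is a minimal sketch of CTAS and COPY INTO as Dremio SQL. The `run_sql` helper is hypothetical (standing in for JDBC, Arrow Flight SQL, or the REST API), and all source, catalog, and path names are placeholders.

```python
# Hedged sketch of Dremio's CTAS and COPY INTO migration paths.
# run_sql is a hypothetical helper for submitting SQL to Dremio;
# all table, source, and path names below are placeholders.
def run_sql(query: str) -> None:
    ...  # submit `query` via your Dremio client of choice


# Shadow migration via CTAS: materialize an existing table as a new
# Apache Iceberg table in the target catalog.
run_sql("""
    CREATE TABLE lakehouse.sales.orders AS
    SELECT * FROM legacy_warehouse.sales.orders
""")

# Incremental loads via COPY INTO: append files from object storage
# into the Iceberg table.
run_sql("""
    COPY INTO lakehouse.sales.orders
    FROM '@s3_source/exports/orders/'
    FILE_FORMAT 'parquet'
""")
```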
Getting Hands-on with Snowflake Managed Polaris
In previous blogs, we've covered Polaris's architecture and getting hands-on with self-managed Polaris OSS; in this article, I hope to show you how to get hands-on with the Snowflake Managed version of Polaris, which is currently in public preview.
Getting Hands-on with Polaris OSS, Apache Iceberg and Apache Spark
A crucial component of an Iceberg lakehouse is the catalog, which tracks your tables, making them discoverable by various tools like Dremio, Snowflake, Apache Spark, and more. Recently, a new community-driven open-source catalog named Polaris has emerged at the forefront of open-source Iceberg catalog discussions.
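Because Polaris implements the Apache Iceberg REST catalog specification, connecting an engine is mostly a matter of catalog configuration. The PySpark sketch below is a minimal example under assumed settings: the endpoint, warehouse name, credential, and OAuth scope are placeholders that depend on your Polaris deployment.

```python
# Hedged sketch: pointing PySpark at a Polaris catalog via the Iceberg
# REST catalog protocol. Endpoint, warehouse, credential, and scope are
# placeholders; the Iceberg Spark runtime is assumed on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("polaris-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri",
            "http://localhost:8181/api/catalog")      # placeholder endpoint
    .config("spark.sql.catalog.polaris.warehouse", "my_catalog")
    .config("spark.sql.catalog.polaris.credential",
            "<client_id>:<client_secret>")            # placeholder credential
    .config("spark.sql.catalog.polaris.scope", "PRINCIPAL_ROLE:ALL")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.demo")
spark.sql("CREATE TABLE IF NOT EXISTS polaris.demo.t (id BIGINT) USING iceberg")
spark.sql("SHOW TABLES IN polaris.demo").show()
```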
Comparing Apache Iceberg to Other Data Lakehouse Solutions
Apache Iceberg is a powerful data lakehouse solution with advanced features, robust performance, and broad compatibility. It addresses many of the challenges associated with traditional data lakes, providing a more efficient and reliable way to manage large datasets.
Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?
While data lakes democratized data access, they also introduced challenges that hindered their usability compared to traditional systems. The advent of table formats like Apache Iceberg and catalogs like Nessie and Polaris has bridged this gap, enabling the data lakehouse architecture to combine the best of both worlds.
Unified Semantic Layer: A Modern Solution for Self-Service Analytics
The demand for flexible and fast data-driven decision-making is critical for modern business strategy. Semantic layers are designed to bridge the gap between complex data structures and business-friendly terminology, enabling self-service analytics. However, traditional approaches often struggle to meet performance and flexibility demands for today’s business insights. This is where a data lakehouse-powered semantic layer […]
How Apache Iceberg is Built for Open Optimized Performance
Apache Iceberg's open and extensible design empowers users to achieve optimized query performance while maintaining flexibility and compatibility with a wide range of tools and platforms. Iceberg is indispensable in modern data architectures, driving efficiency, scalability, and cost-effectiveness for data-driven organizations.
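One concrete example of that performance-oriented design is hidden partitioning: Iceberg partitions data with transforms on existing columns and prunes partitions for queries that filter on the raw column. Below is a minimal sketch; the catalog and table names are hypothetical, and an Iceberg-enabled Spark session is assumed.

```python
# Hedged illustration of Iceberg hidden partitioning in Spark SQL.
# Assumes an Iceberg-enabled Spark session; names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.db.events (
        id BIGINT,
        event_ts TIMESTAMP,
        payload STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))  -- a transform, not a physical column
""")

# Queries filter on the raw column; Iceberg prunes day partitions
# automatically, so readers never need to know the partition scheme.
spark.sql("""
    SELECT count(*) FROM lakehouse.db.events
    WHERE event_ts >= TIMESTAMP '2024-06-01 00:00:00'
""").show()
```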
What is Data Virtualization? What makes an Ideal Data Virtualization Platform?
Dremio's approach removes primary roadblocks to virtualization at scale while maintaining all the governance, agility, and integration benefits.
The Nessie Ecosystem and the Reach of Git for Data for Apache Iceberg
The recent adoption of the Apache Iceberg REST catalog specification by Nessie not only broadens its accessibility and usability across different programming environments but also cements its position as a cornerstone in the data architecture landscape.
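For readers new to the "Git for data" idea, the sketch below shows the branch-and-merge workflow from Spark SQL using Nessie's SQL extensions. It assumes a session already configured with those extensions and a catalog named `nessie`; the branch and table names are illustrative.

```python
# Hedged sketch of Nessie's Git-for-data workflow via its Spark SQL
# extensions. Assumes a Spark session configured with the Nessie
# extensions and a catalog named "nessie"; names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Work on an isolated branch, exactly like a feature branch in Git.
spark.sql("CREATE BRANCH IF NOT EXISTS etl IN nessie FROM main")
spark.sql("USE REFERENCE etl IN nessie")

spark.sql("""
    CREATE TABLE IF NOT EXISTS nessie.db.orders (id BIGINT, status STRING)
    USING iceberg
""")
spark.sql("INSERT INTO nessie.db.orders VALUES (1, 'pending')")

# Changes stay invisible to readers on main until the merge, which
# publishes everything on the branch atomically.
spark.sql("MERGE BRANCH etl INTO main IN nessie")
```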
The Evolution of Apache Iceberg Catalogs
Central to the functionality of Apache Iceberg tables is their catalog mechanism, which plays a crucial role in the evolution of how these tables are used and their features are developed. In this article, we will take a deep dive into the topic of Apache Iceberg catalogs.
Ingesting Data into Nessie & Apache Iceberg with kafka-connect and querying it with Dremio
This exercise hopefully illustrates that setting up a data pipeline from Kafka to Iceberg and then analyzing that data with Dremio is feasible, straightforward, and highly effective. It showcases how these tools can work in concert to streamline data workflows, reduce the complexity of data systems, and deliver actionable insights directly into the hands of users through reports and dashboards.
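As a rough sketch of the pipeline's ingestion half, the snippet below registers an Iceberg sink with the Kafka Connect REST API, pointed at a Nessie catalog. The connector class and `iceberg.*` property names follow the Iceberg Kafka Connect sink's documentation and vary by version; all endpoints, topics, and warehouse paths are placeholders.

```python
# Hedged sketch: registering an Apache Iceberg sink connector with the
# Kafka Connect REST API, writing to a table tracked by Nessie. The
# connector class and property names are assumptions that vary by
# sink version; endpoints and paths are placeholders.
import json

import requests

connector = {
    "name": "iceberg-sink",
    "config": {
        "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
        "topics": "events",
        "iceberg.tables": "db.events",
        "iceberg.catalog.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        "iceberg.catalog.uri": "http://nessie:19120/api/v1",
        "iceberg.catalog.ref": "main",
        "iceberg.catalog.warehouse": "s3a://warehouse/",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",  # Kafka Connect REST endpoint
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```

Once the connector is running, records land as Iceberg commits on the Nessie `main` branch, where Dremio can query them immediately.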
How Apache Iceberg, Dremio and Lakehouse Architecture can optimize your Cloud Data Platform Costs
By leveraging a lakehouse architecture, organizations can achieve significant savings on storage and compute costs, streamline transformations with virtual modeling, and enhance data accessibility for analysts and scientists.
Dremio’s Commitment to being the Ideal Platform for Apache Iceberg Data Lakehouses
Dremio's unwavering commitment to Apache Iceberg is not merely a strategic choice but a reflection of our vision to create an open, flexible, and high-performing data ecosystem. Our deep integration with Apache Iceberg throughout the entire stack complements Dremio's extensive functionality, empowering users to document, organize, and govern their data across diverse sources, including data lakes, data warehouses, relational databases, and NoSQL databases. This synergy forms the bedrock of our open platform philosophy, facilitating seamless data accessibility and distribution across the organization.