7 minute read · February 3, 2022

Apache Iceberg Becomes Industry Open Standard with Ecosystem Adoption

Mark Lyons · Vice President of Product Management, Dremio

Signs of Apache Iceberg Growth and Adoption

Top 10 Apache Iceberg Contributors and Influencers

How to Get Involved and Learn More about Apache Iceberg

Cloud data lakes are now the go-to architecture for data storage and analytics across organizations of all types and sizes because cloud storage is scalable, easy to use and inexpensive. Digital experiences are ubiquitous, and every company needs to make its data accessible to unlock innovation and counter competitive threats.

Organizations are now able to use cloud data lakes for workloads that traditionally went to data warehouses (such as BI and analytics). This shift is possible in part because of Apache Iceberg, an open source table format that provides many of the same features and capabilities found in traditional databases and data warehouses, but within an open, flexible data lake environment.

Apache Iceberg continues to gain mindshare in the data ecosystem because it is a well-documented, engine-agnostic open standard. While Apache Parquet is the de facto standard file format for storing rows and columns of data, we need the next layer of abstraction, a table, to track the files so that each query reads only the minimum data necessary. In addition to a better SQL-based user experience, Apache Iceberg tables also provide atomic transactions, data consistency guarantees, time travel and versioning.
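To make the idea of "a table that tracks files" more concrete, here is a toy Python sketch. It is a simplified illustration under stated assumptions, not Iceberg's actual API or metadata layout, and every name in it (DataFile, ToyTable, commit, plan_scan) is hypothetical. It models three ideas from the paragraph above: the table keeps per-file column bounds so a query can skip files that cannot match (Iceberg's manifests record per-column lower and upper bounds for this kind of pruning), each commit produces a new immutable snapshot and is rejected if another writer committed first, and older snapshots stay readable for time travel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical: one data file (e.g. Parquet) plus per-column min/max stats."""
    path: str
    lower: Dict[str, int]  # column -> minimum value in this file
    upper: Dict[str, int]  # column -> maximum value in this file


@dataclass
class ToyTable:
    """Hypothetical toy table: an ordered list of immutable snapshots (file lists)."""
    snapshots: List[Tuple[DataFile, ...]] = field(default_factory=lambda: [()])

    @property
    def current_id(self) -> int:
        return len(self.snapshots) - 1

    def commit(self, expected_id: int, added: List[DataFile]) -> int:
        # Optimistic concurrency: succeed only if no other writer has produced
        # a newer snapshot since this writer read snapshot `expected_id`.
        if expected_id != self.current_id:
            raise RuntimeError("concurrent commit detected; retry")
        # The new snapshot is built whole, then appended in one step.
        self.snapshots.append(self.snapshots[expected_id] + tuple(added))
        return self.current_id

    def plan_scan(self, column: str, lo: int, hi: int,
                  snapshot_id: Optional[int] = None) -> List[str]:
        # Time travel: read any earlier snapshot by its id.
        sid = self.current_id if snapshot_id is None else snapshot_id
        # Pruning: skip files whose [min, max] range cannot overlap [lo, hi].
        return [f.path for f in self.snapshots[sid]
                if f.upper[column] >= lo and f.lower[column] <= hi]


t = ToyTable()
s1 = t.commit(0, [DataFile("a.parquet", {"id": 1}, {"id": 100}),
                  DataFile("b.parquet", {"id": 101}, {"id": 200})])
s2 = t.commit(s1, [DataFile("c.parquet", {"id": 201}, {"id": 300})])
print(t.plan_scan("id", 150, 250))                  # ['b.parquet', 'c.parquet']
print(t.plan_scan("id", 150, 250, snapshot_id=s1))  # ['b.parquet']
```

Because a snapshot is only ever added whole, a reader sees either the old file list or the new one, never a half-finished write. Real Iceberg gets the same effect by atomically swapping the table's pointer to its current metadata file in a catalog, rather than by appending to an in-memory list as this sketch does.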

Signs of Apache Iceberg Growth and Adoption

In May 2020, Apache Iceberg graduated from incubation to become a top-level Apache Software Foundation project.

A project like this requires broad ecosystem adoption to become an industry standard. Let’s look at what has happened in the Iceberg ecosystem over the past few months that makes us at Dremio very bullish on its future:

At re:Invent, AWS announced Athena support for Apache Iceberg.
More recently, AWS announced EMR support for Apache Iceberg.
Adobe Experience Cloud adopted Apache Iceberg.
Ryan Blue, the co-creator of Apache Iceberg at Netflix, started Tabular and raised Series A funding.
Snowflake announced support for querying Apache Iceberg tables as external tables.
It is easy to create a data lake based on Apache Iceberg:
- Getting started with Apache Iceberg using AWS Glue and Dremio
- Creating S3 data lake with Apache Iceberg

Over the past three years, code additions to the Apache Iceberg project have increased, and given the recent ecosystem announcements there are no signs of this slowing down.

Source: https://github.com/apache/iceberg/graphs/code-frequency

Top 10 Apache Iceberg Contributors and Influencers

There is a vibrant community of Apache Iceberg contributors and thought leaders that helps drive growth and continued innovation. Based on GitHub, LinkedIn, and our own research, here is a top 10 list of people we think are worth following on the topic of Apache Iceberg.

Ryan Blue - Tabular.io, previously Netflix
Anton Okolnychyi - Apple
Kyle Bendickson - Tabular.io, previously Apple
Jack Ye - AWS Athena
Openinx - Alibaba
Russell Spitzer - Apple
Eduard Tudenhöfner - Dremio
Junjie Chen - Tencent
Fokko Driesprong - Datafold
Jun-he - Netflix

How to Get Involved and Learn More about Apache Iceberg

To learn more about Apache Iceberg, check out these resources:

Register for Subsurface LIVE Winter 2022 to hear more from Ryan Blue, the co-creator of Apache Iceberg, as well as from other companies contributing to the project, including Uber and Apple.

We have some exciting Iceberg sessions on the agenda, including:
