Hiveberg: Integrating Apache Iceberg with the Hive Metastore

Apache Iceberg is an open table format designed for huge (petabyte-scale) datasets. This talk will give an overview of Iceberg and its many attractive features, such as time travel, improved performance, snapshot isolation, schema evolution and partition spec evolution. We’ll then discuss how Iceberg can be used inside an organisation such as Expedia Group to power next-generation data lake technology.

An organisation may already have a significant investment in existing technologies. In our case, that is Hive and, specifically, the Hive Metastore. For such an organisation, one of the challenges of moving to a new table format is preventing data silos from forming, where data written in the new format can’t be used by those who haven’t switched to it yet. We’ll discuss the solution we came up with, Hiveberg, which opens up a path to read Iceberg tables from Hive (and thus from any tooling that supports Hive). This lets more advanced users take advantage of Iceberg’s features when creating data, while still allowing that data to be widely read and used by others.

Topics Covered

Hive Metastore
Metastores
Table Formats
Unlocking Potential with Apache Iceberg