Hiveberg: Integrating Apache Iceberg with the Hive Metastore

Apache Iceberg is an open table format for huge (petabyte-scale) datasets. This talk gives an overview of Iceberg and its most attractive features: time travel, improved performance, snapshot isolation, schema evolution and partition spec evolution. We’ll then discuss how an organisation such as Expedia Group can use Iceberg to power next-generation data lake technology. For an organisation that already has a significant investment in existing technologies (in our case Hive and, specifically, the Hive metastore), one challenge of moving to a new table format is preventing data silos from forming, where data written in the new format can’t be used by those who haven’t switched to it yet. We’ll present our solution, Hiveberg, which opens up a path to read Iceberg tables from Hive (and thus from any tooling that supports Hive). This lets more advanced users take advantage of Iceberg’s features when creating data, while keeping that data widely readable and usable by everyone else.

Topics Covered

Hive Metastore
Metastores
Table Formats
Unlocking Potential with Apache Iceberg