Deep Dive into Iceberg SQL Extensions
Apache Iceberg is an open table format that allows data engineers and data scientists to build reliable and efficient data lakes with features that are normally available only in data warehouses. The project allows companies to substantially simplify their current data lake use cases and to unlock fundamentally new ones. This talk will focus on the Iceberg SQL extensions, a recent addition from the Iceberg community for managing tables efficiently through SQL. In particular, this session will cover how to snapshot or migrate an existing Hive or Spark table, how to perform table maintenance, and how to optimize metadata and data to fully benefit from Iceberg’s rich feature set. In addition, the presentation will cover common pitfalls of running and managing Iceberg tables with tens of millions of files in production, and how SQL extensions can address them.
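As a rough illustration of the operations mentioned above, the sketch below builds the SQL statements for Iceberg's Spark stored procedures (`snapshot`, `migrate`, `expire_snapshots`, `remove_orphan_files`, `rewrite_manifests`). It assumes a Spark session configured with `spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`. The helper function names, the table names, and the chosen argument values are hypothetical examples, not part of the talk; exact procedure arguments may differ across Iceberg versions.

```python
from typing import List

# Spark setting that enables the Iceberg SQL extensions (CALL syntax and procedures).
ICEBERG_SPARK_CONF = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
}


def migration_statements(catalog: str, source_table: str, snapshot_table: str) -> List[str]:
    """Hypothetical helper: statements to trial, then perform, a table migration."""
    return [
        # Create an Iceberg table that reads the source table's existing data files
        # without modifying the source table (useful for testing before migrating).
        f"CALL {catalog}.system.snapshot('{source_table}', '{snapshot_table}')",
        # Replace the source Hive/Spark table with an Iceberg table in place.
        f"CALL {catalog}.system.migrate('{source_table}')",
    ]


def maintenance_statements(catalog: str, table: str, older_than: str, retain_last: int) -> List[str]:
    """Hypothetical helper: routine maintenance statements for an Iceberg table."""
    return [
        # Drop snapshots older than the cutoff while keeping at least `retain_last` snapshots.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{older_than}', retain_last => {retain_last})",
        # Delete files in the table location that table metadata no longer references
        # (by default only files older than a safety interval are considered).
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        # Rewrite manifests so that query planning reads fewer, better-organized metadata files.
        f"CALL {catalog}.system.rewrite_manifests('{table}')",
    ]


if __name__ == "__main__":
    for stmt in migration_statements("spark_catalog", "db.events", "db.events_snap"):
        print(stmt)
    for stmt in maintenance_statements("spark_catalog", "db.events", "2021-06-30 00:00:00", 10):
        print(stmt)
```

In a real session, each string would be passed to `spark.sql(stmt)`. The helpers interpolate names directly into SQL, so they assume trusted catalog and table identifiers; they only assemble statements and do not validate them against a live catalog.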
Topics Covered