The open-source, high-performance Apache Iceberg table format has transformed data lake usage and data analytics for good, making traditional data warehouses less appealing, observes Jason Hughes of Dremio.
Amid ever-increasing volumes of data, it’s no secret that enterprises are struggling to get immediate value from that data, while they simultaneously attempt to put systems in place that can respond to its future uses. What’s on the horizon can be tough to predict. Data platforms must meet this twofold need, and core technology is driving their evolution to do so. Open-source Apache Iceberg, a high-performance format for analytic tables, is changing how businesses access data and put it to work, bringing fundamental flexibility to data analytics.
Iceberg brings full data warehouse performance to the data lake at a time when traditional data warehouses have become more of an albatross than a lifeboat for businesses seeking cost-effective analytics. It originated in Netflix engineering, where it allowed the company to treat Amazon S3 as its data warehouse, and it has long been a production-ready open-source project driving data analytics at companies such as Netflix, Adobe, Apple and many others. Its APIs have maintained compatibility throughout that time, and the 1.0 release late last year turned that compatibility into a guarantee, reinforcing Iceberg’s status for production-grade data warehousing and data science use cases. Iceberg has grown at a tremendous rate, with 1,559 pull requests merged in the last 12 months, and the software’s development via the Apache Software Foundation is currently supported by Amazon, Snowflake, Google, Tabular, and Dremio, among others.