7 minute read · December 15, 2023
Dremio’s Top 5 Data and Analytics Predictions for 2024
· CMO, Dremio
Could you have predicted a year ago that 2023 would be the year that we ushered in the era of generative AI? In this AI era, the data lakehouse isn’t just a player – it’s a game-changer. While many observers may have been surprised by the sudden rise of generative AI, at Dremio, we introduced generative AI capabilities for seamless data analysis this year.
As we approach the new year, many other topics are also top of mind in the world of data and analytics. Business leaders are driving data mesh initiatives, and Apache Iceberg is seeing rapid growth in the community. While the explosion of generative AI may be dominating the news cycles and social media, we have to wonder whether it will have the same massive impact in 2024 and what other trends will surface.
Data Lakehouses will become the primary architecture for analytical workloads
Going into 2023 we predicted that we’d begin to see the decline of the data warehouse and the rise of the data lakehouse. While data warehouses are still being used, we certainly have seen an increase in data warehouse workloads moving to open data lakehouses. We recently released a State of the Data Lakehouse Survey Report that found 69% of respondents said more than half of all their analytics will be on the lakehouse within three years, with 42% of lakehouse data coming from cloud data warehouses. This future adoption is not surprising, given that lakehouses have the potential to significantly cut costs, reduce mean-time-to-insight, and democratize data access. Our prediction is that within the next year, we’ll find that data lakehouses will have surpassed data warehouses as the primary platform for analytical workloads.
Apache Iceberg will become the most adopted table format, surpassing Delta Lake
The lakehouse isn’t the only thing we expect to see soaring adoption in 2024: we predict that Apache Iceberg will become the predominant table format. Iceberg is going to be the open format for interoperability, customer choice, and flexibility. According to our recent survey report, among those intending to adopt a table format in the next three years, more are choosing Iceberg than any other table format. For us, it comes back to the customer. Customers want to own their own data, Apache Iceberg is the format for that, and we believe it is going to take over as the open standard in 2024.
DataOps will move from hype to production with implementation of CI/CD, git-inspired data version control, and automated data quality checks
We also believe DataOps will emerge as a major trend: development best practices traditionally associated with coding will make their way into data. The goal of this movement is to create automated data pipelines that provide clean, managed data products to the business. These automated pipelines need to be easy to create and manage, so data engineers can spend 80% of their time on projects that drive the business forward rather than on repetitive manual work and menial tasks. In our survey of 500 data professionals and leaders, 62% of data engineers named manually merging and reconciling data from multiple sources, repetitive manual processes, and cleaning up raw data as their least favorite work.
A pivotal aspect contributing to the DataOps movement is our git-inspired data version control. In this approach, you treat data with the same version control rigor as code. As you bring in data, you can create branches to inspect and ensure its quality before seamlessly merging it back into the main set, your “data product” for the business. This not only facilitates a structured and controlled data workflow but also empowers you to pinpoint and remedy issues in the process if something goes wrong, ultimately saving valuable time.
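To make the branch, check, and merge flow concrete, here is a minimal Python sketch. It is a toy in-memory model and not Dremio's API: `ToyCatalog`, `passes_quality_checks`, and `ingest` are hypothetical names made up for illustration, and the two checks (non-null and unique `id` values) are assumed examples of what a team might validate.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Toy in-memory stand-in for a versioned data catalog (hypothetical, not a real API)."""

    branches: dict = field(default_factory=lambda: {"main": []})

    def create_branch(self, name, source="main"):
        # A new branch starts as a copy of the source branch's rows.
        self.branches[name] = list(self.branches[source])

    def append(self, branch, rows):
        # New data lands on the given branch only.
        self.branches[branch].extend(rows)

    def merge(self, branch, target="main"):
        # Simplified merge: the target takes on the branch's state,
        # and the branch is dropped.
        self.branches[target] = self.branches.pop(branch)


def passes_quality_checks(rows):
    """Assumed example checks: every row has a non-null id, and ids are unique."""
    ids = [row.get("id") for row in rows]
    return all(i is not None for i in ids) and len(ids) == len(set(ids))


def ingest(catalog, rows, branch="ingest"):
    """Load rows on an isolated branch and merge into main only if checks pass."""
    catalog.create_branch(branch)
    catalog.append(branch, rows)
    if passes_quality_checks(catalog.branches[branch]):
        catalog.merge(branch)
        return True
    del catalog.branches[branch]  # discard the bad batch; main is untouched
    return False


catalog = ToyCatalog()
good_batch_merged = ingest(catalog, [{"id": 1}, {"id": 2}])
bad_batch_merged = ingest(catalog, [{"id": 2}, {"id": None}])
main_rows = catalog.branches["main"]
```

In this sketch the second batch fails the checks on its branch, so it is discarded and the main “data product” keeps only the rows from the first batch.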
Implementation of Data Mesh pillars will become a core requirement for data teams to spur data adoption and improve data quality
In late 2023 a debate was sparked about the viability of data mesh, spurred by Gartner’s Hype Cycle for Data Management. Based on our survey of over 500 data leaders, data mesh is alive and well. Almost every one of the 500 organizations surveyed (97%) expected data mesh implementation to continue to expand in the next year. Based on the data we have, this prediction is sort of a freebie. At Dremio we are continually helping customers implement data mesh principles with our intelligent query engine, unified semantic layer with data federation, fine-grained access controls for governance, and git-inspired data version control to create and manage data products.
Generative AI will be used by data engineers on nearly every one of their projects, improving productivity by one-third
Some would argue that the buzz around generative AI is all hype, but it’s nearly impossible to discuss our predictions for 2024 without examining the impact it’s had on the industry. Dremio CEO Sendur Sellakumar says that generative AI will be the future of user interfaces and that all applications will embed generative AI as a way to drive user interaction. As Sellakumar puts it, “Companies are embedding generative AI to solve some of those old data problems, such as semantic searching, data discovery and creating pipelines.”
The foundational aspect of data self-service enabled by lakehouses is essential for AI development. Throughout 2024 we’ll see the continued adoption of data lakehouses contribute not only to AI but also to the transformation of data teams across enterprise organizations. We’ll see self-service data products getting data into the hands of users as simply as possible, contributing to greater data democratization in 2024 and beyond.
Want to take a deeper dive into our State of the Data Lakehouse Survey Report?
- Download the full report here.
- View our on-demand webinar with TDWI here.
- Watch our on-demand Dremio LIVE event on Trends and Predictions here.
Interested in getting started with Dremio for free? Visit our Getting Started page here.