May 2, 2024

The Rise of Generative AI in Analytics: How You Can Put AI to Work

Generative AI is top of mind for organizations in every industry. Despite the hype, there are real-world applications for GenAI in data management and analytics. Join us to learn about the transformative role of generative AI in data analytics. We will explore the applications and potential of GenAI in analytics and discuss how to apply AI in your analytics workflows. We will also review real-world GenAI use cases and share insights into how you can leverage Dremio’s advanced Generative AI capabilities.

Topics Covered

AI & Data Science

Sign up to watch all Subsurface 2024 sessions

Speaker

Isha Sharma

Director of Product Management @ Dremio

Video Synopsis

What is Generative AI?
Finding Relevant Data
Creating a Data Product
Catalog The View
Dremio Delivers Autonomous AI-Powered Semantic Layer

Transcript

Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Isha Sharma:

We’re going to talk about generative AI because why not? Let’s start with something fun. How many folks in the room can relate to this cartoon? Come on, y’all are lying. Raise your hands. Okay. Exactly. Right? With the world of generative AI literally changing every couple of days, everybody’s a bit confused, right? Is this a buzzword? Is this something we invest in? How much do we invest in it? I’m here to tell you I don’t think it’s a buzzword, and I don’t think it’s just hype. In this session, we’ll briefly cover what generative AI is, to level set. Then, more importantly, we’ll look at how it fits into the data lifecycle and how Dremio uses generative AI to enhance and power some of the workflows you run into every day. And most importantly, why does it matter to you and your users?

What is Generative AI?

So what is generative AI? I know you’ve seen these words: natural language, artificial intelligence. I think most of you know those. Then we add another few dozen terms that aren’t on this slide, and it gets confusing, right? I’m not going to read the dictionary definition of gen AI, but here it is. The important part is using data to generate more. You’re training your models on material you already have. That’s not the end-all, be-all, but why does it matter? Yesterday and today, we’ve had customers talking about having millions of documents to process. Somebody has to read them, whether monthly or annually, and that’s tedious. Millions of documents is a lot, right? Or when there’s an acquisition and you inherit all of that company’s data and documents, what do you do with it other than process much of it manually? It becomes tedious, and what’s great about generative AI is that it takes away some of that tedium.

With regard to the data lifecycle, there are many variations of this diagram, but we’ll focus on the four stages we’ve got here. Discover means finding and interpreting your data to understand what you’ve got and what you’ll use to answer your question. Prepare can mean many things, but mostly cleaning and wrangling. Then you build your data products, as we talked about this morning. And then you operationalize; for me, that mostly means cataloging my data. Dremio currently plays in the top line here, highlighted with a black box. We’ll go into how we play in each of these stages and how our UX is further enhanced by LLMs and some of our generative AI capabilities. Cool.

Finding Relevant Data

So let’s start with discovering data. Traditionally, finding data meant clicking through what you know, through the tree we’ve got in the UI. It also meant using catalog search, where you needed to know your keywords. As you’ll probably hear repeatedly over the next day and a half, your analysts think in terms of columns rather than table names. A table could be called sales, but am I really going to search for that when I’m asking a question? Probably not. Semantic search adds, well, semantics: from your wikis, from your labels, from the embeddings that are there. With search powered by that metadata, somebody can ask a question in natural language, and we can return datasets that are relevant not just because the name matched, but because what’s in the dataset matches what they’re looking for. In this example, which is probably hard to see virtually or from the back of the room, somebody is searching for campaigns. The results they get aren’t there because a table is called campaigns; they’re there because something in the wiki or in the columns relates to campaigns. So from a discovery standpoint, it’s all about relevance, and that’s what generative AI helps power.
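To make the relevance idea concrete, here is a minimal sketch in Python of search that ranks datasets by how well their metadata (name, wiki text, and column names) matches a query, rather than by the name alone. It is an illustration under stated assumptions, not Dremio’s implementation: the catalog entries are hypothetical, and a simple bag-of-words vector with cosine similarity stands in for the learned embedding model a real semantic search would use.

```python
import math
import re
from collections import Counter

# Hypothetical catalog: each dataset has wiki text and column names that
# search can draw on. This is not a real Dremio API or dataset.
CATALOG = {
    "sales.orders": {
        "wiki": "One row per customer order with totals and order dates.",
        "columns": ["order_id", "customer_id", "order_total", "order_date"],
    },
    "mktg.events": {
        "wiki": "Email and social campaigns with spend and response metrics.",
        "columns": ["campaign_id", "channel", "spend", "responses"],
    },
    "hr.staff": {
        "wiki": "Employee roster and department assignments.",
        "columns": ["employee_id", "department", "start_date"],
    },
}

def tokenize(text: str) -> list[str]:
    # Lowercase, split on non-letters, and crudely strip a plural "s"
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a sparse bag-of-words vector
    return Counter(tokenize(text))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def metadata_text(name: str, entry: dict) -> str:
    # Index the name, the wiki, and the column names together
    return " ".join([name, entry["wiki"], " ".join(entry["columns"])])

def search(query: str, top_k: int = 2) -> list[tuple[str, float]]:
    q = embed(query)
    scored = [(name, cosine(q, embed(metadata_text(name, e))))
              for name, e in CATALOG.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Drop datasets with no overlap at all
    return [pair for pair in scored[:top_k] if pair[1] > 0]

print(search("marketing campaigns"))
```

Here `mktg.events` comes back for “marketing campaigns” even though neither word appears in its name, because its wiki text and `campaign_id` column match. A learned embedding would also catch synonyms that share no words with the query, which this toy vector cannot.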

Furthermore, Iceberg can already store embeddings, so you can almost use Iceberg as a vector database today. For us, semantic search will take those embeddings and enrich your search experience even more. Not only will we generate embeddings for you, but you’ll also be able to provide specifics to us. That comes in handy because everybody has acronyms that are specific to their industry or company, and no AI model is going to know them unless you tell it. With embeddings in Iceberg, you can say this acronym means this, so if I search for a term that relates to the full form of that acronym, I can now return that dataset to you. Bringing Iceberg embeddings together with semantic search gives you, again, relevant data, and this is coming soon in Dremio. The point of all of this is that you spend less time looking for relevant data. A discovery period that could go on for hours can now be trimmed down to, hopefully, seconds, maybe minutes.
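The acronym problem can be sketched in a few lines too. The example below assumes a hypothetical user-supplied glossary and expands any acronym found in a query or in dataset metadata into its full form before matching. This is plain query expansion, a simpler technique swapped in for illustration; it does not use Iceberg-stored embeddings, and the dataset names are invented.

```python
import re

# Hypothetical glossary supplied by the user: acronyms a generic model won't know.
GLOSSARY = {
    "arr": "annual recurring revenue",
    "nps": "net promoter score",
}

# Hypothetical dataset metadata that only ever uses the acronyms.
DATASETS = {
    "fin.arr_by_region": "arr region fiscal_quarter",
    "cx.survey_results": "nps respondent_id survey_date",
}

def tokens(text: str) -> set[str]:
    # Lowercase words; punctuation and underscores act as separators
    return set(re.findall(r"[a-z]+", text.lower()))

def expand(text: str) -> set[str]:
    # Add the words of each acronym's full form when the acronym appears
    words = tokens(text)
    for acronym, full_form in GLOSSARY.items():
        if acronym in words:
            words |= tokens(full_form)
    return words

def search(query: str, use_glossary: bool = True) -> list[str]:
    prepare = expand if use_glossary else tokens
    query_words = prepare(query)
    hits = []
    for name, metadata in DATASETS.items():
        overlap = len(query_words & prepare(f"{name} {metadata}"))
        if overlap:
            hits.append((overlap, name))
    # Datasets sharing the most words with the query come first
    return [name for _, name in sorted(hits, reverse=True)]

print(search("annual recurring revenue"))
print(search("annual recurring revenue", use_glossary=False))
```

With the glossary, a search for “annual recurring revenue” finds `fin.arr_by_region`; with `use_glossary=False`, it finds nothing, because the metadata only ever says `arr`.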

Creating a Data Product

All right, let’s move on to the next step. You found your data; now what? Let’s create a data product. As an analyst, maybe SQL is not your strongest language. Now you don’t need it. With natural language to SQL, all you have to know is your question; you’re thinking in terms of columns, like we said. And text-to-SQL goes beyond a simple SELECT * FROM. It could produce a JOIN across multiple tables. I actually did this for a couple of things I was working on. I said, hey, can you create a view for me that has this type of information? It looked across a couple of datasets, figured out what the join key was, and created the SQL for that view. All I had to do was hit run. So, again, the point is that you don’t have to be a SQL expert, and the UX becomes really easy: you talk to Dremio in natural language, and we produce the SQL for you. If you were here earlier for our Vanguard session, what Hitesh was talking about is the next step: I’m thinking it, and you’re doing it for me. We haven’t gotten that far yet, but we’ll get there eventually. Text-to-SQL itself is already available in Dremio Cloud today.
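Dremio’s text-to-SQL relies on a language model, which is out of scope here. As a rough illustration of the kind of SQL described above, the sketch below uses a deliberately simple, hypothetical heuristic: pick a shared column ending in `_id` as the join key between two schemas and emit a `CREATE VIEW` statement. The table, column, and view names are invented.

```python
from typing import Optional

# Hypothetical schemas for illustration; not real Dremio datasets.
SCHEMAS = {
    "sales.orders": ["order_id", "customer_id", "order_total", "order_date"],
    "crm.customers": ["customer_id", "customer_name", "region"],
}

def infer_join_key(left: list[str], right: list[str]) -> Optional[str]:
    # Columns present in both tables, in the left table's order
    shared = [col for col in left if col in right]
    # Prefer columns that look like identifiers; fall back to any shared column
    id_like = [col for col in shared if col.endswith("_id")]
    candidates = id_like or shared
    return candidates[0] if candidates else None

def build_view(view_name: str, left: str, right: str) -> str:
    key = infer_join_key(SCHEMAS[left], SCHEMAS[right])
    if key is None:
        raise ValueError(f"no shared column between {left} and {right}")
    # All left-hand columns, plus right-hand columns except the duplicate key
    columns = [f"l.{col}" for col in SCHEMAS[left]]
    columns += [f"r.{col}" for col in SCHEMAS[right] if col != key]
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT {', '.join(columns)}\n"
        f"FROM {left} AS l\n"
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )

sql = build_view("analytics.orders_with_customers", "sales.orders", "crm.customers")
print(sql)
```

A real system would also weigh data types, column descriptions, and the user’s question; matching on names alone breaks as soon as two tables call the same key something different.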

Catalog The View

And I know this is everybody’s favorite step: documenting. Yes? No. Nope. As I mentioned this morning, I create my own dashboards at Dremio, which means I’m creating my own data products, and documenting is probably my least favorite part. I know others agree. But it’s so critical. The descriptions, labels, and tags you add power all the other experiences; search revolves around them. Many other things revolve around proper labels as well. If you have a label for PII data, that’s key to identifying sensitive data. With generative AI creating consistent labels across all the datasets in your semantic layer, there’s a lot of value. Instead of four analysts labeling the exact same thing differently, some using underscores, some using camel case, and then having to normalize all of that at the end of the day, you don’t have to do that anymore. Label generation takes care of it for you. So you spend less time documenting and cataloging: you click a couple of buttons, and it’s done for you.
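The label-consistency point is easiest to see through the manual cleanup it replaces. Below is a minimal, rule-based sketch (not generative AI, and not Dremio’s label generation) that normalizes labels written in different styles to one canonical form; choosing snake_case as that form is an assumption for illustration.

```python
import re

def normalize_label(raw: str) -> str:
    # Insert a space at lower-to-upper boundaries: "customerPii" -> "customer Pii"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", raw)
    # Treat whitespace, underscores, and hyphens as word separators
    words = re.split(r"[\s_\-]+", spaced.strip())
    # Canonical form (an assumption): lowercase words joined by underscores
    return "_".join(word.lower() for word in words if word)

def consolidate(labels: list[str]) -> list[str]:
    # Collapse style variants of the same label into one sorted list
    return sorted({normalize_label(label) for label in labels})

analyst_labels = ["Customer PII", "customer_pii", "customerPii", "CUSTOMER-PII"]
print(consolidate(analyst_labels))  # ['customer_pii']
```

Rules like these only handle formatting; generated labels also address the harder case where analysts pick different words for the same concept.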

Dremio Delivers Autonomous AI-Powered Semantic Layer

So I told you it was going to be a quick one. To recap, what Dremio is looking to deliver is an autonomous, AI-powered semantic layer. That just means we want to make it easy for you to discover, explore, build, all of that. Whether you’re a data engineer, an analyst, or a data scientist, this will power your experience. Of the features and experiences we’ve talked about today, two of them, text-to-SQL and generating wikis and labels, are available today in Dremio. Semantic search and automatic syntax correction are coming really soon. So, like I said when I started this session, generative AI is not just hype.

And now the why part of this, as I promised. First, faster time to insights. Because you’re spending much less time on every step of the data lifecycle, you get to your answers much faster. There’s less bottlenecking on your analysts waiting for the data engineering team to say, here, let me help you. They can get to their answers much, much faster. Finding relevant data is easier, so you’re increasing productivity and spending less time on tedious tasks you’d love to automate. Not only that, you’re getting consistency in what’s being built, so there’s no more manual, let-me-go-fix-this cleanup.

And last, because you can bring your own models and use Iceberg as a vector database, Dremio continues to be a very open platform. We’ve always said that’s where we focus, and even with generative AI, that’s where we’ll continue to focus. Using Iceberg for embeddings helps you avoid vendor lock-in. It’s almost an open platform plus plus, right? So there you have it. We’ve gone through this kind of fast, and there’s so much more to it. I gave you the current state of things today. To learn more about Dremio’s perspective on what’s to come, listen to the keynote tomorrow morning. And that’s it for me. Thank you.