May 3, 2024
ABC’s of ABC Supply
Join us for the ABC’s of ABC Supply, where we will discuss our architecture (A) using Databricks and Dremio, business intelligence (B) with Tableau, and consumer (C) empowerment using Dremio ‘spaces’ to organize our data by domain as well as accelerate development via ‘common metrics’. Learn how these components drive data-driven decision-making at ABC Supply.
Transcript
Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.
Stephen Brehm:
So, my name is Stephen Brehm and I’m presenting today with Rajkumar. Together we support the data and visualization needs of ABC Supply. I’ll start by telling you a little bit about our company to give you a sense of our humble beginnings and our growth. We were founded in 1982 with three stores and a national support center in Beloit, Wisconsin. In 2016, ABC Supply acquired L&W Supply, which added some additional interior building products like drywall, steel studs, and ceiling tile, and we retained their location in downtown Chicago. So now we have two national support centers, one in Chicago and one in Beloit.
Growth of ABC Supply
In 2022, we celebrated our 40-year anniversary. So today, ABC is the largest privately held wholesale distributor of exterior and interior building products in North America, and we are now growing into Canada. We have over 970 locations, 20,000 associates, and over $20 billion in annual revenue.
So let’s take a look at our product mix. You can see from these pictures, as I mentioned, roofing, and you can also see windows and doors and decking, so these are the sorts of building products that we provide. And to take a deeper dive, you can look at our companies. I mentioned that we acquired L&W Supply in 2016, and that’s where we get our drywall, steel studs, and ceiling tile. And then we have Norandex, where we get siding, ACM for gutters, Mule-Hide Products for roofing, and then Town & Country for pool and patio, windows and doors, gutters, and siding. And then we have our own ABC Supply catalog.
And I mentioned our locations and that we’re growing into Canada. So you can see from this map, in the lower section are our more than 980 locations. And then toward the top of the map, in that green color, are the 24 locations that we have thus far in Canada. So it’s an exciting time to be a part of ABC Supply.
And these are our specialty trucks for on-site delivery. These trucks are actually configured and customized in Beloit, Wisconsin, at that National Support Center. And you can see that they’re customized on the exterior from these photos. So we not only get the building products to the job site, but also get them off the truck and where they need to be at the job site location. And these trucks are also customized on the interior with cameras and other smart technology to keep our drivers safe and compliant.
Now that you have a sense of our growth and what we do as a company, Raj and I will showcase for you the data architecture and business intelligence tools we leverage to support this exponential growth and the internal customers we support with these technologies. These are our ABCs. I’ll let Raj kick off the A of ABC with a discussion on our data architecture.
Rajkumar Magatala:
A is for Architecture
Thank you, Stephen. I’m Raj. I’m the data engineering manager at ABC Supply. We’ll first start with the A of ABC, the architecture. This is our primary analytics ecosystem. This platform started six years back, and we have been enhancing it ever since; that’s where new tools like Dremio come into the picture. We’ll talk more on that later. We are primarily an Azure shop, and it’s a lakehouse architecture. We primarily have two layers in our data lake, which we call raw and transformed. Any data coming in from a source is pushed as-is directly onto the raw layer. Then we create one single source of truth for each source table that we get, aggregating all the incremental chunks we got from the source, and load it to the transformed layer along with all the other curated data sets.
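To make that two-layer flow concrete, here is a minimal PySpark sketch of the pattern Raj describes, roughly as it might run in a Databricks notebook triggered by Azure Data Factory. It assumes Delta Lake tables on ADLS; the paths, key column, and file format are hypothetical placeholders, not ABC Supply’s actual implementation.

```python
# Hedged sketch of the raw -> transformed pattern; paths and keys are made up.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

RAW_PATH = "abfss://lake@<storage-account>.dfs.core.windows.net/raw/erp/orders"
TRANSFORMED_PATH = "abfss://lake@<storage-account>.dfs.core.windows.net/transformed/orders"

# 1) Land the incremental extract from the source exactly as received.
incremental = spark.read.parquet("/mnt/landing/erp/orders/2024-05-03/")
incremental.write.format("delta").mode("append").save(RAW_PATH)

# 2) Fold the increment into the single source of truth in the transformed layer.
target = DeltaTable.forPath(spark, TRANSFORMED_PATH)
(
    target.alias("t")
    .merge(incremental.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```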
We use Azure Data Factory for running all our pipelines. It serves as the orchestration tool for the nightly jobs that run all our loads and batch processes at the moment. Databricks is the primary tool that we use for data transformation. It provides a single platform to analyze, process, and save data for reuse by others. Built on Apache Spark, it can process many different forms of data and has been invaluable in enabling us to scale our solutions to the enterprise and keep pace with the company’s growth.
Once the data sets are created, we load them onto Snowflake. Primarily the EDW data sets that we create are loaded onto Snowflake for our self-service users who are using Oracle Analytics or other BI reporting. This data is also accessed through Dremio extracts for any Tableau dashboards that are being published. Our data science team also has access to this data to run their algorithms using Karobe or Python. With this robust architecture that we have built over the last six years, we were able to achieve tremendous growth in our analytics platform. Today, we pull in data from about 70-plus sources. Every day, we create 100-plus data sets, which are used by about 1,200 users per day. Our total data lake size is about 300 terabytes.
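For the Snowflake leg of that flow, a hedged sketch with the Spark Snowflake connector might look like the following; the connection options and object names are placeholders, and in practice credentials would come from a secret scope rather than literals.

```python
# Hedged sketch: push a curated data set from the lake to Snowflake for
# self-service users. Account, credentials, and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<service-user>",
    "sfPassword": "<secret>",       # read from a secret scope in real pipelines
    "sfDatabase": "EDW",
    "sfSchema": "ANALYTICS",
    "sfWarehouse": "LOAD_WH",
}

curated = spark.read.format("delta").load(
    "abfss://lake@<storage-account>.dfs.core.windows.net/transformed/daily_sales"
)

(
    curated.write.format("snowflake")   # Spark Snowflake connector
    .options(**sf_options)
    .option("dbtable", "DAILY_SALES")
    .mode("overwrite")
    .save()
)
```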
ABC Data Ingestion Framework
So with that, I want to dive a little bit more into the architecture. When we started this journey, our primary goal was to build a platform that is flexible enough to work with any type of source we have, and that provides a consistent file format and a persistent location of data for any downstream references. That meant we ended up creating a schema-less architecture for all of our data ingestion. So as long as the artifact is present in the source, we pull it, no matter whether the structure changes. We also have persistent archival of data for trend and look-back capabilities. With this, today we pull data from about 70-plus different sources, loaded into one single format that’s used by all of those teams, and it is all secured in our Azure data lake.
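A minimal sketch of that kind of schema-tolerant landing is shown below, assuming Spark with Delta schema evolution; the source, paths, and file types are invented for illustration.

```python
# Hedged sketch: land whatever artifact the source produced today, accept
# structure changes, and keep a persistent archive by load date.
# Source names, paths, and the Delta format choice are assumptions.
import datetime

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

load_date = datetime.date.today().isoformat()
source_glob = f"/mnt/landing/pos/receipts/{load_date}/*.csv"

df = (
    spark.read.option("header", "true")
    .option("inferSchema", "true")              # take whatever columns arrive
    .csv(source_glob)
    .withColumn("load_date", F.lit(load_date))  # supports trend/look-back queries
)

(
    df.write.format("delta")
    .option("mergeSchema", "true")              # new columns extend, never reject
    .mode("append")
    .partitionBy("load_date")
    .save("abfss://lake@<storage-account>.dfs.core.windows.net/raw/pos/receipts")
)
```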
Next, once we have all of the data in, the next big step was to create our enterprise data sets. Our goal was to create something that is analytics ready every day. Today, our enterprise data sets are highly curated data sets with billions of rows and hundreds of attributes. When we started the exercise of creating these enterprise data sets, we followed the BEAM model outlined in the Agile Data Warehouse Design book, where BEAM stands for Business Event Analysis and Modeling. With this approach, we worked with experts in the business process, along with our data modelers, to create models for us: data profiling of all the data that we’re getting from source in order to derive our dimensional data models, all the ETL specifications, any requirements we need to incorporate the business logic, the data dictionary, and source-to-target mappings, even before we start development. Across our whole SDLC, this process takes about 60% of the time, figuring all of that out before we start development. The goal was to create robust enterprise data sets that are analytics ready, with all the business requirements built in, and provide reporting across the enterprise.
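As a purely illustrative example, one row of such a source-to-target mapping could be captured as metadata like this; every field name here is invented, not ABC Supply’s actual data dictionary.

```python
# Hypothetical shape of a single source-to-target mapping entry produced during
# the BEAM-style modeling work (all names are invented for illustration).
customer_name_mapping = {
    "target_table": "dim_customer",
    "target_column": "customer_name",
    "source_system": "ERP",
    "source_table": "CUST_MASTER",
    "source_column": "CUST_NM",
    "transformation": "trim, title-case, default to 'UNKNOWN' when null",
    "scd_type": 1,
}
```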
Our Process
Now that we have all the requirements in place, we know what the business is asking for and what they’re looking for, so the next step for us was a process that can create multiple data sets. We didn’t want to create multiple processes; we wanted a single process, which is metadata driven, that can support concurrent work streams and run as a repeatable process just by passing the metadata, and that’s what we ended up creating. As an example, I’ve depicted here how our dimension table creation process works every single day. This is fully metadata driven, as I said; all we need to do is pass in which source data extracts have to be run. In the first step, this process extracts all the source data, hashes it to figure out what changed, creates the hashes that are needed, and then updates the actual dimensions. This process is also fully automated just by configuring which post-scripts have to be run, and it includes all the built-in data validations.
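A minimal, hypothetical version of that hash-based dimension step might look like this in PySpark with Delta; the metadata keys, column names, and paths are placeholders, not the actual framework.

```python
# Hedged sketch of metadata-driven change detection for one dimension.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

dim_meta = {
    "business_key": "branch_id",
    "tracked_columns": ["branch_name", "region", "manager"],
    "path": "abfss://lake@<storage-account>.dfs.core.windows.net/transformed/dim_branch",
}

source = spark.read.format("delta").load("/mnt/raw/erp/branches")

# Hash the tracked attributes so change detection is a single column comparison.
staged = source.withColumn(
    "row_hash", F.sha2(F.concat_ws("||", *dim_meta["tracked_columns"]), 256)
)

dim = DeltaTable.forPath(spark, dim_meta["path"])
(
    dim.alias("d")
    .merge(staged.alias("s"), f"d.{dim_meta['business_key']} = s.{dim_meta['business_key']}")
    .whenMatchedUpdateAll(condition="d.row_hash <> s.row_hash")  # only real changes
    .whenNotMatchedInsertAll()
    .execute()
)
```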
So let’s say we have a new data set: we create the corresponding extracts, go configure the metadata, and it’s ready to go. With this approach, every day we run about six different facts, like sales, inventory, and purchase orders, and about 20 different dimensions that are needed for building all of this and are useful to the business, all analytics ready. That was the goal when we started, that’s what we built, and this process serves the business every day.
Information Ready
Now that you have a little more understanding of how we load data, how we aggregate data, how we build our data sets, and what we use to run our pipelines and create our notebooks, the next primary task we took on was how to get this data ready every single day with all the proper dependencies, without fail. The glue that we created, which we call the pipeline service, is a 100% custom-built solution at ABC. This, again, is all metadata driven; we simply configure which job has to run after which, and it picks up the corresponding dependencies and runs them together. This helps us run jobs on time with proper dependencies and provide our analytics to the business on time every day. As I said, these are all the technologies we use at ABC Supply to build this architecture and provide predictive and prescriptive analytics along with the data science team. With that, I’m going to hand over to Stephen to talk a little bit more on our BI tools and the consumers.
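In spirit, a dependency-driven runner like that can be sketched in a few lines of Python; the job names and the run function here are hypothetical stand-ins for the real pipeline service, which triggers Databricks and Azure Data Factory jobs.

```python
# Toy sketch of a metadata-driven dependency runner (not the real pipeline service).
from graphlib import TopologicalSorter

# "Run this job after these jobs" -- the only configuration a new job needs.
job_dependencies = {
    "load_raw_sales": [],
    "build_dim_branch": ["load_raw_sales"],
    "build_fact_sales": ["load_raw_sales", "build_dim_branch"],
    "refresh_tableau_extracts": ["build_fact_sales"],
}

def run_job(name: str) -> None:
    # Placeholder: in practice this would trigger a Databricks or ADF job.
    print(f"running {name}")

# Execute jobs in an order that always satisfies the declared dependencies.
for job in TopologicalSorter(job_dependencies).static_order():
    run_job(job)
```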
Stephen Brehm:
B is for Business Intelligence
Thanks, Raj. Now that you can see all of the data available for ABC to leverage, I will talk about the reporting tools we use and how we organize and access this data. This is the B of our ABCs. The analytics journey for our report users begins with the analytics hub. On the left-hand side of this screen capture, you can see what the analytics hub looks like, and that little circle in the middle is the analytics hub icon. People click that icon and log in, and once they’ve logged in, their permissions, based on their login, will enable them to see just the reports that they’re supposed to have access to. So if a person is not permitted to see, for example, the inventory levels report, then that tile would not be there and some other tile might be there instead.
For each of these tiles, you click the picture on the tile, and it takes you to the relevant report. And so this is great. The people who visit the analytics hub do not need to know whether their report is a Tableau report or an Oracle Analytics report. They don’t have to save bookmarks that might become outdated. They just click the tile, and then they’re directed to the relevant report. If that link changed behind the scenes, that would be unbeknownst to them; the next time they click that tile, it’ll still launch the appropriate report. You can also see a bell icon. If we were to introduce a new report, for example the branch summary, we use that bell icon: since it was new since your last login, you’d see a notification when you log in announcing that there was a new report available. And we also use the bell icon to let people know if there are any issues or alerts that they should be aware of with any of our reports. I won’t spend too much longer, but just a couple of other features. The gear icon allows them to customize the analytics hub and to mark a favorite so that it shows up first in the tiles, which helps them navigate to the report more quickly, because we have many reports for them. And then the download icon in the upper right-hand corner of each tile leads to a quick guide that lets the business user know what the report is for and how to navigate it.
Dremio Performance
So now I’m going to dive into the data behind these reports. Initially, both Tableau and Oracle Analytics connected to Snowflake. Presently, our Oracle Analytics reports are still on Snowflake, but most of our Tableau reports are now on top of Dremio, and Dremio connects to our data lake. Before we introduced Dremio, we were using a live connection from Tableau to Snowflake, and with that, developers were experiencing slowness, timeouts, and lost connections. So they switched from live to an extract. With an extract from Tableau directly against Snowflake, Tableau pulls the data from Snowflake and caches it into what’s called a Hyper file. With that approach, because we have large multi-year data sets for our analytics needs, the extract was either taking too long to create or it was failing. So we needed another solution, and the data engineering team came up with a great approach. Instead of pulling the data from Snowflake into Tableau, the approach was to push the data from the data servers up to Tableau. And so the data pipeline service that Raj outlined would run. It compiles all the data needed, creates a Hyper file, and then, using the Tableau API, pushes that Hyper file up to the Tableau server.
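A hedged sketch of that push pattern, using Tableau’s Hyper API and the tableauserverclient library, might look like the following; the server URL, project ID, credentials, and the tiny example schema are all placeholders, not ABC Supply’s actual pipeline.

```python
# Hedged sketch: build a .hyper extract and publish it to Tableau Server.
import tableauserverclient as TSC
from tableauhyperapi import (
    Connection, CreateMode, HyperProcess, Inserter, SqlType, TableDefinition, Telemetry,
)

rows = [(101, 2500.0), (102, 1875.5)]  # in practice, the compiled multi-year data

table = TableDefinition(
    table_name="daily_sales",
    columns=[
        TableDefinition.Column("branch_id", SqlType.int()),
        TableDefinition.Column("sales_amount", SqlType.double()),
    ],
)

# 1) Create the Hyper file locally.
with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(hyper.endpoint, "daily_sales.hyper", CreateMode.CREATE_AND_REPLACE) as conn:
        conn.catalog.create_table(table)
        with Inserter(conn, table) as inserter:
            inserter.add_rows(rows)
            inserter.execute()

# 2) Push the Hyper file to Tableau Server, overwriting the published data source.
auth = TSC.PersonalAccessTokenAuth("<token-name>", "<token-secret>", site_id="<site>")
server = TSC.Server("https://tableau.example.com", use_server_version=True)
with server.auth.sign_in(auth):
    datasource = TSC.DatasourceItem(project_id="<project-id>")
    server.datasources.publish(datasource, "daily_sales.hyper", mode=TSC.Server.PublishMode.Overwrite)
```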
So that solution worked; it was a creative workaround. However, with that solution, where I’m still talking about Tableau connecting directly to Snowflake, just in a more creative way, it created a dependency on our data engineers. Any time a new Tableau dashboard was being created, we had to work with the data engineering team and get it prioritized before the Tableau dashboard development could begin. And the same if we were just making a modification to an existing Tableau dashboard: if we needed an extra column or a change to a calculation, we’d also have to go back to data engineering before we could start our Tableau development.
So with Dremio, the great advantage for the Tableau engineers on my team is increased independence and a reduction in development effort and time. The performance of the dashboards is at the level that we expect at ABC Supply. And the people on my team tell me that the Dremio platform is very easy to use.
Dremio Spaces for SDLC
So now let me dive into some of the key features that we enjoy when we use Dremio. I’ll start by looking at Dremio Spaces. With Dremio Spaces, what we’ve done is organize all of the domain-related data into virtual data sets by domain. You can see in this example associate and branch operations. All of the queries related to our associate dashboards are saved as virtual data sets under the associate space, and similarly for the branch operations space. This creates a centralized location to find all of the data needed for the various dashboards in each of those data domains. Within the same space, as you can see in this picture for branch operations, which I gave as an example, there are also a lab and a promoted folder. This supports our software development lifecycle.
Development starts in lab, and once that SQL code is ready for primetime and has been fully tested, it is copied to promoted, and our production-ready Tableau dashboards will only point to the promoted virtual data sets in that folder. That way, development of features and bug fixes can continue in lab without affecting Tableau in production. Then the cycle continues: once lab is ready again, it gets copied to promoted, et cetera. So that’s our use of spaces and those lab and promoted folders. You might notice an extra folder in that screen capture, common metrics. In addition to using spaces to organize our data into domains, we also use spaces to provide a common repository of shareable code that can be made available to any dashboard. BI developers can simply reference the common metrics folder and its virtual data sets. In this case, you can see that the date helper, daily sales, and daily units hold some common metrics, and developers just reference them in their code. So they might be storing a VDS in branch operations, but within that VDS it will refer to common metrics, so it queries that space. Because it’s shared code that can be used across the domains, nobody has to rewrite the logic within each domain, and this further accelerates our turnaround time.
So we talked about Dremio performance being faster, and that, along with no longer depending on data engineering, was an acceleration of our turnaround time. Using these common metrics further accelerates our turnaround time. I’ll give a specific example of a common metric, the date helper. Rather than each individual VDS having to calculate last business day, last true business day, month end, et cetera, they can simply get those date anchors by using the common metrics. Another quick example is sales. What is our definition of sales? Does it include freight or does it not include freight, et cetera? That’s been established as a single source of truth in the daily sales common metric, and then that can be shared across the various VDSs in the domain folders so that they get the correct single source of truth for those values.
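To illustrate that layering, here is a hedged sketch of how a shared date helper and a domain VDS that reuses it might be defined; the SQL, space, folder, and column names are invented, and the REST call assumes Dremio’s v3 catalog API with auth details that vary by edition.

```python
# Hedged sketch of "common metrics" layering in Dremio; not ABC Supply's actual VDSs.
import requests

# Shared date anchors, maintained once under the common_metrics space.
DATE_HELPER_SQL = """
SELECT
    CURRENT_DATE                          AS today,
    CURRENT_DATE - INTERVAL '1' DAY       AS last_business_day,  -- simplified logic
    DATE_TRUNC('MONTH', CURRENT_DATE)     AS month_start
"""

# Domain VDS (e.g., branch_operations/lab) that reuses the shared metric instead
# of re-deriving the date logic in every domain.
BRANCH_DAILY_SALES_SQL = """
SELECT s.branch_id, SUM(s.net_amount) AS daily_sales
FROM   transformed.sales s
JOIN   common_metrics.date_helper d ON s.sale_date = d.last_business_day
GROUP BY s.branch_id
"""

def create_vds(base_url: str, token: str, path: list[str], sql: str) -> None:
    """Create a virtual data set through Dremio's REST catalog API (assumed v3)."""
    requests.post(
        f"{base_url}/api/v3/catalog",
        headers={"Authorization": token, "Content-Type": "application/json"},
        json={"entityType": "dataset", "type": "VIRTUAL_DATASET", "path": path, "sql": sql},
        timeout=30,
    ).raise_for_status()

# Example usage (placeholders for server and token; VDSs can also be saved in the UI):
# create_vds("https://dremio.example.com", "<auth-token>",
#            ["common_metrics", "date_helper"], DATE_HELPER_SQL)
# create_vds("https://dremio.example.com", "<auth-token>",
#            ["branch_operations", "lab", "daily_sales_by_branch"], BRANCH_DAILY_SALES_SQL)
```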
C is for Customer
So C is for customer. Dremio has two target customers: our BI team, which I’ve been describing, and our self-service environment, which is prepared for our data-enabling community. That community has access to Oracle Analytics for self-service, Tableau for self-service, and recently we introduced some of them to Dremio for self-service so they can run some ad hoc queries directly. Between them, these two customers are the C of our ABCs.
Our BI Tableau developers use Dremio for reliable access to well-organized data, and for our business analyst self-service users, Dremio provides easily accessible and understandable data sets for their ad hoc analysis needs. Dremio also enables us to use Active Directory groups to govern who can access which spaces, so that both our BI development and BA self-service users can only access what they need. These are the ABCs of ABC Supply, the letter blocks that have supported our growth for over 40 years.