14 minute read · January 18, 2024
The Who, What, and Why of Data Products
· Senior Tech Evangelist, Dremio
The term "data products" has become increasingly prevalent, especially alongside the growing trend of data mesh. Data products are often associated with data-driven strategies that break up massive, centralized data curation: instead of treating data as a byproduct of operations, organizations curate it as a series of individual offerings, each serving a different value proposition. However, defining precisely what a data product is can seem like chasing a moving target. As data technologies advance and businesses find new ways to leverage information, the concept of a data product continually adapts and expands.
In this exploration, we will unravel the multifaceted world of data products by dissecting their two fundamental components: "data" and "product." We will navigate the terrain where data transforms into a tangible asset, where insights become actionable, and where the role of a data product manager takes center stage in shaping the data-driven future. Let's embark on this quest to demystify what data products are and understand how they are conceived, nurtured, and ultimately delivered to benefit organizations and their end users.
So, fasten your seatbelts, for we are about to embark on a voyage into the realm of data products — where data meets innovation and product management principles guide the way forward.
Breaking Down the Word "Data"
At its core, a data product revolves around "data." But what exactly is data in this context, and why is it so crucial? We must dissect this fundamental component to understand the essence of data products.
Data and Data Quality
Data, in its broadest sense, is information. It encompasses facts, figures, measurements, and observations, all of which can be recorded and analyzed. In data products, data can take both structured and unstructured forms.
Regardless of its form, data quality is paramount when building data products. High-quality data is accurate, reliable, complete, and up to date. Well-managed data quality leads to sound insights; poorly managed data quality leads to flawed decision-making.
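To make "complete" and "accurate" concrete, here is a minimal sketch of how those two qualities might be measured on a small batch of records. The sample records, the field names, and the rule that an amount must be non-negative are all assumptions made for illustration, not part of any particular platform.

```python
from datetime import date

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"order_id": 1, "amount": 120.0, "order_date": date(2024, 1, 10)},
    {"order_id": 2, "amount": None, "order_date": date(2024, 1, 11)},
    {"order_id": 3, "amount": -5.0, "order_date": date(2024, 1, 12)},
    {"order_id": 4, "amount": 60.0, "order_date": None},
]

REQUIRED_FIELDS = ("order_id", "amount", "order_date")


def is_complete(record):
    """A record is complete when every required field has a value."""
    return all(record.get(name) is not None for name in REQUIRED_FIELDS)


def is_valid(record):
    """Assumed accuracy rule: a complete record with a non-negative amount."""
    return is_complete(record) and record["amount"] >= 0


def quality_report(rows):
    """Return the share of complete and valid rows as fractions of the total."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "validity": 0.0}
    complete = sum(1 for row in rows if is_complete(row))
    valid = sum(1 for row in rows if is_valid(row))
    return {"completeness": complete / total, "validity": valid / total}


report = quality_report(records)
print(report)
```

Tracking ratios like these over time, rather than checking them once, is what turns a quality goal into something an owner can be held accountable for.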
Breaking Down "Product" and Product Characteristics
Now that we've dissected the essence of "data" in data products, let's focus on the equally vital component: "product." In the context of data products, what exactly constitutes a product, and what characteristics define it? To answer these questions, we must explore the concept of a product and the key traits that make it meaningful.
Understanding the Concept of a Product
In its simplest form, a product is something created to fulfill a need or desire. It can be tangible or intangible, but it always serves a purpose. In data products, this concept extends to offerings that package a particular dataset (domain-based) for use in analytics, machine learning, application development, and more.
Key Characteristics of a Product
The transition from a mere collection of data or a tool to a full-fledged product is marked by several defining characteristics. When it comes to products, these traits are not only crucial in themselves but also require clear accountability. A designated owner/manager should ensure these characteristics are present, continuously upheld, and enhanced.
Usability: A data product's usability is paramount. It must be designed to be user-friendly and accessible and tailored to meet the needs and understanding of its intended audience, which can range from analysts and decision-makers to end users.
Ownership: Clear ownership is essential for the success of a data product. The owner plays a pivotal role in ensuring that the data product meets initial requirements and continues to evolve and improve over time.
Value: At the heart of a data product is its value proposition. The product must deliver tangible benefits to its users, which, in the context of data products, translates to providing clean, validated, and curated data in a format that users can easily interpret and utilize.
Maintenance: Maintenance is a critical aspect of a data product’s lifecycle. It involves not just the upkeep of the product but also its enhancement over time. The data must be consistently validated and cleaned to ensure its accuracy and usefulness. This requires a proactive approach to maintaining the data product, with regular assessments and updates to keep it relevant and effective.
Scalability: A well-designed data product must be scalable, capable of handling growing data volume, user traffic, and computational needs. Scalability involves carefully planning and implementing data pipelines, storage solutions, and computational resources. This foresight ensures that the data product can adapt to growing needs without compromising performance or usability, making it a robust and long-lasting solution for users.
In summary, the presence of a clear owner for a product is essential for maintaining its usability, value, and scalability. This accountability ensures the product meets its initial goals and evolves effectively to meet future challenges and opportunities.
The Role of Product Management
Effective product management is crucial in the development and evolution of products. Product managers (PMs) are the product owners, central to defining the product's vision, strategy, and roadmap and to ensuring it’s usable, valuable, and scalable. They advocate for users, ensuring the product aligns with user needs and business objectives. PMs also oversee the product's lifecycle, from concept to delivery. Similarly, each data product needs a clearly defined manager/owner to help ensure the product's quality and timely delivery.
Principles of Product Management
Several principles that guide successful product management can also be applied to the development of data products:
User-centric design: A user-centered approach involves empathizing with the end users to understand their pain points and needs. Products should be designed with the user in mind, ensuring they effectively address real-world problems. A data product manager should take the time to interview different stakeholders to understand their needs and ensure the correct data in the proper forms is included in the data product.
Iterative development: Agile development methodologies, such as Scrum or Kanban, are often employed in product development. These methodologies emphasize iterative development cycles, testing, and feedback, allowing products to evolve and improve. Data product managers can use similar strategies to iteratively develop their products from a “minimum viable product” to a robust offering for varied use cases.
Clear goals and metrics: Defining clear, measurable goals and key performance indicators (KPIs) is essential. These metrics help assess the product's success and guide ongoing improvements. Having clearly defined data freshness, data retention, and other requirements can help a data product manager be more successful in working with their teams to build these products.
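As an illustration of turning freshness and retention requirements into measurable checks, the sketch below compares a dataset's record timestamps against two targets. The 24-hour and 365-day values, the function name, and the sample timestamps are assumptions chosen for the example; real targets would come out of the stakeholder conversations described above.

```python
from datetime import datetime, timedelta, timezone

# Assumed targets for illustration only.
MAX_STALENESS = timedelta(hours=24)  # freshness: newest record at most a day old
RETENTION = timedelta(days=365)      # retention: nothing older than a year is kept


def check_requirements(timestamps, now):
    """Compare a dataset's record timestamps against the two targets."""
    if not timestamps:
        # An empty dataset cannot be fresh, but it holds nothing past retention.
        return {"fresh": False, "within_retention": True}
    newest, oldest = max(timestamps), min(timestamps)
    return {
        "fresh": now - newest <= MAX_STALENESS,
        "within_retention": now - oldest <= RETENTION,
    }


now = datetime(2024, 1, 18, 12, 0, tzinfo=timezone.utc)
timestamps = [
    now - timedelta(days=400),  # older than the retention window
    now - timedelta(days=30),
    now - timedelta(hours=3),   # recent enough to count as fresh
]
status = check_requirements(timestamps, now)
print(status)
```

Here the dataset is fresh but still holds a record older than the retention window, which is the kind of finding a KPI review would flag for cleanup.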
Defining a Data Product
Bringing it all together, a data product is a digital offering that packages data to deliver specific value, insights, or functionality to its users. We can apply the following characteristics to a data product:
1. Clearly defined scope and purpose: A data product has a well-defined scope and purpose. It's not a haphazard collection of data or tools but a carefully crafted solution designed to address a particular need or solve a specific problem. This clarity ensures that the product remains focused on its intended goals.
2. A data product manager: Every data product benefits from having a dedicated PM or owner. The role of the PM is to champion the product's vision, strategy, and roadmap. They serve as the bridge between users and the data engineering team crafting the data product, ensuring user needs are understood and met. The PM is pivotal in guiding the product's evolution, from its initial concept to ongoing improvements.
3. Clear data retention, data quality, and data freshness requirements: Data is the lifeblood of a data product, and as such, it must be managed with precision. Data products have well-defined requirements for data retention, quality, and freshness. Data retention defines how long historical data is kept, while data quality ensures that the data is accurate, complete, and reliable. Data freshness specifies how up to date the data should be to provide meaningful insights. These requirements ensure that the data remains valuable and trustworthy for users.
4. An easy method of use and delivery: Accessibility and usability are paramount for a data product. Authorized users should be able to interact with the product easily, extracting insights or utilizing its functionality without encountering barriers. This ease of use extends to the delivery method, ensuring that users can access the product through intuitive interfaces or platforms, whether a web application, API, or other means.
A data product is a purpose-driven digital offering that relies on data to deliver value. It has a clear scope, dedicated product management, well-defined data management requirements, and user-friendly delivery methods. Understanding these characteristics is critical to creating and managing effective data products that meet the needs of both organizations and their users.
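One way to keep these characteristics visible is to record them in a small descriptor that travels with the data product. The sketch below is a hypothetical structure, not a standard or a Dremio format; the class name, field names, and example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor; every field name here is illustrative."""
    name: str
    purpose: str               # clearly defined scope and purpose
    owner: str                 # the data product manager
    retention: timedelta       # how long historical data is kept
    max_staleness: timedelta   # how up to date the data must be
    quality_rules: tuple = ()  # human-readable quality requirements
    delivery: tuple = ()       # how authorized users reach the product

    def missing_characteristics(self):
        """List which of the four characteristics are not yet filled in."""
        missing = []
        if not self.purpose.strip():
            missing.append("scope and purpose")
        if not self.owner.strip():
            missing.append("data product manager")
        requirements_set = (
            self.quality_rules
            and self.retention > timedelta(0)
            and self.max_staleness > timedelta(0)
        )
        if not requirements_set:
            missing.append("data requirements")
        if not self.delivery:
            missing.append("delivery method")
        return missing


sales = DataProduct(
    name="sales_orders",
    purpose="Daily order facts for revenue reporting",
    owner="jane.doe",
    retention=timedelta(days=730),
    max_staleness=timedelta(hours=24),
    quality_rules=("order_id is unique", "amount is non-negative"),
    delivery=("SQL view", "REST API"),
)
draft = DataProduct(
    name="web_clicks",
    purpose="",
    owner="",
    retention=timedelta(days=90),
    max_staleness=timedelta(hours=1),
)

print(sales.missing_characteristics())
print(draft.missing_characteristics())
```

A check like `missing_characteristics` could serve as a simple gate: a dataset is not published as a data product until the list comes back empty.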
How to Create a Data Product with Dremio
Let’s explore how Dremio, a data lakehouse platform, is an ideal solution for curating data products.
Robust Data Federation
Dremio offers a powerful feature that allows you to connect various data sources, including data lakes, databases, and data warehouses, and curate all the necessary data for a data product in one centralized place. This capability streamlines data integration, eliminating the need to hop between multiple tools and platforms to access data. With Dremio's data connectivity capabilities, you can bring together disparate data sources, making the creation of comprehensive and valuable data products easier.
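The general idea of querying several sources through one entry point can be sketched with Python's built-in `sqlite3` module. This is only a small stand-in for the concept, not Dremio's API: two separate in-memory SQLite databases play the role of independent sources, and the table names, view name, and sample rows are assumptions made for the example.

```python
import sqlite3

# The main in-memory database stands in for an "orders" source.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for a "CRM" source.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute(
    "CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, 10, 120.0), (2, 11, 80.0), (3, 10, 40.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(10, "EMEA"), (11, "APAC")],
)

# A temporary view joins both sources without copying any data,
# which is the role a curated, virtual dataset plays for consumers.
conn.execute(
    """
    CREATE TEMP VIEW sales_by_region AS
    SELECT c.region AS region, SUM(o.amount) AS total_amount
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    """
)

rows = conn.execute(
    "SELECT region, total_amount FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)
conn.close()
```

Consumers of the view never need to know that the underlying tables live in different places, which is the benefit a federated platform aims to provide at much larger scale.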
Dremio's Semantic Layer for Data Curation
One of Dremio's standout features is its semantic layer, which makes it easy to curate multiple data products. The semantic layer provides a unified view of your data, allowing you to create virtual datasets, define transformations, and document metadata. This simplifies data curation and enhances data governance and documentation efforts. Additionally, Dremio's granular access control features allow you to implement role-based, column-level, and row-level access rules, ensuring that sensitive data is appropriately protected while still accessible to authorized users.
Using Dremio's semantic layer, data product managers can collaborate with data engineers and analysts to curate, document, and govern data efficiently. This collaborative approach enhances the quality and reliability of data products.
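To show what role-based, column-level, and row-level rules mean in practice, here is a plain-Python sketch of the concept. It does not use or imitate Dremio's actual configuration; the policy table, role names, column names, and the `read_as` helper are all hypothetical.

```python
# Hypothetical policy table: per role, the visible columns and a row predicate.
POLICIES = {
    "analyst_emea": {
        "columns": {"order_id", "region", "amount"},
        "row_filter": lambda row: row["region"] == "EMEA",
    },
    "finance_admin": {
        "columns": {"order_id", "region", "amount", "customer_email"},
        "row_filter": lambda row: True,
    },
}


def read_as(role, rows):
    """Return only the rows and columns the given role is allowed to see."""
    policy = POLICIES.get(role)
    if policy is None:
        # Role-based rule: unknown roles get no access at all.
        raise PermissionError(f"role {role!r} has no access to this dataset")
    return [
        # Column-level rule: drop any column outside the role's allow-list.
        {col: val for col, val in row.items() if col in policy["columns"]}
        for row in rows
        # Row-level rule: keep only rows that satisfy the role's predicate.
        if policy["row_filter"](row)
    ]


orders = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0, "customer_email": "a@example.com"},
    {"order_id": 2, "region": "APAC", "amount": 80.0, "customer_email": "b@example.com"},
]

print(read_as("analyst_emea", orders))
print(read_as("finance_admin", orders))
```

The regional analyst sees one row without the email column, while the finance administrator sees both rows in full, so the same dataset serves both audiences without duplicating data.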
Simplified Data Sharing
Sharing data products with internal and external stakeholders is a crucial aspect of creating data products. Dremio offers multiple mechanisms to facilitate data sharing. You can create user accounts and grant them access rights to specific datasets and folders within Dremio, connect to other Dremio instances with the Dremio-to-Dremio connector, and share Iceberg tables with other tools that support Nessie-based catalogs, such as Apache Spark and Apache Flink. This fine-grained access control ensures that data is shared securely and in compliance with data governance policies.
With Dremio, you can organize your data by creating separate spaces/folders for each data product. Granting data product managers the necessary permissions for their respective folders lets them curate, document, and govern the data specific to their products. This compartmentalization simplifies the management of data products within the organization. Dremio catalogs also benefit from a host of data lakehouse management features: versioning helps manage data quality, and portability brings the data to where you need it for your preferred use cases.
Conclusion
Data products play a pivotal role in the evolving landscape of data trends, particularly in the realm of advanced AI and large language models (LLMs), where the quality and integrity of data are paramount. In this context, the principles of a data product — emphasizing clear accountability, manageability, and usability — are crucial. These principles align seamlessly with the emerging trend of DataOps, which prioritizes practices like observability and versioning to ensure data's continuous quality and reliability. The concept of a data mesh, with data products as a fundamental component, further enhances this ecosystem. It facilitates a domain-oriented approach to data management, enabling teams to work with data more effectively and responsively.
In conclusion, Dremio offers a robust platform for creating data products by simplifying data integration, providing a semantic layer for data curation, and enabling secure data sharing. Whether you're curating data for a single product or managing multiple data products, Dremio's features can streamline the process and enhance collaboration among data professionals, ultimately leading to the successful creation of valuable data products.