Introduction
The amount of information that industries must keep safe keeps growing as it becomes easier to collect data from customers, patients, employees, and others. Now more than ever, data security and privacy must remain a priority to protect against costly threats.
Dremio provides a powerful and flexible set of security features that integrate with the controls deployed across enterprise systems, and provides additional capabilities for masking and uniform, fine-grained security policies no matter where the data is managed.
In this tutorial we will walk through the steps of dynamically masking data so that it is visible only to those who have been authorized. We will use a dataset that contains social security numbers and salary information, mask this data with Dremio, and then try to visualize the data using a BI client.
Prerequisites
To get the most out of this tutorial, we encourage you to first complete the "Getting Oriented to Dremio" and "Working With Your First Dataset" tutorials. Note also that the feature we are about to demonstrate is available only in the Enterprise Edition of Dremio.
Loading Unmasked Data Into Dremio
In this tutorial we will work with a predefined dataset that we have already loaded into Dremio. It contains the following information:
- Employee ID
- First Name
- Last Name
- Social Security Number
- Credit Card Number
- CC Code
- Phone number
- Hire date
- Job ID
- Salary
- Manager ID
- Department ID
Dealing with Sensitive Data
In this tutorial we will play the role of a data engineer who has been asked to curate the employee database and provide access to it for certain users. However, the dataset contains several fields that not everyone should be able to see. We have been asked to identify the sensitive fields and expose them only to team members who belong to the “Accounting” group.
First, let’s explore the data:
There are 3 different fields that contain sensitive data:
- SSN = Social Security Number
- credit_card
- salary
To verify that this data is fully accessible, we will visualize it using a BI client and a generic user account.
Here we can observe that the data has full open access regardless of the user that tries to access it.
Applying Security Policies
As mentioned earlier, in this scenario we have been asked to provide full access to the data only to those who belong to the “Accounting” team. Anyone else will still be able to query the data, but the sensitive fields will be masked. Let’s see how we can get this done.
Using query_user() or is_member(), Dremio allows us to set up a virtual dataset with selective masking of its columns for different users or groups without having to create multiple datasets.
In this case we will use the following query:
SELECT
  employee_id,
  first_name,
  last_name,
  -- Full SSN for authorized users; otherwise only the last 4 digits
  CASE
    WHEN query_user() IN ('[email protected]','dremio') OR is_member('Accounting') THEN ssn
    ELSE CONCAT('XXXXXXX', SUBSTR(ssn, 8, 4))
  END AS ssn,
  -- Full card number for authorized users; otherwise only the last 4 digits
  CASE
    WHEN query_user() IN ('[email protected]','dremio') OR is_member('Accounting') THEN credit_card
    ELSE CONCAT('XXXX-XXXX-XXXX-', SUBSTR(credit_card, 16, 4))
  END AS credit_card,
  -- No ELSE branch: unauthorized users get NULL for the card code
  CASE
    WHEN query_user() IN ('[email protected]','dremio') OR is_member('Accounting') THEN cc_code
  END AS cc_code,
  email,
  phone_number,
  hire_date,
  job_id,
  salary,
  commission_pct,
  manager_id,
  department_id
FROM emp_data
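This query checks whether the current user is one of the explicitly listed users or belongs to the Accounting group. If neither is true, it reveals only the last 4 digits of the SSN field and the last 4 digits of the Credit Card field, and it returns NULL for the CC Code field because that CASE expression has no ELSE branch. Note that the query as written returns salary unchanged; the same CASE pattern can be applied to it if it also needs to be hidden. Let’s take a look at the results.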
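To make the effect of the CASE expressions concrete, here is a minimal sketch that runs the same masking logic against an in-memory SQLite database from Python. It is an emulation under stated assumptions, not Dremio itself: query_user() and is_member() are re-created as simple stand-in functions, SQLite's || operator replaces CONCAT, the sample row is made up, the run_as helper is hypothetical, and only the 'dremio' user from the original IN list is kept.

```python
import sqlite3

# Hypothetical sample row; the values are made up for illustration.
SAMPLE_ROWS = [(100, "123-45-6789", "1111-2222-3333-4444", "321")]

# Same CASE logic as the tutorial query, adapted to SQLite:
# '||' replaces CONCAT, and query_user()/is_member() are stand-ins
# registered in run_as(), not Dremio's real implementations.
MASKING_QUERY = """
SELECT employee_id,
       CASE WHEN query_user() IN ('dremio') OR is_member('Accounting')
            THEN ssn
            ELSE 'XXXXXXX' || SUBSTR(ssn, 8, 4) END AS ssn,
       CASE WHEN query_user() IN ('dremio') OR is_member('Accounting')
            THEN credit_card
            ELSE 'XXXX-XXXX-XXXX-' || SUBSTR(credit_card, 16, 4) END AS credit_card,
       CASE WHEN query_user() IN ('dremio') OR is_member('Accounting')
            THEN cc_code END AS cc_code
FROM emp_data
"""


def run_as(user, groups):
    """Run the masking query as if `user`, a member of `groups`, were logged in."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for Dremio's query_user(): returns the emulated user name.
    conn.create_function("query_user", 0, lambda: user)
    # Stand-in for Dremio's is_member(): 1 if the user is in the group, else 0.
    conn.create_function("is_member", 1, lambda group: int(group in groups))
    conn.execute(
        "CREATE TABLE emp_data "
        "(employee_id INTEGER, ssn TEXT, credit_card TEXT, cc_code TEXT)"
    )
    conn.executemany("INSERT INTO emp_data VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(MASKING_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_as("analyst", set()))           # neither listed nor in Accounting
    print(run_as("analyst", {"Accounting"}))  # member of Accounting
```

With this sample row, a user outside the Accounting group gets XXXXXXX6789 for the SSN, XXXX-XXXX-XXXX-4444 for the card number, and NULL (None in Python) for the card code, while a member of Accounting or the 'dremio' user gets the row back unchanged.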
Accessing Masked Data From a BI Client
Now that the data has been masked, let’s check how secure it is when we access it from a BI client. We will use Tableau in this scenario.
Click on the Tableau icon on the top toolbar:
Connect Tableau to Dremio using the same credentials you used to log in to Dremio. Next, try to visualize the data:
We can see that the masking policy has been applied and that it also takes effect when we access the data from a BI client. In this case the user is not able to see the SSN and Credit Card information because we didn’t log in with [email protected] and we are not part of the ‘Accounting’ LDAP group.
Conclusion
In this tutorial we demonstrated a simple but powerful feature that Dremio provides to data engineers and data consumers who want to make sure they comply with the latest and most rigorous security and privacy measures in the data industry.
We used a generic dataset that contained sensitive data (SSN and credit card numbers) and used dynamic masking to define column-level permissions on a single dataset. This allowed us to provide different levels of visibility into the data without having to create a separate virtual dataset for each user group working with the same data.
Check out Dremio’s Security Architecture Guide to learn more about Dremio’s security features and how they work.