15 minute read · July 29, 2022
New: Row-Level and Column-Level Access Controls
· Senior Product Manager, Dremio
Introduction
Dremio v16.0 introduced role-based access control (RBAC), which allows you to grant privileges to users for controlling access to objects in your cluster such as tables, views, and folders. For example, you could give Alice the SELECT privilege on a table, which lets her read from that table, or you could give Bob the ALTER privilege on a view, which lets him modify the definition of that view.
RBAC privileges apply to entire objects, but sometimes it is useful to control access at the level of individual rows or columns. To accommodate this requirement, Dremio Cloud and Dremio Software v22.0 introduce fine-grained access control, which allows you to specify column-masking and row-access policies without an external server such as Apache Ranger. You can add, remove, and view policies with SQL commands. You can set row and column policies on tables, views, and columns from most data sources including object storage and database sources. The remainder of this article describes how to create and set column-masking and row-access policies in Dremio.
Specifying Your Policies
Column-masking policies allow you to substitute values in a given column. Row-access policies let you exclude rows before returning a result set to the client. In both cases, you specify your policies as user-defined functions (UDFs), which Dremio uses to control access to rows and columns. UDFs are callable routines that take input parameters and return a single value. Expressing a policy as a UDF allows you to define the logic in one place and then set it on multiple objects. It also permits modifying your policy and having the changes apply everywhere you have set the policy.
You can also list, view, and remove UDFs as described in the blog post: Announcing Scalar User-Defined Functions.
Creating a User-Defined Function
You create a UDF with the following syntax:
CREATE FUNCTION function_name ( [ function_parameter [, ...] ] ) RETURNS { data_type } RETURN { query } function_parameter: parameter_name data_type
Here is an example that returns the product of two numbers:
CREATE FUNCTION multiply (x INT, y INT) RETURNS INT RETURN SELECT x * y; SELECT multiply(2, 3); -- 6
Within the body of a UDF, it can be useful to call QUERY_USER to identify the current user or IS_MEMBER to see if the current user is a member of a given role. In this way you can control access to rows and columns depending on who is running the query. The examples below use these functions.
Column-Masking Policies
Column-masking policies allow you to substitute or transform values in a given column. You typically do this when you want to redact private or sensitive columns before returning them to the client. For example, a policy could return "XXX-XX-XXXX" instead of a social security number, or a policy could return only the last four digits of a credit card number. A UDF serving as column-masking policy must accept and return the same data type as the column it is masking. In addition, a given column can have at most one column-masking policy. Setting the masking policy of a column that already has a masking policy will replace the previous one.
Setting a Column-Masking Policy
After creating a suitable UDF to serve as a column-masking policy, you can use ALTER TABLE … SET MASKING POLICY
on an existing table or CREATE TABLE … MASKING POLICY
when creating a new table.
The example below has a table named employees with a sensitive column ssn_col
. The UDF protect_ssn
replaces the first five characters with Xs unless the current user is Dave or Mike.
-- Create UDF to serve as the column-masking policy. CREATE FUNCTION protect_ssn (val VARCHAR) RETURNS VARCHAR RETURN SELECT CASE WHEN query_user() IN ('dave', 'mike') THEN val ELSE CONCAT('XXX-XX-', SUBSTR(val, 8, 4)) END;
-- Set the UDF as a policy when creating a table. CREATE TABLE $scratch.employees ( name VARCHAR ,ssn_col VARCHAR MASKING POLICY protect_ssn (ssn_col) ,department VARCHAR );
-- You can also set the UDF as a policy when modifying a table. ALTER TABLE $scratch.employees MODIFY COLUMN ssn_col SET MASKING POLICY protect_ssn (ssn_col);
-- Insert some test data. INSERT INTO $scratch.employees VALUES ('Charlie', '592-32-8562', 'Quality Assurance');
-- Notice that the sensitive column has been redacted. SELECT * FROM $scratch.employees; -- Charlie, XXX-XX-8562, Quality Assurance
Unsetting a Column-Masking Policy
To remove a column-masking policy, use ALTER TABLE … UNSET MASKING POLICY
, for example:
ALTER TABLE $scratch.employees MODIFY COLUMN ssn_col UNSET MASKING POLICY protect_ssn
To learn more, see the documentation for Dremio Cloud and Dremio Software.
Row-Access Policies
Row-access policies let you exclude entire rows before returning a result set to the client. For example, a policy could skip rows if the current user is forbidden to see them. A UDF serving as a row-access policy must return a boolean value (true or false). A table can have multiple row-access policies.
Adding a Row-Access Policy
After creating a suitable UDF to serve as a row-access policy, you can use ALTER TABLE … ADD ROW ACCESS POLICY
on an existing table or CREATE TABLE … ROW ACCESS POLICY
when creating a new table.
The example below has a table named officers with a column assignment. The UDF hide_undercover
protects the identities of undercover officers unless the current user has the role of "chief" or "captain."
-- Create UDF to serve as the row-access policy. CREATE FUNCTION hide_undercover (val VARCHAR) RETURNS BOOLEAN RETURN SELECT CASE WHEN is_member('chief') or is_member('captain') THEN TRUE WHEN val = 'undercover' THEN FALSE ELSE TRUE END;
-- Set the UDF as a policy when creating a table. CREATE TABLE $scratch.officers ( name VARCHAR ,assignment VARCHAR) ROW ACCESS POLICY hide_undercover(assignment);
-- You can also set the UDF as a policy when modifying a table. ALTER TABLE $scratch.officers ADD ROW ACCESS POLICY hide_undercover(assignment);
-- Insert some test data. INSERT INTO $scratch.officers VALUES ('Deckard', 'undercover'), ('Murphy', 'patrol');
-- Notice that the undercover officer is not in the results. SELECT * from $scratch.officers; -- Murphy, patrol
Removing a Row-Access Policy
To remove a row-access policy, use ALTER TABLE … DROP ROW ACCESS POLICY
, for example:
ALTER TABLE $scratch.officers DROP ROW ACCESS POLICY hide_undercover(assignment);
To learn more, see the documentation for Dremio Cloud and Dremio Software.
Showing Your Policies
In Dremio Cloud, you can view row-access and column-masking policies with the following queries:
SELECT view_name, masking_policies, row_access_policies FROM sys.project.views; SELECT table_name, masking_policies, row_access_policies FROM sys.project."tables";
In Dremio Software, you can view row-access and column-masking policies with the following queries:
SELECT view_name, masking_policies, row_access_policies FROM sys.views; SELECT table_name, masking_policies, row_access_policies FROM sys."tables";
In both Dremio Cloud and Dremio Software, you can run DESCRIBE TABLE
to see what column-masking policies have been set on a given table.
Healthcare Example
To illustrate the concepts from above, consider the following scenarios that demonstrate column-masking and row-access policies.
Scenario 1: Data Redaction
A hospital maintains a table of medical records. Columns containing the medical record number, patient name, and diagnosis are protected health information (PHI) that must only be shown to users in the "MEDICAL" role. Users such as billing clerks who are not in that role must see "[redacted]" instead.
-- Create a table to hold medical records. CREATE TABLE $scratch.records ( patient_mrn VARCHAR ,patient_name VARCHAR ,date_of_service DATE ,diagnosis VARCHAR ,is_vip BOOLEAN);
-- Insert some test data. INSERT INTO $scratch.records VALUES ('00847571', 'John Smith', '2022-04-26', 'cough', false); INSERT INTO $scratch.records VALUES ('00195483', 'Jane Doe', '2021-03-16', 'rash', false);
-- Create UDF to serve as the column-masking policy. CREATE FUNCTION mask_string (val VARCHAR) RETURNS VARCHAR RETURN SELECT CASE WHEN is_member ('MEDICAL') THEN val ELSE '[redacted]' END;
-- Set the UDF as a policy for the sensitive columns. ALTER TABLE $scratch.records MODIFY COLUMN patient_mrn SET MASKING POLICY mask_string (patient_mrn); ALTER TABLE $scratch.records MODIFY COLUMN patient_name SET MASKING POLICY mask_string (patient_name); ALTER TABLE $scratch.records MODIFY COLUMN diagnosis SET MASKING POLICY mask_string (diagnosis);
-- When logged in as a user that does not have the "MEDICAL" role, PHI columns are redacted. SELECT * from $scratch.records; -- [redacted], [redacted], 2022-04-26, [redacted], false -- [redacted], [redacted], 2021-03-16, [redacted], false
-- When logged in as a user having the "MEDICAL" role, all columns are visible. SELECT * from $scratch.records; -- 00847571, John Smith, 2022-04-26, cough, false -- 00195483, Jane Doe, 2021-03-16, rash, false
Scenario 2: Data Concealment
In addition to redacting PHI as required by law, the hospital also has a policy to protect the records of patients they consider to be high profile such as government officials and celebrities. Employees who have undergone additional training in handling sensitive data are added to the "VIP" role and are allowed to see records where the is_vip
flag is true.
-- Insert some test data. INSERT INTO $scratch.records VALUES ('00541357', 'Mr. Celebrity', '2020-02-13', 'insomnia', true);
-- Create UDF to serve as the row-access policy. CREATE FUNCTION conceal_vip (vip_patient BOOLEAN) RETURNS BOOLEAN RETURN SELECT CASE WHEN vip_patient THEN is_member ('VIP') ELSE true END;
-- Set the UDF as a policy for VIP patients. ALTER TABLE $scratch.records ADD ROW ACCESS POLICY conceal_vip (is_vip);
-- When logged in as a user having the "MEDICAL" but not the "VIP" role, VIP rows are hidden. SELECT * from $scratch.records; -- 00847571, John Smith, 2022-04-26, cough, false -- 00195483, Jane Doe, 2021-03-16, rash, false
-- When logged in as a user having the "MEDICAL" and "VIP" roles, -- all rows and columns are visible. SELECT * from $scratch.records; -- 00847571, John Smith, 2022-04-26, cough, false -- 00195483, Jane Doe, 2021-03-16, rash, false -- 00541357, Mr. Celebrity, 2020-02-13, insomnia, true
Get Started with Dremio Cloud – It’s Free!
Dremio Cloud: The easy and open, fully managed data lakehouse platform.
Everything you need to build, automate, and query your data lakehouse in production.