14 minute read · December 20, 2018

Working with Dremio and LDAP/AD Authentication

Dremio Authors: Insights and Perspectives · Dremio Team

Introduction

The Lightweight Directory Access Protocol (LDAP) is an open protocol used to store and retrieve data from a hierarchical directory structure. In this tutorial we will walk through the steps needed to integrate LDAP authentication with Dremio. We will then demonstrate how to control access for different users and groups in Dremio using LDAP: we will create separate Spaces inside Dremio and grant or restrict access for each group. Finally, we will mask sensitive data and access it from a BI client under different usernames to see how the data is masked according to each user’s group memberships.

Assumptions

To get the most out of this tutorial, we recommend that you first follow the “Getting Oriented to Dremio” and “Working With Your First Dataset” tutorials. Note that the feature demonstrated here is available only in the Enterprise Edition of Dremio. While you are welcome to work with any LDAP provider, this tutorial uses Okta to create the users and groups that will log into Dremio once the setup is complete. For more information on connecting to Okta through its LDAP interface, see the article on this topic in the Okta documentation.

Demo Set-up

For this tutorial we will be using the following predefined policies:

User              | Okta Group      | Dremio Space    | Access to sensitive data
[email protected] | Engineering     | Engineering     | No
[email protected] | Human Resources | Human Resources | Yes

Creating Users in Okta

To create users, I will first log into my Okta account

Okta user log in

Then navigate to “Directory”, go to “People”, and create a couple of users.

Okta directory

Here, click on “Add Person”

Okta add person

Next, add the users that we will be working with; in this case, Beluga Ice and Polar Bear.

Now I’m going to assign these users to their respective groups following the table shown in the “Demo Set-up” section; Beluga Ice will be assigned to the Engineering group and Polar Bear to Human Resources.

Okta users

Configuring LDAP in Dremio

Before we start Dremio, we need to make the following changes to its configuration.

We are going to edit the dremio.conf file. Inside the Dremio application directory, navigate to “/conf” and open “dremio.conf” in a text editor.

At the bottom of the file, locate the following block:

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}

Add the following lines to it:

coordinator.web.auth.type: "ldap"
coordinator.web.auth.ldap_config: "ad.json"

The “services” block should now look like this:

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true,
  coordinator.web.auth.type: "ldap",
  coordinator.web.auth.ldap_config: "ad.json"
}

Create a new file inside the /conf directory, name it “ad.json” and paste the following lines:

{
    "connectionMode": "ANY_SSL",
    "servers": [
        {
            "hostname": "dremio.ldap.okta.com",
            "port": 636
        }
    ],
    "names": {
        "bindDN": "uid=naren,dc=dremio,dc=okta,dc=com",
        "bindPassword": "<password>",
        "baseDN": "ou=users,dc=dremio,dc=okta,dc=com",
        "userFilter": "&(objectclass=inetorgperson)",
        "userAttributes": {
            "baseDNs": [
                "ou=users,dc=dremio,dc=okta,dc=com"
            ],
            "searchScope": "SUB_TREE",
            "firstname": "givenName",
            "id": "uid",
            "lastname": "sn",
            "email": "email"
        },
        "groupMembership": "memberOf",
        "groupDNs": ["cn={0},ou=groups,dc=dremio,dc=okta,dc=com"],
        "groupFilter": "(objectClass=groupofUniqueNames)",
        "autoAdminFirstUser": true
    }
}

Notice that the hostname directs to “dremio.ldap.okta.com” since that is the provider we are using.
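Because a typo in “ad.json” is easy to miss until login fails, it can help to sanity-check the file before starting Dremio. The following is a minimal Python sketch, not part of Dremio itself: the function name check_ldap_config is hypothetical, and the list of expected keys is simply taken from the example above rather than from an authoritative schema.

```python
import json

# Keys copied from the ad.json example in this tutorial. This list is an
# assumption for illustration, not Dremio's official schema.
EXPECTED_NAMES_KEYS = ("bindDN", "bindPassword", "baseDN", "userFilter", "userAttributes")


def check_ldap_config(text):
    """Return a list of human-readable problems found in an ad.json document."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    servers = config.get("servers") or []
    if not servers:
        problems.append("no servers listed")
    for server in servers:
        if "hostname" not in server or "port" not in server:
            problems.append(f"server entry missing hostname or port: {server}")

    names = config.get("names") or {}
    for key in EXPECTED_NAMES_KEYS:
        if key not in names:
            problems.append(f"names.{key} is missing")
    return problems


if __name__ == "__main__":
    # Assumes the script is run from the Dremio /conf directory.
    with open("ad.json", encoding="utf-8") as handle:
        for problem in check_ldap_config(handle.read()):
            print(problem)
```

An empty result only means the file parses and contains the fields used in this tutorial; it does not confirm that the bind credentials or DNs are correct.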

Now, double-check that no Dremio processes are running on your machine:

$ sudo ps -ef | grep dremio

At this point we can start Dremio. Navigate to the /bin directory inside Dremio and execute the following command:

./dremio start

I’m going to log into Dremio using my administrator account. Notice that I’m using the email address as the username.

Dremio login

Working With Spaces

I’m going to create two spaces, one for Engineering and another for Human Resources, and specify the respective groups in the sharing settings. Additionally, I will upload data corresponding to each team and move these datasets to their respective spaces.

Dremio spaces
Dremio permissions

At this point I should be able to see both spaces on the main screen

Dremio spaces

We will skip the steps for uploading the datasets to each space; if you would like to review them, please see our “Working With Your First Dataset” tutorial.

Verifying Access to the Spaces

Now I can test access to each space by logging in as a user from each group.

We can confirm that Polar Bear has access only to Human Resources:

Dremio Spaces

And Beluga has access only to Engineering:

Dremio spaces

Working with Sensitive Data

In this scenario we are going to test how this works when sensitive data has been made accessible to several groups. To mask the sensitive data, I will follow the same methodology as in our Dynamic Masking tutorial.

In this case I have placed a dataset containing Social Security Number and credit card information in a common space that everyone can access. I want to share this dataset with several groups without having to create a separately masked copy for each one.

Dremio virtual dataset

I’m going to mask the data for those users who are not members of the “Human Resources” group using the following query:

SELECT employee_id, first_name, last_name,
  -- Members of Human Resources see the full SSN; everyone else sees a masked value
  CASE
    WHEN is_member('Human Resources') THEN ssn
    ELSE CONCAT('XXXXXXX', SUBSTR(ssn, 8, 4))
  END AS ssn,
  -- Same rule for the credit card number
  CASE
    WHEN is_member('Human Resources') THEN credit_card
    ELSE CONCAT('XXXX-XXXX-XXXX-', SUBSTR(credit_card, 16, 4))
  END AS credit_card,
  -- No ELSE branch, so cc_code is NULL for non-members
  CASE
    WHEN is_member('Human Resources') THEN cc_code
  END AS cc_code,
  email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
FROM "@[email protected]"."firefly-ssn-employee_ssn_orig"

If I access the dataset from the Polar Bear profile, I should see the unmasked data, since that user belongs to Human Resources:

Dremio dynamic masked data

Now let’s try to access this dataset using Beluga’s username. Remember, Beluga Ice is a member of the Engineering group.

Dremio dynamic masked data

We can see that the sensitive information was correctly masked according to the rules created in the SQL query.
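To make the masking rules concrete outside of Dremio, here is a small Python sketch that mirrors what the CASE expressions in the query do to a single row. It is only an illustration: the function name mask_row and the sample values are made up, and it assumes SSNs formatted as NNN-NN-NNNN and card numbers formatted as NNNN-NNNN-NNNN-NNNN, which is what the SUBSTR offsets in the query imply.

```python
def mask_row(row, groups):
    """Mimic the tutorial's masking query for one row.

    `row` is a dict with "ssn", "credit_card" and "cc_code" keys;
    `groups` is the set of group names the current user belongs to.
    """
    masked = dict(row)  # copy so the caller's row is left untouched
    if "Human Resources" in groups:
        return masked  # members see every column unmasked

    # SQL SUBSTR is 1-based, so SUBSTR(ssn, 8, 4) corresponds to ssn[7:11].
    masked["ssn"] = "XXXXXXX" + row["ssn"][7:11]
    # SUBSTR(credit_card, 16, 4) corresponds to credit_card[15:19].
    masked["credit_card"] = "XXXX-XXXX-XXXX-" + row["credit_card"][15:19]
    # A CASE without ELSE yields NULL, represented here as None.
    masked["cc_code"] = None
    return masked


# Invented sample data, not taken from the tutorial's dataset.
sample = {"ssn": "123-45-6789", "credit_card": "1111-2222-3333-4444", "cc_code": "987"}
print(mask_row(sample, {"Human Resources"}))
print(mask_row(sample, {"Engineering"}))
```

In Dremio the group check is done by is_member against the LDAP groups of the logged-in user; the set passed in here merely stands in for that lookup.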

Accessing Sensitive Data From External Tools

We are going to see how this implementation helps us when users try to visualize sensitive data using a BI client. This method applies to any BI or data science tool that you would like to use to analyze your data. We will use Tableau as an example.

I’m going to log back in as Polar Bear (Human Resources) and create a report in Tableau. After downloading the .tds file that Dremio generates, I’ll provide the LDAP credentials of the user I’m logged in as in Dremio.

Dremio and Tableau

Then I’ll generate a brief report and verify that the information is available to members of the Human Resources group

Dremio and Tableau

Now I will perform the same exercise using a member of the Engineering group

Dremio and Tableau

We can observe that dynamic masking has successfully hidden the data from this user.

Dremio and Tableau

Conclusion

In this tutorial we demonstrated how easy, practical, and powerful it can be to implement LDAP authentication in your Dremio environment. This feature gives data engineers and data consumers a robust way to keep their data safe in accordance with rigorous security and privacy requirements.

We used a trial version of Okta and its early-access LDAP server offering. We created separate Spaces for each group and demonstrated that access was granted to each group correctly. Additionally, we used a generic dataset that contains sensitive data (SSN and CC numbers) and used dynamic masking to define column-level permissions on a single dataset; this allowed us to provide different levels of visibility into the data without having to create separate virtual datasets for each user group working with the same data.

We hope you enjoyed this tutorial. Stay tuned to learn more about how you can gain insights from your data faster using Dremio.
