Introduction
Users can interact with Dremio through a comprehensive set of REST APIs. This lets DevOps teams orchestrate Dremio alongside the other components of their technology stacks, and lets end users build web applications directly on top of Dremio. Most operations are available through the REST API, including issuing queries and retrieving results as JSON, browsing and managing the data catalog, managing and manually refreshing reflections, and accessing the status and results of a specific job. The Dremio API documentation covers each endpoint in detail; follow along here to learn how to perform some basic requests in Python.
Prerequisites
To follow this tutorial you should have access to a Dremio deployment and should have completed the first two tutorials, Getting Oriented to Dremio and Working With Your First Dataset. You will need an account with a username and password to use the API, and familiarity with some basic concepts in Dremio. You should also have Python installed and configured on your operating system. If you haven't done so already, you can get it from the Python website.
Setting Up Your Requests
To use the API, we'll first import the requests library to make HTTP requests and the standard json library to encode request bodies and decode the JSON responses. Note that requests is a third-party package, so install it with pip if you don't already have it. Then we'll define some constants: your Dremio username and password, the request headers (the authentication token is added to them after login), and the address of your Dremio server. If you are running Dremio on your local machine, the server is localhost.
import json
import requests

username = '<your username>'
password = '<your password>'
headers = {'content-type': 'application/json'}
dremioServer = 'http://<server>:9047'
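Hardcoding a password in a script makes it easy to leak. As one possible alternative, the sketch below reads the same constants from environment variables. The function name loadSettings and the variable names DREMIO_USERNAME, DREMIO_PASSWORD, and DREMIO_SERVER are made up for this example, and the localhost default simply follows the local setup mentioned above.

```python
import os


def loadSettings(env=os.environ):
    """Read Dremio connection settings from environment variables.

    The variable names are hypothetical; use whatever your deployment prefers.
    """
    missing = [name for name in ('DREMIO_USERNAME', 'DREMIO_PASSWORD')
               if not env.get(name)]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return {
        'username': env['DREMIO_USERNAME'],
        'password': env['DREMIO_PASSWORD'],
        # default to a local Dremio on the port used in this tutorial
        'server': env.get('DREMIO_SERVER', 'http://localhost:9047'),
    }
```

The returned values can then be assigned to username, password, and dremioServer in place of the literals.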
The Dremio API is designed around RESTful principles, so next we will define some wrapper functions for HTTP GET, POST, PUT, and DELETE.
def apiGet(endpoint):
    url = '{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint)
    return json.loads(requests.get(url, headers=headers).text)

def apiPost(endpoint, body=None):
    url = '{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint)
    text = requests.post(url, headers=headers, data=json.dumps(body)).text
    # a POST may return no data
    if text:
        return json.loads(text)
    return None

def apiPut(endpoint, body=None):
    url = '{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint)
    return requests.put(url, headers=headers, data=json.dumps(body)).text

def apiDelete(endpoint):
    url = '{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint)
    return requests.delete(url, headers=headers)
Generally, POST and PUT requests take a request body. For example, creating a source takes a source input. The input configurations are detailed under the Models subheading of each endpoint in the API documentation. Make sure your body matches that format before sending a request.
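The wrappers above decode whatever the server sends back without looking at the HTTP status, so a rejected request tends to surface later as a confusing KeyError. A minimal sketch of a stricter decoding step follows. The name parseResponse and the error message format are illustrative and not part of the Dremio API. It takes the status code and body text (what requests exposes as response.status_code and response.text) and raises on any 4xx or 5xx status.

```python
import json


def parseResponse(statusCode, text):
    """Decode an API response body, raising on HTTP error statuses.

    Hypothetical helper: pass it response.status_code and response.text
    from a requests response object.
    """
    if statusCode >= 400:
        # surface the server's message instead of failing later on a KeyError
        raise RuntimeError('request failed with HTTP {code}: {body}'.format(
            code=statusCode, body=text))
    # some requests legitimately return an empty body
    if not text:
        return None
    return json.loads(text)
```

To use it, a wrapper would keep the response object in a variable and return parseResponse(response.status_code, response.text) instead of calling json.loads directly.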
Authenticating Users
Now that we’ve defined our request functions, we can start using the API. Dremio uses a token-based authentication system, so we first need to authenticate ourselves by generating a token. We can do this by using the login endpoint along with your username and password as the body. Note that we are currently using an older API for logging in.
def login(username, password):
    # we log in using the old API for now
    loginData = {'userName': username, 'password': password}
    # use the server defined earlier rather than a hardcoded host
    response = requests.post('{server}/apiv2/login'.format(server=dremioServer), headers=headers, data=json.dumps(loginData))
    data = json.loads(response.text)

    # retrieve the login token
    token = data['token']
    return {'content-type': 'application/json', 'authorization': '_dremio{authToken}'.format(authToken=token)}

headers = login(username, password)
The login function will return the header that we must pass in to all of our other API requests.
Querying Data Using the API
In previous tutorials you learned how to bring data into Dremio, either by uploading it from your local machine or through one of our connectors (Amazon S3, Microsoft ADLS, and many others). Let's use the SQL and Job endpoints to query this data and return rows. A dataset is addressed by its path, which you can find in the UI. Define the path as an array, join its components with periods, and use the result in the SQL query you send to the API. Today I'm using a dataset of all WTA tennis matches, and my SQL query returns all of Serena Williams's matches.
def querySQL(query):
    queryResponse = apiPost('sql', body={'sql': query})
    jobid = queryResponse['id']
    return jobid

path = ['"@elbert"', 'wta', 'matches']
path = '.'.join([str(x) for x in path])
query = "SELECT * FROM {source} WHERE winner_name = 'Serena Williams' or loser_name = 'Serena Williams'".format(source=path)
jobid = querySQL(query)
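In the snippet above the home space is quoted by hand because its name starts with @. A small helper can apply the same double-quoting to every path component, so that names containing spaces, periods, or other special characters also work. This is a sketch: the name quotePath is illustrative, and doubling an embedded double quote follows the standard SQL convention, which is assumed here to apply to Dremio as well.

```python
def quotePath(components):
    """Join path components into a dotted, double-quoted SQL identifier."""
    quoted = []
    for component in components:
        # standard SQL escapes a double quote inside an identifier by doubling it
        escaped = str(component).replace('"', '""')
        quoted.append('"{0}"'.format(escaped))
    return '.'.join(quoted)


print(quotePath(['@elbert', 'wta', 'matches']))
# prints "@elbert"."wta"."matches"
```

The result can be passed as the source argument when formatting the query string.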
The SQL endpoint returns a job ID that you can also access in the interface. If your query was successful, there will be a green checkmark along with associated metadata.
Now that we have the job ID, we can use the Job endpoint to access those rows. The Job API uses a paging model, so you page through your rows by setting an offset and a limit on each call and repeating the call with a larger offset.
results = apiGet('job/{id}/results?offset={offset}&limit={limit}'.format(id=jobid, offset=0, limit=100))
The Job API model returns a rowCount, the returned rows, and the schema of the table.
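Two practical details are worth sketching here. A query runs as a job, so the results request is expected to succeed only once that job has finished, and a large result set has to be collected page by page. The code below is a sketch under stated assumptions: waitForJob and fetchAllRows are illustrative names, the callables passed in stand for apiGet calls, and the jobState field with the terminal values COMPLETED, FAILED, and CANCELED is an assumption about the job status response that you should check against the Job API documentation. The key names rowCount and rows are likewise assumed to match the response described above.

```python
import time


def waitForJob(getJobStatus, pollSeconds=1.0, maxPolls=60):
    """Poll until the job reaches a terminal state and return that state.

    getJobStatus is a callable returning the job status as a dict, for
    example lambda: apiGet('job/{id}'.format(id=jobid)).
    """
    for _ in range(maxPolls):
        state = getJobStatus().get('jobState')  # assumed field name
        if state in ('COMPLETED', 'FAILED', 'CANCELED'):
            return state
        time.sleep(pollSeconds)
    raise TimeoutError('job did not finish after {n} polls'.format(n=maxPolls))


def fetchAllRows(fetchPage, limit=100):
    """Collect every row of a job by requesting one page after another.

    fetchPage(offset, limit) is a callable returning one page of results
    as a dict with 'rowCount' and 'rows' keys (assumed key names).
    """
    rows = []
    offset = 0
    while True:
        page = fetchPage(offset, limit)
        rows.extend(page['rows'])
        offset += limit
        # stop once past the total, or when a page comes back empty
        if offset >= page['rowCount'] or not page['rows']:
            return rows
```

With the wrappers from this tutorial, fetchPage could be a lambda that calls apiGet on the job results endpoint with the given offset and limit, called only after waitForJob has returned COMPLETED.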
Exploring the Catalog
Dremio’s data catalog is a representation of all of your datasets, spaces, and sources that you have either created or have access to.
The Catalog endpoint gives you access to this.
The following request will return a top level view of the data catalog.
apiGet('catalog')
Each container returned in the response has an associated ID. Given a path to a dataset, you can recursively traverse IDs returned by the Catalog endpoint to get metadata about that dataset. Here’s how:
from urllib.parse import quote

def getCatalogRoot():
    return apiGet('catalog')['data']

def getByPathChildren(path, children, depth):
    # search children for the item we are looking for
    for item in children:
        if item['path'][depth] == path[0]:
            path.pop(0)
            # safe='' also percent-encodes '/', which file and folder IDs may contain
            response = apiGet('catalog/{id}'.format(id=quote(item['id'], safe='')))
            if len(path) == 0:
                return response
            else:
                return getByPathChildren(path, response['children'], depth + 1)
    # no child matched this path component
    return None

def getByPath(path):
    # work on a copy so the caller's list is not emptied by pop()
    path = list(path)
    # get the root catalog
    root = getCatalogRoot()
    for item in root:
        if item['path'][0] == path[0]:
            path.pop(0)
            if len(path) == 0:
                return item
            else:
                response = apiGet('catalog/{id}'.format(id=quote(item['id'], safe='')))
                return getByPathChildren(path, response['children'], 1)
    # no top-level container matched
    return None

dataset = getByPath(['@elbert', 'wta', 'matches'])
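The same lookup can be written as a loop that never modifies the path it is given, which makes it easier to test without a running server. In this sketch, findByPath and the fetchItem callable are illustrative names. fetchItem stands in for a catalog request by ID, and the path, id, and children fields are the ones used by the functions above. Unlike getByPath, it always fetches the matched entry, even for a one-component path.

```python
def findByPath(path, rootItems, fetchItem):
    """Walk the catalog one path component at a time.

    rootItems is the list of top-level containers and fetchItem(id) returns
    the catalog entry for an ID as a dict. Returns the entry for the full
    path, or None if some component is not found.
    """
    candidates = rootItems
    entry = None
    for depth, name in enumerate(path):
        match = next((item for item in candidates
                      if item['path'][depth] == name), None)
        if match is None:
            return None
        entry = fetchItem(match['id'])
        # containers list their contents under 'children'; datasets may not
        candidates = entry.get('children', [])
    return entry
```

With the tutorial's wrappers, rootItems would be the result of getCatalogRoot() and fetchItem a function that calls apiGet on the catalog endpoint with the URL-encoded ID.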
Using the ID returned by getByPath, you can also refresh the reflections for that dataset:
apiPost('catalog/{id}/refresh'.format(id=dataset['id']))
The catalog endpoint also allows you to promote a file or folder to a dataset given a path.
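path2 = ['path', 'to', 'your', 'dataset']
# pass a copy so that path2 is still intact when it is reused below
file = getByPath(list(path2))
newDataset = {
    'entityType': 'dataset',
    'id': file['id'],
    'type': 'PHYSICAL_DATASET',
    'path': path2,
    'format': {'type': 'JSON'}
}
# safe='' also percent-encodes '/', which file and folder IDs may contain
apiPost('catalog/{id}'.format(id=quote(file['id'], safe='')), body=newDataset)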
Sources
Sources represent all the different connectors for Dremio including data from a local machine. To access all of your sources, run:
apiGet('source')
To access a specific source, run:
apiGet('source/{id}'.format(id='<source-id>'))
To create a new source, use a POST request with the correct source type for your specific connector. Here I am connecting to our Elastic cluster with the correct formatting.
# replace each placeholder with your own value; the port is a number
sourceParams = {
    'username': '<your elastic username>',
    'password': '<your elastic password>',
    'hostList': [
        {'hostname': '<your elastic hostname>', 'port': '<your elastic port>'}
    ],
    'authenticationType': 'MASTER',
}
apiPost('source', body=sourceParams)
Conclusion
We hope this has been a gentle but informative introduction to using the REST API to interface with Dremio. To learn more about REST APIs and Dremio, check out our Dremio API documentation.