Intro
In this tutorial, created by Nirmalya Sen, we show how to analyze data stored in Amazon S3 with a Dremio cluster running on EKS in AWS. We also show how to shut down the Dremio cluster and scale the EKS worker nodes down to zero to save on AWS infrastructure costs when the cluster is not in use. For more information about Dremio’s different deployment methods, visit our Deploy page.
Prerequisites
To complete this tutorial, you should have the following tools set up in your shell:
- kubectl
- helm
- awscli
- eksctl
- git
Alternatively, you can use the Docker image dremio/cloud-tools that has all these tools installed.
docker run -i -t dremio/cloud-tools bash
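Before continuing, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch; the function name missing_tools is hypothetical, and it assumes the awscli package is invoked as aws.

```shell
# Print the name of every given tool that is not found on PATH, one per line.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage: list whichever of the tutorial's prerequisites are absent.
# missing_tools kubectl helm aws eksctl git
```

An empty result means every prerequisite was found.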
Configure your shell to use your AWS account:
aws configure
The command prompts you for four pieces of information (access key, secret access key, AWS Region, and output format), and stores them in a profile (a collection of settings) named default. This profile is then used any time you run an AWS CLI command that doesn’t explicitly specify a profile to use. You can find more info on how to get an access key and secret from AWS documentation.
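To confirm that the default profile works, you can ask AWS which account the stored credentials belong to and which Region is configured. This is a sketch only; show_aws_context is a hypothetical helper name, and it assumes the credentials are allowed to call STS.

```shell
# Print the account ID and default Region that the AWS CLI will use.
# Returns non-zero if the credentials are rejected or no Region is set.
show_aws_context() {
  account=$(aws sts get-caller-identity --query Account --output text) || return 1
  region=$(aws configure get region) || return 1
  echo "account=$account region=$region"
}

# Usage:
# show_aws_context
```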
Setting Up EKS Cluster
The quickest way to set up an EKS cluster is with the eksctl tool. You can adjust the number of nodes or the node type based on your needs.
eksctl create cluster \
  --name dremio-test \
  --version 1.12 \
  --nodegroup-name dremio-test-workers \
  --node-type r5d.4xlarge \
  --nodes 5 --nodes-min 0 --nodes-max 5 \
  --node-ami auto \
  --region us-west-2
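Once the command finishes, you may want to check that the worker nodes have joined the cluster. The sketch below counts nodes whose status is exactly Ready; count_ready_nodes is a hypothetical helper, and it assumes kubectl is already pointed at the new cluster.

```shell
# Count the nodes that kubectl reports with a status of exactly "Ready".
# The status is the second column of `kubectl get nodes --no-headers`.
count_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage: with the settings above, you would expect this to reach 5.
# count_ready_nodes
```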
After your EKS cluster is ready, set up Helm in your cluster. If you are using the dremio/cloud-tools Docker image, run helm-init.sh. If you are not using the Docker image, execute the following commands to set up Helm.
kubectl create serviceaccount -n kube-system tiller
kubectl create clusterrolebinding tiller-binding --clusterrole=cluster-admin --serviceaccount kube-system:tiller
helm init --service-account tiller --wait
Deploying Dremio
Dremio publishes Helm charts to deploy Dremio in a Kubernetes cluster. Clone the dremio-cloud-tools repo.
git clone https://github.com/dremio/dremio-cloud-tools.git
Go to the directory dremio-cloud-tools/charts/dremio. In values.yaml, adjust the memory and CPU for the coordinator and executors, and the executor count, to fit your needs. The defaults should work if you have not changed the node type and node count when creating the EKS cluster. You can also configure Dremio to store your uploads in an existing S3 bucket.
Deploy Dremio
helm install . --wait --timeout 900
Once it is deployed, open the Dremio UI, register, and start using it. Get the hostname to connect to from the Kubernetes service.
kubectl get services dremio-client
The value of EXTERNAL-IP in the output of the above command is the hostname to connect to. For example, if EXTERNAL-IP is ae315257aa03911e98bb90e46a9f1e9a-1557651590.us-west-2.elb.amazonaws.com, you would connect to:
http://ae315257aa03911e98bb90e46a9f1e9a-1557651590.us-west-2.elb.amazonaws.com:9047
Register and you are ready to use Dremio.
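The hostname lookup described above can also be scripted. The following sketch reads the load balancer hostname from the dremio-client service and builds the UI URL on port 9047; the helper names dremio_host and dremio_url are hypothetical, and it assumes the service has already been assigned a load balancer hostname.

```shell
# Build the Dremio UI URL from a load balancer hostname (port 9047).
dremio_url() {
  echo "http://$1:9047"
}

# Read the load balancer hostname of the dremio-client service.
# Prints nothing if the load balancer has not been provisioned yet.
dremio_host() {
  kubectl get service dremio-client \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Usage:
# dremio_url "$(dremio_host)"
```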
Analyzing Data in S3
Add your S3 bucket as a data source in Dremio.
You are now ready to analyze your data.
You can run your queries directly in Dremio or you can use other clients to analyze your data.
Shutdown and Restart Dremio
Dremio and the EKS worker nodes do not have to keep running when you are not using them. You can shut down Dremio, scale the EKS worker nodes down to zero, and bring both back up when you need them again.
Shutting down Dremio
Find the helm release running Dremio.
helm list
Delete the release. For example, if the release name is invited-narwhal:
helm delete --purge invited-narwhal
When the following command returns no results, you can scale down the EKS cluster.
kubectl get pods
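Instead of re-running the command by hand, you could poll until no pods remain. This is a minimal sketch with a hypothetical helper name, wait_for_no_pods; it checks only the current namespace and assumes nothing else is running there.

```shell
# Poll until `kubectl get pods` lists nothing in the current namespace.
# $1: seconds to sleep between checks (defaults to 5).
# Note: an error from kubectl also produces empty output, so this sketch
# does not distinguish "no pods" from "kubectl failed".
wait_for_no_pods() {
  interval=${1:-5}
  while [ -n "$(kubectl get pods -o name)" ]; do
    sleep "$interval"
  done
}

# Usage:
# wait_for_no_pods 10
```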
Scale down EKS cluster
If you changed the cluster name or node group name when creating the EKS cluster, use those names in the command below.
eksctl scale nodegroup \
  --cluster dremio-test \
  --name dremio-test-workers \
  --nodes 0 \
  --region us-west-2
Scale up EKS cluster
This is the same as the command to scale down the cluster, except for the number of nodes.
eksctl scale nodegroup \
  --cluster dremio-test \
  --name dremio-test-workers \
  --nodes 5 \
  --region us-west-2
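Because scaling down and scaling up differ only in the node count, the two commands can be folded into one small helper. The sketch below is illustrative: scale_workers and the optional CLUSTER, NODEGROUP, and REGION variables are hypothetical names, with defaults taken from the values used in this tutorial.

```shell
# Scale the worker node group to the number of nodes given as $1.
# CLUSTER, NODEGROUP, and REGION may be set to override the tutorial defaults.
scale_workers() {
  eksctl scale nodegroup \
    --cluster "${CLUSTER:-dremio-test}" \
    --name "${NODEGROUP:-dremio-test-workers}" \
    --nodes "$1" \
    --region "${REGION:-us-west-2}"
}

# Usage:
# scale_workers 0   # scale down when Dremio is not in use
# scale_workers 5   # scale back up before re-installing Dremio
```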
Re-install Dremio
Once your EKS cluster is scaled up, you can install Dremio again.
helm install . --wait --timeout 900
Installing Dremio again restores the existing Dremio metadata, because Kubernetes retains persistent volumes even when the stateful sets using them are deleted. The hostname for the Dremio cluster will be different, so find it the same way as before: use the EXTERNAL-IP value in the output of the following command.
kubectl get services dremio-client
Log in with the user you created when you registered Dremio the first time.
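The metadata restore described above depends on the persistent volume claims surviving the release deletion. If you want to confirm that they are still present, for example after deleting the release and before re-installing, you could count them. This is a sketch with a hypothetical helper name, count_claims, and it looks only at the current namespace.

```shell
# Count the persistent volume claims in the current namespace.
count_claims() {
  kubectl get pvc -o name | awk 'END { print NR }'
}

# Usage: a non-zero count after deleting the release suggests the
# volumes were retained.
# count_claims
```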
Conclusion
In this tutorial, we walked through the steps of using Amazon’s EKS to deploy Dremio on AWS, and we connected Dremio to an S3 bucket to analyze the data contained in it. With Helm, deploying Dremio on EKS takes only a few commands. If you would like to learn more about how you can gain insights from your data faster, check out the rest of our tutorials and also Dremio University.