Data types in Dremio fall into two categories: primitive types such as INT and VARCHAR that hold single values, and semi-structured types like LIST, STRUCT, and MAP that hold complex values.
Arrays are lists of arbitrary size of any single type, indexed by non-negative integers, and are useful for holding sparse data.
Note: LIST and ARRAY are synonyms in Dremio, so you will see references to both in documentation, error messages, and function names.
Dremio Cloud as of mid-September and Dremio Software version 24.2 provide a variety of functions for manipulating array data.
The remainder of this article describes array functions in Dremio.
Try Dremio’s Interactive Demo
Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI
Dremio Array Functions
Creating Arrays
In Dremio Software v24.2+ and in Dremio Cloud, you create array literals with the ARRAY keyword like this:
You can also query array columns from tables and views in the usual manner. The Python 3 script below creates a file example.parquet that has array columns named fruits and numbers:
#!/usr/bin/env python3
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.table(
{
"fruits": [("apple", "strawberry", "banana")],
"numbers": [(1, 2, 3,)],
}
)
pq.write_table(table, "example.parquet")
After uploading example.parquet you can query fruits and numbers through Dremio as you would any columns:
SELECT fruits FROM example;
-- ["apple","strawberry","banana"]
SELECT numbers FROM example;
-- [1, 2, 3]
Available Functions
The functions below are available in Dremio Software v24.2 and in Dremio Cloud:
Signature
Description
array_avg(A)
Returns the average of all non-null elements in A.
array_contains(A, V)
Returns whether A contains V.
array_max(A)
Returns the maximum value in A.
array_min(A)
Returns the minimum value in A.
array_remove(A, V)
Removes all elements that equal V in A.
array_sum(A)
Returns the sum of all non-null elements in A.
cardinality(A)
Returns the number of elements in A.
unnest(A)
Converts elements in A into rows.
Examples
SELECT ARRAY_AVG(numbers) FROM example;
-- 2.000000
SELECT ARRAY_CONTAINS(fruits, 'banana') FROM example;
-- true
SELECT ARRAY_MAX(numbers) FROM example;
-- 3
SELECT ARRAY_MIN(numbers) FROM example;
-- 1
SELECT ARRAY_REMOVE(fruits, 'strawberry') FROM example;
-- ["apple","banana"]
SELECT ARRAY_SUM(numbers) FROM example;
-- 6
SELECT CARDINALITY(fruits) FROM example;
-- 3
SELECT fruit FROM example, UNNEST(fruits) as t(fruit);
-- apple
-- strawberry
-- banana
Coming Soon
The functions below are planned for Dremio Software v24.3 and the November 2023 update of Dremio Cloud:
Signature
Description
array_agg(expr)
Returns an array consisting of all values in expr.
array_append(A, E)
Returns a new array with E at the end of A.
array_distinct(A)
Returns a new array with only the distinct elements from A.
array_frequency(A)
Returns a map where the keys are the unique elements in A, and the values are how many times the key appears.
array_prepend(A, E)
Returns a new array with E at the beginning of A.
arrays_overlap(X, Y)
Returns whether X and Y have any elements in common.
set_union(X, Y, ...)
Returns an array of all the distinct values contained in each array of the input.
Get Started with Dremio Cloud – It’s Free!
Dremio Cloud: The easy and open, fully managed data lakehouse platform.
Ingesting Data Into Apache Iceberg Tables with Dremio: A Unified Path to Iceberg
By unifying data from diverse sources, simplifying data operations, and providing powerful tools for data management, Dremio stands out as a comprehensive solution for modern data needs. Whether you are a data engineer, business analyst, or data scientist, harnessing the combined power of Dremio and Apache Iceberg will undoubtedly be a valuable asset in your data management toolkit.
Oct 12, 2023·Product Insights from the Dremio Blog
Table-Driven Access Policies Using Subqueries
This blog helps you learn about table-driven access policies in Dremio Cloud and Dremio Software v24.1+.
Aug 31, 2023·Dremio Blog: News Highlights
Dremio Arctic is Now Your Data Lakehouse Catalog in Dremio Cloud
Dremio Arctic bring new features to Dremio Cloud, including Apache Iceberg table optimization and Data as Code.