Introduction
Data types in Dremio fall into two categories: primitive types such as INT and VARCHAR that hold single values, and semi-structured types like LIST, STRUCT, and MAP that hold complex values.
Arrays are lists of arbitrary size of any single type, indexed by non-negative integers, and are useful for holding sparse data.
Note: LIST and ARRAY are synonyms in Dremio, so you will see references to both in documentation, error messages, and function names.
Dremio Cloud as of mid-September and Dremio Software version 24.2 provide a variety of functions for manipulating array data.
The remainder of this article describes array functions in Dremio.
Dremio Array Functions
Creating Arrays
In Dremio Software v24.2+ and in Dremio Cloud, you create array literals with the ARRAY keyword like this:
SELECT ARRAY['apple', 'strawberry', 'banana'] -- ["apple","strawberry","banana"]
Dremio Software versions before v24.2 do not have array literals, but you can get the same result using CONVERT_FROM and JSON strings. For example:
SELECT CONVERT_FROM('["apple", "strawberry", "banana"]', 'json') -- ["apple","strawberry","banana"]
You can also query array columns from tables and views in the usual manner. The Python 3 script below creates a file example.parquet
that has array columns named fruits
and numbers
:
#!/usr/bin/env python3 import pyarrow as pa import pyarrow.parquet as pq table = pa.table( { "fruits": [("apple", "strawberry", "banana")], "numbers": [(1, 2, 3,)], } ) pq.write_table(table, "example.parquet")
After uploading example.parquet
you can query fruits
and numbers
through Dremio as you would any columns:
SELECT fruits FROM example; -- ["apple","strawberry","banana"] SELECT numbers FROM example; -- [1, 2, 3]
Available Functions
The functions below are available in Dremio Software v24.2 and in Dremio Cloud:
Signature | Description |
array_avg(A) | Returns the average of all non-null elements in A. |
array_contains(A, V) | Returns whether A contains V. |
array_max(A) | Returns the maximum value in A. |
array_min(A) | Returns the minimum value in A. |
array_remove(A, V) | Removes all elements that equal V in A. |
array_sum(A) | Returns the sum of all non-null elements in A. |
cardinality(A) | Returns the number of elements in A. |
unnest(A) | Converts elements in A into rows. |
Examples
SELECT ARRAY_AVG(numbers) FROM example; -- 2.000000 SELECT ARRAY_CONTAINS(fruits, 'banana') FROM example; -- true SELECT ARRAY_MAX(numbers) FROM example; -- 3 SELECT ARRAY_MIN(numbers) FROM example; -- 1 SELECT ARRAY_REMOVE(fruits, 'strawberry') FROM example; -- ["apple","banana"] SELECT ARRAY_SUM(numbers) FROM example; -- 6 SELECT CARDINALITY(fruits) FROM example; -- 3 SELECT fruit FROM example, UNNEST(fruits) as t(fruit); -- apple -- strawberry -- banana
Coming Soon
The functions below are planned for Dremio Software v24.3 and the November 2023 update of Dremio Cloud:
Signature | Description |
array_agg(expr) | Returns an array consisting of all values in expr. |
array_append(A, E) | Returns a new array with E at the end of A. |
array_distinct(A) | Returns a new array with only the distinct elements from A. |
array_frequency(A) | Returns a map where the keys are the unique elements in A, and the values are how many times the key appears. |
array_prepend(A, E) | Returns a new array with E at the beginning of A. |
arrays_overlap(X, Y) | Returns whether X and Y have any elements in common. |
set_union(X, Y, ...) | Returns an array of all the distinct values contained in each array of the input. |
Get Started with Dremio Cloud – It’s Free!
Dremio Cloud: The easy and open, fully managed data lakehouse platform.
Everything you need to build, automate, and query your data lakehouse in production.