5 minute read · March 3, 2021
Announcing Dremio February 2021
· Senior Director Product Management, Dremio
Today we are excited to announce the release of Dremio February 2021!
This month’s release delivers powerful new features, including support for the Delta Lake open source table format and improved performance when filtering on complex data types. In total, this release includes over 100 improvements!
Delta Lake
Delta Lake is an open source table format that provides transactional consistency and increased scale for Data Lake datasets by creating a consistent definition of datasets, including both schema evolution changes and data mutations. With Delta Lake, updates to datasets are viewed in a consistent manner across any application consuming the datasets, and users are prevented from seeing an inconsistent view of data during transformation. This creates a consistent and reliable view of datasets in the data lake as they are updated and evolve.
Data consistency is enabled by creating a series of manifest files, which define the schema and data for a given point in time, as well as a transaction log that defines an ordered record of every transaction on the dataset. By reading the transaction log and manifest files, applications are guaranteed to see a consistent view of the data at any point in time, and writers can ensure that intermediate changes are not visible until a write operation is complete.
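To make the idea of an ordered transaction log concrete, here is a minimal sketch in Python. It is a toy model, not Delta Lake's actual on-disk format or Dremio's reader: the `LOG` structure, the file names and the `snapshot` function are all hypothetical, and only simplified "add" and "remove" actions are modeled. Replaying the log up to a given version yields the set of data files that make up the dataset at that point in time.

```python
from typing import Dict, List, Set

# Toy transaction log (hypothetical): one list entry per committed
# transaction, and the list index is the dataset version.
LOG: List[List[Dict[str, str]]] = [
    [{"add": "part-000.parquet"}, {"add": "part-001.parquet"}],     # version 0
    [{"add": "part-002.parquet"}],                                   # version 1
    [{"remove": "part-000.parquet"}, {"add": "part-003.parquet"}],  # version 2
]


def snapshot(log: List[List[Dict[str, str]]], version: int) -> Set[str]:
    """Replay commits 0..version in order and return the live data files."""
    if not 0 <= version < len(log):
        raise ValueError(f"unknown version {version}")
    live: Set[str] = set()
    for commit in log[: version + 1]:
        for action in commit:
            if "add" in action:
                live.add(action["add"])
            elif "remove" in action:
                live.discard(action["remove"])
    return live


if __name__ == "__main__":
    for v in range(len(LOG)):
        print(v, sorted(snapshot(LOG, v)))
```

Because a reader only ever considers fully committed log entries, it never observes a half-finished write, and asking for an older version is simply a shorter replay.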
Delta Lake provides:
- Large-scale support – Efficient metadata handling enables applications to readily process petabyte-sized datasets with millions of files
- Schema consistency – All applications processing a dataset operate on a consistent and shared definition of the dataset’s columns, data types, partitions, etc.
- Time travel – Historical snapshots enable queries to analyze datasets at any point in the historical record, roll back to prior versions and create full audit trails
- Transactions – Datasets can be modified by Spark SQL with ACID transactional consistency
Starting with this month’s release, Dremio supports analyzing Delta Lake datasets through a native, high-performance reader. Dremio automatically identifies which datasets are saved in the Delta Lake format and imports table information from the Delta Lake manifest files. Dataset promotion is seamless and works the same way as for any other format in Dremio: users can promote a file system directory containing a Delta Lake dataset to a table manually, or automatically by querying the directory. When using the Delta Lake format, Dremio supports datasets of any size, including petabyte-sized datasets with billions of files. Delta Lake support is currently offered in Preview and can be enabled by setting the support key “dremio.deltalake.enabled” to true.
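As a rough illustration of what format detection can look like, the sketch below checks a directory for the `_delta_log` subdirectory that Delta Lake tables keep alongside their data files. This is an assumption about one possible heuristic, not a description of how Dremio identifies Delta Lake datasets; the function name `looks_like_delta_table` is hypothetical.

```python
import tempfile
from pathlib import Path


def looks_like_delta_table(directory: Path) -> bool:
    """Heuristic: a Delta Lake table directory contains a _delta_log
    subdirectory holding at least one JSON commit file."""
    log_dir = directory / "_delta_log"
    return log_dir.is_dir() and any(log_dir.glob("*.json"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(looks_like_delta_table(root))  # plain directory
        (root / "_delta_log").mkdir()
        (root / "_delta_log" / "00000000000000000000.json").write_text("{}")
        print(looks_like_delta_table(root))  # directory with a commit file
```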
Pushdown Filtering on Complex Data Types
Complex data types are nested data structures consisting of one or more levels of STRUCT and ARRAY data types and are typically used to store unstructured data where new fields are added over time. Prior to this release, filters on nested data structures, such as “WHERE employee_data.name.last_name = xxx”, were applied after reading the entire column.
Starting with the February 2021 release, filters applied to nested columns are pushed down to the scan operation. Filtering during the read prunes unneeded data from consideration and speeds up reads of large datasets. Pushdown filtering on complex data types is enabled by default, so Dremio automatically improves query performance in these cases with no user involvement.
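The difference between filtering after the read and filtering during the scan can be sketched with a toy model in Python. Everything here is hypothetical (the row-group layout, the min/max statistics and the function names) and it is not Dremio's implementation; it only illustrates how a filter on a nested field such as `employee_data.name.last_name` can let a scan skip whole blocks of data.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
PATH = "employee_data.name.last_name"


def make_row(last_name: str) -> Row:
    """Build a row with the nested STRUCT shape used in the example filter."""
    return {"employee_data": {"name": {"last_name": last_name}}}


def get_path(row: Row, path: str) -> Any:
    """Walk a dotted path through nested dictionaries."""
    value: Any = row
    for key in path.split("."):
        value = value[key]
    return value


def make_group(names: List[str]) -> Dict[str, Any]:
    """Hypothetical row group: rows plus (min, max) stats for the nested field."""
    return {"stats": (min(names), max(names)), "rows": [make_row(n) for n in names]}


ROW_GROUPS = [
    make_group(["Adams", "Clark", "Evans"]),
    make_group(["Fisher", "Kim", "Moore"]),
    make_group(["Nolan", "Smith", "Young"]),
]


def scan_then_filter(groups, path: str, target: str) -> Tuple[List[Row], int]:
    """Read every row, then apply the filter (behavior before pushdown)."""
    rows_read, matches = 0, []
    for group in groups:
        for row in group["rows"]:
            rows_read += 1
            if get_path(row, path) == target:
                matches.append(row)
    return matches, rows_read


def scan_with_pushdown(groups, path: str, target: str) -> Tuple[List[Row], int]:
    """Apply the filter inside the scan and skip groups whose stats rule it out."""
    rows_read, matches = 0, []
    for group in groups:
        low, high = group["stats"]
        if not (low <= target <= high):
            continue  # the whole group is pruned without reading its rows
        for row in group["rows"]:
            rows_read += 1
            if get_path(row, path) == target:
                matches.append(row)
    return matches, rows_read


if __name__ == "__main__":
    print(scan_then_filter(ROW_GROUPS, PATH, "Smith")[1])    # rows read without pushdown
    print(scan_with_pushdown(ROW_GROUPS, PATH, "Smith")[1])  # rows read with pushdown
```

Both functions return the same matching rows; the pushdown version simply touches fewer of them, which is where the performance gain comes from.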
Hostname Customization for Tableau and Power BI Integrations
For security, some corporate environments configure the Dremio UI to be accessed through a different URL hostname than the one used by client tools such as Tableau and Power BI, which access Dremio over ODBC/JDBC sessions. To better support these environments, it is now possible to configure the hostname used by the Tableau and Power BI integrations when these tools access Dremio through a different URL than the Dremio UI. To configure it, set the support key “export.bi.hostname” to the appropriate URL.
Wrapping Up!
We are very excited about this release and its capabilities, and we hope you are too. As always, we look forward to your feedback.
For a complete list of new features, enhancements, changes and fixes, please review the release notes. Please post any questions on our community forum, and we’ll do our best to answer them there, along with other members of the Dremio community.