4 minute read · January 15, 2018
Java Vector Enhancements for Apache Arrow 0.8.0
In a recent blog post I wrote about enhancement to vectorized processing in Apache Arrow 0.8.0 in the Java code base. In this post we will contrast latency figures for Arrow 0.7.0 and 0.8.0 to explore how these enhancements have helped to improve the overall performance of Arrow. Note that these changes are specific to the Java implementation.
For Arrow 0.8.0, we made significant enhancements to the Java vector implementation by improving the performance of critical code paths and reducing heap usage. In addition, these changes lay the foundation for additional efforts in the future to further reduce heap usage in vectors.
In the table below we compare PROJECT and FILTER operations of specific queries from the TPC-H benchmark using Dremio. Once these code changes were committed to Arrow, we changed our code in Dremio to move to the new implementation of Java vectors and ran a subset of the standard TPC-H performance tests using a 10 node cluster on GCP.
TPC-H Query | Project (Arrow 0.8) | Project (Arrow 0.7) | Project (% improvement) | Filter (Arrow 0.8) | Filter (Arrow 0.7) | Filter (% improvement) |
Q1 | 101ms | 120ms | 15.8% | N/A | N/A | N/A |
Q3 | 0.2ms | 0.5ms | 60% | N/A | N/A | N/A |
Q4 | 3ms | 3ms | 0% | 162ms | 250ms | 35.2% |
Q5 | 23ms | 38ms | 39.4% | N/A | N/A | N/A |
Q6 | 14ms | 18ms | 22.2% | 42ms | 50ms | 16.0% |
Q7 | 16ms | 25ms | 36.0% | 87ms | 150ms | 42.0% |
Q8 | 14ms | 22ms | 36.4% | N/A | N/A | N/A |
Q9 | 22ms | 34ms | 35.3% | 675ms | 720ms | 6.3% |
Q10 | 15ms | 20ms | 25.0% | N/A | N/A | N/A |
Q11 | 19ms | 20ms | 5.0% | 10ms | 12ms | 16.7% |
Q12 | 30ms | 50ms | 40.0% | 80ms | 142ms | 43.7% |
Q13 | 9ms | 15ms | 40.0% | 3400ms | 3500ms | 2.9% |
Q14 | 11ms | 17ms | 35.3% | N/A | N/A | N/A |
Q17 | 12ms | 19ms | 36.8% | 6ms | 6ms | 0.0% |
Q18 | 12ms | 20ms | 40.0% | 31ms | 48ms | 35.4% |
Q20 | 3ms | 4ms | 25.0% | 36ms | 42ms | 14.3% |
Some notes on these results:
- Project (Arrow 0.8) and Project (Arrow 0.7) represent the processing time (in milliseconds) consumed by the “Project” operator for a particular TPC-H query using the vector implementation in respective Arrow versions.
- Filter (Arrow 0.8) and Filter (Arrow 0.7) represent the processing time (in milliseconds) consumed for predicate evaluation using the vector implementation in respective Arrow versions.
- The presence of N/A for Filter (Arrow 0.8) and Filter (Arrow 0.7) indicates that the particular query didn’t require any WHERE clause evaluation and thus no “Filter” operator.
The reason for choosing specific operators for performance comparison rather than full elapsed time of query is that the performance improvements from the new vector implementation are likely to impact the non-vectorized operators since here we work with vector APIs extensively. Dremio’s vectorized query processing algorithms are very optimized and bypass most of the Arrow code stack to interact directly with the Arrow in-memory buffers to perform the meaningful work.
As Arrow vectors are used extensively in Dremio, this new implementation has provided overall performance improvements for our users as a result of fewer objects being created inside Arrow, and more efficient function calls associated with the vector APIs.
We are excited to be able to make these enhancements knowing there is much more we can do to further improve the efficiency of operations in Apache Arrow. Stay tuned for more in the near future!