How to Create an ARP Connector
Intro
Dremio’s Advanced Relational Pushdown (ARP) framework allows data consumers and developers to create custom relational connectors for those cases where data is stored in uncommon locations. Using the ARP framework not only allows you to create better connectors with improved push-down abilities but it also provides a way to develop connectors more efficiently. In this tutorial we will walk through the steps of using the ARP connector template to develop a connector, we will discuss best practices, things to avoid, potential problems and how to fix them.
Assumptions
For this tutorial we will use the latest deployment of Dremio which you can download from our deploy page. Also, for the purposes of this tutorial we will use SQLite, a software library that provides a lightweight serverless relational database management system. You can obtain its JDBC driver from their Github page.
Setting Things Up
Before we start working on our SQLite connector, let’s take a look at what are the steps that we will take to get this connector ready to go. There are two files that are necessary for the creation of an ARP-based plugin:
- Storage plugin configuration: [data source]Conf.java
- Plugin ARP YAML file: [data source].arp.yaml
The storage plugin configuration file tells Dremio what the name of the plugin should be, what connection options should be displayed in the source UI such as host address, user credentials, etc., what the name of the ARP file is, which JDBC driver to use and how to make a connection to the JDBC driver.
The Plugin ARP YAML file is what is used to modify the SQL queries that are sent to the JDBC driver, allowing you to specify support for different data types and functions, as well as rewrite them if tweaks need to be made for your specific data source.
The ARP file is broken down into several sections:
Metadata: This section outlines some high level metadata about the plugin.
Syntax: This section allows for specifying some general syntax items like the identifier quote character.
Data Types: This section outlines which data types are supported by the plugin, their names as they appear in the JDBC driver, and how they map to Dremio types.
Relational Algebra: This section is divided up into a number of other subsections:
- Aggregation: Specify what aggregate functions, such as SUM, MAX, etc, are supported and what signatures they have. You can also specify a rewrite to alter the SQL for how this is issued.
- except/project/join/sort/union/union_all/values: These sections indicate if the specific operation is supported or not.
- Expressions: This section outlines general operations that are supported. The main sections are:
- Operators: Outlines which scalar functions, such as SIN, SUBSTR, LOWER, etc, are supported, along with the signatures of these functions which are supported. Finally, you can also specify a rewrite to alter the SQL for how this is issued.
- Variable_length_operators: The same as operators, but allows specification of functions which may have a variable number of arguments, such as AND and OR.
If an operation or function is not specified in the ARP file, then Dremio will handle the operation itself. Any operations which are indicated as supported but need to be stacked on operations which are not will not be pushed down to the SQL query.
Understanding the ARP Template
The conf or Storage Plug-in file for the ARP connector provides Dremio with the necessary information about the JDBC driver that we will use, the name of the connector, GUI connection options and much more. In this section I’ll identify which elements were edited to work with SQLite. At first, if you are not a day-to-day coder, this file might look intimidating, but the changes needed are very straightforward and foolproof. Open the SqliteConf.java file and notice every instance of Sqlite.
Line 38
@SourceType(value = "SQLITE", label = "SQLite")
Line 39
public class SqliteConf extends AbstractArpConf<SqliteConf> {
And Line 40
private static final String ARP_FILENAME = "arp/implementation/sqlite-arp.yaml";
In addition to these lines, we have indicated the JDBC driver class for the driver that we will use in line 43:
Note that this driver class is different for each data source, you will need to look up in your source’s documentation to see what is the driver class for that source.
private static final String DRIVER = "org.sqlite.JDBC";
Now we need to enable the GUI fields for the connection parameters. In this case SQLite only needs the following parameters: Database and Record fetch size.
@NotBlank
@Tag(1)
@DisplayMetadata(label = "Database")
public String database;
@Tag(2)
@DisplayMetadata(label = "Record fetch size")
@NotMetadataImpacting
public int fetchSize = 200;
These lines will be responsible for the following items in the GUI
By default, each field is considered “metadata impacting”, which means that a change of the value of that field could result in different metadata that Dremio will see. For example, the service account username would be metadata impacting because different users could have different permissions and therefore see different objects. The @NotMetadataImpacting tag tells Dremio that this field is not considered “metadata impacting”.
Setting Up the Database URL
Before continuing with this step, read the documentation for your data source and find out what is the correct structure for its JDBC URL, in the case of SQLite it is: org.sqlite.JDBC Because we are capturing the database name only in the UI, we need to provide that value to the JDBC URL in the toJdbcConnectionString class, to do so, make sure the variable names inside this class match the names of the parameters captured in the section above. In our case the class will look as follows:
@VisibleForTesting
public String toJdbcConnectionString() {
final String database = checkNotNull(this.database, "Missing database.");
return String.format("jdbc:sqlite:%s", database);
Save and close the file.
Editing The Plug-in ARP File
This file is used to modify the SQL queries that are sent to the JDBC driver, here we will indicate what are the different data types supported as well as SQL functions supported by the datasource. It is a good practice to check the datasource’s documentation to look for this information. You can see the original file here. In lines 19, and 20 we indicate the name of the API as well as the datasource that we will use.
metadata:
# Manually Configured Metadata Section.
name: SQLITE
apiname: sqlite
spec_version: '1'
The syntax section allows for specifying some general syntax items like the identifier quote character.
syntax:
# Manually Configured Syntax Section.
identifier_quote: '"'
identifier_length_limit: 128
allows_boolean_literal: false
map_boolean_literal_to_bit: false
supports_catalogs: false
supports_schemas: false
Now we can move on to the data_types section, this part of the file outlines which data types are supported by the plugin, their names as they appear in the JDBC driver, and how they map to Dremio types. You will need to refer to your data source’s documentation to complete the mapping.
First we will do the basic mappings for numeric types, date types and string types. We will list some of them here, you can refer to the final YAML file to see all of them.
Data types:
data_types:
mappings:
# Manually Configured Data Types Mappings Section.
- source:
name: "INTEGER"
dremio:
name: "integer"
required_cast_arguments: "none"
- source:
name: "REAL"
dremio:
name: "double"
required_cast_arguments: "none"
- source:
name: "VARCHAR"
dremio:
name: "varchar"
required_cast_arguments: "none"
Building the Connector
Now that we have all the files ready, we will go to the final step of creating the ARP connector. Before continuing, make sure your connector directory contains the following files in the appropriate paths:
- Pom.xml
- Readme.md
- Src (folder) ->Main->Resources->Arp->Implementation->sqlite-arp.yaml
- Scr->Main->Java->Com->dremio->exec->store->jdbc->conf->SqliteConf.java
Now, verify that Maven is installed on your computer. Once ready using the terminal, navigate to the directory where the pom, readme and src files are located and run the following command:
After a few minutes, you will see a Build Success message on the terminal, and also there will be a Target directory created in your current working path.
If there are any errors in the build, Maven will display the exact location of the issue so you can troubleshoot it, once ready simply save the file and run the build command again.
Installing The Connector
First, make sure Dremio is not running, then, from the target **directory, copy the **.jar file and place it inside the /$DREMIO_HOME/jars directory. Additionally, move the JDBC driver to the /$DREMIO_HOME/jars/3rdparty directory. Start and log in in Dremio. Inside Dremio, click on the plus sign to add a new Data source
Select SQLite from the list of available sources.
Add the connection and authentication parameters and click save.
If everything went well, you should be able to see the directories inside your data source.
Things that could go wrong
While this method is very straightforward, there is always the possibility that something might need certain level of troubleshooting. Here are some of the common issues that you might encounter and how to solve them.
Maven Build Failures
When running the mvn clean install **command, you might see the following error on your terminal. This means that something is not ok with the **[your source]Conf.java file.
Thankfully, Maven is good about pinpointing the coordinates (line and character, issue, and file causing it) of where the issue is located.
These are some of the things to look for in the [your source]Conf.java file:
- Double check that the private static final String DRIVER = contains the correct JDBC class for the data source that you are working with.
- Check that the @DisplayMetadata variable names (username, password, etc) are uniform across the entire file.
- Make sure that the JDBC string is correctly built
Pushdown Failures
To debug pushdowns for queries, open the logback.xml file located in the dremio/conf directory and add the following lines:
<logger name="com.dremio.exec.store.jdbc">
<level value="${dremio.log.level:-trace}"/>
</logger>
After running queries, you might notice lines in the server.log file like the following:
- 2019-07-11 18:56:24,001 [22d879a7-ce3d-f2ca-f380-005a88865700/0:foreman-planning] DEBUG c.d.e.store.jdbc.dialect.arp.ArpYaml - Operator / not supported. Aborting pushdown.
To troubleshoot aborted pushdowns, always refer to the [your source]-arp.yaml and double check your data source’s documentation to make sure the supported SQL functions are listed correctly.
Conclusion
In earlier versions of Dremio each connector was developed on an independent code path. Starting in Dremio 3.0 we have developed an all-new declarative framework (ARP) for developing relational connectors. This allows us to standardize on a single code base that is now more efficient, provides better push-down abilities, and is easier for us to maintain. In this tutorial we walked through the steps of developing a custom ARP connector for SQLite, we also highlighted what are some of the issues that users may come across when developing the connector and how to solve them. We hope you find this tutorial useful, stay tuned to learn more about how Dremio can help you gain insights from your data, faster.