MySQL is "The world's most popular open source database" and provides a robust, widely-used way to store and access your data. In fact, many cloud databases have adopted the MySQL interface over other options. There are several native, open-source drivers available for connecting to your MySQL data from other applications.
Preparation
This article will serve to compare the native, open-source JDBC Driver from MySQL (Connector/J1), the MariaDB (a popular MySQL fork) Connector/J2, and the licensed JDBC Driver from CData (JDBC Driver for JDBC 20163). In order to provide a benchmark, we installed MySQL Server 5.7 on a test machine and inserted a relatively large dataset into the database.
The test machine specifications are as follows:
Operating System: Windows 7 Ultimate, SP1
Processor: Intel® CoreTM i3-2120 CPU @ 3.30GHz
Installed Memory (RAM): 8.00 GB
System type: 64-bit Operating System
Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.
As a side note, we had plans to include other commercial JDBC drivers in our comparison, but they did not support connecting to the Community Edition of MySQL Server.
Comparison
To begin the comparison, we created a new database in MySQL and then created a table called "restaurant", described below:
+-----------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------------+---------------+------+-----+---------+-------+ | _id | int(10) | NO | PRI | | | | grades | varchar(2000) | YES | | NULL | | | address_street | varchar(255) | YES | | NULL | | | address_zipcode | int(11) | YES | | NULL | | | name | varchar(255) | YES | | NULL | | | address_coord | varchar(255) | YES | | NULL | | | building | varchar(127) | YES | | NULL | | | restaurant_id | bigint(20) | NO | | NULL | | | borough | varchar(255) | YES | | NULL | | | cuisine | varchar(255) | YES | | NULL | | | avg_score | double | YES | | NULL | | +-----------------+---------------+------+-----+---------+-------+
The dataset consists of 1,546,899 rows of data and was extrapolated from the data contained in a MongoDB sample database4. The only fields added to the MongoDB sample database were a new field to hold an arbitrary double value ("avg_score", for use in the aggregate query below) and a separate, unique primary key column ("_id") so that the original dataset could be repeated until a larger data set was created.
The main goal of this investigation was to compare the related performance of the drivers. We did this by running the same four queries with each driver. The queries are listed below:
- [SELECT * FROM restaurant]
- [SELECT _id, grades, avg_score, name, cuisine FROM restaurant]
- [SELECT cuisine, AVG(avg_score) FROM restaurant GROUP BY cuisine]
- *a JDBC metadata request*
The queries above have varying numbers of results and will showcase each driver’s ability to manage both large and small result sets.
Results
For each driver, we established a connection to the database and ran all four queries back-to-back. I performed this operation fifty times per driver and then calculated the average runtime for the set of fifty runs for each query. The results are displayed in the table below.
Query Times by Driver (in ms) | |||
---|---|---|---|
Query | CData | MySQL | MariaDB |
1 | 9786.431 | 14518.182 | 22582.566 |
2 | 8536.838 | 12843.329 | 15582.370 |
3 | 7841.812 | 7602.339 | 7571.693 |
4 | 4.718 | 4.288 | 2.056 |
As can be seen in the results, the CData Driver significantly outperformed the native MySQL Driver and MariaDB when working with large result sets and was comparably performant for other queries.
For the "SELECT * ..." query, there are over 170 million data points to read (over 1.5 million rows times 11 columns) and the CData Driver is able to read the results in approximately 67% of the time it takes the MySQL Driver.
The second query ("SELECT _id, grades, avg_score, ...") has approximately 7.7 million data points to read and the CData Driver is able to read the results in approximately 66% of the time it takes the MySQL Driver.
At this point, it is worth noting that the performance for the CData Driver for the first and second queries is are significantly more similar than either the MySQL driver or the MariaDB driver. This highlights the likelihood that the majority of the time used by the CData Driver is based on web traffic time and server processor time, and not time spent processing the results.
The third query ("SELECT cuisine, AVG(avg_score) ...") has 170 points of data to process (85 rows, resulting from 85 unique values for "cuisine", times 2 columns) and the performance of each driver is almost the same. This can be attributed to the low number of results and the time required by the MySQL Server to calculated the average of the avg_score for each grouping.
The only situation studied in which both the MySQL Driver and MariaDB measurably outperformed the CData Driver was when the drivers were tasked with reading the metadata of the table. While the percent difference in performance seems large (the CData Driver takes over twice as long as the MariaDB Driver), the real-world difference in performance is functionally unnoticable (approximately 2.5 milliseconds, or one-one-hundredth the time it takes to blink an eye5).
The average runtime for each query is compared in the chart below:
Conclusion
Based on the results from the queries, we can conclude that the CData Driver's performance far exceeds that of the native MySQL Connector/J and the MariaDB Connector/J. The benefits of the CData Driver are highlighted most when working with large data sets, processing 1.5 million rows almost as quickly as it processes 85 rows. The developers at CData have done some impressive work to maximize the performance in processing the result set and seem to only be hindered by web traffic and server processing times.