Because Analysis Service is bundled with SQL Server for free, many companies built data warehouses and OLAP cubes on top of SQL Server. DBAs usually use DTS to get data from Oracle and dump it into SQL Server for DSS and OLAP analysis.
When you are ready to do data transformation from Oracle to Microsoft SQL Server, you may face a choice of a few data providers:
- Microsoft ODBC driver for Oracle
- Microsoft OLE DB provider for Oracle
- Oracle provider for OLE DB, and Oracle in OraHomeXX (fill in your version number of your Oracle client). Depending on your setup, you may not see Oracle in OraHomeXX, which is OK, because it is the slowest driver, as you will see in the speed comparison below.
Of the remaining drivers, which one of these should you choose? Which one perform the best when doing data transformation from Oracle to SQL Server? To find out the answer, I did some performance benchmark testing using DTS. But first, let's define the difference between OLE DB and ODBC.
OLE DB and ODBC: what is the difference
OLE DB stands for O
inking and E
mbedding database. It is a standard created by Microsoft based on COM. It is really just a set of C
odel interfaces, or COM interfaces, that provide applications with access to data stored in diverse sources. The OLE DB provider accepts an OLE DB API call to the interface and does what is necessary to process that request against the remote data source. This site http://www.microsoft.com/data/oledb/
has more information on OLE DB. It also provides OLE DB software development kit. This has a lot of information on the interfaces that OLE DB supports.
OLE DB is not intended to be a replacement for ODBC. ODBC, or Open DataBase Connectivity, is a well-established standard for connecting to relational databases. It has been extended a little bit to connect to Excel spreadsheets and text files, but for the most part, it is designed for connecting to relational databases. However, OLE DB can access relational databases as well as nonrelational databases. Typically, your data is stored all over the corporation. There is data in your mail servers, directory services, spreadsheets, and text files. OLE DB allows SQL Server to link to these nonrelational database systems. For instance, if you want to query, through SQL Server, the Active Directory on the domain controller, you couldn't do this with ODBC, because it's not a relational database. However, you could use an OLE DB provider to accomplish that.
In summary, the key difference between OLE DB and ODBC is that OLE DB can provide connection to data stored in non-relational format.
Oracle provider for OLE DB is the fastest
To find out which driver perform the best when transferring data from Oracle to SQL Server, I ran a few tests. I used Oracle9i Enterprise Edition Release 188.8.131.52.0, Oracle Client 9.2, SQL server 2000 Developer Edition, and MDAC 2.7 with the latest patch. Here are steps I took to get my test results:
- I created a view for a table on a test Oracle server, which has 1 million rows and 60 columns, which has decent data size. The view is created to work around the timestamp data type issue I discussed in a different article;
- A table on Microsoft SQL Server is created. This table has the same structure as the view;
- I created a DTS package that use one driver mentioned above;
- Before I run the package, I issue this command on Oracle server.
alter system flush shared_pool;This command clears Oracle database buffer cache so we have a level playing field for testing;
- I then run the package, record the time it took to finish, truncate the table, reboot SQL Server machine;
- I repeat the above steps for all the other database drivers.
Below is my testing results:
|Microsoft ODBC driver for Oracle
|Microsoft OLE DB provider for Oracle
|Oracle provider for OLE DB
|Oracle in OraHome92
From this table, it is easy to tell that Oracle provider for OLE DB is the fastest driver. In a real production system, the overall speed should be better than this, as my SQL Server system is based on a PC. It has just one physical drive. Therefore the log and data files are not placed on different drives.
From my own testing, it is clear that Oracle provider for OLE DB is the best driver to use when transforming data from Oracle to SQL Server using DTS. To make the transformation even faster, if possible, using a single transformation for all columns will result in a much quicker data transfer. When data is transformed column by column, you end up using a number of function calls for one row of data, whereas a single transformation only needs one call for a row of data. If data is transformed column by column, the number of COM function calls equals to the number of columns you are transforming. Obviously, the more calls you make to a COM component and scripting engine, the more overhead that will incur.