Blog Post

Exam Prep 70-463: Data Flow Source Adapters

,

This post is part of a series on this blog that will help me, and hopefully you, pass exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012.  So far, we’ve covered:

The most critical part of any SSIS project is the data flow.  It is, after all, the heart of your ETL process.  In this post, we begin looking at the components of the data flow more closely.  We’ll start with the data flow source adapters.

Data Flow Source Adapters

The data flow source adapter is used to pull data out of a source repository for use within your package.  There are several source adapters to choose from, depending on where your source data resides.

  • ADO NET source – allows connectivity to a data source using an ADO.NET provider.  For example, I’ve used this adapter to pull data out of MySQL databases.
  • CDC source – introduced in SQL Server 2012, this adapter allows you to pull changed data out of a Change Data Capture (CDC)-enabled table.  The CDC source uses an ADO .NET connection manager.
  • Excel source – extract data from an Excel spreadsheet
  • Flat File source – pull data from a delimited or fixed-width file
  • ODBC source – connect to a source using native ODBC
  • OLE DB source – connect to OLE DB providers like SQL Server and Oracle (as mentioned in a previous post, Microsoft is moving away from OLE DB and recommends using ODBC sources instead)
  • Raw File source – raw files are native files generated by SSIS using the Raw File destination.  The Raw File source reads in these files very quickly with little parsing or translation required.
  • XML source – extract data from an XML file using a defined XML schema

Configuring the Source Adapter

When you add a data source adapter, you’ll need to configure it.  This usually entails specifying a connection manager, an access mode (table or view, SQL command, etc.), columns, and any other properties that are specific to the selected adapter.

Let’s look more closely at the ODBC Source adapter as an example.  If I add the adapter to my data flow and double-click it to edit, the first page I see is the Connection Manager page.  Here I can select an ODBC connection manager that I’ve already created in my package, or create a new one.  For the data access mode, I have a choice of a table or a SQL Command.  Depending on which mode I select, I would then specify a table/view (views don’t seem to show up in the drop down list, although you can manually enter the view name) or type in my SQL query.

ODBC Source - Connection Manager

ODBC Source – Connection Manager

Once I’ve specified my source, I can then move to the Columns page to specify which columns I want to pull.  It’s a best practice to limit the columns to those you actually need to extract, to avoid consuming unnecessary resources.

ODBC Source - Columns

ODBC Source – Columns

Further configuration can be performed using the Advanced Editor (right-click on your source and select “Show Advanced Editor…”  In the Component Properties tab you can configure the same properties you would see in the properties window, including the ValidateExternalMetadata and BatchSize properties.  ValidateExternalMetadata tells SSIS whether to validate the data source’s metadata at design time or wait until runtime.  Setting this property to false is useful when the specified source table doesn’t exist now, but will be created by the package itself.  The BatchSize property specifies the number of rows per batch.

On the Input and Output Properties tab of the Advanced Editor, you can see and change how SSIS is mapping the source data types to SSIS data types.  By clicking on the ODBC Source Output node, you can also set the IsSorted property to True if you’re using a source query with an ORDER BY clause.  This will help eliminate the need to sort data inside your SSIS package when using certain transformations (but more on that in a later post).

ODBC Source - Advanced Editor

ODBC Source – Advanced Editor

Summary

As with everything in SSIS, it’s important to use the right tool for the job.  While you may be able to use the ADO .NET, OLE DB, or ODBC source for a particular connection, each offers slightly different capabilities.  Use the appropriate source for your specific scenario.  Next time, we’ll talk about the other side of the ETL equation: the destination.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating