Blog Post

Using SSIS Lookup Transformation in ETL Packages

,

Lookup is one of many transformations available in SQL Server Integration Services (SSIS). Its purpose is to perform a JOIN of two datasets: input columns from a data source and columns in a reference dataset – a table, view or a result of a query. Lookup can be used in various scenarios but this post focuses on how to use it to implement a typical ETL package that performs incremental loads into a Data Warehouse.

Here’s a simplified description of how Lookup works: it tries to match each incoming record with one or more records in the reference table. If there is a match, the incoming record is sent to the match output. If there is no match, Lookup generates an error by default but it can also be configured to send the data to the no match output. There are some more considerations that will be described later on.

ETL PACKAGE WITH LOOKUP

Here’s a complete package that implements incremental updates to the Address table in a Data Warehouse. It connects to a source database, takes all Address records that were either added or modified in last 3 days and then, for each input record, it performs either INSERT or UPDATE in the target Address table:

lookup1

Let’s take a closer look at each element of this package:

The ODBC data source is a MySQL database from which we’re importing recently added or updated Address records:

lookup2
lookup3

Configuring the Lookup Transformation starts by specifying that rows with no matching entries should be sent to the no match output:

lookup4

Next, we select a reference data source which in our ETL scenario is the target Address table in the Data Warehouse:
lookup5

Now we go to the Columns tab and select columns we want to JOIN on, in this case it’s just the address_id:
lookup6

That was all we need to do with the Lookup. The next step is to properly handle the two outputs to INSERT non-matched and UPDATE matched records.

Configuring the INSERT command is rather simple: we add a destination and configure the connection manager and the target table:
lookup7

After selecting the table, we must go to the Mappings tab and confirm that all columns are properly mapped. In our case where the source and target tables have identical structures, this mapping will happen automatically. When column names are different, however, we need to do manual mapping by dragging and dropping source columns over respective target columns:
lookup7a

The UPDATE part of this package requires some more work and even some coding. We will first write a stored procedure that updates a single record in the target table:

CREATE PROCEDURE [dbo].[Address_update]
@address_id [INT] ,
@contact_fname [NVARCHAR](64)  ,
@contact_lname [NVARCHAR](64)  ,
@contact_phone [NVARCHAR](32)  ,
@contact_phone_ext [NVARCHAR](16)  ,
@address1 [NVARCHAR](128)  ,
@address2 [NVARCHAR](64)  ,
@city [NVARCHAR](64)  ,
@state [NVARCHAR](48)  ,
@postal_code [NVARCHAR](20)  ,
@country_code [NVARCHAR](5)  ,
@address_type [NVARCHAR](5)  ,
@record_status [NVARCHAR](2)  ,
@creation_time [DATETIME] ,
@last_update_time [DATETIME] ,
@shipping_address_type_code [NVARCHAR](6)  ,
@address3 [NVARCHAR](64)  ,
@address4 [NVARCHAR](64)  ,
@last_update_timestamp [DATETIME] ,
@contact_phone2 [NVARCHAR](32)  ,
@contact_fax [NVARCHAR](32)  ,
@email [NVARCHAR](128)  ,
@company [NVARCHAR](128)  ,
@static_flag [SMALLINT] ,
@ups_daily_pickup_flag [TINYINT] ,
@default_shipping_location_flag [TINYINT] ,
@address_name [NVARCHAR](100)  ,
@default_billing_location_flag [TINYINT]
AS
UPDATE [dbo].[Address]
   SET address_id=@address_id  ,
contact_fname=@contact_fname,
contact_lname=@contact_lname,
contact_phone=@contact_phone,
contact_phone_ext=@contact_phone_ext,
address1=@address1,
address2=@address2,
city=@city,
state=@state,
postal_code=@postal_code,
country_code=@country_code,
address_type=@address_type,
record_status=@record_status,
creation_time=@creation_time,
last_update_time=@last_update_time,
shipping_address_type_code=@shipping_address_type_code,
address3=@address3,
address4=@address4,
last_update_timestamp=@last_update_timestamp,
contact_phone2=@contact_phone2,
contact_fax=@contact_fax,
email=@email,
company=@company,
static_flag=@static_flag,
ups_daily_pickup_flag=@ups_daily_pickup_flag,
default_shipping_location_flag=@default_shipping_location_flag,
address_name=@address_name,
default_billing_location_flag=@default_billing_location_flag
WHERE address_id = @address_id

Back to our SSIS package, we will use the OLE DB Command transformation that runs a SQL statement for each row in the incoming data flow and, as you can guess, we’ll be calling the stored procedure that we just created. The first tab specifies the connection manager:
lookup8

The Component Properties tab is where we define the SQL command that will be executed. We are going to create a stored procedure called Address_update with one input parameter for each input column:
lookup9

Then we go to the Column Mapping and link available input columns with available SP parameters. Unlike with INSERT, there will be no automatic mapping so we will have to manually draw lines from left to right:
lookup10

That was it, we can now execute the package:
lookup11

ADDITIONAL CONSIDERATIONS

There’s a couple of things to keep in mind when using Lookups:

  • If there are multiple matches in the reference table, the Lookup transformation returns only the first match returned by the lookup query.
  • If there is more than one input record with the same Id and those are not matched, the package will try to INSERT all of them and will fail due to the primary key violation. If we can’t change the data source and we want to tolerate this then we need to configure the Destination to either ignore errors or handle them as appropriate.
  • The lookups performed by the Lookup transformation are case sensitive! If we JOIN on character data, we must use the Character Map transformation to convert the data to uppercase or lowercase.
  • The reference datasets are cached in memory by default until the execution of the package is completed. Depending on the dataset size and available memory, we might need to disable caching to avoid out-of-memory errors.

For additional reading, refer to MSDN and this great post by Rob Farley (t | b).

Did you use Lookup in your ETL or other scenarios and are there any stories you’d like to share?

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating