Use Lookups the Right Way

SSIS Lookup transformations are all too easy to abuse, because they are easy to set up without much thought. Most commonly, developers just drag the Lookup in, set up a reference table, map a column or two, add a column to be returned, and click OK. That will work, no doubt, but it can cause a performance nightmare. So let’s set up a scenario and then break down why you should think twice before taking the easy way out.

Scenario:

Let’s use our favorite friend AdventureWorksDW and say we want to load the FactInternetSales table, which has 8 dimension keys. Those dimensions are:

  1. DimProduct
  2. DimDate (OrderDate)
  3. DimDate (DueDate)
  4. DimDate (ShipDate)
  5. DimCustomer
  6. DimPromotion
  7. DimCurrency
  8. DimSalesTerritory

So if we created an SSIS package to load the table the easy way described earlier, we would have a source and 8 Lookups just to handle getting the dimension keys. If all I did was create my Lookups as table references, I would have a data pull that looked something like this.

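| Table Name | Number of Rows | Number of Times Queried | Row Size (KB) | Total Data Size (KB) |
|---|---|---|---|---|
| DimProduct | 606 | 1 | 7.79 | 4720.74 |
| DimDate | 2191 | 3 | 0.14 | 920.22 |
| DimCustomer | 18484 | 1 | 1.84 | 34010.56 |
| DimCurrency | 105 | 1 | 0.11 | 11.55 |
| DimPromotion | 16 | 1 | 2.12 | 33.92 |
| DimSalesTerritory | 11 | 1 | 0.30 | 3.3 |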

However, we don’t need all the fields from these lookups. We only need the field to join on and the field that supplies the dimension key. Each of the table queries should look more like this.

  1. DimProduct (ProductKey, ProductAlternateKey)
  2. DimDate (DateKey, FullDateAlternateKey)
  3. DimCustomer (CustomerKey, CustomerAlternateKey)
  4. DimPromotion (PromotionKey, PromotionAlternateKey)
  5. DimCurrency (CurrencyKey, CurrencyAlternateKey)
  6. DimSalesTerritory (SalesTerritoryKey, SalesTerritoryAlternateKey)
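
As a rough sketch, the narrowed lookup queries for the list above might look like the following. The column names come straight from the list; the dbo schema is an assumption, so adjust it to match your database. Each query would go into the Lookup's "Use results of an SQL query" option instead of picking the table from the drop-down.

```sql
-- One narrow query per Lookup: the business key to match on
-- plus the surrogate key to return. Nothing else is pulled.
-- The dbo schema is an assumption; adjust to your copy of AdventureWorksDW.

SELECT ProductKey, ProductAlternateKey
FROM dbo.DimProduct;

-- Used by all three date Lookups (OrderDate, DueDate, ShipDate).
SELECT DateKey, FullDateAlternateKey
FROM dbo.DimDate;

SELECT CustomerKey, CustomerAlternateKey
FROM dbo.DimCustomer;

SELECT PromotionKey, PromotionAlternateKey
FROM dbo.DimPromotion;

SELECT CurrencyKey, CurrencyAlternateKey
FROM dbo.DimCurrency;

SELECT SalesTerritoryKey, SalesTerritoryAlternateKey
FROM dbo.DimSalesTerritory;
```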

Now the data footprint looks like this.

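| Table Name | Number of Rows | Number of Times Queried | Row Size (KB) | Total Data Size (KB) | Reduction in Size % |
|---|---|---|---|---|---|
| DimProduct | 606 | 1 | 0.05 | 30.3 | 99.36% |
| DimDate | 2191 | 3 | 0.007 | 15.337 | 98.33% |
| DimCustomer | 18484 | 1 | 0.03 | 554.52 | 98.37% |
| DimCurrency | 105 | 1 | 0.0098 | 0.049 | 99.56% |
| DimPromotion | 16 | 1 | 0.0078 | 0.1248 | 99.63% |
| DimSalesTerritory | 11 | 1 | 0.0078 | 0.0858 | 97.4% |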
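
For anyone who wants to check the arithmetic, here is a small sketch using the DimProduct figures from the two tables: 606 rows, 7.79 KB per row for the full table, and 0.05 KB per row for the two-column query. The column aliases are made up for illustration.

```sql
-- Total size = number of rows * row size (KB);
-- reduction = share of the full-table size that is no longer pulled.
SELECT
    606 * 7.79 AS FullTableKB,   -- 4720.74
    606 * 0.05 AS TwoColumnKB,   -- 30.30
    CAST(100.0 * (1 - (606 * 0.05) / (606 * 7.79)) AS decimal(5, 2)) AS ReductionPct;  -- 99.36
```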

As you can see, the reduction in size is significant, and it is felt in three areas: the database engine, the network, and the SSIS engine all have less data to deal with. So please never point a Lookup at a whole table. Take the time to write the query instead and save yourself from performance headaches down the road. You might make your DBA happy too.
