SQLServerCentral Article

Custom SSMS Shortcuts for ETL Developer. Part 3: Source-Target Mapping

,

This is the third tip designed to help ETL developers be more productive. In the first article we discussed how you can use a keyboard shortcut to call the most often used SELECT query, in the second – how to quickly look up for extended table properties.

Here, consider having a SQL code simplifying source-target data mapping so that:

  • Data warehouse designers can use the resulting table structure for a data warehouse table design
  • SSIS ETL developers can have a complete SQL code ready to use in SSIS source data flow component
  • The entire data warehouse development team can benefit from maintaining consistency between data extracts

To start using this technique, first create a stored procedure, assign a keyboard shortcut to it, and then invoke the shortcut against every source table.

Creating Stored Procedure for Source-Target Mapping

You can use sp_SourceTargetMapping stored procedure to generate source-data mapping code that

  1. Converts source data types
  2. Minimizes space used
  3. Replaces NULLs with default values

1. Converting source data types

The sp_SourceTargetMapping converts source types to destination data types according to the best practices for data warehouse design:

  • Replaces date/time stamps with integer keys and uses smart date keys to partition fact tables
  • Sets 'Y'/'N' values for flag-type attributes to maintain consistency and readability across data of this type

 

For source data type:Converts to:
DATETIMEINT (in yyymmdd format)
BITCHAR(1): 'Y'/'N'

2. Minimizing space used

The sp_SourceTargetMapping calculates minimal textual column size according to maximum length of data used in the source column.

For source data type:Assigns data size to:
VARCHAR, NVARCHAR, CHAR, NCHAR, TEXT, NTEXTMaximum length of source data actually used

 

Although saving space for dimensional attributes is not critical for a data mart, keeping dimensional table column sizes reasonably short allows to:

  • Accelerate ETL lookups
  • Minimize OLAP cube attribute size. This can be valuable for loading big dimensions or executing MDX queries.

    Note. Remember Analysis Services do not support variable data sizes, so your VARCHAR storage benefits will get lost when data is translated to OLAP cube attributes.

3. Replacing NULLs with Default Values

In a data mart, foreign keys should never have NULLs. Avoiding NULLs for dimensional attributes simplifies ETL lookups.

The sp_SourceTargetMapping assigns columns that contain NULLs the following default values:

 

When values of these column types are NULL:Assigns value to:
VARCHAR (MAX), NVARCHAR, CHAR, NCHAR, TEXT, NTEXT'*'
INT, SMALLINT-1
DATETIME-1

Warning. If the default value carries a business meaning, choose another value that is impossible for business.

Tip. Replace the default values with Unknown, Not Applicable, or Not Available with descriptive strings later within database views.

For sp_SourceTargetMapping stored procedure code, download sp_SourceTargetMapping.sql file.

Assigning a Keyboard Shortcut to the Stored Procedure

After you created the sp_SourceTargetMapping stored procedure, assign a keyword shortcut to it so that you can quickly access it.

 

Assigning Keyboard Shortcut to sp_SourceTargetMapping

 

Note: Remember to close SSMS and re-open it to make the shortcut setting effective.

Calling the Stored Procedure with a Keystroke

To execute the stored procedure using a keyboard shortcut:

  1. Drag a table name from SSMS Object Explorer to Query Editor
  2. Select the table
  3. Press Ctrl+4

 

Generated SQL SELECT statement converts data types and sizes

Exploring Data Transformation Results

When you execute the code generated earlier against the "Person.Contact" table of AdventureWorks2008R2 database for example, notice the differences between source and target data types and sizes. The differences are highlighted in red color.

 Source Destination 
Column_nameTypeLengthTypeLength
BusinessEntityIDint4int4
PersonTypenchar4nchar4
NameStyleNameStyle1varchar1
Titlenvarchar16nvarchar8
FirstNameName100nvarchar48
MiddleNameName100nvarchar32
LastNameName100nvarchar44
Suffixnvarchar20nvarchar6
ModifiedDatedatetime8int4

 

The automatically generated code that produced the above results looks like this:

 

SELECT

CONVERT(int, ISNULL([BusinessEntityID],-1)) AS [BusinessEntityID],

CONVERT(nchar (2), ISNULL([PersonType],'*')) AS [PersonType],

CASE NameStyle WHEN 1 THEN 'Y' ELSE 'N' END AS [IsNameStyle],

CONVERT(nvarchar (4), ISNULL([Title],'*')) AS [Title],

CONVERT(nvarchar (24), ISNULL([FirstName],'*')) AS [FirstName],

CONVERT(nvarchar (16), ISNULL([MiddleName],'*')) AS [MiddleName],

CONVERT(nvarchar (22), ISNULL([LastName],'*')) AS [LastName],

CONVERT(nvarchar (3), ISNULL([Suffix],'*')) AS [Suffix],

CONVERT(int, ISNULL([EmailPromotion],-1)) AS [EmailPromotion],

[AdditionalContactInfo],

[Demographics],

[rowguid],

CASE

WHEN ModifiedDate ISNULL

THEN -1

ELSE CONVERT(VARCHAR, [ModifiedDate], 112)

END AS [ModifiedDate]

FROM [Person].[Person]

Realizing Benefits and Next Steps

Now, look at the code above and ask yourself:

  • How long would it take to write this code manually for every data column transformation?
  • How much time would it take to calculate the best size for destination columns?
  • How many revisions the script would go through before you're confident you didn't forget anything?

If your answer is: the amount of time required is significant, then you are ready to start extracting benefits from this solution. You can now quickly generate the data transformation code and be certain that it is consistent with the rules.

I encourage you to start using this technique right away in your daily practice. With time, add your logic and transformation rules to adjust it better to your business needs.

For more information ideas and the best practices for data warehouse design and implementation, read these great books:

Resources

Rate

4.22 (9)

You rated this post out of 5. Change rating

Share

Share

Rate

4.22 (9)

You rated this post out of 5. Change rating