Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Custom SSMS Shortcuts for ETL Developer. Part 3: Source-Target Mapping

By Olga Klimova,

This is the third tip designed to help ETL developers be more productive. In the first article we discussed how you can use a keyboard shortcut to call the most often used SELECT query, in the second – how to quickly look up for extended table properties.

Here, consider having a SQL code simplifying source-target data mapping so that:

  • Data warehouse designers can use the resulting table structure for a data warehouse table design
  • SSIS ETL developers can have a complete SQL code ready to use in SSIS source data flow component
  • The entire data warehouse development team can benefit from maintaining consistency between data extracts

To start using this technique, first create a stored procedure, assign a keyboard shortcut to it, and then invoke the shortcut against every source table.

Creating Stored Procedure for Source-Target Mapping

You can use sp_SourceTargetMapping stored procedure to generate source-data mapping code that

  1. Converts source data types
  2. Minimizes space used
  3. Replaces NULLs with default values

1. Converting source data types

The sp_SourceTargetMapping converts source types to destination data types according to the best practices for data warehouse design:

  • Replaces date/time stamps with integer keys and uses smart date keys to partition fact tables
  • Sets 'Y'/'N' values for flag-type attributes to maintain consistency and readability across data of this type

 

For source data type: Converts to:
DATETIME INT (in yyymmdd format)
BIT CHAR(1): 'Y'/'N'

2. Minimizing space used

The sp_SourceTargetMapping calculates minimal textual column size according to maximum length of data used in the source column.

For source data type: Assigns data size to:
VARCHAR, NVARCHAR, CHAR, NCHAR, TEXT, NTEXT Maximum length of source data actually used

 

Although saving space for dimensional attributes is not critical for a data mart, keeping dimensional table column sizes reasonably short allows to:

  • Accelerate ETL lookups
  • Minimize OLAP cube attribute size. This can be valuable for loading big dimensions or executing MDX queries.
    Note. Remember Analysis Services do not support variable data sizes, so your VARCHAR storage benefits will get lost when data is translated to OLAP cube attributes.

3. Replacing NULLs with Default Values

In a data mart, foreign keys should never have NULLs. Avoiding NULLs for dimensional attributes simplifies ETL lookups.

The sp_SourceTargetMapping assigns columns that contain NULLs the following default values:

 

When values of these column types are NULL: Assigns value to:
VARCHAR (MAX), NVARCHAR, CHAR, NCHAR, TEXT, NTEXT '*'
INT, SMALLINT -1
DATETIME -1

Warning. If the default value carries a business meaning, choose another value that is impossible for business.

Tip. Replace the default values with Unknown, Not Applicable, or Not Available with descriptive strings later within database views.

For sp_SourceTargetMapping stored procedure code, download sp_SourceTargetMapping.sql file.

Assigning a Keyboard Shortcut to the Stored Procedure

After you created the sp_SourceTargetMapping stored procedure, assign a keyword shortcut to it so that you can quickly access it.

 

Assigning Keyboard Shortcut to sp_SourceTargetMapping

 

Note: Remember to close SSMS and re-open it to make the shortcut setting effective.

Calling the Stored Procedure with a Keystroke

To execute the stored procedure using a keyboard shortcut:

  1. Drag a table name from SSMS Object Explorer to Query Editor
  2. Select the table
  3. Press Ctrl+4

 

Generated SQL SELECT statement converts data types and sizes

Exploring Data Transformation Results

When you execute the code generated earlier against the "Person.Contact" table of AdventureWorks2008R2 database for example, notice the differences between source and target data types and sizes. The differences are highlighted in red color.

  Source   Destination  
Column_name Type Length Type Length
BusinessEntityID int 4 int 4
PersonType nchar 4 nchar 4
NameStyle NameStyle 1 varchar 1
Title nvarchar 16 nvarchar 8
FirstName Name 100 nvarchar 48
MiddleName Name 100 nvarchar 32
LastName Name 100 nvarchar 44
Suffix nvarchar 20 nvarchar 6
ModifiedDate datetime 8 int 4

 

The automatically generated code that produced the above results looks like this:

 

SELECT

CONVERT(int, ISNULL([BusinessEntityID],-1)) AS [BusinessEntityID],

CONVERT(nchar (2), ISNULL([PersonType],'*')) AS [PersonType],

CASE NameStyle WHEN 1 THEN 'Y' ELSE 'N' END AS [IsNameStyle],

CONVERT(nvarchar (4), ISNULL([Title],'*')) AS [Title],

CONVERT(nvarchar (24), ISNULL([FirstName],'*')) AS [FirstName],

CONVERT(nvarchar (16), ISNULL([MiddleName],'*')) AS [MiddleName],

CONVERT(nvarchar (22), ISNULL([LastName],'*')) AS [LastName],

CONVERT(nvarchar (3), ISNULL([Suffix],'*')) AS [Suffix],

CONVERT(int, ISNULL([EmailPromotion],-1)) AS [EmailPromotion],

[AdditionalContactInfo],

[Demographics],

[rowguid],

CASE

WHEN ModifiedDate ISNULL

THEN -1

ELSE CONVERT(VARCHAR, [ModifiedDate], 112)

END AS [ModifiedDate]

FROM [Person].[Person]

Realizing Benefits and Next Steps

Now, look at the code above and ask yourself:

  • How long would it take to write this code manually for every data column transformation?
  • How much time would it take to calculate the best size for destination columns?
  • How many revisions the script would go through before you're confident you didn't forget anything?

If your answer is: the amount of time required is significant, then you are ready to start extracting benefits from this solution. You can now quickly generate the data transformation code and be certain that it is consistent with the rules.

I encourage you to start using this technique right away in your daily practice. With time, add your logic and transformation rules to adjust it better to your business needs.

For more information ideas and the best practices for data warehouse design and implementation, read these great books:

Resources:

sp_SourceTargetMapping.sql
Total article views: 2984 | Views in the last 30 days: 5
 
Related Articles
FORUM

Avoid repetitive code with ISNULL

Avoid repetitive code with ISNULL in stored procedures

FORUM

SSRS report data sources from a relational DB source - use views or stored procedures?

Should views or stored procedures be used as data sources?

FORUM

stored procedure parameters question

stored procedure parameters question

FORUM

Which Stored Procedure changed

Which Stored Procedure changed

FORUM

DTC in stored procedure

DTC in stored procedure

 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones