Scrubbing Data With DTS : Part 1

,

Download Sample File

When trying to implement data from an external source, very

rarely do you receive the data in perfect condition.  Sometimes you receive the data in all caps or where fields like

social security number are in the improper format (like 232221113).  In this article we’ll discuss some of the

strategies to clean up your data using Data Transformation Services (DTS).

In this example, we’re going to use a sample file that is comma delimited using the text qualifier or

double-quotes.  In DTS, your destination

doesn’t have to always be SQL Server. 

Indeed, your destination can be any OLE DB data source including a text

file.  In our example, our source will

be a text file and our destination will be a text file.  First add a text file source as the file you

downloaded and the text file properties should look like the following:

Next, add another connection as a text file

(destination).  DTS will take care of

creating the destination file for you so you can specify a file that doesn’t

exist yet as shown below.

Create a Transform Data Task between the two

connections.  This will create an arrow

between the two connections. 

Double-click on the arrow and this will bring up the Transform Data Task

properties.  Accept the defaults on the

first tab and proceed to the destination tab. 

When you go to this tab, you’ll be presented with the below screen where

you’ll need to click Execute.  This will

create the destination file.

Go then to the Transformation tab.  In this example, we will transform the social security number,

which is represented as123456789 to 123-45-6789.  To do this, we’ll delete the last transformation, which is the

arrow to the bottom of the below screen. 

Click the transformation once and the hit the delete key or select the

Delete button.

Click and drag the word Col011 from the source to the

destination.  After you release, the

below screen will popup.  Select the

ActiveX  Script method of

transformation.

Select properties as you enter the next screen.  This will take you to the screen where you

can modify the transformation.  By

default, you’ll see the below type of transformation:

DTSDestination("Col011") = DTSSource("Col011")

This basically writes the data exactly “as is” from the

source to the destination.  We will

modify this using the below script to place dashes between in the social

security number. We use the Left() function to gather the first three digits

then we concatenate the string using the & operator with the dash.  We then take the middle two digits with the

Mid() function then the last 4 digits with the Right() function.

'Copy each source
column to the destination column
Function Main()
DTSDestination("Col011")
= LEFT(DTSSource("Col011"),3)_
&"-"&MID(DTSSource("Col011"),4,2)&"-"&RIGHT(DTSSource("Col011"),4)
Main = DTSTransformStat_OK
End Function

You can test the data by clicking the Test button.  This will transform the data to a sample

temporary text file.  You can then click

View Results to see the results of the file as shown below.

Although this is a simplistic example, we can use this same

type of code for some more complex scrubbing of your data.  In the next article in this series on

scrubbing your data we will work on the upper case strings in the text file.

Rate

5 (2)

Share

Share

Rate

5 (2)