Scrubbing Data With DTS : Part 2

,

Chances are, if you've dealt with any type of data, you know you

rarely receive the data in perfect condition.

In my last article, we discussed the simpler example of

how to add dashes in a social security number using DTS. This week, we'll tackle a much more complex example.

In this example, we have data that looks like the below screen.

The data as you can see has some all upper-case words and some upper-lower case words. There's also

an instance where the data has a "&" symbol that we'd prefer be spelt out as a "and".

If you havn't done so already, you may want to go back and read the previous article in this series to

understand how we create the connections and the transform data task. After you've done this create a connection from the sample file to another empty text file as we did in the last article.

Create a Transform Data Task between the two connections and double click on the arrow between them.

Under the transformation tab, delete the first arrow connecting the source Col001 to the destination. Again, this is all explained in the previous example. Click and drag the word Col011 from the source to the destination. After you release, the below screen will popup. Select the ActiveX Script method of transformation.

Click properties in the first screen to take you to the ActiveX dialog box.

Select properties as you enter the next screen. This will take you to the screen where you can modify the transformation. By default, you’ll see the below type of transformation code:

DTSDestination("Col001") = DTSSource("Col001")

First, make sure VBScript is selected, then you're all set. Use the below code to make the first

letter of each word capitalized as well as fix the & to the word and.

Function Main()
	intlen = len(DTSSource("Col001"))
	i = 1
	flgucase = "Y"
	Do While i -1 

Taking the above code apart is not too difficult.  We must first set all of our variables to their defaults.
The variable intlen is set to the length of each source row.  We use the variable i for a loop later.  
Lastly, the variable flgucase is set to Y to designate that we'd like the first letter in our string capitalized.

	intlen = len(DTSSource("Col001"))
	i = 1
	flgucase = "Y"

The next step is to begin the loop. We must set the loop to loop through each letter until the end of the string.

This is done with our intlen variable that we initialized earlier. We then set the value of midstr to

the current letter in the loop by using the mid function and the i variable.

Next, you must perform the capitalization. If the flgucase variable is Y, then capitalize the letter

or variable using the UCASE function. If it's not set to Y, then make it lowercase using the LCASE function.

Notice that we added a IF statement in the code next to determine if the letter is actually a space. We also concatenate here the pevious letter with the new letter. If it is a space,

then the next letter will be capitalized using the flgucase variable. We must finally increment our loop and end the loop.

	Do While i -1 

The last bit of code is the easiest.  We must set the destination column to be equal to the variable
strName, which holds the concatenation that we created earlier.  We have also added in the code the logic
to replace the & with and using the replace function.
	DTSDestination("Col001") = replace(strName, "&", "and")
	Main = DTSTransformStat_OK

Make sure you click on the Save button to save your code before exiting this screen. To test it,

click on the Test button and then click View Results to see the final product. As you can see

below, the updated code should've fixed your formatting problem on the destination.

As we mentioned last week, the more logic you add to your code, the slower it will run. The smallest bit of VBScript will make your transformation run twice as slow as it would

using the straight copy column method. Now that you know the basics, you're ready to tackle about any transformation problem in VBScript!

Rate

5 (1)

Share

Share

Rate

5 (1)