

How to automate validation process


sql_er
Hi,

We have the following scenario: We receive CSV files every month for which SSIS packages were built to process the data. The following problems occur from time to time:

1. The structure of the CSV file changed (e.g. column added or removed)
2. There were no footers in the data, but now footers started to appear
3. Date format changed (e.g. used to be mm/dd/yyyy, but became mm.dd.yyyy)
4. Number format changed (e.g. from 2000 to 2,000)

Currently we have a person who manually opens each file and checks it against our "validation document" to make sure none of these or similar problems have occurred. We would like to move away from this manual process if possible and are looking for suggestions.

I understand that items 3 and 4 could be caught by loading the data into a staging table with VARCHAR columns and validating it there before moving it any further.

Item 2 is less clear-cut: depending on the size of the footer, the SSIS load may or may not fail.

Item 1, however, is a sure failure for an SSIS package that loads the data directly into a table.

Thus I feel the two possible options are:

1. Create a custom script that runs through the file row by row, applies all the necessary validations, and either reports an error or continues if everything checks out

2. Use some 3rd party tool to validate the files (semi-manually) before kicking off the SSIS processing.
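
As a rough illustration of option 1, here is a minimal sketch in Python (not something from an existing framework). It checks the header against the expected column list and flags every row whose field count differs, which would catch an added or removed column and most footers. The column names and sample data are made up for the example.

```python
import csv
import io


def check_structure(lines, expected_columns, delimiter=","):
    """Return a list of (line_number, message) problems found in a CSV stream.

    `lines` is any iterable of text lines (an open file works).
    `expected_columns` is the header the load expects, in order.
    """
    problems = []
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return [(1, "file is empty")]
    if header != expected_columns:
        problems.append((1, f"header mismatch: got {header}"))
    for row in reader:
        # reader.line_num is the physical line number of the row just read.
        if len(row) != len(expected_columns):
            problems.append(
                (reader.line_num,
                 f"expected {len(expected_columns)} fields, got {len(row)}")
            )
    return problems


# Hypothetical sample: three data columns and a footer line at the end.
sample = io.StringIO(
    "id,order_date,amount\n"
    "1,02/15/2013,2000\n"
    "2,02/16/2013,1500\n"
    "Total rows: 2\n"
)
for line_no, message in check_structure(sample, ["id", "order_date", "amount"]):
    print(line_no, message)
```

A side effect of counting fields is that an unquoted 2,000 in a numeric column also shows up as an extra field, so part of item 4 is caught at this stage too.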

My questions are:

1. If you've encountered a similar problem, how did you resolve it? If you built a custom script, could you share it, or do you know of a framework that could be used more or less as plug and play?

2. Does anyone know of good 3rd party tool(s) to assist in this process?

Thanks in advance!
Orlando Colamatteo
In reply to sql_er (2/15/2013):


I had a similar process, and the CSV files were multiple gigabytes in size, so the stakes were high: we needed to know the file was in the correct format before trying to load it, so that we did not waste a lot of processing time only to find out the file was bad.

I wrote a PowerShell script to validate the file. In my case the usual problem was that the sender would add a new column, but somewhere in the middle of the column list. I would read line one of the file, which contained the column headers, and save it off to a new file. Then I would read the header saved from the previous file and compare it to this file's column header list to see if anything new had been introduced. If it had, I would stop processing.

I did not have date formats changing, luckily, but if I did I would probably validate them in SSIS as the data was being loaded into the initial staging table. If I were worried, I could fix them up using a Derived Column, or maybe a Script Component set up as a Transformation, since .NET has some good built-in date-parsing functions and is a little easier to debug than SSIS Expressions.
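
To make the date and number checks concrete, here is a small sketch of the idea. It is written in Python rather than .NET, purely as an illustration, and the expected formats are assumptions taken from the examples in the original post (mm/dd/yyyy dates, numbers without thousands separators).

```python
import re
from datetime import datetime

# Assumed formats, based on the examples in the original post.
EXPECTED_DATE_FORMAT = "%m/%d/%Y"
PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def is_expected_date(text):
    """True if the text parses as a date in the expected format."""
    try:
        datetime.strptime(text.strip(), EXPECTED_DATE_FORMAT)
        return True
    except ValueError:
        return False


def is_plain_number(text):
    """True if the text is an integer or decimal with no separators."""
    return PLAIN_NUMBER.fullmatch(text.strip()) is not None


for value in ("02/15/2013", "02.15.2013"):
    print(value, is_expected_date(value))
for value in ("2000", "2,000"):
    print(value, is_plain_number(value))
```

In each pair the first value passes and the second is rejected. The same two checks could be applied to the VARCHAR staging columns before the data moves any further.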

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Daniel Bowlin
I have seen presentations suggesting that the Data Profiling task could be used in this manner, but I can't offer much more help than that. I believe Ira Whiteside was the person using this approach. You might google him, or the Data Profiling task, to learn a bit more.
sql_er
Hi opc.three and Daniel Bowlin,

Thank you both for the suggestions.

opc.three - would you be able to share the PowerShell code you wrote?

Daniel Bowlin - the Data Profiling task is a great idea. It will only help me with the "data" validation, though, because if there are structural changes to the file I will not be able to load it into the database table to run the data profiling.

Thanks again!
Orlando Colamatteo
It was at a previous shop, so I no longer have access to the code. Basically it used a call to Get-Content that limited the output to the first row, to get the first line of the file, i.e. the column headers. I would store the first line of each data file in a new file for safekeeping. Then I would compare the new file's first line to the first line of the data file I had received before it; that was just another Get-Content to get the headers from the last file and an equality check against this time's headers. If they did not match, I would stop the process so someone could intervene manually, because it meant the file format had changed.
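
Since the original script is not available, here is a rough reconstruction of the same idea. It is a sketch in Python, not the PowerShell that was actually used, and the function and file names are hypothetical.

```python
from pathlib import Path


def read_header(data_file):
    """Return the first line of the file without the trailing newline."""
    with open(data_file, "r", encoding="utf-8", newline="") as handle:
        return handle.readline().rstrip("\r\n")


def header_unchanged(data_file, saved_header_file):
    """Compare this file's header with the one saved from the previous run.

    On the first run (nothing saved yet) or on a match, the saved copy is
    refreshed and True is returned. On a mismatch the saved copy is left
    alone and False is returned so the caller can stop the load.
    """
    current = read_header(data_file)
    saved = Path(saved_header_file)
    if saved.exists() and saved.read_text(encoding="utf-8") != current:
        return False
    saved.write_text(current, encoding="utf-8")
    return True


# Example use before kicking off the load (hypothetical file names):
#
#     if not header_unchanged("incoming.csv", "last_header.txt"):
#         raise SystemExit("Header changed; stopping for manual review.")
```

Only the first line is read, so the check stays cheap even for multi-gigabyte files. UTF-8 is assumed here; the real files may need a different encoding.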


