SSIS Data validation

  • Hello,

    In my company we receive data files from customers, most often some type of text-based flat file. We import these via SSIS and then send the data along to where it is needed.

    We've seen a growing need to have some type of validation on this data. Here is what we currently do (sorry for the length, I just want to clearly explain everything).

    - Data is uploaded by customer to a folder.

    - A job runs every X minutes that executes an SSIS package. The SSIS package has a Foreach Loop container that watches this folder.

    - SSIS package imports data to a temp table.

    --- If the data fails to import completely (badly delimited data, etc.), an error path kicks in that moves the file to an error folder and sends a generic error email.

    - Data that makes it into the temp table has custom SQL run against it that checks each mandatory column for data. Columns that "fail validation" have a log record written to a log table.

    - If a file has one or more log records for failed validation, the data is removed from the system and an email is sent to the customer.

    - If no validation errors are logged, the data is imported into the production tables.
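The mandatory-column check above can be sketched as follows. This is a minimal illustration in Python, not the actual implementation (which runs as custom SQL against the temp table from the SSIS package); the column names and log-record layout are hypothetical.

```python
# Sketch of the mandatory-column validation step. Column names and
# the log-record shape are made up for illustration; the real check
# is custom SQL writing to a log table.

MANDATORY_COLUMNS = ["CustomerId", "OrderDate", "Amount"]

def validate_mandatory(rows):
    """Return one log record for every missing mandatory value."""
    log = []
    for record_no, row in enumerate(rows, start=1):
        for col in MANDATORY_COLUMNS:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                log.append({"record": record_no, "column": col,
                            "error": "mandatory value missing"})
    return log

rows = [
    {"CustomerId": "C001", "OrderDate": "2014-01-01", "Amount": "10.00"},
    {"CustomerId": "",     "OrderDate": "2014-01-02", "Amount": "5.00"},
]
errors = validate_mandatory(rows)
# errors holds one log record: record 2, column CustomerId
```

If `errors` is non-empty, the file would be rejected and the log records emailed to the customer, matching the flow described above.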

    Here are the enhancements we want:

    - Data is uploaded by customer to a folder.

    - A job runs every X minutes that executes an SSIS package. The SSIS package has a Foreach Loop container that watches this folder.

    - Data is validated against some type of schema. That way, if something is badly delimited or a field is too long, we can identify that and give the customer the detail in an error. Basically we want to say "Record number 1, field 2 is 50 characters, only 20 allowed" instead of "Your file was unable to import".

    - SSIS package imports data to a temp table.

    - Data in the temp table has some type of business-rule logic run against it (field X must be numeric and > 0, field Y * field Z must equal field A, etc.). Records failing the rules are logged and reported to the customer.

    - If no validation errors are logged, the data is imported into the production tables.
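The two new validation steps above can be sketched like this. This is a minimal illustration in Python under assumed field names, widths, and rules; in SSIS the same logic would typically live in a Script Component (C#/VB.NET) or in custom SQL against the temp table.

```python
# Sketch of schema validation (field count, max lengths) plus
# business-rule validation, producing per-record error detail.
# The schema and rules below are hypothetical examples.

SCHEMA = [("Name", 20), ("Qty", 10), ("Price", 12), ("Total", 12)]

def check_schema(record_no, fields):
    """Structural checks: right number of fields, no field too long."""
    errors = []
    if len(fields) != len(SCHEMA):
        errors.append("Record %d: expected %d fields, got %d"
                      % (record_no, len(SCHEMA), len(fields)))
        return errors
    for pos, ((name, max_len), value) in enumerate(zip(SCHEMA, fields), 1):
        if len(value) > max_len:
            errors.append(
                "Record %d, field %d (%s) is %d characters, only %d allowed"
                % (record_no, pos, name, len(value), max_len))
    return errors

def check_rules(record_no, fields):
    """Business rules: Qty numeric and > 0, Qty * Price must equal Total."""
    name, qty, price, total = fields
    try:
        q, p, t = float(qty), float(price), float(total)
    except ValueError:
        return ["Record %d: Qty, Price and Total must be numeric" % record_no]
    errors = []
    if q <= 0:
        errors.append("Record %d: Qty must be > 0" % record_no)
    if abs(q * p - t) > 0.005:
        errors.append("Record %d: Qty * Price must equal Total" % record_no)
    return errors
```

A record would only pass to the production tables when both `check_schema` and `check_rules` return empty lists; otherwise the messages are collected and sent to the customer.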

    Management believes there must be tools or functionality that do this without always writing custom SQL. BizTalk has been suggested, but nobody has any experience with it and setting it up has been very difficult. Is this the right solution? What about SQL Server Data Quality Services?

  • I've always written custom code and transformation Script Components with multiple outputs for things like this. I haven't heard of a tool with that much built into it (generically). There are a lot of little details that need to be reviewed in scenarios like that.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Off the top of my head I think that one or more Conditional Split components might be able to help here; you can route data down different paths depending on whether or not it meets certain conditions.

    Alternatively you could write a Script or Custom Component as Craig suggests or even look at Data Quality Services (DQS).

    Regards

    Lempster

Viewing 3 posts - 1 through 3 (of 3 total)
