How to validate data of a flat file source with the data of different tables of my database?

  • Hi All,

    I am trying to find a solution for my task.

    My task is, I have a Flat File Source data. Using Flat File Source of DFT, I am reading it.

    Now I have to validate data of this source for each row nearly on all columns with the data of different tables of a database.

    For example, Data of ColumnA of source file need to validate with some x table data of my database. In the same way data of columnB of source need to be validate with some Y table data of my database.

    What exactly I want to say is, for each column of source data, I have to apply some different business rules and after passing all these validation the result data I have to insert into Destination table.

    Please help me, if any one have solution for this.

    Thanks & Regards

    Ashok Vallu

  • you can take the data into temp table and then use some T SQL to validate or do some join in sql to get the correct data and then store in your tables. or you can also use some look up transformation in the SSIS to validate your data.

  • Hi Sharath,

    Thanks for reply, but this not a one time job. For this task I was already created a SQL Server Agent Job which will perform every day. So I can't do manual sql queries kind of validation.

    I was trying to prepare a store procedure (for validating data of a row) and to call it from any SSIS task like OLE DB command (or something like) but I am not getting output from this task (OLE DB command). I need its output then I have to use "Slowly changing dimensions" task on that output to insert into DB table.

    Lookup is good solution but I have to apply different business rule on each column to validate which I am getting how to do? For single column and single rule Lookup works perfect which I already applied.

    Please suggest any solution.

    Thanks & Regard

    Ashok Vallu

  • Ashok

    How many rows of data do you have? The problem with the SCD, and your stored procedure approach, is that they process one row at a time, which can be bad for performance, especially on large data sets. I would recommend that you get all your data into a staging table and then apply the column rules one at a time to the whole of the data. If you post some sample data and an example of one or two of the rules, I'll show you what I mean.

    John

  • Hi John,

    Thanks for your reply. Please find below sample data with business rules which need to apply on those column. I was try show just few columns and few rows of my test data. There more columns like this with different business rules.

    Sample Data:

    ColAColBColCColDColE

    QTDCVCCONVENTION 03867788710

    QTDCVCCONVENTION 039201210

    QTDCRCORCONVENTION REAL ESTA1015646

    QTDCRCORCONVENTION REAL ESTAF951753

    QTDCEDDEEQUIPMENT DEMO 4548301

    Business Rules:

    For ColA:

    Verify the ColA Data exists on the table A - if not write to error file and do not load any of the data of that partcular record (Row)

    For ColB & ColC:

    - Verify the ColB exists in the table C and is associated

    to the ColA. Means ColB data should exist in Table C and a relation should exist with ColA data with we find in some Table x

    - Error the record and do not update - Send to error file if missing

    For ColE:

    Make sure the ColE Data exists on the table E and is associated the ColA data which we will find in table X.

    - Error the record and do not update - Send to error file if missing

    As data is from flat file with Tab delimiter.

    Thanks & Regards

    Ashok Vallu

  • Ashok

    OK, I'm assuminng you know how to get all your data into a staging table, which we'll call SampleData. I'm not sure I understand the second and third rules, but you can do the first one something like this. I'm inserting into an error table rather than file for ease; if you need to, you can export it out to a file at the end.

    INSERT INTO ErrorTable (ErrorDate, ColA, ColB, ColC, ColD, ColE)

    SELECT

    CURRENT_TIMESTAMP

    ,ColA

    ,ColB

    ,ColC

    ,ColD

    ,ColE

    FROM

    SampleData s

    LEFT JOIN

    TableA t ON s.ColA = t.ColA

    WHERE

    s.ColA IS NULL

    John

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply