pre-validate csv to SQL load

  • Bruin

    SSC Eights!

    Points: 897

    before loading csv to SQL is there a way to validate against table schema to catch potential data issues and row number

    create a good file(records pass) and bad file(for review).

    THanks.

  • below86

    SSChampion

    Points: 11243

    You could set up something in a data flow in SSIS to redirect the 'bad' rows.  Myself I like to load the CSV data to a 'work' table where all fields are defined as VARCHAR(500) or more if needed.  Then I'll use SQL to validate/redirect as needed.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us

  • Bruin

    SSC Eights!

    Points: 897

    I did go the route of a staging table and defining them as varchar(255). Now how can I take that schema and data

    and validate against "Live" table field defs?

    If I do an Insert Into ?? from my staging table and cast\convert the different fields. If something fails how can I detect which row and field and not use a cursor as some of the loads could be big.

    Can Powershell come into play?

    Thanks.

     

  • Phil Parkin

    SSC Guru

    Points: 243596

    The validation your are referring to has to be done manually, ie, create some queries which identify the issues you are concerned about and run them on the staged data.

    If you perform the load using SSIS, you can set up the components to redirect problematic rows to a different table.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

  • below86

    SSChampion

    Points: 11243

    If you are using SQL, then when you do an insert into the SQL will fail when it finds a field that doesn't match the data type or the constraints you have on your target table.

    If you use SSIS you can redirect the 'bad' rows when they don't match certain look ups.  You may even be able to redirect the rows when it goes to put the data in the target table.

    You could even set up a script task to validate the data.

    Never done anything with powershell, so can't help you there.

     

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us

  • Phil Parkin

    SSC Guru

    Points: 243596

    below86 wrote:

    You may even be able to redirect the rows when it goes to put the data in the target table.

    You mean if the INSERT fails? Yes, you can redirect these sorts of failures too.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

  • Jeff Moden

    SSC Guru

    Points: 994555

    Bruin wrote:

    before loading csv to SQL is there a way to validate against table schema to catch potential data issues and row number

    create a good file(records pass) and bad file(for review).

    THanks.

    I wouldn't do a separate pre-validation because it's a waste of time.  No matter what, you have to scan the CSV file.  That scan might as well also double as a load.  If the load has no errors, then you're all done.  If the load has errors, then you've still only done a single scan.

    There are various thoughts on how to do this.  Personally, I create a table with just exactly the correct datatypes, set error tolerance to 2 billion rows, set a BULK INSERT command (in T-SQL) to sequester the bad rows and errors, and do the load into a table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply