Loading CSV Files Using Dynamic SQL

  • Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

  • sgtidwell - Monday, September 17, 2018 10:34 PM

    Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

    "You will also need to check to see if the file already has a corresponding table, and if it does, you will need to see if the data that you are importing already exists in the table. If it does, you will not want to duplicate data, so you will just exit out and move to the next file"

    Any idea how/if this approach could be used to selectively insert (new) data into the table?
    For example, if the file(s) we're reading hold newer data than is held in the table(s), would it be possible to do an insert to append the newer data to the end of the already existing data in the table?

  • steve quinn - Tuesday, September 18, 2018 3:41 AM

    sgtidwell - Monday, September 17, 2018 10:34 PM

    Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

    "You will also need to check to see if the file already has a corresponding table, and if it does, you will need to see if the data that you are importing already exists in the table. If it does, you will not want to duplicate data, so you will just exit out and move to the next file"

    Any idea how/if this approach could be used to selectively insert (new) data into the table?
    For example, if the file(s) we're reading hold newer data than is held in the table(s), would it be possible to do an insert to append the newer data to the end of the already existing data in the table?

    If the table exists and you want to update or append newer data, you could use MERGE to do it.
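    As a minimal, hedged sketch of that MERGE approach (the object names below are hypothetical, assuming dbo.Items is the target table, #ImportStage is the table the CSV was loaded into, and ItemName is the natural key):

        -- Hypothetical object names; adjust to your own schema.
        MERGE dbo.Items AS tgt
        USING #ImportStage AS src
            ON tgt.ItemName = src.ItemName
        WHEN MATCHED AND tgt.Price <> src.Price THEN
            UPDATE SET tgt.Price = src.Price           -- refresh rows whose price changed
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (ItemName, Price)
            VALUES (src.ItemName, src.Price);          -- append rows that are new to the table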

  • How does the process (using a format file) handle truncation of a field in the CSV that is larger than the SQL table definition?

    Thanks.

  • Has anyone managed to load a large file like that? I tried with a file containing 19 columns and 12 million rows, and it only inserted 82. Used VARCHAR(MAX) datatype for all columns.

  • steve quinn - Tuesday, September 18, 2018 3:41 AM

    sgtidwell - Monday, September 17, 2018 10:34 PM

    Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

    "You will also need to check to see if the file already has a corresponding table, and if it does, you will need to see if the data that you are importing already exists in the table. If it does, you will not want to duplicate data, so you will just exit out and move to the next file"

    Any idea how/if this approach could be used to selectively insert (new) data into the table?
    For example, if the file(s) we're reading hold newer data than is held in the table(s), would it be possible to do an insert to append the newer data to the end of the already existing data in the table?

    I think you would have to approach this in a couple of steps. First, make sure that you do have a unique column in the table (the item name). Then add a new column to your temp table to act as a flag column. A simple loop (or a set-based update) can mark the flag column as 1, 2, or 3, or whatever numbers you would like to use: 1 for a completely new record where the name isn't in the table at all (if two columns make the item unique, match on both of them), 2 for an item that already exists but where, say, the price is different, and 3 for an exact match that needs no action. After you have updated the temp table, you can do an INSERT for the new records and an UPDATE for the rows where the price is different and you want to update the price. A rough sketch follows after this post.

    If you would like me to work up a fuller scenario like that and send you some code for it, please feel free to email me directly; I believe my email address is in my profile. I wouldn't be able to write up an example until this weekend, but I don't mind doing it if it will help you.
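    In the meantime, here is a rough, hedged sketch of that flag-column idea, done set-based rather than with a loop (all object names are hypothetical; it assumes ItemName is the unique key and Price is the column being compared):

        -- Hypothetical names; #ImportStage is the table the CSV was loaded into.
        ALTER TABLE #ImportStage ADD MatchFlag TINYINT NULL;

        UPDATE s
        SET    MatchFlag = CASE
                               WHEN t.ItemName IS NULL THEN 1  -- brand new item
                               WHEN t.Price <> s.Price THEN 2  -- exists, but the price changed
                               ELSE 3                          -- exact match, nothing to do
                           END
        FROM   #ImportStage AS s
        LEFT   JOIN dbo.Items AS t
               ON t.ItemName = s.ItemName;

        -- New items get inserted...
        INSERT INTO dbo.Items (ItemName, Price)
        SELECT ItemName, Price
        FROM   #ImportStage
        WHERE  MatchFlag = 1;

        -- ...and changed prices get updated.
        UPDATE t
        SET    t.Price = s.Price
        FROM   dbo.Items AS t
        JOIN   #ImportStage AS s
               ON s.ItemName = t.ItemName
        WHERE  s.MatchFlag = 2;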

  • Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process (using a format file) handle truncation of a field in the CSV that is larger than the SQL table definition?

    Thanks.

    You would need to set your format file to the maximum possible column size of what you are importing. However, if a field in the CSV file is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.
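    For reference, a non-XML BCP format file with explicit field lengths might look something like this (the version number, column names, and lengths here are made up for illustration):

        14.0
        3
        1  SQLCHAR  0  50   ","      1  ItemName   SQL_Latin1_General_CP1_CI_AS
        2  SQLCHAR  0  20   ","      2  Price      ""
        3  SQLCHAR  0  100  "\r\n"   3  Comments   SQL_Latin1_General_CP1_CI_AS

    The fourth value on each field line (the host file data length) is what caps how many characters bcp or BULK INSERT will read for that field.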

  • i_love_data - Wednesday, September 19, 2018 7:59 AM

    Has anyone managed to load a large file like that? I tried with a file containing 19 columns and 12 million rows, and it only inserted 82. Used VARCHAR(MAX) datatype for all columns.

    There shouldn't be any issue with it loading a large file. I'll try to duplicate your results this weekend and will post a follow-up reply to this comment.

  • sgtidwell - Thursday, September 20, 2018 6:43 AM

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process (using a format file) handle truncation of a field in the CSV that is larger than the SQL table definition?

    Thanks.

    You would need to set your format file to the maximum possible column size of what you are importing. However, if a field in the CSV file is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.

    Is there any way to display or catch the data/field in the CSV that was truncated? Thanks.

  • Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process (using a format file) handle truncation of a field in the CSV that is larger than the SQL table definition?

    Thanks.

    Fix the table to match the data or pound the tar out of the people providing the bad data.

    If you're asking how to easily detect and record such problems, that is where the error handling of both BCP and BULK INSERT comes into play. Both actually have the capability of detecting, reporting on, and sequestering bad rows while continuing to load the good rows. You just have to read how to do it in the documentation for both.
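    As one hedged illustration of what that looks like with BULK INSERT (the paths and table name below are hypothetical), it can be told to keep loading past bad rows and to write the rejected ones to an error file for later inspection:

        -- Hypothetical paths and staging table.
        BULK INSERT dbo.ItemStaging
        FROM 'C:\Import\items.csv'
        WITH (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR   = '\n',
            FIRSTROW        = 2,                              -- skip the header row
            MAXERRORS       = 100,                            -- tolerate up to 100 bad rows
            ERRORFILE       = 'C:\Import\items_errors.log'    -- rejected rows are sequestered here
        );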

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Bruin - Friday, September 21, 2018 3:23 AM

    sgtidwell - Thursday, September 20, 2018 6:43 AM

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process (using a format file) handle truncation of a field in the CSV that is larger than the SQL table definition?

    Thanks.

    You would need to set your format file to the maximum possible column size of what you are importing. However, if a field in the CSV file is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.

    Is there any way to display or catch the data/field in the CSV that was truncated? Thanks.

    See my response above.  While this is going to sound a bit nasty, it's not meant to be... you have to know what the capabilities of the tools you're using are and, to find that out, you do have to read the documentation and spend some time experimenting with the tools.

    --Jeff Moden



  • sgtidwell - Thursday, September 20, 2018 6:43 AM

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process (using a format file) handle truncation of a field in the CSV that is larger than the SQL table definition?

    Thanks.

    You would need to set your format file to the maximum possible column size of what you are importing. However, if a field in the CSV file is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.

    You don't actually ever want such a silent truncation and, no, it won't actually do the truncation.  It will error.
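    A quick, hedged way to verify that for yourself (the table, file, and path below are made up): load a value longer than the target column, and BULK INSERT should reject the row with a truncation/conversion error rather than quietly trimming it.

        -- Hypothetical test objects. Put a value longer than 5 characters
        -- in the single-column file C:\Import\trunc_test.csv before running.
        CREATE TABLE dbo.TruncTest (ShortCol VARCHAR(5));

        BULK INSERT dbo.TruncTest
        FROM 'C:\Import\trunc_test.csv'
        WITH (ROWTERMINATOR = '\n');
        -- Expectation: a bulk load conversion (truncation) error for that row,
        -- not a silently truncated 5-character value.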

    --Jeff Moden



  • i_love_data - Wednesday, September 19, 2018 7:59 AM

    Has anyone managed to load a large file like that? I tried with a file containing 19 columns and 12 million rows, and it only inserted 82. Used VARCHAR(MAX) datatype for all columns.

    Using LOBs in the target table is the wrong way to do things. It will lead to nothing but slower imports, the import of bad data, and a whole lot of extra, totally unnecessary code to ensure that the data is actually correct.
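    To make that concrete, a hedged sketch of the alternative (names and sizes are hypothetical): give the staging table real, right-sized data types so oversized or non-numeric values fail at load time instead of sliding into VARCHAR(MAX) columns unchecked.

        -- Hypothetical staging table with explicit types instead of VARCHAR(MAX) everywhere.
        CREATE TABLE dbo.ItemStaging
        (
            ItemName VARCHAR(50)   NOT NULL,
            Price    DECIMAL(10,2) NOT NULL,   -- a non-numeric value now fails the load
            LoadDate DATE          NOT NULL
        );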

    --Jeff Moden



  • I'm preparing a pilot study on Full-Text Search, and this script will go part of the way towards it. The extended procedure master.dbo.xp_DirTree will be useful anyway.
    The aim is to see how useful it might be in storing some, many, or all of the company's office documents, saved scripts, and PDFs.

    FTS can read and index LOBs with the help of iFilters, although Jeff's comment about LOBs being the wrong way to do things is a tad worrying. I am expecting the DB to grow big quickly, and one of the aims of the pilot study is to see exactly what the resource requirements are.
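    For anyone who hasn't used it, a minimal, hedged example of calling xp_DirTree to list files (the folder path is a placeholder, and xp_DirTree is undocumented, so test it on your own version first):

        -- Hypothetical folder; the third parameter = 1 asks xp_DirTree to include files.
        CREATE TABLE #DirTree
        (
            SubDirectory NVARCHAR(260),
            Depth        INT,
            IsFile       BIT
        );

        INSERT INTO #DirTree (SubDirectory, Depth, IsFile)
        EXEC master.dbo.xp_DirTree N'C:\CompanyDocs\', 1, 1;   -- depth 1, files included

        SELECT SubDirectory
        FROM   #DirTree
        WHERE  IsFile = 1;   -- just the files, not the subfolders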
