Loading CSV Files Using Dynamic SQL

  • sgtidwell

    SSC Enthusiast

    Points: 107

    Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

  • steve quinn

    SSC Veteran

    Points: 284

    sgtidwell - Monday, September 17, 2018 10:34 PM

    Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

    “You will also need to check to see if the file already has a corresponding table, and if does, you will need to see if the data that you are importing already exists in the table. If it does, you will not want to duplicate data, so you will just exit out and move to the next file”

    Any idea how/if this approach could be used to selectively insert (new) data into the table?
    For example, if the file(s) we’re reading hold newer data than is held in the table(s), would it be possible to do an insert to append the newer data to the end of the already existing data in the table?

  • mfagan12

    SSC Eights!

    Points: 922

    steve quinn - Tuesday, September 18, 2018 3:41 AM

    sgtidwell - Monday, September 17, 2018 10:34 PM

    Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

    "You will also need to check to see if the file already has a corresponding table, and if does, you will need to see if the data that you are importing already exists in the table. If it does, you will not want to duplicate data, so you will just exit out and move to the next file"

    Any idea how/if this approach could be used to selectively insert (new) data into the table?
    For example, if the file(s) we're reading hold newer data than is held in the table(s), would it be possible to do an insert to append the newer data to the end of the already existing data in the table?

    If the table exists and you want to update or append newer data, you could use MERGE to do it.

  • Bruin

    SSC Eights!

    Points: 812

    How does the process work\handle(using format file) with truncation of a field in csv larger than sql table defintion.

     Thanks.

  • i_love_data

    SSC Rookie

    Points: 25

    Has anyone managed to load a large file like that? I tried with a file containing 19 columns and 12 million rows, and it only inserted 82. Used VARCHAR(MAX) datatype for all columns.

  • sgtidwell

    SSC Enthusiast

    Points: 107

    steve quinn - Tuesday, September 18, 2018 3:41 AM

    sgtidwell - Monday, September 17, 2018 10:34 PM

    Comments posted to this topic are about the item Loading CSV Files Using Dynamic SQL

    "You will also need to check to see if the file already has a corresponding table, and if does, you will need to see if the data that you are importing already exists in the table. If it does, you will not want to duplicate data, so you will just exit out and move to the next file"

    Any idea how/if this approach could be used to selectively insert (new) data into the table?
    For example, if the file(s) we're reading hold newer data than is held in the table(s), would it be possible to do an insert to append the newer data to the end of the already existing data in the table?

    I think you would have to approach this with a couple of different steps.  First you would need to make sure that you do have a unique column in the table.  ( The Item Name).  From there you could add a new column to your temp table to act as a flag column.  A simple loop should allow you to mark the flag column as either a 1, 2 or 3.  Or whatever numbers you would like to use.  The 1 could be a totally new record where the name isn’t seen at all in the table, if you have two columns that make the item unique, then you would match off of both of them, a 2 which means the item is there but say the price is different or a 3 no match and that would be a new item.  After you updated the temp table, then you could do an INSERT for the new records, an UPDATE for when the price is different and you want to update the price.

    If you would like me to work up a scenario like that and send you some code for it, please feel free to email me directly.  I believe my email address is in my profile.  It won’t be until this weekend I would be able to write up an example, but I don’t mind if it will help you.

  • sgtidwell

    SSC Enthusiast

    Points: 107

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process work\handle(using format file) with truncation of a field in csv larger than sql table defintion.

     Thanks.

    You would need to set your format file to the possible max column size of what you are importing.  However, if the CSV File is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.

  • sgtidwell

    SSC Enthusiast

    Points: 107

    i_love_data - Wednesday, September 19, 2018 7:59 AM

    Has anyone managed to load a large file like that? I tried with a file containing 19 columns and 12 million rows, and it only inserted 82. Used VARCHAR(MAX) datatype for all columns.

    There shouldn’t be any issue with it loading a large file.  I’ll try to duplicate your results this weekend and will post a follow up reply to this comment.

  • Bruin

    SSC Eights!

    Points: 812

    sgtidwell - Thursday, September 20, 2018 6:43 AM

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process work\handle(using format file) with truncation of a field in csv larger than sql table defintion.

     Thanks.

    You would need to set your format file to the possible max column size of what you are importing.  However, if the CSV File is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.

    any way to display\catch the data\field in csv that was truncated thanks

  • Jeff Moden

    SSC Guru

    Points: 993393

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process work\handle(using format file) with truncation of a field in csv larger than sql table defintion.

     Thanks.

    Fix the table to match the data or pound the tar out of the people providing the bad data.

    If you’re asking for how to easily detect and record such problems, that where the error handling of both BCP and BULK INSERT come into play.  Both actually have the capability of detecting, reporting on, and sequestering bad rows while continuing to load good rows.  You just have to read how to do it in the documentation for both.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Jeff Moden

    SSC Guru

    Points: 993393

    Bruin - Friday, September 21, 2018 3:23 AM

    sgtidwell - Thursday, September 20, 2018 6:43 AM

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process work\handle(using format file) with truncation of a field in csv larger than sql table defintion.

     Thanks.

    You would need to set your format file to the possible max column size of what you are importing.  However, if the CSV File is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.

    any way to display\catch the data\field in csv that was truncated thanks

    See my response above.  While this is going to sound a bit nasty, it’s not meant to be… you have to know what the capabilities of the tools you’re using are and, to find that out, you do have to read the documentation and spend some time experimenting with the tools.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Jeff Moden

    SSC Guru

    Points: 993393

    sgtidwell - Thursday, September 20, 2018 6:43 AM

    Bruin - Tuesday, September 18, 2018 5:31 PM

    How does the process work\handle(using format file) with truncation of a field in csv larger than sql table defintion.

     Thanks.

    You would need to set your format file to the possible max column size of what you are importing.  However, if the CSV File is larger than the definition you provided in the format file, it should truncate it down without an error and still do the insert for you.

    You don’t actually ever want such a silent truncation and, no, it won’t actually do the truncation.  It will error.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Jeff Moden

    SSC Guru

    Points: 993393

    i_love_data - Wednesday, September 19, 2018 7:59 AM

    Has anyone managed to load a large file like that? I tried with a file containing 19 columns and 12 million rows, and it only inserted 82. Used VARCHAR(MAX) datatype for all columns.

    Using LOBs in the target table is the wrong way to do things and will lead to nothing but slower imports and the import of bad data and a whole lot of extra totally unnecessary code to ensure that the data is actually correct.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • sean redmond

    SSCertifiable

    Points: 5563

    I’m preparing a pilot-study on Full-Text Search and this script will go part of the way towards it. The extended procedure master.dbo.xp_DirTree will be useful anyway.
    The aim is to see how useful it might be in storing some/many/all of the company’s office-documents, saved scripts and PDFs.

    FTS can read & index LOBs with the help of iFilters although Jeff’s comment about LOBs being the wrong way to do things is a tad worrying. I am expecting the DB to grow big quickly and one of the aims of the pilot study is to see what exactly the resource requirements are.

  • Jeff Moden

    SSC Guru

    Points: 993393

    Sean Redmond - Monday, December 10, 2018 6:07 AM

    I'm preparing a pilot-study on Full-Text Search and this script will go part of the way towards it. The extended procedure master.dbo.xp_DirTree will be useful anyway.
    The aim is to see how useful it might be in storing some/many/all of the company's office-documents, saved scripts and PDFs.

    FTS can read & index LOBs with the help of iFilters although Jeff's comment about LOBs being the wrong way to do things is a tad worrying. I am expecting the DB to grow big quickly and one of the aims of the pilot study is to see what exactly the resource requirements are.

    If you must load LOBs, make sure that you’ve used set the table option to store the LOBs out of row.  I’ve recently done some testing that shows (at least for the test data I built) that INSERTs of LOBs with that option turned on make the INSERTs about 19.6% faster and it doesn’t hurt reads either.  It fact, it makes the reads of the LOB data (I used a LIKE search) an average of 3.9% faster for the test data I used.

    Of course, forcing LOBs to be out of row so that the stuff that could fit won’t end up in-row will also help your non-LOB selects sing because they won’t have to read larger unused LOB data.  As a bit of a sidebar and major benefit, the option also took the Clustered Index rebuilds for the test data I used down from 7500 milliseconds down to 185 milliseconds not to mention the huge savings in Logical Reads for queries that don’t address the LOB data.

    Here’s the code to set the option for a given table.
    EXEC sp_tableoption ‘dbo.SomeTable’,’large value types out of row’,1;

    Please DO read up on the use of sp_tableoption for tables that already have data in them.  There are extra steps required to force existing in-row lobs out-of-row. (An “in-place” update must be done).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply