Flat File CSV with double quotes

  • PowerShell can be quite handy for this kind of thing; it can be as simple as:

    Import-Csv -Path C:\temp\test.csv | Write-SqlTableData -ServerInstance '.' -DatabaseName Scratch -SchemaName dbo -TableName MyTable 

    It's a bit picky about column ordering, so if your table structure doesn't match the data you may need to pipe through a Select-Object to rearrange the columns into the right order, and in my experience it really doesn't play well with Identity columns, but for simplicity it's hard to beat. In more complex cases I'd usually just import into a specific import table and do the real work in SQL.
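    A minimal sketch of that Select-Object reshuffle, using made-up column names (CustomerName, OrderDate, Amount) with the same path and table as the one-liner above:

    # Reorder (and trim) the CSV's properties to match dbo.MyTable's column order.
    # The column names here are invented purely for illustration.
    Import-Csv -Path C:\temp\test.csv |
        Select-Object -Property CustomerName, OrderDate, Amount |
        Write-SqlTableData -ServerInstance '.' -DatabaseName Scratch -SchemaName dbo -TableName MyTable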
  • andycadley - Sunday, July 22, 2018 3:33 AM

    +1 to that.  And, although I've not used it to do actual imports, rumor has it that it's quite slow for imports compared to many other methods.  I don't actually have personal proof of that, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Chrissy321 - Wednesday, July 18, 2018 4:13 PM

    See this http://sqlblog.com/blogs/jamie_thomson/archive/2010/11/25/sql-server-2008-sp1-cu-6-includes-small-changes-to-dtsx-files.aspx
    I distantly recall running into this. I think if you view the properties of the object rather than opening up in the GUI you can edit this properly.

    That's good info, Chrissy.  Thanks for posting it.

    --Jeff Moden

  • @gothaimviii,

    In my curious mind, I'd like to know if you've solved this problem and, if so, what the issue was.

    --Jeff Moden

  • Jeff Moden - Tuesday, July 17, 2018 1:26 PM

    Chris Hurlbut - Tuesday, July 17, 2018 1:18 PM

    Jeff Moden - Monday, July 16, 2018 3:49 PM

    Interesting.  Every time I see something like this, I'm that much happier that I don't use SSIS ;-).  A BCP FORMAT FILE and the use of BULK INSERT would eliminate such problems in fairly short order.

    Why is this interesting?  So you don't know SSIS, or you just dislike it?  So you suggest a BCP format file or BULK INSERT but don't offer any basics or advice?  This user is probably stressing out and frustrated, but you find this "Interesting"...

    The reason I find it "interesting" is because SSIS is supposed to make things easy for users who may not know other methods.  It supposedly can handle text qualifiers, but that apparently doesn't work for the OP.  It's also interesting that people would even bother with SSIS for such a simple import of such a nicely laid-out file.  And it's interesting that no one has been able to help the OP, including you. 😉  Instead, you're trying to crown me because I find all that "interesting", even though I mentioned a possible alternative that the OP hasn't said he'd be willing to try, never mind asked what it is that I'm talking about. 😀

    @Chris Hurlbut,

    Just a bit more feedback and I don't mean any of this in a negative way... I'm just trying to explain the post that you took exception to.

    As we've just found out, even when you know precisely how to overcome a given problem and do offer to help, but need to ask for just a little more easily obtainable information in order to solve it (which has not yet been provided even though I've now asked twice), people get pretty stuck on the use of supposedly "easy" applications.  It's not always their fault, either.  Frequently, they've been directed to do something using a certain tool and, as the old adage goes, "they're workin' for a livin' so they're takin' what they're given", and either can't or won't deviate.

    That's the reason for the post you took exception to... I was testing the water to see if I should spend any additional time trying to help.  And, yes, it was a bit of a rant about SSIS because this isn't the first time I've seen such easy problems frustrate an OP and no one seems to be able to solve the actual problem using SSIS.  It's not anyone's fault... it just happens.

    --Jeff Moden

  • Jeff Moden - Sunday, July 22, 2018 6:06 AM

    Early versions certainly had a bad habit of reverting to RBAR inserts if you weren't careful (essentially it was optimizing for throughput of the PowerShell pipeline rather than the write to the database), but more modern versions seem to default to a bulk insert. The usual caveats about testing it in your own environment, particularly if you have large datasets or need maximum performance, still apply.
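    If you want to see which behavior you're getting in your own environment, one rough way (reusing the made-up path and table from above) is simply to time the pipeline and compare it against a method you already trust:

    # Time the Import-Csv | Write-SqlTableData pipeline; a wildly slow result on a big file
    # is a hint that rows are being inserted one at a time rather than bulk loaded.
    Measure-Command {
        Import-Csv -Path C:\temp\test.csv |
            Write-SqlTableData -ServerInstance '.' -DatabaseName Scratch -SchemaName dbo -TableName MyTable
    } | Select-Object -ExpandProperty TotalSeconds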

  • andycadley - Sunday, July 22, 2018 7:07 AM

    Heh... it DOES figure that it would be better because of the "default to a bulk insert".  I do, however, cut out the proverbial middleman and just go straight to BULK INSERT using T-SQL. 😀

    And, yeah... I absolutely agree with the "usual caveats" that you mention because they're so very true.  It's a pleasure to "meet" you, Andy.
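    For anyone who wants to try the "straight to BULK INSERT" route, a rough sketch follows. The FORMAT = 'CSV' and FIELDQUOTE options assume SQL Server 2017 or later, the table and path are invented, and Invoke-Sqlcmd is used only to keep the example in PowerShell (the query itself is plain T-SQL):

    # Hedged sketch: native CSV handling (FORMAT/FIELDQUOTE) was added to BULK INSERT in SQL Server 2017.
    # FIRSTROW = 2 skips the header row; dbo.MyTable and the path are placeholders.
    $bulkInsert = 'BULK INSERT dbo.MyTable ' +
                  'FROM ''C:\temp\test.csv'' ' +
                  'WITH (FORMAT = ''CSV'', FIELDQUOTE = ''"'', FIRSTROW = 2);'
    Invoke-Sqlcmd -ServerInstance '.' -Database Scratch -Query $bulkInsert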

    --Jeff Moden

  • Jeff Moden - Sunday, July 22, 2018 7:18 AM

    For a clean file, I'd agree, but PowerShell's CSV parser is really good at handling the freaky edge cases, like fields that contain commas or even double quotes within the double quotes, and trying to accomplish the same with a straight bulk import is way more effort than it's worth!
    Nice to "meet" you too; your code has been inspirational on so many occasions over the years.

  • andycadley - Sunday, July 22, 2018 7:33 AM

    Agreed on the freaky stuff that happens.  I have used the trick of running Import-Csv immediately followed by an Export-Csv, which seems to fix a wealth of sins (especially when the source is a spreadsheet where the data contains embedded delimiters and double quotes).  You just have to know how to import everything with double-quote encapsulation after that. For some reason, that causes people great angst, especially the first double quote. 😀  I think part of the reason people steer away from using the actual T-SQL BULK INSERT command is that MS has never documented that backslash-doublequote CAN actually be used as part of the delimiters in BCP FORMAT files (or without, if you're a bit clever about it), even though it has always worked that way.

    And thank you for the very kind feedback.  I'm humbled by your comment.
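    A rough sketch of that whole approach, with invented file, table, and column names. The format-file layout (a zero-length "dummy" first field whose terminator is the leading quote, then "," between fields and "\r\n at the end) is the undocumented trick being described, so treat it as something to test rather than as gospel:

    # 1) Normalize the file: Import-Csv followed by Export-Csv re-writes every field
    #    with consistent double-quote encapsulation (paths are made up).
    Import-Csv -Path C:\temp\raw.csv |
        Export-Csv -Path C:\temp\clean.csv -NoTypeInformation

    # 2) A non-XML BCP format file for an assumed three-column table (dbo.MyTable: Col1, Col2, Col3).
    #    Field 1 is a zero-length dummy that consumes the leading quote; the real fields use
    #    quote-comma-quote and quote-CRLF as terminators.
    $fmt = '14.0',
           '4',
           '1  SQLCHAR  0  0     "\""       0  Dummy  ""',
           '2  SQLCHAR  0  8000  "\",\""    1  Col1   SQL_Latin1_General_CP1_CI_AS',
           '3  SQLCHAR  0  8000  "\",\""    2  Col2   SQL_Latin1_General_CP1_CI_AS',
           '4  SQLCHAR  0  8000  "\"\r\n"   3  Col3   SQL_Latin1_General_CP1_CI_AS'
    Set-Content -Path C:\temp\MyTable.fmt -Value $fmt -Encoding Ascii

    # 3) BULK INSERT with the format file; FIRSTROW = 2 skips the header row that Export-Csv wrote.
    $sql = 'BULK INSERT dbo.MyTable FROM ''C:\temp\clean.csv'' ' +
           'WITH (FORMATFILE = ''C:\temp\MyTable.fmt'', FIRSTROW = 2);'
    Invoke-Sqlcmd -ServerInstance '.' -Database Scratch -Query $sql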

    --Jeff Moden
