Importing Tricky CSV Files with PowerShell

  • Stewart "Arturius" Campbell

    SSC Guru

    Points: 71316

    Comments posted to this topic are about the item Importing Tricky CSV Files with PowerShell

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • SADSAC

    SSCommitted

    Points: 1897

    This is great, exactly what I was looking for....

  • Jerry Kelk

    SSC Enthusiast

    Points: 121

    This has been a personal annoyance for years.
    Surely, the import engine should properly respect the double quotes (or whatever specified character) as string delimiters, if so directed.  We would not then have to hack at the user's text by removing commas or any such workarounds.

  • Romac

    Right there with Babe

    Points: 722

Interesting to see PowerShell used for this. (Must add "Learn PowerShell" to my @todo list!) To circumvent the issue where there are pipes already in the data, you could add something to replace them in the data-line variable before processing it character by character, then put them back before writing.
We did something similar using C# in SSIS, including importing the files directly. I solved the CSV format issues by pre-processing the CSV files with a small routine I wrote in Go (@golang). Go, PHP and several other languages do not suffer from the crippling flaw in Microsoft products of being unable to correctly import CSV files unless they are of the very simplest form. Many, including Go, can correctly process files containing the separator, carriage returns, linefeeds, tabs, backspaces, etc. embedded within quoted fields.

    It would be great for all of us if Microsoft addressed that!

P.S. I wrote, compiled and tested my Go programme on my Mac, then cross-compiled a Windows Server executable, still on my Mac, and copied the (single, stand-alone) executable to the server; it ran perfectly. That is one of Go's party tricks! It is also fast, typically processing a 1.5 GB CSV file with 3.6M rows in around 90s on our server.
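For what it's worth, the "correct" behaviour Romac describes can be illustrated with any conforming parser; here is a minimal sketch using Python's standard csv module (Python only because it is compact — Romac's actual routine is in Go, and the sample data is made up):

```python
import csv
import io

# A quoted field containing the separator, and another containing an
# embedded newline -- both legal per the usual CSV (RFC 4180) rules.
raw = 'id,address,note\r\n1,"1313 Mockingbird Lane, Detroit","line one\nline two"\r\n'

rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['1', '1313 Mockingbird Lane, Detroit', 'line one\nline two']
```

Note that the embedded newline does not split the record: a conforming parser yields two rows (header plus one data row), not three.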

  • ssc 21649

    SSC Enthusiast

    Points: 101

    This is cool!

    If a quote-delimited string looks like this:

    "1313 Mockingbird Lane, Detroit, MI 48690"


    (two commas), does this logic still work?
    Same question for quote-delimited strings like this:

    "This is a ""test"" of parsing"

  • Stewart "Arturius" Campbell

    SSC Guru

    Points: 71316

    SADSAC - Tuesday, July 3, 2018 1:48 AM

    This is great, exactly what I was looking for....

    Glad to have been of some assistance


  • Stewart "Arturius" Campbell

    SSC Guru

    Points: 71316

    Jerry Kelk - Tuesday, July 3, 2018 2:43 AM

    This has been a personal annoyance for years.
    Surely, the import engine should properly respect the double quotes (or whatever specified character) as string delimiters, if so directed.  We would not then have to hack at the user's text by removing commas or any such workarounds.

    Agree wholeheartedly...


  • Stewart "Arturius" Campbell

    SSC Guru

    Points: 71316

    ssc 21649 - Tuesday, July 3, 2018 5:28 AM

    This is cool!

    If a quote-delimited string looks like this:

    "1313 Mockingbird Lane, Detroit, MI 48690"


    (two commas), does this logic still work?
    Same question for quote-delimited strings like this:

    "This is a ""test"" of parsing"

It does not matter how many commas are encountered within the quotes; the logic will still handle them correctly.
Even doubled quote delimiters (the ""…"" escape) will work correctly.
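Both cases follow the standard CSV quoting rules; here is a quick cross-check of the expected output (sketched in Python purely for illustration — the script under discussion is PowerShell):

```python
import csv
import io

# ssc 21649's two test cases: embedded commas, and "" as an escaped quote.
raw = '"1313 Mockingbird Lane, Detroit, MI 48690","This is a ""test"" of parsing"'

fields = next(csv.reader(io.StringIO(raw)))
print(fields)  # ['1313 Mockingbird Lane, Detroit, MI 48690', 'This is a "test" of parsing']
```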


  • ssc 21649

    SSC Enthusiast

    Points: 101

    Stewart "Arturius" Campbell - Tuesday, July 3, 2018 6:24 AM

    ssc 21649 - Tuesday, July 3, 2018 5:28 AM

    This is cool!

    If a quote-delimited string looks like this:

    "1313 Mockingbird Lane, Detroit, MI 48690"


    (two commas), does this logic still work?
    Same question for quote-delimited strings like this:

    "This is a ""test"" of parsing"

It does not matter how many commas are encountered within the quotes; the logic will still handle them correctly.
Even doubled quote delimiters (the ""…"" escape) will work correctly.

    Cool, thanks for sharing this!

  • Brian F

    SSC Rookie

    Points: 33

    I used a VB script inside an SSIS package to accomplish the same functionality. This is a smart way to take care of this issue.

With the data I needed to import, the pipe ended up not being a good delimiter, so, especially with data fields full of user input, it's worth keeping alternatives in mind. My solution was a triple semicolon, which you'd be hard pressed to find in normal user input, but I'd be interested to hear whether anyone else has hit similar issues with different workaround delimiters.

  • ssc 21649

    SSC Enthusiast

    Points: 101

    Brian F - Tuesday, July 3, 2018 7:04 AM

    I used a VB script inside an SSIS package to accomplish the same functionality. This is a smart way to take care of this issue.

With the data I needed to import, the pipe ended up not being a good delimiter, so, especially with data fields full of user input, it's worth keeping alternatives in mind. My solution was a triple semicolon, which you'd be hard pressed to find in normal user input, but I'd be interested to hear whether anyone else has hit similar issues with different workaround delimiters.

    In the past, the trick I have often used is replacing the delimiters with ASCII 7 (the bell character). I have NEVER encountered an ASCII 7 in normal text data, so it has worked well for me (although, that's not to say it won't ever happen).  😉
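A minimal sketch of that delimiter swap (shown in Python purely for illustration; `'\a'` is ASCII 7, and the field values here are made up):

```python
# Use ASCII 7 (BEL) as a temporary field delimiter, on the theory that
# BEL never appears in normal text data.
BEL = '\a'  # chr(7)

fields = ['1313 Mockingbird Lane, Detroit', 'plain field', 'a;b|c']
line = BEL.join(fields)      # safe even though fields contain , ; and |
restored = line.split(BEL)   # round-trips exactly
print(restored == fields)    # True
```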

  • Bruin

    SSC Eights!

    Points: 894

How do you invoke the script? Thanks.

  • layton 35542

    Valued Member

    Points: 64

FYI, the Import-Csv cmdlet handles embedded quotes.

    e.g.

    $Header = "ProductKey","Code","Description","Narrative","Comment"

    $J = Import-Csv test.csv -Header $Header
    $J | Format-Table

Here's another version, as an example, that works too (TMTOWTDI):

# Load the ADO.NET types (Add-Type replaces the deprecated LoadWithPartialName)
Add-Type -AssemblyName System.Data
    $Date = $args[0]
    $ConnectionString = $args[1]
    $Directory = $args[2]
    $Table = $args[3]

    $file="$Directory\$Table.csv" #This could be anything, the file just happens to be the same as the table name in this instance

    $FirstRowColumnNames = $true
$batchsize = 75000 # This could, theoretically, be any number - I found this has the least drain on CPU and resources
    $bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($Connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
    $bulkcopy.DestinationTableName = $Table
$bulkcopy.BulkCopyTimeout = 0
$bulkcopy.BatchSize = $batchsize
    $datatable = New-Object System.Data.DataTable

    # USE Import-Csv -- BEGIN
    $CSV_CONTENT = Import-Csv $file

# Get column headers from the first record's property names
$headers = $CSV_CONTENT[0].psobject.Properties.Name

    foreach ($hdr in $headers) {
      $datatable.Columns.Add($hdr) | Out-Null
    }

    # Values (rows)
    foreach ( $csvRow in $CSV_CONTENT ) {
      $row = $datatable.NewRow()
      foreach ( $c in $headers ) {
       $row[$c] = $csvRow.$c
      }
      $datatable.Rows.Add($row) | Out-Null
    }
    # USE Import-Csv -- END

    if($datatable.Rows.Count -gt 0) {
      $bulkcopy.WriteToServer($datatable)
      $datatable.Clear()
    }

    $bulkcopy.Close()
    $bulkcopy.Dispose()
    $datatable.Dispose()

  • Romac

    Right there with Babe

    Points: 722

    ssc 21649 - Tuesday, July 3, 2018 7:11 AM

    Brian F - Tuesday, July 3, 2018 7:04 AM

    I used a VB script inside an SSIS package to accomplish the same functionality. This is a smart way to take care of this issue.

With the data I needed to import, the pipe ended up not being a good delimiter, so, especially with data fields full of user input, it's worth keeping alternatives in mind. My solution was a triple semicolon, which you'd be hard pressed to find in normal user input, but I'd be interested to hear whether anyone else has hit similar issues with different workaround delimiters.

    In the past, the trick I have often used is replacing the delimiters with ASCII 7 (the bell character). I have NEVER encountered an ASCII 7 in normal text data, so it has worked well for me (although, that's not to say it won't ever happen).  😉

    Ding-ding! Same here!

  • layton 35542

    Valued Member

    Points: 64

    Bruin - Tuesday, July 3, 2018 7:31 AM

    How do you invoke the script -thanks

    Using the following setup:
    1) CSV file = c:\tmp\ProductsTest.csv
    2) PS script: c:\myscripts\csvLoad.ps1
    3) Create table ProductsTest

    create table ProductsTest (
    ProductKey int
    ,Code varchar(50)
    ,Description varchar(100)
    ,Narrative varchar(500)
    ,Comment varchar(500)
    )

    Open Powershell in c:\myscripts directory and execute ( replace MyDB with your database name ):

    PS C:\myscripts > .\csvLoad.ps1 20180703 "Data Source=(local);Initial Catalog=MyDB;Integrated Security=true" C:\tmp ProductsTest

Viewing 15 posts - 1 through 15 (of 75 total)
