Importing Tricky CSV Files with PowerShell

  • Bruin - Tuesday, July 10, 2018 10:47 AM

When using $TruncateLongValues or $EmptyValueIsNull the process slows way down. I have a 168-row CSV that loads perfectly, but is there anything that could speed up the process?

     Thanks again.

Not really. As I previously stated, doing the processing at the client end slows it right down, because we have to check every field in every row.
What you get is reliability as a trade-off for speed. What we have ended up with is more an example of what can be done with PowerShell. Whether it is fit for purpose in all cases is another matter entirely.

Probably the quickest way to safely import this much data is to use the bcp utility to create an XML format file from your staging table, adjust the format file appropriately, and then BCP in your data with that. SQL Server will do the validation much faster than any PowerShell script.
    https://docs.microsoft.com/en-us/sql/relational-databases/import-export/create-a-format-file-sql-server?view=sql-server-2014
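A minimal sketch of that bcp workflow, run from PowerShell. The database name (MyDB), staging table (dbo.ProductsTest), and file path are borrowed from the example later in this thread and are assumptions here:

```powershell
# 1) Generate an XML format file from the staging table's schema
#    (-x = XML format file, -c = character data, -t, = comma field terminator)
bcp MyDB.dbo.ProductsTest format nul -x -f ProductsTest.xml -c '-t,' -T -S "(local)"

# 2) Hand-edit ProductsTest.xml if the CSV layout differs from the table

# 3) Bulk-load the file using the format file; SQL Server validates the
#    types far faster than a per-field PowerShell check
bcp MyDB.dbo.ProductsTest in C:\tmp\ProductsTest.csv -f ProductsTest.xml -T -S "(local)"
```

The `-t,` is quoted so PowerShell does not treat the trailing comma as an array operator before handing the argument to bcp.exe.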

It's also worth noting that PowerShell can be extremely useful for certain types of agent job steps, where you would otherwise find yourself relying too much on sp_executesql or CmdExec steps, and it is much nicer than NT batch language!

What would have to change in the script if the staging table was altered so the char fields defaulted to holding a larger value? Will $EmptyValueIsNull (which is needed) still be slow?

I like the reliability of the loading and how it handles the CSV-to-SQL conversion. Just trying to think of a way to let PowerShell do the loading and maybe T-SQL do some of the translation.

     Many thanks again this is a great script.

Sorry, I didn't see all of your post.

  • Stewart "Arturius" Campbell - Monday, July 2, 2018 11:04 PM

    Comments posted to this topic are about the item Importing Tricky CSV Files with PowerShell

Question: 3 parameters are being passed to this script. I'm assuming this script is part of a PowerShell function that you call, passing the parameters to it.
Can you give me an example of how you would define this script as a PowerShell function, and then a specific example of the contents of the actual parameters? Particularly in regards to the $Connection parameter.

    Thanks,
    DaveR
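A minimal sketch of such a wrapper, based on the invocation shown later in this thread. The function and parameter names here are hypothetical; $Connection is just an ADO.NET SqlClient connection string:

```powershell
# Hypothetical wrapper - the real script's parameter names may differ.
function Import-TrickyCsv
{
    param
    (
        [string]$FileDate,      # e.g. 20180703
        [string]$Connection,    # SqlClient connection string
        [string]$CsvDirectory,  # folder containing the CSV file
        [string]$TableName      # staging table to load
    )
    & "C:\myscripts\csvLoad.ps1" $FileDate $Connection $CsvDirectory $TableName
}

# Example call:
Import-TrickyCsv -FileDate 20180703 `
                 -Connection "Data Source=(local);Initial Catalog=MyDB;Integrated Security=true" `
                 -CsvDirectory C:\tmp -TableName ProductsTest
```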

  • dreynolds-779539 - Monday, July 16, 2018 9:01 AM


How do you invoke the script? Thanks.

    Using the following setup:
    1) CSV file = c:\tmp\ProductsTest.csv
    2) PS script: c:\myscripts\csvLoad.ps1
    3) Create table ProductsTest

    create table ProductsTest (
    ProductKey int
    ,Code varchar(50)
    ,Description varchar(100)
    ,Narrative varchar(500)
    ,Comment varchar(500)
    )

Open PowerShell in the c:\myscripts directory and execute (replace MyDB with your database name):

    PS C:\myscripts > .\csvLoad.ps1 20180703 "Data Source=(local);Initial Catalog=MyDB;Integrated Security=true" C:\tmp ProductsTest

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • Stewart "Arturius" Campbell - Monday, July 16, 2018 10:58 PM


    I don't use PowerShell for most things and, when I do, I use the xp_CmdShell hammer to do it from SQL Server.  How would you invoke the script from SQL Server through a scheduled job without using xp_CmdShell?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, July 17, 2018 6:14 AM


    I don't use PowerShell for most things and, when I do, I use the xp_CmdShell hammer to do it from SQL Server.  How would you invoke the script from SQL Server through a scheduled job without using xp_CmdShell?

Create a PowerShell job step along these lines:

    $connectionString = "Data Source=$(ESCAPE_NONE(SRVR));Initial Catalog=whatever;Integrated Security=true"
    & "full\path\to\csvLoad.ps1" 20180703 $connectionString C:\tmp ProductsTest

Ideally use the script I published further up in this discussion, as it handles delimiters much better, and there are the additional safety checks, if enabled. Beware that those checks significantly impact performance in the interests of quality.

  • Jeff Moden - Tuesday, July 17, 2018 6:14 AM

    I don't use PowerShell for most things and, when I do, I use the xp_CmdShell hammer to do it from SQL Server.  How would you invoke the script from SQL Server through a scheduled job without using xp_CmdShell?

To execute from SQL Agent: add a job step with Type = "PowerShell".
The command is either the path to the script file (.ps1) or, for a simple script with no parameters, the full script...


How would the script handle invalid data in a datetime field? It's not null or empty, but contains something other than a date.

    thanks

Can you force a NULL for those?
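One way to do that per field is [datetime]::TryParseExact, which never throws on bad input. A minimal sketch; the column name (OrderDate) and the mm-dd-yyyy format are assumptions, and $rec/$dr stand for the parsed CSV record and the DataTable row being built:

```powershell
# Hypothetical column name; swap in your own.
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$styles  = [System.Globalization.DateTimeStyles]::None
$parsed  = [datetime]::MinValue

if ([datetime]::TryParseExact($rec.OrderDate, 'MM-dd-yyyy', $culture, $styles, [ref]$parsed)) {
    $dr.OrderDate = $parsed             # valid date: keep it
} else {
    $dr.OrderDate = [DBNull]::Value     # anything unparseable becomes NULL
}
```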

  • Stewart "Arturius" Campbell - Tuesday, July 17, 2018 7:15 AM


    Awesome.  Thanks for the tip, Stewart.

    --Jeff Moden



  • Hi,

Any idea how to import a CSV file using PowerShell as described in the article when there are \r\n delimiters embedded within some lines of the file?

    Thanks!

When I've got something that the standard import tools can't handle easily, I tend to fall back on the Microsoft.VisualBasic.FileIO.TextFieldParser object. You're not limited to VB; it can be used from C# or PowerShell code. It handles qualifiers, multiple delimiters, embedded delimiters, comment lines, and so on.

Parse one row at a time and add it to a DataTable, then for large files pass the table to a SqlBulkCopy object every thousand lines or so.
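A minimal sketch of reading such a file from PowerShell. The file path and delimiter are assumptions; the key point is that ReadFields returns one logical record even when a quoted field spans physical lines:

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser("C:\tmp\ProductsTest.csv")
$parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
$parser.SetDelimiters(",")
$parser.HasFieldsEnclosedInQuotes = $true   # embedded delimiters/newlines inside quotes survive

while (-not $parser.EndOfData)
{
    $fields = $parser.ReadFields()   # string[] for one record
    # ... add $fields to a DataTable row here, then hand the table to SqlBulkCopy ...
}
$parser.Close()
```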

If you have certain fields in the CSV input that you would like to pre-process, to perform checks before loading to SQL, that could be used to help speed up loads. I have 4 fields that are dates. I would like to make sure they contain dates and, if so, that they are valid; if not, write a log of the row, the field, and the invalid date (mm-dd-yyyy format needed). If the date fields are null or blank that doesn't mean an error, so that would be the first check.

The original script posted does a check at load time, but this pre-process could be used to validate the input before the load.

Thanks.
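A sketch of such a pre-load scan. The file path and the four column names are assumptions; blank values are skipped first, and anything else that fails a strict mm-dd-yyyy parse is logged with its row and field:

```powershell
# Hypothetical column names; replace with your own four date fields.
$dateCols = 'OrderDate', 'ShipDate', 'InvoiceDate', 'DueDate'
$culture  = [System.Globalization.CultureInfo]::InvariantCulture
$styles   = [System.Globalization.DateTimeStyles]::None
$rowNum   = 1   # row 1 is the header

Import-Csv C:\tmp\ProductsTest.csv | ForEach-Object {
    $rowNum++
    foreach ($c in $dateCols) {
        $v = $_.$c
        if ($v) {   # null/blank is not an error, so check that first
            $d = [datetime]::MinValue
            if (-not [datetime]::TryParseExact($v, 'MM-dd-yyyy', $culture, $styles, [ref]$d)) {
                Add-Content bad-dates.log "Row $rowNum, field $c : invalid date '$v'"
            }
        }
    }
}
```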

While not perfect, I use the code below for this. The 'l' function is just a logging function that prepends a date/time. You can see them both here:

https://gist.github.com/royashbrook/0cd77d1fa654803184a4d1c0439f5bcf

I have found the PowerShell CSV tools seem to do well at handling the 'weird' files. This can be done without the streaming, but on larger files the CSV object ends up being fairly huge; YMMV depending on what you are importing.

function bcpCsv($f, $t, $b, $cs)
{
  l "Importing file $f to table $t using batchsize $b and connection $cs"

  l "creating datatable using target table schema"
  # getData (defined in the same gist) runs a query and returns a DataTable
  $dt = (getData "select top 0 * from $t" $cs)
  $cols = $dt.Columns.ColumnName

  l "bulk importing datatable"
  $bcp = New-Object Data.SqlClient.SqlBulkCopy($cs, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
  $bcp.BulkCopyTimeout = 0
  $bcp.DestinationTableName = $t
  $bcp.BatchSize = $b

  l "opening source file as stream"
  $r = New-Object System.IO.StreamReader($f)

  l "looping through each line, writing rows to server at batchsize"
  $i = 0
  try {
    # compare against $null so a blank line doesn't end the loop early
    while ($null -ne ($line = $r.ReadLine())) {
      $i++
      $rec = ConvertFrom-Csv $line -Header $cols
      $dr = $dt.NewRow()
      foreach ($col in $cols) {
        # empty fields are skipped, leaving the column as DBNull
        if ($rec.$col) {
          $dr.$col = $rec.$col
        }
      }
      $dt.Rows.Add($dr)
      if (($i % $b) -eq 0) {
        l "writing batch to server. on record $i"
        $bcp.WriteToServer($dt)
        $dt.Clear()
      }
    }
    if ($dt.Rows.Count -gt 0) {
      l "writing final batch to server. on record $i"
      $bcp.WriteToServer($dt)
      $dt.Clear()
    }
  }
  catch {
    l "An error occurred on record $i. Details below:"
    $_
  }
  finally {
    $r.Dispose()
    $bcp.Dispose()
    $dt.Dispose()
  }
}
