Remove Quotes from CSV

  • Kristen-173977

    SSCrazy Eights

    Points: 8422

    I'm importing a bunch of CSVs. They are TAB delimited and most of the char columns are quoted (I haven't looked yet to see what happens with any embedded quotes ... might be doubled-up and need sorting out too).

    I don't particularly want to use FORMAT files because of the hassle of maintaining them - external to the normal code base.

    I could knock up a pre-processor that strips out the chaff before the files are imported, but that's going to take me longer than "a bit of SQL".

    However, my concern is that the "bit of SQL" might be a performance/TLog/etc hog? They are multi-million row tablesI'm looking at either using STUFF/REVERSE or SUBSTRING/LEN
    UPDATE MyTable
        [MyCharCol1] = NullIf(
                              CASE WHEN [MyCharCol1] LIKE '"%"'
                                   THEN REVERSE(STUFF(
                                                      REVERSE(STUFF([MyCharCol1], 1, 1, ''))
                                                    , 1, 1, '')
                                   ELSE [MyCharCol1]
                              , '')
       , [MyCharCol2] = NullIf(CASE WHEN [MyCharCol2] LIKE '"%"'
                                   THEN SUBSTRING([MyCharCol2], 2, LEN([MyCharCol2])-2)
                                   ELSE [MyCharCol2]
                              , '')
       , ...

    Any suggestions as to which would perform better? I suppose it depends whether LEN has to count-characters, or has that metadata available anyway. Or if there are any other (i.e. undesirable) side-effects of SUBSTRING / LEN

    Do I need to use DATALENGTH() rather than LEN() ? I've got a mixture of NVarchar and Varchar columns to process, probably some of them will have trailing spaces ... probably ought to RTrim() those too

  • Ken McKelvey


    Points: 18301

    I think BCP needs a format file for this.

    I  tend to use Powershell for this type of processing with the Out-DataTable function:

    I then  use something like:

    . "YourPath\Out-DataTable.ps1"

    $Folder = 'YourFolder'
    $NameDate = "$((Get-Date -Format s).substring(0,13).replace('T', '-'))"
    $ConnectionString = "Data Source=YourDS; Database=YourDB; Trusted_Connection=True;";

    $Files = Get-ChildItem -Path $Folder | Where-Object{!($_.PSIsContainer) -and ($_.Name -like "$NameDate*.csv")} | Sort-Object Name
    $bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $ConnectionString
    $connection = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
    $sqlCommand = $connection.CreateCommand()

    foreach ($File in $Files)
      $Table = $File.Name.Substring(19).replace('.csv','')
      #echo $Table
      $sqlCommand.CommandText = "TRUNCATE TABLE dbo." + $Table
      $result = $sqlCommand.ExecuteNonQuery()
      $csvDataTable = Import-CSV -Path $File.FullName | Out-DataTable
      $bulkCopy.DestinationTableName = $Table



    This works well enough for me  as my maximum file size is 65K lines.

  • Kristen-173977

    SSCrazy Eights

    Points: 8422

    Thanks Ken, I'll give it a go 🙂

  • frederico_fonseca


    Points: 14702

    as your file is tab delimited you need to add
    -Delimiter "`t"
    to the import-csv command.

    also and if volumes are high you will wish to add batch size to it
    $bulkCopy.BatchSize = 50000 -- 50k rows batch size

    If for some reason your file is not fully CSV compliant you may need to look at other methods like 
    Microsoft.VisualBasic.FileIO.TextFieldParser example at

    or a combination of the above plus regex

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply