I'm importing a bunch of CSVs. They are TAB delimited and most of the char columns are quoted (I haven't looked yet to see what happens with any embedded quotes ... might be doubled-up and need sorting out too).
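If the embedded quotes do turn out to be doubled-up (the usual CSV convention), I imagine a REPLACE after stripping the outer quotes would sort them out — untested sketch, column name just for illustration:

```sql
-- '"he said ""hi"""' arrives as: he said ""hi""  (after outer quotes stripped)
SELECT REPLACE('he said ""hi""', '""', '"')   -- he said "hi"
```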
I don't particularly want to use FORMAT files because of the hassle of maintaining them - external to the normal code base.
I could knock up a pre-processor that strips out the chaff before the files are imported, but that's going to take me longer than "a bit of SQL".
However, my concern is that the "bit of SQL" might be a performance/TLog/etc. hog? They are multi-million-row tables. I'm looking at either STUFF/REVERSE or SUBSTRING/LEN:
[MyCharCol1] = NullIf(
    CASE WHEN [MyCharCol1] LIKE '"%"'
         THEN REVERSE(STUFF(REVERSE(STUFF([MyCharCol1], 1, 1, '')), 1, 1, ''))
         ELSE [MyCharCol1]
    END, '')
, [MyCharCol2] = NullIf(
    CASE WHEN [MyCharCol2] LIKE '"%"'
         THEN SUBSTRING([MyCharCol2], 2, LEN([MyCharCol2])-2)
         ELSE [MyCharCol2]
    END, '')
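For anyone who wants to eyeball the two approaches on a literal before pointing them at a big table — both should strip a leading and trailing quote:

```sql
-- STUFF/REVERSE: strip first char, flip, strip first char (was last), flip back
SELECT REVERSE(STUFF(REVERSE(STUFF('"abc"', 1, 1, '')), 1, 1, ''))   -- abc

-- SUBSTRING/LEN: take everything between positions 2 and len-1
SELECT SUBSTRING('"abc"', 2, LEN('"abc"') - 2)                        -- abc
```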
Any suggestions as to which would perform better? I suppose it depends on whether LEN has to count characters, or whether that metadata is available anyway — or whether there are any other (i.e. undesirable) side-effects of SUBSTRING/LEN.
Do I need to use DATALENGTH() rather than LEN()? I've got a mixture of NVarchar and Varchar columns to process, and probably some of them will have trailing spaces ... I probably ought to RTrim() those too.
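For what it's worth, LEN counts characters (and ignores trailing spaces), while DATALENGTH counts bytes — so on the NVarchar columns DATALENGTH would come back double and overshoot the SUBSTRING. A quick illustration:

```sql
SELECT LEN(N'abc ')          -- 3  (characters, trailing space ignored)
SELECT DATALENGTH(N'abc ')   -- 8  (bytes: nvarchar = 2 bytes/char)
SELECT DATALENGTH('abc ')    -- 4  (bytes: varchar = 1 byte/char)
```

Since SUBSTRING's length argument is in characters, LEN looks like the right companion for both column types; DATALENGTH would only match on the Varchar ones.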