Good morning all.
Nothing wrong with these solutions, but I prefer something more direct like just treating the CSV files as virtual tables and reading them using OPENROWSET instead of created linked servers. An example:
SELECT
Identifier
, FirstName
, Surname
, PostalCode
FROM OPENROWSET('MSDASQL'
,'Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\LoadDir\'
,'SELECT * FROM "MyDataFile.csv"')
How do you define the delimiter to be a semi-colon as in the example data provided? 😉
well for "OPENROWSET" or "OPENDATASOURCE" to work, regardless of data provider,
you also need a file "schema.ini" to exists in the same folder where the source files are located.
within that file you set a file definition config that is used by the functions above
so for example if you have file1.csv and file2.csv
than with in shema.ini you would have 2 sections defining the file format and column names,
the starting raw if needed to skip the header etc.
without that, your data might/will get messed up or not processed at all.
couple of years back, I had a project where I needed to automate an import/export of data in and out of SQL database. since back than, we run on windows server 2008 and old powershell version, using PS scripts were not possible, so I build. up solution using the "OPENROWSET" and "OPENDATASOURCE" procedures on MS-SQL 2010.
last year we had a huge server crash which took out our SQL server with it. as it happened we also lost our backup of the processing DB with all my import/export code (we had a full up to date backup of all main DBs but one that I build and used for intermediate data processing.)
so once I had the new server build out and moved to SQL 2014, was fairly easy to upgrade from 2010.
I had to either rebuild whole processing DB from scratch(lost all my scripts few month before the crush to a cryptolocker), or figure out an easier way. was looking into PowerShell to get a feel of it at the time,
tried a few things with other flat files, and though I had nothing to loose by trying.
other than being slow, I like the PS way of doing things. also I am sure that it is slow simply because I haven't found a better way of doing things I need do to the luck of knowledge, not the PS itself.
[file1.csv]
FORMAT=CSVDelimited
CHARACTERSET=OEM
COLNAMEHEADER=FALSE
TEXTDELIMITER = "
DateTimeFormat = "MM/DD/YYYY"
CurrencyDigits = 2
CurrencySymbol =' '
CurrencyNegFormat = 2
STARTROW = 1
MAXSCANROWS = 0
COL1 = col1 CHAR WIDTH 200
COL2 = col2 Short
COL3 = col3 CHAR WIDTH 200
[file2.csv]
FORMAT=DELIMITED(;)
CHARACTERSET=OEM
COLNAMEHEADER=FALSE
TEXTDELIMITER = "NONE"
DateTimeFormat = "MM/DD/YYYY"
CurrencyDigits = 2
CurrencySymbol = ' '
CurrencyNegFormat = 2
STARTROW = 2
MAXSCANROWS = 0
COL1 = col1 CHAR WIDTH 200
COL2 = col2 Short
COL3 = col3 CHAR WIDTH 200
COL4 = col4 CHAR WIDTH 200