• vl1969-734655 - Monday, January 23, 2017 8:55 AM

    Jeff Moden - Monday, January 23, 2017 8:28 AM

    John Bigler - Monday, January 23, 2017 6:49 AM

    Good morning all.  

    Nothing wrong with these solutions, but I prefer something more direct like just treating the CSV files as virtual tables and reading them using OPENROWSET instead of created linked servers.  An example:

    SELECT
         Identifier
         , FirstName
         , Surname
         , PostalCode
      FROM OPENROWSET('MSDASQL'
         ,'Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\LoadDir\'
         ,'SELECT * FROM "MyDataFile.csv"')

    How do you define the delimiter to be a semi-colon as in the example data provided? 😉

    well for "OPENROWSET" or "OPENDATASOURCE" to work, regardless of data provider, 
    you also need a file "schema.ini" to exists in the same folder where the source files are located.
    within that file you set a file definition config that is used by the functions above 

    so for example if you have file1.csv and file2.csv

    than with in shema.ini you would have 2 sections defining the file format and column names,
    the starting raw if needed to skip the header etc.
    without that, your data might/will get messed up or not processed at all.

    couple of years back, I had a project where I needed to automate an import/export of data in and out of SQL database. since back than, we run on windows server 2008 and old powershell version, using PS scripts were not possible,  so I build. up solution using the  "OPENROWSET" and "OPENDATASOURCE"  procedures on MS-SQL 2010. 

    last year we had a huge server crash which took out our SQL server with it. as it happened we also lost our backup of the processing DB with all my import/export code (we had a full up to date backup of all main DBs but one that I build and used for intermediate data processing.)
    so once I had the new server build out and moved to SQL 2014, was fairly easy to upgrade from 2010.
    I had to either rebuild whole processing DB from scratch(lost all my scripts few month before the crush to a cryptolocker), or figure out an easier way. was looking into PowerShell to get a feel of it at the time,
    tried a few things with other flat files, and though I had nothing to loose by trying.
    other than being slow, I like the PS way of doing things.   also I am sure that it is slow simply because I haven't found a better way of doing things I need do to the luck of knowledge, not the PS itself.  

    [file1.csv]
    FORMAT=CSVDelimited
    CHARACTERSET=OEM
    COLNAMEHEADER=FALSE
    TEXTDELIMITER = "
    DateTimeFormat = "MM/DD/YYYY"
    CurrencyDigits = 2
    CurrencySymbol =' '
    CurrencyNegFormat = 2
    STARTROW = 1
    MAXSCANROWS = 0
    COL1 = col1  CHAR WIDTH 200
    COL2 = col2  Short
    COL3 = col3  CHAR WIDTH 200

    [file2.csv]
    FORMAT=DELIMITED(;)
    CHARACTERSET=OEM
    COLNAMEHEADER=FALSE
    TEXTDELIMITER = "NONE"
    DateTimeFormat = "MM/DD/YYYY"
    CurrencyDigits = 2
    CurrencySymbol = ' '
    CurrencyNegFormat = 2
    STARTROW = 2
    MAXSCANROWS = 0
    COL1 = col1  CHAR WIDTH 200
    COL2 = col2  Short
    COL3 = col3  CHAR WIDTH 200
    COL4 = col4  CHAR WIDTH 200

    Well, Interestingly enough... I've never had to use a schema.ini file.  I'll need to use DBCC TIMEWARP but I'll try to find an old example for what I'm talking about.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)