Importing a CSV into SQL Server Shouldn't Be This Hard

  • As strange as it may strike some, I've not even downloaded DBATools, yet.  I'm sure it's documented on the site one way or the other but, just to ask the quick question, do you have to download an install the whole shebang or can you download and install just on of the tools, like this one?

    And thanks for the article.  Like you, I've found that the biggest problem is that people don't quite understand the "database" files known as CSV, TSV, or even "Fixed Field", and so they add a bunch of junk instead of understanding the concept of a "data file" and a "manifest file".  The really good ones also include a single meta-data file as a record-layout for all common files.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Steve,

    This post immediately struck a chord with me. SSIS would have to be one of the hardest to use tools ever for doing such a simple job. And don't even think about such complications as different encoding which requires additional staging tables, field by agonising field (not sure how the different encoding came about anyway).

    It surprises that when so much with SQL server is done so well that an important element such as SSIS remains poor. Getting data into the iseries (not renowned for its openness) was much easier than this!

    Nowadays I use python with some of the excellent libraries like csv or pandas plus pyodbc for quick and simple imports of all kinds of wacky input data,,,,works a treat!

     

  • I've used a variety of ETL tools. SSIS is my least favourite by a long way.

    There's an article for Python that illustrates why the CSVReader is so complex and it's all to do with weaknesses of CSV format.

    With WSL available I find Linux command line utilities do so much for me. Something ancient like awk is like finding out your grandpa can run a sub 2 hour marathon.

    I do agree Wizards should be better but understand why they are not

  • Am I a dinosaur for defaulting to BULK INSERT for tasks like this? I know it is a manual process, including defining the table in the first place, but I was surprised to see it not mentioned - even if it is to weigh the pros and cons (and tell all of us that use it that we are dinosaurs!).

  •  

    I had the same experiences at sometimes – Maybe because I’m more of a SQL guy and not 100% familiar with the SSIS tools. So most times I prefer using just the  BULK INSERT from SQL Server an import the CSV-File ( or files ) in a staging table. A temp table or a table variable is just fine. Then I use update an delete statements to clean and transform the data. At last use the MERGE statement to fill the final target table. With this you are in full control over your import logic in one place. It is super-fast and you have an ETL-Process that works, without dependencies. Meanwhile I was successful even with files that looks a lot more crazy then yours. And most of the time the work is fun for me.

  • Use Excel.  It will open csv files and is much more sensible at interpreting them.

    it's perfectly functional as a simple database tool for simple analysis and for editing the data.

    And of course you can move it from excel into a database fairly easily

  • My colleague prefers the SSIS-method of getting CSV/Excel-data into SQL Server.

    Personally, I prefer to use generate VALUES entries with Excel formulas. I even have a IF-Statement to check if the row-number equals 2 so that the INSERT-statement will automatically be inserted. I then just drag the selected cell with the formula down the Excel sheet, copy the result and paste it into SSMS.

    As an example: = ", ( '" & A2 & "', dateadd( day, " & F2 & ", '18991230' ), " & H2 & ", " & J2 & " )" . F2 is a date field.

    Or for datasets with more than 1 000 entries: = = IF( MOD(ROW(); 1000) = 2; "; INSERT INTO #tmp( Locality varchar(100), Region char(2), LocalityNr smallint, Oeppis char(1) ) VALUES "; ", " ) & "( '"&SUBSTITUTE(C2;"'";"''")&"', '" & A2 & "', '" &SUBSTITUTE(F2;"'";"''") & "', " & E2 & ", '" & H2 & "', '" & IF(B2="Q"; "J"; "N") & "' )"

    It may seem complicated, but with practice you become quite adept at typing it out, or from changing saved formulas from a file of formulas worth retaining.

    • This reply was modified 3 months, 2 weeks ago by  sean redmond. Reason: Poops. Big omission in my second formula
  • In this case I would suggest to not using SQL server at all, just rely on scripting languages like Python, R and PowerShell.

    You don't need to have the data in a database, you just need to wrangle data from some sources into a new file.

    In my job I use R (and the tidyverse packages) regularly to get this type of ETL done.

  • Still wondering why something likely as importing a csv is still so hard using the sql server tools

  • In my contracts, I have to leave the client with something that they can maintain. This generally means an SSIS job to import any CSVs. To try and be quick, I use the Wizard and save the DTSX for import into a SSIS project and, sometimes, it is quick. Sometimes it gives unhelpful error messages, as in this article.

    From my experience, I would guess that the place to look to fix the issue is on the "Choose Data Source" menu. You can tailor the import on the "Advanced" tab - a task that I have had to carry out many times. Nice to see the ADS + addon worked a bit better. The future should be brighter as I cannot see that the issues with the SSMS wizard are going to be fixed.

  • Compared with Power Query from Excel or Power BI it really seems hard to handle csv. SSIS is older. But meanwhile you can use a Power Query Data Source in SSIS and maybe Copy the Query from the Excel Wizard. I think it may be a little bit of an overkill in your case. With the Power Query Wizard your CSV is really easy to handle. If you choose use it together with SSIS would depend on how much automation you need.

  • Hi - this was removed by the editor as spam! So this is a repost.

    In my contracts, I have to leave the client with something that they can maintain. This generally means an SSIS job to import any CSVs. To try and be quick, I use the Wizard and save the DTSX for import into a SSIS project and, sometimes, it is quick. Sometimes it gives unhelpful error messages, as in this article.

    From my experience, I would guess that the place to look to fix the issue is on the "Choose Data Source" menu. You can tailor the import on the "Advanced" tab - a task that I have had to carry out many times. Nice to see the ADS + addon worked a bit better. The future should be brighter as I cannot see that the issues with the SSMS wizard are going to be fixed.

  • In the past I've used MSAccess to prep excel/csv files to standardise the formats and get things ready for a SSIS package to do the import.

    Not perfect, but it got the job done...getting suppliers to use data templates and stick with the same layout would make things a WHOLE lot easier...

  • I'm surprised at this article. Why doesn't anyone use the tried and true method that I've been using for over 20 years:

    SELECT *

    from opendatasource('Microsoft.ACE.OLEDB.12.0',

    'Data Source=C:\Zipstuff\;Extended Properties="Text;HDR=YES"')...products#csv

    While I'm using this against a CSV -- this is easily used against Excel or other weird delimited files.

    I wrote an article back in 2015 and it's still relevant:

    http://www.swimeventtimes.com/post/2015/11/22/Importing-ExcelCSV-to-SQL-Server-OpenDataSource.aspx

    Enjoy

Viewing 15 posts - 1 through 15 (of 62 total)

You must be logged in to reply to this topic. Login to reply