BombProof Data Import from Excel?

  • I have a very large spreadsheet of 125,000 rows and 200 columns, some of which contain a lot of text. I have tried every method I can think of to import the data into SQL Server, such as converting to .csv, tab-delimited text, or .xls with just 60,000 rows... I keep getting hung up on data truncation or something, even when I tell the Import Wizard to ignore truncation, and my SQL table has all varchar(max) columns.

    Is there some foolproof way to force the data into a SQL table, either predefined or created on the fly? Then I can worry about what the data looks like after it's in SQL. I tried BULK INSERT, too, and got non-useful errors and just the column headers.

    I installed the Office 12.0 OLE DB Provider, but I get errors when trying to use it with an .xlsx file... a very frustrating six hours.

    I have a SQL 2008 (R0) server and SQL 2005 servers to work with.

    Suggestions?

  • Have you tried importing into an "inbox" or "incoming data" table that the Import Wizard creates itself? I just did this recently to get data into a table, and then I split things up from there. It likes to create everything as an nvarchar, and I let it, just to get the data into the database.
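
    A staging table along those lines can be a one-off sketch like the following (the table and column names here are illustrative, not from the original post); making every column nvarchar(max) means nothing the wizard loads can be truncated:

    ```sql
    -- Hypothetical "inbox" staging table: every column is wide text,
    -- so no incoming value can be truncated on the way in.
    -- Column names are placeholders; in practice there would be ~200 of them.
    CREATE TABLE dbo.ImportStaging
    (
        Col001 NVARCHAR(MAX) NULL,
        Col002 NVARCHAR(MAX) NULL,
        Col003 NVARCHAR(MAX) NULL
        -- ... repeat for the remaining columns
    );
    ```

    The data lands raw; typing, trimming, and validation happen afterwards, inside the database, where error messages are at least usable.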

    And here I thought I was alone in having problems importing data from Excel into SQL Server...

  • How does the data actually come? Is it a real .csv file, and not an Excel doc?

    I've imported millions of rows via BULK INSERT or SSIS with delimited files, which would be my preference.
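
    For a tab-delimited export, a minimal BULK INSERT into a wide staging table might look like this (the file path and table name are assumptions for illustration; the staging table's columns must line up one-for-one with the file's fields, since BULK INSERT maps by position):

    ```sql
    -- Sketch only: path and table name are placeholders.
    BULK INSERT dbo.ImportStaging
    FROM 'C:\import\bigsheet.txt'
    WITH
    (
        FIELDTERMINATOR = '\t',  -- tab-delimited
        ROWTERMINATOR   = '\n',  -- one row per line
        FIRSTROW        = 2,     -- skip the column-header row
        TABLOCK                  -- allows a faster, minimally logged load
    );
    ```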

    Excel 2007 doesn't handle lots of rows very nicely, I believe, and can manipulate the data into what it THINKS the data should be, instead of what it really is.

    I've also automated Excel in .NET, and it's frustratingly slow when processing a Range line by line.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Unfortunately, I agree with Lowell here. When you start trying to bombproof (I like that term) Excel imports, you're looking at VBA code that opens its own OLE DB connection and purposely IGNORES the drivers for Excel data access.

    This may sound silly, but start with 100 rows (enough to fill the data sampler) and work up from there; see if you're hitting an arbitrary data limit in memory that's goofing you up.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I think SQL doesn't take a big enough sample of data, so it makes the output columns too small. I found a registry setting for how many rows to sample and changed it from 8 to 10,000 or so, but it didn't seem to help.

  • Would SQL 2012 work any better? I could install it on my laptop just for this.

  • I think it has nothing to do with SQL, but rather the ACE drivers that are used to access Excel;

    the ACE driver peeks at the first 100 rows and does a best-guess datatype cast of the data based on that;

    that's the registry setting you can tweak to make it peek at more rows.

    So it sounds like your issues have more to do with that.

    If the actual raw data is CSV, then I would take the ACE/Excel drivers out of the equation and use a text driver instead, whether via SSIS, bcp, or BULK INSERT.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Ed Wagner (3/18/2013)


    Have you tried importing into an "inbox" or "incoming data" table that the Import Wizard creates itself? I just did this recently to get data into a table, and then I split things up from there. It likes to create everything as an nvarchar, and I let it, just to get the data into the database.

    And here I thought I was alone in having problems importing data from Excel into SQL Server...

    You are not alone!

    After many failed attempts, I finally found the right combination by saving the .csv as tab-delimited text. Since SQL could not determine the correct length for the columns, I manually defined most of the columns as DT_STR with a length of 8000 (the max allowed), but found one needed to be DT_TEXT because it must have contained more than 8000 characters. After much frustration, the import finally worked.

  • Well done. When I receive data from external sources, I find it to be of less than spectacular quality: fields that are supposed to contain dates contain strings, numeric fields contain alphanumeric data, string data doesn't conform to the agreed-upon maximum length, and so on. This is why I try to bring it into an empty table of large varchar and nvarchar fields. From there I can clean it and put it into a real table that I'll actually use. Sad but necessary.
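
    That clean-up step can be sketched as a single INSERT...SELECT out of the wide staging table (table and column names here are hypothetical; ISDATE and ISNUMERIC are both available on SQL 2005/2008), turning unparseable values into NULLs instead of failing the whole load:

    ```sql
    -- Sketch: move rows from an all-nvarchar staging table into a typed table.
    -- Note that ISNUMERIC is permissive (it accepts '$', '.', etc.), so for
    -- strict numeric data further checks may be needed.
    INSERT INTO dbo.RealTable (OrderDate, Amount, CustomerName)
    SELECT
        CASE WHEN ISDATE(Col001) = 1 THEN CONVERT(DATETIME, Col001) END,
        CASE WHEN ISNUMERIC(Col002) = 1 THEN CONVERT(DECIMAL(18,2), Col002) END,
        LEFT(LTRIM(RTRIM(Col003)), 100)  -- enforce the agreed maximum length
    FROM dbo.ImportStaging;
    ```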

    Thanks for posting the notes on your resolution to the problem. Since it's a problem I encounter frequently, I was interested in the final solution you implemented.

  • Lowell (3/19/2013)


    I think it has nothing to do with SQL, but rather the ACE drivers that are used to access Excel;

    the ACE driver peeks at the first 100 rows and does a best-guess datatype cast of the data based on that;

    that's the registry setting you can tweak to make it peek at more rows.

    So it sounds like your issues have more to do with that.

    If the actual raw data is CSV, then I would take the ACE/Excel drivers out of the equation and use a text driver instead, whether via SSIS, bcp, or BULK INSERT.

    To dot the i's:

    The ACE driver looks at the first 8 rows by default. You can set the TypeGuessRows registry setting to any number between 0 and 16; a value of 0 makes the driver scan the first 16,384 rows. However, this is useless until you also add IMEX=1 to the end of the connection string, otherwise you'll end up with NULL data if your column has mixed data types.
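
    As a concrete example of where IMEX=1 goes, here is an ad-hoc read through the ACE provider (the file path and sheet name are assumptions; this also requires the 'Ad Hoc Distributed Queries' server option to be enabled):

    ```sql
    -- Sketch: read a worksheet through the ACE OLE DB provider.
    -- IMEX=1 makes mixed-type columns come back as text instead of NULL.
    -- TypeGuessRows lives in the registry under the ACE engine's hive
    -- (...\Access Connectivity Engine\Engines\Excel), path varies by Office version.
    SELECT *
    FROM OPENROWSET(
        'Microsoft.ACE.OLEDB.12.0',
        'Excel 12.0;Database=C:\import\bigsheet.xlsx;HDR=YES;IMEX=1',
        'SELECT * FROM [Sheet1$]'
    );
    ```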

    Shameless plug with more info:

    What’s the deal with Excel & SSIS?

    Bottom line:

    convert to a flat file 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
