Import Excel Spreadsheet to Database Tables

  • JGay 2041

    SSC Enthusiast

    Points: 142

    Not just a forum newbie - a complete newbie - so Greg - please excuse the "I just don't get it!"

    Peter,

    I started this with a bunch of "why questions," and came to the conclusion that there must be an exact link in BOL that you can point me to for most or all of the answers. So if that is the case, please supply the link.

    1. Why create the GetColumnDataTypeName udf? Why do we need the data type name?

    2. Why do variables have odd lengths?

    For instance: DataTypeName nvarchar(387) or OutputTableName nvarchar(257)?

    3. In the udf, lines 34-44:

    Why do we need to check user_type_id?

    What do the numbers in ( , ) mean?

    4. Why is RowsToCheckDataType defaulted to 8?

    Besides, 0, what else could it be?

    I agree, this is a great solution to the problem. And I thank you for posting it. Also, thanks for fielding my questions.

    JG

  • SQLBOT

    SSCrazy Eights

    Points: 8014

    Nice work on the article.

    Very nice.

  • Peter Schott

    SSCrazy Eights

    Points: 9613

    This looks like it could be really promising for some of the issues that plague us w/ Excel such as mixed data types in a column (mostly numeric, but some character). I think I'll be giving this a try pretty soon.

  • Jeff Moden

    SSC Guru

    Points: 996661

    Servercentral-1016448 (2/17/2010)


    For import of csv files i would go for someting like this:

    SELECT *

    INTO tabelA

    FROM OPENROWSET('MSDASQL',

    'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\Data;','select * from tabelB.csv')

    You could just use BULK INSERT which will work both on 32 and 64 bit machines.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 996661

    Denis W. Repke (2/17/2010)


    I tried your code on a WinXP 64-bit workstation connected to SQL Server 2008 64-bit. I always get this error:

    Cannot bulk load. The file "C:\TestBook.xml" does not exist. at line 1

    Msg 50000, Level 16, State 1, Procedure uspImportExcelSheet, Line 497

    Cannot bulk load. The file "C:\TestBook.xml" does not exist.

    No matter where I put the "TestBook.xml" file ("C:" or a network drive), it cannot find the file.

    Am I missing something?

    If you use drive letters like C:, they refer only to drives on the SQL Server box. If you want them to read from your C: drive, then the login that your SQL Server uses must be able to see your desktop box drives and should usually be a domain super user. Even then, you cannot use drive letters... you must use the UNC path to the box and "share".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 996661

    I've not tried any of the code examples but the article looks to be a good one and I can't/won't take anything away from the author or the article. Nicely done, Peter.

    This XML method does seem like a (comparatively) lot of work to import from spreadsheets, though, and OPENROWSET (etc) isn't the only way to import CSV data. Why not just export TSV (Tab Separated Value) data (or CSV data, if you insist) to a file and use BULK INSERT to load the data into tables?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • peterhe

    SSChampion

    Points: 11363

    Thanks, Jeff.

    The background of this article is that my customers use Excel spreadsheets to edit data since SQL 2000 and we have tools to load data from Excel to database. Last year some customers upgraded to 64bit 2005, our tools did not work anymore because of the oledb jet driver. We have other applications to export data as Excel xml spreadsheet.

    Exporting with csv and importing by BULK INSERT definitely works. The problem is when you open the csv by Excel, Excel asks you how to identify the coloumns and rows, and you cannot save the Excel format to a csv. This is too much to our customers.

  • peterhe

    SSChampion

    Points: 11363

    JGay 2041,

    The following is the answer to your questions:

    1. Why create the GetColumnDataTypeName udf? Why do we need the data type name?

    The stored procedure supports to import data to a predefined table. It will convert the data in the spreadsheet to the corresponding data types of the table. So I need to create a udf to check the data type of the predefined table. The table can be a global temp table or a physical user table.

    2. Why do variables have odd lengths?

    For instance: DataTypeName nvarchar(387) or OutputTableName nvarchar(257)?

    OutputTableName can be in format of SchemaName.TableName, Both SchemaName and TableName are of sysname, which is an alias of nvarchar(128). So the maximum length of the table name is 128+1+128=257.

    For DataTypeName, the worst scenario is typed xml. It is in the format of xml(xmlSchema) Maximum length for xml schema (DBSchema.XmlSchemaName) is 257, data type name (xml) is of sysname. So the maximum length is 128+1+257+1=387

    3. In the udf, lines 34-44:

    Why do we need to check user_type_id?

    What do the numbers in ( , ) mean?

    I check user_type_id instead of system_type_id because in user defined table, you can use alias data type, e.g. Name for nvarchar(50) etc. Some data types like numeric has length and precision definion, e.g. numeric(10,3).

    4. Why is RowsToCheckDataType defaulted to 8?

    Besides, 0, what else could it be?

    You can use any positive number. 0 means all rows. I set default to 8 because it is the default value for oledb jet driver to detect data type.

  • starunit

    SSCommitted

    Points: 1807

    Thanks for the article, Peter. And thanks for the technique.

    It's unfortunate that these kinds of work-arounds are needed: it just shows how out-of-touch microsoft is becoming. For years they've provided a top-notch spreadsheet application, which has consequently become a commonplace method for people to pass information around.

    I've set up a 32bit SQL Server instance on my workstation just so I can continue to use the drivers that connect to Microsoft's other products.

    Cheers from Denver,

    Mark

    Mark
    Just a cog in the wheel.

  • JGay 2041

    SSC Enthusiast

    Points: 142

    Peter,

    Hi! Thanks for answering my previous request. I have since ran the code several times and I have a situation that I'd like your help on. This is a great fix and I know I'm missing something simple.

    IdentityColumn

    I have a pre-created table with an IdentityColumn:

    CREATE TABLE dbo.TestTable

    (

    RecordID int identity(1,1) PRIMARY KEY,

    Text1Col varchar(30) NULL,

    Text2Col varchar(30) NULL,

    NumCol varchar(30) NULL,

    DateCol datetime NULL

    )

    When I run the Sproc:

    EXEC dbo.[uspImportExcelSheet]

    @ExcelFileName=N'C:\TestBook.xml',

    @WorkSheetName=N'Sheet1',

    @OutputTableName=N'dbo.TestTable',

    @FirstRowIsHeader=1

    it fails with this error:

    Some non-nullable columns defined in the table [dbo.TestTable] does not exist in the Excel worksheet [Sheet1]. at line 289

    Msg 50000, Level 16, State 1, Procedure uspImportExcelSheet, Line 498

    Some non-nullable columns defined in the table [dbo.TestTable] does not exist in the Excel worksheet [Sheet1].

    What do I need to do to have the excel (xml) sheet insert into the table and have the identity column auto increment?

    Thanks

    JG

  • peterhe

    SSChampion

    Points: 11363

    JGay,

    I never tried this scenario.

    One workaround is you donot precrete the table. Instead, you can pass the table name and identity column name to the SP and let the SP create it. It works fine.

    Another workaround is that if you do want to precreate a table with identity column, you can call the SP by

    INSERT YourTable (Col1,Col2,...)

    EXEC theSP @P1,@P2

    If want to precreate the table with identity column and pass the table name to the SP as you are doing,

    we need change the SP a little bit to filter out identifier and timestamp columns , which are assigned value by SQL Server automatically.

    Find the following code and add the line new line:

    -- Temp table

    INSERT #Columns(ColumnName,DataTypeName,ExcelColNum,IsNullable)

    SELECT e.[name],dbo.udfGetColumnDataTypeName(@OutputTableName,e.[name]),0,e.is_nullable

    FROM tempdb.sys.columns e

    WHERE e.object_id=object_id('tempdb..'+@OutputTableName)

    -- Add the following line

    AND e.is_identity=0 AND e.system_type_id<>189

    Add the same line to the section that enumerate columns for a physical user table:

    -- Physical table

    INSERT #Columns(ColumnName,DataTypeName,ExcelColNum,IsNullable)

    SELECT e.[name],dbo.udfGetColumnDataTypeName(@OutputTableName,e.[name]),0,e.is_nullable

    FROM sys.columns e

    WHERE e.object_id=object_id(@OutputTableName)

    -- Add the following line

    AND e.is_identity=0 AND e.system_type_id<>189

  • JGay 2041

    SSC Enthusiast

    Points: 142

    Peter,

    Thanks for the swift response. I need to read it a bit more to understand. I'll repost if I have another question.

  • JGay 2041

    SSC Enthusiast

    Points: 142

    Peter,

    Cool. Changed the SP as you suggested, tested with your supplied TestBook.xml. Works! Thanks!

  • JGay 2041

    SSC Enthusiast

    Points: 142

    Peter,

    Now that I've successfully tested this solution with your TestBook.xml test data, I've decided to test it with my data - a rather large Excel xls spreadsheet (6,653KB) - saved off to an even larger xml spreadsheet (63,711KB).

    Running the Sproc, I come to a halt at lines: 258-262 "Empty Worksheet."

    I noticed on page 1 of this discussion forum, another poster had the same issue and the issue does not appear to have been addressed. See Post # 866784.

    I'm going to add some Debug statements to the code to see why ColNum and RowNum aren't populating.

    Question - In your article, under Overview, you state "The xml spreadsheet has "Excel.Sheet" as processing instruction so when you double click it, it will be opened in Excel instead of internet explorer or other xml processor."

    When I saved the xml spreadsheet from the excel spreadsheet and then double clicked on the newly created xml file, Excel didn't open. XML Editor opened the file.

    Here's the top of my file:

    <?xml version="1.0"?>

    <?mso-application progid="Excel.Sheet"?>

    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"

    xmlns:o="urn:schemas-microsoft-com:office:office"

    xmlns:x="urn:schemas-microsoft-com:office:excel"

    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"

    xmlns:html="http://www.w3.org/TR/REC-html40">

    <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">

    What should I have done differently to make Excel open the file?

    Do you have solution as to why I'm getting the error 'Invalid Excel worksheet?'

    As always - thanks for fielding my questions,

    JG

  • peterhe

    SSChampion

    Points: 11363

    JGay,

    If the file is too big, the OPENROWSET to bulkload the file may get time out so the xml is empty.

    As for Excel not open the xml spreadsheet, can you check that the icon of the file is Excel or something else?

    If you have Excel installed on the machine, Excel should open it.

Viewing 15 posts - 31 through 45 (of 70 total)

You must be logged in to reply to this topic. Login to reply