Using OPENROWSET to import CSV files

  • Please also consider using an SSIS package to import data from files to a database - that is its primary purpose in life, and I don't like to see a good tool get upset. πŸ˜‰

  • Paul White (11/6/2009)


    Please also consider using an SSIS package to import data from files to a database - that is its primary purpose in life, and I don't like to see a good tool get upset. πŸ˜‰

    SSIS requires a fixed number of columns in the import definition.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (11/10/2009)


    SSIS requires a fixed number of columns in the import definition.

    Well it certainly prefers to deal with stable metadata, that's for sure. But, unless the requirement is actually to import a file with truly 'any structure' into an unspecified table, there's often a way around it.

Most frequently I've come across this as different files being subsets of a larger structure, or something like that. If there's some structure or logic to work with, it's often possible to do.

  • Jeff Moden (11/6/2009)


    Simon Parry (11/6/2009)


    Hi

    i have tried using BULK INSERT

    using

set @string ='BULK INSERT CSVTest FROM ''\\10.1.2.107\rp_uploaded_files\'+@file+''' WITH (FIELDTERMINATOR = '','',ROWTERMINATOR = ''\n'')'

    exec (@string)

but this requires the destination table to have the same number of columns as the import file.

The files I want to import have an unknown number of columns.

    Can you help at all?

    Simon

    Yep... attach a copy of one of the files (unless it has private info in it) to your next post and tell me what you think you'd like to do with it.

    Simon?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi

    Sorry for the delay

I have attached 3 sample files; it's possible each one of these may be used.

If the files were the same size each time I could do it, but because they have a varying number of columns I'm stuck.

    Many thanks for all your help πŸ™‚

    Simon

  • That's the other piece of the puzzle I need, Simon... what do you want to do with the data when it's brought in? Separate tables? Common Table? If a common table, what do you want it to look like?

    ...ooooorrrrrrr....

    ... do you just want to be able to use the data short term as if it were in a table?

    --Jeff Moden



  • Hi Jeff

Thanks for your reply.

I would want the data stored in separate tables.

    thanks

    Simon

Using BULK INSERT won't work if there are a variable number of columns in the data file.

    Something like:

    fld1A,fld2A,fld3A

    fld1B,fld2B,fld3B

    fld1C,fld2C,fld3C,fld4C

Oddly, the old DTS data pump/bulk insert tasks were fine with the above.

    They must have used the MSDASQL driver for the insertion process.

    πŸ˜‰

  • Wow... sorry about that, folks. I really lost track of this post.

    Simon, after only 5 months of delay :blush:, do you still need any help on this?

    --Jeff Moden



  • Jeff Moden (4/23/2010)


    Wow... sorry about that, folks. I really lost track of this post.

    Simon, after only 5 months of delay :blush:, do you still need any help on this?

Jeff, I would like to see a solution for this... I smell that I might run into this problem in a short while. Mine is exactly what Simon Parry was talking about: I am going to get inconsistent CSV files (meaning, the number of columns will differ every day) and I need to import them into a table. So if you have a solution, and I am sure you'll have one which will eventually be the best πŸ˜‰, please share with us!!

Thanks in advance, Jeff!!

  • I have 2 solutions in mind.

    Solution 1:

1. Ask the CSV team to tell us if extra columns will be added in today's file.

2. Create many format (*.fmt) files and use them according to the CSV team's response.

3. Utilize BCP/BULK INSERT along with the format files to import them (a sketch of one such format file follows this list).
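For illustration, a minimal non-XML format file for a hypothetical three-column version of the feed might look like this (the column names, widths, file name, and target table below are made up for the sketch)...

9.0
3
1   SQLCHAR   0   8000   ","      1   Col1   ""
2   SQLCHAR   0   8000   ","      2   Col2   ""
3   SQLCHAR   0   8000   "\r\n"   3   Col3   ""

... and the matching import would be something like...

BULK INSERT dbo.CSVTest3
FROM '\\10.1.2.107\rp_uploaded_files\Sample1.csv'
WITH (FORMATFILE = '\\10.1.2.107\rp_uploaded_files\ThreeCol.fmt', FIRSTROW = 2); --FIRSTROW = 2 skips the header row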

    Solution 2:

1. Use OPENROWSET to capture the column names in the CSV file.

2. Create a table with those column names (using NVARCHAR for all the columns, of course).

3. Create a BULK INSERT statement using the above table name, dynamically.

4. EXEC the dynamic statement.

Hmmm... hope these 2 solutions are feasible. I did a POC of Solution 1 and it worked fine. For Solution 2, I have yet to start (a rough sketch follows this post)! I am also quite interested in GSquared's import-to-temp-table solution; I will give it a shot in some time!

Now, Jeff, waiting for your inputs on this "tricky" problem. πŸ™‚

    Cheers!!
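Since Solution 2 is only described in prose above, here is a rough, untested sketch of how it could go, assuming ad hoc distributed queries are enabled and the Microsoft Text ODBC driver can actually read the file (the folder, file, and table names are placeholders, and as Jeff notes further down, this driver approach will not cope with every file)...

--===== 1. Let the text driver read the file once, just to expose the column names.
 SELECT TOP 0 *
   INTO #Discovered
   FROM OPENROWSET('MSDASQL',
                   'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\Temp;',
                   'SELECT * FROM Sample1.csv');

--===== 2. Build the CREATE TABLE from tempdb's metadata (NVARCHAR everywhere, as proposed).
DECLARE @SQL NVARCHAR(MAX);
 SELECT @SQL = ISNULL(@SQL + ',', '') + QUOTENAME(name) + ' NVARCHAR(4000)'
   FROM tempdb.sys.columns
  WHERE object_id = OBJECT_ID('tempdb..#Discovered')
  ORDER BY column_id;
 SELECT @SQL = 'CREATE TABLE dbo.Import_Sample1 (' + @SQL + ');';
   EXEC (@SQL);

--===== 3 & 4. Build and EXEC the BULK INSERT dynamically, skipping the header row.
   EXEC ('BULK INSERT dbo.Import_Sample1
          FROM ''C:\Temp\Sample1.csv''
          WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2);');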

  • ColdCoffee (4/24/2010)


I am also quite interested in GSquared's import-to-temp-table solution; I will give it a shot in some time!

    BWAA-HAA!!! Try it on the Sample2.CSV file the OP provided in his original ZIP file... Surprise, surprise, surprise! πŸ˜›

    There is a way to handle it, though...

    --Jeff Moden



  • I put the OP's files in C:\Temp to have the following code run...

--DROP TABLE #Staging,#Header,#Data
--GO
--===== Suppress auto display of rowcounts
    SET NOCOUNT ON;

--===== Declare local variables
DECLARE @pFullPath VARCHAR(256), --Full path to the file. May be a UNC. Would be a parameter in a stored proc
        @SQLBulk   VARCHAR(MAX), --Dynamic Bulk Insert command
        @SQLSelect VARCHAR(MAX)  --Dynamic Select command
;
 SELECT @pFullPath = 'C:\Temp\Sample2.csv' --Just for testing purposes.
;
--===== Create the staging table. The others will be created on the fly.
 CREATE TABLE #Staging
        (LineData VARCHAR(8000))
;
--===== Bulk Insert the dynamically named file
     -- (terminators assumed to be '\n', lost in the forum rendering;
     --  each whole line lands in the single LineData column)
 SELECT @SQLBulk =
' BULK INSERT #Staging
   FROM ' + QUOTENAME(@pFullPath,'''') + '
   WITH (
         FIELDTERMINATOR = ''\n'',
         ROWTERMINATOR   = ''\n''
        );'
;
   EXEC (@SQLBulk)
;
--===== Split and isolate the header data in a table and cover
     -- for missing header names.
   WITH
cteNumber AS
(
 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 AS LineNumber,
        LineData
   FROM #Staging
)
 SELECT split.ItemNumber,
        QUOTENAME(ISNULL(NULLIF(split.Item,''),split.ItemNumber)) AS Data
   INTO #Header
   FROM cteNumber stage
  CROSS APPLY (SELECT * FROM dbo.DelimitedSplit8K(LineData,',')) split
  WHERE stage.LineNumber = 0
;
--===== Split and isolate the data
   WITH
cteNumber AS
(
 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 AS LineNumber,
        LineData
   FROM #Staging
)
 SELECT stage.LineNumber,
        split.ItemNumber,
        split.Item AS Data
   INTO #Data
   FROM cteNumber stage
  CROSS APPLY (SELECT * FROM dbo.DelimitedSplit8K(LineData,',')) split
  WHERE LineNumber > 0
;
--===== Put everything together as a single result set with the proper column names and all
     -- Create the dynamic part of the SELECT list first
 SELECT @SQLSelect = ISNULL(@SQLSelect+','+CHAR(10),'')
      + 'MAX(CASE WHEN d.ItemNumber =' + STR(h.ItemNumber,3) + ' THEN Data ELSE '''' END) AS ' + h.Data
   FROM #Header h
;
     -- Add the static part of the query to the dynamic part
 SELECT @SQLSelect = 'SELECT d.LineNumber,' + CHAR(10)
      + @SQLSelect + CHAR(10)
      + 'FROM #Data d GROUP BY d.LineNumber'
;
--===== We're ready to rock... return the result set
   EXEC (@SQLSelect)
;
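Just to make the dynamic part easier to follow: for a file whose header row happened to be ID,Name,Amount (hypothetical names, purely for illustration), the two SELECTs above would leave @SQLSelect holding a cross-tab like this...

SELECT d.LineNumber,
MAX(CASE WHEN d.ItemNumber =  1 THEN Data ELSE '' END) AS [ID],
MAX(CASE WHEN d.ItemNumber =  2 THEN Data ELSE '' END) AS [Name],
MAX(CASE WHEN d.ItemNumber =  3 THEN Data ELSE '' END) AS [Amount]
FROM #Data d GROUP BY d.LineNumber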

    ... and here's my splitter function...

CREATE FUNCTION dbo.DelimitedSplit8K
/***************************************************************************************************
 Purpose:
 Split a given string at a given delimiter and return a list of the split elements (items).

 Returns:
 iTVF containing the following:
 ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
 Item       = Element value as a VARCHAR(8000)

 CROSS APPLY Usage Example:
---------------------------------------------------------------------------------------------------
--===== Conditionally drop the test tables to make reruns easier for testing.
     -- (this is NOT a part of the solution)
     IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
        DROP TABLE #JBMTest
;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
 SELECT *
   INTO #JBMTest
   FROM (
         SELECT 1,'1,10,100,1000,10000,100000,1000000' UNION ALL
         SELECT 2,'2000000,200000,20000,2000,200,20,2' UNION ALL
         SELECT 3, 'This,is,a,test'                    UNION ALL
         SELECT 4, 'and so is this'                    UNION ALL
         SELECT 5, 'This, too (no pun intended)'
        ) d (SomeID,SomeValue)
;
GO
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
 SELECT test.SomeID, split.ItemNumber, split.Item
   FROM #JBMTest test
  CROSS APPLY
        (
         SELECT ItemNumber, Item
           FROM dbo.DelimitedSplit8K(test.SomeValue,',')
        ) split
;
---------------------------------------------------------------------------------------------------
 Notes:
 1. Optimized for VARCHAR(8000) or less.
 2. Optimized for single character delimiter.
 3. Optimized for use with CROSS APPLY.
 4. Does not "trim" elements just in case leading or trailing blanks are intended.
 5. If you don't know how a Tally table can be used to replace loops, please see the following...
    http://www.sqlservercentral.com/articles/T-SQL/62867/
 6. Changing this function to use VARCHAR(MAX) will cause it to run twice as slow. It's just the
    nature of VARCHAR(MAX) whether it fits in-row or not.
 7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows
    that the UNPIVOT method is quite machine dependent and can slow things down quite a bit.
 8. Performance testing shows using "TOP" for the limiting criteria of "N" is actually
    slower and slightly more CPU intensive than the traditional WHERE N < LEN(@pString) + 2.
 9. Performance testing shows using ORDER BY (SELECT x) where "x" is anything is actually
    slower and slightly more CPU intensive than the traditional ORDER BY (SELECT N).

 Credits:
 This code is the product of many people's efforts including but not limited to the following:
 cteTally concept originally by Itzik Ben-Gan and "decimalized" by Lynn Pettis (and others) for a
 bit of extra speed and finally redacted by Jeff Moden for a different slant on readability and
 compactness. Hats off to Paul White for his simple explanations of CROSS APPLY. Finally,
 special thanks to Erland Sommarskog for his tireless efforts to help people understand
 what you can actually do with T-SQL. I also thank whoever wrote the first article I ever saw
 on "numbers tables" which is located at the following URL ...
 http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html

 Revision History:
 Rev 00 - 20 Jan 2010 - Concept: Lynn Pettis and others.
                        Redaction/Implementation: Jeff Moden
        - Base 10 redaction and reduction for CTE. (Total rewrite)
 Rev 01 - 13 Mar 2010 - Jeff Moden
        - Removed one additional concatenation and one subtraction from the SUBSTRING in the
          SELECT list for that tiny bit of extra speed.
 Rev 02 - 14 Apr 2010 - Jeff Moden
        - No code changes. Added CROSS APPLY usage example to the header, some additional credits,
          and extra documentation.
 Rev 03 - 18 Apr 2010 - Jeff Moden
        - No code changes. Added notes 7, 8, and 9 about certain "optimizations" that don't
          actually work for this type of function.
***************************************************************************************************/
--===== Define I/O parameters
        (
        @pString    VARCHAR(8000),
        @pDelimiter CHAR(1)
        )
RETURNS TABLE
AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values up to
     -- 10,000... enough to cover VARCHAR(8000)
WITH
      E1(N) AS ( --=== Create Ten 1's
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 --10
               ),
      E2(N) AS (SELECT 1 FROM E1 a, E1 b),   --100
      E4(N) AS (SELECT 1 FROM E2 a, E2 b),   --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E4)
--===== Do the split
 SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
        N AS StartPosition,
        SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
   FROM cteTally
  WHERE N < LEN(@pString) + 2
    AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
;

    --Jeff Moden



Thanks, Jeff, for posting your approach. As I don't have SSMS at home, I will check out your code tomorrow and let you know.

But I have a doubt here. From parsing your code, it looks like you are utilising two tables for the CSV data. Hmmm, but the Solution 2 I proposed seems to use only one table for the CSV data. Anyway, I will try out both options and give you the result on Monday.

    Thanks again, Jeff! πŸ™‚

    Cheers!!

    C'est Pras!

  • ColdCoffee (4/24/2010)


Thanks, Jeff, for posting your approach. As I don't have SSMS at home, I will check out your code tomorrow and let you know.

But I have a doubt here. From parsing your code, it looks like you are utilising two tables for the CSV data. Hmmm, but the Solution 2 I proposed seems to use only one table for the CSV data. Anyway, I will try out both options and give you the result on Monday.

    Thanks again, Jeff! πŸ™‚

    Cheers!!

    C'est Pras!

The problem with your first suggestion is that you can't anticipate what humans will add, nor what they'll call it, which is exactly the current problem... unknown things being added without warning.

I like your second solution, but the problem with that is... neither OPENROWSET nor a text-based linked server will open the Sample2.csv file. You could, however, use the method I used to load and parse the header line to build the column list, and you might then be able to use a delimited BULK INSERT, but that would also require creating the table with dynamic SQL, which adds additional complexity to the problem (although it will probably be faster than doing a Tally split).

    --Jeff Moden


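For completeness, here is a minimal, untested sketch of the hybrid Jeff describes just above: reuse the already-split #Header from his earlier script to create a real table with dynamic SQL, then re-import the same file delimited. The target table name is a placeholder, @pFullPath is assumed to still be in scope from that script, and as he says, ragged rows like Sample2.csv's will still need extra handling...

DECLARE @SQLCreate VARCHAR(MAX), @SQLBulk2 VARCHAR(MAX);

--===== Build a CREATE TABLE from the header names captured in #Header
     -- (Data is already QUOTENAME'd there, so it is safe to use as a column name).
 SELECT @SQLCreate = ISNULL(@SQLCreate + ',', '') + h.Data + ' VARCHAR(8000)'
   FROM #Header h
  ORDER BY h.ItemNumber;
 SELECT @SQLCreate = 'CREATE TABLE dbo.ImportTarget (' + @SQLCreate + ');';
   EXEC (@SQLCreate);

--===== Re-import the same file, this time comma-delimited, skipping the header row.
 SELECT @SQLBulk2 = 'BULK INSERT dbo.ImportTarget
                     FROM ' + QUOTENAME(@pFullPath,'''') + '
                     WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2);';
   EXEC (@SQLBulk2);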
