• surreydude. (1/10/2012)


    A great little technique, but I would probably only use this kind of thing for an initial Proof of Concept. Rarely would you want to (or be allowed to) create dynamic table structures with possibly inaccurate data types and column sizes in a production database. Moreover, I suspect the ETL routines would fail if you have changing data types.

    Consider that the CSV has nulls in the first row on the first load - how would you know what the data type is then? What happens if the data is longer than 20 characters?

    I agree... it's a rare thing. However, I have worked with vendors such as "double-click.net" that will only provide the data in a CSV format, as if it were exported from a reporting system or a spreadsheet, where there are a handful of key columns on the left and an unknown quantity of columns on the right. The unknown columns actually needed to be matched up in pairs during the final import. Each column was marked with a name that included the type of column it was, and like-named columns could be paired based on those names. In our case, the files could be anywhere from 11 columns (9 key columns) to hundreds of columns wide, depending on how many active internet ads a customer had for any given week.
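
    Just to give a feel for that pairing, here's a minimal T-SQL sketch. All of the names are hypothetical (the vendor's actual naming convention was different), and it assumes the captured header names have already been stashed in a table:

    -- Hypothetical table holding the header names captured at import time.
    CREATE TABLE #ColumnNames (ColName SYSNAME NOT NULL);

    INSERT INTO #ColumnNames (ColName)
    VALUES ('Clicks_Ad123'), ('Cost_Ad123'),
           ('Clicks_Ad456'), ('Cost_Ad456');

    -- Pair like-named columns on the part of the name after the underscore.
    SELECT  c1.ColName AS ClicksCol,
            c2.ColName AS CostCol
    FROM    #ColumnNames c1
    JOIN    #ColumnNames c2
            ON SUBSTRING(c1.ColName, CHARINDEX('_', c1.ColName) + 1, 128)
             = SUBSTRING(c2.ColName, CHARINDEX('_', c2.ColName) + 1, 128)
    WHERE   c1.ColName LIKE 'Clicks|_%' ESCAPE '|'
      AND   c2.ColName LIKE 'Cost|_%' ESCAPE '|';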

    Although I don't particularly care for SSIS, nor for the fact that the technique in this article had to use a script to do the job (I did my thing 100% in T-SQL with no problems), the article is a reasonable introduction to how you might begin to approach such a thing in SSIS. The author included verification popups for experimentation purposes; once such a system is in place, those popups can easily be removed for hands-off scheduled imports.

    The "final move" to real tables would be done with a dynamic unpivot (you already have the column names in a separate table, which makes this a bit easier) followed by a dynamic/conditional repivot to match the structure of the final tables for the insert/update. But before you can do any of that, you have to get the data in to work on it, and this article shows one possible method for doing such a thing.
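
    For the curious, here's a minimal sketch of what the dynamic-unpivot step could look like. The object and column names are hypothetical: dbo.Staging is the table the import landed in (a key column plus the data columns created from the CSV headers, all stored as character data so the VALUES list stays type-consistent), and dbo.ColumnNames holds those captured header names:

    DECLARE @SQL NVARCHAR(MAX);

    -- Build one (name, value) pair per captured column, then unpivot them
    -- with CROSS APPLY (VALUES ...). FOR XML PATH is the old standby for
    -- concatenating the list in versions before STRING_AGG existed.
    SELECT @SQL =
          N'SELECT s.CustomerID, ca.ColName, ca.ColValue'
        + N' FROM dbo.Staging s'
        + N' CROSS APPLY (VALUES '
        + STUFF((SELECT N',(''' + cn.ColName + N''',s.' + QUOTENAME(cn.ColName) + N')'
                   FROM dbo.ColumnNames cn
                    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, N'')
        + N') ca (ColName, ColValue);';

    EXEC sys.sp_executesql @SQL;

    From there, the (name, value) rows can be conditionally repivoted into the shape of the final tables.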

    Of course, you could also pre-process the files before they come anywhere near SQL Server, but I've found that writing such code is a bit of a fright because, without the set-based, declarative tools available in SQL Server, you end up writing read/write routines with lots of loops and decisions on your own.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)