SQL-only ETL using a bulk insert into a temporary table (SQL Spackle)
Posted Tuesday, July 22, 2014 12:08 AM
SSC-Enthusiastic
Comments posted to this topic are about the item SQL-only ETL using a bulk insert into a temporary table (SQL Spackle)
Post #1594939
Posted Tuesday, July 22, 2014 10:18 AM
SSCarpal Tunnel
It's a nice article: short and simple.

However, I would have liked it if you had mentioned one of the main problems I've seen bite beginners (including me): the file must be in a location the server can reach. Most of us have been burned by using a local path instead of the path the server needs (or by forgetting to put the file on the server).

Another improvement would have been to include some references for more information, such as how to make the statement dynamic or how to use format files.



Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1595164
Posted Tuesday, July 22, 2014 7:00 PM
Hall of Fame
Nice simple article. I may have to try this out sometime.
Post #1595322
Posted Tuesday, July 22, 2014 11:28 PM
Valued Member
Very nice article! Thank you so much!
Post #1595342
Posted Wednesday, July 23, 2014 6:49 AM
Forum Newbie
Extremely academic -- since, as was pointed out, the file must be on a disk local to that instance of SQL Server. That doesn't happen often, and most DBAs would argue against it.

It would be better if you could target files local to a Windows client and bulk-load those into SQL Server.

A look at SQLXMLBulkLoad may be worthwhile, since more and more files move from business to business in an XML format, and it can run from any Windows client.
Post #1595470
Posted Wednesday, July 23, 2014 7:42 AM
Forum Newbie
Nice article. This is a typical operation when the server is in a SOHO-type situation, or maybe some older XP-era gear that is co-located somewhere -- basically, anyplace where the person using SSMS also has access to the C:\ drive of the server.

As my career has progressed and my situations have become more corporate, server administration is heavily managed. So things that used to be simple, like bulk importing a file located on the same server, are near impossible now.

I've grown fond of using linked servers for large CSV files I only need temporarily. Usually this means buying the server admin a beer to enable the feature. The benefits (with a sketch after the list) are:


  • Treats the data file as a database

  • Does not have to be imported into an existing database, which means it's a little easier on the transaction logs

  • Can be queried using four-part naming (server.database.schema.table)

  • When done, merely unlink the file (server)

  • File does not have to be on the server

  • Done entirely by script using T-SQL
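
A minimal sketch of the pattern, assuming the Jet text provider (32-bit only; a 64-bit instance would need the ACE provider instead). The linked server name, directory, and file name are hypothetical:

-- Expose a directory of CSV files as a linked "server" via the Jet text provider.
-- The server name and directory below are assumptions for illustration.
EXEC sp_addlinkedserver
     @server     = 'CsvFiles',
     @srvproduct = 'Jet 4.0',
     @provider   = 'Microsoft.Jet.OLEDB.4.0',
     @datasrc    = 'C:\Data\',       -- directory holding the files
     @provstr    = 'Text';

-- Each file becomes a "table"; the dot in the file name is written as #.
SELECT * FROM CsvFiles...[file#csv];

-- When finished, unlink the "server".
EXEC sp_dropserver @server = 'CsvFiles', @droplogins = 'droplogins';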



Sean
Post #1595486
Posted Wednesday, July 23, 2014 9:27 AM
SSC-Dedicated
I'm actually very torn on this article...

On the one hand, I agree. As noted by some of the others, the article would certainly have benefited from some extra information, such as which machine a drive-letter file reference resolves against, the fact that you can use UNC paths instead, etc.

On the other hand, this IS a "Spackle" article and, as it says at the beginning, "These short articles are NOT meant to be complete solutions". Is it a 5-star article in the classic sense? No. But if you don't know about BULK INSERT, it IS, most definitely, a MUST-READ article because, although it doesn't fill an entire crack in one's knowledge of T-SQL, it DOES demonstrate the very important concept that the Extract, Transform, and Load process doesn't need to be complicated. While that might not provide a benefit to those of us who already know about the likes of BULK INSERT, it does provide the start of an answer to the recent outbreak of posts on this and other sites that can be generically summarized as "How can I import a file into a table?"

What's unfortunately not stated in the article, and requires someone to have enough intellectual curiosity to actually try the code and do a little research on their own, is the fact that the code provided imports more than 122 THOUSAND rows and then extracts the required 501 rows in just over a second. And the author provided the substantial amount of test data to prove it! The article clearly demonstrates that this doesn't require learning a new language (such as PowerShell), doesn't require another system or program (such as SSIS or any of a thousand other programs), and doesn't require much additional knowledge of T-SQL.

In other words, this is a very high-altitude and incredibly simple "introduction" to BULK INSERT that says "Look! Importing files can be easy!" and, for those with enough intellectual curiosity to look up BULK INSERT in Books Online or other web articles, provides a clue to an extremely effective ETL method that they might not have otherwise considered or might not even have been aware of.
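
For anyone who hasn't looked it up yet, the core of the method boils down to something like the following. To be clear, this is a minimal sketch with an assumed path, delimiters, and column layout; it is NOT the article's actual code:

-- Stage the raw file into a temporary table, then query what you need.
CREATE TABLE #Staging
(
    Col1 VARCHAR(50),
    Col2 VARCHAR(50),
    Col3 VARCHAR(50)
);

BULK INSERT #Staging
FROM '\\SomeFileServer\SomeShare\SomeFile.csv'  -- UNC path readable by the SQL Server service account
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,   -- skip a header row
    TABLOCK                -- allows minimal logging where eligible
);

SELECT * FROM #Staging;    -- extract the rows you actually need here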

With all of that in mind, I say "Well done, Mr. Kulp". Add a little more detail, and the reasons that make a particular feature advantageous, to future "Spackle" articles, and keep 'em coming!


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1595544
Posted Wednesday, July 23, 2014 9:45 AM
SSCarpal Tunnel
Just to be clear, the file doesn't need to be on the same server as SQL Server. Typically, the files will be stored in a separate location and referenced by a network address (\\SomeFileServer\Path\file).



Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1595560
Posted Wednesday, July 23, 2014 11:34 AM
Forum Newbie
How could we extend this code to add the ability to convert datatypes, like parsing a datetime?

Also, how would we do an upsert per row?

Post #1595620
Posted Wednesday, July 23, 2014 11:47 AM
SSC-Dedicated
sgross 10581 (7/23/2014)
How could we extend this code to add the ability to convert datatypes, like parsing a datetime?

Also, how would we do an upsert per row?



Unless you can guarantee that the person(s) or application will always provide a proper date and time, it's best to import the data into a staging table (as was done in the article) as VARCHAR and validate it using standard techniques. Once validated, simply inserting the data from the staging table into the final table will perform many implicit conversions, such as converting proper VARCHAR renditions of date and time to the actual data type of the target column (DATETIME, for example).
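
For example (a minimal sketch with assumed table and column names; TRY_CONVERT requires SQL Server 2012 or later):

-- Find staging rows whose VARCHAR date won't convert...
SELECT  SomeDate
FROM    #Staging
WHERE   TRY_CONVERT(DATETIME, SomeDate) IS NULL;

-- ...then let the final INSERT convert the clean rows implicitly.
INSERT INTO dbo.FinalTable (SomeDate)   -- hypothetical target table
SELECT  SomeDate                        -- implicit VARCHAR-to-DATETIME conversion
FROM    #Staging
WHERE   TRY_CONVERT(DATETIME, SomeDate) IS NOT NULL;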

As for "Upserts", it should be handled as any other data. You have a source table (the staging table) and a target table (the final table). I'll typically pre-mark each row with "I" (for insert) or "U" (for update) (sometimes, "D" for delete but I normally don't delete data due to audit requirements) using simple joins like you would in any classic upsert, and then do a separate insert and update. You could also use MERGE between the staging and target tables. BULK INSERT is not meant to provide upsert capabilities on its own. It's meant to be relatively simple and very fast... and it is.

If the source of the data is very good, you can, in fact, import into a staging table that has the correct data types. Even if the source is questionable in quality, you can set up BULK INSERT to sequester any bad lines of data in a separate file for future analysis and repair.
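
That's the ERRORFILE option. For instance (paths assumed):

-- Rows that fail to load are written to the error file (plus an
-- auto-generated .Error.Txt companion) instead of killing the load.
BULK INSERT #Staging
FROM '\\SomeFileServer\SomeShare\SomeFile.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    ERRORFILE       = '\\SomeFileServer\SomeShare\SomeFile.bad',
    MAXERRORS       = 1000   -- keep loading past the bad rows
);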


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1595630