Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


The New ETL Paradigm


The New ETL Paradigm

Author
Message
Jamie Thomson
Jamie Thomson
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1073 Visits: 188

Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/jthomson/thenewetlparadigm.asp



Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
kenno_rules
kenno_rules
SSC-Enthusiastic
SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)

Group: General Forum Members
Points: 128 Visits: 173

Hi Jamie, thanks for the article. I am yet to try out 2005, but am particularly interested in SSIS. I don't have a broad range of ETL experience, but the ETL work I do largely consists of DTS packages that call Stored Procedures. In essence, the DTS is only really there to help control workflow.

In your article, you discuss the 'E' and the 'L', but what about the 'T'? From what you have seen of SSIS, how well (or otherwise) does it handle transformations? Will we still be using Stored Procedures to do the 'real' work - such as deciding when to update a dimension record instead of inserting a new one. In DTS, we can use lookups in a Data Driven Query Task to achieve this, but the row-by-row processing has put us off due to the performance overhead incurred when dealing with very large dimension/fact loads. Hence the reliance on Stored Procedures and the set-based processing that allows.

I would LOVE to be able to perform the complete end-to-end ETL processing in a standardised environment (like I am told a 'real' ETL tool does - no I haven't used any others!), and am really hoping that SSIS is the answer to this.

Thanks again,

Matt.





Tim OPry
Tim OPry
SSC-Enthusiastic
SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)

Group: General Forum Members
Points: 190 Visits: 247

Thanks for the article - can you advise if there is something in SQL2005 to convert exsting DTS packages? If so, how accurate does it appear to be (if you have tested it)?





Jamie Thomson
Jamie Thomson
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1073 Visits: 188

Matt,

Thanks for your reply.

Your approach to using DTS is the one that alot of people take, certainly I know I do. And my company do steadfastly.

You are right that I didn't cover the 'T' and I only really touched on 'E' & 'L', that will come later. In the meantime you may want to check out http://www.sqlservercentral.com/columnists/jthomson/transactionsinsqlserver2005integrationservices.asp or http://www.sqlis.com which has many articles already or my SSIS blog at http://blogs.conchango.com/jamiethomson which I put useful little ditties on now and again.

You still have the choice of using stored procs to do your work and indeed there is an ExecuteSQL Task in SSIS to enable you to do this just as there was in DTS. However, the rich transformation functionality in SSIS is a complete departure from the limited functionality in DTS. To give you a flavour of what transformations you have there are:

Sort - Sort data in the pipeline
Aggregate - Similar to T-SQL aggregate functions
Lookup - Similar behaviour to DTS Data Driven Queries but far far more performant
Conditional split - Split a data set into numerous smaller data sets depending on attributes of the data. Enables filtering.
Derived Column - Using the new SSIS expression language derive new values from existing ones.
Merge and Merge Join - Similar to T-SQL joins

This list basically reflects "standard" functionality is by no means exhaustive. There are many many more transformation components available for you to use in a data flow including things that have never een available before such as fuzzy lookups and fuzzy matchnig (i.e. Match inexact values based on a scoring mechanism - invaluable for deduping data)

I hope this is useful to you. Check out the above references and look out from more material from myself in the future covering some of these concepts in more detail.



Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
Jamie Thomson
Jamie Thomson
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1073 Visits: 188

Tim,

There is a migration wizard for existing packages. I have not used it myself, largely because the mapping between DTS functionality and SSIS functionality is so obscure I am fearful about what the migration wizard will actually produce. I am convinced (and other people's experiences back this up) that it would be quicker to build the packages from scratch rather than migrate existing ones and I would also expect that the result of doing it yourself would be alot closer to an optimum solution than the migration wizard could produce.

As I say though I haven't used it so on this one you're just going to have to trust me



Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
Queen
Queen
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 1

Dear Jamie,
I just wanted to get some feed back from you about performance in Integration Services. I've been using it for two months and I have created a package that should pump 2.5 million records from my source to my destination. Now I will say that I have several steps of sorting (Using the sort component) and merge join (using the merge join component). But my performance is horrible. I used the methodology outlined in your Article: " New ETL Paradigm" I saw this component from ivolva.com which they created as a third party component called extra sort. It says it speeds up sorting. Do you have ANY suggestions???

Thank you,

Mag


James Carmichael
James Carmichael
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 1

The article was good and it's clear SSIS addresses some glaring deficiencies in DTS, but the title "A New ETL paradigm" is a terrible misnomer. It's a new SQL Server ETL tool, that's all.

On it's own, DTS is not an ETL application. It's good for moving data, but transforming and loading should be left to something else.


Jamie Thomson
Jamie Thomson
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1073 Visits: 188

In case anyone is interested, Mag and I sorted this one out offline and she was good enough to write a short case study which is well worth a read:

A case study of using LOOKUPs instead of MERGE JOINs
http://blogs.conchango.com/jamiethomson/archive/2005/10/21/2289.aspx

-Jamie



Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
Jamie Thomson
Jamie Thomson
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1073 Visits: 188

James,

Fair comment I guess. The title was meant to be fairly colloquial (I think - Its almost two years since I wrote it and was surprised to see it republished).

-Jamie



Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
VincentRainardi
VincentRainardi
SSC Veteran
SSC Veteran (215 reputation)SSC Veteran (215 reputation)SSC Veteran (215 reputation)SSC Veteran (215 reputation)SSC Veteran (215 reputation)SSC Veteran (215 reputation)SSC Veteran (215 reputation)SSC Veteran (215 reputation)

Group: General Forum Members
Points: 215 Visits: 191
Thanks for the article, Jamie.
It was very useful.
Vincent
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search