SQL Server Integration Services - The New ETL Paradigm

  • Jamie Thomson

    SSChampion

    Points: 11805

    Comments posted to this topic are about the item SQL Server Integration Services - The New ETL Paradigm

  • kenno_rules

    SSC-Addicted

    Points: 436

    Hi Jamie, thanks for the article. I am yet to try out 2005, but am particularly interested in SSIS. I don't have a broad range of ETL experience, but the ETL work I do largely consists of DTS packages that call Stored Procedures. In essence, the DTS is only really there to help control workflow.

    In your article, you discuss the 'E' and the 'L', but what about the 'T'? From what you have seen of SSIS, how well (or otherwise) does it handle transformations? Will we still be using Stored Procedures to do the 'real' work - such as deciding when to update a dimension record instead of inserting a new one. In DTS, we can use lookups in a Data Driven Query Task to achieve this, but the row-by-row processing has put us off due to the performance overhead incurred when dealing with very large dimension/fact loads. Hence the reliance on Stored Procedures and the set-based processing that allows.

    I would LOVE to be able to perform the complete end-to-end ETL processing in a standardised environment (like I am told a 'real' ETL tool does - no I haven't used any others!), and am really hoping that SSIS is the answer to this.

    Thanks again,

    Matt.

  • Tim OPry

    SSCrazy

    Points: 2988

    Thanks for the article - can you advise if there is something in SQL2005 to convert exsting DTS packages? If so, how accurate does it appear to be (if you have tested it)?

     

  • Jamie Thomson

    SSChampion

    Points: 11805

    Matt,

    Thanks for your reply.

    Your approach to using DTS is the one that alot of people take, certainly I know I do. And my company do steadfastly.

    You are right that I didn't cover the 'T' and I only really touched on 'E' & 'L', that will come later. In the meantime you may want to check out http://www.sqlservercentral.com/columnists/jthomson/transactionsinsqlserver2005integrationservices.asp or http://www.sqlis.com which has many articles already or my SSIS blog at http://blogs.conchango.com/jamiethomson which I put useful little ditties on now and again.

    You still have the choice of using stored procs to do your work and indeed there is an ExecuteSQL Task in SSIS to enable you to do this just as there was in DTS. However, the rich transformation functionality in SSIS is a complete departure from the limited functionality in DTS. To give you a flavour of what transformations you have there are:

    Sort - Sort data in the pipeline
    Aggregate - Similar to T-SQL aggregate functions
    Lookup - Similar behaviour to DTS Data Driven Queries but far far more performant
    Conditional split - Split a data set into numerous smaller data sets depending on attributes of the data. Enables filtering.
    Derived Column - Using the new SSIS expression language derive new values from existing ones.
    Merge and Merge Join - Similar to T-SQL joins

    This list basically reflects "standard" functionality is by no means exhaustive. There are many many more transformation components available for you to use in a data flow including things that have never een available before such as fuzzy lookups and fuzzy matchnig (i.e. Match inexact values based on a scoring mechanism - invaluable for deduping data)

    I hope this is useful to you. Check out the above references and look out from more material from myself in the future covering some of these concepts in more detail.

     

  • Jamie Thomson

    SSChampion

    Points: 11805

    Tim,

    There is a migration wizard for existing packages. I have not used it myself, largely because the mapping between DTS functionality and SSIS functionality is so obscure I am fearful about what the migration wizard will actually produce. I am convinced (and other people's experiences back this up) that it would be quicker to build the packages from scratch rather than migrate existing ones and I would also expect that the result of doing it yourself would be alot closer to an optimum solution than the migration wizard could produce.

    As I say though I haven't used it so on this one you're just going to have to trust me

     

  • Queen

    Valued Member

    Points: 51

    Dear Jamie,

    I just wanted to get some feed back from you about performance in Integration Services.  I've been using it for two months and I have created a package that should pump 2.5 million records from my source to my destination.  Now I will say that I have several steps of sorting (Using the sort component) and merge join (using the merge join component).  But my performance is horrible.  I used the methodology outlined in your Article: " New ETL Paradigm"  I saw this component from ivolva.com  which they created as a third party component called extra sort.  It says it speeds up sorting.  Do you have ANY suggestions??? 

    Thank you,

    Mag

  • James Carmichael

    SSC Rookie

    Points: 39

    The article was good and it's clear SSIS addresses some glaring deficiencies in DTS, but the title "A New ETL paradigm" is a terrible misnomer.  It's a new SQL Server ETL tool, that's all. 

    On it's own, DTS is not an ETL application.  It's good for moving data, but transforming and loading should be left to something else.

     

  • Jamie Thomson

    SSChampion

    Points: 11805

    In case anyone is interested, Mag and I sorted this one out offline and she was good enough to write a short case study which is well worth a read:

    A case study of using LOOKUPs instead of MERGE JOINs

    http://blogs.conchango.com/jamiethomson/archive/2005/10/21/2289.aspx

     

    -Jamie

     

  • Jamie Thomson

    SSChampion

    Points: 11805

    James,

    Fair comment I guess. The title was meant to be fairly colloquial (I think - Its almost two years since I wrote it and was surprised to see it republished).

    -Jamie

     

  • VincentRainardi

    SSCrazy

    Points: 2905

    Thanks for the article, Jamie.

    It was very useful.

    Vincent

  • homebrew01

    SSC Guru

    Points: 55188

    Tim,

    The little bit that I know is that the DTS migration tool will work for simple DTS packages, but some of the more complex ones will not convert because there is no exact match for some of the DTS functions in SSIS.  Microsoft has a SQL2005 "add-in" you can download that allows 2000 DTS packages to be run & edited in SQL 2005. That's the route we will take initially, then re-write the 2000 DTS packages as 2005 as time & need dictates.

  • Sam Greene

    SSCertifiable

    Points: 5252

    Where I am working, a consultant used the migration wizard for dts->ssis and it has seemed to work well.  You cannot use ActiveX scripts, they must be converted to .net Script tasks, but there is additional functionality in ssis and you may not need them any more.  There was a glitch though, dts package names cannot have spaces in them or the wizard will hang up before it shows a list of migrate-able packages. 

  • Sorin Petcu

    Hall of Fame

    Points: 3145

    I 've used the Migration Wizard for my DTS packages and it was worked well. But, my new server and database server have different names from the old one. So I have to change the connections from inside every package. How to do that? How to edit the packages since they are already imported in SQL 2005 SSIS? I've tried with BI but there is no deploy option to deploy the package to SSIS repository where my packages are stand on.

    In Theory, theory and practice are the same...In practice, they are not.
  • Samuel Vella

    SSCrazy Eights

    Points: 8055

    Sorin Petcu (2/1/2008)


    I 've used the Migration Wizard for my DTS packages and it was worked well. But, my new server and database server have different names from the old one. So I have to change the connections from inside every package. How to do that? How to edit the packages since they are already imported in SQL 2005 SSIS? I've tried with BI but there is no deploy option to deploy the package to SSIS repository where my packages are stand on.

    I've not had any experience migrating DTS to SSIS, however in the migration options is there no option to migrate to file rather than SQL Server?

  • Pieter-423357

    SSCommitted

    Points: 1619

    Nice article article Jamie!

    Granter I am no master in SSIS, but I have used SSIS for several ETL type applications and when it runs OK, its great. Trouble is that failures happen frequently and often for no reason. the worst part about that is that the error messages won't tell what happened, all you get is a red box. I ended up abolishing most packages in favor of SQL scripts executed via SQLCMD.exe.

    I like the concepts used but it doen't appear to be worked out properly yet.

    my 2 cents

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply