SSIS Cache/Lookup vs Linked Server Query

  • Hi guys -

    I'm trying to decide what to use and I keep going back and forth.

    Here's my setup..

    Server A(3 tables being joined) = Source Data. I'll have about 10-100 new rows per time I run this package. So not many at all.

    Server B = Datawarehouse. Which. Has millions of rows.

    I want to just insert those few rows from Server A, but make sure they're not in Server B yet, so a left join would probably be in order (else that's how I'd approach it). This could all be done "normally" using a linked server query:

    SELECT [Stuff]

    FROM ServerA.Databaes.dbo.TableA a

    LEFT JOIN ServerB.Database2.dbo.TableA2 a2 ON a.ID = a2.ID

    WHERE a2.ID IS NULL

    Then there's the way.

    There's also the way where I could put the things I would select from Server A (there's a few tables involved in that "part" of the query). And then put those into a cache connection manager. Then in a new data flow task have a oledb source with the query to be the dataset from Server B (narrowed down as much as I could), run it through a lookup with the Server A query cache as the "cache" part of it. Send the unmatched rows to be inserted..matched ones to be updated.

    So which is better / faster for what I'm looking for?

    And if that's not enough info let me know and I'll try again 🙂

    Thanks!

  • It kind of depends I think 🙂

    But if you are allowed to use linked servers and if you have good indexes on those tables, I would prefer the left outer join. (if you have SQL Server 2008 you can use the MERGE statement)

    You can avoid the whole linked server thing if you write the data from source A to a staging table on server B and then do the matching. As the number of rows are limited, this is feasable.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.

    Raunak J

  • Raunak Jhawar (10/4/2010)


    Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.

    As far as I know, a left outer join doesn't need explicit sorting either.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • da-zero (10/4/2010)


    Raunak Jhawar (10/4/2010)


    Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.

    As far as I know, a left outer join doesn't need explicit sorting either.

    A merge join Task requires the dataset be explicitly or assumed implicitly sorted

    Raunak J

  • Raunak Jhawar (10/4/2010)


    da-zero (10/4/2010)


    Raunak Jhawar (10/4/2010)


    Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.

    As far as I know, a left outer join doesn't need explicit sorting either.

    A merge join Task requires the dataset be explicitly or assumed implicitly sorted

    Allright, I was talking about a left outer join in T-SQL.

    Who uses the Merge Join anyway 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • lolZZZZ:-):-)

    Raunak J

  • da-zero (10/4/2010)


    Raunak Jhawar (10/4/2010)


    da-zero (10/4/2010)


    Raunak Jhawar (10/4/2010)


    Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.

    As far as I know, a left outer join doesn't need explicit sorting either.

    A merge join Task requires the dataset be explicitly or assumed implicitly sorted

    Allright, I was talking about a left outer join in T-SQL.

    Who uses the Merge Join anyway 🙂

    Haha now I'm confused...

    Left join would require linked servers though, no?

    What's the consequences of having to link them vs just using connection managers and merge join or lookup in ssis?

    Are you serious about the merge join? should i steer from it?

    thanks!

  • yes you should steer clear from merge join as it does not perform very well.

    you can do a join in t-sql if the servers are linked or on the same server.

  • So join/linked server would (and do) work...any reason *not* to use the lookup in SSIS with a cache connection manager?

  • _rmullen_ (10/4/2010)


    So join/linked server would (and do) work...any reason *not* to use the lookup in SSIS with a cache connection manager?

    Reasons why I normally don't use the lookup component:

    * sensitivity with case and trailing spaces (at least in 2005, not sure for 2008. See http://www.ssistalk.com/2007/12/19/ssis-lookup-component-case-gotcha/ for more detail.)

    * the behaviour when there are multiple matches. It only returns the first matched row. See http://msdn.microsoft.com/en-us/library/ms141821(v=SQL.90).aspx for more detail.

    * possible extra error handling or correcting null values when there are no matches.

    In my opinion, using a join in T-SQL is much more elegant then using the lookup component.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Agree with all the points above, the case senstivity quite often trips people up

    t-sql join is a lot nicer and has the possibliity of being optimized

  • There's definitely better performance happening when I just query across the linked servers! Thank you!

    But if the tools in SSIS behave so poorly..why are they there? Are they meant for small datasets?

  • _rmullen_ (10/5/2010)


    But if the tools in SSIS behave so poorly..why are they there? Are they meant for small datasets?

    Well, that statement is largely incorrect. Some components of SSIS behave poorly. Others behave excellent.

    You have blocking components (such as sort or aggregate) who must wait for all the rows to be read before they can do something. These block the stream of data in your memory, so they are performance killers.

    But there are a lot of non-blocking components, such as data conversion, derived column, lookup et cetera who don't affect performance. There are even some scenarios where SSIS will outperform excellent, such as using the SQL Server Destination.

    It all kind of depends on what you are trying to do. If it is just reading some data and joining it to a table, T-SQL will be faster. If you need to do a lot of in memory operations, SSIS can be faster. (It holds the world record for the transfer of data in the fastest time, by the way)

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Gotcha.

    Thanks for the reply 🙂

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply