SSIS Cache/Lookup vs Linked Server Query

Question

SSIS Cache/Lookup vs Linked Server Query

_rmullen_

SSC-Addicted

Points: 432
More actions
October 1, 2010 at 9:00 am

#239087

Hi guys -
I'm trying to decide what to use and I keep going back and forth.
Here's my setup..
Server A(3 tables being joined) = Source Data. I'll have about 10-100 new rows per time I run this package. So not many at all.
Server B = Datawarehouse. Which. Has millions of rows.
I want to just insert those few rows from Server A, but make sure they're not in Server B yet, so a left join would probably be in order (else that's how I'd approach it). This could all be done "normally" using a linked server query:
SELECT [Stuff]
FROM ServerA.Databaes.dbo.TableA a
LEFT JOIN ServerB.Database2.dbo.TableA2 a2 ON a.ID = a2.ID
WHERE a2.ID IS NULL
Then there's the way.
There's also the way where I could put the things I would select from Server A (there's a few tables involved in that "part" of the query). And then put those into a cache connection manager. Then in a new data flow task have a oledb source with the query to be the dataset from Server B (narrowed down as much as I could), run it through a lookup with the Server A query cache as the "cache" part of it. Send the unmatched rows to be inserted..matched ones to be updated.
So which is better / faster for what I'm looking for?
And if that's not enough info let me know and I'll try again 🙂
Thanks!

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 1

It kind of depends I think 🙂

But if you are allowed to use linked servers and if you have good indexes on those tables, I would prefer the left outer join. (if you have SQL Server 2008 you can use the MERGE statement)

You can avoid the whole linked server thing if you write the data from source A to a staging table on server B and then do the matching. As the number of rows are limited, this is feasable.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Raunak Jhawar SSCoach Points: 15701 More actions · Answer 2

Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.

Raunak J

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 3

Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.

As far as I know, a left outer join doesn't need explicit sorting either.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Raunak Jhawar SSCoach Points: 15701 More actions · Answer 4

da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.
As far as I know, a left outer join doesn't need explicit sorting either.

A merge join Task requires the dataset be explicitly or assumed implicitly sorted

Raunak J

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 5

Raunak Jhawar (10/4/2010)
da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.
As far as I know, a left outer join doesn't need explicit sorting either.
A merge join Task requires the dataset be explicitly or assumed implicitly sorted

Allright, I was talking about a left outer join in T-SQL.

Who uses the Merge Join anyway 🙂

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Raunak Jhawar SSCoach Points: 15701 More actions · Answer 6

Raunak Jhawar

SSCoach

Points: 15701

October 4, 2010 at 7:14 am

#1230979

lolZZZZ:-):-)

Raunak J

_rmullen_ SSC-Addicted Points: 432 More actions · Answer 7

da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.
As far as I know, a left outer join doesn't need explicit sorting either.
A merge join Task requires the dataset be explicitly or assumed implicitly sorted
Allright, I was talking about a left outer join in T-SQL.
Who uses the Merge Join anyway 🙂

Haha now I'm confused...

Left join would require linked servers though, no?

What's the consequences of having to link them vs just using connection managers and merge join or lookup in ssis?

Are you serious about the merge join? should i steer from it?

thanks!

steveb. SSC-Forever Points: 46733 More actions · Answer 8

yes you should steer clear from merge join as it does not perform very well.

you can do a join in t-sql if the servers are linked or on the same server.

_rmullen_ SSC-Addicted Points: 432 More actions · Answer 9

So join/linked server would (and do) work...any reason *not* to use the lookup in SSIS with a cache connection manager?

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 10

_rmullen_ (10/4/2010)
So join/linked server would (and do) work...any reason *not* to use the lookup in SSIS with a cache connection manager?

Reasons why I normally don't use the lookup component:

* sensitivity with case and trailing spaces (at least in 2005, not sure for 2008. See http://www.ssistalk.com/2007/12/19/ssis-lookup-component-case-gotcha/ for more detail.)

* the behaviour when there are multiple matches. It only returns the first matched row. See http://msdn.microsoft.com/en-us/library/ms141821(v=SQL.90).aspx for more detail.

* possible extra error handling or correcting null values when there are no matches.

In my opinion, using a join in T-SQL is much more elegant then using the lookup component.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

steveb. SSC-Forever Points: 46733 More actions · Answer 11

Agree with all the points above, the case senstivity quite often trips people up

t-sql join is a lot nicer and has the possibliity of being optimized

_rmullen_ SSC-Addicted Points: 432 More actions · Answer 12

There's definitely better performance happening when I just query across the linked servers! Thank you!

But if the tools in SSIS behave so poorly..why are they there? Are they meant for small datasets?

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 13

_rmullen_ (10/5/2010)
But if the tools in SSIS behave so poorly..why are they there? Are they meant for small datasets?

Well, that statement is largely incorrect. Some components of SSIS behave poorly. Others behave excellent.

You have blocking components (such as sort or aggregate) who must wait for all the rows to be read before they can do something. These block the stream of data in your memory, so they are performance killers.

But there are a lot of non-blocking components, such as data conversion, derived column, lookup et cetera who don't affect performance. There are even some scenarios where SSIS will outperform excellent, such as using the SQL Server Destination.

It all kind of depends on what you are trying to do. If it is just reading some data and joining it to a table, T-SQL will be faster. If you need to do a lot of in memory operations, SSIS can be faster. (It holds the world record for the transfer of data in the fastest time, by the way)

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

_rmullen_ SSC-Addicted Points: 432 More actions · Answer 14

_rmullen_

SSC-Addicted

Points: 432

October 6, 2010 at 4:18 pm

#1233180

Gotcha.

Thanks for the reply 🙂