October 1, 2010 at 9:00 am
Hi guys -
I'm trying to decide what to use and I keep going back and forth.
Here's my setup..
Server A(3 tables being joined) = Source Data. I'll have about 10-100 new rows per time I run this package. So not many at all.
Server B = Datawarehouse. Which. Has millions of rows.
I want to just insert those few rows from Server A, but make sure they're not in Server B yet, so a left join would probably be in order (else that's how I'd approach it). This could all be done "normally" using a linked server query:
SELECT [Stuff]
FROM ServerA.Databaes.dbo.TableA a
LEFT JOIN ServerB.Database2.dbo.TableA2 a2 ON a.ID = a2.ID
WHERE a2.ID IS NULL
Then there's the way.
There's also the way where I could put the things I would select from Server A (there's a few tables involved in that "part" of the query). And then put those into a cache connection manager. Then in a new data flow task have a oledb source with the query to be the dataset from Server B (narrowed down as much as I could), run it through a lookup with the Server A query cache as the "cache" part of it. Send the unmatched rows to be inserted..matched ones to be updated.
So which is better / faster for what I'm looking for?
And if that's not enough info let me know and I'll try again 🙂
Thanks!
October 4, 2010 at 5:48 am
It kind of depends I think 🙂
But if you are allowed to use linked servers and if you have good indexes on those tables, I would prefer the left outer join. (if you have SQL Server 2008 you can use the MERGE statement)
You can avoid the whole linked server thing if you write the data from source A to a staging table on server B and then do the matching. As the number of rows are limited, this is feasable.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 4, 2010 at 5:56 am
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.
Raunak J
October 4, 2010 at 6:52 am
Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.
As far as I know, a left outer join doesn't need explicit sorting either.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 4, 2010 at 6:59 am
da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.As far as I know, a left outer join doesn't need explicit sorting either.
A merge join Task requires the dataset be explicitly or assumed implicitly sorted
Raunak J
October 4, 2010 at 7:11 am
Raunak Jhawar (10/4/2010)
da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.As far as I know, a left outer join doesn't need explicit sorting either.
A merge join Task requires the dataset be explicitly or assumed implicitly sorted
Allright, I was talking about a left outer join in T-SQL.
Who uses the Merge Join anyway 🙂
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 4, 2010 at 7:14 am
lolZZZZ:-):-)
Raunak J
October 4, 2010 at 10:18 am
da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
da-zero (10/4/2010)
Raunak Jhawar (10/4/2010)
Lookup is always in favour for the reason that no explicit sort is mandatory and the prior to join and check data can be created on the fly while pacakage is in execution.As far as I know, a left outer join doesn't need explicit sorting either.
A merge join Task requires the dataset be explicitly or assumed implicitly sorted
Allright, I was talking about a left outer join in T-SQL.
Who uses the Merge Join anyway 🙂
Haha now I'm confused...
Left join would require linked servers though, no?
What's the consequences of having to link them vs just using connection managers and merge join or lookup in ssis?
Are you serious about the merge join? should i steer from it?
thanks!
October 4, 2010 at 10:25 am
yes you should steer clear from merge join as it does not perform very well.
you can do a join in t-sql if the servers are linked or on the same server.
October 4, 2010 at 4:41 pm
So join/linked server would (and do) work...any reason *not* to use the lookup in SSIS with a cache connection manager?
October 5, 2010 at 1:16 am
_rmullen_ (10/4/2010)
So join/linked server would (and do) work...any reason *not* to use the lookup in SSIS with a cache connection manager?
Reasons why I normally don't use the lookup component:
* sensitivity with case and trailing spaces (at least in 2005, not sure for 2008. See http://www.ssistalk.com/2007/12/19/ssis-lookup-component-case-gotcha/ for more detail.)
* the behaviour when there are multiple matches. It only returns the first matched row. See http://msdn.microsoft.com/en-us/library/ms141821(v=SQL.90).aspx for more detail.
* possible extra error handling or correcting null values when there are no matches.
In my opinion, using a join in T-SQL is much more elegant then using the lookup component.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 5, 2010 at 2:39 am
Agree with all the points above, the case senstivity quite often trips people up
t-sql join is a lot nicer and has the possibliity of being optimized
October 5, 2010 at 12:03 pm
There's definitely better performance happening when I just query across the linked servers! Thank you!
But if the tools in SSIS behave so poorly..why are they there? Are they meant for small datasets?
October 6, 2010 at 12:22 am
_rmullen_ (10/5/2010)
But if the tools in SSIS behave so poorly..why are they there? Are they meant for small datasets?
Well, that statement is largely incorrect. Some components of SSIS behave poorly. Others behave excellent.
You have blocking components (such as sort or aggregate) who must wait for all the rows to be read before they can do something. These block the stream of data in your memory, so they are performance killers.
But there are a lot of non-blocking components, such as data conversion, derived column, lookup et cetera who don't affect performance. There are even some scenarios where SSIS will outperform excellent, such as using the SQL Server Destination.
It all kind of depends on what you are trying to do. If it is just reading some data and joining it to a table, T-SQL will be faster. If you need to do a lot of in memory operations, SSIS can be faster. (It holds the world record for the transfer of data in the fastest time, by the way)
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 6, 2010 at 4:18 pm
Gotcha.
Thanks for the reply 🙂
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply