SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Exotic Joins in T-SQL and How They Translate in SSIS


Exotic Joins in T-SQL and How They Translate in SSIS

Author
Message
Nick Maroudas
Nick Maroudas
SSC Eights!
SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)

Group: General Forum Members
Points: 911 Visits: 101
Comments posted to this topic are about the item Exotic Joins in T-SQL and How They Translate in SSIS
Per Jørgensen
Per Jørgensen
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 19
This is not a real join since only the first row from the lookup table is returned:-(
I have had the same problem and my experience is that there is not a general way to do joins in SSIS but your solution is very dependant on the problem at hand. In worst case you have to make a script task.
djedgar
djedgar
Valued Member
Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)

Group: General Forum Members
Points: 64 Visits: 35
It's only one step to realize that better solution will be move all joins to database with staging area Smile
Jamie Thomson
Jamie Thomson
SSCarpal Tunnel
SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)

Group: General Forum Members
Points: 4803 Visits: 188
Interesting. You say that this improved 10 minutes from using a Merge Join - that doesn't surprise me. I'd be more interested in knowing if there is a performance improvement/degredation compared to simply running it as a SQL statement.

Can I also ask why you chose to implement as a data flow given that you already had it working in T-SQL?

-Jamie

Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
Nick Maroudas
Nick Maroudas
SSC Eights!
SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)SSC Eights! (911 reputation)

Group: General Forum Members
Points: 911 Visits: 101
Hi Jamie, thanks for your interest on this topic.

To answer the second part of your query first, the decision to implement as a dataflow rather than use the T-SQL in a stored procedure or in a SQL task within SSIS is mainly legacy based. Although the ability to change the way the package works is far simpler in SSIS than having to trudge through the SQL. This will also make it easier for future developers to follow on and improve on existing flows within the package.

On improvement/degradation,
In my experience and that is not the same as everyone else, the improvement on the SQL was markedly greater, although when developing our initial solution it was done mainly on a workstation. I will look into getting the server stats for you with regard to the differences.

Nick.
Luke Douglas-493452
Luke Douglas-493452
Grasshopper
Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)

Group: General Forum Members
Points: 11 Visits: 9
Interesting article. I ran into something similar recently, but in our environment, the overhead of the individual select statements was too much for our production environment. I ended up implementing a script task (and started thinking about breaking it out into a new data flow component).
Jamie Thomson
Jamie Thomson
SSCarpal Tunnel
SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)

Group: General Forum Members
Points: 4803 Visits: 188
Nick Maroudas (2/4/2008)
Hi Jamie, thanks for your interest on this topic.

To answer the second part of your query first, the decision to implement as a dataflow rather than use the T-SQL in a stored procedure or in a SQL task within SSIS is mainly legacy based. Although the ability to change the way the package works is far simpler in SSIS than having to trudge through the SQL. This will also make it easier for future developers to follow on and improve on existing flows within the package.

Very interesting justifications. I wrote an article once comparing these 2 options which you can see here: http://blogs.conchango.com/jamiethomson/archive/2006/03/14/SSIS_3A00_-Data-flows-or-T_2D00_SQL.aspx
I'd be really grateful if you could add your comments into the comments section there.


On improvement/degradation,
In my experience and that is not the same as everyone else, the improvement on the SQL was markedly greater, although when developing our initial solution it was done mainly on a workstation. I will look into getting the server stats for you with regard to the differences.

Nick.

I would expect T-SQL to be significantly faster. You're using a row-by-row lookup which will be very slow in comparison.

-Jamie

Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
djedgar
djedgar
Valued Member
Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)

Group: General Forum Members
Points: 64 Visits: 35

To answer the second part of your query first, the decision to implement as a dataflow rather than use the T-SQL in a stored procedure or in a SQL task within SSIS is mainly legacy based. Although the ability to change the way the package works is far simpler in SSIS than having to trudge through the SQL. This will also make it easier for future developers to follow on and improve on existing flows within the package.



Maybe change SSIS task is simple and fast but you don’t forget about software development and ETL other tracks.
- Find impact changes
- Forward changes to test environment
- Multi user development
- Test changes (fe. blackbox test input – output)
- Debug and error handling
- Versioning
- Deployments task
- Logic for concurrent loading/ incidental loading…
… etc.
Maybe it’s only my opinion but TSQL rules !
Jamie Thomson
Jamie Thomson
SSCarpal Tunnel
SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)

Group: General Forum Members
Points: 4803 Visits: 188
djedgar (2/4/2008)

To answer the second part of your query first, the decision to implement as a dataflow rather than use the T-SQL in a stored procedure or in a SQL task within SSIS is mainly legacy based. Although the ability to change the way the package works is far simpler in SSIS than having to trudge through the SQL. This will also make it easier for future developers to follow on and improve on existing flows within the package.



Maybe change SSIS task is simple and fast but you don’t forget about software development and ETL other tracks.
- Find impact changes
- Forward changes to test environment
- Multi user development
- Test changes (fe. blackbox test input – output)
- Debug and error handling
- Versioning
- Deployments task
- Logic for concurrent loading/ incidental loading…
… etc.
Maybe it’s only my opinion but TSQL rules !



I don't think its only your opinion Smile

But I can't help but take issue with some of the things that you say there:
-Find impact changes SSIS has this. Limited, but it has it.
-Forward changes to test environment If this means its possible to promote code to different environments then SSIS has this too.
-Multi user development SSIS has this
-Debug and error handling SSIS has this
-Versioning SSIS has this (versioning of packages). And unless there's some sort of new feature that I don't know about there is no versioning of stored procs within the database engine
-Logic for concurrent loading/ incidental loading Not quite sure what this means but with SSIS its possible to load multiple tables concurrently.

And that's just my opinion Smile

-Jamie

Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
djedgar
djedgar
Valued Member
Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)

Group: General Forum Members
Points: 64 Visits: 35
I didn’t write that it’s missing and I don’t see fault in SSIS only
Let’s evaluate

-Forward changes to test environment If this means its possible to promote code to different
environments then SSIS has this too.

Promote code to different environments should freeze projects but when you want
test package you must write\reconfigure connection – potential errors !

-Multi user development SSIS has this

It’s for me freezing code for some task . We need to implement a little complex data flow task
we have two programmers how we could divide jobs for them ?

-Debug and error handling SSIS has this

Yes but I prefer try catch statements with nested functionality

-Versioning SSIS has this (versioning of packages). And unless there's some sort of new feature
that I don't know about there is no versioning of stored procs within the database engine

But if you can use not commercial concurrent versioning system it’s maybe problem delegate
code because xml include some binary data.

-Logic for concurrent loading/ incidental loading Not quite sure what this means but with SSIS
its possible to load multiple tables concurrently.

In real life ETL is very complex and has very dependent task. When something
is going wrong it’s needed quick find the errors and reprocess flow.
In some situations the best choice will be ignore error data, how can admin do it without modification
packages ? Why to the @#!#!! when you run package in agent job you can only see it as one atomic job. I want to say that SSIS package without more flexibly control loading tool (based on metadata) it’s condemned only on run in dubug mode Wink
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum







































































































































































SQLServerCentral


Search