Looking for reasons...

Question

Looking for reasons...

Brandie Tarvin

SSC Guru

Points: 173116
More actions
May 24, 2011 at 12:29 pm

#145602

I've seen this on the forums before, and have in fact experienced it several times myself. I want to write up an article, but I'm having trouble translating my reasons why (proprietary information conflict) into reasons why it happens in general.
You have two tables. Table A has every record in existence. Table B only has a subset of Table A's records. You need to return a result that has the ID of all records in Table A, with extra columns from Table B that tell you statuses or stuff.
But for X reason, you can't do an OUTER JOIN from Table A to Table B. Your FROM clause has to start from Table B because you need to do fun, weird calculation type stuff or find Max records or something. So you have to start from B then go back to A and get the rest of the records.
Has anyone else dealt with this type of coding? If so, can you give me your reasons (generic or not) for why you did this?
Please note: I am NOT looking for code. I have code. I need business cases for which to explain the code.
Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Koen Verbeeck SSC Guru Points: 259197 More actions · Answer 1

I've seen it once, but that was a TSQL puzzle on how to write a left outer join as a right outer join (or something like that).

So no, I haven't seen a valid business case (yet) for this type of question.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

SQLRNNR SSC Guru Points: 281344 More actions · Answer 2

I haven't seen a valid business case yet and would be interested in more details.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 3

I read it, but I'm not entirely sure what you're looking for. More details or an example of the code would be very helpful.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

jcrawf02 SSC-Insane Points: 24198 More actions · Answer 4

We have situations like that where tableA has all claims submitted with our status back to the submitter, tableB has just the ones we in turn submitted to our regulatory agency, and it records their response to us. I have not written it up the way you're suggesting, although maybe you can show me the light 😀

PM me if that sounds like what you're looking for, we generally have to determine the 'final status' version from tableA then look for the MAX() in tableB to figure out current status (claims submitted multiple times due to error corrections)

---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

Chad Crawford SSChampion Points: 11610 More actions · Answer 5

In your example is there was a mix of both left and right joins in the from clause? I'm not sure how the optimizer handles it, but I believe the order would be pretty critical there and something along those lines was the first thing that came to mind.

Out of curiosity, what happens if you switch the table order and swap it from a left to right join or vice-versa? Do you actually get the wrong data back, or is it just an optimization issue?

Thanks,

Chad

Jeff Moden SSC Guru Points: 1004680 More actions · Answer 6

I do it more often than not. Peter Larsson calls the process "pre-aggregation". You've seen me refer to it as "Divide'n'Conquer" methods. I do it for performance because it's frequently much more performant to do so with the amount of data I have to work with because it's (many times) cheaper to join Table A to a several thousand row aggregate of Table B than it is to join Table A to a 5 million row Table B. It also cuts way down on the columns in Group By's, etc, etc, etc, and it makes for some nasty-fast code.

Of course, "It Depends".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

spaghettidba SSC Guru Points: 105732 More actions · Answer 7

I'm not sure that order matters. Personally, I hate RIGHT joins, I find them harder to read: they don't fit well in my mindset.

I tend to write queries starting from the "main" table, then joining with LEFT and I have never found a situation where the optimizer was unable to find the appropriate path to extract data efficiently.

~~Maybe~~ Probably I'm short-sighted and I don't understand exactly what you mean.

Can you give us a small example?

-- Gianluca Sartori

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 8

Something similar: a data feed to a warehouse was comparing the 10M row DW orders table to the 10M row OLTP orders table then aggregating to obtain the most recent order which didn't already exist in the DW. It took 30 minutes to run. Preaggregating the OLTP table then joining reduced the time to 6 seconds.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Jeff Moden SSC Guru Points: 1004680 More actions · Answer 9

Gianluca Sartori (5/25/2011)
~~Maybe~~ Probably I'm short-sighted and I don't understand exactly what you mean.
Can you give us a small example?

Yes, I believe I can. And I believe it can be done without an OUTER join at all. Of course, a "small example" won't demonstrate an advantage or disadvantage... neither is it in my nature. 😀 I'll try to put something together tonight.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004680 More actions · Answer 10

ChrisM@home (5/25/2011)
Something similar: a data feed to a warehouse was comparing the 10M row DW orders table to the 10M row OLTP orders table then aggregating to obtain the most recent order which didn't already exist in the DW. It took 30 minutes to run. Preaggregating the OLTP table then joining reduced the time to 6 seconds.

Exactly and great example. I may not have to put a physical example together tonight after all.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

spaghettidba SSC Guru Points: 105732 More actions · Answer 11

I see. Thank you for the example Chris. I think I got it now.

-- Gianluca Sartori

spaghettidba SSC Guru Points: 105732 More actions · Answer 12

Jeff Moden (5/25/2011)
Gianluca Sartori (5/25/2011)
~~Maybe~~ Probably I'm short-sighted and I don't understand exactly what you mean.
Can you give us a small example?
Yes, I believe I can. And I believe it can be done without an OUTER join at all. Of course, a "small example" won't demonstrate an advantage or disadvantage... neither is it in my nature. 😀 I'll try to put something together tonight.

My bad, Jeff. The request for an example was meant for Brandie.

Fortunately Chris came up with something a thick-head like myself can understand. 🙂

-- Gianluca Sartori

Brandie Tarvin SSC Guru Points: 173116 More actions · Answer 13

The reason I'm requesting business cases is because the example I have has the potential of violating business NDA. I'm looking for an example I can use publicly, which is why I haven't given one.

The idea is that you could do the code with a left or right outer join (and at least one subquery), but it's inefficient, clunky and can be avoided with, as others have mentioned, pre-aggregation.

The datawarehouse idea is interesting, but not detailed enough. My current pain point is designing tables and faux data to explain the code without using proprietary company information. For that, I need a business type (retail, airline, banking, etc.) and the complex user request that leads to the pre-aggregation piece.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 14

Brandie Tarvin (5/25/2011)
The reason I'm requesting business cases is because the example I have has the potential of violating business NDA. I'm looking for an example I can use publicly, which is why I haven't given one.
The idea is that you could do the code with a left or right outer join (and at least one subquery), but it's inefficient, clunky and can be avoided with, as others have mentioned, pre-aggregation.
The datawarehouse idea is interesting, but not detailed enough. My current pain point is designing tables and faux data to explain the code without using proprietary company information. For that, I need a business type (retail, airline, banking, etc.) and the complex user request that leads to the pre-aggregation piece.

I think I have the idea at this point and if I'm understanding correctly, I use this for various rollups of data fairly frequently. It is especially helpful when you want to aggregate values in 2 different tables and then match the results back to one of those tables.

Sometimes I do the pre-aggregation into a temp table instead of using subqueries.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams