Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Join yeilds different results if used with CTE or a real table


Join yeilds different results if used with CTE or a real table

Author
Message
itlk
itlk
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 122
Hi,
Here is the SQL I used as an example:



--CREATE AND POPULATE TABLE AAA ------------------------------------------
CREATE TABLE [dbo].[AAA](
[Col1] [nchar](10) NOT NULL,
[RowNumber] [int] NOT NULL
) ON [PRIMARY]

GO

INSERT INTO [dbo].[AAA]
([Col1]
,[RowNumber])
VALUES
('a', 1),
('s', 2),
('d', 3),
('e', 4),
('f', 5),
('g', 6)
GO

--QUERY No 1 ----------------------------------------------------------------
;with BBB as
(
select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, Col1
from AAA
order by RowNumber
)
select A.Col1, A.RowNumber
from AAA as A
join BBB
On A.RowNumber = BBB.RandomRow;

--QUERY No 2 ----------------------------------------------------------------
select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, AAA.Col1
into BBB
from AAA
order by RowNumber;

select A.Col1, A.RowNumber
from AAA as A
join BBB
On A.RowNumber = BBB.RandomRow;



--Cleaning -------------------------------------------------------------------
--drop table BBB;
--drop table AAA;





I have a table AAA with just two columns, one of which is a row number. I use CTE to create another table “BBB”, which based on the AAA. The BBB table contains the same columns as the AAA table, plus a new column RandomRow.

The RandomRow is populated by random numbers, generated from the range 1 to @N, where @N is the number of rows in the table AAA – in this case 6. The table BBB contains 3 rows only: the row numbers will be 1,2,3.

The Select statement should return rows from AAA, which are identified by the number in the [BBB].[RandomRows].
The Select statement contains the join, which is expected to return exactly 3 rows, because the BBB is subset of AAA. However, each time I run the query the Select returns different numbers of rows.

To see the effect, please try to re-run the query No 1 several times, and you should get different numbers of returned rows each time.

If I replace the CTE with real table (see the query 2/) I get expected results, exactly 3 rows each time I run the query.

My questions are:

A/ Why the join yields different results for CTE and for the real table?
B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA.

Thank you for help.
Evil Kraig F
Evil Kraig F
SSCertifiable
SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)

Group: General Forum Members
Points: 5701 Visits: 7660
itlk (3/21/2013)

A/ Why the join yields different results for CTE and for the real table?
B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA.

Thank you for help.



A) You're connecting Rownumber to randomRow, which changes EACH time you run the cte, as you generate a newID each run. You'll be very variable there. You need to connect rownumber to rownumber for consistency.

B) Right outer join is going to return you the three rows from BBB, sure, with varying connectivity.

It all revolves around you using a randomized value in the cte for your join.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
itlk
itlk
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 122
Thanks for answer.

I am not concerning about values per se. Sure, the numbers are random, so values will be different each time I run the query.

My concern is, that I should get each time exactly 3 rows (I am not voried about the content of rows). With CTE I get less or more rows, each time I run the query.

With the real table, the same query returns exactly 3 rows.

Why is the difference between CTE and real table?

Thanks.
Mansfield
Mansfield
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 394
Interesting. Looks like you hit a similar problem to this:
http://www.glorf.it/blog/2008/05/16/sql-talk/sql-server-is-not-aware-of-nondeterministic-functions

In query #1, the non-deterministic function in the subquery is applied after the join (also note the "No Join Predicate" warning in the join operator.



The non-deterministic subquery has already been materialised in query #2, therefore is guaranteed to be correct.
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45147 Visits: 39923
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".

{Edit} Heh... beat out by 2 minutes. :-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Evil Kraig F
Evil Kraig F
SSCertifiable
SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)

Group: General Forum Members
Points: 5701 Visits: 7660
EDIT: Please ignore this entire post. One dog, wrong tree, right forest.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Attachments
InnerQueryPlan.sqlplan (0 views, 24.00 KB)
OuterQueryPlan.sqlplan (0 views, 24.00 KB)
Evil Kraig F
Evil Kraig F
SSCertifiable
SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)

Group: General Forum Members
Points: 5701 Visits: 7660
Jeff Moden (3/21/2013)
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".

{Edit} Heh... beat out by 2 minutes. :-)


Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?

EDIT:

I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.

First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.

Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.

The predicate filter is... Oh god bless rubber ducky coding... of course.

Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.

Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
itlk
itlk
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 122
Thanks for help.
ChrisM@Work
ChrisM@Work
SSCrazy Eights
SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)

Group: General Forum Members
Points: 8995 Visits: 19028
It's down to an optimisation introduced in SQL Server2005 often called "Deferred expression evaluation". Here's a great article by Paul White to add to others already posted.

The original "failing" query can be modified so that the calculation is written as a row-level CROSS APPLY:


WITH bbb AS (
SELECT TOP 3 x.RandomRow
FROM AAA
CROSS APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x
ORDER BY RowNumber)
SELECT a.Col1, bbb.RandomRow
FROM AAA AS a
JOIN bbb
ON a.RowNumber = bbb.RandomRow;



The execution plan is identical to Query No 1:



Changing the CROSS APPLY to an OUTER APPLY

WITH bbb AS (
SELECT TOP 3 x.RandomRow
FROM AAA
OUTER APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x
ORDER BY RowNumber)
SELECT a.Col1, bbb.RandomRow
FROM AAA AS a
JOIN bbb
ON a.RowNumber = bbb.RandomRow;


is sufficient to prevent the optimiser from implementing deferred expression evaluation and it generates a completely different plan where RandomRow is evaluated prior to the join between the CTE and the table:



With RandomRow evaluated prior to the join, it's now available to use as a join predicate and the query returns the expected results.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45147 Visits: 39923
Evil Kraig F (3/21/2013)
Jeff Moden (3/21/2013)
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".

{Edit} Heh... beat out by 2 minutes. :-)


Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?

EDIT:

I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.

First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.

Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.

The predicate filter is... Oh god bless rubber ducky coding... of course.

Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.

Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.


Correct. The only difference is that a lack of a predicate doesn't make a FULL OUTER JOIN. It makes a CROSS JOIN.

Heh... that's one of my favoritie interview questions, BTW. :-P

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search