Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Join yeilds different results if used with CTE or a real table Expand / Collapse
Author
Message
Posted Thursday, March 21, 2013 5:22 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 12:24 AM
Points: 10, Visits: 113
Hi,
Here is the SQL I used as an example:


--CREATE AND POPULATE TABLE AAA ------------------------------------------
CREATE TABLE [dbo].[AAA](
[Col1] [nchar](10) NOT NULL,
[RowNumber] [int] NOT NULL
) ON [PRIMARY]

GO

INSERT INTO [dbo].[AAA]
([Col1]
,[RowNumber])
VALUES
('a', 1),
('s', 2),
('d', 3),
('e', 4),
('f', 5),
('g', 6)
GO

--QUERY No 1 ----------------------------------------------------------------
;with BBB as
(
select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, Col1
from AAA
order by RowNumber
)
select A.Col1, A.RowNumber
from AAA as A
join BBB
On A.RowNumber = BBB.RandomRow;

--QUERY No 2 ----------------------------------------------------------------
select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, AAA.Col1
into BBB
from AAA
order by RowNumber;

select A.Col1, A.RowNumber
from AAA as A
join BBB
On A.RowNumber = BBB.RandomRow;



--Cleaning -------------------------------------------------------------------
--drop table BBB;
--drop table AAA;



I have a table AAA with just two columns, one of which is a row number. I use CTE to create another table “BBB”, which based on the AAA. The BBB table contains the same columns as the AAA table, plus a new column RandomRow.

The RandomRow is populated by random numbers, generated from the range 1 to @N, where @N is the number of rows in the table AAA – in this case 6. The table BBB contains 3 rows only: the row numbers will be 1,2,3.

The Select statement should return rows from AAA, which are identified by the number in the [BBB].[RandomRows].
The Select statement contains the join, which is expected to return exactly 3 rows, because the BBB is subset of AAA. However, each time I run the query the Select returns different numbers of rows.

To see the effect, please try to re-run the query No 1 several times, and you should get different numbers of returned rows each time.

If I replace the CTE with real table (see the query 2/) I get expected results, exactly 3 rows each time I run the query.

My questions are:

A/ Why the join yields different results for CTE and for the real table?
B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA.

Thank you for help.

Post #1434084
Posted Thursday, March 21, 2013 5:30 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 5:23 PM
Points: 6,131, Visits: 7,170
itlk (3/21/2013)

A/ Why the join yields different results for CTE and for the real table?
B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA.

Thank you for help.



A) You're connecting Rownumber to randomRow, which changes EACH time you run the cte, as you generate a newID each run. You'll be very variable there. You need to connect rownumber to rownumber for consistency.

B) Right outer join is going to return you the three rows from BBB, sure, with varying connectivity.

It all revolves around you using a randomized value in the cte for your join.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1434089
Posted Thursday, March 21, 2013 5:55 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 12:24 AM
Points: 10, Visits: 113
Thanks for answer.

I am not concerning about values per se. Sure, the numbers are random, so values will be different each time I run the query.

My concern is, that I should get each time exactly 3 rows (I am not voried about the content of rows). With CTE I get less or more rows, each time I run the query.

With the real table, the same query returns exactly 3 rows.

Why is the difference between CTE and real table?

Thanks.
Post #1434092
Posted Thursday, March 21, 2013 6:45 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, July 15, 2014 2:20 PM
Points: 364, Visits: 384
Interesting. Looks like you hit a similar problem to this:
http://www.glorf.it/blog/2008/05/16/sql-talk/sql-server-is-not-aware-of-nondeterministic-functions

In query #1, the non-deterministic function in the subquery is applied after the join (also note the "No Join Predicate" warning in the join operator.



The non-deterministic subquery has already been materialised in query #2, therefore is guaranteed to be correct.

Post #1434099
Posted Thursday, March 21, 2013 6:47 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 2:50 PM
Points: 36,711, Visits: 31,159
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".

{Edit} Heh... beat out by 2 minutes.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1434101
Posted Thursday, March 21, 2013 6:54 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 5:23 PM
Points: 6,131, Visits: 7,170
EDIT: Please ignore this entire post. One dog, wrong tree, right forest.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA


  Post Attachments 
InnerQueryPlan.sqlplan (0 views, 24.96 KB)
OuterQueryPlan.sqlplan (0 views, 24.96 KB)
Post #1434102
Posted Thursday, March 21, 2013 6:56 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 5:23 PM
Points: 6,131, Visits: 7,170
Jeff Moden (3/21/2013)
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".

{Edit} Heh... beat out by 2 minutes.


Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?

EDIT:

I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.

First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.

Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.

The predicate filter is... Oh god bless rubber ducky coding... of course.

Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.

Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1434103
Posted Thursday, March 21, 2013 7:48 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 12:24 AM
Points: 10, Visits: 113
Thanks for help.
Post #1434112
Posted Friday, March 22, 2013 4:43 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:52 AM
Points: 7,107, Visits: 13,462
It's down to an optimisation introduced in SQL Server2005 often called "Deferred expression evaluation". Here's a great article by Paul White to add to others already posted.

The original "failing" query can be modified so that the calculation is written as a row-level CROSS APPLY:

WITH bbb AS (
SELECT TOP 3 x.RandomRow
FROM AAA
CROSS APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x
ORDER BY RowNumber)
SELECT a.Col1, bbb.RandomRow
FROM AAA AS a
JOIN bbb
ON a.RowNumber = bbb.RandomRow;

The execution plan is identical to Query No 1:



Changing the CROSS APPLY to an OUTER APPLY
WITH bbb AS (
SELECT TOP 3 x.RandomRow
FROM AAA
OUTER APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x
ORDER BY RowNumber)
SELECT a.Col1, bbb.RandomRow
FROM AAA AS a
JOIN bbb
ON a.RowNumber = bbb.RandomRow;

is sufficient to prevent the optimiser from implementing deferred expression evaluation and it generates a completely different plan where RandomRow is evaluated prior to the join between the CTE and the table:



With RandomRow evaluated prior to the join, it's now available to use as a join predicate and the query returns the expected results.




“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1434192
Posted Saturday, March 23, 2013 7:29 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 2:50 PM
Points: 36,711, Visits: 31,159
Evil Kraig F (3/21/2013)
Jeff Moden (3/21/2013)
You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".

{Edit} Heh... beat out by 2 minutes.


Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?

EDIT:

I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.

First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.

Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.

The predicate filter is... Oh god bless rubber ducky coding... of course.

Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.

Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.


Correct. The only difference is that a lack of a predicate doesn't make a FULL OUTER JOIN. It makes a CROSS JOIN.

Heh... that's one of my favoritie interview questions, BTW.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1434644
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse