Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Understanding and Using APPLY (Part 2)


Understanding and Using APPLY (Part 2)

Author
Message
b5535084
b5535084
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 21
Hmmm.... replacing Having with Top 1...order by causes it to return Nulls for the max() for some rows -- not what I want. Here are the execution plans (couldn't see how to attach a document -- did I miss it?):

Join:

|--Merge Join(Inner Join, MANY-TO-MANY MERGESad[nco].[STATUS_ID])=([S].[STATUS_ID]), RESIDUALSad[OR_NCCORE].[dbo].[ORDER_DIM].[STATUS_ID] as [nco].[STATUS_ID]=[OR_NCCORE].[dbo].[STATUS_DIM].[STATUS_ID] as [S].[STATUS_ID]))
|--Sort(ORDER BYSad[nco].[STATUS_ID] ASC))
| |--Hash Match(Inner Join, HASHSad[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID])=([nco].[ORDER_ID]), RESIDUALSad[OR_NCCORE].[dbo].[ORDER_DIM].[ORDER_ID] as [nco].[ORDER_ID]=[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID]))
| |--Filter(WHERESad[Expr1010]=(1)))
| | |--Sequence Project(DEFINESad[Expr1010]=row_number))
| | |--Segment
| | |--Sort(ORDER BYSad[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID] ASC, [OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ITEM_ID] ASC))
| | |--Table Scan(OBJECTSad[OR_NCCORE].[dbo].[ORDER_ITEM_DIM]))
| |--Table Scan(OBJECTSad[OR_NCCORE].[dbo].[ORDER_DIM] AS [nco]))
|--Sort(ORDER BYSad[S].[STATUS_ID] ASC))
|--Table Scan(OBJECTSad[OR_NCCORE].[dbo].[STATUS_DIM] AS [S]))

Cross Apply:

|--Nested Loops(Inner Join, OUTER REFERENCESSad[nco].[ORDER_ID]))
|--Hash Match(Inner Join, HASHSad[S].[STATUS_ID])=([nco].[STATUS_ID]), RESIDUALSad[OR_NCCORE].[dbo].[STATUS_DIM].[STATUS_ID] as [S].[STATUS_ID]=[OR_NCCORE].[dbo].[ORDER_DIM].[STATUS_ID] as [nco].[STATUS_ID]))
| |--Table Scan(OBJECTSad[OR_NCCORE].[dbo].[STATUS_DIM] AS [S]))
| |--Table Scan(OBJECTSad[OR_NCCORE].[dbo].[ORDER_DIM] AS [nco]))
|--Filter(WHERESad[Expr1010] IS NOT NULL))
|--Stream Aggregate(DEFINESad[Expr1010]=MAX([OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ITEM_ID])))
|--Index Spool(SEEKSad[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID]=[OR_NCCORE].[dbo].[ORDER_DIM].[ORDER_ID] as [nco].[ORDER_ID]))
|--Index Scan(OBJECTSad[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[IX_ORDER_ITEM_ID_with_Includes]))
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10350 Visits: 11350
b5535084 (1/30/2012)
Hmmm.... replacing Having with Top 1...order by causes it to return Nulls for the max() for some rows -- not what I want. Here are the execution plans (couldn't see how to attach a document -- did I miss it?)

Yes, when you are adding a reply, there is an 'Edit Attachments' button in the Post Options section. If it is convenient, please also give the CREATE TABLE and CREATE INDEX statements for the tables involved. The more information you can provide, the more comprehensive an answer I'll be able to give you.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
b5535084
b5535084
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 21
OK , I've attached the two execution plans. Recall that I do a similar query two ways: 1 with join and one with cross apply. The produce identical results. I'm trying to understand why the cross apply version is more expensive and if the latter can be rewritten to do better or if join is the best choice for this application.

with join:

select nco.ORDER_ID, nci.ORDER_ITEM_ID
from OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

inner join (
select *, rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM
) nci
on nco.order_id=nci.order_id and rn=1

Stats:

(70108 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_ITEM_DIM'. Scan count 1, logical reads 1417, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 391 ms, elapsed time = 1075 ms

with cross apply:

select nco.ORDER_ID, nci.ORDER_ITEM_ID
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select top 1 *, rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci

stats:

(70108 row(s) affected)
Table 'Worktable'. Scan count 70147, logical reads 448162, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_ITEM_DIM'. Scan count 1, logical reads 1417, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 828 ms, elapsed time = 1080 ms.
Attachments
planjoin.sqlplan (4 views, 14.00 KB)
planapply.sqlplan (2 views, 10.00 KB)
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10350 Visits: 11350
If those two queries produce identical results, it is just luck. Your original queries had (rn = 1) ordered by order_item_id ascending for the JOIN option (which would give the row with the lowest order_item_id). The APPLY form used MAX(order_item_id) which will produce the highest order_item_id of course. Your new query just uses TOP (1) without an ORDER BY - so which row you get is down to chance. The ROW_NUMBER in the apply does nothing, by the way. To answer your implied question about the worktable reads - these are related to the on-the-fly index SQL Server is creating for you each time the query runs (the Eager Index Spool in the query plan). This is an indication that you are missing a real index that would be helpful to the APPLY query. Based on the information provided, this is how I would write the APPLY query, and includes the definition of the index you should create to get best performance from it:


SELECT
nco.ORDER_ID,
nci.ORDER_ITEM_ID
FROM OR_NCCore.dbo.Order_Dim AS nco
JOIN OR_NCCore.dbo.Status_Dim AS s
ON nco.STATUS_ID = S.STATUS_ID
CROSS APPLY
(
SELECT TOP (1)
nci.ORDER_ITEM_ID
FROM OR_NCCore.dbo.ORDER_ITEM_DIM AS nci
WHERE
nci.ORDER_ID = nco.ORDER_ID
ORDER BY
nci.ORDER_ITEM_ID ASC
) AS nci;

-- Optimal index for this query:
CREATE INDEX x
ON OR_NCCore.dbo.ORDER_ITEM_DIM
(ORDER_ID, ORDER_ITEM_ID);





Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
peter-757102
peter-757102
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 2549
I took a look at the plans and what confused me is the apparent lack of clustered indexes (i noticed table scans instead of clustered index scans)!

Could it be that these are heaps with nonclustered indexes applied to them?

If so, that would create lots of perfomance issues and limit the optimizer.
Each table, except for very few exceptions should have a primarky key, and it should be clustered.
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10350 Visits: 11350
peter-757102 (1/31/2012)
I took a look at the plans and what confused me is the apparent lack of clustered indexes (i noticed table scans instead of clustered index scans)!

The table names made me think these are fact and dimension tables in a data warehouse. It is not uncommon to use heaps in that scenario.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
peter-757102
peter-757102
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 2549
SQL Kiwi (1/31/2012)
peter-757102 (1/31/2012)
I took a look at the plans and what confused me is the apparent lack of clustered indexes (i noticed table scans instead of clustered index scans)!

The table names made me think these are fact and dimension tables in a data warehouse. It is not uncommon to use heaps in that scenario.


Can you explain (or have a link that does a good job at it), why one would want to go that route over clustered storage in that scenario?

The one thing that comes to my mind is reduces storage for simple entities, but when you start to add indexes and query it like it is relational, that small 'advantage' evaporates real fast.
b5535084
b5535084
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 21
Thanks for the insights! I have a question about your comment about the rn doing nothing though. My hope was that it would force the windowing so that the "top 1" would be in the window. Or, must I reference rn directly to make that happen?
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10350 Visits: 11350
b5535084 (1/31/2012)
Thanks for the insights! I have a question about your comment about the rn doing nothing though. My hope was that it would force the windowing so that the "top 1" would be in the window. Or, must I reference rn directly to make that happen?

The optimizer removes it entirely because the result of the ROW_NUMBER isn't used for anything. Even if you did reference it, the ORDER BY in the ROW_NUMBER only defines how rows are ordered for numbering. In principle, the TOP could still return rows in some other order - the point is there is no guarantee, so you should not rely on any apparently-reproducible observed behaviour.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10350 Visits: 11350
peter-757102 (1/31/2012)
Can you explain (or have a link that does a good job at it), why one would want to go that route over clustered storage in that scenario?

Not offhand, no. It's not something I have looked at for years - but I am aware that some DW methodologies advocate heaps. It may have something to do with requiring sorted order for no apparent benefit, but we're on the ragged edge of my recollection here.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search