Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»»

Understanding and Using APPLY (Part 2) Expand / Collapse
Author
Message
Posted Monday, January 30, 2012 9:19 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 3, 2012 2:47 PM
Points: 11, Visits: 21
Hmmm.... replacing Having with Top 1...order by causes it to return Nulls for the max() for some rows -- not what I want. Here are the execution plans (couldn't see how to attach a document -- did I miss it?):

Join:

|--Merge Join(Inner Join, MANY-TO-MANY MERGE[nco].[STATUS_ID])=([S].[STATUS_ID]), RESIDUAL[OR_NCCORE].[dbo].[ORDER_DIM].[STATUS_ID] as [nco].[STATUS_ID]=[OR_NCCORE].[dbo].[STATUS_DIM].[STATUS_ID] as [S].[STATUS_ID]))
|--Sort(ORDER BY[nco].[STATUS_ID] ASC))
| |--Hash Match(Inner Join, HASH[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID])=([nco].[ORDER_ID]), RESIDUAL[OR_NCCORE].[dbo].[ORDER_DIM].[ORDER_ID] as [nco].[ORDER_ID]=[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID]))
| |--Filter(WHERE[Expr1010]=(1)))
| | |--Sequence Project(DEFINE[Expr1010]=row_number))
| | |--Segment
| | |--Sort(ORDER BY[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID] ASC, [OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ITEM_ID] ASC))
| | |--Table Scan(OBJECT[OR_NCCORE].[dbo].[ORDER_ITEM_DIM]))
| |--Table Scan(OBJECT[OR_NCCORE].[dbo].[ORDER_DIM] AS [nco]))
|--Sort(ORDER BY[S].[STATUS_ID] ASC))
|--Table Scan(OBJECT[OR_NCCORE].[dbo].[STATUS_DIM] AS [S]))

Cross Apply:

|--Nested Loops(Inner Join, OUTER REFERENCES[nco].[ORDER_ID]))
|--Hash Match(Inner Join, HASH[S].[STATUS_ID])=([nco].[STATUS_ID]), RESIDUAL[OR_NCCORE].[dbo].[STATUS_DIM].[STATUS_ID] as [S].[STATUS_ID]=[OR_NCCORE].[dbo].[ORDER_DIM].[STATUS_ID] as [nco].[STATUS_ID]))
| |--Table Scan(OBJECT[OR_NCCORE].[dbo].[STATUS_DIM] AS [S]))
| |--Table Scan(OBJECT[OR_NCCORE].[dbo].[ORDER_DIM] AS [nco]))
|--Filter(WHERE[Expr1010] IS NOT NULL))
|--Stream Aggregate(DEFINE[Expr1010]=MAX([OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ITEM_ID])))
|--Index Spool(SEEK[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[ORDER_ID]=[OR_NCCORE].[dbo].[ORDER_DIM].[ORDER_ID] as [nco].[ORDER_ID]))
|--Index Scan(OBJECT[OR_NCCORE].[dbo].[ORDER_ITEM_DIM].[IX_ORDER_ITEM_ID_with_Includes]))
Post #1243712
Posted Monday, January 30, 2012 7:14 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 10:58 AM
Points: 11,192, Visits: 11,097
b5535084 (1/30/2012)
Hmmm.... replacing Having with Top 1...order by causes it to return Nulls for the max() for some rows -- not what I want. Here are the execution plans (couldn't see how to attach a document -- did I miss it?)

Yes, when you are adding a reply, there is an 'Edit Attachments' button in the Post Options section. If it is convenient, please also give the CREATE TABLE and CREATE INDEX statements for the tables involved. The more information you can provide, the more comprehensive an answer I'll be able to give you.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244010
Posted Tuesday, January 31, 2012 7:33 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 3, 2012 2:47 PM
Points: 11, Visits: 21
OK , I've attached the two execution plans. Recall that I do a similar query two ways: 1 with join and one with cross apply. The produce identical results. I'm trying to understand why the cross apply version is more expensive and if the latter can be rewritten to do better or if join is the best choice for this application.

with join:

select nco.ORDER_ID, nci.ORDER_ITEM_ID
from OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

inner join (
select *, rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM
) nci
on nco.order_id=nci.order_id and rn=1

Stats:

(70108 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_ITEM_DIM'. Scan count 1, logical reads 1417, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 391 ms, elapsed time = 1075 ms

with cross apply:

select nco.ORDER_ID, nci.ORDER_ITEM_ID
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select top 1 *, rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci

stats:

(70108 row(s) affected)
Table 'Worktable'. Scan count 70147, logical reads 448162, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_ITEM_DIM'. Scan count 1, logical reads 1417, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 828 ms, elapsed time = 1080 ms.


  Post Attachments 
planjoin.sqlplan (4 views, 14.98 KB)
planapply.sqlplan (2 views, 10.61 KB)
Post #1244378
Posted Tuesday, January 31, 2012 7:58 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 10:58 AM
Points: 11,192, Visits: 11,097
If those two queries produce identical results, it is just luck. Your original queries had (rn = 1) ordered by order_item_id ascending for the JOIN option (which would give the row with the lowest order_item_id). The APPLY form used MAX(order_item_id) which will produce the highest order_item_id of course. Your new query just uses TOP (1) without an ORDER BY - so which row you get is down to chance. The ROW_NUMBER in the apply does nothing, by the way. To answer your implied question about the worktable reads - these are related to the on-the-fly index SQL Server is creating for you each time the query runs (the Eager Index Spool in the query plan). This is an indication that you are missing a real index that would be helpful to the APPLY query. Based on the information provided, this is how I would write the APPLY query, and includes the definition of the index you should create to get best performance from it:

SELECT
nco.ORDER_ID,
nci.ORDER_ITEM_ID
FROM OR_NCCore.dbo.Order_Dim AS nco
JOIN OR_NCCore.dbo.Status_Dim AS s
ON nco.STATUS_ID = S.STATUS_ID
CROSS APPLY
(
SELECT TOP (1)
nci.ORDER_ITEM_ID
FROM OR_NCCore.dbo.ORDER_ITEM_DIM AS nci
WHERE
nci.ORDER_ID = nco.ORDER_ID
ORDER BY
nci.ORDER_ITEM_ID ASC
) AS nci;

-- Optimal index for this query:
CREATE INDEX x
ON OR_NCCore.dbo.ORDER_ITEM_DIM
(ORDER_ID, ORDER_ITEM_ID);





Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244410
Posted Tuesday, January 31, 2012 8:33 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, July 25, 2014 3:18 AM
Points: 323, Visits: 2,200
I took a look at the plans and what confused me is the apparent lack of clustered indexes (i noticed table scans instead of clustered index scans)!

Could it be that these are heaps with nonclustered indexes applied to them?

If so, that would create lots of perfomance issues and limit the optimizer.
Each table, except for very few exceptions should have a primarky key, and it should be clustered.
Post #1244446
Posted Tuesday, January 31, 2012 8:40 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 10:58 AM
Points: 11,192, Visits: 11,097
peter-757102 (1/31/2012)
I took a look at the plans and what confused me is the apparent lack of clustered indexes (i noticed table scans instead of clustered index scans)!

The table names made me think these are fact and dimension tables in a data warehouse. It is not uncommon to use heaps in that scenario.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244458
Posted Tuesday, January 31, 2012 8:47 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, July 25, 2014 3:18 AM
Points: 323, Visits: 2,200
SQL Kiwi (1/31/2012)
peter-757102 (1/31/2012)
I took a look at the plans and what confused me is the apparent lack of clustered indexes (i noticed table scans instead of clustered index scans)!

The table names made me think these are fact and dimension tables in a data warehouse. It is not uncommon to use heaps in that scenario.


Can you explain (or have a link that does a good job at it), why one would want to go that route over clustered storage in that scenario?

The one thing that comes to my mind is reduces storage for simple entities, but when you start to add indexes and query it like it is relational, that small 'advantage' evaporates real fast.
Post #1244470
Posted Tuesday, January 31, 2012 9:26 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 3, 2012 2:47 PM
Points: 11, Visits: 21
Thanks for the insights! I have a question about your comment about the rn doing nothing though. My hope was that it would force the windowing so that the "top 1" would be in the window. Or, must I reference rn directly to make that happen?

Post #1244511
Posted Tuesday, January 31, 2012 9:31 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 10:58 AM
Points: 11,192, Visits: 11,097
b5535084 (1/31/2012)
Thanks for the insights! I have a question about your comment about the rn doing nothing though. My hope was that it would force the windowing so that the "top 1" would be in the window. Or, must I reference rn directly to make that happen?

The optimizer removes it entirely because the result of the ROW_NUMBER isn't used for anything. Even if you did reference it, the ORDER BY in the ROW_NUMBER only defines how rows are ordered for numbering. In principle, the TOP could still return rows in some other order - the point is there is no guarantee, so you should not rely on any apparently-reproducible observed behaviour.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244514
Posted Tuesday, January 31, 2012 9:32 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 10:58 AM
Points: 11,192, Visits: 11,097
peter-757102 (1/31/2012)
Can you explain (or have a link that does a good job at it), why one would want to go that route over clustered storage in that scenario?

Not offhand, no. It's not something I have looked at for years - but I am aware that some DW methodologies advocate heaps. It may have something to do with requiring sorted order for no apparent benefit, but we're on the ragged edge of my recollection here.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244516
« Prev Topic | Next Topic »

Add to briefcase «««23456»»»

Permissions Expand / Collapse