Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Understanding and Using APPLY (Part 2)


Understanding and Using APPLY (Part 2)

Author
Message
peter-757102
peter-757102
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 2549
b5535084 (1/31/2012)
Thanks for the insights! I have a question about your comment about the rn doing nothing though. My hope was that it would force the windowing so that the "top 1" would be in the window. Or, must I reference rn directly to make that happen?



I often wished for a ransomization (on/off) setting, that forces deliberately anti-ordered results and expression execution order to quickly identify hidden false assumptions in queries.
peter-757102
peter-757102
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 2549
SQL Kiwi (1/31/2012)
peter-757102 (1/31/2012)
Can you explain (or have a link that does a good job at it), why one would want to go that route over clustered storage in that scenario?

Not offhand, no. It's not something I have looked at for years - but I am aware that some DW methodologies advocate heaps. It may have something to do with requiring sorted order for no apparent benefit, but we're on the ragged edge of my recollection here.


I did check some on the net myself (long live Google)....and this is what I get from it:

1. Nonclustered indexes on heaps have direct "pointers" to their respective record data in the heap, avoiding a secondary scan trough a clustered index. On large tables that are heavily read and where records are never changed, this could be a big speed factor.

2. It is easier to append new data and use many different search patterns using multiple indexes (a clustered index is only optimal on one, the rest becomes expensive).
It might add flexibility in segregating inactive and active data too i think. Keeping the active set small, makes things fast ofcourse.
Some testing on my side is needed to figure out when to use and when not too and what problems I encounter.

Currenly all cubes I worked with are relativly easy managable in size, around 110 GB for the biggest, but with very limited access pattens, so clustering worked fine there. In other scenarios with much smaller datasets I also use clustereing, supported with one index in some cases and that worked fine too. But as usual, as the dataset grows, issues and wrong choices become more aparent.
b5535084
b5535084
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 21
OK -- I get that now. So I added:

where rn=1 (so it would be included)

and removed the top 1, so I now looks like:

select nco.*, nci.ORDER_ITEM_ID o2
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select *,rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci
where rn=1

The inner join still beats it by a wide margin on I/O, even after I added the index you suggested (thanks for that!) It's also a pity that I can't put the where clause inside the subquery for the cross apply, but such is life!
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10352 Visits: 11350
b5535084 (1/31/2012)
OK -- I get that now. So I added:

where rn=1 (so it would be included)

and removed the top 1, so I now looks like:

select nco.*, nci.ORDER_ITEM_ID o2
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select *,rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci
where rn=1

The inner join still beats it by a wide margin on I/O, even after I added the index you suggested (thanks for that!) It's also a pity that I can't put the where clause inside the subquery for the cross apply, but such is life!

I'm confused. Why can't you use:


SELECT
nco.*,
nci.ORDER_ITEM_ID AS o2
FROM OR_NCCore.dbo.Order_Dim AS nco
JOIN OR_NCCore.dbo.Status_Dim AS s
ON nco.STATUS_ID = S.STATUS_ID
CROSS APPLY
(
SELECT TOP (1)
nci.ORDER_ITEM_ID
FROM OR_NCCore.dbo.ORDER_ITEM_DIM AS nci
WHERE
nci.ORDER_ID = nco.ORDER_ID
ORDER BY
nci.ORDER_ITEM_ID ASC
) AS nci;


...? I would expect this to perform well with the index. I'm struggling to understand the need for the ROW_NUMBER. Perhaps an execution plan and statistics results for my query above?



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10352 Visits: 11350
peter-757102 (1/31/2012)
I did check some on the net myself (long live Google)....and this is what I get from it:

I guess another consideration might be what to use as the clustering key, how wide that would be (which would affect nc indexes too), and whether having to sort new data in order of the cluster would be expensive or not.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10352 Visits: 11350
peter-757102 (1/31/2012)
I often wished for a ransomization (on/off) setting, that forces deliberately anti-ordered results and expression execution order to quickly identify hidden false assumptions in queries.

Yeah. I doubt we will ever see that though, given the implications for the internal design of the query executor. I have often wondered why TOP is allowed without a scoped ORDER BY though. Probably a backwards-compatibility issue; the dumb stuff usually is :-)



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
b5535084
b5535084
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 21
I can use that, for sure, but it changes nothing really. Similar I/O counts and CPU time actually goes up. I'm attaching the plans for the two versions:

cross apply 1:

select * 
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select *, rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci
where rn=1



stats:

[code="plain"](70108 row(s) affected)
Table 'ORDER_ITEM_DIM'. Scan count 70147, logical reads 303216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 15948 ms.[/code
]



Cross apply 2:

select * 
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select top 1 order_item_id
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
order by ORDER_ITEM_ID asc
) nci



Stats:

(70108 row(s) affected)
Table 'ORDER_ITEM_DIM'. Scan count 70147, logical reads 223933, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1281 ms, elapsed time = 7777 ms.

Attachments
planapply1.sqlplan (1 view, 83.00 KB)
planapply2.sqlplan (1 view, 43.00 KB)
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10352 Visits: 11350
b5535084 (1/31/2012)
I can use that, for sure, but it changes nothing really. Similar I/O counts and CPU time actually goes up. I'm attaching the plans for the two versions...

Er, from those results:

TOP: CPU time = 1281 ms, elapsed time = 7777 ms, logical reads 223933 (ORDER_ITEM_DIM)
ROW_NUMBER: CPU time = 2000 ms, elapsed time = 15948 ms, logical reads 303216 (ORDER_ITEM_DIM)

TOP is better by every measure. The TOP plan looks better too: there's a (pointless) RID lookup in the ROW_NUMBER plan.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
b5535084
b5535084
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 21
TOP: CPU time = 1281 ms, elapsed time = 7777 ms, logical reads 223933 (ORDER_ITEM_DIM)
ROW_NUMBER: CPU time = 2000 ms, elapsed time = 15948 ms, logical reads 303216 (ORDER_ITEM_DIM)

TOP is better by every measure. The TOP plan looks better too: there's a (pointless) RID lookup in the ROW_NUMBER plan.


Better? Well OK, a little, but not by an order of magnitude or anything significant, really. Anyway, the original join beats both by miles in I/O. Which surprises me. OTOH It points out the values of digging down into the measures like this.

Good comment about the RID lookup. Something to watch out for...
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10352 Visits: 11350
b5535084 (1/31/2012)
TOP: CPU time = 1281 ms, elapsed time = 7777 ms, logical reads 223933 (ORDER_ITEM_DIM)
ROW_NUMBER: CPU time = 2000 ms, elapsed time = 15948 ms, logical reads 303216 (ORDER_ITEM_DIM)

TOP is better by every measure. The TOP plan looks better too: there's a (pointless) RID lookup in the ROW_NUMBER plan.

Better? Well OK, a little, but not by an order of magnitude or anything significant, really.

Well I'd say half the elapsed time was pretty significant ;-)

Anyway, the original join beats both by miles in I/O. Which surprises me.

There is a reason, but anyway I think we've gone as far with this as is sensible on this thread. Thanks for commenting.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search