Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»

Understanding and Using APPLY (Part 2) Expand / Collapse
Author
Message
Posted Tuesday, January 31, 2012 9:35 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, July 25, 2014 3:18 AM
Points: 323, Visits: 2,200
b5535084 (1/31/2012)
Thanks for the insights! I have a question about your comment about the rn doing nothing though. My hope was that it would force the windowing so that the "top 1" would be in the window. Or, must I reference rn directly to make that happen?



I often wished for a ransomization (on/off) setting, that forces deliberately anti-ordered results and expression execution order to quickly identify hidden false assumptions in queries.
Post #1244520
Posted Tuesday, January 31, 2012 9:48 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, July 25, 2014 3:18 AM
Points: 323, Visits: 2,200
SQL Kiwi (1/31/2012)
peter-757102 (1/31/2012)
Can you explain (or have a link that does a good job at it), why one would want to go that route over clustered storage in that scenario?

Not offhand, no. It's not something I have looked at for years - but I am aware that some DW methodologies advocate heaps. It may have something to do with requiring sorted order for no apparent benefit, but we're on the ragged edge of my recollection here.


I did check some on the net myself (long live Google)....and this is what I get from it:

1. Nonclustered indexes on heaps have direct "pointers" to their respective record data in the heap, avoiding a secondary scan trough a clustered index. On large tables that are heavily read and where records are never changed, this could be a big speed factor.

2. It is easier to append new data and use many different search patterns using multiple indexes (a clustered index is only optimal on one, the rest becomes expensive).
It might add flexibility in segregating inactive and active data too i think. Keeping the active set small, makes things fast ofcourse.
Some testing on my side is needed to figure out when to use and when not too and what problems I encounter.

Currenly all cubes I worked with are relativly easy managable in size, around 110 GB for the biggest, but with very limited access pattens, so clustering worked fine there. In other scenarios with much smaller datasets I also use clustereing, supported with one index in some cases and that worked fine too. But as usual, as the dataset grows, issues and wrong choices become more aparent.
Post #1244536
Posted Tuesday, January 31, 2012 9:52 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 3, 2012 2:47 PM
Points: 11, Visits: 21
OK -- I get that now. So I added:

where rn=1 (so it would be included)

and removed the top 1, so I now looks like:

select nco.*, nci.ORDER_ITEM_ID o2
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select *,rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci
where rn=1

The inner join still beats it by a wide margin on I/O, even after I added the index you suggested (thanks for that!) It's also a pity that I can't put the where clause inside the subquery for the cross apply, but such is life!
Post #1244545
Posted Tuesday, January 31, 2012 10:00 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:55 AM
Points: 11,192, Visits: 11,095
b5535084 (1/31/2012)
OK -- I get that now. So I added:

where rn=1 (so it would be included)

and removed the top 1, so I now looks like:

select nco.*, nci.ORDER_ITEM_ID o2
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select *,rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci
where rn=1

The inner join still beats it by a wide margin on I/O, even after I added the index you suggested (thanks for that!) It's also a pity that I can't put the where clause inside the subquery for the cross apply, but such is life!

I'm confused. Why can't you use:

SELECT
nco.*,
nci.ORDER_ITEM_ID AS o2
FROM OR_NCCore.dbo.Order_Dim AS nco
JOIN OR_NCCore.dbo.Status_Dim AS s
ON nco.STATUS_ID = S.STATUS_ID
CROSS APPLY
(
SELECT TOP (1)
nci.ORDER_ITEM_ID
FROM OR_NCCore.dbo.ORDER_ITEM_DIM AS nci
WHERE
nci.ORDER_ID = nco.ORDER_ID
ORDER BY
nci.ORDER_ITEM_ID ASC
) AS nci;

...? I would expect this to perform well with the index. I'm struggling to understand the need for the ROW_NUMBER. Perhaps an execution plan and statistics results for my query above?




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244554
Posted Tuesday, January 31, 2012 10:02 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:55 AM
Points: 11,192, Visits: 11,095
peter-757102 (1/31/2012)
I did check some on the net myself (long live Google)....and this is what I get from it:

I guess another consideration might be what to use as the clustering key, how wide that would be (which would affect nc indexes too), and whether having to sort new data in order of the cluster would be expensive or not.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244558
Posted Tuesday, January 31, 2012 10:03 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:55 AM
Points: 11,192, Visits: 11,095
peter-757102 (1/31/2012)
I often wished for a ransomization (on/off) setting, that forces deliberately anti-ordered results and expression execution order to quickly identify hidden false assumptions in queries.

Yeah. I doubt we will ever see that though, given the implications for the internal design of the query executor. I have often wondered why TOP is allowed without a scoped ORDER BY though. Probably a backwards-compatibility issue; the dumb stuff usually is




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244561
Posted Tuesday, January 31, 2012 10:24 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 3, 2012 2:47 PM
Points: 11, Visits: 21
I can use that, for sure, but it changes nothing really. Similar I/O counts and CPU time actually goes up. I'm attaching the plans for the two versions:

cross apply 1:

select * 
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select *, rn=row_number()
over (partition by order_id order by order_item_id asc)
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
) nci
where rn=1

stats:

[code="plain"](70108 row(s) affected)
Table 'ORDER_ITEM_DIM'. Scan count 70147, logical reads 303216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 15948 ms.[/code
]



Cross apply 2:

select * 
From OR_NCCore.dbo.Order_Dim nco
INNER JOIN or_nccore.dbo.STATUS_DIM S
ON nco.STATUS_ID = S.STATUS_ID

cross apply (
select top 1 order_item_id
from or_nccore.dbo.ORDER_ITEM_DIM nci
where nco.order_id=nci.order_id
order by ORDER_ITEM_ID asc
) nci

Stats:

(70108 row(s) affected)
Table 'ORDER_ITEM_DIM'. Scan count 70147, logical reads 223933, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ORDER_DIM'. Scan count 1, logical reads 4128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'STATUS_DIM'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1281 ms, elapsed time = 7777 ms.


  Post Attachments 
planapply1.sqlplan (1 view, 83.19 KB)
planapply2.sqlplan (1 view, 43.58 KB)
Post #1244579
Posted Tuesday, January 31, 2012 11:01 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:55 AM
Points: 11,192, Visits: 11,095
b5535084 (1/31/2012)
I can use that, for sure, but it changes nothing really. Similar I/O counts and CPU time actually goes up. I'm attaching the plans for the two versions...

Er, from those results:

TOP: CPU time = 1281 ms, elapsed time = 7777 ms, logical reads 223933 (ORDER_ITEM_DIM)
ROW_NUMBER: CPU time = 2000 ms, elapsed time = 15948 ms, logical reads 303216 (ORDER_ITEM_DIM)

TOP is better by every measure. The TOP plan looks better too: there's a (pointless) RID lookup in the ROW_NUMBER plan.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244614
Posted Tuesday, January 31, 2012 11:37 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 3, 2012 2:47 PM
Points: 11, Visits: 21
TOP: CPU time = 1281 ms, elapsed time = 7777 ms, logical reads 223933 (ORDER_ITEM_DIM)
ROW_NUMBER: CPU time = 2000 ms, elapsed time = 15948 ms, logical reads 303216 (ORDER_ITEM_DIM)

TOP is better by every measure. The TOP plan looks better too: there's a (pointless) RID lookup in the ROW_NUMBER plan.


Better? Well OK, a little, but not by an order of magnitude or anything significant, really. Anyway, the original join beats both by miles in I/O. Which surprises me. OTOH It points out the values of digging down into the measures like this.

Good comment about the RID lookup. Something to watch out for...
Post #1244643
Posted Tuesday, January 31, 2012 11:47 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:55 AM
Points: 11,192, Visits: 11,095
b5535084 (1/31/2012)
TOP: CPU time = 1281 ms, elapsed time = 7777 ms, logical reads 223933 (ORDER_ITEM_DIM)
ROW_NUMBER: CPU time = 2000 ms, elapsed time = 15948 ms, logical reads 303216 (ORDER_ITEM_DIM)

TOP is better by every measure. The TOP plan looks better too: there's a (pointless) RID lookup in the ROW_NUMBER plan.

Better? Well OK, a little, but not by an order of magnitude or anything significant, really.

Well I'd say half the elapsed time was pretty significant

Anyway, the original join beats both by miles in I/O. Which surprises me.

There is a reason, but anyway I think we've gone as far with this as is sensible on this thread. Thanks for commenting.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1244654
« Prev Topic | Next Topic »

Add to briefcase «««23456»»

Permissions Expand / Collapse