• Solomon Rutzky (12/29/2010)


    Adam Machanic (12/29/2010)


    Have you compared the performance of doing a median calculation w/compression against performing the same calculation using Joe Celko's ROW_NUMBER technique?

    http://sqlblog.com/blogs/adam_machanic/archive/2006/12/18/medians-row-numbers-and-performance.aspx

    Compression is far from cheap, and I suspect that the ROW_NUMBER solution will scale much better. But that's just a suspicion and I don't have time to test myself at the moment.

    Hi Adam. Thanks for mentioning that article as I had not seen it. I did just test this against the AdventureWorks2008 DB using SQL Server 2008 SP2 using the Celko method, Ben-Gan method, and my CLR UDA. The CLR method has the same number of reads as the Celko method but has slightly higher CPU and Elapsed times as compared to both Celko and Ben-Gan methods.

    So yes, the pure T-SQL methods do appear to be slightly more efficient. However, for the record I did not say that compression was cheap. In fact, I did say that it would be slightly less efficient to do that but that it might prove to be a worthwhile trade-off depending on the circumstances.

    My testing is as follows (the first two queries are copied directly from your blog that you noted above):

    ------------------------------------------------------------------------

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT

    CustomerId,

    AVG(TotalDue)

    FROM

    (

    SELECT

    CustomerId,

    TotalDue,

    ROW_NUMBER() OVER (

    PARTITION BY CustomerId

    ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,

    ROW_NUMBER() OVER (

    PARTITION BY CustomerId

    ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc

    FROM Sales.SalesOrderHeader SOH

    ) x

    WHERE

    RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)

    GROUP BY CustomerId

    ORDER BY CustomerId;

    SET STATISTICS IO OFF

    SET STATISTICS TIME OFF

    -- logical reads 686

    -- CPU time = 47 ms, elapsed time = 1087 ms. (when DROPCLEANBUFFERS and FREEPROCCACHE are run)

    -- CPU time = 62 ms, elapsed time = 235 ms. (when cached)

    ------------------------------------------------------------------------

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT

    CustomerId,

    AVG(TotalDue)

    FROM

    (

    SELECT

    CustomerId,

    TotalDue,

    ROW_NUMBER() OVER (

    PARTITION BY CustomerId

    ORDER BY TotalDue) AS RowNum,

    COUNT(*) OVER (

    PARTITION BY CustomerId) AS RowCnt

    FROM Sales.SalesOrderHeader

    ) x

    WHERE

    RowNum IN ((RowCnt + 1) / 2, (RowCnt + 2) / 2)

    GROUP BY CustomerId

    ORDER BY CustomerId;

    SET STATISTICS IO OFF

    SET STATISTICS TIME OFF

    -- logical reads 686 + 139407 for Worktable = 140093 total

    -- CPU time = 344 ms, elapsed time = 1085 ms. (when DROPCLEANBUFFERS and FREEPROCCACHE are run)

    -- CPU time = 328 ms, elapsed time = 374 ms. (when cached)

    ------------------------------------------------------------------------

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECTCustomerId,

    SQL#.SQL#.Agg_Median(TotalDue)

    FROM Sales.SalesOrderHeader SOH

    GROUP BY CustomerId

    ORDER BY CustomerId;

    SET STATISTICS IO OFF

    SET STATISTICS TIME OFF

    -- logical reads 686

    -- CPU time = 1812 ms, elapsed time = 2707 ms. (when DROPCLEANBUFFERS and FREEPROCCACHE are run)

    -- CPU time = 1906 ms, elapsed time = 1948 ms. (when cached)

    ------------------------------------------------------------------------

    I don't think the CELKO method works correctly in all circumstances, the problem occurs when there are duplicates and the ASC and DESC ROW_NUMBERs don't match up. The data below shows the problem, the result should be 15, the Ben-Gan is correct here.

    INSERT INTO SalesOrderHeader(CustomerId, TotalDue, SalesOrderId)

    SELECT 100,10,10 UNION ALL

    SELECT 100,10,10 UNION ALL

    SELECT 100,20,10 UNION ALL

    SELECT 100,30,10

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537