Linking to the Previous Row

  • Can you demonstrate your technique for matching current rows with previous and next values is faster and more efficient as my best solution?

    See:

    CREATE TABLE [Sample]

    (

    seq_nbr INTEGER NOT NULL PRIMARY KEY

    );

    INSERT [Sample]

    VALUES (1), (4), (5), (8);

    /* Wanted Result

    seq_nbr seq_nbr seq_nbr

    ----------- ----------- -----------

    NULL 1 4

    1 4 5

    4 5 8

    5 8 NULL

    */

    --Mohammad Salimabadi Solution

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH C AS

    (

    SELECT seq_nbr, k,

    DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct

    FROM [Sample]

    CROSS JOIN

    (VALUES (-1), (0), (1)

    ) AS D(k)

    )

    SELECT MIN(seq_nbr) AS pre_value,

    MAX(CASE WHEN k = 0 THEN seq_nbr END) AS current_value,

    MAX(seq_nbr) AS next_value

    FROM C

    GROUP BY grp_fct

    HAVING min(seq_nbr) < max(seq_nbr);

    --Common Solution #1

    WITH C AS

    (SELECT seq_nbr,

    ROW_NUMBER() OVER(ORDER BY seq_nbr) AS rnk

    FROM [Sample]

    )

    SELECT B.seq_nbr,

    A.seq_nbr,

    C.seq_nbr

    FROM C AS A

    LEFT JOIN C AS B

    ON A.rnk - 1 = B.rnk

    LEFT JOIN C

    ON A.rnk = C.rnk - 1;

    --Common Solution #2

    SELECT D1.seq_nbr,

    S.seq_nbr,

    D2.seq_nbr

    FROM [Sample] AS S

    OUTER APPLY (

    SELECT TOP 1 seq_nbr

    FROM [Sample]

    WHERE seq_nbr < S.seq_nbr

    ORDER BY seq_nbr DESC

    ) AS D1

    OUTER APPLY (

    SELECT TOP 1 seq_nbr

    FROM [Sample]

    WHERE seq_nbr > S.seq_nbr

    ORDER BY seq_nbr

    ) AS D2;

    SET STATISTICS IO OFF

    SET STATISTICS TIME OFF

    /*

    Mohammad Salimabadi Solution:

    Table 'Sample'. Scan count 1, logical reads 2

    Common Solution #1

    Table 'Sample'. Scan count 3, logical reads 20

    Common Solution #2

    Table 'Sample'. Scan count 9, logical reads 18

    */

  • Hi Mohammad. I can run this on my data - right now it looks like it's picking out the current, previous and next 'date' or seq-nbr? What I'd need is the current, next and previous 'value', from those dates. Any chance of a tweak?

    A quick run of the code as is suggests it is much faster than the CTE and faster than the Outer Apply.

  • ;WITH C AS

    (

    SELECT seq_nbr, k,

    DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct

    FROM [Sample]

    CROSS JOIN

    (VALUES (-1), (0), (1)

    ) AS D(k)

    )

    I'm curious about your choice to use DENSE_RANK (which is identical to ROW_NUMBER and RANK when all rows are unique, but which I believe won't work when there are duplicates).

    Is DENSE_RANK any faster? And if so, why? I believe your technique should still work fine with ROW_NUMBER.

    I supposed that the real speed up of this version comes from the up front CROSS JOIN which mitigates the tendency of the engine to treat multiple uses of CTE which is self-joined independently resulting in multiple reads.

  • >>

    Hi Mohammad. I can run this on my data - right now it looks like it's picking out the current, previous and next 'date' or seq-nbr? What I'd need is the current, next and previous 'value', from those dates. Any chance of a tweak?

    A quick run of the code as is suggests it is much faster than the CTE and faster than the Outer Apply.

    <<

    No problem,

    Try this on:

    DECLARE @Sample TABLE

    (

    seq_nbr INTEGER NOT NULL PRIMARY KEY,

    value INTEGER

    );

    INSERT @Sample

    VALUES (1, 5), (4, 9), (5, 8), (8, 20);

    ;WITH C AS

    (

    SELECT seq_nbr, value, k,

    DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct

    FROM @Sample

    CROSS JOIN

    (VALUES (-1), (0), (1)

    ) AS D(k)

    )

    SELECT MAX(CASE WHEN k = 1 THEN value END) AS pre_value,

    MIN(CASE WHEN k = 1 THEN seq_nbr END) AS pre_seq,

    MAX(CASE WHEN k = 0 THEN value END) AS current_value,

    MAX(CASE WHEN k= 0 THEN seq_nbr END) AS current_seq,

    MAX(CASE WHEN k = -1 THEN value END) AS next_value,

    MAX(CASE WHEN k = -1 THEN seq_nbr END) AS next_seq

    FROM C

    GROUP BY grp_fct

    HAVING min(seq_nbr) < max(seq_nbr);

    /*

    pre_value pre_seq current_value current_seq next_value next_seq

    ----------- ----------- ------------- ----------- ----------- -----------

    NULL NULL 5 1 9 4

    5 1 9 4 8 5

    9 4 8 5 20 8

    8 5 20 8 NULL NULL

    */

  • Cade Roux (4/12/2011)


    ;WITH C AS

    (

    SELECT seq_nbr, k,

    DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct

    FROM [Sample]

    CROSS JOIN

    (VALUES (-1), (0), (1)

    ) AS D(k)

    )

    I'm curious about your choice to use DENSE_RANK (which is identical to ROW_NUMBER and RANK when all rows are unique, but which I believe won't work when there are duplicates).

    Is DENSE_RANK any faster? And if so, why? I believe your technique should still work fine with ROW_NUMBER.

    I supposed that the real speed up of this version comes from the up front CROSS JOIN which mitigates the tendency of the engine to treat multiple uses of CTE which is self-joined independently resulting in multiple reads.

    No, if we change the DENSE_RANK with ROW_NUMBER the result will not be correct. You can rum my first script at my first post with row_number. So you will see an empty result set.

  • _ms65g_ (4/12/2011)

    No, if we change the DENSE_RANK with ROW_NUMBER the result will not be correct. You can rum my first script at my first post with row_number. So you will see an empty result set.

    I see what you are doing now, it is a re-grouping - basically like a pivot.

  • Well, in my application it seems about the same as the Outer Apply method. I only need the previous (not the next) day's value, for 5000 series of about 1000 elements. Finding those 5 million values and doing a brief calculation on them takes 3-4 mins on my laptop, with either method...

  • AlistairNY (4/12/2011)


    Well, in my application it seems about the same as the Outer Apply method. I only need the previous (not the next) day's value, for 5000 series of about 1000 elements. Finding those 5 million values and doing a brief calculation on them takes 3-4 mins on my laptop, with either method...

    That's probably because the cross-join triples the set first. Did you use a modified cross-join with only -1, 0? I still expect that at large data sizes the cross-join will start to level out with the other methods - cross-joins are expensive. Still, the cross-join/pivot is a cool trick.

  • Yes, I cut down the other table to -1 and 0. If there's anything else I could try, very happy to 🙂

    I partitioned the rank function by a SeriesId as well, but otherwise ran Mohammad's solution. Any advantage in replacing the CTE with an indexed temp table?

  • Looks like a very innovative and interesting approach.

    _ms65g_ (4/12/2011)


    Can you demonstrate your technique for matching current rows with previous and next values is faster and more efficient as my best solution?

    Can you propose a version of your script compatible with the benchmark tests? At which point I'll be glad to test your script and include the results.

    Best regards,

    David.

  • Sorry for late reply.

    Here you are:

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    --Mohammad Salimabadi Solution

    WITH C0 AS

    (

    SELECT Ph.*, I.item,

    ROW_NUMBER() OVER (PARTITION BY Ph.ItemId

    ORDER BY Ph.PriceStartDate) AS rownum

    FROM PriceHistory AS Ph

    JOIN Items AS I

    ON Ph.itemid = I.itemid

    ),

    C1 AS

    (

    SELECT itemid, item, PriceStartDate, price,

    rownum + k AS grp_fct,

    k

    FROM C0

    CROSS JOIN

    (

    VALUES (-1),

    ( 0),

    ( 1)

    ) AS D(k)

    )

    SELECT MAX(CASE WHEN k = 0 THEN item END) AS Item,

    MAX(CASE WHEN k = 1 THEN price END) AS OldPrice,

    MAX(CASE WHEN k = 0 THEN price END) AS RangePrice,

    MAX(CASE WHEN k = 0 THEN PriceStartDate END) AS StartDate,

    MAX(CASE WHEN k = -1 THEN PriceStartDate END) AS EndDate

    FROM C1

    GROUP BY itemid, grp_fct

    HAVING MIN(PriceStartDate) < MAX(PriceStartDate);

    --A Common Solution

    WITH PriceCompare AS

    (

    SELECT i.Item,

    ph.ItemId, ph.PriceStartDate, ph.Price,

    ROW_NUMBER() OVER (PARTITION BY ph.ItemId

    ORDER BY PriceStartDate) AS rownum

    FROM Items i

    JOIN PriceHistory ph

    ON i.ItemId = ph.ItemId

    )

    SELECT currow.Item,

    prevrow.Price AS OldPrice,

    currow.Price AS RangePrice,

    currow.PriceStartDate AS StartDate,

    nextrow.PriceStartDate AS EndDate

    FROM PriceCompare currow

    LEFT JOIN PriceCompare nextrow

    ON currow.rownum = nextrow.rownum - 1

    AND currow.ItemId = nextrow.ItemId

    LEFT JOIN PriceCompare prevrow

    ON currow.rownum = prevrow.rownum + 1

    AND currow.ItemId = prevrow.ItemId;

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    /*

    --Mohammad Salimabadi Solutio:

    Table 'PriceHistory'. Scan count 3, logical reads 6

    Table 'Items'. Scan count 1, logical reads 2

    --A Common Solution

    Table 'PriceHistory'. Scan count 5, logical reads 48

    Table 'Items'. Scan count 1, logical reads 2

    */

  • /*Matching Adjacent Rows based on a consecutive value*/

    CREATE TABLE Nums

    (nbr INTEGER NOT NULL PRIMARY KEY,

    val INTEGER NOT NULL);

    INSERT INTO Nums (nbr, val)

    VALUES (1, 0), (5, 7), (9, 4);

    --===========Mohammad Salimabadi Solution #2

    SELECT T2.*, T1.*, T3.*

    FROM Nums AS T1

    LEFT JOIN Nums AS T2

    ON T2.nbr = (SELECT MAX(nbr)

    FROM Nums

    WHERE nbr < T1.nbr)

    LEFT JOIN Nums AS T3

    ON T3.nbr = (SELECT MIN(nbr)

    FROM Nums

    WHERE nbr > T1.nbr);

    --==========Mohammad Salimabadi Solution #3

    SELECT pre_nbr, N1.val AS pre_val,

    C.nbr AS cur_nbr, C.val AS cur_val,

    nxt_nbr, N2.val AS nxt_val

    FROM (

    SELECT MAX(CASE WHEN N1.nbr > N2.nbr THEN N2.nbr ELSE NULL END) AS pre_nbr,

    N1.nbr, N1.val,

    MIN(CASE WHEN N1.nbr < N2.nbr THEN N2.nbr ELSE NULL END) AS nxt_nbr

    FROM Nums AS N1,

    Nums AS N2

    GROUP BY N1.nbr, N1.val

    ) AS C

    LEFT JOIN Nums AS N1

    ON C.pre_nbr = N1.nbr

    LEFT JOIN Nums AS N2

    ON C.nxt_nbr = N2.nbr;

    --=========Mohammad Salimabadi Solution #4

    SELECT CAST(SUBSTRING(concat_pre, 1, 4) AS integer) AS pre_nbr,

    CAST(SUBSTRING(concat_pre, 5, 8) AS integer) AS pre_val,

    nbr, val,

    CAST(SUBSTRING(concat_nxt, 1, 4) AS integer) AS pre_nbr,

    CAST(SUBSTRING(concat_nxt, 5, 8) AS integer) AS pre_val

    FROM (

    SELECT (

    SELECT MAX(CAST(nbr AS BINARY(4)) + CAST(val AS BINARY(4)))

    FROM Nums

    WHERE nbr < T.nbr

    ) AS concat_pre,

    nbr, val,

    (

    SELECT MIN(CAST(nbr AS BINARY(4)) + CAST(val AS BINARY(4)))

    FROM Nums

    WHERE nbr > T.nbr

    ) AS concat_nxt

    FROM Nums AS T

    ) AS D

  • Firstly apologies, Mohammad, for the obscenely long reply.

    Before we look at the results, let me just say that the test data, and code that I used are those that I posted previously. So you are free to perform exactly the same test in your own environment.

    I compared your method with the other methods previously discussed, and unfortunately the results were rather disappointing. Indeed it's generally the worst performer.

    For the full resultset I used your code exactly as is. For the "Price rises" and the "One Item" queries, I amended your query to put the results in a further CTE and then filtered the results. (extract below)

    C3 AS

    (

    SELECT MAX(CASE WHEN k = 0 THEN item END) AS Item,

    MAX(CASE WHEN k = 1 THEN price END) AS OldPrice,

    MAX(CASE WHEN k = 0 THEN price END) AS RangePrice,

    MAX(CASE WHEN k = 0 THEN PriceStartDate END) AS StartDate,

    MAX(CASE WHEN k = -1 THEN PriceStartDate END) AS EndDate

    FROM C1

    GROUP BY itemid, grp_fct

    HAVING MIN(PriceStartDate) < MAX(PriceStartDate)

    )

    SELECT * FROM C3 where Item='Item 512'

    Full ResultsetMethodExecution Time

    CTE View13 secs

    Temp Table with rownumber33 secs

    Table Variable with rownumber47 secs

    Temp Table with identity16 secs

    Table Variable with identityCancelled after 1 hour

    Cross Apply12 secs

    Mohammad Salimabadi Solution131 secs

    One Item

    CTE View2 secs

    Temp Table with rownumber5 secs

    Table Variable with rownumber4 secs

    Temp Table with identity6 secs

    Table Variable with identity83 secs

    Cross Apply0 sesc

    Mohammad Salimabadi Solution85 secs

    Price Rises

    CTE View8 secs

    Temp Table with rownumber19 secs

    Table Variable with rownumber17 secs

    Temp Table with identity12 secs

    Table Variable with identityNot Run

    Cross Apply8 secs

    Mohammad Salimabadi Solution114 secs

    Please try to reproduce this in your own environment, and let me know if you manage to bring improvements to the results.

    Best regards,

    David McKinney.

Viewing 13 posts - 136 through 147 (of 147 total)

You must be logged in to reply to this topic. Login to reply