Find which rows have a column value that is less than the previous row's column value

  • Jeff-

    This really confused me for a while, until I assumed that you meant "can't". Is that assumption correct? If not, could you help me understand what you mean?

  • You could always upgrade to 2012 and use one of the new window functions 🙂

    http://msdn.microsoft.com/en-us/library/hh231256.aspx
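
    For example, LAG (one of the window functions added in SQL Server 2012) reads the previous row's value in a single pass. A minimal sketch, assuming a hypothetical table #t with an int key column id and a value column val (names are illustrative, not taken from this thread):

        -- Hypothetical table #t(id INT, val INT): find rows whose val dropped
        -- relative to the previous row (ordered by id).
        SELECT id, val, prev_val
        FROM (
            SELECT id,
                   val,
                   LAG(val) OVER (ORDER BY id) AS prev_val
            FROM   #t
        ) AS d
        WHERE  val < prev_val;

    LAG returns NULL for the first row, so that row never qualifies, which is usually what you want here.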

    Jared
    CE - Microsoft

  • dwain.c (6/1/2012)


    jeffem (6/1/2012)


    Dwain, good job on accounting for the missing IDs! 🙂

    Just a note about the performance: I did a comparison, and using a table variable will give a bit of a performance advantage (15% or so, for reasonable numbers of records) to your method. If this were in an actual table (temp, or otherwise) with a good, covering index, then the execution plan between our two solutions is essentially identical.

    Can you post your comparison? I'm not sure whether you're comparing actual run results (e.g., with Jeff's 1M row test harness) or just comparing execution plans. I'd do it myself, but I'm kinda tied up with something at the moment; if I don't see something posted, maybe I'll come back to it.

    I'm really just curious, because I've previously seen a subquery version like mine blast the heck out of an alternative much like yours (no comparison here intended, of course), even though the execution plans rated them about the same.

    I use table variables for convenience so I don't have temp tables littering my sandbox if I forget to drop them or something. I agree they don't always end up rating the same as using an actual table. And with good indexing, it could completely reverse the results.

    At first, I only compared the execution plans, which is all I intended to represent in my post. So, after reading your post, I did some comparisons.

    Scenarios: Unindexed; indexed by key only; fully covering index

    Results: Sorted; Unsorted

    Dataset: 2-column table with an int key column, and a CHAR(30) text column (populated with random characters from the NEWID() function)
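
    In case anyone wants to reproduce something similar, here's a rough sketch of how such a test table might be built (this is not necessarily the exact harness used for the numbers below; the table and index names are just illustrative):

        -- Build a 2-column test table: int key + CHAR(30) of quasi-random text from NEWID().
        -- (A sketch only; not the exact test harness behind the timings in this post.)
        IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;

        SELECT TOP (1000000)
               id  = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
               val = CAST(LEFT(CONVERT(CHAR(36), NEWID()), 30) AS CHAR(30))
        INTO   #t
        FROM   sys.all_columns AS a
        CROSS JOIN sys.all_columns AS b;

        -- "Indexed by key only" scenario:
        CREATE UNIQUE CLUSTERED INDEX cx_t ON #t (id);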

    What I found really surprised me, but it makes sense in hindsight. With no indexes, the TOP clause chokes, as it has to sort by the key in order to produce the correct TOP 1 record. As a comparison, with 1M records in the table, the INNER JOIN returned 500,333 records in 0:05 (that REALLY surprised me), while I stopped the CTE query after 2:00, and it had produced 14,415 records. I cleared the cache, added an index on the key column, and then my query returned its results in 0:05 and yours in 0:04, unsorted. With sorting, they were both at 0:07 (yours was still slightly faster, but they rounded to the same whole number of seconds).

    Where they separated was when scaling to even larger datasets. It seems like (on my machine) the break-even point between the two, with this amount and type of data, is around 900K records. So, the 1M test harness shows a very slight edge to yours. When I increased to 2M and 4M, the differences grew.

    For 2M, the unsorted results were still about the same, with mine completing in 0:13 and yours in 0:14. But when sorting, yours completed in 0:18 and mine in 0:24.

    For 4M, my unsorted query returned in 0:23 and yours in 0:18; the sorted results returned in 0:43 and 0:18, respectively.

    So, not surprisingly, the INNER JOIN grows cumbersome on large sets of data, whereas the CTE handles it easily. But because of the sorting required by the TOP N, if the table isn't indexed, it doesn't really work at all.
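
    For anyone following along, the two query shapes under discussion look roughly like this. This is only a sketch against the hypothetical #t(id, val) table above; neither block is necessarily the exact query either of us ran (for instance, the TOP (1) version in the thread was written as a CTE):

        -- Shape 1: join on row position (ROW_NUMBER copes with gaps in the key).
        WITH numbered AS (
            SELECT id, val, ROW_NUMBER() OVER (ORDER BY id) AS rn
            FROM   #t
        )
        SELECT cur.id, cur.val, prev.val AS prev_val
        FROM   numbered AS cur
        INNER JOIN numbered AS prev
                ON prev.rn = cur.rn - 1
        WHERE  cur.val < prev.val;

        -- Shape 2: correlated TOP (1) lookup of the previous row. Without an
        -- index on id this has to sort per row, which is why it chokes on a heap.
        SELECT cur.id, cur.val, prev.val AS prev_val
        FROM   #t AS cur
        CROSS APPLY (
            SELECT TOP (1) p.val
            FROM   #t AS p
            WHERE  p.id < cur.id
            ORDER BY p.id DESC
        ) AS prev
        WHERE  cur.val < prev.val;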

  • SQLKnowItAll (6/4/2012)


    You could always upgrade to 2012 and use one of the new window functions 🙂

    http://msdn.microsoft.com/en-us/library/hh231256.aspx

    Just don't assume they will always be 'better':

    http://sqlblog.com/blogs/michael_zilberstein/archive/2012/03/14/42332.aspx

  • jeffem (6/4/2012)


    dwain.c (6/1/2012)


    jeffem (6/1/2012)


    Dwain, good job on accounting for the missing IDs! 🙂

    Just a note about the performance: I did a comparison, and using a table variable will give a bit of a performance advantage (15% or so, for reasonable numbers of records) to your method. If this were in an actual table (temp, or otherwise) with a good, covering index, then the execution plan between our two solutions is essentially identical.

    Can you post your comparison? I'm not sure whether you're comparing actual run results (e.g., with Jeff's 1M row test harness) or just comparing execution plans. I'd do it myself, but I'm kinda tied up with something at the moment; if I don't see something posted, maybe I'll come back to it.

    I'm really just curious, because I've previously seen a subquery version like mine blast the heck out of an alternative much like yours (no comparison here intended, of course), even though the execution plans rated them about the same.

    I use table variables for convenience so I don't have temp tables littering my sandbox if I forget to drop them or something. I agree they don't always end up rating the same as using an actual table. And with good indexing, it could completely reverse the results.

    At first, I only compared the execution plans, which is all I intended to represent in my post. So, after reading your post, I did some comparisons.

    Scenarios: Unindexed; indexed by key only; fully covering index

    Results: Sorted; Unsorted

    Dataset: 2-column table with an int key column, and a CHAR(30) text column (populated with random characters from the NEWID() function)

    What I found really surprised me, but it makes sense in hindsight. With no indexes, the TOP clause chokes, as it has to sort by the key in order to produce the correct TOP 1 record. As a comparison, with 1M records in the table, the INNER JOIN returned 500,333 records in 0:05 (that REALLY surprised me), while I stopped the CTE query after 2:00, and it had produced 14,415 records. I cleared the cache, added an index on the key column, and then my query returned its results in 0:05 and yours in 0:04, unsorted. With sorting, they were both at 0:07 (yours was still slightly faster, but they rounded to the same whole number of seconds).

    Where they separated was when scaling to even larger datasets. It seems like (on my machine) the break-even point between the two, with this amount and type of data, is around 900K records. So, the 1M test harness shows a very slight edge to yours. When I increased to 2M and 4M, the differences grew.

    For 2M, the unsorted results were still about the same, with mine completing in 0:13 and yours in 0:14. But when sorting, yours completed in 0:18 and mine in 0:24.

    For 4M, my unsorted query returned in 0:23 and yours in 0:18; the sorted results returned in 0:43 and 0:18, respectively.

    So, not surprisingly, the INNER JOIN grows cumbersome on large sets of data, whereas the CTE handles it easily. But because of the sorting required by the TOP N, if the table isn't indexed, it doesn't really work at all.

    I am glad you thoroughly checked. Sorry that I didn't have the time myself. Now you should be able to choose the one that is best for your case.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • SQL Kiwi (6/4/2012)


    SQLKnowItAll (6/4/2012)


    You could always upgrade to 2012 and use one of the new window functions 🙂

    http://msdn.microsoft.com/en-us/library/hh231256.aspx

    Just don't assume they will always be 'better':

    http://sqlblog.com/blogs/michael_zilberstein/archive/2012/03/14/42332.aspx

    Who said I assumed that? 🙂 It is just another option. (Of course, there are several other considerations to take into account before upgrading, and I'm not really suggesting it just to solve one query.)

    Jared
    CE - Microsoft
