Optimising Server-Side Paging - Part I

  • A brilliant article. Keep it up!

    Tom

  • Thank you Jeff, Lutz, and Tom.

    No pressure then 😉

  • I wonder if the TOP specification has any special impact (as claimed) on the query execution - I tried to omit it (replacing it with WHERE PG.rn BETWEEN lowRn AND hignRn) and the IO statistics were the same.

    Query plans are nearly identical: both contain Filter and Top side by side, just in reversed order.

    Tried in 2005 and 2008 with the same results.

  • ondrej.bouda (4/27/2010)


    I wonder if the TOP specification has any special impact (as claimed) on the query execution - I tried to omit it (replacing it with WHERE PG.rn BETWEEN lowRn AND hignRn) and the IO statistics were the same. Query plans are nearly identical: both contain Filter and Top side by side, just in reversed order. Tried in 2005 and 2008 with the same results.

    I did make that claim, so I had better back it up 🙂

    My preference here is to always use an extra TOP, since it gives the query optimiser explicit information about the maximum number of rows than can flow past that iterator. Giving the QO more and better information is pretty much always a good thing.

    Using BETWEEN is not quite the same thing. It might be obvious to you, as a human, that BETWEEN 1 AND 10 (for example) will return 10 rows at most - but the QO can't currently make that logical leap. It is not able to trace the 'rn' column back to the ROW_NUMBER ranking function, and infer the same guarantee you do.

    So, your change subtly introduces scope for a cardinality estimation error: compare the execution plans - the QO has to make a guess (currently 9% selectivity) for the BETWEEN expression. This results in an estimate of 900 rows for the last page of the 10,000-row sample. (For larger data sets, the error would obviously be much larger.)

    Clearly this is wrong: we know that a page can have at most @PageSize rows. Using the extra TOP lets the optimiser have that information too.

    Using BETWEEN therefore introduces scope for a sub-optimal plan choice after the Filter iterator - something that we can easily avoid, with some good old defensive programming.

    Hope that helps.

  • Thanks!

    I checked my execution plans and it already makes the difference - the plan with TOP is much more accurate in the estimated number of rows.

    Thats a good point of view: Tell to the optimizer all that you can know. There's no need to care whether some instruction might be useless (because the optimizer might be smart enough to know it too).

  • ondrej.bouda (4/27/2010)


    Thats a good point of view: Tell to the optimizer all that you can know.

    Absolutely right!

  • Paul White NZ (4/27/2010)


    ondrej.bouda (4/27/2010)


    Thats a good point of view: Tell to the optimizer all that you can know.

    Absolutely right!

    +2!!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Nice article Paul. Thanks for sharing.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Hey Paul

    Thanks for sharing this great article!

    Best wishes,

    Flo

  • What's the point of the second CTE in your Key Seek solution? It seems like that could easily be incorporated into the final select statement.

    WITH Keys

    AS (

    -- Step 1 : Number the rows from the non-clustered index

    -- Maximum number of rows = @PageNumber * @PageSize

    SELECT TOP (@PageNumber * @PageSize)

    rn = ROW_NUMBER() OVER (ORDER BY P1.post_id ASC),

    P1.post_id

    FROM dbo.Post P1

    ORDER BY

    P1.post_id ASC

    )

    SELECT TOP (@PageSize)-- Step 3 : Retrieve the off-index data

    -- We will only have @PageSize rows by this stage

    SK.rn,

    P2.post_id,

    P2.thread_id,

    P2.member_id,

    P2.create_dt,

    P2.title,

    P2.body

    FROM Keys SK

    JOIN dbo.Post P2

    ON P2.post_id = SK.post_id

    WHERE SK.rn > ((@PageNumber - 1) * @PageSize)

    ORDER BY

    SK.post_id ASC;

    Seems to run with the exact same IO. Query execution plan simply moves the Top to the end of the plan from the middle. Why the added complexity? I tried 500 results on page 10, and 50 results on page 100.

    Strangely, when executing your FetchPageKeySeek and my version above, whichever runs second runs the fastest. I'm running on SQL Server 2008 (10.0.2531) from SSMS.

    Also, for those doing their own benchmarks: I find that SSMS's rendering of the resultsets take a significant amount of time. To control against it, I insert the results into a temp table instead.

    Great article though. Can't wait for the next installments. Any ETA?

  • Florian Reischl (5/1/2010)


    Hey Paul

    Thanks for sharing this great article!

    Best wishes,

    Flo

    Hey Flo, thanks very much! I'm a big fan of your blog by the way.

  • tobyteel (5/3/2010)


    What's the point of the second CTE in your Key Seek solution? It seems like that could easily be incorporated into the final select statement. Seems to run with the exact same IO. Query execution plan simply moves the Top to the end of the plan from the middle. Why the added complexity? I tried 500 results on page 10, and 50 results on page 100.

    Three reasons:

    1. I intended it to be clearer for those new to the idea to follow what was happening and why.

    2. Separating the steps out might make for easier maintenance. Compacting the code makes it less readable in my view, without benefiting the execution plan.

    3. Having the TOP operator before the look-ups tells the optimiser that a maximum of @PageSize look-ups will be performed. The query may optimise differently because of this. There was a small discussion previously concerning why it is better to give the optimiser all the information you can. With the TOP at the end, if there were other iterators (such as a filter) after the look-ups, the optimiser might choose a sub-optimal plan for the look-ups based on an incorrect cardinality estimation. The difference between a @PageSize row goal (TOP at the end) and a guaranteed maximum (TOP before the look-ups) is fairly subtle, but can be important.

    Strangely, when executing your FetchPageKeySeek and my version above, whichever runs second runs the fastest. I'm running on SQL Server 2008 (10.0.2531) from SSMS.

    This is the effect of cached data pages. The downloadable code in the Resources section provides a full test rig which eliminates this problem and makes a fairer comparison.

    Also, for those doing their own benchmarks: I find that SSMS's rendering of the resultsets take a significant amount of time. To control against it, I insert the results into a temp table instead.

    True. For a larger number of rows I tend to do the same myself. Feel free to modify the test rig to do this if you wish.

    Great article though. Can't wait for the next installments. Any ETA?

    Part II was accepted for publication a week or so ago. Just waiting for Steve to schedule it. Are you listening, Steve? 😛

  • Paul White NZ (5/3/2010)


    Florian Reischl (5/1/2010)


    Hey Paul

    Thanks for sharing this great article!

    Best wishes,

    Flo

    Hey Flo, thanks very much! I'm a big fan of your blog by the way.

    Can you give a link to Florian's blog please? That's gotta be a keeper!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/4/2010)


    Can you give a link to Florian's blog please? That's gotta be a keeper!

    It's http://florianreischl.blogspot.com/ but I apologise in advance - I have a amateur interest in SQLCLR and C# in particular, and Flo writes some good stuff bridging the gap between .NET and SQL Server. May not be for everyone, but please do check it out.

    Paul

  • Paul White NZ (5/3/2010)


    Florian Reischl (5/1/2010)


    Hey Paul

    Thanks for sharing this great article!

    Best wishes,

    Flo

    Hey Flo, thanks very much! I'm a big fan of your blog by the way.

    Hey Paul

    Thanks a bunch! Sadly I had no time over the last three month to publish some new things.

    (Just started in a new company - yesterday).

    Greets

    Flo

Viewing 15 posts - 31 through 45 (of 55 total)

You must be logged in to reply to this topic. Login to reply