Optimising Server-Side Paging - Part I

  • Paul White

    SSC Guru

    Points: 150442

    Comments posted to this topic are about the item Optimising Server-Side Paging - Part I

  • gitendrashah.net

    SSC Rookie

    Points: 41

    Nice post ......but i have been using this paging before also....so i didn't get any difference......

  • Aleksandar Cebov

    SSC Rookie

    Points: 41

    Hmmm... it seems to me that this will work only if where statement is not being used. What if we need to do server side paging and sorting and apply filter on which rows should be returned (maybe in the example I need to get only those records where the thread_id=some_value and sort them by create_dt)?

  • desade

    Valued Member

    Points: 59

    any idea why key seek method uses much more CPU for 10 pages then it uses for 100 or 200? It looks as some glitch in testing, but maybe there is logical explanation

  • Paul White

    SSC Guru

    Points: 150442

    Aleksandar Cebov (4/26/2010)


    Hmmm... it seems to me that this will work only if where statement is not being used.

    It works whenever there is a useful index to obtain the keys, in the required order.

    What if we need to do server side paging and sorting and apply filter on which rows should be returned (maybe in the example I need to get only those records where the thread_id=some_value and sort them by create_dt)?

    The appropriate Key-Seek index in that case would be on (thread_id, create_dt). There is a full reproduction script included with the article. I encourage you to download it, and experiment for yourself.

    Custom sorting and filtering is covered in more depth in part III - this first part is just about establishing the basic method.

    Paul

  • Paul White

    SSC Guru

    Points: 150442

    desade (4/26/2010)


    any idea why key seek method uses much more CPU for 10 pages then it uses for 100 or 200? It looks as some glitch in testing, but maybe there is logical explanation

    There's no special reason that I am aware of - the test results are shown exactly as they appeared. I just put it down to the small numbers involved, the limited timing resolution available, and random chance...

  • Paul White

    SSC Guru

    Points: 150442

    gitendrashah.net (4/26/2010)


    Nice post ......but i have been using this paging before also....so i didn't get any difference......

    Well hopefully you found something interesting or useful - if not, perhaps you will in parts 2 and 3.

  • Paul White

    SSC Guru

    Points: 150442

    Aleksandar Cebov (4/26/2010)


    What if we need to do server side paging and sorting and apply filter on which rows should be returned (maybe in the example I need to get only those records where the thread_id=some_value and sort them by create_dt)?

    Ok, here's some code to demonstrate what I meant in my previous reply:

    -- Key Seek index

    CREATE INDEX [IX dbo.Post thread_id, create_dt]

    ON dbo.Post (thread_id, create_dt);

    DECLARE @PageNumber BIGINT, -- Page number to fetch

    @PageSize BIGINT; -- Rows per page

    SET @PageSize = 50;

    SET @PageNumber = 10;

    -- The thread_id to filter on

    DECLARE @ThreadID INTEGER;

    SET @ThreadID = 6;

    -- Key-Seek algorithm

    WITH Keys

    AS (

    -- Step 1 : Number the rows from the non-clustered index

    -- Maximum number of rows = @PageNumber * @PageSize

    SELECT TOP (@PageNumber * @PageSize)

    rn = ROW_NUMBER() OVER (ORDER BY P1.create_dt ASC),

    P1.post_id,

    P1.create_dt

    FROM dbo.Post P1

    WHERE P1.thread_id = @ThreadID

    ORDER BY

    P1.create_dt ASC

    ),

    SelectedKeys

    AS (

    -- Step 2 : Get the primary keys for the rows on the page we want

    -- Maximum number of rows from this stage = @PageSize

    SELECT TOP (@PageSize)

    SK.rn,

    SK.post_id,

    SK.create_dt

    FROM Keys SK

    WHERE SK.rn > ((@PageNumber - 1) * @PageSize)

    ORDER BY

    SK.create_dt ASC

    )

    SELECT -- Step 3 : Retrieve the off-index data

    -- We will only have @PageSize rows by this stage

    SK.rn,

    P2.post_id,

    P2.thread_id,

    P2.member_id,

    P2.create_dt,

    P2.title,

    P2.body

    FROM SelectedKeys SK

    JOIN dbo.Post P2

    ON P2.post_id = SK.post_id

    ORDER BY

    SK.create_dt ASC

    OPTION (RECOMPILE);

    Actual execution plan:

  • Paul White

    SSC Guru

    Points: 150442

    That query plan highlights something I took out of the article at the last minute - to keep things simple.

    You might wonder why SQL Server chooses a non-clustered index seek and key lookup, rather than a clustered index seek in the last step.

    The answer is again down to the width of the clustered index: for a relatively small number of rows, the cost of the seek + lookup is less than scanning even a very small range on the cluster.

    For larger page sizes, SQL Server may choose a partial scan of the cluster, due to the mounting cost of the random I/O associated with the lookups.

    A final point: notice that the final loop joins to perform the seek and lookup both have the WithOrderedPrefetch: True and Optimized: False attributes. This is a read-ahead optimization, looking ahead in the index and issuing asynchronous I/O for rows that will be needed for the joins. More details:

    http://blogs.msdn.com/craigfr/archive/2008/10/07/random-prefetching.aspx

  • peter-757102

    SSCertifiable

    Points: 6877

    I scanned the article quicly and it is well presented.

    But I do have to dive into this particular one as I feel something odd is happening. The optimized statement is still just one statement and the steps in the with clause are no more then inline views. I am interested in why this construct manages to forces a way better execution plan.

    No time no to go deep into this, but I will 😉

    Thx for the article in advance!

  • nabidavid

    SSC Eights!

    Points: 815

    Great article, Paul. I look forward to Part II. 😀

    If you are not already covering it in a future article, I am curious if you have any advice on the best way (as in best performance) to incorporate the total record count for calculating total pages in this method, especially when custom filtering is involved.

    Of course, if you are already planning to cover this topic, I am happy to patiently wait until then. 😉

    Thanks!

  • Paul White

    SSC Guru

    Points: 150442

    nabidavid (4/26/2010)


    If you are not already covering it in a future article, I am curious if you have any advice on the best way (as in best performance) to incorporate the total record count for calculating total pages in this method, especially when custom filtering is involved.

    I do cover various methods for obtaining the total record count in part II, and part III is concerned with custom filtering and ordering. Of course it is never possible to cover every possible need - but I hope you will find it useful.

  • Paul White

    SSC Guru

    Points: 150442

    peter-757102 (4/26/2010)


    I scanned the article quicly and it is well presented.

    Thanks very much.

    But I do have to dive into this particular one as I feel something odd is happening. The optimized statement is still just one statement and the steps in the with clause are no more then inline views. I am interested in why this construct manages to forces a way better execution plan.

    You are right to suspect that there is rather more to it than that 😉

    I hope you find time for a proper read of it very soon. Please also take a look at the attached scripts if you can.

  • sknox

    SSChampion

    Points: 12292

    Good article. One thing I noticed is that you're using a contiguous identity for your tests, but relying on ROW_NUMBER() for the paging.

    I'm assuming that ROW_NUMBER() is used because you're planning for missing identity values due to deleted rows? Given that, I'd like to see the tests run with some random rows deleted, to better reflect a real-world scenario.

  • Paul White

    SSC Guru

    Points: 150442

    sknox (4/26/2010)


    Good article. One thing I noticed is that you're using a contiguous identity for your tests, but relying on ROW_NUMBER() for the paging. I'm assuming that ROW_NUMBER() is used because you're planning for missing identity values due to deleted rows?

    The cluster on the identity column is primarily for architectural reasons - as you know, you can never depend on IDENTITY columns to be contiguous, regardless of whether rows are ever deleted.

    The ROW_NUMBER is just used to assign a sequential number in selected key order, and to make the selection of keys for the page we want easy.

    The only reason I order by the underlying key (post_id) rather than the ROW_NUMBER (rn) is purely a limitation of the optimiser. We both know that the order of ROW_NUMBER assignment matches the ORDER BY of the ranking function, but the current optimiser does not include that logic. Sorting on the key avoids a small extra sort - so it's all rather technical, but then so are many things with SQL Server 🙂

    Given that, I'd like to see the tests run with some random rows deleted, to better reflect a real-world scenario.

    The full test rig is available for download in the Resources section. 😉

Viewing 15 posts - 1 through 15 (of 56 total)

You must be logged in to reply to this topic. Login to reply