Server Side Paging With SQL Server 2005

  • So far, in my experience this kind of paging does not work well for large sets. The real solution has always been a combination of finding the first page, letting the state pass back and forth from the client, and building the SQL statement dynamically (a rough sketch of what I mean follows below).

    Search stored procedures are unnecessarily complex; multiple single-purpose, dedicated stored procedures may help, but a lot of maintenance is required if you go this route.

    For a full explanation of many techniques, see: http://www.sommarskog.se/dyn-search.html
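
    Roughly the kind of thing I mean by carrying state from the client (the Customers table, its columns, and the @LastCustomerID value are made up purely for illustration; the real statement would be built dynamically from the state the client sends back):

    -- Hypothetical table, just so the sketch is complete
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, LastName varchar(50), FirstName varchar(50))
    GO
    -- Keyset-style paging: the client remembers the last key it saw and passes it
    -- back, so each page is a cheap seek instead of a numbered scan.
    DECLARE @LastCustomerID int, @PageSize int
    SELECT @LastCustomerID = 1020, @PageSize = 20

    SELECT TOP (@PageSize)
        CustomerID,
        LastName,
        FirstName
    FROM Customers
    WHERE CustomerID > @LastCustomerID     -- state passed back from the client
    ORDER BY CustomerID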


    * Noel

  • Noggin & Jacob,

    Not sure if it is the best way, but I handle ascending and descending via a parameter by putting this ORDER BY on the outer query that references the CTE:

    ORDER BY
        CASE
            WHEN @DisplayNewestFirst = 1
                THEN ROW_NUMBER() OVER (ORDER BY PostDateTime DESC)
            ELSE ROW_NUMBER() OVER (ORDER BY PostDateTime ASC)
        END ASC

    Returning the total number of records that match the current filters is very simple: you can just add this to the projection of the query that references the CTE:

    (SELECT COUNT(*) FROM emp) AS TotalRows

    But of course nothing in SQL is free: if you include this COUNT(*) in the projection, SQL Server MUST evaluate every row that matches. It can no longer optimize away work based on the WHERE RecID BETWEEN ... filter.

    I'll include an example that demonstrates that including the COUNT(*) causes a second scan:

    create table t1 (id int identity, val varchar(1))
    create clustered index t1_idx_id on t1(id)
    go
    insert t1 values ('a');
    go 100

    create proc sp1
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                val,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            *,
            (SELECT COUNT(*) FROM RangedItems) [TotalRows]
        FROM RangedItems
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    create proc sp2
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                val,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            *
        FROM RangedItems
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    set statistics time on
    set statistics io on
    go

    exec sp1 1, 10, 100
    --scan=2, logical reads=4

    exec sp2 1, 10, 100
    --scan=1, logical reads=2

  • At one time I created a very inelegant solution using tempdb: I had to select all of the IDs and stuff them into a temp table, then select the first and last record to get the IDs that I needed from the actual table. Both sorting and filtering were handled there (a rough sketch of that older approach follows below).
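
    For anyone who has not seen that pattern, here is a minimal sketch of the temp-table approach I am describing; the Customers table and its columns are made up for illustration, and my original code differed in the details:

    -- Hypothetical table for the sketch
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, LastName varchar(50), FirstName varchar(50))
    GO
    -- Stage the ordered, filtered keys in tempdb; IDENTITY numbers them in sort order
    CREATE TABLE #PageKeys (RowNum int IDENTITY(1,1) PRIMARY KEY, CustomerID int)

    INSERT #PageKeys (CustomerID)
    SELECT CustomerID
    FROM Customers
    WHERE LastName LIKE 'S%'          -- filtering happens here
    ORDER BY LastName, CustomerID     -- sorting happens here

    -- Pull only the requested page back from the real table
    DECLARE @PageNum int, @PageSize int
    SELECT @PageNum = 3, @PageSize = 20

    SELECT c.*
    FROM #PageKeys AS k
        INNER JOIN Customers AS c ON c.CustomerID = k.CustomerID
    WHERE k.RowNum BETWEEN (@PageNum - 1) * @PageSize + 1 AND @PageNum * @PageSize
    ORDER BY k.RowNum

    DROP TABLE #PageKeys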

    Anyway, I later created a much more elegant solution that uses ROW_NUMBER(). The nice thing here is that one SP can handle all of the grids in my case. Here is the SP:

    CREATE PROCEDURE [dbo].[tsp_Grid_Paging_Sorting_Filtering]
    (
        @pageNum int
        ,@pageSize int
        ,@TableName nvarchar(100)
        ,@orderby varchar(max)
        ,@whereclause varchar(max)
    )
    AS
    SET NOCOUNT ON

    declare @lownum nvarchar(10)
    declare @highnum nvarchar(10)
    declare @sql nvarchar(4000)

    -- Translate page number/size into the row-number window for this page
    set @lownum = convert(nvarchar(10), (@pagesize * (@pagenum - 1)))
    set @highnum = convert(nvarchar(10), (@pagesize * @pagenum))

    set @sql = 'select * from (
        select row_number() over (order by ' + @orderby + ') as rownum
            ,*
        from ' + @TableName + ' ' + @whereclause + '
        ) A WHERE rownum > ' + @lownum + ' AND rownum <= ' + @highnum

    -- Execute the SQL query
    EXEC sp_executesql @sql
    -----------------------------------------END---------------------------------------

    Here @TableName can also be the name of a view, where the view already has the needed columns selected.

  • Ian,

    I can tell you that SQL Server certainly does not do a good job of pushing constraints down into the CTE when they come from joins outside of it.

    I foolishly implemented some code on that assumption without testing, and was caught off guard by perf problems as the content began to scale up.

    I found that if you are going to use joins to do much rowset reduction/constraining, then you need to push that work down inside the CTE manually (see the sketch at the end of this post).

    However, in my experience it does do a good job of pushing down constraints on the range of output from ROW_NUMBER(). That is to say, if you do:

    ROW_NUMBER() OVER (ORDER BY Id) AS [RowNumber]

    inside your CTE, and do

    WHERE RowNumber BETWEEN 1 AND 10

    outside the CTE, it is similar to doing a TOP 10 inside the CTE.
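
    To illustrate the manual pushdown point, here is a rough sketch; the Posts and Categories tables are made up for illustration, and note the two forms are not even equivalent (the first numbers the rows before the category filter is applied), which is one more reason to move the constraint inside:

    -- Hypothetical tables for the sketch
    CREATE TABLE Categories (CategoryId int PRIMARY KEY, CategoryName varchar(50))
    CREATE TABLE Posts (PostId int PRIMARY KEY, CategoryId int, PostDateTime datetime)
    GO
    -- Constraining join written OUTSIDE the CTE: the CTE numbers every row in
    -- Posts, and SQL Server generally will not push the category filter down.
    WITH Numbered AS
    (
        SELECT p.PostId, p.CategoryId,
               ROW_NUMBER() OVER (ORDER BY p.PostDateTime DESC) AS RowNumber
        FROM Posts AS p
    )
    SELECT n.PostId, n.RowNumber
    FROM Numbered AS n
        INNER JOIN Categories AS c ON c.CategoryId = n.CategoryId
    WHERE c.CategoryName = 'SQL'
      AND n.RowNumber BETWEEN 1 AND 10
    GO
    -- Constraint pushed down INSIDE the CTE manually: the rowset is reduced
    -- before numbering, so only the filtered rows are numbered and paged.
    WITH Numbered AS
    (
        SELECT p.PostId,
               ROW_NUMBER() OVER (ORDER BY p.PostDateTime DESC) AS RowNumber
        FROM Posts AS p
            INNER JOIN Categories AS c ON c.CategoryId = p.CategoryId
        WHERE c.CategoryName = 'SQL'
    )
    SELECT PostId, RowNumber
    FROM Numbered
    WHERE RowNumber BETWEEN 1 AND 10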

  • Great Tip (Y)

  • For one, you should never return all the columns you need from the CTE; there is no need to return anything but the row's GUID or identity column and the row number output. By doing this, you limit what the CTE has to pull. In your main SELECT you join back to the same table using your CTE, and then pull all the columns you need. This way you are only pulling data for the page you want.

    I did something similar to this for a pretty busy website using temp tables before, and have since switched to this method once we upgraded to 2005; I can say it scales pretty nicely.

    The only tricky parts are sorting and getting total record counts. Sorting is made easy with CASE statements using the ROW_NUMBER() function. Getting the record count is a little tough because you can only refer to the CTE in the select statement immediately after the CTE definition. This leaves you with two choices: either add a sub-select that does a COUNT(*) from the CTE as the last column in your main select, or duplicate your CTE in a second query with a COUNT(*) instead. I would highly recommend the sub-select method since you don't query the data twice. So here is what you end up with (a filled-in example follows the template):

    ;WITH [CTE] AS
    (
        SELECT
            RecordID, -- your row identifier
            ROW_NUMBER() OVER (ORDER BY [Sorting Column]) AS RowNumber
        FROM
            [Table]
        WHERE
            [Add your filters here]
    )
    SELECT
        [Your fields to return list],
        (SELECT COUNT(1) FROM [CTE]) AS TotalRecords
    FROM
        [CTE]
        INNER JOIN [Table] ON [CTE].RecordID = [Table].RecordID
    WHERE
        [CTE].RowNumber BETWEEN [filter to what page you want here]
    ORDER BY
        [CTE].RowNumber
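
    As a concrete (if tiny) instance of that template, here is roughly what it looks like against the t1 table created in the earlier examples in this thread; the filter and page boundaries (rows 11 through 20) are just for illustration:

    ;WITH [CTE] AS
    (
        SELECT
            id,
            ROW_NUMBER() OVER (ORDER BY id) AS RowNumber
        FROM
            t1
        WHERE
            id <= 100
    )
    SELECT
        t1.id,
        t1.val,
        (SELECT COUNT(1) FROM [CTE]) AS TotalRecords
    FROM
        [CTE]
        INNER JOIN t1 ON [CTE].id = t1.id
    WHERE
        [CTE].RowNumber BETWEEN 11 AND 20
    ORDER BY
        [CTE].RowNumber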

  • JRodman,

    I disagree with your statement of absolutes - that you should never return all columns from the CTE. See my post on page 2 of this thread: you should return all of the columns when they are all contained in the same covering index that the ROW_NUMBER() OVER (...) orders by. In that case, doing the join makes an unnecessary second visit to the index.

    Run the following example and look at the plan - you will see that the approach you recommend does two index seeks whereas if you just pull the data from within the CTE there is only one index seek.

    create table t1 (id int identity, val varchar(1));
    create clustered index t1_idx_id on t1(id);
    go
    insert t1 values ('a');
    go 100

    create proc sp5
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            ri.id,
            t1.val,
            ri.RowNum
        FROM RangedItems AS ri INNER JOIN t1 ON ri.id = t1.id
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    create proc sp6
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                val,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            *
        FROM RangedItems
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    exec sp5 1, 10, 100;
    exec sp6 1, 10, 100;
    go

  • Here is the problem with that approach though... When you start getting into the hundreds of thousands and millions of rows in the table, you are pulling all of those rows down through the CTE. By limiting what you return, there is less disk IO to retrieve that data, and the second index scan's cost is way less than the cost of retrieving all that data and then filtering it down to the page you want. I know this for a fact because I have tested both scenarios on systems with millions of rows, and this method scales better.

  • JRodman,

    I think the testing you have performed must have been in a different case than the one I was talking about. I'm only saying that the ideal approach is to put all columns into the CTE when they are ALL covered by the same index that you have your ROW_NUMBER() over. In scenarios where that is not the case, I completely agree with (and use) the approach you mentioned.

    I will post demonstration code that shows that, over a 10MM-row table, my method performs better. The more rows you return in the outer query, the greater the perf gap becomes.

    When I returned 10 rows (Number BETWEEN 5000000 AND 5000010), my method was only slightly better. For 1000 rows (Number BETWEEN 5000000 AND 5001000), my method had about 10% fewer reads.

    Most of the time I do not use this approach, because usually all of the columns I need are not in the covering index. I only noticed this perf improvement last week, when I did have such a scenario and the two index seeks in the plan looked unnecessary.

  • JRodman, here is the demonstration code I mentioned:

    create table test_table (number int primary key clustered, payload varchar(100));
    go

    WITH
    -- first CTE which returns 10 rows (0-9)
    digits AS (
        SELECT 0 as Number
        UNION SELECT 1
        UNION SELECT 2
        UNION SELECT 3
        UNION SELECT 4
        UNION SELECT 5
        UNION SELECT 6
        UNION SELECT 7
        UNION SELECT 8
        UNION SELECT 9
    )
    -- second CTE which returns 10 million rows by using
    -- a CROSS JOIN on the first CTE
    , dig AS (
        SELECT
            (millions.Number * 1000000)
            + (hThousands.Number * 100000)
            + (tThousands.Number * 10000)
            + (thousands.Number * 1000)
            + (hundreds.Number * 100)
            + (tens.Number * 10)
            + ones.Number AS Number
        FROM digits AS ones
            CROSS JOIN digits AS tens
            CROSS JOIN digits AS hundreds
            CROSS JOIN digits AS thousands
            CROSS JOIN digits AS tThousands
            CROSS JOIN digits AS hThousands
            CROSS JOIN digits AS millions
    )
    INSERT test_table(number, payload)
    SELECT number, 'zzzzzzzzzzzzzzzzzzzzzzzz'
    FROM dig
    go
    --END TEST PREP

    declare @temp1 table (number int, payload varchar(100), rownumber int);

    WITH rangenumbers AS
    (
        SELECT
            Number,
            Payload,
            ROW_NUMBER() OVER (ORDER BY Number) as RowNumber
        FROM test_table
    )
    INSERT @temp1 (Number, Payload, RowNumber)
    SELECT
        Number,
        Payload,
        RowNumber
    FROM rangenumbers
    WHERE RowNumber BETWEEN 5000000 and 5001000 - 1
    GO

    declare @temp2 table (number int, payload varchar(100), rownumber int);

    WITH rangenumbers AS
    (
        SELECT
            Number,
            ROW_NUMBER() OVER (ORDER BY Number) as RowNumber
        FROM test_table
    )
    --just doing insert into to avoid return rows to client
    INSERT @temp2 (Number, Payload, RowNumber)
    SELECT
        rn.Number,
        tt.Payload,
        rn.RowNumber
    FROM rangenumbers as rn INNER JOIN test_table tt ON rn.Number = tt.Number
    WHERE RowNumber BETWEEN 5000000 and 5001000 - 1
    GO

  • In this scenario, where there is only one table involved, this probably is a better approach. Rarely is this ever the case for me, so I usually need joins to other tables to get columns needed in the main query, which is where the IO comes into play. Rather than doing those extra joins in the CTE, I do them in the main select, because they are merely output columns and are not being searched on (see the sketch below).

    It seems like there is a fine line that needs to be found when doing queries like this: is it less expensive to pull all the columns from the primary table through the CTE, or to join back to it later to get them for only the rows on the page you want? I guess it really depends on which columns you need and whether you are using them for filtering in the CTE.
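
    A rough sketch of the pattern I mean, with a made-up Orders/Customers pair of tables: the CTE searches and numbers only the primary table, and the join used purely for output columns happens in the outer select.

    -- Hypothetical tables for the sketch
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, CustomerName varchar(100))
    CREATE TABLE Orders (OrderID int PRIMARY KEY, CustomerID int, OrderDate datetime, Total money)
    GO
    ;WITH PagedOrders AS
    (
        SELECT
            OrderID,
            ROW_NUMBER() OVER (ORDER BY OrderDate DESC) AS RowNumber
        FROM Orders
        WHERE OrderDate >= '20080101'        -- only search/filter columns live here
    )
    SELECT
        o.OrderID,
        o.OrderDate,
        o.Total,
        c.CustomerName,                      -- output-only column from a joined table
        (SELECT COUNT(1) FROM PagedOrders) AS TotalRecords
    FROM PagedOrders AS p
        INNER JOIN Orders AS o ON o.OrderID = p.OrderID
        INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
    WHERE p.RowNumber BETWEEN 21 AND 40
    ORDER BY p.RowNumber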

  • Tx for introducing me to CTE and RowID.

    In my opinion, I would have to say that this paging is not useful in a realistic scenario where the information is changing.

    Imagine that while paging through 10 pages of customers, 2 more pages of customers were registered... the information would not be coherent between pagings. Worse, one could not guarantee that all customers would be shown.

    I have been thinking about this matter for a while; the only solution I found was using temporary tables or views (a rough sketch of that idea follows at the end of this post).

    In my opinion the user should be paging a result set that represents a moment in time.

    Is there another option I am missing?
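
    One way I can see to get that "moment in time" behaviour (the SearchResults table, the @SearchID handling and the cleanup policy below are all made up, so treat this purely as a sketch) is to materialize the ordered keys once and then page against that snapshot:

    -- Hypothetical source table and snapshot table shared by all searches
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, LastName varchar(50))
    CREATE TABLE SearchResults
    (
        SearchID   uniqueidentifier NOT NULL,
        RowNum     int              NOT NULL,
        CustomerID int              NOT NULL,
        PRIMARY KEY (SearchID, RowNum)
    )
    GO
    -- Run once when the user starts paging: freeze the ordered key list.
    -- The client holds on to @SearchID between page requests.
    DECLARE @SearchID uniqueidentifier
    SET @SearchID = NEWID()

    INSERT SearchResults (SearchID, RowNum, CustomerID)
    SELECT @SearchID,
           ROW_NUMBER() OVER (ORDER BY LastName, CustomerID),
           CustomerID
    FROM Customers
    WHERE LastName LIKE 'S%'

    -- Run per page request: every page comes from the same frozen snapshot
    SELECT c.*
    FROM SearchResults AS s
        INNER JOIN Customers AS c ON c.CustomerID = s.CustomerID
    WHERE s.SearchID = @SearchID
      AND s.RowNum BETWEEN 11 AND 20
    ORDER BY s.RowNum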

  • Good example. But it is of no use in an environment where you will face tables with 1 million or more records.

    For the total number of pages, just do a "select count(id) from table" first. Most data layers support returning multiple recordsets from the server, for example:
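
    A small sketch of what I mean (the Customers table, its columns, and the procedure name are made up): return the count and the page as two result sets from one procedure, and let the data layer read both.

    -- Hypothetical table for the sketch
    CREATE TABLE Customers (id int PRIMARY KEY, name varchar(100))
    GO
    CREATE PROCEDURE GetCustomersPage
        @PageNum  int,
        @PageSize int
    AS
    BEGIN
        SET NOCOUNT ON

        -- Result set 1: total rows, so the client can work out the number of pages
        SELECT COUNT(id) AS TotalRows FROM Customers

        -- Result set 2: just the requested page
        ;WITH Numbered AS
        (
            SELECT id, name,
                   ROW_NUMBER() OVER (ORDER BY id) AS RowNum
            FROM Customers
        )
        SELECT id, name
        FROM Numbered
        WHERE RowNum BETWEEN (@PageNum - 1) * @PageSize + 1 AND @PageNum * @PageSize
    END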

  • To: Adrian Hains

    Normally I am only allowed to have indexes on the columns that need to be sorted, but usually more columns are returned than the index(es) can cover.

    In an OLAP environment the ROW_NUMBER() solution is perfect; there you can have as many indexes as your disk space allows.

  • Dan, regarding your question of how to get the total rows affected without using temp tables, the following seems to work for me. I have no idea how efficient this is.

    You can use aggregate functions without the GROUP BY by using the OVER clause.

    see the following link: http://www.sqlteam.com/article/sql-sever-2005-using-over-with-aggregate-functions

    So I just put something like this in my stored procedure and no GROUP BY is required:

    SELECT
        *
    FROM
    (
        SELECT
            ROW_NUMBER() OVER (ORDER BY a.SearchScore DESC, b.OrderNo) AS RowNumber,
            COUNT(a.OrderID) OVER (PARTITION BY '') AS TotalRows,
            b.OrderNo,
            b.OrderDate,
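
    A complete, minimal variation of the same COUNT(...) OVER () idea, reusing the t1 table created in the earlier examples in this thread (the page boundaries are just for illustration):

    DECLARE @FirstItem int, @NumItems int
    SELECT @FirstItem = 1, @NumItems = 10

    SELECT *
    FROM
    (
        SELECT
            ROW_NUMBER() OVER (ORDER BY id) AS RowNumber,
            COUNT(*) OVER () AS TotalRows,   -- total matching rows, no GROUP BY needed
            id,
            val
        FROM t1
    ) AS paged
    WHERE RowNumber BETWEEN @FirstItem AND @FirstItem + @NumItems - 1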
