Server Side Paging With SQL Server 2005

  • So far, in my experience this kind of paging does not work well for large sets. The real solution has always been a combination of finding the first page, letting the state pass back and forth from the client, and building the SQL statement dynamically (a rough sketch of what I mean follows below).

    Search stored procedures are unnecessarily complex; multiple single-purpose, dedicated stored procedures may help, but a lot of maintenance is required if you go this route.

    For a full explanation of many techniques, see: http://www.sommarskog.se/dyn-search.html
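
    Roughly the kind of thing I mean by carrying state from the client (the Customers table, its columns, and the @LastCustomerID value are made up purely for illustration; the real statement would be built dynamically from the state the client sends back):

    -- Hypothetical table, just so the sketch is complete
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, LastName varchar(50), FirstName varchar(50))
    GO
    -- Keyset-style paging: the client remembers the last key it saw and passes it
    -- back, so each page is a cheap seek instead of a numbered scan.
    DECLARE @LastCustomerID int, @PageSize int
    SELECT @LastCustomerID = 1020, @PageSize = 20

    SELECT TOP (@PageSize)
        CustomerID,
        LastName,
        FirstName
    FROM Customers
    WHERE CustomerID > @LastCustomerID     -- state passed back from the client
    ORDER BY CustomerID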


    * Noel

  • Noggin & Jacob,

    Not sure if it is the best way, but I handle ascending and descending via a parameter by putting this ORDER BY on the outer query that references the CTE:

    ORDER BY
        CASE
            WHEN @DisplayNewestFirst = 1
                THEN ROW_NUMBER() OVER (ORDER BY PostDateTime DESC)
            ELSE ROW_NUMBER() OVER (ORDER BY PostDateTime ASC)
        END ASC

    Returning the total number of records that match the current filters is very simple: you can just add this to the projection of the query that references the CTE:

    (SELECT COUNT(*) FROM emp) AS TotalRows

    But of course nothing in SQL is free: if you include this COUNT(*) in the projection, SQL Server MUST evaluate every row that matches. It can no longer optimize away work based on the WHERE RecID BETWEEN ... filter.

    I'll include an example that demonstrates that including the COUNT(*) causes a second scan:

    create table t1 (id int identity, val varchar(1))
    create clustered index t1_idx_id on t1(id)
    go
    insert t1 values ('a');
    go 100

    create proc sp1
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                val,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            *,
            (SELECT COUNT(*) FROM RangedItems) [TotalRows]
        FROM RangedItems
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    create proc sp2
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                val,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            *
        FROM RangedItems
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    set statistics time on
    set statistics io on
    go

    exec sp1 1, 10, 100
    --scan=2, logical reads=4

    exec sp2 1, 10, 100
    --scan=1, logical reads=2

  • At one time I created a very inelegant solution using tempdb: I had to select all of the IDs and stuff them into a temp table, then select the first and last record to get the IDs that I needed from the actual table. Both sorting and filtering were handled there (a rough sketch of that older approach follows below).
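
    For anyone who has not seen that pattern, here is a minimal sketch of the temp-table approach I am describing; the Customers table and its columns are made up for illustration, and my original code differed in the details:

    -- Hypothetical table for the sketch
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, LastName varchar(50), FirstName varchar(50))
    GO
    -- Stage the ordered, filtered keys in tempdb; IDENTITY numbers them in sort order
    CREATE TABLE #PageKeys (RowNum int IDENTITY(1,1) PRIMARY KEY, CustomerID int)

    INSERT #PageKeys (CustomerID)
    SELECT CustomerID
    FROM Customers
    WHERE LastName LIKE 'S%'          -- filtering happens here
    ORDER BY LastName, CustomerID     -- sorting happens here

    -- Pull only the requested page back from the real table
    DECLARE @PageNum int, @PageSize int
    SELECT @PageNum = 3, @PageSize = 20

    SELECT c.*
    FROM #PageKeys AS k
        INNER JOIN Customers AS c ON c.CustomerID = k.CustomerID
    WHERE k.RowNum BETWEEN (@PageNum - 1) * @PageSize + 1 AND @PageNum * @PageSize
    ORDER BY k.RowNum

    DROP TABLE #PageKeys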

    Anyway, I later created a much more elegant solution that uses ROW_NUMBER(). The nice thing here is that one SP can handle all of the grids in my case. Here is the SP:

    CREATE PROCEDURE [dbo].[tsp_Grid_Paging_Sorting_Filtering]
    (
        @pageNum int
        ,@pageSize int
        ,@TableName nvarchar(100)
        ,@orderby varchar(max)
        ,@whereclause varchar(max)
    )
    AS
    SET NOCOUNT ON

    declare @lownum nvarchar(10)
    declare @highnum nvarchar(10)
    declare @sql nvarchar(4000)

    -- Translate page number/size into the row-number window for this page
    set @lownum = convert(nvarchar(10), (@pagesize * (@pagenum - 1)))
    set @highnum = convert(nvarchar(10), (@pagesize * @pagenum))

    set @sql = 'select * from (
        select row_number() over (order by ' + @orderby + ') as rownum
            ,*
        from ' + @TableName + ' ' + @whereclause + '
        ) A WHERE rownum > ' + @lownum + ' AND rownum <= ' + @highnum

    -- Execute the SQL query
    EXEC sp_executesql @sql
    -----------------------------------------END---------------------------------------

    Here @TableName can also be the name of a view, where the view already has the needed columns selected.

  • Ian,

    I can tell you that SQL Server certainly does not do a good job of pushing constraints down into the CTE when they come from joins outside of it.

    I foolishly implemented some code on that assumption without testing, and was caught off guard by perf problems as the content began to scale up.

    I found that if you are going to use joins to do much rowset reduction/constraining, then you need to push that work down inside the CTE manually (see the sketch at the end of this post).

    However, in my experience it does do a good job of pushing down constraints on the range of output from ROW_NUMBER(). That is to say, if you do:

    ROW_NUMBER() OVER (ORDER BY Id) AS [RowNumber]

    inside your CTE, and do

    WHERE RowNumber BETWEEN 1 AND 10

    outside the CTE, it is similar to doing a TOP 10 inside the CTE.
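
    To illustrate the manual pushdown point, here is a rough sketch; the Posts and Categories tables are made up for illustration, and note the two forms are not even equivalent (the first numbers the rows before the category filter is applied), which is one more reason to move the constraint inside:

    -- Hypothetical tables for the sketch
    CREATE TABLE Categories (CategoryId int PRIMARY KEY, CategoryName varchar(50))
    CREATE TABLE Posts (PostId int PRIMARY KEY, CategoryId int, PostDateTime datetime)
    GO
    -- Constraining join written OUTSIDE the CTE: the CTE numbers every row in
    -- Posts, and SQL Server generally will not push the category filter down.
    WITH Numbered AS
    (
        SELECT p.PostId, p.CategoryId,
               ROW_NUMBER() OVER (ORDER BY p.PostDateTime DESC) AS RowNumber
        FROM Posts AS p
    )
    SELECT n.PostId, n.RowNumber
    FROM Numbered AS n
        INNER JOIN Categories AS c ON c.CategoryId = n.CategoryId
    WHERE c.CategoryName = 'SQL'
      AND n.RowNumber BETWEEN 1 AND 10
    GO
    -- Constraint pushed down INSIDE the CTE manually: the rowset is reduced
    -- before numbering, so only the filtered rows are numbered and paged.
    WITH Numbered AS
    (
        SELECT p.PostId,
               ROW_NUMBER() OVER (ORDER BY p.PostDateTime DESC) AS RowNumber
        FROM Posts AS p
            INNER JOIN Categories AS c ON c.CategoryId = p.CategoryId
        WHERE c.CategoryName = 'SQL'
    )
    SELECT PostId, RowNumber
    FROM Numbered
    WHERE RowNumber BETWEEN 1 AND 10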

  • Great Tip (Y)

  • For one, you should never return all the columns you need from the CTE; there is no need to return anything but the row's GUID or identity column and the row number output. By doing this, you limit what the CTE has to pull. In your main SELECT you join back to the same table using your CTE, and then pull all the columns you need. This way you are only pulling data for the page you want.

    I did something similar to this for a pretty busy website using temp tables before, and have since switched to this method once we upgraded to 2005; I can say it scales pretty nicely.

    The only tricky parts are sorting and getting total record counts. Sorting is made easy with CASE statements using the ROW_NUMBER() function. Getting the record count is a little tough because you can only refer to the CTE in the select statement immediately after the CTE definition. This leaves you with two choices: either add a sub-select that does a COUNT(*) from the CTE as the last column in your main select, or duplicate your CTE in a second query with a COUNT(*) instead. I would highly recommend the sub-select method since you don't query the data twice. So here is what you end up with (a filled-in example follows the template):

    ;WITH [CTE] AS
    (
        SELECT
            RecordID, -- your row identifier
            ROW_NUMBER() OVER (ORDER BY [Sorting Column]) AS RowNumber
        FROM
            [Table]
        WHERE
            [Add your filters here]
    )
    SELECT
        [Your fields to return list],
        (SELECT COUNT(1) FROM [CTE]) AS TotalRecords
    FROM
        [CTE]
        INNER JOIN [Table] ON [CTE].RecordID = [Table].RecordID
    WHERE
        [CTE].RowNumber BETWEEN [filter to what page you want here]
    ORDER BY
        [CTE].RowNumber
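
    As a concrete (if tiny) instance of that template, here is roughly what it looks like against the t1 table created in the earlier examples in this thread; the filter and page boundaries (rows 11 through 20) are just for illustration:

    ;WITH [CTE] AS
    (
        SELECT
            id,
            ROW_NUMBER() OVER (ORDER BY id) AS RowNumber
        FROM
            t1
        WHERE
            id <= 100
    )
    SELECT
        t1.id,
        t1.val,
        (SELECT COUNT(1) FROM [CTE]) AS TotalRecords
    FROM
        [CTE]
        INNER JOIN t1 ON [CTE].id = t1.id
    WHERE
        [CTE].RowNumber BETWEEN 11 AND 20
    ORDER BY
        [CTE].RowNumber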

  • JRodman,

    I disagree with your statement of absolutes - that you should never return all columns from the CTE. See my post on page 2 of this thread: you should return all of the columns when they are all contained in the same covering index that the ROW_NUMBER() OVER (...) orders by. In that case, doing the join makes an unnecessary second visit to the index.

    Run the following example and look at the plan - you will see that the approach you recommend does two index seeks whereas if you just pull the data from within the CTE there is only one index seek.

    create table t1 (id int identity, val varchar(1));
    create clustered index t1_idx_id on t1(id);
    go
    insert t1 values ('a');
    go 100

    create proc sp5
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            ri.id,
            t1.val,
            ri.RowNum
        FROM RangedItems AS ri INNER JOIN t1 ON ri.id = t1.id
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    create proc sp6
    (
        @FirstItem INT,
        @NumItems INT,
        @MaxItem INT
    )
    AS
    BEGIN
        WITH RangedItems AS
        (
            SELECT
                id,
                val,
                ROW_NUMBER() OVER (ORDER BY id ASC) AS RowNum
            FROM t1
            WHERE id <= @MaxItem
        )
        SELECT
            *
        FROM RangedItems
        WHERE RowNum BETWEEN @FirstItem AND @FirstItem + @NumItems - 1
    END
    go

    exec sp5 1, 10, 100;
    exec sp6 1, 10, 100;
    go

  • Here is the problem with that approach though... When you start getting into the hundreds of thousands and millions of rows in the table, you are pulling all of those rows down through the CTE. By limiting what you return, there is less disk IO to retrieve that data, and the second index scan's cost is way less than the cost of retrieving all that data and then filtering it down to the page you want. I know this for a fact because I have tested both scenarios on systems with millions of rows, and this method scales better.

  • JRodman,

    I think the testing you have performed must have been in a different case than the one I was talking about. I'm only saying that the ideal approach is to put all columns into the CTE when they are ALL covered by the same index that you have your ROW_NUMBER() over. In scenarios where that is not the case, I completely agree with (and use) the approach you mentioned.

    I will post demonstration code that shows that, over a 10MM-row table, my method performs better. The more rows you return in the outer query, the greater the perf gap becomes.

    When I returned 10 rows (Number BETWEEN 5000000 AND 5000010), my method was only slightly better. For 1000 rows (Number BETWEEN 5000000 AND 5001000), my method had about 10% fewer reads.

    Most of the time I do not use this approach, because usually all of the columns I need are not in the covering index. I only noticed this perf improvement last week, when I did have such a scenario and the two index seeks in the plan looked unnecessary.

  • JRodman, here is the demonstration code I mentioned:

    create table test_table (number int primary key clustered, payload varchar(100));
    go

    WITH
    -- first CTE which returns 10 rows (0-9)
    digits AS (
        SELECT 0 as Number
        UNION SELECT 1
        UNION SELECT 2
        UNION SELECT 3
        UNION SELECT 4
        UNION SELECT 5
        UNION SELECT 6
        UNION SELECT 7
        UNION SELECT 8
        UNION SELECT 9
    )
    -- second CTE which returns 10 million rows by using
    -- a CROSS JOIN on the first CTE
    , dig AS (
        SELECT
            (millions.Number * 1000000)
            + (hThousands.Number * 100000)
            + (tThousands.Number * 10000)
            + (thousands.Number * 1000)
            + (hundreds.Number * 100)
            + (tens.Number * 10)
            + ones.Number AS Number
        FROM digits AS ones
            CROSS JOIN digits AS tens
            CROSS JOIN digits AS hundreds
            CROSS JOIN digits AS thousands
            CROSS JOIN digits AS tThousands
            CROSS JOIN digits AS hThousands
            CROSS JOIN digits AS millions
    )
    INSERT test_table(number, payload)
    SELECT number, 'zzzzzzzzzzzzzzzzzzzzzzzz'
    FROM dig
    go
    --END TEST PREP

    declare @temp1 table (number int, payload varchar(100), rownumber int);

    WITH rangenumbers AS
    (
        SELECT
            Number,
            Payload,
            ROW_NUMBER() OVER (ORDER BY Number) as RowNumber
        FROM test_table
    )
    INSERT @temp1 (Number, Payload, RowNumber)
    SELECT
        Number,
        Payload,
        RowNumber
    FROM rangenumbers
    WHERE RowNumber BETWEEN 5000000 and 5001000 - 1
    GO

    declare @temp2 table (number int, payload varchar(100), rownumber int);

    WITH rangenumbers AS
    (
        SELECT
            Number,
            ROW_NUMBER() OVER (ORDER BY Number) as RowNumber
        FROM test_table
    )
    --just doing insert into to avoid return rows to client
    INSERT @temp2 (Number, Payload, RowNumber)
    SELECT
        rn.Number,
        tt.Payload,
        rn.RowNumber
    FROM rangenumbers as rn INNER JOIN test_table tt ON rn.Number = tt.Number
    WHERE RowNumber BETWEEN 5000000 and 5001000 - 1
    GO

  • In this scenario, where there is only one table involved, this probably is a better approach. Rarely is this ever the case for me, so I usually need joins to other tables to get columns needed in the main query, which is where the IO comes into play. Rather than doing those extra joins in the CTE, I do them in the main select, because they are merely output columns and are not being searched on (see the sketch below).

    It seems like there is a fine line that needs to be found when doing queries like this: is it less expensive to pull all the columns from the primary table through the CTE, or to join back to it later to get them for only the rows on the page you want? I guess it really depends on which columns you need and whether you are using them for filtering in the CTE.
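
    A rough sketch of the pattern I mean, with a made-up Orders/Customers pair of tables: the CTE searches and numbers only the primary table, and the join used purely for output columns happens in the outer select.

    -- Hypothetical tables for the sketch
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, CustomerName varchar(100))
    CREATE TABLE Orders (OrderID int PRIMARY KEY, CustomerID int, OrderDate datetime, Total money)
    GO
    ;WITH PagedOrders AS
    (
        SELECT
            OrderID,
            ROW_NUMBER() OVER (ORDER BY OrderDate DESC) AS RowNumber
        FROM Orders
        WHERE OrderDate >= '20080101'        -- only search/filter columns live here
    )
    SELECT
        o.OrderID,
        o.OrderDate,
        o.Total,
        c.CustomerName,                      -- output-only column from a joined table
        (SELECT COUNT(1) FROM PagedOrders) AS TotalRecords
    FROM PagedOrders AS p
        INNER JOIN Orders AS o ON o.OrderID = p.OrderID
        INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
    WHERE p.RowNumber BETWEEN 21 AND 40
    ORDER BY p.RowNumber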

  • Tx for introducing me to CTE and RowID.

    In my opinion, I would have to say that this paging is not useful in a realistic scenario where the information is changing.

    Imagine that while paging through 10 pages of customers, 2 more pages of customers were registered... the information would not be coherent between pagings. Worse, one could not guarantee that all customers would be shown.

    I have been thinking about this matter for a while; the only solution I found was using temporary tables or views (a rough sketch of that idea follows at the end of this post).

    In my opinion the user should be paging a result set that represents a moment in time.

    Is there another option I am missing?
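
    One way I can see to get that "moment in time" behaviour (the SearchResults table, the @SearchID handling and the cleanup policy below are all made up, so treat this purely as a sketch) is to materialize the ordered keys once and then page against that snapshot:

    -- Hypothetical source table and snapshot table shared by all searches
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, LastName varchar(50))
    CREATE TABLE SearchResults
    (
        SearchID   uniqueidentifier NOT NULL,
        RowNum     int              NOT NULL,
        CustomerID int              NOT NULL,
        PRIMARY KEY (SearchID, RowNum)
    )
    GO
    -- Run once when the user starts paging: freeze the ordered key list.
    -- The client holds on to @SearchID between page requests.
    DECLARE @SearchID uniqueidentifier
    SET @SearchID = NEWID()

    INSERT SearchResults (SearchID, RowNum, CustomerID)
    SELECT @SearchID,
           ROW_NUMBER() OVER (ORDER BY LastName, CustomerID),
           CustomerID
    FROM Customers
    WHERE LastName LIKE 'S%'

    -- Run per page request: every page comes from the same frozen snapshot
    SELECT c.*
    FROM SearchResults AS s
        INNER JOIN Customers AS c ON c.CustomerID = s.CustomerID
    WHERE s.SearchID = @SearchID
      AND s.RowNum BETWEEN 11 AND 20
    ORDER BY s.RowNum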

  • Good example. But it is of no use in an environment where you will face tables with 1 million or more records.

    For the total number of pages, just do a "select count(id) from table" first. Most data layers support returning multiple recordsets from the server, for example:
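
    A small sketch of what I mean (the Customers table, its columns, and the procedure name are made up): return the count and the page as two result sets from one procedure, and let the data layer read both.

    -- Hypothetical table for the sketch
    CREATE TABLE Customers (id int PRIMARY KEY, name varchar(100))
    GO
    CREATE PROCEDURE GetCustomersPage
        @PageNum  int,
        @PageSize int
    AS
    BEGIN
        SET NOCOUNT ON

        -- Result set 1: total rows, so the client can work out the number of pages
        SELECT COUNT(id) AS TotalRows FROM Customers

        -- Result set 2: just the requested page
        ;WITH Numbered AS
        (
            SELECT id, name,
                   ROW_NUMBER() OVER (ORDER BY id) AS RowNum
            FROM Customers
        )
        SELECT id, name
        FROM Numbered
        WHERE RowNum BETWEEN (@PageNum - 1) * @PageSize + 1 AND @PageNum * @PageSize
    END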

  • To: Adrian Hains

    Normally I am only allowed to have indexes on the columns that need to be sorted, but usually more columns are returned than the index(es) can cover.

    In an OLAP environment the ROW_NUMBER() solution is perfect; there you can have as many indexes as your disk space allows.

  • Dan, regarding your question of how to get the total rows affected without using temp tables, the following seems to work for me. I have no idea how efficient this is.

    You can use aggregate functions without the GROUP BY by using the OVER clause.

    see the following link: http://www.sqlteam.com/article/sql-sever-2005-using-over-with-aggregate-functions

    So I just put something like this in my stored procedure and no GROUP BY is required:

    SELECT
        *
    FROM
    (
        SELECT
            ROW_NUMBER() OVER (ORDER BY a.SearchScore DESC, b.OrderNo) AS RowNumber,
            COUNT(a.OrderID) OVER (PARTITION BY '') AS TotalRows,
            b.OrderNo,
            b.OrderDate,
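
    A complete, minimal variation of the same COUNT(...) OVER () idea, reusing the t1 table created in the earlier examples in this thread (the page boundaries are just for illustration):

    DECLARE @FirstItem int, @NumItems int
    SELECT @FirstItem = 1, @NumItems = 10

    SELECT *
    FROM
    (
        SELECT
            ROW_NUMBER() OVER (ORDER BY id) AS RowNumber,
            COUNT(*) OVER () AS TotalRows,   -- total matching rows, no GROUP BY needed
            id,
            val
        FROM t1
    ) AS paged
    WHERE RowNumber BETWEEN @FirstItem AND @FirstItem + @NumItems - 1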
