SQLServerCentral

SQL Server 2005 Paging – The Holy Grail


https://www.sqlservercentral.com/Forums/Topic672980.aspx

By R Michael - Tuesday, March 10, 2009 3:33 PM

Comments posted to this topic are about the item SQL Server 2005 Paging – The Holy Grail
By Toby Harman - Tuesday, March 10, 2009 5:41 PM

This is an interesting article, which left me scratching my (SQL 2000) head over the syntax. Once I had that straight and looked up ROW_NUMBER in the Books Online I saw almost exactly this example. Notwithstanding, I think this is a good article which showcases a solution.

Well done bringing this functionality into the spotlight, and thanks for the time taken to do so.
By SwePeso - Tuesday, March 10, 2009 7:21 PM

I created a TallyNumber table with only one column (ascending clustered primary key) and 2,000,001 records, ranging from 0 to 2,000,000.
First I tried your suggestion in the article, which gave me the correct records back


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT number,
ROW_NUMBER() OVER(ORDER BY number) AS seq,
ROW_NUMBER() OVER(ORDER BY number DESC) AS totrows
FROM tallynumbers
)
SELECT number, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

set statistics time off
set statistics io off


and gave me these results


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 3225, physical reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.

SQL Server Execution Times:
CPU time = 6107 ms, elapsed time = 6059 ms.


Setting totrows to constant 1


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT number,
ROW_NUMBER() OVER(ORDER BY number) AS seq,
1 as totrows
FROM tallynumbers
)
SELECT number, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

set statistics time off
set statistics io off


gave me these results


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 3, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.


So for a large table you will have to scan all records anyway with your suggestion.

Modifying your code to this


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT number,
ROW_NUMBER() OVER(ORDER BY number) AS seq
FROM tallynumbers
)
SELECT number, (select count(*) from cols) as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

set statistics time off
set statistics io off


gave me these results


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 4, logical reads 3250, physical reads 0.

SQL Server Execution Times:
CPU time = 360 ms, elapsed time = 191 ms.



So it's not all about reads. It's about "customer satisfaction". My last suggestion runs in less than 0.2 seconds with 3250 reads, and your suggestion runs in 6.1 seconds and 3225 reads, on my single-column 2,000,001 record table.

The reads only went up by a measly 0.78%, but time went down by a staggering 96.8% !

Doing this all over again, but using @StartRow = 5000, gave almost the same time for your solution, because all records have to be numbered anyway. The same number of reads, too.


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 3225, physical reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.

SQL Server Execution Times:
CPU time = 6017 ms, elapsed time = 5995 ms.


My last suggestion gave these results (for @Startrow = 5000)


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 4, logical reads 3260, physical reads 0.

SQL Server Execution Times:
CPU time = 235 ms, elapsed time = 122 ms.


Which is only a minimal increase in reads.

The suggestion increased the number of reads by 0.1% and decreased time by 98.0%.
By daninmanchester - Tuesday, March 10, 2009 11:00 PM

I published an FAQ some years ago which, granted, was a bit immature at the time, but I still use the principle today in a more developed and mature way. It could certainly be refined to make use of some of the advances in SQL Server, etc.

FAQ

The approach basically uses a mix of the database and DAL / BLL layers to process the paging set.

In the first instance it hits the database to return a list of all the primary keys for the result set. This is a pretty small and efficient dataset if you use int as your primary key and a hash table or similar.

The DAL/BLL layer then processes this, pulling out the total number of records and the current page we want. It then pulls the required page records using the primary keys, which is very efficient.

To make this even more efficient I cache the initial list of primary keys, so when the user requests the next page, etc., all I have to do is look for the relevant primary keys in the cached list and pull the records I want based on primary key. This means the original complex query is only fired once regardless of the number of pages, and any future requests on the same dataset only pull records based on known primary keys.
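
The two round trips might look roughly like this (dbo.Orders and its columns are invented purely for illustration):

-- Call 1: fetch just the keys for the whole result set (the DAL/BLL caches this list)
DECLARE @CustomerID INT ; SET @CustomerID = 42
SELECT OrderID
FROM dbo.Orders
WHERE CustomerID = @CustomerID
ORDER BY OrderDate DESC

-- Call 2: fetch only the rows for the requested page, using the handful of keys
-- the DAL/BLL picked out of its cached list
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderID IN (101, 102, 103) -- the keys for the current page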
By tymberwyld - Tuesday, March 10, 2009 9:11 PM

About the whole row count issue: I know this isn't bulletproof, but SQL Server is already tracking the row counts of every table. You don't need to create any Triggers or keep your own "tally" tables. Now, of course, if you're paging a View you'll still have to get the row counts using one of your methodologies; however, you can use the following query for tables:

DECLARE @Object SysName
SET @Object = 'dbo.MyTable' -- Schema.ObjectName

-- Get RowCount for Object
-- NOTE: Sometimes there are multiple rows returned when a Table has many indexes,
-- however, each index contains the same value for the row count
SELECT TOP 1 P.Rows
FROM sys.partitions P
INNER JOIN sys.indexes I ON (P.object_id = I.object_id) AND (P.index_id = I.index_id)
INNER JOIN sys.objects O ON (P.object_id = O.object_id)
WHERE (
-- O.type IN('S', 'U')
-- AND
(I.type IN(0,1))
)
AND (
O.name = PARSENAME(@Object, 1)
AND O.[schema_id] = IsNull(SCHEMA_ID(PARSENAME(@Object, 2)), O.[schema_id])
)
ORDER BY O.name

You can also do the same in SQL 2000 by querying dbo.sysindexes.
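
A minimal SQL 2000 version (reusing the dbo.MyTable example above) might look like this:

DECLARE @Object SysName
SET @Object = 'dbo.MyTable'

-- indid 0 = heap, indid 1 = clustered index; a table has exactly one of the two
SELECT rows
FROM dbo.sysindexes
WHERE id = OBJECT_ID(@Object)
AND indid IN (0, 1)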
By SwePeso - Tuesday, March 10, 2009 8:25 PM

My best attempt this far is to have a trigger on tallynumbers table and report the number of records in a CountingTable.

That gave me a total of only 14 reads and 4 ms in runtime! And this is for @StartRow = 5000


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 5000
declare @items int
select @items = number From CountingTable where table_name = 'tallynumbers'

SELECT Number,
@items
FROM (
SELECT TOP(50)
Number
FROM (
SELECT TOP(@startrow + 49)
Number
FROM TallyNumbers
ORDER BY Number
) AS d
ORDER BY Number DESC
) AS q
ORDER BY Number

set statistics time off
set statistics io off


Results were


(1 row(s) affected)
Table 'CountingTable'. Scan count 1, logical reads 1, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 13, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.



4 ms compared to 5995 ms (decrease by 99.9%) and 14 reads compared to 3225 is a decrease by 99.6%.

What can this tell us?

1. Have number of total records stored in some management table and maintained by a trigger
2. If possible, have the "numbering" work done by (possibly) the same trigger and keep a "sequence" column in the source table if paging is done a lot. Or use the "multiple order bys" principle.
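
A minimal sketch of such a trigger (assuming the CountingTable(table_name, number) layout used in the query above; untested, for illustration only):

CREATE TRIGGER trg_TallyNumbers_RowCount
ON dbo.TallyNumbers
AFTER INSERT, DELETE
AS
BEGIN
SET NOCOUNT ON;

-- keep the stored total in step with inserts and deletes
UPDATE CountingTable
SET number = number
+ (SELECT COUNT(*) FROM inserted)
- (SELECT COUNT(*) FROM deleted)
WHERE table_name = 'tallynumbers';
END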
By TheSQLGuru - Tuesday, March 10, 2009 10:31 PM

Nice article.

I have been doing similar coding for a few years now with great success. My typical approach retrieves just the primary key fields from the many tables necessary to filter/retrieve the results, then joins those keys back to their original tables to get the required output.
By prasad.puranik - Tuesday, March 10, 2009 11:15 PM

Nice solution, and very nicely written too. Looking forward to reading more from you.
By vlado_work - Tuesday, March 10, 2009 9:21 PM

Based on my experience with a big table, the best approach is to get a Total Rows number in a separate call:


set statistics io on
set statistics time on

DECLARE @TotRows AS INT
DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
--select * from cols
SELECT table_name, column_name
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

select @TotRows=count(*)
FROM [INFORMATION_SCHEMA].columns

--SELECT @TotRows

set statistics time off
set statistics io off

By daninmanchester - Wednesday, March 11, 2009 4:57 AM

hhcosmin, I agree with what you say: the caching policy needs to be defined and suitable. It can be dynamic these days and linked to the data, but a lot of the time, for what I do, I can cache a set of IDs for the duration, so it becomes very cheap.

One thing that frustrates me about some sites is where the data does change and paging just becomes messy and isn't always the best way to present it. For example, page 1 becomes page 2 because the dataset has changed, and so when I click next page I get half of what I was just looking at on page 1 and the first half of what was page 2, etc. Or you go back a page and what was page 2 isn't page 2 anymore.

I'm not quite sure I understand how the last page becomes very expensive using this approach, as once the initial query has been performed everything else is PK based. Unless you are talking about having to read from the beginning of, say, 1,000,000 PKs to the end as opposed to the current page? Which is true, that would be a nightmare scenario!

So what you say about very large datasets is true and I would typically limit the results set to 1000 or at the extreme 5000 records using the SQL command. If the user wants to go beyond this it does make it complicated but I can’t remember the last time I went beyond page three of any search.

Typically, in my experience, if a user gets more than a handful of pages they are likely to want to refine their search, or maybe sort it to get the records they want in the top pages. Enticing a user down this road is pretty easy: "you got over 1000 records … try refining your search below!" and then give them a bunch of useful refinements.

Obviously every solution has its pros and cons and I wouldn’t present any one as the holy grail as every application has its own requirements.
By André Cardoso - Wednesday, March 11, 2009 5:45 AM

Another alternative that I currently have (I haven't measured/compared it) is to use a COMPUTE clause (if the client can cope with multiple result sets).
select a, b, c
from tableX
where a = 1
compute count(a)

The best part is that it works in .NET 1.1 and SQL 2000 (most of the systems I support) Smile

Re-checking: the compute that I used was for obtaining the total of the result set (that was paginated). For pagination it's not so interesting because the number of rows retrieved is controlled by the client (the count should be for the whole table/view).

Select top 200 a, b, c
from tableX
where a = 1
compute sum(c)
By jcraddock - Wednesday, March 11, 2009 7:27 AM

These are all well and good, but I rarely need such results in a T-SQL window. Where I need them is in an application where I want to paginate using ASP.net etc.

In those cases, the best I can come up with is two calls. To be clear - I use dynamic sql to build all queries in my list windows. There I need the total count and the page of data required.
By Charles Cherry - Wednesday, March 11, 2009 1:53 AM

I just read today that ADO.Net Data Services 1.5 CTP (coming out in a few months) will have built-in support for server-side data paging.

http://blogs.msdn.com/astoriateam/archive/2009/03/01/announcing-ado-net-data-services-v1-5-ctp1.aspx
By StarNamer - Tuesday, March 10, 2009 11:59 PM

I tried this and found that it depends on the indexes available.

Using a 1.2 million record table as follows

CREATE TABLE [dbo].[tblInvoice](
[SpecialGLIndicator] [varchar](50) NOT NULL,
[DocumentNumber] [bigint] NOT NULL,
[PostingDate] [smalldatetime] NOT NULL,
[DocumentDate] [smalldatetime] NOT NULL,
[EnteredOn] [smalldatetime] NOT NULL,
[DocTypeID] [int] NOT NULL,
[Period] [varchar](4) NOT NULL,
[PKID] [int] NOT NULL,
[Amount] [money] NOT NULL,
[UserID] [int] NOT NULL,
[TranCodeID] [int] NOT NULL,
[CompanyID] [int] NOT NULL,
[Month] [smalldatetime] NOT NULL,
[WithPO] [tinyint] NOT NULL,
[Lines] [int] NOT NULL,
[docs] [int] NOT NULL,
[TransCode] [varchar](10) NOT NULL,
CONSTRAINT [PK_tblInvoice] PRIMARY KEY CLUSTERED
(
[DocumentNumber] ASC,
[CompanyID] ASC,
[Month] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_27754] ON [dbo].[tblInvoice]
(
[CompanyID] ASC,
[Month] ASC
)
INCLUDE ( [docs])
GO
CREATE NONCLUSTERED INDEX [IX_27777] ON [dbo].[tblInvoice]
(
[CompanyID] ASC,
[Month] ASC,
[WithPO] ASC
)
INCLUDE ( [UserID], [docs])
GO
CREATE NONCLUSTERED INDEX [IX_27779] ON [dbo].[tblInvoice]
(
[Month] ASC,
[WithPO] ASC
)
INCLUDE ( [UserID], [CompanyID], [docs])
GO
CREATE NONCLUSTERED INDEX [IX_31155] ON [dbo].[tblInvoice]
(
[UserID] ASC,
[Month] ASC
)
INCLUDE ( [CompanyID], [docs])
GO
CREATE NONCLUSTERED INDEX [IX_tblInvoice_PostingDate] ON [dbo].[tblInvoice]
(
[PostingDate] ASC
)
GO



I found that the 2-bite (select count(*) then select) method was most efficient.

-- select count
Table 'tblInvoice'. Scan count 1, logical reads 3855, physical reads 3, read-ahead reads 3851, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 810 ms.
-- select
(10 row(s) affected)
Table 'tblInvoice'. Scan count 1, logical reads 41, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1938 ms, elapsed time = 688 ms.

-- double row_number
(10 row(s) affected)
Table 'tblInvoice'. Scan count 4, logical reads 4251, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 11406 ms, elapsed time = 6212 ms.


However, if I add an index as follows.

CREATE NONCLUSTERED INDEX [IX_test1] ON [dbo].[tblInvoice]
(
[CompanyID] ASC,
[Month] DESC,
[DocumentNumber] ASC
)



The IO improves.

Table 'tblInvoice'. Scan count 1, logical reads 2832, physical reads 0, read-ahead reads 2821, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 350 ms.

(10 row(s) affected)
Table 'tblInvoice'. Scan count 1, logical reads 239, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 103 ms.

(10 row(s) affected)
Table 'tblInvoice'. Scan count 1, logical reads 2832, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 8345 ms, elapsed time = 4556 ms.


Unfortunately, the CPU and Elapsed times are still worse. So it seems it's not a panacea for all ills.

Still a very interesting technique which is worth trying. Thanks for the article.
By Scott Monnig - Wednesday, March 11, 2009 12:04 AM

Gang,

I suggest that Peso's first response at 9:21 AM, where a simple count(*) of the cols table is used, be seriously considered.

This approach, which we employ as well, serves the need well when you consider that you have already queried the table/tables in question in the "cols" portion. No need to go back again just for the count.

Consider that if you have developed a robust data layer in an application where ad-hoc querying is supported, the Row_Number() function may take multiple columns into consideration. The approach presented requires you to reverse the sort direction of each criterion. Peso's approach (and the one we employ) does not require you to do that. Simply query as you are required to in the "cols" or similar CTE expression and Count(*) that for your total rows which match your criteria.
By Dale Allen - Wednesday, March 11, 2009 4:08 AM

I agree with Peso. I tried the holy grail method and it took my procedure from 7 seconds (using the count(*) method) to 12 seconds to execute. Those 5 extra seconds would unfortunately leave an unhappy customer; nice idea though!
By werner.broser - Wednesday, March 11, 2009 12:16 AM

Nice method, but works only for unique fields!
By hhcosmin - Tuesday, March 10, 2009 11:31 PM

Well... we use the same approach but I do not fully agree with it. You say you execute the first complex query only once and then page on it. What if new data appears in the DB? I guess the set would be stale, so this should be coupled with some data retention policy to make the set of ids expire after a certain time. The second problem is that retrieving the first pages is very cheap in terms of cost; the price peaks when you reach the last page. Retrieving the last page with a query is almost as expensive as returning the whole set of ids, because the server must skip the first n records. The majority of users don't even bother to go beyond the first few pages, which means that the complex query you executed is mostly unused. In our case one can return tens of thousands of ids, or even millions, but the user is mostly interested in the first page. So when you have large sets of data, getting all the ids is mostly useless.

Another thing to note is that retrieving the count of records for a very complex query costs as much as getting the list of ids. So if the user does not need to know the exact count (which could be 100 or 10 million), one could retrieve the first thousand ids (depending on the needs), page through them, and say the query resulted in more than 1000 hits (a rough sketch follows below). The user is never interested in more than a few tens or hundreds of records anyway. This could also be coupled with a retention policy for further improvement. I believe this is the best solution when a query could potentially return a large number of records (more than 5000, let's say).
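
A rough sketch of that capped approach (table and filter names invented for illustration):

-- fetch at most 1001 keys; if 1001 come back, the UI can simply say "more than 1000 hits"
SELECT TOP (1001) OrderID
FROM dbo.Orders
WHERE CustomerID = 42 -- the complex filtering/joins go here
ORDER BY OrderDate DESC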

I have another amendment. Let's say the user, in a very special case, wants to go beyond the 50-100 pages that you retrieved. You could tell him that his query retrieved more than 5000 records and let him navigate freely to the next page beyond the first 50-100 pages, retrieving the records from the DB. This adds some complexity to the BAL but it could work.
By pfwojnar - Tuesday, March 10, 2009 11:32 PM

I've been using one of these two methods. The Optimizer seems to be smart enough to know not to do the COUNT multiple times. However, I've never done an analysis of this method quite as deep as the one you've done here. Great job by the way, I may need to change what I'm doing.

DECLARE @startRow INT ; SET @startrow = 50

;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, (SELECT COUNT(*) FROM cols) AS TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

For the other way, replace (SELECT COUNT(*) FROM cols) AS TotRows with (SELECT MAX(seq) FROM cols) AS TotRows.
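
Written out in full, that second variant reads:

DECLARE @startRow INT ; SET @startrow = 50

;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, (SELECT MAX(seq) FROM cols) AS TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq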
By johanp.za - Wednesday, March 11, 2009 1:30 AM

Maybe not the best but I've been using this for years.
Keen to try the methods posted in the article.


USE [*********]
GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO
ALTER PROCEDURE [dbo].[ReturnRows]
(
@SQL nvarchar(4000),
@Page int,
@RecsPerPage int,
@ID varchar(255),
@Sort varchar(255)
)
AS
DECLARE @Str nvarchar(4000)
SET @Str='Select TOP '+CAST(@RecsPerPage AS varchar(20))+' * FROM ('+@SQL+') T WHERE T.'+@ID+' NOT IN
(select TOP '+CAST((@RecsPerPage*(@Page-1)) AS varchar(20))+' '+@ID+' FROM ('+@SQL+') T9 ORDER BY '+@Sort+') order by '+@Sort
PRINT @Str
EXEC sp_ExecuteSql @Str
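
A hypothetical call (the query, key column and sort column are made up for illustration):

EXEC dbo.ReturnRows
@SQL = N'SELECT * FROM dbo.Orders',
@Page = 3,
@RecsPerPage = 25,
@ID = 'OrderID',
@Sort = 'OrderID'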

By ryan_ternier - Wednesday, March 11, 2009 1:46 AM

Thanks for the article. I'm working on a query that returns a list of users, but I've never been able to do it with ROW_NUMBER() because of the ORDER BY. Here's what I have that works - the SQL 2000 way. Is there any better way of writing this?

ALTER PROCEDURE [dbo].[procSelectUsers] (
@p_vchLetterFilter VARCHAR(1),@p_vchOrderBy VARCHAR(20), @p_vchSortDirection VARCHAR(20), @p_vchQuickSearch VARCHAR(50), @p_intPage int, @p_intPageSize int
)

AS
BEGIN
DECLARE @TempTable TABLE(RowID INT IDENTITY, UserID int)

INSERT INTO @TempTable(UserID)
SELECT intUserID
FROM tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
ORDER BY
CASE
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchFirstName ASC, vchLastName ASC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchFirstName DESC, vchLastName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchUserName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchUserName ASC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchRoleTitle DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchRoleTitle ASC, vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchDescription DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchDescription ASC, vchLastName ASC, vchFirstName ASC))
END

SELECT UserID, vchFirstName, vchLastName,vchUserName, vchPassword,
vchEmail, IsNull(U.intSecurityRoleID,0) as intSecurityRoleID,
IsNull(U.intWorkgroupID,0) as intWorkgroupID, vchInitials, vchBadgeNumber,
SR.vchRoleTitle AS vchSecurityRole, W.vchDescription AS vchWorkgroupDesc
FROM @TempTable T, tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND T.UserID = U.intUserID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
AND t.RowID BETWEEN ((@p_intPage - 1) * @p_intPageSize + 1) AND (@p_intPage * @p_intPageSize)

END /* CREATE PROCEDURE procSelectAllUsers */





This is what I had before, but look at how it printed out the results - so it wouldn't work the way I needed:


ALTER PROCEDURE [dbo].[procSelectUsers] (
@p_vchLetterFilter VARCHAR(1),@p_vchOrderBy VARCHAR(20), @p_vchSortDirection VARCHAR(20), @p_vchQuickSearch VARCHAR(50), @p_intPage int, @p_intPageSize int
)

AS
BEGIN
DECLARE @startRowIndex INT
SET @startRowIndex = (@p_intPage * @p_intPageSize) + 1;


WITH Users as (
SELECT ROW_NUMBER() OVER (ORDER BY intUserID ASC) as Row,
intUserID, vchFirstName, vchLastName,vchUserName, vchPassword,
vchEmail, IsNull(U.intSecurityRoleID,0) as intSecurityRoleID,
IsNull(U.intWorkgroupID,0) as intWorkgroupID, vchInitials, vchBadgeNumber,
SR.vchRoleTitle AS vchSecurityRole, W.vchDescription AS vchWorkgroupDesc
FROM tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
)



SELECT
intUserID, vchFirstName, vchLastName,vchUserName, vchPassword,
vchEmail, intSecurityRoleID, intWorkgroupID, vchInitials, vchBadgeNumber,
vchSecurityRole, vchWorkgroupDesc
FROM Users
WHERE Row BETWEEN @startRowIndex AND @startRowIndex + @p_intPageSize - 1
ORDER BY
CASE
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchFirstName ASC, vchLastName ASC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchFirstName DESC, vchLastName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchUserName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchUserName ASC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchSecurityRole DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchSecurityRole ASC, vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchWorkgroupDesc DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchWorkgroupDesc ASC, vchLastName ASC, vchFirstName ASC))
END

END /* CREATE PROCEDURE procSelectAllUsers */



EXEC [procSelectUsers] '', '', 'ASC', '', 0, 6

A1
A2
B1
B2
C1
C2


EXEC [procSelectUsers] '', '', 'ASC', '', 1, 6

A3
A4
B3
B4
C3
C4


Results show an example of the output - not the actual output
By Jeff Moden - Wednesday, March 11, 2009 5:18 PM

First of all... very well written article. Nice to see examples like that with some performance stats. Unfortunately, the most important performance measure for the user (duration) wasn't measured, and the test data was kinda shy on row count.

It turns out that the "2 Bite" method is very fast provided that, as someone else already pointed out, you use the system tables to give you a leg up. And, if you have a clustered index on the table to keep it from being just a heap, the first "Bite" requires no joins... that means you can have the best of both worlds... performance and fewer reads.

Just to give everyone some common data to test on, here's a million rows of data to play with... only takes about 41 seconds including the index builds...

--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2

--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)

--===== Create and index for the lookups we expect
CREATE INDEX IX_JBMTest_SomeInt_SomeLetters2
ON dbo.JBMTest (SomeInt,SomeLetters2)



... and here's some code that tests two newer methods, the original "Holy Grail", and Peso's method. If someone else wants to test another method, please add it to this code and run the whole thing, because machine speeds vary and it would be nice to have it all together to compare...

--===== Define the starting row and page size
DECLARE @StartRow INT ; SET @StartRow = 900000
DECLARE @PageSize INT ; SET @PageSize = 50

PRINT '--============================================================================='
PRINT '-- The "Holy Grail" method'
PRINT '--============================================================================='

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "Holy Grail" method of getting a page of info
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
ROW_NUMBER() OVER(ORDER BY SomeInt DESC, SomeLetters2 DESC) AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows + Seq - 1 AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- The "No RBAR/No Join" method'
PRINT '--============================================================================='

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "No RBAR/No Join" method
;WITH
cteCols AS
(
SELECT NULL AS SomeInt, NULL AS SomeLetters2, 0 AS Seq, Rows AS TotRows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1
UNION ALL --------------------------------------------------------------------
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- A different No Join method'
PRINT '--============================================================================='

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== A different No Join method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT Rows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- Peso''s Embedded "2 Bite" method'
PRINT '--============================================================================='
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== Embedded "2 Bite" method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT COUNT(*) FROM dbo.JBMTest) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF



Here's how that played out on my humble single 1.8Ghz/1G Ram desktop running 2k5 sp2....

--=============================================================================
-- The "Holy Grail" method
--=============================================================================

(50 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 7594 ms, elapsed time = 9261 ms.
--=============================================================================
-- The "No RBAR/No Join" method
--=============================================================================

(51 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1265 ms, elapsed time = 1314 ms.
--=============================================================================
-- A different No Join method
--=============================================================================

(50 row(s) affected)
Table 'sysrowsets'. Scan count 50, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1157 ms, elapsed time = 1383 ms.
--=============================================================================
-- Peso's Embedded "2 Bite" method
--=============================================================================

(50 row(s) affected)
Table 'JBMTest'. Scan count 2, logical reads 3970, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1406 ms, elapsed time = 1466 ms.
By Jeff Moden - Thursday, March 12, 2009 2:10 PM

C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.


Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. Wink
By Jeff Moden - Thursday, March 12, 2009 2:26 PM

Jeff Moden (3/12/2009)
C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.


Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. Wink


... and, just to drive the point home, here's the code with both cache-clearing commands in it just before each test run AND I put the "Holy Grail" code as the last code to give it the absolute best chance AND I listed the results from the 1st run after I rebuilt the table. Caching has nothing to do with how badly the "Holy Grail" method takes a beating...


--===== Define the starting row and page size
DECLARE @StartRow INT ; SET @StartRow = 900000
DECLARE @PageSize INT ; SET @PageSize = 50

PRINT '--============================================================================='
PRINT '-- The "No RBAR/No Join" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "No RBAR/No Join" method
;WITH
cteCols AS
(
SELECT NULL AS SomeInt, NULL AS SomeLetters2, 0 AS Seq, Rows AS TotRows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1
UNION ALL --------------------------------------------------------------------
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- A different No Join method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== A different No Join method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT Rows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- Peso''s Embedded "2 Bite" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== Embedded "2 Bite" method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT COUNT(*) FROM dbo.JBMTest) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- The "Holy Grail" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "Holy Grail" method of getting a page of info
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
ROW_NUMBER() OVER(ORDER BY SomeInt DESC, SomeLetters2 DESC) AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows + Seq - 1 AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF



Here're the results...

--=============================================================================
-- The "No RBAR/No Join" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(51 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1391 ms, elapsed time = 1653 ms.
--=============================================================================
-- A different No Join method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)
Table 'sysrowsets'. Scan count 50, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1203 ms, elapsed time = 1371 ms.
--=============================================================================
-- Peso's Embedded "2 Bite" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)
Table 'JBMTest'. Scan count 2, logical reads 3970, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1453 ms, elapsed time = 2253 ms.
--=============================================================================
-- The "Holy Grail" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 7859 ms, elapsed time = 12073 ms.
By Jeff Moden - Friday, March 13, 2009 3:38 PM

For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique


Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. Wink
By Jeff Moden - Friday, March 20, 2009 9:00 AM

Robert Cary (3/14/2009)
Jeff Moden (3/13/2009)

Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. Wink


LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. Wink

This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.

Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.

I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.


Heh... understood... but the problem is that other folks, who may not have such limited numbers of rows, are reading your good article and I want them to know that your method is rather slow in the face of scalability.

To prevent problems like this, I will always write code against a million-row test table. It's a good habit to get into so that you're not called up at midnight on the last day of each month by your rather ticked-off boss wondering why your code has, yet again, caused him to miss the SLA deadlines for month-end runs.

Never justify performance challenged code with low row counts. It's an accident waiting to happen.
By jcraddock - Thursday, March 12, 2009 1:44 AM

jackie.williams (3/11/2009)
jcraddock (3/11/2009)
Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.


I'm not sure what you're saying here. We use stored procedures that generate dynamic queries to retrieve datasets similar to this all the time. Our datalayer takes a datarow containing the values for whatever parameters are necessary for the procedures and iterates through that to pass the necessary parameters into our stored proc, then the stored proc makes sure any sql injection chars etc are stripped, generates the dynamic select query with paging etc. based on whatever values have been passed into it and returns a dataset with the requested page of data and a full count of the records. Why would procs be out in what you're doing?


Well, all my applications are actually one application. All table information, validation rules, permissions, roles, etc., are stored in metadata. I generate the queries for list windows dynamically from the ASP.NET page with full filtering and sorting - one list page works for every single table in all systems, and similarly one edit page works for every table in every system... to do what you propose, I would have to move my OBJECT class into SQL, as it contains all the metadata. I have no interest in running .NET from my SQL Server for several reasons - number one being a total rewrite of the architecture. I was just hoping there was a way to do it directly from the ASP code instead of relying on a procedure. At the top of this window you can see the record count and then the page is displayed. I was hoping for a way (without rearchitecting my class onto SQL Server and taking the performance hit for turning that on) to get both numbers at once.

Here is a Screenshot:
By jcraddock - Wednesday, March 11, 2009 2:23 PM

Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.
By C. Westra - Wednesday, March 11, 2009 9:41 PM

I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.
By SwePeso - Wednesday, April 22, 2009 10:00 PM

Thank you for the feedback!
By R Michael - Friday, March 13, 2009 6:16 PM

Jeff Moden (3/13/2009)

Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. Wink


LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. Wink

This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.

Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.

I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.
By R Michael - Friday, March 13, 2009 4:41 AM

Thank you, everyone, for the eye-opening research. You guys have contributed some caveats to consider when trying to identify the best solution.

Steve's lead-in was a bit misleading in that I do not consider the 'holy grail' method to be "the best" in all cases (although it remains the best in cases that I've had to work with). I was defining the 'holy grail' as being able to get the count and the page without adding a bunch of I/O overhead.

Every time I've needed to implement server-side paging, the code would be paging the results of a query, sometimes simple, sometimes complex. As a result Jeff's 'no join' methods are out the window, which is a shame because that's a very cool trick! In the edge cases, these queries would page tens of thousands of records (nowhere near the 1-2 million record test cases we've seen on this thread, although I should have done tests like that for this article)

I started looking for a better approach to getting a count because the underlying query that had to be returned to the user was very complex and expensive. Doing the 'embedded two-bite' method or the others adds considerable cost and time to the query.

For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique, and I am convinced that there are other people out there who can benefit from it (which is why I wrote the article). Will this approach be the best in all cases? No.

There is definitely a lot of good feedback and information in this thread. I'll update the article with the information you shared.
By R Michael - Wednesday, March 11, 2009 8:16 AM

jcraddock (3/11/2009)
These are all well and good, but I rarely need such results in a T-SQL window. Where I need them is in an application where I want to paginate using ASP.net etc.

In those cases, the best I can come up with is two calls. To be clear - I use dynamic sql to build all queries in my list windows. There I need the total count and the page of data required.


This is exactly the scenario I was imagining when putting this solution together, with the idea being that you can return both the total count and page n in a single call/read.

There have been some very interesting comments and ideas in this thread. As far as this idea goes, it works well in some cases, and not so well in others (as highlighted by other posters).

I built a paging proc based on this principle and, after extensive testing, found it to be the best approach. I saw only a few ms of CPU and elapsed-time overhead. I would be very interested to know the size of some of the datasets Peso and others are using. This would be very useful in determining the best solution for a given problem.

Thank you, everyone, for your comments and insight!
By R Michael - Wednesday, March 11, 2009 8:29 AM

Actually, rereading the thread, I see that Peso did post the size of his dataset; 2,000,000 rows.

It would be interesting to determine a relative cut-off point where the row_number trick is the most efficient and where the two-bite approach becomes best.

For my purposes, the biggest cost was the query itself. The underlying tables hold hundreds of millions of records, but the data is pretty well filtered down by the time we get to the record set to page. To avoid running that query twice was a big win.

I deliberately avoided including time statistics in my comparisons because they can be very subjective (unless they demonstrate a clear performance difference as a few posters noted).
By R Michael - Thursday, March 12, 2009 2:57 AM

Thomas Bühler (3/12/2009)
Nice article, but I miss a solution for handling custom sorting when using paging functionality.


Yeah, I deliberately left that bit out because it had been covered by so many other articles. The 'real world' problem I chose this approach for does implement custom sorting by using conditional logic in the OVER(ORDER BY).
By Thomas Bühler - Wednesday, March 11, 2009 7:44 PM

Nice article, but I miss a solution for handling custom sorting when using paging functionality.
In practice (let's say an ASP.NET application with a GridView component) you need to sort dynamically by a column other than the ID, so you have to take care of this before you do the paging.
Look here: http://aspnet.4guysfromrolla.com/articles/032206-1.aspx#
If you combine the custom sort capability together with the discussed Total Row retrieval you can get most out of this...

Cheers
By th-spam - Tuesday, October 20, 2009 8:15 AM

Really nice solution for the total row count and paging problem; and you can't miss the analogy to the classical Gauss argument for the sum of the first N integers, 1 + 2 + ... + N = N*(N+1)/2:

1 + 2 + ... N
N + (N-1) ... 1

Smile
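
Spelling the pairing out for the double ROW_NUMBER() trick (my own restatement, assuming the ordering column is unique): for the k-th of N rows,

seq = k
totrows = N - k + 1
seq + totrows - 1 = N

so every row carries the full row count without a separate COUNT(*).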
By aaron.bertrand - Friday, March 20, 2009 6:08 AM

Just for the benefit of the folks talking about grabbing the count of all rows in the table separately, note that in a lot of cases we can't look at sys.indexes or the more reliable sys.dm_db_partition_stats because our queries have WHERE clauses or are JOINed to other tables.
By fred_ch9400_cn - Thursday, June 18, 2009 2:23 AM

werner.broser said
Nice method, but works only for unique fields!
-- I agree.

When the order expression does not have a unique value, "The Holy Grail" can make mistakes:

DECLARE @t1 TABLE ([x] INT, [y] INT)
INSERT INTO @t1 ([x], [y])
SELECT 1, 1
UNION ALL
SELECT 1, 2

SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY [x] ASC)
+ ROW_NUMBER() OVER (ORDER BY [x] DESC)
- 1 AS [Sum]
FROM @t1) t



The query result gives
x y Sum
1 1 1
1 2 3
which obviously cannot satisfy us.

For rows where the order expression gives the same value, their relative order will not change regardless of the "ASC" or "DESC" option; they simply follow the order in which they happen to be physically stored. The previous example shows this.

We should know this limitation and not apply "The Holy Grail" in the wrong place.
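
One common workaround (my addition, not from the article) is to include a unique tie-breaker column in both ORDER BY lists so the two numberings are exact mirrors of each other. With the sample above, assuming ([x], [y]) is unique:

DECLARE @t1 TABLE ([x] INT, [y] INT)
INSERT INTO @t1 ([x], [y])
SELECT 1, 1
UNION ALL
SELECT 1, 2

SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY [x] ASC, [y] ASC)
+ ROW_NUMBER() OVER (ORDER BY [x] DESC, [y] DESC)
- 1 AS [Sum]
FROM @t1) t

-- both rows now report Sum = 2, the true row count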
By jackie.williams - Wednesday, March 11, 2009 3:32 PM

jcraddock (3/11/2009)
Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.


I'm not sure what you're saying here. We use stored procedures that generate dynamic queries to retrieve datasets similar to this all the time. Our datalayer takes a datarow containing the values for whatever parameters are necessary for the procedures and iterates through that to pass the necessary parameters into our stored proc, then the stored proc makes sure any sql injection chars etc are stripped, generates the dynamic select query with paging etc. based on whatever values have been passed into it and returns a dataset with the requested page of data and a full count of the records. Why would procs be out in what you're doing?
By ningaraju.n - Wednesday, February 10, 2010 10:33 PM

Hi Friend,
Can you suggest an alternative that does not use a trigger?
By alan.kelly.london - Wednesday, April 22, 2009 9:42 PM

The original article was very well written and it's sooo nice to see statistics backing up the arguments presented.

However, having read the forums here I have to agree with Peso that overall time is perhaps more important than just I/O reads, certainly in terms of customer experience anyway!

I've tried Peso's method on several tables of real data with rows ranging from a few hundred to over 500,000 and in every single case Peso's method is quicker by a factor of at least 10.

Thanks for the post, it has made a massive difference to my paging search times.
By sicsemperspam - Thursday, April 30, 2009 9:40 AM

Hello all,

After benefitting from the code examples and explanations above, I thought I'd make a contribution.

Below is a generic stored procedure that will retrieve the page of rows based on the parameters fed to it. It determines whether the table called for is in fact a table or a view. If it's a table, it uses the sys-tables to get the total row count (thanks to a post above this one!) and if it's a view, it uses the original strategy of calculating the inverse row # for a given row. I didn't compute the total rows because I thought whatever was calling it would only have to make the computation once rather than once-per-row if I had done it here.

Before the Efficiency Police haul me away: this was slapped together as a proof of concept and is not (in any way) presented as the best way of doing this. And there is one rule: your ORDER BY clause must include ASC where appropriate. This made inverting the order-by merely ugly, down from absolutely hideous.


set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go


ALTER PROCEDURE [dbo].[GetPage]
-- Add the parameters for the stored procedure here
(
@TableName varchar(1000)
,@Fields varchar(1000)
,@OrderBy varchar(1000)
,@StartPosition int
,@PageLen int
)
AS
Begin

DECLARE
@strSQL varchar(max)
,@RowCount int
,@RightNow DATETIME
,@TableType varchar(20)
,@RevOrderBy varchar(1000)

Set @RightNow=GETDATE()

Set @PageLen = @PageLen - 1
if not EXISTS (select * from INFORMATION_SCHEMA.tables where table_name = @TableName)
begin
Select 'Table/View not found' as Error
return (-1)
end

select @TableType = TABLE_TYPE from INFORMATION_SCHEMA.tables where table_name = @TableName
if @TableType = 'BASE TABLE'
begin
SELECT @RowCount=P.Rows
FROM sys.partitions P
INNER JOIN sys.indexes I ON (P.object_id = I.object_id) AND (P.index_id = I.index_id)
INNER JOIN sys.objects O ON (P.object_id = O.object_id)
WHERE (I.type IN(0,1)) AND (O.name = PARSENAME(@TableName, 1) AND
O.[schema_id] = IsNull(SCHEMA_ID(PARSENAME(@TableName, 2)), O.[schema_id]))
ORDER BY O.name
set @strSQL =' SELECT * ' +
', ' + convert(varchar(10),@RowCount) + ' as [RowCount] ' +
' FROM (SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber ' +
' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +
' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +
' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'
end
else
begin
set @RevOrderBy = upper(@OrderBy) + ','
set @RevOrderBy = replace(replace(replace(@RevOrderBy ,' ASC,',' ~A,'),' DESC,',' ASC,'),' ~A',' DESC')
set @RevOrderBy = left(@RevOrderBy ,len(@RevOrderBy)-1)
set @strSQL =' SELECT *' +
' FROM (' +
'SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber, ' +
' ROW_NUMBER() OVER(ORDER BY ' + @RevOrderBy + ') AS RowNumberInverse ' +
' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +
' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +
' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'
end
--SELECT SQL=@strSQL
EXEC(@strSQL)
End



Enjoy!
By peter.krickner - Wednesday, October 14, 2009 1:33 AM

Thank you for this. I've used it now, but haven't studied reads and performance gains (or losses) yet. I realized, though, that the column(s) used for sorting have to be unique in order for the total row count to be accurate. If there are several equal values in the sorted column, the total row count will not be accurate; you have to make sure that the reverse sort is 100% the opposite of the initial sort. Just a simple addition to the solution after trying it out. Thank you again!
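
A minimal sketch of that fix, assuming a hypothetical dbo.Orders table with a unique OrderID key used as the tiebreaker in both directions:

DECLARE @startRow INT ; SET @startRow = 50

;WITH cols
AS
(
SELECT OrderDate, OrderID,
ROW_NUMBER() OVER(ORDER BY OrderDate ASC, OrderID ASC) AS seq,
ROW_NUMBER() OVER(ORDER BY OrderDate DESC, OrderID DESC) AS totrows
FROM dbo.Orders
)
SELECT OrderDate, OrderID, seq + totrows - 1 AS TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq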

/ Peter.
By peter.krickner - Wednesday, October 14, 2009 1:37 AM

Sorry for pointing out an already-noticed "feature". I had only read the earliest posts...
By Jeff Moden - Monday, December 6, 2010 12:27 PM

Paul Muharsky-474732 (5/20/2010)
I disagree that all your SQL should be optimized to 1 million+ records.


Heh... Yeah... I know I'm responding to an old post, but the kind of attitude above is why there are so many posts on this site that start with "I have this performance problem. Can anyone help?" If you look back at Peso's code, 4 milliseconds vs. almost 6 seconds isn't something that anyone in their right mind would disagree with. ;-)

Write high performance code as if your life depended on it because saying "That's not my job" may be a self-fulfilling prophecy. ;-)
By SQLRNNR - Friday, May 7, 2010 4:52 AM

Nice article. Thanks for sharing.
By Mike Dougherty - Friday, May 7, 2010 2:32 AM

I haven't implemented this yet, but my thinking on paging is this:

1) Store each unique combination of request parameters in a table (the parent).
2) Run the query and store the results in another table with a foreign key to the parent table (a rough sketch of both tables follows this list). Since this result is now read-only and there will be no provision for updating the rows, update the parent row's resultrowscount for easy access to the total.
3) Also set cache timeout values on the parent. Large result sets that took a long time to run are given a long cache timeout, while smaller results get shorter timeouts. Perhaps the data itself determines its lifetime: a query on real-time data is stale in 3 minutes, while a first-quarter report (run in May) has a life of 30 days or more.
4) Pages are retrieved from the prefetched (and denormalized) child table.
Further requests for the same query are served from the prefetched result if the results are within the staleness tolerance. If the data is stale, the old key is marked obsolete and the process starts over. (This allows a nice audit/analysis to be done on cache hits and results, rather than deleting old results.)
If business rules allow a TOP N to prevent absurdly large results then go for it; otherwise the cache at least mitigates the need to redundantly query the results from the source system.
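
A rough sketch of those two tables, with all names hypothetical:

-- Parent: one row per unique combination of request parameters
CREATE TABLE dbo.QueryCache
(
QueryCacheID INT IDENTITY(1,1) PRIMARY KEY,
ParametersHash VARBINARY(32) NOT NULL, -- identifies the unique parameter combination
ResultRowsCount INT NULL, -- updated once the child rows are stored
CreatedAt DATETIME NOT NULL DEFAULT (GETDATE()),
ExpiresAt DATETIME NOT NULL, -- cache timeout; longer for large/expensive results
IsObsolete BIT NOT NULL DEFAULT (0) -- stale results are marked, not deleted, for audit/analysis
)

-- Child: the denormalized, read-only result rows, keyed back to the parent
CREATE TABLE dbo.QueryCacheRow
(
QueryCacheID INT NOT NULL REFERENCES dbo.QueryCache (QueryCacheID),
RowNumber INT NOT NULL, -- pages are served as ranges of this column
Payload NVARCHAR(MAX) NOT NULL, -- or explicit columns matching the result shape
PRIMARY KEY (QueryCacheID, RowNumber)
)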

I think a better question is whether users really need pages of data more than they need better results in general. "Eyeballing" long lists is a huge waste of time. Thanks to Google (et al.) we expect relevant search results to be on the first page. If we're not sure what we're looking for, we might desperately look at page 2 or 3. After that, we hopefully have an idea of how to search more specifically for better results. This makes search an iterative process of refining criteria rather than a brute-force perusal of pages of results. I was going to make this concept the focus of my seminar paper this semester, but the day the topic proposal was due I read the SQLServerCentral article on genetic algorithms in T-SQL, so I did that instead.

I urge anyone thinking of implementing results paging to consider how/why the user can be better served with a more conversational and iterative process. Our job should be to fetch what they really want from our database (even when they aren't sure what they want). Simply giving the user multiple pages of what they asked for makes us look lazy and makes them work much harder than they should.
By srinivas-254339 - Sunday, December 5, 2010 4:40 PM

Nice solution, and very nicely written too, but the holy grail solution does not give the correct 'Total count of records' if the ROW_NUMBER() OVER(ORDER BY ...) column is not unique.
By srinivas-254339 - Sunday, December 5, 2010 4:46 PM

Thanks, SSC Rookie. Your solution helped me.
By brian lewis - Friday, May 7, 2010 10:14 AM

One small observation from trying this: you have to be sure that the ORDER BY fields are unique, otherwise the Ascending and Descending sequences may not mirror each other, so the sum of the UP and DOWN row_number() fields is not guaranteed to be the total row count.

With that in mind, though, the central idea here does give another way to calculate a MEDIAN - sort for row_number on the value you are interested in (possibly including an ID in the ORDER BY to separate duplicate values), then your median is the value where UP and DOWN are the same, or, if there is an even number of values, it's the average of the two values where abs(UP - DOWN) = 1

Both these cases reduce to
avg(value).....WHERE (UP - DOWN) between -1 and 1



This snippet finds the median of the field enSum, in the table called Enrolments:


with RankedEnrol
As
(
Select enSum
, row_number() over (ORDER BY enSum, enID) UP -- identity is included in the ORDER BY to ensure uniqueness
, row_number() over (ORDER BY enSum DESC, enID DESC) DOWN
FROM Enrolments
)
SELECT avg(enSum) MedianEnrol
from RankedEnrol
WHERE (UP - DOWN) between -1 and 1

By SwePeso - Monday, December 6, 2010 1:15 PM

It seems the dual TOP/ORDER BY was the most efficient approach at the time.
I tested the new Denali OFFSET/FETCH and got the exact same number of reads for @StartRow = 5000, but 2 ms instead of 4 ms.


SELECT Number
FROM dbo.TallyNumbers
ORDER BY Number
OFFSET @StartRow - 1 ROWS
FETCH NEXT 50 ROWS ONLY
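
For comparison, the dual TOP/ORDER BY approach referred to above is typically written something like this (a sketch against the same TallyNumbers table, assuming @StartRow has already been declared):

SELECT Number
FROM (
SELECT TOP (50) Number
FROM (
SELECT TOP (@StartRow + 49) Number
FROM dbo.TallyNumbers
ORDER BY Number
) AS FirstRows
ORDER BY Number DESC
) AS PageRows
ORDER BY Number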

By alen teplitsky - Friday, May 7, 2010 1:21 AM

Can someone give me a real-world example of when this would be used in production? I've never seen it used, but that could be because a lot of our code was written for SQL 2000 and just migrated over with minimal changes.
By TheSQLGuru - Saturday, October 26, 2013 11:16 AM

Sorry, but at first blush I think parameter sniffing (without some other query addition) could totally screw you here. Maybe an OPTION (RECOMPILE) could help, depending on what version of SQL Server you are on.
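
For illustration, a minimal sketch of that hint applied to a ROW_NUMBER paging query over the TallyNumbers table used elsewhere in this thread (parameter values are arbitrary):

DECLARE @startRow INT ; SET @startRow = 5000

;WITH cols
AS
(
SELECT Number,
ROW_NUMBER() OVER(ORDER BY Number) AS seq
FROM dbo.TallyNumbers
)
SELECT Number
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq
OPTION (RECOMPILE) -- compile a fresh plan for the actual parameter values on every execution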
By Paul Muharsky-474732 - Thursday, May 20, 2010 1:59 AM

Numerous people have noted that the "holy grail" pattern does not work when there are duplicate values in the sort column.

A simple workaround for this is:

WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY nonunique ASC) AS seq
FROM MyTable
),
Reverse AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY seq DESC) AS tot
FROM CTE
)
SELECT *, seq + tot - 1 AS totalRecs -- replace * with the columns you actually need
FROM Reverse

In my tests, this did not add any noticeable overhead over the standard "Holy Grail" pattern.

I find this does work well for small sets and, contrary to another poster's statement, I disagree that all your SQL should be optimized for 1 million+ records. You should use the correct SQL and indexes for your workload, and you should have a good understanding of your workload, expected growth, access pattern, read/write ratio, etc.

If I have a system that has little data, but a high transaction count, wicked processors and a slow disk sub-system, this pattern is ideal. For large amounts of data, this pattern has been clearly shown to break down and would not necessarily be the correct choice.

I'd be interested in seeing differences in performance with real-world examples of complex queries, as paging is typically used on search results, and search queries rarely, if ever, read from a single table as most of the examples both for and against this pattern have assumed.
By Carla Wilson-484785 - Saturday, May 8, 2010 5:31 AM

brian lewis (5/7/2010)
... the central idea here does give another way to calculate a MEDIAN - sort for row_number on the value you are interested in (possibly including an ID in the ORDER BY to separate duplicate values), then your median is the value where UP and DOWN are the same, or, if there is an even number of values, it's the average of the two values where abs(UP - DOWN) = 1

Both these cases reduce to
avg(value).....WHERE (UP - DOWN) between -1 and 1



This snippet finds the median of the field enSum, in the table called Enrolments:


with RankedEnrol
As
(
Select enSum
, row_number() over (ORDER BY enSum, enID) UP -- identity is included in the ORDER BY to ensure uniqueness
, row_number() over (ORDER BY enSum DESC, enID DESC) DOWN
FROM Enrolments
)
SELECT avg(enSum) MedianEnrol
from RankedEnrol
WHERE (UP - DOWN) between -1 and 1


Nice!
By Tony++ - Friday, May 7, 2010 2:34 AM

To pull an idea from my ...um, thin air...
When I'm looking at the actual execution plan, each operator knows how many rows it handled, and the operators used in the initial selection show the total rows.
What's the merit of returning the plan XML to the caller and letting the caller dig the total rows out of the XML? It at least lets the database just handle the results, and moves the 'Total rows' workload out to the caller.
By Dave Ballantyne - Monday, December 6, 2010 8:27 PM

SwePeso (12/6/2010)
It seems the dual TOP/ORDER BY was the most efficient approach at the time.
I tested the new Denali OFFSET/FETCH and got the exact same number of reads for @StartRow = 5000, but 2 ms instead of 4 ms.


SELECT Number
FROM dbo.TallyNumbers
ORDER BY Number
OFFSET @StartRow - 1 ROWS
FETCH NEXT 50 ROWS ONLY




My take on the Denali paging....

http://sqlblogcasts.com/blogs/sqlandthelike/archive/2010/11/10/denali-paging-is-it-win-win.aspx

and

http://sqlblogcasts.com/blogs/sqlandthelike/archive/2010/11/19/denali-paging-key-seek-lookups.aspx
By barry-591755 - Friday, May 7, 2010 6:49 AM

I use a stored procedure to run the paged query. It runs the query twice: once to count, once to get the paged result set. I then return the count as the return value from the stored procedure, and the result set is returned to the caller.
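
A minimal sketch of that shape, using hypothetical object names (dbo.Orders, OrderID):

CREATE PROCEDURE dbo.GetOrdersPage
@StartRow INT,
@PageSize INT
AS
BEGIN
SET NOCOUNT ON

DECLARE @Total INT

-- First pass: count the matching rows
SELECT @Total = COUNT(*) FROM dbo.Orders

-- Second pass: return the requested page
;WITH Numbered AS
(
SELECT OrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderID) AS seq
FROM dbo.Orders
)
SELECT OrderID, OrderDate
FROM Numbered
WHERE seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY seq

-- The total row count comes back as the procedure's return value
RETURN @Total
END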

I also use Xml to pass in the search parameters and attributes for the paging information, but that would be a topic for another day.

Barry
By peter-757102 - Thursday, May 6, 2010 8:10 PM

Just to throw in my 2c, I try to stay away from techniques that do not work reasonably efficiently, or that completely break down in other ways, when the query to be paged spans multiple tables. It is far easier to design paging for a single table than it is for multiple tables.

That is why I like the idea of first doing a limited (TOP X) initial query that fetches all the PKs in the desired order, followed by queries that each window over the fetched keys to quickly and efficiently fetch the actual data to display. This allows 'browsing' over the results in a detail view if the user clicks on the grid.
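
A minimal sketch of that two-phase approach, using hypothetical names (dbo.Orders, OrderID, #PageKeys):

-- Phase 1: capture the keys once, in the desired order
SELECT TOP (1000)
ROW_NUMBER() OVER (ORDER BY o.OrderDate, o.OrderID) AS seq,
o.OrderID
INTO #PageKeys
FROM dbo.Orders AS o
ORDER BY o.OrderDate, o.OrderID

-- Phase 2: fetch one page of display data by joining back on the narrow key list
DECLARE @StartRow INT, @PageSize INT
SET @StartRow = 51
SET @PageSize = 50

SELECT o.OrderID, o.OrderDate, o.CustomerName
FROM #PageKeys AS k
INNER JOIN dbo.Orders AS o ON o.OrderID = k.OrderID
WHERE k.seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY k.seq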

If browsing is not required you can ** maybe ** get away with recording only the keys of the page starts, and then build a (more complex) query that uses a page-start key as a reference to get the next PageSize-1 results for each page requested, assuming the data to query has a low mutation rate.

But whatever method you use in complex queries, having a narrow index that covers the fields being filtered and sorted on is a must. Other than that, there is (I think) no way to get down to the minimum possible I/O.
By peter-757102 - Monday, December 6, 2010 7:50 PM

SwePeso (12/6/2010)
It seems the dual TOP/ORDER BY was the most efficient approach at the time.
I tested the new Denali OFFSET/FETCH and got the exact same number of reads for @StartRow = 5000, but 2 ms instead of 4 ms.


SELECT Number
FROM dbo.TallyNumbers
ORDER BY Number
OFFSET @StartRow - 1 ROWS
FETCH NEXT 50 ROWS ONLY



This is unreleased SQL Server 2011 functionality only, right?
By Doug Bass - Friday, May 7, 2010 8:28 AM

This article introduces a novel approach (in my experience) to tackling an old problem, and I think the concept behind it is great. However, the article could have been much better written, with greatly improved examples.

First, I would have liked to have seen a table in a sample database used, rather than [INFORMATION_SCHEMA].columns. This confused me at first.

Second, in the "grail" query, the second "row_number" column is aliased as "totrows". This should have been named something like rseq, since it actually represented the reverse sequence number, not the total number of rows. This was confusing too.

Third, I could not find mention of the fact that the columns in the OVER() clause of the "grail" query MUST include the primary key, and the columns must be listed in the same order in both OVER() clauses or it throws the calculation off.

Fourth, a lot of confusion could have been eliminated, and it would have been a lot more useful to the reader, if the article had also included some query examples with filter criteria to demonstrate the effect on the row count (viz. that # of rows returned was not the number of rows in the entire table).

Lastly, and this is a minor point, I was not familiar with the WITH construct. Speaking only for myself, it would have been nice if this had been explained too.

I give 5 stars for the concept, and 2 stars for the writing.
By mbarkell - Thursday, May 6, 2010 11:15 PM

I quite enjoyed reading this article. I found myself going "come on, come on, the solution, the solution", but then was pleased when the solution was presented. I find keeping the count of records within each record at constant big-O cost great.
By neil.genias - Monday, November 7, 2011 4:45 PM

This is good. But I wanted to ask: what about the situation where, say, there are only 5 records in the table and you ask for page number 2 with a page size of 10?
In that case no data is returned to you, and you will not get to know how many records there are in the table.
By lukas.eder - Saturday, October 26, 2013 5:58 AM

I'm quite surprised that the "holy grail of paging" doesn't mention the (granted a bit less popular) "seek method" as described here:
http://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/

Take the following example:


SELECT TOP 10 first_name, last_name, score
FROM players
WHERE (score < @previousScore)
OR (score = @previousScore AND player_id < @previousPlayerId)
ORDER BY score DESC, player_id DESC



The @previousScore and @previousPlayerId values are the respective values of the last record from the previous page. This allows you to fetch the "next" page. If the ORDER BY direction is ASC, simply use > instead.

With the above method, you cannot immediately jump to page 4 without having first fetched the previous 40 records. But often, you do not want to jump that far anyway. Instead, you get a much faster query that may be able to fetch data in constant time, depending on your indexing. Plus, your pages remain "stable" even if the underlying data changes (e.g. on page 1, while you're on page 4).
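
For reference, the ASC variant mentioned above would look roughly like this (same hypothetical players table):

SELECT TOP 10 first_name, last_name, score
FROM players
WHERE (score > @previousScore)
OR (score = @previousScore AND player_id > @previousPlayerId)
ORDER BY score ASC, player_id ASC
-- a composite index on (score, player_id) typically supports this seek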

This is the best way to implement paging when lazy loading more data in web applications, for instance.