SQLServerCentral

SQL Server 2005 Paging – The Holy Grail


https://www.sqlservercentral.com/Forums/Topic672980.aspx

By R Michael - Tuesday, March 10, 2009 3:33 PM

Comments posted to this topic are about the item SQL Server 2005 Paging – The Holy Grail
By Toby Harman - Tuesday, March 10, 2009 5:41 PM

This is an interesting article, which left me scratching my (SQL 2000) head over the syntax. Once I had that straight and looked up ROW_NUMBER in the Books Online I saw almost exactly this example. Notwithstanding, I think this is a good article which showcases a solution.

Well done bringing this functionality into the spotlight, and thanks for the time taken to do so.
By SwePeso - Tuesday, March 10, 2009 7:21 PM

I created a TallyNumber table with only one column (ascending clustered primary key) and 2,000,001 records, ranging from 0 to 2,000,000.
First I tried your suggestion in the article, which gave me the correct records back


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT number,
ROW_NUMBER() OVER(ORDER BY number) AS seq,
ROW_NUMBER() OVER(ORDER BY number DESC) AS totrows
FROM tallynumbers
)
SELECT number, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

set statistics time off
set statistics io off


and gave me these results


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 3225, physical reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.

SQL Server Execution Times:
CPU time = 6107 ms, elapsed time = 6059 ms.


Setting totrows to constant 1


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT number,
ROW_NUMBER() OVER(ORDER BY number) AS seq,
1 as totrows
FROM tallynumbers
)
SELECT number, totrows + seq -1 as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

set statistics time off
set statistics io off


gave me these results


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 3, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.


So for a large table you will have to scan all records anyway with your suggestion.

Modifying your code to this


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT number,
ROW_NUMBER() OVER(ORDER BY number) AS seq
FROM tallynumbers
)
SELECT number, (select count(*) from cols) as TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

set statistics time off
set statistics io off


gave me these results


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 4, logical reads 3250, physical reads 0.

SQL Server Execution Times:
CPU time = 360 ms, elapsed time = 191 ms.



So it's not all about reads. It's about "customer satisfaction". My last suggestion runs in less than 0.2 seconds with 3250 reads, and your suggestion runs in 6.1 seconds and 3225 reads, on my single-column 2,000,001 record table.

The reads only went up by a measly 0.78%, but time went down by a staggering 96.8% !

Doing this all over again, but using @StartRow = 5000, gave almost the same time for your solution, because all records have to be numbered anyway. The same number of reads, too.


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 3225, physical reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.

SQL Server Execution Times:
CPU time = 6017 ms, elapsed time = 5995 ms.


My last suggestion gave these results (for @Startrow = 5000)


(50 row(s) affected)
Table 'TallyNumbers'. Scan count 4, logical reads 3260, physical reads 0.

SQL Server Execution Times:
CPU time = 235 ms, elapsed time = 122 ms.


Which is only a minimal increase in reads.

The suggestion increased the number of reads by 0.1% and decreased time by 98.0%.
By daninmanchester - Tuesday, March 10, 2009 11:00 PM

I published an FAQ some years ago which, granted, was a bit immature at the time, but I still use the principle today in a more developed and mature way. It could certainly be refined to make use of some of the advances in SQL Server, etc.

FAQ

The approach basically uses a mix of the database and DAL / BLL layers to process the paging set.

In the first instance it hits the database to return a list of all the primary keys for the result set. This is a pretty small and efficient dataset if you use int as your primary key and a hash table or similar.

The DAL/BLL layer then processes this, pulling out the total number of records and the current page we want. It then pulls the required page records using the primary keys, which is very efficient.

To make this even more efficient I cache the initial list of primary keys, so when the user requests the next page, etc., all I have to do is look for the relevant primary keys in the cached list and pull the records I want based on primary key. This means the original complex query is only fired once regardless of the number of pages, and any future requests on the same dataset only pull records based on known primary keys.
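
The two round trips might look roughly like this (dbo.Orders and its columns are invented purely for illustration):

-- Call 1: fetch just the keys for the whole result set (the DAL/BLL caches this list)
DECLARE @CustomerID INT ; SET @CustomerID = 42
SELECT OrderID
FROM dbo.Orders
WHERE CustomerID = @CustomerID
ORDER BY OrderDate DESC

-- Call 2: fetch only the rows for the requested page, using the handful of keys
-- the DAL/BLL picked out of its cached list
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderID IN (101, 102, 103) -- the keys for the current page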
By tymberwyld - Tuesday, March 10, 2009 9:11 PM

About the whole row count issue: I know this isn't bulletproof, but SQL Server is already tracking the row counts of every table. You don't need to create any Triggers or keep your own "tally" tables. Now, of course, if you're paging a View you'll still have to get the row counts using one of your methodologies; however, you can use the following query for tables:

DECLARE @Object SysName
SET @Object = 'dbo.MyTable' -- Schema.ObjectName

-- Get RowCount for Object
-- NOTE: Sometimes there are multiple rows returned when a Table has many indexes,
-- however, each index contains the same value for the row count
SELECT TOP 1 P.Rows
FROM sys.partitions P
INNER JOIN sys.indexes I ON (P.object_id = I.object_id) AND (P.index_id = I.index_id)
INNER JOIN sys.objects O ON (P.object_id = O.object_id)
WHERE (
-- O.type IN('S', 'U')
-- AND
(I.type IN(0,1))
)
AND (
O.name = PARSENAME(@Object, 1)
AND O.[schema_id] = IsNull(SCHEMA_ID(PARSENAME(@Object, 2)), O.[schema_id])
)
ORDER BY O.name

You can also do the same in SQL 2000 by querying dbo.sysindexes.
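
A minimal SQL 2000 version (reusing the dbo.MyTable example above) might look like this:

DECLARE @Object SysName
SET @Object = 'dbo.MyTable'

-- indid 0 = heap, indid 1 = clustered index; a table has exactly one of the two
SELECT rows
FROM dbo.sysindexes
WHERE id = OBJECT_ID(@Object)
AND indid IN (0, 1)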
By SwePeso - Tuesday, March 10, 2009 8:25 PM

My best attempt this far is to have a trigger on tallynumbers table and report the number of records in a CountingTable.

That gave me a total of only 14 reads and 4 ms in runtime! And this is for @StartRow = 5000


set statistics io on
set statistics time on

DECLARE @startRow INT ; SET @startrow = 5000
declare @items int
select @items = number From CountingTable where table_name = 'tallynumbers'

SELECT Number,
@items
FROM (
SELECT TOP(50)
Number
FROM (
SELECT TOP(@startrow + 49)
Number
FROM TallyNumbers
ORDER BY Number
) AS d
ORDER BY Number DESC
) AS q
ORDER BY Number

set statistics time off
set statistics io off


Results were


(1 row(s) affected)
Table 'CountingTable'. Scan count 1, logical reads 1, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(50 row(s) affected)
Table 'TallyNumbers'. Scan count 1, logical reads 13, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.



4 ms compared to 5995 ms (decrease by 99.9%) and 14 reads compared to 3225 is a decrease by 99.6%.

What can this tell us?

1. Have number of total records stored in some management table and maintained by a trigger
2. If possible, have the "numbering" work done by (possibly) the same trigger and keep a "sequence" column in the source table if paging is done a lot. Or use the "multiple order bys" principle.
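
A minimal sketch of such a trigger (assuming the CountingTable(table_name, number) layout used in the query above; untested, for illustration only):

CREATE TRIGGER trg_TallyNumbers_RowCount
ON dbo.TallyNumbers
AFTER INSERT, DELETE
AS
BEGIN
SET NOCOUNT ON;

-- keep the stored total in step with inserts and deletes
UPDATE CountingTable
SET number = number
+ (SELECT COUNT(*) FROM inserted)
- (SELECT COUNT(*) FROM deleted)
WHERE table_name = 'tallynumbers';
END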
By TheSQLGuru - Tuesday, March 10, 2009 10:31 PM

Nice article.

I have been doing similar coding for a few years now with great success. My typical approach retrieves just the primary key fields from the many tables necessary to filter/retrieve the results, then joins those keys back to their original tables to get the required output.
By prasad.puranik - Tuesday, March 10, 2009 11:15 PM

Nice solution, and very nicely written too. Looking forward to reading more from you.
By vlado_work - Tuesday, March 10, 2009 9:21 PM

Based on my experience with a big table, the best approach is to get a Total Rows number in a separate call:


set statistics io on
set statistics time on

DECLARE @TotRows AS INT
DECLARE @startRow INT ; SET @startrow = 50
;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
--select * from cols
SELECT table_name, column_name
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

select @TotRows=count(*)
FROM [INFORMATION_SCHEMA].columns

--SELECT @TotRows

set statistics time off
set statistics io off

By daninmanchester - Wednesday, March 11, 2009 4:57 AM

hhcosmin, I agree with what you say: the caching policy needs to be defined and suitable. It can be dynamic these days and linked to the data, but a lot of the time, for what I do, I can cache a set of IDs for the duration, so it becomes very cheap.

One thing that frustrates me about some sites is where the data does change and paging just becomes messy and isn't always the best way to present it. For example, page 1 becomes page 2 because the dataset has changed, and so when I click next page I get half of what I was just looking at on page 1 and the first half of what was page 2, etc. Or you go back a page and what was page 2 isn't page 2 anymore.

I'm not quite sure I understand how the last page becomes very expensive using this approach, as once the initial query has been performed everything else is PK based. Unless you are talking about having to read from the beginning of, say, 1,000,000 PKs to the end as opposed to the current page? Which is true, that would be a nightmare scenario!

So what you say about very large datasets is true and I would typically limit the results set to 1000 or at the extreme 5000 records using the SQL command. If the user wants to go beyond this it does make it complicated but I can’t remember the last time I went beyond page three of any search.

Typically, in my experience, if a user gets more than a handful of pages they are likely to want to refine their search, or maybe sort it to get the records they want in the top pages. Enticing a user down this road is pretty easy: "you got over 1000 records … try refining your search below!" and then give them a bunch of useful refinements.

Obviously every solution has its pros and cons and I wouldn’t present any one as the holy grail as every application has its own requirements.
By André Cardoso - Wednesday, March 11, 2009 5:45 AM

Another alternative that I currently have (I haven't measured/compared it) is to use a COMPUTE clause (if the client can cope with multiple result sets).
select a, b, c
from tableX
where a = 1
compute count(a)

The best part is that it works in .NET 1.1 and SQL 2000 (most of the systems I support) Smile

Re-checking: the compute that I used was for obtaining the total of the result set (that was paginated). For pagination it's not so interesting because the number of rows retrieved is controlled by the client (the count should be for the whole table/view).

Select top 200 a, b, c
from tableX
where a = 1
compute sum(c)
By jcraddock - Wednesday, March 11, 2009 7:27 AM

These are all well and good, but I rarely need such results in a T-SQL window. Where I need them is in an application where I want to paginate using ASP.net etc.

In those cases, the best I can come up with is two calls. To be clear - I use dynamic sql to build all queries in my list windows. There I need the total count and the page of data required.
By Charles Cherry - Wednesday, March 11, 2009 1:53 AM

I just read today that ADO.Net Data Services 1.5 CTP (coming out in a few months) will have built-in support for server-side data paging.

http://blogs.msdn.com/astoriateam/archive/2009/03/01/announcing-ado-net-data-services-v1-5-ctp1.aspx
By StarNamer - Tuesday, March 10, 2009 11:59 PM

I tried this and found that it depends on the indexes available.

Using a 1.2 million record table as follows

CREATE TABLE [dbo].[tblInvoice](
[SpecialGLIndicator] [varchar](50) NOT NULL,
[DocumentNumber] [bigint] NOT NULL,
[PostingDate] [smalldatetime] NOT NULL,
[DocumentDate] [smalldatetime] NOT NULL,
[EnteredOn] [smalldatetime] NOT NULL,
[DocTypeID] [int] NOT NULL,
[Period] [varchar](4) NOT NULL,
[PKID] [int] NOT NULL,
[Amount] [money] NOT NULL,
[UserID] [int] NOT NULL,
[TranCodeID] [int] NOT NULL,
[CompanyID] [int] NOT NULL,
[Month] [smalldatetime] NOT NULL,
[WithPO] [tinyint] NOT NULL,
[Lines] [int] NOT NULL,
[docs] [int] NOT NULL,
[TransCode] [varchar](10) NOT NULL,
CONSTRAINT [PK_tblInvoice] PRIMARY KEY CLUSTERED
(
[DocumentNumber] ASC,
[CompanyID] ASC,
[Month] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_27754] ON [dbo].[tblInvoice]
(
[CompanyID] ASC,
[Month] ASC
)
INCLUDE ( [docs])
GO
CREATE NONCLUSTERED INDEX [IX_27777] ON [dbo].[tblInvoice]
(
[CompanyID] ASC,
[Month] ASC,
[WithPO] ASC
)
INCLUDE ( [UserID], [docs])
GO
CREATE NONCLUSTERED INDEX [IX_27779] ON [dbo].[tblInvoice]
(
[Month] ASC,
[WithPO] ASC
)
INCLUDE ( [UserID], [CompanyID], [docs])
GO
CREATE NONCLUSTERED INDEX [IX_31155] ON [dbo].[tblInvoice]
(
[UserID] ASC,
[Month] ASC
)
INCLUDE ( [CompanyID], [docs])
GO
CREATE NONCLUSTERED INDEX [IX_tblInvoice_PostingDate] ON [dbo].[tblInvoice]
(
[PostingDate] ASC
)
GO



I found that the 2-bite (select count(*) then select) method was most efficient.

-- select count
Table 'tblInvoice'. Scan count 1, logical reads 3855, physical reads 3, read-ahead reads 3851, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 810 ms.
-- select
(10 row(s) affected)
Table 'tblInvoice'. Scan count 1, logical reads 41, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1938 ms, elapsed time = 688 ms.

-- double row_number
(10 row(s) affected)
Table 'tblInvoice'. Scan count 4, logical reads 4251, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 11406 ms, elapsed time = 6212 ms.


However, if I add an index as follows.

CREATE NONCLUSTERED INDEX [IX_test1] ON [dbo].[tblInvoice]
(
[CompanyID] ASC,
[Month] DESC,
[DocumentNumber] ASC
)



The IO improves.

Table 'tblInvoice'. Scan count 1, logical reads 2832, physical reads 0, read-ahead reads 2821, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 350 ms.

(10 row(s) affected)
Table 'tblInvoice'. Scan count 1, logical reads 239, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 103 ms.

(10 row(s) affected)
Table 'tblInvoice'. Scan count 1, logical reads 2832, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 8345 ms, elapsed time = 4556 ms.


Unfortunately, the CPU and Elapsed times are still worse. So it seems it's not a panacea for all ills.

Still a very interesting technique which is worth trying. Thanks for the article.
By Scott Monnig - Wednesday, March 11, 2009 12:04 AM

Gang,

I suggest that Peso's first response at 9:21 AM, where a simple count(*) of the cols table is used, be seriously considered.

This approach, which we employ as well, serves the need well when you consider that you have already queried the table/tables in question in the "cols" portion. No need to go back again just for the count.

Consider that if you have developed a robust data layer in an application where ad-hoc querying is supported, the Row_Number() function may take multiple columns into consideration. The approach presented requires you to reverse the sort direction of each criterion. Peso's approach (and the one we employ) does not require you to do that. Simply query as you are required to in the "cols" or similar CTE expression and Count(*) that for your total rows which match your criteria.
By Dale Allen - Wednesday, March 11, 2009 4:08 AM

I agree with Peso. I tried the holy grail method and it took my procedure from 7 seconds (using the count(*) method) to 12 seconds to execute. Those 5 extra seconds would unfortunately leave an unhappy customer; nice idea though!
By werner.broser - Wednesday, March 11, 2009 12:16 AM

Nice method, but works only for unique fields!
By hhcosmin - Tuesday, March 10, 2009 11:31 PM

Well... we use the same approach but I do not fully agree with it. You say you execute the first complex query only once and then page on it. What if new data appears in the DB? I guess the set would be stale, so this should be coupled with some data retention policy to make the set of ids expire after a certain time. The second problem is that retrieving the first pages is very cheap in terms of cost; the price peaks when you reach the last page. Retrieving the last page with a query is almost as expensive as returning the whole set of ids, because the server must skip the first n records. The majority of users don't even bother to go beyond the first few pages, which means that the complex query you executed is mostly unused. In our case one can return tens of thousands of ids, or even millions, but the user is mostly interested in the first page. So when you have large sets of data, getting all the ids is mostly useless.

Another thing to note is that retrieving the count of records for a very complex query costs as much as getting the list of ids. So if the user does not need to know the exact count (which could be 100 or 10 million), one could retrieve the first thousand ids (depending on the needs), page through them, and say the query resulted in more than 1000 hits (a rough sketch follows below). The user is never interested in more than a few tens or hundreds of records anyway. This could also be coupled with a retention policy for further improvement. I believe this is the best solution when a query could potentially return a large number of records (more than 5000, let's say).
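
A rough sketch of that capped approach (table and filter names invented for illustration):

-- fetch at most 1001 keys; if 1001 come back, the UI can simply say "more than 1000 hits"
SELECT TOP (1001) OrderID
FROM dbo.Orders
WHERE CustomerID = 42 -- the complex filtering/joins go here
ORDER BY OrderDate DESC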

I have another amendment. Let's say the user, in a very special case, wants to go beyond the 50-100 pages that you retrieved. You could tell him that his query retrieved more than 5000 records and let him navigate freely to the next page beyond the first 50-100 pages, retrieving the records from the DB. This adds some complexity to the BAL but it could work.
By pfwojnar - Tuesday, March 10, 2009 11:32 PM

I've been using one of these two methods. The Optimizer seems to be smart enough to know not to do the COUNT multiple times. However, I've never done an analysis of this method quite as deep as the one you've done here. Great job by the way, I may need to change what I'm doing.

DECLARE @startRow INT ; SET @startrow = 50

;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, (SELECT COUNT(*) FROM cols) AS TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq

For the other way, replace (SELECT COUNT(*) FROM cols) AS TotRows with (SELECT MAX(seq) FROM cols) AS TotRows.
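
Written out in full, that second variant reads:

DECLARE @startRow INT ; SET @startrow = 50

;WITH cols
AS
(
SELECT table_name, column_name,
ROW_NUMBER() OVER(ORDER BY table_name, column_name) AS seq
FROM [INFORMATION_SCHEMA].columns
)
SELECT table_name, column_name, (SELECT MAX(seq) FROM cols) AS TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq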
By johanp.za - Wednesday, March 11, 2009 1:30 AM

Maybe not the best but I've been using this for years.
Keen to try the methods posted in the article.


USE [*********]
GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO
ALTER PROCEDURE [dbo].[ReturnRows]
(
@SQL nvarchar(4000),
@Page int,
@RecsPerPage int,
@ID varchar(255),
@Sort varchar(255)
)
AS
DECLARE @Str nvarchar(4000)
SET @Str='Select TOP '+CAST(@RecsPerPage AS varchar(20))+' * FROM ('+@SQL+') T WHERE T.'+@ID+' NOT IN
(select TOP '+CAST((@RecsPerPage*(@Page-1)) AS varchar(20))+' '+@ID+' FROM ('+@SQL+') T9 ORDER BY '+@Sort+') order by '+@Sort
PRINT @Str
EXEC sp_ExecuteSql @Str
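
A hypothetical call (the query, key column and sort column are made up for illustration):

EXEC dbo.ReturnRows
@SQL = N'SELECT * FROM dbo.Orders',
@Page = 3,
@RecsPerPage = 25,
@ID = 'OrderID',
@Sort = 'OrderID'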

By ryan_ternier - Wednesday, March 11, 2009 1:46 AM

Thanks for the article. I'm working on a query that returns a list of users, but I've never been able to do it with ROW_NUMBER() because of the ORDER BY. Here's what I have that works - the SQL 2000 way. Is there any better way of writing this?

ALTER PROCEDURE [dbo].[procSelectUsers] (
@p_vchLetterFilter VARCHAR(1),@p_vchOrderBy VARCHAR(20), @p_vchSortDirection VARCHAR(20), @p_vchQuickSearch VARCHAR(50), @p_intPage int, @p_intPageSize int
)

AS
BEGIN
DECLARE @TempTable TABLE(RowID INT IDENTITY, UserID int)

INSERT INTO @TempTable(UserID)
SELECT intUserID
FROM tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
ORDER BY
CASE
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchFirstName ASC, vchLastName ASC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchFirstName DESC, vchLastName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchUserName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchUserName ASC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchRoleTitle DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchRoleTitle ASC, vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchDescription DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchDescription ASC, vchLastName ASC, vchFirstName ASC))
END

SELECT UserID, vchFirstName, vchLastName,vchUserName, vchPassword,
vchEmail, IsNull(U.intSecurityRoleID,0) as intSecurityRoleID,
IsNull(U.intWorkgroupID,0) as intWorkgroupID, vchInitials, vchBadgeNumber,
SR.vchRoleTitle AS vchSecurityRole, W.vchDescription AS vchWorkgroupDesc
FROM @TempTable T, tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND T.UserID = U.intUserID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
AND t.RowID BETWEEN ((@p_intPage - 1) * @p_intPageSize + 1) AND (@p_intPage * @p_intPageSize)

END /* CREATE PROCEDURE procSelectAllUsers */





This is what I had before, but look at how it printed out the results - so it wouldn't work the way I needed:


ALTER PROCEDURE [dbo].[procSelectUsers] (
@p_vchLetterFilter VARCHAR(1),@p_vchOrderBy VARCHAR(20), @p_vchSortDirection VARCHAR(20), @p_vchQuickSearch VARCHAR(50), @p_intPage int, @p_intPageSize int
)

AS
BEGIN
DECLARE @startRowIndex INT
SET @startRowIndex = (@p_intPage * @p_intPageSize) + 1;


WITH Users as (
SELECT ROW_NUMBER() OVER (ORDER BY intUserID ASC) as Row,
intUserID, vchFirstName, vchLastName,vchUserName, vchPassword,
vchEmail, IsNull(U.intSecurityRoleID,0) as intSecurityRoleID,
IsNull(U.intWorkgroupID,0) as intWorkgroupID, vchInitials, vchBadgeNumber,
SR.vchRoleTitle AS vchSecurityRole, W.vchDescription AS vchWorkgroupDesc
FROM tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
)



SELECT
intUserID, vchFirstName, vchLastName,vchUserName, vchPassword,
vchEmail, intSecurityRoleID, intWorkgroupID, vchInitials, vchBadgeNumber,
vchSecurityRole, vchWorkgroupDesc
FROM Users
WHERE Row BETWEEN @startRowIndex AND @startRowIndex + @p_intPageSize - 1
ORDER BY
CASE
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchFirstName ASC, vchLastName ASC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchFirstName DESC, vchLastName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchUserName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchUserName ASC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchSecurityRole DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchSecurityRole ASC, vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchWorkgroupDesc DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchWorkgroupDesc ASC, vchLastName ASC, vchFirstName ASC))
END

END /* CREATE PROCEDURE procSelectAllUsers */



EXEC [procSelectUsers] '', '', 'ASC', '', 0, 6

A1
A2
B1
B2
C1
C2


EXEC [procSelectUsers] '', '', 'ASC', '', 1, 6

A3
A4
B3
B4
C3
C4


Results show an example of the output - not the actual output
By Jeff Moden - Wednesday, March 11, 2009 5:18 PM

First of all... very well written article. Nice to see examples like that with some performance stats. Unfortunately, the most important performance measure for the user (duration) wasn't measured, and the test data was kinda shy on row count.

It turns out that the "2 Bite" method is very fast provided that, as someone else already pointed out, you use the system tables to give you a leg up. And, if you have a clustered index on the table to keep it from being just a heap, the first "Bite" requires no joins... that means you can have the best of both worlds... performance and fewer reads.

Just to give everyone some common data to test on, here's a million rows of data to play with... only takes about 41 seconds including the index builds...

--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2

--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)

--===== Create and index for the lookups we expect
CREATE INDEX IX_JBMTest_SomeInt_SomeLetters2
ON dbo.JBMTest (SomeInt,SomeLetters2)



... and here's some code that tests two newer methods, the original "Holy Grail", and Peso's method. If someone else wants to test another method, please add it to this code and run the whole thing, because machine speeds vary and it would be nice to have it all together to compare...

--===== Define the starting row and page size
DECLARE @StartRow INT ; SET @StartRow = 900000
DECLARE @PageSize INT ; SET @PageSize = 50

PRINT '--============================================================================='
PRINT '-- The "Holy Grail" method'
PRINT '--============================================================================='

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "Holy Grail" method of getting a page of info
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
ROW_NUMBER() OVER(ORDER BY SomeInt DESC, SomeLetters2 DESC) AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows + Seq - 1 AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- The "No RBAR/No Join" method'
PRINT '--============================================================================='

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "No RBAR/No Join" method
;WITH
cteCols AS
(
SELECT NULL AS SomeInt, NULL AS SomeLetters2, 0 AS Seq, Rows AS TotRows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1
UNION ALL --------------------------------------------------------------------
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- A different No Join method'
PRINT '--============================================================================='

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== A different No Join method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT Rows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- Peso''s Embedded "2 Bite" method'
PRINT '--============================================================================='
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== Embedded "2 Bite" method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT COUNT(*) FROM dbo.JBMTest) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF



Here's how that played out on my humble single 1.8Ghz/1G Ram desktop running 2k5 sp2....

--=============================================================================
-- The "Holy Grail" method
--=============================================================================

(50 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 7594 ms, elapsed time = 9261 ms.
--=============================================================================
-- The "No RBAR/No Join" method
--=============================================================================

(51 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1265 ms, elapsed time = 1314 ms.
--=============================================================================
-- A different No Join method
--=============================================================================

(50 row(s) affected)
Table 'sysrowsets'. Scan count 50, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1157 ms, elapsed time = 1383 ms.
--=============================================================================
-- Peso's Embedded "2 Bite" method
--=============================================================================

(50 row(s) affected)
Table 'JBMTest'. Scan count 2, logical reads 3970, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1406 ms, elapsed time = 1466 ms.
By Jeff Moden - Thursday, March 12, 2009 2:10 PM

C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.


Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. Wink
By Jeff Moden - Thursday, March 12, 2009 2:26 PM

Jeff Moden (3/12/2009)
C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.


Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. Wink


... and, just to drive the point home, here's the code with both cache-clearing commands in it just before each test run AND I put the "Holy Grail" code as the last code to give it the absolute best chance AND I listed the results from the 1st run after I rebuilt the table. Caching has nothing to do with how badly the "Holy Grail" method takes a beating...


--===== Define the starting row and page size
DECLARE @StartRow INT ; SET @StartRow = 900000
DECLARE @PageSize INT ; SET @PageSize = 50

PRINT '--============================================================================='
PRINT '-- The "No RBAR/No Join" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "No RBAR/No Join" method
;WITH
cteCols AS
(
SELECT NULL AS SomeInt, NULL AS SomeLetters2, 0 AS Seq, Rows AS TotRows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1
UNION ALL --------------------------------------------------------------------
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- A different No Join method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== A different No Join method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT Rows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- Peso''s Embedded "2 Bite" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== Embedded "2 Bite" method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT COUNT(*) FROM dbo.JBMTest) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF

PRINT '--============================================================================='
PRINT '-- The "Holy Grail" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON

--===== The "Holy Grail" method of getting a page of info
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
ROW_NUMBER() OVER(ORDER BY SomeInt DESC, SomeLetters2 DESC) AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows + Seq - 1 AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY Seq

--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF



Here're the results...

--=============================================================================
-- The "No RBAR/No Join" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(51 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1391 ms, elapsed time = 1653 ms.
--=============================================================================
-- A different No Join method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)
Table 'sysrowsets'. Scan count 50, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1203 ms, elapsed time = 1371 ms.
--=============================================================================
-- Peso's Embedded "2 Bite" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)
Table 'JBMTest'. Scan count 2, logical reads 3970, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1453 ms, elapsed time = 2253 ms.
--=============================================================================
-- The "Holy Grail" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 7859 ms, elapsed time = 12073 ms.
By Jeff Moden - Friday, March 13, 2009 3:38 PM

For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique


Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. Wink
By Jeff Moden - Friday, March 20, 2009 9:00 AM

Robert Cary (3/14/2009)
Jeff Moden (3/13/2009)

Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. Wink


LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. Wink

This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.

Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.

I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.


Heh... understood... but the problem is that other folks, who may not have such limited numbers of rows, are reading your good article and I want them to know that your method is rather slow in the face of scalability.

To prevent problems like this, I will always write code against a million-row test table. It's a good habit to get into so that you're not called up at midnight on the last day of each month by your rather ticked-off boss wondering why your code has, yet again, caused him to miss the SLA deadlines for month-end runs.

Never justify performance challenged code with low row counts. It's an accident waiting to happen.
By jcraddock - Thursday, March 12, 2009 1:44 AM

jackie.williams (3/11/2009)
jcraddock (3/11/2009)
Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.


I'm not sure what you're saying here. We use stored procedures that generate dynamic queries to retrieve datasets similar to this all the time. Our datalayer takes a datarow containing the values for whatever parameters are necessary for the procedures and iterates through that to pass the necessary parameters into our stored proc, then the stored proc makes sure any sql injection chars etc are stripped, generates the dynamic select query with paging etc. based on whatever values have been passed into it and returns a dataset with the requested page of data and a full count of the records. Why would procs be out in what you're doing?


Well, all my applications are actually one application. All table information, validation rules, permissions, roles, etc., are stored in metadata. I generate the queries for list windows dynamically from the ASP.NET page with full filtering and sorting - one list page works for every single table in all systems, and similarly one edit page works for every table in every system... to do what you propose, I would have to move my OBJECT class into SQL, as it contains all the metadata. I have no interest in running .NET from my SQL Server for several reasons - number one being a total rewrite of the architecture. I was just hoping there was a way to do it directly from the ASP code instead of relying on a procedure. At the top of this window you can see the record count and then the page is displayed. I was hoping for a way (without rearchitecting my class onto SQL Server and taking the performance hit for turning that on) to get both numbers at once.

Here is a Screenshot:
By jcraddock - Wednesday, March 11, 2009 2:23 PM

Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.
By C. Westra - Wednesday, March 11, 2009 9:41 PM

I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.
By SwePeso - Wednesday, April 22, 2009 10:00 PM

Thank you for the feedback!
By R Michael - Friday, March 13, 2009 6:16 PM

Jeff Moden (3/13/2009)

Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. Wink


LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. Wink

This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.

Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.

I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.
By R Michael - Friday, March 13, 2009 4:41 AM

Thank you, everyone, for the eye-opening research. You guys have contributed some caveats to consider when trying to identify the best solution.

Steve's lead-in was a bit misleading in that I do not consider the 'holy grail' method to be "the best" in all cases (although it remains the best in cases that I've had to work with). I was defining the 'holy grail' as being able to get the count and the page without adding a bunch of I/O overhead.

Every time I've needed to implement server-side paging, the code would be paging the results of a query, sometimes simple, sometimes complex. As a result Jeff's 'no join' methods are out the window, which is a shame because that's a very cool trick! In the edge cases, these queries would page tens of thousands of records (nowhere near the 1-2 million record test cases we've seen on this thread, although I should have done tests like that for this article)

I started looking for a better approach to getting a count because the underlying query that had to be returned to the user was very complex and expensive. Doing the 'embedded two-bite' method or the others adds considerable cost and time to the query.

For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique, and I am convinced that there are other people out there who can benefit from it (which is why I wrote the article). Will this approach be the best in all cases? No.

There is definitely a lot of good feedback and information in this thread. I'll update the article with the information you shared.
By R Michael - Wednesday, March 11, 2009 8:16 AM

jcraddock (3/11/2009)
These are all well and good, but I rarely need such results in a T-SQL window. Where I need them is in an application where I want to paginate using ASP.net etc.

In those cases, the best I can come up with is two calls. To be clear - I use dynamic sql to build all queries in my list windows. There I need the total count and the page of data required.


This is exactly the scenario I was imagining when putting this solution together, with the idea being that you can return both the total count and page n in a single call/read.

There have been some very interesting comments and ideas in this thread. As far as this idea goes, it works well in some cases, and not so well in others (as highlighted by other posters).

I built a paging proc based on this principle and, after extensive testing, found it to be the best approach. I saw only a few ms of CPU and elapsed-time overhead. I would be very interested to know the size of some of the datasets Peso and others are using. This would be very useful in determining the best solution for a given problem.

Thank you, everyone, for your comments and insight!
By R Michael - Wednesday, March 11, 2009 8:29 AM

Actually, rereading the thread, I see that Peso did post the size of his dataset; 2,000,000 rows.

It would be interesting to determine a relative cut-off point where the row_number trick is the most efficient and where the two-bite approach becomes best.

For my purposes, the biggest cost was the query itself. The underlying tables hold hundreds of millions of records, but the data is pretty well filtered down by the time we get to the record set to page. To avoid running that query twice was a big win.

I deliberately avoided including time statistics in my comparisons because they can be very subjective (unless they demonstrate a clear performance difference as a few posters noted).
By R Michael - Thursday, March 12, 2009 2:57 AM

Thomas Bühler (3/12/2009)
Nice article, but I miss a solution for handling custom sorting when using paging functionality.


Yeah, I deliberately left that bit out because it had been covered by so many other articles. The 'real world' problem I chose this approach for does implement custom sorting by using conditional logic in the OVER(ORDER BY).
By Thomas Bühler - Wednesday, March 11, 2009 7:44 PM

Nice article, but I miss a solution for handling custom sorting when using paging functionality.
In practice (let's say an ASP.NET application with a GridView component) you need to sort dynamically by a column other than the ID, so you have to take care of this before you do the paging.
Look here: http://aspnet.4guysfromrolla.com/articles/032206-1.aspx#
If you combine the custom sort capability together with the discussed Total Row retrieval you can get most out of this...

Cheers
By th-spam - Tuesday, October 20, 2009 8:15 AM

Really nice solution for the total row count and paging problem; and you can't miss the analogy to the classical Gauss argument for the sum of the first N integers, 1 + 2 + ... + N = N*(N+1)/2:

1 + 2 + ... N
N + (N-1) ... 1

Smile
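
Spelling the pairing out for the double ROW_NUMBER() trick (my own restatement, assuming the ordering column is unique): for the k-th of N rows,

seq = k
totrows = N - k + 1
seq + totrows - 1 = N

so every row carries the full row count without a separate COUNT(*).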
By aaron.bertrand - Friday, March 20, 2009 6:08 AM

Just for the benefit of the folks talking about grabbing the count of all rows in the table separately, note that in a lot of cases we can't look at sys.indexes or the more reliable sys.dm_db_partition_stats because our queries have WHERE clauses or are JOINed to other tables.
By fred_ch9400_cn - Thursday, June 18, 2009 2:23 AM

werner.broser said
Nice method, but works only for unique fields!
-- I agree.

When the order expression does not have a unique value, "The Holy Grail" can make mistakes:

DECLARE @t1 TABLE ([x] INT, [y] INT)
INSERT INTO @t1 ([x], [y])
SELECT 1, 1
UNION ALL
SELECT 1, 2

SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY [x] ASC)
+ ROW_NUMBER() OVER (ORDER BY [x] DESC)
- 1 AS [Sum]
FROM @t1) t



The query result gives
x y Sum
1 1 1
1 2 3
which obviously cannot satisfy us.

For rows where the order expression gives the same value, their relative order will not change regardless of the "ASC" or "DESC" option; they simply follow the order in which they happen to be physically stored. The previous example shows this.

We should know this limitation and not apply "The Holy Grail" in the wrong place.
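
One common workaround (my addition, not from the article) is to include a unique tie-breaker column in both ORDER BY lists so the two numberings are exact mirrors of each other. With the sample above, assuming ([x], [y]) is unique:

DECLARE @t1 TABLE ([x] INT, [y] INT)
INSERT INTO @t1 ([x], [y])
SELECT 1, 1
UNION ALL
SELECT 1, 2

SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY [x] ASC, [y] ASC)
+ ROW_NUMBER() OVER (ORDER BY [x] DESC, [y] DESC)
- 1 AS [Sum]
FROM @t1) t

-- both rows now report Sum = 2, the true row count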
By jackie.williams - Wednesday, March 11, 2009 3:32 PM

jcraddock (3/11/2009)
Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.


I'm not sure what you're saying here. We use stored procedures that generate dynamic queries to retrieve datasets similar to this all the time. Our datalayer takes a datarow containing the values for whatever parameters are necessary for the procedures and iterates through that to pass the necessary parameters into our stored proc, then the stored proc makes sure any sql injection chars etc are stripped, generates the dynamic select query with paging etc. based on whatever values have been passed into it and returns a dataset with the requested page of data and a full count of the records. Why would procs be out in what you're doing?
By ningaraju.n - Wednesday, February 10, 2010 10:33 PM

Hi Friend,
Can you suggest an alternative that does not use a trigger?
By alan.kelly.london - Wednesday, April 22, 2009 9:42 PM

The original article was very well written and it's sooo nice to see statistics backing up the arguments presented.

However, having read the forums here I have to agree with Peso that overall time is perhaps more important than just I/O reads, certainly in terms of customer experience anyway!

I've tried Peso's method on several tables of real data with rows ranging from a few hundred to over 500,000 and in every single case Peso's method is quicker by a factor of at least 10.

Thanks for the post, it has made a massive difference to my paging search times.
By sicsemperspam - Thursday, April 30, 2009 9:40 AM

Hello all,

After benefitting from the code examples and explanations above, I thought I'd make a contribution.

Below is a generic stored procedure that will retrieve the page of rows based on the parameters fed to it. It determines whether the table called for is in fact a table or a view. If it's a table, it uses the sys-tables to get the total row count (thanks to a post above this one!) and if it's a view, it uses the original strategy of calculating the inverse row # for a given row. I didn't compute the total rows because I thought whatever was calling it would only have to make the computation once rather than once-per-row if I had done it here.

Before the Efficiency Police haul me away: this was slapped together as a proof of concept and is not (in any way) presented as the best way of doing this. And there is one rule: your ORDER BY clause must include ASC where appropriate. This made inverting the order-by merely ugly, down from absolutely hideous.


set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go


ALTER PROCEDURE [dbo].[GetPage]
-- Add the parameters for the stored procedure here
(
@TableName varchar(1000)
,@Fields varchar(1000)
,@OrderBy varchar(1000)
,@StartPosition int
,@PageLen int
)
AS
Begin

DECLARE
@strSQL varchar(max)
,@RowCount int
,@RightNow DATETIME
,@TableType varchar(20)
,@RevOrderBy varchar(1000)

Set @RightNow=GETDATE()

Set @PageLen = @PageLen - 1
if not EXISTS (select * from INFORMATION_SCHEMA.tables where table_name = @TableName)
begin
Select 'Table/View not found' as Error
return (-1)
end

select @TableType = TABLE_TYPE from INFORMATION_SCHEMA.tables where table_name = @TableName
if @TableType = 'BASE TABLE'
begin
SELECT @RowCount=P.Rows
FROM sys.partitions P
INNER JOIN sys.indexes I ON (P.object_id = I.object_id) AND (P.index_id = I.index_id)
INNER JOIN sys.objects O ON (P.object_id = O.object_id)
WHERE (I.type IN(0,1)) AND (O.name = PARSENAME(@TableName, 1) AND
O.[schema_id] = IsNull(SCHEMA_ID(PARSENAME(@TableName, 2)), O.[schema_id]))
ORDER BY O.name
set @strSQL =' SELECT * ' +
', ' + convert(varchar(10),@RowCount) + ' as [RowCount] ' +
' FROM (SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber ' +
' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +
' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +
' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'
end
else
begin
set @RevOrderBy = upper(@OrderBy) + ','
set @RevOrderBy = replace(replace(replace(@RevOrderBy ,' ASC,',' ~A,'),' DESC,',' ASC,'),' ~A',' DESC')
set @RevOrderBy = left(@RevOrderBy ,len(@RevOrderBy)-1)
set @strSQL =' SELECT *' +
' FROM (' +
'SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber, ' +
' ROW_NUMBER() OVER(ORDER BY ' + @RevOrderBy + ') AS RowNumberInverse ' +
' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +
' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +
' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'
end
--SELECT SQL=@strSQL
EXEC(@strSQL)
End



Enjoy!
By peter.krickner - Wednesday, October 14, 2009 1:33 AM

Thank you for this. I've used it now, but haven't studied reads and performance gains (or losses) yet. I realized, though, that the column(s) used for sorting have to be unique in order for the total row count to be accurate. If there are several equal values in the sorted column, the total row count will not be accurate; you have to make sure that the reverse sort is 100% the opposite of the initial sort. Just a simple addition to the solution after trying it out. Thank you again!
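
A minimal sketch of that fix, assuming a hypothetical dbo.Orders table with a unique OrderID key used as the tiebreaker in both directions:

DECLARE @startRow INT ; SET @startRow = 50

;WITH cols
AS
(
SELECT OrderDate, OrderID,
ROW_NUMBER() OVER(ORDER BY OrderDate ASC, OrderID ASC) AS seq,
ROW_NUMBER() OVER(ORDER BY OrderDate DESC, OrderID DESC) AS totrows
FROM dbo.Orders
)
SELECT OrderDate, OrderID, seq + totrows - 1 AS TotRows
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq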

/ Peter.
By peter.krickner - Wednesday, October 14, 2009 1:37 AM

Sorry for pointing out an already-noticed "feature". I had only read the earliest posts...
By Jeff Moden - Monday, December 6, 2010 12:27 PM

Paul Muharsky-474732 (5/20/2010)
I disagree that all your SQL should be optimized to 1 million+ records.


Heh... Yeah... I know I'm responding to an old post, but the kind of attitude above is why there are so many posts on this site that start with "I have this performance problem. Can anyone help?" If you look back at Peso's code, 4 milliseconds vs. almost 6 seconds isn't something that anyone in their right mind would disagree with. ;-)

Write high performance code as if your life depended on it because saying "That's not my job" may be a self-fulfilling prophecy. ;-)
By SQLRNNR - Friday, May 7, 2010 4:52 AM

Nice article. Thanks for sharing.
By Mike Dougherty - Friday, May 7, 2010 2:32 AM

I haven't implemented this yet, but my thinking on paging is this:

1) Store each unique combination of request parameters in a table (the parent).
2) Run the query and store the results in another table with a foreign key to the parent table (a rough sketch of both tables follows this list). Since this result is now read-only and there will be no provision for updating the rows, update the parent row's resultrowscount for easy access to the total.
3) Also set cache timeout values on the parent. Large result sets that took a long time to run are given a long cache timeout, while smaller results get shorter timeouts. Perhaps the data itself determines its lifetime: a query on real-time data is stale in 3 minutes, while a first-quarter report (run in May) has a life of 30 days or more.
4) Pages are retrieved from the prefetched (and denormalized) child table.
Further requests for the same query are served from the prefetched result if the results are within the staleness tolerance. If the data is stale, the old key is marked obsolete and the process starts over. (This allows a nice audit/analysis to be done on cache hits and results, rather than deleting old results.)
If business rules allow a TOP N to prevent absurdly large results then go for it; otherwise the cache at least mitigates the need to redundantly query the results from the source system.
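
A rough sketch of those two tables, with all names hypothetical:

-- Parent: one row per unique combination of request parameters
CREATE TABLE dbo.QueryCache
(
QueryCacheID INT IDENTITY(1,1) PRIMARY KEY,
ParametersHash VARBINARY(32) NOT NULL, -- identifies the unique parameter combination
ResultRowsCount INT NULL, -- updated once the child rows are stored
CreatedAt DATETIME NOT NULL DEFAULT (GETDATE()),
ExpiresAt DATETIME NOT NULL, -- cache timeout; longer for large/expensive results
IsObsolete BIT NOT NULL DEFAULT (0) -- stale results are marked, not deleted, for audit/analysis
)

-- Child: the denormalized, read-only result rows, keyed back to the parent
CREATE TABLE dbo.QueryCacheRow
(
QueryCacheID INT NOT NULL REFERENCES dbo.QueryCache (QueryCacheID),
RowNumber INT NOT NULL, -- pages are served as ranges of this column
Payload NVARCHAR(MAX) NOT NULL, -- or explicit columns matching the result shape
PRIMARY KEY (QueryCacheID, RowNumber)
)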

I think a better question is whether users really need pages of data more than they need better results in general. "Eyeballing" long lists is a huge waste of time. Thanks to Google (et al.) we expect relevant search results to be on the first page. If we're not sure what we're looking for, we might desperately look at page 2 or 3. After that, we hopefully have an idea of how to search more specifically for better results. This makes search an iterative process of refining criteria rather than a brute-force perusal of pages of results. I was going to make this concept the focus of my seminar paper this semester, but the day the topic proposal was due I read the SQLServerCentral article on genetic algorithms in T-SQL, so I did that instead.

I urge anyone thinking of implementing results paging to consider how/why the user can be better served with a more conversational and iterative process. Our job should be to fetch what they really want from our database (even when they aren't sure what they want). Simply giving the user multiple pages of what they asked for makes us look lazy and makes them work much harder than they should.
By srinivas-254339 - Sunday, December 5, 2010 4:40 PM

Nice solution, and very nicely written too, but the holy grail solution does not give the correct 'Total count of records' if the ROW_NUMBER() OVER(ORDER BY ...) column is not unique.
By srinivas-254339 - Sunday, December 5, 2010 4:46 PM

Thanks, SSC Rookie. Your solution helped me.
By brian lewis - Friday, May 7, 2010 10:14 AM

One small observation from trying this: you have to be sure that the ORDER BY fields are unique, otherwise the Ascending and Descending sequences may not mirror each other, so the sum of the UP and DOWN row_number() fields is not guaranteed to be the total row count.

With that in mind, though, the central idea here does give another way to calculate a MEDIAN - sort for row_number on the value you are interested in (possibly including an ID in the ORDER BY to separate duplicate values), then your median is the value where UP and DOWN are the same, or, if there is an even number of values, it's the average of the two values where abs(UP - DOWN) = 1

Both these cases reduce to
avg(value).....WHERE (UP - DOWN) between -1 and 1



This snippet finds the median of the field enSum, in the table called Enrolments:


with RankedEnrol
As
(
Select enSum
, row_number() over (ORDER BY enSum, enID) UP -- identity is included in the ORDER BY to ensure uniqueness
, row_number() over (ORDER BY enSum DESC, enID DESC) DOWN
FROM Enrolments
)
SELECT avg(enSum) MedianEnrol
from RankedEnrol
WHERE (UP - DOWN) between -1 and 1

By SwePeso - Monday, December 6, 2010 1:15 PM

It seems the dual TOP/ORDER BY was the most efficient approach at the time.
I tested the new Denali OFFSET/FETCH and got the exact same number of reads for @StartRow = 5000, but 2 ms instead of 4 ms.


SELECT Number
FROM dbo.TallyNumbers
ORDER BY Number
OFFSET @StartRow - 1 ROWS
FETCH NEXT 50 ROWS ONLY
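
For comparison, the dual TOP/ORDER BY approach referred to above is typically written something like this (a sketch against the same TallyNumbers table, assuming @StartRow has already been declared):

SELECT Number
FROM (
SELECT TOP (50) Number
FROM (
SELECT TOP (@StartRow + 49) Number
FROM dbo.TallyNumbers
ORDER BY Number
) AS FirstRows
ORDER BY Number DESC
) AS PageRows
ORDER BY Number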

By alen teplitsky - Friday, May 7, 2010 1:21 AM

Can someone give me a real-world example of when this would be used in production? I've never seen it used, but that could be because a lot of our code was written for SQL 2000 and just migrated over with minimal changes.
By TheSQLGuru - Saturday, October 26, 2013 11:16 AM

Sorry, but at first blush I think parameter sniffing (without some other query addition) could totally screw you here. Maybe an OPTION (RECOMPILE) could help, depending on what version of SQL Server you are on.
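
For illustration, a minimal sketch of that hint applied to a ROW_NUMBER paging query over the TallyNumbers table used elsewhere in this thread (parameter values are arbitrary):

DECLARE @startRow INT ; SET @startRow = 5000

;WITH cols
AS
(
SELECT Number,
ROW_NUMBER() OVER(ORDER BY Number) AS seq
FROM dbo.TallyNumbers
)
SELECT Number
FROM cols
WHERE seq BETWEEN @startRow AND @startRow + 49
ORDER BY seq
OPTION (RECOMPILE) -- compile a fresh plan for the actual parameter values on every execution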
By Paul Muharsky-474732 - Thursday, May 20, 2010 1:59 AM

Numerous people have noted that the "holy grail" pattern does not work when there are duplicate values in the sort column.

A simple workaround for this is:

WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY nonunique ASC) AS seq
FROM MyTable
),
Reverse AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY seq DESC) AS tot
FROM CTE
)
SELECT *, seq + tot - 1 AS totalRecs -- replace * with the columns you actually need
FROM Reverse

In my tests, this did not add any noticeable overhead over the standard "Holy Grail" pattern.

I find this does work well for small sets and, contrary to another poster's statement, I disagree that all your SQL should be optimized for 1 million+ records. You should use the correct SQL and indexes for your workload, and you should have a good understanding of your workload, expected growth, access pattern, read/write ratio, etc.

If I have a system that has little data, but a high transaction count, wicked processors and a slow disk sub-system, this pattern is ideal. For large amounts of data, this pattern has been clearly shown to break down and would not necessarily be the correct choice.

I'd be interested in seeing differences in performance with real-world examples of complex queries, as paging is typically used on search results, and search queries rarely, if ever, read from a single table as most of the examples both for and against this pattern have assumed.
By Carla Wilson-484785 - Saturday, May 8, 2010 5:31 AM

brian lewis (5/7/2010)
... the central idea here does give another way to calculate a MEDIAN - sort for row_number on the value you are interested in (possibly including an ID in the ORDER BY to separate duplicate values), then your median is the value where UP and DOWN are the same, or, if there is an even number of values, it's the average of the two values where abs(UP - DOWN) = 1

Both these cases reduce to
avg(value).....WHERE (UP - DOWN) between -1 and 1



This snippet finds the median of the field enSum, in the table called Enrolments:


with RankedEnrol
As
(
Select enSum
, row_number() over (ORDER BY enSum, enID) UP -- identity is included in the ORDER BY to ensure uniqueness
, row_number() over (ORDER BY enSum DESC, enID DESC) DOWN
FROM Enrolments
)
SELECT avg(enSum) MedianEnrol
from RankedEnrol
WHERE (UP - DOWN) between -1 and 1


Nice!
By Tony++ - Friday, May 7, 2010 2:34 AM

To pull an idea from my ...um, thin air...
When I'm looking at the actual execution plan, each operator knows how many rows it handled, and the operators used in the initial selection show the total rows.
What's the merit of returning the plan XML to the caller and letting the caller dig the total rows out of the XML? It at least lets the database just handle the results, and moves the 'Total rows' workload out to the caller.
By Dave Ballantyne - Monday, December 6, 2010 8:27 PM

SwePeso (12/6/2010)
It seems the dual TOP/ORDER BY was the most efficient approach at the time.
I tested the new Denali OFFSET/FETCH and got the exact same number of reads for @StartRow = 5000, but 2 ms instead of 4 ms.


SELECT Number
FROM dbo.TallyNumbers
ORDER BY Number
OFFSET @StartRow - 1 ROWS
FETCH NEXT 50 ROWS ONLY




My take on the Denali paging....

http://sqlblogcasts.com/blogs/sqlandthelike/archive/2010/11/10/denali-paging-is-it-win-win.aspx

and

http://sqlblogcasts.com/blogs/sqlandthelike/archive/2010/11/19/denali-paging-key-seek-lookups.aspx
By barry-591755 - Friday, May 7, 2010 6:49 AM

I use a stored procedure to run the paged query. It runs the query twice: once to count, once to get the paged result set. I then return the count as the return value from the stored procedure, and the result set is returned to the caller.
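
A minimal sketch of that shape, using hypothetical object names (dbo.Orders, OrderID):

CREATE PROCEDURE dbo.GetOrdersPage
@StartRow INT,
@PageSize INT
AS
BEGIN
SET NOCOUNT ON

DECLARE @Total INT

-- First pass: count the matching rows
SELECT @Total = COUNT(*) FROM dbo.Orders

-- Second pass: return the requested page
;WITH Numbered AS
(
SELECT OrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderID) AS seq
FROM dbo.Orders
)
SELECT OrderID, OrderDate
FROM Numbered
WHERE seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY seq

-- The total row count comes back as the procedure's return value
RETURN @Total
END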

I also use Xml to pass in the search parameters and attributes for the paging information, but that would be a topic for another day.

Barry
By peter-757102 - Thursday, May 6, 2010 8:10 PM

Just to throw in my 2c, I try to stay away from techniques that do not work reasonably efficiently, or that completely break down in other ways, when the query to be paged spans multiple tables. It is far easier to design paging for a single table than it is for multiple tables.

That is why I like the idea of first doing a limited (TOP X) initial query that fetches all the PKs in the desired order, followed by queries that each window over the fetched keys to quickly and efficiently fetch the actual data to display. This allows 'browsing' over the results in a detail view if the user clicks on the grid.
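
A minimal sketch of that two-phase approach, using hypothetical names (dbo.Orders, OrderID, #PageKeys):

-- Phase 1: capture the keys once, in the desired order
SELECT TOP (1000)
ROW_NUMBER() OVER (ORDER BY o.OrderDate, o.OrderID) AS seq,
o.OrderID
INTO #PageKeys
FROM dbo.Orders AS o
ORDER BY o.OrderDate, o.OrderID

-- Phase 2: fetch one page of display data by joining back on the narrow key list
DECLARE @StartRow INT, @PageSize INT
SET @StartRow = 51
SET @PageSize = 50

SELECT o.OrderID, o.OrderDate, o.CustomerName
FROM #PageKeys AS k
INNER JOIN dbo.Orders AS o ON o.OrderID = k.OrderID
WHERE k.seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY k.seq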

If browsing is not required you can ** maybe ** get away with recording only the keys of the page starts, and then build a (more complex) query that uses a page-start key as a reference to get the next PageSize-1 results for each page requested, assuming the data to query has a low mutation rate.

But whatever method you use in complex queries, having a narrow index that covers the fields being filtered and sorted on is a must. Other than that, there is (I think) no way to get down to the minimum possible I/O.
By peter-757102 - Monday, December 6, 2010 7:50 PM

SwePeso (12/6/2010)
It seems the dual TOP/ORDER BY was the most efficient approach at the time.
I tested the new Denali OFFSET/FETCH and got the exact same number of reads for @StartRow = 5000, but 2 ms instead of 4 ms.


SELECT Number
FROM dbo.TallyNumbers
ORDER BY Number
OFFSET @StartRow - 1 ROWS
FETCH NEXT 50 ROWS ONLY



This is unreleased SQL Server 2011 functionality only, right?
By Doug Bass - Friday, May 7, 2010 8:28 AM

This article introduces a novel approach (in my experience) to tackling an old problem, and I think the concept behind it is great. However, the article could have been much better written, with greatly improved examples.

First, I would have liked to have seen a table in a sample database used, rather than [INFORMATION_SCHEMA].columns. This confused me at first.

Second, in the "grail" query, the second "row_number" column is aliased as "totrows". This should have been named something like rseq, since it actually represented the reverse sequence number, not the total number of rows. This was confusing too.

Third, I could not find mention of the fact that the columns in the OVER() clause of the "grail" query MUST include the primary key, and the columns must be listed in the same order in both OVER() clauses or it throws the calculation off.

Fourth, a lot of confusion could have been eliminated, and it would have been a lot more useful to the reader, if the article had also included some query examples with filter criteria to demonstrate the effect on the row count (viz. that # of rows returned was not the number of rows in the entire table).

Lastly, and this is a minor point, I was not familiar with the WITH construct. Speaking only for myself, it would have been nice if this had been explained too.

I give 5 stars for the concept, and 2 stars for the writing.
By mbarkell - Thursday, May 6, 2010 11:15 PM

I quite enjoyed reading this article. I found myself going "come on, come on, the solution, the solution", but then was pleased when the solution was presented. I find keeping the count of records within each record at constant big-O cost great.
By neil.genias - Monday, November 7, 2011 4:45 PM

This is good. But I wanted to ask: what about the situation where, say, there are only 5 records in the table and you ask for page number 2 with a page size of 10?
In that case no data is returned to you, and you will not get to know how many records there are in the table.
By lukas.eder - Saturday, October 26, 2013 5:58 AM

I'm quite surprised that the "holy grail of paging" doesn't mention the (granted a bit less popular) "seek method" as described here:
http://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/

Take the following example:


SELECT TOP 10 first_name, last_name, score
FROM players
WHERE (score < @previousScore)
OR (score = @previousScore AND player_id < @previousPlayerId)
ORDER BY score DESC, player_id DESC



The @previousScore and @previousPlayerId values are the respective values of the last record from the previous page. This allows you to fetch the "next" page. If the ORDER BY direction is ASC, simply use > instead.

With the above method, you cannot immediately jump to page 4 without having first fetched the previous 40 records. But often, you do not want to jump that far anyway. Instead, you get a much faster query that may be able to fetch data in constant time, depending on your indexing. Plus, your pages remain "stable" even if the underlying data changes (e.g. on page 1, while you're on page 4).
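
For reference, the ASC variant mentioned above would look roughly like this (same hypothetical players table):

SELECT TOP 10 first_name, last_name, score
FROM players
WHERE (score > @previousScore)
OR (score = @previousScore AND player_id > @previousPlayerId)
ORDER BY score ASC, player_id ASC
-- a composite index on (score, player_id) typically supports this seek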

This is the best way to implement paging when lazy loading more data in web applications, for instance.