Blog Post

Offset without OFFSET

,

A while ago Robert Cary posted an article on SQL Server Central entitled 2005 Paging – The Holy Grail which is, as the title would suggest about paging in SQL Server.  This article provoked some really interesting chat around the subject and is well worth a read.

This is now a lot easier in SQL Server 2012 with the introduction of the OFFSET extension to the ORDER BY clause,  but what is the most optimal method is you are not using 2012 ?

Well, whilst playing around by the OFFSET portion of my “What’s new in SQL Server 2012 – TSQL” presentation, I hit on a different method that I’ve not seen published before.

Now whilst finding which rows are on which page is a problem, it is only part of a much wider problem, that being that cost of the lookups to find other related data.  For example:  You have a list of people which you are paging through in the order of LastName,  but you also wish to display FirstName.  That is not in your index and so a key lookup occurs, OK I could INCLUDE it in the index but im just simplifying the problem.

So, to demonstrate this I need to create an index on Person.Person in AdventureWorks.

Create index idxLastName on Person.Person(LastName)
The query for the “holy grail” method would look something like this :
with ctePaging
as
(
Select LastName,FirstName,
       row_number() over (order by LastName,BusinessEntityID)-1  as RowN
 from  Person.Person
)
Select * from ctePaging
where RowN between 20 and 39 
order by RowN;

The issue here is that SQL Server has initiated an index scan (against a different index than the one we created) and had to process all the rows in the table and then sort them. 

image

We only want 20 rows returned so this is quite a lot of wasted effort on the engine's part.

OFFSET has been introduced in 2012 and running the equivalent query of :

Select LastName,FirstName,BusinessEntityID
 from  Person.Person
 order by LastName,BusinessEntityID
 offset 20 rows fetch next 20 rows only;

Gives us the query plan of :

image

Even this is non-optimal though, as the key lookup has occurred 40 times , even though we only needed the data (in this case FirstName) for 20 rows.

This can be resolved by doing the key lookup yourself.

with cteKeySeek
as
(
Select LastName,BusinessEntityID
 from  Person.Person
 order by LastName,BusinessEntityID
 offset 20 rows fetch next 20 rows only
)
Select cteKeySeek.LastName,
       FirstName,
       cteKeySeek.BusinessEntityID
 from  cteKeySeek
 inner join  Person.Person   
   on  cteKeySeek.BusinessEntityID =   Person.BusinessEntityID
order  by cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID;

Even though its longer, wordier and involves a join , it is more efficient as the join has replaced the key lookup and it is now only occurring on the 20 rows of data that we need

image

Quite neat hey ? When using OFFSET it is important to remember that no magic is happening, SQL Server still has to ‘count’ and scan through the rows that are not to be processed before it can decide which ones it does need. 

A comparable query for previous versions and taking the lead from the holy grail method would be :

with cteKeySeek
as
(
Select BusinessEntityID,LastName,
       row_number() over (order by LastName,BusinessEntityID)-1 
             as RowN
 from  Person.Person
)
Select cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID ,RowN
  from cteKeySeek
  inner loop join  Person.Person   
   on  cteKeySeek.BusinessEntityID =   Person.BusinessEntityID
where RowN >= 20 and rown<=39
order by LastName,BusinessEntityID;

Which does similarly filter the rows before doing the index lookup

image

It does however still involve a scan of 19,972 rows of which 19,932 are irrelevant to our final result set.  You may of noticed in the OFFSET versions that the TOP operator is used to filter the data and ‘stop’ the scan once it has reached the last row that we are interested in.  What if we could do something similar.

What about this ?:

with cteKeySeek
as
(
Select BusinessEntityID,LastName,
       row_number() over (order by LastName,BusinessEntityID)-1 
             as RowN
 from  Person.Person
)
Select top(20) cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID ,RowN
  from cteKeySeek
  inner loop join  Person.Person   
   on  cteKeySeek.BusinessEntityID =   Person.BusinessEntityID
where RowN >= 20 and RowN<=39
order by LastName,BusinessEntityID;

That does have the rather interesting effect of doing exactly that:

image

So, this is looking (at least in-terms of rowcounts) very similar to the OFFSET functionality.  If we look for a page of data further on (rows 200 to 219) and look at an profiler trace we can see how the three type of query compare.

image

So as you can see over a medium size (ish) dataset the fake and real offset are comparable in terms of IO.

Hope this helps someone, who needs to do paging

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating