Custom Pagination in SQL Server 2005

  • Peter,

    I love how you only return the totals in the first row of the output. This will definitely reduce the amount of data being returned to the application. I have a few things to note here about the functionality, and you may have addressed them in your environment. First, there should be some sort of filter in the first CTE. This way you can reduce the amount of data searched through. Secondly, have you looked at using a stored procedure to handle the code, instead of the application executing ad-hoc SQL? The reason I ask is that the db engine will never be able to reuse a query plan unless an exact binary match already exists. This means you are not benefiting from query plan reuse or parameterization. Also, by introducing a stored procedure solution, you can remove the nested CTE and use variables to maintain the totals. You can then apply a CASE statement to the final result set to only put the totals on the first row. For example: CASE WHEN @PgFirstRecord = RowNbr THEN @TotalPages ELSE NULL END.
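
    For illustration, here is a minimal sketch of the stored-procedure variant described here (the dbo.Contacts table, the FirstName/LastName columns and the procedure name are assumptions for the example; @PgFirstRecord, RowNbr and @TotalPages come from the suggestion above):

    -- Sketch: totals kept in variables, placed only on the first row via CASE.
    CREATE PROCEDURE dbo.usp_GetContactPage
    (
        @PgFirstRecord INT,   -- row number of the first row of the requested page
        @PgLastRecord  INT    -- row number of the last row of the requested page
    )
    AS
    BEGIN
        SET NOCOUNT ON

        DECLARE @TotalRows INT, @TotalPages INT, @PageSize INT
        SET @PageSize = @PgLastRecord - @PgFirstRecord + 1

        -- maintain the totals in variables instead of a nested CTE
        SELECT @TotalRows = COUNT(*) FROM dbo.Contacts
        SET @TotalPages = CEILING(1.0 * @TotalRows / @PageSize)

        ;WITH PagingCTE AS
        (
            SELECT ContactID, FirstName, LastName,
                   ROW_NUMBER() OVER (ORDER BY LastName, FirstName) AS RowNbr
            FROM dbo.Contacts
        )
        SELECT ContactID, FirstName, LastName,
               -- totals only on the first row of the page
               CASE WHEN @PgFirstRecord = RowNbr THEN @TotalPages ELSE NULL END AS TotalPages,
               CASE WHEN @PgFirstRecord = RowNbr THEN @TotalRows  ELSE NULL END AS TotalRows
        FROM PagingCTE
        WHERE RowNbr BETWEEN @PgFirstRecord AND @PgLastRecord
        ORDER BY RowNbr
    END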

  • TheSQLGuru (1/8/2009)


    Peter, I couldn't really follow the code you had. Was it supposed to be just separate batch files or was some/all of it supposed to be wrapped up in sprocs/functions/etc? Any chance you can attach it as a zip of appropriate file types?

    You just have to paste each section together and it will build a nested CTE.

  • TheSQLGuru (1/8/2009)


    Peter, I couldn't really follow the code you had. Was it supposed to be just separate batch files or was some/all of it supposed to be wrapped up in sprocs/functions/etc? Any chance you can attach it as a zip of appropriate file types?

    If you concatenate all the code snippets you have a working example. It consists of two parts: the first code block creates a demo table (I assume you got that working). The others are meant to be one, but to allow for some comments I put them in separate code blocks.

    The forum smileys are irritating though; usually they mean ')', but I see they sometimes cover the text "as" as well (when assigning a table alias). To be clear, I will add an attachment this round, containing separate scripts for generating a demo table and the demo itself.

  • Adam Haines (1/8/2009)


    Peter,

    I love how you only return the totals in the first row of the output. This will definitely reduce the amount of data being returned to the application. I have a few things to note here about the functionality, and you may have addressed them in your environment. First, there should be some sort of filter in the first CTE. This way you can reduce the amount of data searched through. Secondly, have you looked at using a stored procedure to handle the code, instead of the application executing ad-hoc SQL? The reason I ask is that the db engine will never be able to reuse a query plan unless an exact binary match already exists. This means you are not benefiting from query plan reuse or parameterization. Also, by introducing a stored procedure solution, you can remove the nested CTE and use variables to maintain the totals. You can then apply a CASE statement to the final result set to only put the totals on the first row. For example: CASE WHEN @PgFirstRecord = RowNbr THEN @TotalPages ELSE NULL END.

    Thanks for the compliment 🙂

    I simplified the (fictional) query as far as I could for demo purposes, thus no WHERE clause. But you are right in making it clear that when a filter needs to be used in the query, it should be done in the topmost CTE (this might not be immediately obvious to those new to CTEs).

    In my production situation some of the base queries span several screen pages with complex (partially generated) conditions. The actual variable values are all sent to SQL Server separately from the base query; that is a feature of the language I use this in (ColdFusion). So as long as a second request comes in with the exact same structure (which is quite likely when working on a grid that has default settings), the query plans will get reused without a problem. There will just be several plans, one for each permutation of filtering options that is in use.

    There are cases when stored procedures make more sense of course, but not in the environment I work in and the type/method of development we do. It is all too small scale: no dedicated DBA, and every developer writes his own SQL code. Usually that goes quite well, as the average SQL knowledge level isn't too bad. If things seem too slow (that happens sometimes), it's me who works on it to solve the issue (I quite like that part, I have to say).

  • Adam Haines (1/8/2009)


    ...First, there should be some sort of filter in the first CTE. This way you can reduce the amount of data searched through...

    You are using a CTE to build PagingCTE, from which you then extract the needed page. Hmm, on each request to the server you'll need to reconstruct the whole PagingCTE to retrieve just a small fraction of it for the client.

    For the particular case when FirstName and LastName are NOT supplied, that is, Contacts are not filtered, from what I know about the optimizer you may wish not to use a CTE at all. As far as I understand, you are trying to position the first record of the current page just after the last record of the page that was previously retrieved.

    Since you are using "ContactID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED", why not make use of it?

    CREATE PROCEDURE [dbo].[usp_ContactPaging]
    (
        @pageFirstRowIndex int,
        @pageRows int
    )
    AS
    -- holds the clustered key of the first row of the requested page
    DECLARE @FirstContactID int

    -- scan only the first @pageFirstRowIndex rows in clustered-key order
    SET ROWCOUNT @pageFirstRowIndex
    SELECT @FirstContactID = ContactId FROM Contacts ORDER BY ContactId

    At this point you have the ContactId of the first row of the current page and can easily get the next @pageRows rows.

    The nice thing about this trick, as far as I know about the optimizer, is that you are getting @FirstContactID not in @pageFirstRowIndex steps but, thanks to a binary search, in far fewer steps. For a million records it should be on the order of 20.
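
    To complete the picture, one way the page retrieval itself might then look (a sketch only; the FirstName/LastName columns are assumed, and TOP with a variable requires SQL Server 2005 or later):

    -- reset the row limit before returning the page itself
    SET ROWCOUNT 0

    -- the page: the next @pageRows rows starting at @FirstContactID
    SELECT TOP (@pageRows) ContactId, FirstName, LastName
    FROM Contacts
    WHERE ContactId >= @FirstContactID
    ORDER BY ContactId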

  • phystech (1/10/2009)


    Adam Haines (1/8/2009)


    ...First, there should be some sort of filter in the first CTE. This way you can reduce the amount of data searched through...

    You are using a CTE to build PagingCTE, from which you then extract the needed page. Hmm, on each request to the server you'll need to reconstruct the whole PagingCTE to retrieve just a small fraction of it for the client.

    For the particular case when FirstName and LastName are NOT supplied, that is, Contacts are not filtered, from what I know about the optimizer you may wish not to use a CTE at all. As far as I understand, you are trying to position the first record of the current page just after the last record of the page that was previously retrieved.

    Since you are using "ContactID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED", why not make use of it?

    CREATE PROCEDURE [dbo].[usp_ContactPaging]
    (
        @pageFirstRowIndex int,
        @pageRows int
    )
    AS
    -- holds the clustered key of the first row of the requested page
    DECLARE @FirstContactID int

    -- scan only the first @pageFirstRowIndex rows in clustered-key order
    SET ROWCOUNT @pageFirstRowIndex
    SELECT @FirstContactID = ContactId FROM Contacts ORDER BY ContactId

    At this point you have the ContactId of the first row of the current page and can easily get the next @pageRows rows.

    The nice thing about this trick, as far as I know about the optimizer, is that you are getting @FirstContactID not in @pageFirstRowIndex steps but, thanks to a binary search, in far fewer steps. For a million records it should be on the order of 20.

    I am not sure if I understand you correctly, but here are some thoughts.

    A primary key (identity or otherwise) cannot be used, because the base query can have any sensible ORDER BY clause. Remember here that an ORDER BY on just the identity/primary key is meaningless, and not a case that is valid to consider for optimization. Now you might suggest using other knowledge about the previously visited page to speed up subsequent ones, but here you have the issue of changing data making the assumption such an optimization relies on invalid.

    As I see it:

    The best approach is to apply sensible filtering and indexes to speed up the base query. The resulting smaller set can be paged over at low cost and the results will always be accurate. Note that paging over a very large set is something you do not want, as the results will be next to meaningless for the end user and end up not being used at all. In such a scenario you want the user to enter some filter restrictions so that you can use those in your base query to get the desired smaller set that can be paged over efficiently. Often-used filtering is to examine only one year at a time, filter by some category, or require a minimum length for a search string (or any combination of those).

    Thus beyond paging over a moderately sized set, it is no longer a database design/code issue but an application design issue instead. You have to combat the problems where they are caused and not try to fix the resulting symptoms in the database by making assumption-based optimizations. From this perspective it is even valid to set a maximum for the base set by using a TOP clause in the base query. This keeps performance in a desirable range; the application then signals a warning that the result set is too large when the maximum number of rows is reached. The user can then refine his search criteria without having to wait minutes for the first query to finish; I know of no user that is going to visit all 30,000 pages anyway.
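
    As a rough illustration of capping the base set with TOP (a sketch with invented names and an arbitrary 5,000-row cap, not code from the article):

    DECLARE @MaxRows int, @SearchString varchar(50), @RowsFound int
    SELECT @MaxRows = 5000, @SearchString = 'Smi'

    -- cap the filtered base set; the filter column is assumed to be indexed
    SELECT TOP (@MaxRows) ContactID, FirstName, LastName
    INTO #BaseSet
    FROM dbo.Contacts
    WHERE LastName LIKE @SearchString + '%'
    ORDER BY LastName, FirstName

    SET @RowsFound = @@ROWCOUNT
    -- if @RowsFound = @MaxRows, the application warns that the result set is too
    -- large and asks the user to refine the search; otherwise #BaseSet is small
    -- enough to page over efficiently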

  • peter (1/10/2009)


    A primary key (identity or otherwise) cannot be used, because the base query can have any sensible ORDER BY clause. Remember here that an ORDER BY on just the identity/primary key is meaningless, and not a case that is valid to consider for optimization.

    The statement that ordering by identity is meaningless is a bit over-generalized. Just one example: imagine a data-logger application. There is an incoming stream of measurements, and it is absolutely guaranteed that a later measurement (with a later datetime) will be stored under a greater identity. Then a user needs to get the page corresponding to a desired date and move left or right to the next pages. The number of pages is enormous and the user doesn't care about the number; he just wants to move left/right from the chosen timepoint.

    Many apps could be reduced to this model. Even the one about contacts with billions of records, but that's another story.

    ...Note that paging over a very large set is something you do not want, as the results will be next to meaningless for the end user and end up not being used at all. In such a scenario you want the user to enter some filter restrictions so that you can use those in your base query to get the desired smaller set that can be paged over efficiently. Often-used filtering is to examine only one year at a time, filter by some category, or require a minimum length for a search string (or any combination of those).

    Filtering by, say, category won't break the idea. It will still be the same initial Id and page size; just the WHERE clause will change. A different order would be an issue, which could be solved by creating a sort of "custom index column" in addition to the identity. Maintaining such a "custom index" will be a headache, but when the number of requests is very big it could help with paging over the huge data.

    Thus beyond paging over a moderately sized set, it is no longer a database design/code issue but an application design issue instead. You have to combat the problems where they are caused and not try to fix the resulting symptoms in the database by making assumption-based optimizations.

    Why not "making assumption based optimizations" and why to treat an app designer like an enemy? 🙂

  • phystech (1/10/2009)


    peter (1/10/2009)


    A primary key (identity or otherwise) cannot be used, because the base query can have any sensible ORDER BY clause. Remember here that an ORDER BY on just the identity/primary key is meaningless, and not a case that is valid to consider for optimization.

    The statement that ordering by identity is meaningless is a bit over-generalized. Just one example: imagine a data-logger application. There is an incoming stream of measurements, and it is absolutely guaranteed that a later measurement (with a later datetime) will be stored under a greater identity. Then a user needs to get the page corresponding to a desired date and move left or right to the next pages. The number of pages is enormous and the user doesn't care about the number; he just wants to move left/right from the chosen timepoint.

    Many apps could be reduced to this model. Even the one about contacts with billions of records, but that's another story.

    ...Note that paging over a very large set is something you do not want, as the results will be next to meaningless for the end user and end up not being used at all. In such a scenario you want the user to enter some filter restrictions so that you can use those in your base query to get the desired smaller set that can be paged over efficiently. Often-used filtering is to examine only one year at a time, filter by some category, or require a minimum length for a search string (or any combination of those).

    Filtering by, say, category won't break the idea. It will still be the same initial Id and page size; just the WHERE clause will change. A different order would be an issue, which could be solved by creating a sort of "custom index column" in addition to the identity. Maintaining such a "custom index" will be a headache, but when the number of requests is very big it could help with paging over the huge data.

    Thus beyond paging over a moderately sized set, it is no longer a database design/code issue but an application design issue instead. You have to combat the problems where they are caused and not try to fix the resulting symptoms in the database by making assumption-based optimizations.

    Why not "making assumption based optimizations" and why to treat an app designer like an enemy? 🙂

    My comments are focused on non-meaningful primary keys, which is what identity columns are. It is just a compact number (a 32-bit int in most cases) and, contrary to popular belief, cannot always be assumed to be in time-sequential order (identity inserts). Nor can it be assumed the total range is without gaps, for that matter (failed insert transactions or explicit deletions). Thus sorting on identity is in fact likely to be meaningless unless special conditions are met (which are very hard to guarantee).

    Your case is quite simple and efficient, BUT it does not follow the same usage pattern as the generic paging solutions posted here, which can handle any sorting required. Your case only works perfectly when you go one page forward or one page backward (with some special handling), or when there are no gaps in the whole identity range. If those conditions are not met, you cannot go from page 10 to 111 that easily. In your example it is safer to work with some date field with an index on it and have the user specify a datetime range to filter on in the first place (it is unlikely the user is just blindly paging without looking for something specific).

    Sure, there are more special cases, there will always be some, and in each special case you want a special solution, and that solution is more likely to be in the design of the application layer than in the database. And I am in no way hostile to designers or developers (I am one myself), mind you. It is just that you should confront/combat problems where they originate. If a user interface promotes inefficient actions (like paging through 100 pages looking for a specific entry because it has no search), it generates unnecessary load on the database layer. Now you can optimize and special-case all you want in the database, but be honest... that is not the best place to deal with the issue. If a redesign of that part of the application results in less browsing, the user works more efficiently, and so do your application and database.

    PS.

    An application can do more to make interaction (browsing in this case) more responsive and efficient. It can work with larger/multiple pages internally and only show one page to the user. Say you have a visible page size of 50 entries: you simply load 550 (11 pages) using a SQL paging mechanism (which is still efficient), and from there you can go 5 pages back or forward without any database access.
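
    A sketch of that buffering idea (names are assumed): 11 pages of 50 rows are fetched in one round trip and the application pages locally within them.

    DECLARE @VisiblePageSize int, @BufferPages int, @FirstRow int, @LastRow int
    SELECT @VisiblePageSize = 50, @BufferPages = 11
    SELECT @FirstRow = 1, @LastRow = @VisiblePageSize * @BufferPages   -- rows 1..550

    ;WITH PagingCTE AS
    (
        SELECT ContactID, FirstName, LastName,
               ROW_NUMBER() OVER (ORDER BY LastName, FirstName) AS RowNbr
        FROM dbo.Contacts
    )
    SELECT ContactID, FirstName, LastName, RowNbr
    FROM PagingCTE
    WHERE RowNbr BETWEEN @FirstRow AND @LastRow
    ORDER BY RowNbr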

  • peter (1/10/2009)


    My comments are focused on non-meaningful primary keys, which is what identity columns are. It is just a compact number (a 32-bit int in most cases) and, contrary to popular belief, cannot always be assumed to be in time-sequential order (identity inserts). Nor can it be assumed the total range is without gaps, for that matter (failed insert transactions or explicit deletions). Thus sorting on identity is in fact likely to be meaningless unless special conditions are met (which are very hard to guarantee).

    - You can achieve "time sequential order" by creating a "buffer" table for incoming data, and then moving blocks of sorted records from that buffer table into the main table;

    - Gaps in the range should not matter, since you could use a "TOP" clause to retrieve the required number of records.

    ... In your example it is safer to work with some date field with an index on it and have the user specify a datetime range to filter on in the first place (it is unlikely the user is just blindly paging without looking for something specific)

    Yes.

    ...It is just that you should confront/combat problems where they originate.

    Yes, it would be better to reuse a proven technique on this or that side (in our case, the database side). But what if the technique just doesn't fit the app requirements?

    An application can do more to make interaction (browsing in this case) more responsive and efficient. It can work with larger/multiple pages internally and only show one page to the user. Say you have a visible page size of 50 entries: you simply load 550 (11 pages) using a SQL paging mechanism (which is still efficient), and from there you can go 5 pages back or forward without any database access.

    Yes, but a user will wait for 11 pages to come even if he needs only one. The Ajax way would be to supply the user with the requested page and then send and store the left and right pages on the client while he is working with the requested page. On moving to the left page later, feed his client buffer with the next left page.

