Custom Pagination in SQL Server 2005

  • @GaryN,

    I am one of those .net developers as well. What holds me back using Adam's solution is mainly a maintenance thing. Having a few dozen tables and views within a application easily, keeping track of separate paging procedures is nasty.

    I am using this procedure in combination with a 'return only the recordcount' version. Which looks like SELECT COUNT(*) FROM {0} WHERE {1}, constructed from the same tablename/whereclause parameters. Recordcount/pagesize delivers me the number of pages.

    Personally I am not experiencing any strong-typing issues. Since I am using a code-generator to generate my data classes and CRUD procedures. These classes grant me access to column-name properties. When you use these, you can compile-error track easily in your code when column name changes require adjustments. A datareader then populates my strong typed class. Btw, this same datareader could just as easy fill a strong types dataset too.

    Regarding security, respecting the normal rules like minimal rights only and checking user input for malicious input, I think it's just as safe an anything else. Many times you end up constructing the clauses without user-input at all, by combining combobox selections etc.

    Regarding performance, it does what I need it to do, which is returning somewhere between 25 to 500 rows in a fraction of a second. Off course common sense is the main factor here, make sure your where clause addresses indexed fields. Pretty sure 'my way' is not the most performant.

    No doubt there are dozens of simular solutions for paging. I think Adam has provided us with a great insight into his solution. Stick with whatever works for you, would be my advice here.

  • All good for SQL 2005, but what for SQL 2000 ?

    In that case, you can have a look at the article I wrote : http://www.codeproject.com/KB/aspnet/paging_various_databases.aspx

  • I agree with the comments about increasing resource utilization on the database server. This is an issue of architecture and how you are going to address pagination. Within any particular software package it should be done in the same method if possible and this will allow that but what is the true penalty for using this method? Is it better to distribute the pagination processing to the webserver? What are the costs interms of complex queries being passed around. And what about the reforming of this list? I doubt SQL is keeping everything locked down waiting for the next request in this method. So is there a possibility or probability of shifting data?

  • What happened to a simple query like bellow? You only need to set the number of rows per page and the number of page you want to retrieve. I added the versions for both 2K and 2K5 because the query uses the TOP clause and it needs to be dynamic in 2K while in 2K5 and up you can use variables(expressions) in the TOP clause. The "numbers" table in this case was just that, numbers from 1 to 10000 but it can be any table. It helps if theres is a unique key on the table of course. If you change the page_number value you will retrieve the corresponding rows for that page (26 to 50 in this case). You can also change to a different number of rows per page.

    DECLARE

    @sql varchar(1000),

    @rows_per_page int,

    @page_number int

    -- set the number of rows per page

    SELECT

    @rows_per_page=25

    -- set the page number you want to retrieve

    SELECT

    @page_number=2

    -- SQL 2000 version using dynamic query

    SELECT @sql='

    SELECT

    *

    FROM

    numbers act

    INNER JOIN

    (

    SELECT TOP '+cast(@rows_per_page as varchar)+'

    number

    FROM

    (

    SELECT TOP '+cast(@page_number*@rows_per_page as varchar)+'

    number

    FROM

    numbers

    ORDER BY

    number ASC

    ) AS Act1

    ORDER BY

    number DESC

    ) AS Act2

    ON Act2.number=act.number

    ORDER BY

    act.number ASC

    '

    EXEC (@sql)

    -- SQL 2005 version where variables can be used in TOP clauses

    SELECT

    *

    FROM

    numbers act

    INNER JOIN

    (

    SELECT TOP (@rows_per_page)

    number

    FROM

    (

    SELECT TOP (@page_number*@rows_per_page)

    number

    FROM

    numbers

    ORDER BY

    number ASC

    ) AS Act1

    ORDER BY

    number DESC

    ) AS Act2

    ON Act2.number=act.number

    ORDER BY

    act.number ASC

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • one flaw would be ORDER BY and TOP. Check this out:

    Check this out

    http://blogs.msdn.com/queryoptteam/archive/2006/03/24/560396.aspx

  • What can we do to make this query index seek? Well, we really only have two options. One is to use dynamic sql to build the predicate dynamically. Two is to introduce control flow logic and have differing statements for each possibility

    Third option is to use CASE expression in a smart way instead of using either the COALESCE function or (WHERE FirstName = @FirstName OR @FirstName IS NULL) trick.

  • phystech (1/6/2009)


    What can we do to make this query index seek? Well, we really only have two options. One is to use dynamic sql to build the predicate dynamically. Two is to introduce control flow logic and have differing statements for each possibility

    Third option is to use CASE expression in a smart way instead of using either the COALESCE function or (WHERE FirstName = @FirstName OR @FirstName IS NULL) trick.

    This will still result in a table scan. If you look at the execution plan, you will see that COALESCE actually uses a case expression to evaluate each option in the function. This means using COALESCE and a case expression should result in the same execution plan. The reason you get a table scan with coalesce is because the predicate is always equal to itself, but the optimizer still has to check each row to see if it equals itself, thus resulting in a table scan.

  • Adam I agree with your statement on using Control Logic (and great article btw). I'm curious though, using this method have you run into problems with queries plans. What I mean is for example when the stored procedure is first ran it gets compiled with the query plan for the case where FirstName is null and LastName is what is index on; so subsequent times when FirstName is not null a unoptimal query plan is used.

    The solutions I know of are to have the procedure recompile each time it is used, thus a new query plan or use your control logic to call out other stored procedures that have just one query in them. I've prefered the 2nd option in the past, as I prefer the finite control and want to skip the hit of recompiling. But perhaps you know of a better solution?

    Thanks.

  • Although the article is well written but I have a word of advice. For very large tables row_number() over().. has been inefficient for me when it comes to pagination!


    * Noel

  • Eric Inman (1/6/2009)


    one flaw would be ORDER BY and TOP. Check this out:

    Check this out

    http://blogs.msdn.com/queryoptteam/archive/2006/03/24/560396.aspx%5B/quote%5D

    The title is "TOP 100 Percent ORDER BY Considered Harmful". My query doesn't use TOP 100(N)Percent. Another aspect is that normally and recommended by good practices any table should have a primary key where in SQL server is by default clustered. Clustered means is that the physical order of the rows will be determined by the ascending/descending order(asc by default) of the primary clustered key. Now, I agree that this is not always the situation, unfortunately I might add, usually due to bad design but normally that would be the case and I also noted in my example that a unique key (i should have said primary, i agree) would help. Conclusion is that I can rest assured that given the normal conditions, a primary clustered key, the query will work.

    Just try it.

    Cheers.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • Bryan Smith (1/6/2009)


    Adam I agree with your statement on using Control Logic (and great article btw). I'm curious though, using this method have you run into problems with queries plans. What I mean is for example when the stored procedure is first ran it gets compiled with the query plan for the case where FirstName is null and LastName is what is index on; so subsequent times when FirstName is not null a unoptimal query plan is used.

    The solutions I know of are to have the procedure recompile each time it is used, thus a new query plan or use your control logic to call out other stored procedures that have just one query in them. I've prefered the 2nd option in the past, as I prefer the finite control and want to skip the hit of recompiling. But perhaps you know of a better solution?

    Thanks.

    Well you are partially correct in your assumption, but the stored procedure should not recompile every time. What you should see is a query plan for each set of options, in this case there are three. A first name and no last name, no first name and a last name, and finally no first name and no last name. For example, if you supply first name and no last name a query plan will be created. The next go around you supply no first name and a lastname.. this will result in a new query plan. You will be able to reuse the query plan for first name and no last name every time you execute the stored procedure with those parameters and likeswise for no first name and a last name. This means you should be reusing query plans quite often.

    You can use the code below to watch the execution plan count increase for each run.

    USE [tempdb]

    GO

    ;WITH cte

    AS

    (

    SELECT

    qs.execution_count,

    SUBSTRING(qt.text,qs.statement_start_offset/2,

    (CASE WHEN qs.statement_end_offset = -1

    THEN len(CONVERT(nvarchar(max), qt.text)) * 2

    ELSE qs.statement_end_offset

    END -qs.statement_start_offset)/2) AS query_text,

    qt.dbid, dbname=db_name(qt.dbid),

    qt.objectid

    FROM sys.dm_exec_query_stats qs

    CROSS apply sys.dm_exec_sql_text(qs.sql_handle) AS qt

    WHERE db_name(qt.dbid) = 'tempdb'

    )

    SELECT *

    FROM cte

    WHERE query_text LIKE ';WITH PagingCTE%'

    ORDER BY execution_count DESC

  • Adam Haines (1/6/2009)


    phystech (1/6/2009)


    What can we do to make this query index seek? Well, we really only have two options. One is to use dynamic sql to build the predicate dynamically. Two is to introduce control flow logic and have differing statements for each possibility

    Third option is to use CASE expression in a smart way instead of using either the COALESCE function or (WHERE FirstName = @FirstName OR @FirstName IS NULL) trick.

    This will still result in a table scan. If you look at the execution plan, you will see that COALESCE actually uses a case expression to evaluate each option in the function. This means using COALESCE and a case expression should result in the same execution plan. The reason you get a table scan with coalesce is because the predicate is always equal to itself, but the optimizer still has to check each row to see if it equals itself, thus resulting in a table scan.

    No, I said "use CASE expression in a smart way":

    ---

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    ---

    instead of

    ---

    WHERE [FirstName] = COALESCE(@FirstName,FirstName)

    ---

    The difference is that column name "FirstName" is not involved in

    "CASE

    WHEN @FirstName IS NOT NULL THEN"

    Thus we have some sort of short circuiting preventing from the full table scan (At least I beleive so).

  • noeld (1/6/2009)


    Although the article is well written but I have a word of advice. For very large tables row_number() over().. has been inefficient for me when it comes to pagination!

    Noeld,

    I too have seen many pagination solutions hit the ceiling when they try to work off very large tables. The key things to note here are:

    -the where clause in the inner portion of the cte will make a large table relativley small with the proper indexing and filtering.

    - no reasonable person should ever need to return millions of rows of data on an application screen because that is simply too much data for them to evaluate.

    All-in-all, with the proper filteration and indexing this method should do fairly well in most scenarios even on large tables. Even in the examples provided, you get ms response times, in searching through a million records and that is pretty good.

  • phystech (1/6/2009)


    Adam Haines (1/6/2009)


    phystech (1/6/2009)


    What can we do to make this query index seek? Well, we really only have two options. One is to use dynamic sql to build the predicate dynamically. Two is to introduce control flow logic and have differing statements for each possibility

    Third option is to use CASE expression in a smart way instead of using either the COALESCE function or (WHERE FirstName = @FirstName OR @FirstName IS NULL) trick.

    This will still result in a table scan. If you look at the execution plan, you will see that COALESCE actually uses a case expression to evaluate each option in the function. This means using COALESCE and a case expression should result in the same execution plan. The reason you get a table scan with coalesce is because the predicate is always equal to itself, but the optimizer still has to check each row to see if it equals itself, thus resulting in a table scan.

    No, I said "use CASE expression in a smart way":

    ---

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    ---

    instead of

    ---

    WHERE [FirstName] = COALESCE(@FirstName,FirstName)

    ---

    The difference is that column name "FirstName" is not involved in

    "CASE

    WHEN @FirstName IS NOT NULL THEN"

    Thus we have some sort of short circuiting preventing from the full table scan (At least I beleive so).

    I believe this will still cause an index scan/table scan. The problem I see with this is the variable in the predicate, which means the optimizer will have to compare the variable with the column value for each row to determine if it should be given the value of 1.

  • I remember reading a similar article on here back in 2007 (http://www.sqlservercentral.com/articles/Advanced+Querying/3181/[/url])

    Always good for these things to pop up every now and again, keeps people thinking about ways to do things as well as letting newer members know that these things exist.

    And yes, these functions belong to that well know SQL saying of "it depends". If you only have a dataset with a few hundred or thousand records then server side paging is probably a waste of time and effort. However, if you have millions of rows that are going to be brought back then suddenly it all makes sense. Data manipulation is what SQL is good at, why not let it do it (when appropriate and regardless of whether you use a 2k or 2k5 method).

    the only thing to think about really is: Where is the best place to page my data?

    Think about:

    - How many rows will be returned?

    - How heavily used is the SQL server?

    - How heavily used is the web server? **for web applications, remember, the rows not being sent to the browser are still being stored on the web server)

    - How good is your network?

    - Is the Moon in the 8th house of Aquarius?

    And so on..

    Also don't be afraid to use these code samples and experiment, just because you don't have a use for this exact implementation, doesn't mean that a modified version won't solve another problem.

    -d

Viewing 15 posts - 31 through 45 (of 68 total)

You must be logged in to reply to this topic. Login to reply