• phystech (1/10/2009)


    Adam Haines (1/8/2009)


    ...First there should be some sort of filter in the first CTE. This way you can reduce the amount of data searched through...

    You are using a CTE to build PagingCTE, from which you then extract the needed page. Hmm, on each request to the server you'll need to reconstruct the whole PagingCTE to retrieve just a small fraction of it for the client.

    For the particular case when FirstName and LastName are NOT supplied, that is, when Contacts are not filtered, from what I know about the optimizer you may wish not to use a CTE at all. As far as I understand, you are trying to position the first record of the current page just after the last record of the page that was previously retrieved.

    Since you are using "ContactID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED", why not make use of it?

    CREATE PROCEDURE [dbo].[usp_ContactPaging]
    (
        @pageFirstRowIndex int,
        @pageRows int
    )
    AS
    DECLARE @FirstContactID int

    -- Walk the clustered index in ContactId order, stopping after
    -- @pageFirstRowIndex rows; @FirstContactID ends up holding the
    -- ContactId of the row at that position, i.e. the first row of
    -- the current page.
    SET ROWCOUNT @pageFirstRowIndex
    SELECT @FirstContactID = ContactId FROM Contacts ORDER BY ContactId

    At this point you have the ContactId of the first row of the current page and can easily get the next @pageRows rows.
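    Something along these lines would then fetch the page itself (a sketch only, using the FirstName and LastName columns mentioned above):

    -- Return the requested page, starting at the row found above.
    SET ROWCOUNT @pageRows
    SELECT ContactId, FirstName, LastName
    FROM Contacts
    WHERE ContactId >= @FirstContactID
    ORDER BY ContactId
    -- Reset the session-level row limit.
    SET ROWCOUNT 0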

    The nice thing about this trick, as far as I know about the optimizer, is that you are getting @FirstContactID not in @pageFirstRowIndex steps but, thanks to a binary search, in far fewer steps. For a million records it should be on the order of 20.

    I am not sure if I understand you correctly, but here are some thoughts.

    A primary key (identity or otherwise) cannot be used for this, because the base query will have a sensible ORDER BY clause of its own. Remember that an ORDER BY on just the identity/primary key is meaningless, and not a case worth optimizing for. You might suggest using other knowledge about the previously visited page to speed up subsequent ones, but then changing data invalidates the very assumption such an optimization relies on.
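    To make that concrete: the kind of "continue from where the previous page ended" optimization amounts to keyset paging, roughly like the sketch below (the @lastSeenContactId parameter is hypothetical, not something from the earlier posts):

    -- Keyset-style paging: continue from the last key of the previous page.
    -- Only correct when the ORDER BY is the key itself and nothing is
    -- inserted or deleted between page requests.
    SELECT TOP (@pageRows) ContactId, FirstName, LastName
    FROM Contacts
    WHERE ContactId > @lastSeenContactId
    ORDER BY ContactId

    As soon as the base query orders by anything other than ContactId, or rows change between requests, the remembered key no longer tells you where the next page starts.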

    As I see it:

    The best approach is to apply sensible filtering and indexes to speed up the base query. The resulting smaller set can be paged over at low cost, and the results will always be accurate. Note that paging over a very large set is something you do not want anyway: the results will be next to meaningless for the end user and will end up not being used at all. In such a scenario you want the user to enter some filter restrictions, so that you can use those in the base query to get the desired smaller set that can be paged over efficiently. Commonly used filters are to examine only one year at a time, to filter by some category, or to require a minimum length for a search string (or any combination of those).
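    For instance, something along these lines, keeping the CTE/ROW_NUMBER paging from earlier in the thread; the CreatedDate column and the @yearStart and @searchName parameters are made up here purely for illustration:

    -- Filter first, then page over the much smaller filtered set.
    ;WITH PagingCTE AS
    (
        SELECT ContactId, FirstName, LastName,
               ROW_NUMBER() OVER (ORDER BY LastName, FirstName) AS RowNum
        FROM Contacts
        WHERE CreatedDate >= @yearStart                     -- one year at a time
          AND CreatedDate < DATEADD(year, 1, @yearStart)
          AND LastName LIKE @searchName + '%'               -- minimum-length search string
    )
    SELECT ContactId, FirstName, LastName
    FROM PagingCTE
    WHERE RowNum BETWEEN @pageFirstRowIndex AND @pageFirstRowIndex + @pageRows - 1
    ORDER BY RowNum

    With an index that supports the filter and the sort (for example on LastName, FirstName), the filtered set stays small and paging over it is cheap.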

    Thus beyond paging over a moderately sized set, it is no longer a database design/code issue, but an application design issue instead. You have to combat the problems where they are caused, not try to fix the resulting symptoms in the database with assumption-based optimizations. From this perspective it is even valid to set a maximum for the base set by using a TOP clause in the base query. This keeps performance in a desirable range, and the application can then signal a warning that the result set is too large when the maximum number of rows is reached. The user can then refine his search criteria without having to wait minutes for the first query to finish; I know of no user who is going to visit all 30000 pages anyway.
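    A sketch of that cap (the @maxRows limit, the #BaseSet temp table and the filter are examples only):

    -- Cap the base set with a TOP clause.
    SELECT TOP (@maxRows + 1) ContactId, FirstName, LastName
    INTO #BaseSet
    FROM Contacts
    WHERE LastName LIKE @searchName + '%'
    ORDER BY LastName, FirstName

    -- If the cap was hit, warn the user to refine the search instead of
    -- paging over a truncated set.
    IF @@ROWCOUNT > @maxRows
        RAISERROR('Result set too large; please refine your search criteria.', 10, 1)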