Custom Pagination in SQL Server 2005

  • Adam Haines (1/6/2009)

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    ---

    instead of

    ---

    WHERE [FirstName] = COALESCE(@FirstName,FirstName)

    ---

    The difference is that column name "FirstName" is not involved in

    "CASE

    WHEN @FirstName IS NOT NULL THEN"

    Thus we have some sort of short circuiting preventing from the full table scan (At least I beleive so).

    I believe this will still cause an index scan/table scan. The problem I see with this is the variable in the predicate, which means the optimizer will have to compare the variable with the column value for each row to determine if it should be given the value of 1.

    The optimizer knows that

    "CASE WHEN @FirstName IS NOT NULL THEN"

    can be evaluated before the query is launched, since no column name is involved here and the resulting value could be either 0 or 1. So it can branch the plan into two subtrees. One subtree would result in logical "true" right at this point (no @FirstName was supplied) and the second subtree would go evaluating "FirstName = @FirstName " using the index, that is very quickly.

  • phystech (1/6/2009)


    Adam Haines (1/6/2009)

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    ---

    instead of

    ---

    WHERE [FirstName] = COALESCE(@FirstName,FirstName)

    ---

    The difference is that column name "FirstName" is not involved in

    "CASE

    WHEN @FirstName IS NOT NULL THEN"

    Thus we have some sort of short circuiting preventing from the full table scan (At least I beleive so).

    I believe this will still cause an index scan/table scan. The problem I see with this is the variable in the predicate, which means the optimizer will have to compare the variable with the column value for each row to determine if it should be given the value of 1.

    The optimizer knows that

    "CASE WHEN @FirstName IS NOT NULL THEN"

    can be evaluated before the query is launched, since no column name is involved here and the resulting value could be either 0 or 1. So it can branch the plan into two subtrees. One subtree would result in logical "true" right at this point (no @FirstName was supplied) and the second subtree would go evaluating "FirstName = @FirstName " using the index, that is very quickly.

    Lets assume this is true (I did not test it so won't state yea or nay on that). What you WILL get is a cached plan with whatever gets executed first. That guarantees that other calls that want the opposite plan will be completely suboptimal in performance. If you are on SQL 2005 you can use OPTION (RECOMPILE) on the statement to help out, but that still assumes that the optimizer can and does do the right thing with whatever inputs are given, which I question.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/7/2009)


    phystech (1/6/2009)


    Adam Haines (1/6/2009)

    The optimizer knows that

    "CASE WHEN @FirstName IS NOT NULL THEN"

    can be evaluated before the query is launched, since no column name is involved here and the resulting value could be either 0 or 1. So it can branch the plan into two subtrees. One subtree would result in logical "true" right at this point (no @FirstName was supplied) and the second subtree would go evaluating "FirstName = @FirstName " using the index, that is very quickly.

    Lets assume this is true (I did not test it so won't state yea or nay on that). What you WILL get is a cached plan with whatever gets executed first. That guarantees that other calls that want the opposite plan will be completely suboptimal in performance. If you are on SQL 2005 you can use OPTION (RECOMPILE) on the statement to help out, but that still assumes that the optimizer can and does do the right thing with whatever inputs are given, which I question.

    I've been using the approach within rather complex query for some months. And my strong impression is that the short-circuiting works. Though I don't short-circuit it on NULL value, I do it on zero string length, but this should not be different. I'll try to test the approach "stand-alone" within a week and publish on my blog.

    Thanks.

  • Nice one.. The discussion also interesting..

  • I've been using the approach within rather complex query for some months. And my strong impression is that the short-circuiting works. Though I don't short-circuit it on NULL value, I do it on zero string length, but this should not be different. I'll try to test the approach "stand-alone" within a week and publish on my blog.

    Thanks.

    If you never passed in a NULL value then the CASE is surperfluous, right? You will never get (or need) a plan that executes the ELSE. I will say that it would be VERY helpful if this construct gives an index seek when appropriate and a table scan when it is not, but there will still be the cached plan issue to deal with.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/7/2009)

    ...If you never passed in a NULL value then the CASE is surperfluous, right?..

    No, zero length string plays the role of NULL. That is if the parameter is zero length then grab all the records (sort of).

  • phystech (1/7/2009)


    TheSQLGuru (1/7/2009)


    phystech (1/6/2009)


    Adam Haines (1/6/2009)

    The optimizer knows that

    "CASE WHEN @FirstName IS NOT NULL THEN"

    can be evaluated before the query is launched, since no column name is involved here and the resulting value could be either 0 or 1. So it can branch the plan into two subtrees. One subtree would result in logical "true" right at this point (no @FirstName was supplied) and the second subtree would go evaluating "FirstName = @FirstName " using the index, that is very quickly.

    Lets assume this is true (I did not test it so won't state yea or nay on that). What you WILL get is a cached plan with whatever gets executed first. That guarantees that other calls that want the opposite plan will be completely suboptimal in performance. If you are on SQL 2005 you can use OPTION (RECOMPILE) on the statement to help out, but that still assumes that the optimizer can and does do the right thing with whatever inputs are given, which I question.

    I've been using the approach within rather complex query for some months. And my strong impression is that the short-circuiting works. Though I don't short-circuit it on NULL value, I do it on zero string length, but this should not be different. I'll try to test the approach "stand-alone" within a week and publish on my blog.

    Thanks.

    Phystech,

    Can you confirm that the code you are posting actually generates index seeks? I still do not see how it is possible. I wan to step through the code you posted and demonstrate how the optimizer will treat each section.

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1 and

    The first (outer) case expression does exactly what you say it will do. If the variable is null, the filter is short circuited. I agree with this. It is the inner case expression that I have a problem with. The inner case expression compares each FirstName value in the table to the variable @FirstName. I am sure we can both agree on this; however, I believe that the optimizer is treating this differently than you are expecting. The inner case expression will have to be evaluated for every row returned by the query. The optimizer has no way of knowing whether or not the FirstName Column equals @FirstName until it checks each row returned. You can confirm this by using your code in my initial stored procedure, with your modification.

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1 and

    CASE

    WHEN @LastName IS NOT NULL THEN

    CASE WHEN LastName = @LastName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    You will see no matter what combination of FirstName and LastName is used the end result is the same. The behavior is the same regardless of the case expression being in the select statement or the predicate. This behavior occurs because the optimizer has to look through each returned rows and evaluate FirstName to see if it matches the variable. The optimizer cannot seek a set of rows with the case expression and you cannot short circuit the inner case statement because you can potentially exclude rows. However, there is a small catch that may be covering up the actual behavior, in your environment. if you apply a differing filter that is always present, the case expressions behavior can be hidden away in another execution plan task. For example, you use your code in the where clause on firstname and use LastName like @LastName + '%', for lastName. You will see that the scan is gone. The scan is gone because the optimizer was able to use an index and then can simple apply the case express on each of the returned rows. The bottom line is you will always get an index scan using the logic you have posted, unless you use some other filter that gaurentees a seek. I would still recommend avoiding case expressions in the where clause, but to each is own.

  • Adam Haines (1/7/2009)

    ...Can you confirm that the code you are posting actually generates index seeks?...

    I need a couple of days to prove/disprove it, right now I'm working on my current project. I'll post here then.

    Thanks.

  • phystech (1/7/2009)


    TheSQLGuru (1/7/2009)

    ...If you never passed in a NULL value then the CASE is surperfluous, right?..

    No, zero length string plays the role of NULL. That is if the parameter is zero length then grab all the records (sort of).

    Your "sort of" is a critical statement! 🙂

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    The ELSE 1 will never be hit, thus the entire statement can be simplified to this (since you never have a NULL value passed in):

    WHERE CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END = 1

    and that, I believe, is no different logically than this:

    WHERE FirstName = @FirstName

    in which case you will obviously do an index seek if the statistics estimate 1-2% of the rows will be hit based on the incoming value of @FirstName.

    And we are back to the parameter sniffing plan caching problem. :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/7/2009)


    phystech (1/7/2009)


    TheSQLGuru (1/7/2009)

    ...If you never passed in a NULL value then the CASE is surperfluous, right?..

    No, zero length string plays the role of NULL. That is if the parameter is zero length then grab all the records (sort of).

    Your "sort of" is a critical statement! 🙂

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    The ELSE 1 will never be hit, thus the entire statement can be simplified to this (since you never have a NULL value passed in):

    WHERE CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END = 1

    and that, I believe, is no different logically than this:

    WHERE FirstName = @FirstName

    in which case you will obviously do an index seek if the statistics estimate 1-2% of the rows will be hit based on the incoming value of @FirstName.

    And we are back to the parameter sniffing plan caching problem. :w00t:

    Please, don't take it too literally 🙂 I do not compare it to NULL:

    WHERE CASE

    WHEN LEN(@FirstName) > 0 THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

  • TheSQLGuru (1/7/2009)


    phystech (1/7/2009)


    TheSQLGuru (1/7/2009)

    ...If you never passed in a NULL value then the CASE is surperfluous, right?..

    No, zero length string plays the role of NULL. That is if the parameter is zero length then grab all the records (sort of).

    Your "sort of" is a critical statement! 🙂

    WHERE CASE

    WHEN @FirstName IS NOT NULL THEN

    CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    The ELSE 1 will never be hit, thus the entire statement can be simplified to this (since you never have a NULL value passed in):

    WHERE CASE WHEN FirstName = @FirstName THEN 1

    ELSE 0 END = 1

    and that, I believe, is no different logically than this:

    WHERE FirstName = @FirstName

    in which case you will obviously do an index seek if the statistics estimate 1-2% of the rows will be hit based on the incoming value of @FirstName.

    And we are back to the parameter sniffing plan caching problem. :w00t:

    Unfortunately, this still should yield a index scan. The case expression is still evaluated for every row, as the optimizer doesnt know if the variable matches the case expression, until it evaluates it. For this reason the optimizer cannot seek and much evaluate each FirstName.

    Here is a simple example.

    select firstname

    from contacts

    WHERE

    CASE WHEN FirstName = @FirstName

    THEN 1

    ELSE 0

    END = 1

    QueryPlan:

    |--Index Scan(OBJECT:([tempdb].[dbo].[Contacts].[IXC_FirstName]), WHERE:(CASE WHEN [tempdb].[dbo].[Contacts].[FirstName]=[@firstname] THEN (1) ELSE (0) END=(1)))

  • Which is why I was saying it should be decompiled down to it's logical equivalent, which will do the right thing:

    create table #t (fname varchar(20))

    insert #t (fname)

    select top 100 'asdf'

    from sysobjects

    insert #t (fname) values ('zzzz')

    create index a on #t (fname)

    declare @fname varchar(20)

    set @fname = 'asdf' --does index scan

    select *

    from #t

    WHERE

    CASE WHEN fname = @fname

    THEN 1

    ELSE 0

    END = 1

    dbcc freeproccache

    go

    declare @fname varchar(20)

    set @fname = 'zzzz' --does index scan as well, even though only 1 row

    select *

    from #t

    WHERE

    CASE WHEN fname = @fname

    THEN 1

    ELSE 0

    END = 1

    dbcc freeproccache

    go

    declare @fname varchar(20)

    set @fname = 'asdf' --does index seek, since covering and ordered

    select fname

    from #t

    WHERE fname = @fname

    dbcc freeproccache

    go

    declare @fname varchar(20)

    set @fname = 'zzzz' --does index seek, since 1 row

    select fname

    from #t

    WHERE fname = @fname

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • First I am back a day later then I expected and definatly had hoped for, sorry for that. And please someone FIX this forum (smileys in code, persistent extended page widths even after I reformatted the code to fit).

    Now my 'paging' code that I said would contribute. Bear in mind I work in an enviromment that is able to send parameterized queries from the application layer to the database. The SQL code itself originates from the application layer most of the time where queries are created dynamically with relativly ease.

    What I created a while back was a method of reducing the SQL complexity when dealing with paging on the application side. For it to work the main query needs to be written in a CTE and return a sequence number that can be sorded on in the actual query. It is also used by the paging logic in this query to exclude unwanted records among other things.

    At the end of the normal query (now wrapped in a CTE), an application function is called to generate the paging part of the query for the developer automatically. The part deals with limiting the amount of data is transmidded by SQL sever to the appliation sever. The code also takes care of out of bounds cases and adjusts the page number accordingly. On top of this an optional part generates a comma seperated list of primary key IDs of all the records in the result, including the ones not returned due to paging.

    First we need some demo data (10 000 records should do):

    -- Create demo table with primary key

    --

    create table dbo.DemoTable

    (

    Id int not null

    , Name varchar(36) not null

    , SomeData varbinary(20) not null

    , constraint pk_DemoTable primary key clustered( ID )

    )

    ;

    go

    -- Insert 10.000 test records into the demo table

    --

    insert into dbo.DemoTable( Id, Name, SomeData )

    select

    numbers.N

    , newID()

    , hashBytes( 'SHA1', cast( numbers.N as varchar ) )

    from

    (

    select top 10000

    row_number() over ( order by t1.object_id )

    from

    master.sys.All_Columns as t1 with( nolock )

    cross join master.sys.All_Columns as t2 with( nolock )

    ) as numbers( N )

    go

    -- Index the demo table

    --

    create index ix_DemoTable_Name on dbo.DemoTable ( Name );

    create index ix_DemoTable_SomeData on dbo.DemoTable ( SomeData );

    go

    -- Show the first few records to get a feel of what the data looks like

    --

    select top 10 * from dbo.DemoTable where Id <= 10

    go

    Next is a demo query itself that demonstrates the techniques used and makes clear how easy the results can be processed further:

    -- For demo purposes only: these parameters are normaly injected into

    -- the paging secopn of the query as constants

    --

    declare @pgNum int;

    declare @pgSize int;

    select @pgNum = 300, @pgSize = 25;

    ;

    with

    /* This code is put into the page that shows the grid */

    pgFromQ as

    (

    select

    row_number() over

    (

    -- Order by, to be generated by application code so grid sorting can change over time

    order by

    demo.Name desc

    , demo.Id asc -- ALWAYS include the primary key as the last field when sorting

    ) as 'pgRowNumber'

    , demo.Id

    , demo.Name

    , demo.SomeData

    from

    dbo.DemoTable demo

    )

    Then the developer calls a function that generates the rest of the query.

    It requires a few parameters:

    1: Page to show (4)

    2: Page size (25)

    3: Optional field name ("Id" in this example).

    When specified it causes a comma seperated list to be generated of all the IDs.

    This can be used to 'browse' over the results in the any detail screen.

    It looks like (in whatever language you use):

    pagerSQL( 4, 25, "Id" )

    This code then generates the remainder of the query (2 parts). The first part is some more CTEs that handle page clipping and the like.

    , pgCountQ( recordCount, pageCount ) as

    (

    -- determines the number of pages

    select top 1

    count(*)

    , floor( (count(*) + @pgSize - 1) / @pgSize )

    from

    pgFromQ

    )

    , pgPageQ( activePage, recordCount, pageCount ) as

    (

    select top 1

    case

    when @pgNum < 1

    then 1

    when @pgNum > pageCount

    then pageCount

    else @pgNum

    end

    , recordCount

    , pageCount

    from

    pgCountQ

    )

    , pgRangeQ( firstRecord, activePage, recordCount, pageCount ) as

    (

    select top 1

    case

    when activePage = 0

    then 0

    else ( @pgSize * ( activePage - 1 ) + 1 )

    end

    , activePage

    , recordCount

    , pageCount

    from

    pgPageQ

    )

    -- this is the optional part that is generates the comma separated list of a key field ("Id" in this example)

    , pgRecListQ( recIDs ) as

    (

    select

    stuff

    (

    (

    select

    cast( ',' as varchar(max) ) + cast( i.Id as varchar )

    from

    pgFromQ as i

    order by

    i.pgRowNumber

    for

    xml path('')

    )

    , 1

    , 1

    , ''

    )

    )

    -- end of optional part

    The second part of the generated code performs the actual query and makes sure the least amount of data is returned by SQL.

    /*

    -- And finaly the select that returns the result set (only the first record of the resulting page

    -- contains extra such as number of first record, the active page number, the recordcount and the pagecount.

    -- And inf choosen also the optional comman separated list of IDs that represent the non-paged result.

    */

    select

    case when pgRangeQ.recordCount = 0 or pgFromQ.pgRowNumber = pgRangeQ.firstRecord then pgRangeQ.firstRecord else null end as 'pgFirstRecord'

    , case when pgRangeQ.recordCount = 0 or pgFromQ.pgRowNumber = pgRangeQ.firstRecord then pgRangeQ.activePage else null end as 'pgActivePage'

    , case when pgRangeQ.recordCount = 0 or pgFromQ.pgRowNumber = pgRangeQ.firstRecord then pgRangeQ.recordCount else null end as 'pgRecordCount'

    , case when pgRangeQ.recordCount = 0 or pgFromQ.pgRowNumber = pgRangeQ.firstRecord then pgRangeQ.pageCount else null end as 'pgPageCount'

    -- this is the optional part that is returns the comma separated list of a key field ("Id" in this example)

    , case when pgFromQ.pgRowNumber = pgRangeQ.firstRecord then ( select top 1 recIDs from pgRecListQ ) else null end as 'pgRecordIDs'

    -- end of optional part

    , pgFromQ.*

    from

    pgRangeQ

    left join pgFromQ on pgFromQ.pgRowNumber between pgRangeQ.firstRecord and pgRangeQ.firstRecord + @pgSize - 1

    order by

    pgFromQ.pgRowNumber

    ;

    The automaticaly generated code might seem complicated at first but once understood it is not that bad, nor is it expensive. It is nicely hidden from the functionality of the original query and it makes processing the results easy and efficient. The only times I experienced bad performance was when testing with generation of the optional comma separated list over say 100 000 records and when the original/main query was slow. It is always a matter of using the right design for the job, so that ain't a negative for the method itself. It certainly beats all the paging I ever seen before in applications on speed, transmission efficiency and usability.

  • Adam Haines (1/7/2009) ...Unfortunately, this still should yield a index scan. The case expression is still evaluated for every row, as the optimizer doesnt know if the variable matches the case expression, until it evaluates it. For this reason the optimizer cannot seek and much evaluate each FirstName...

    You seem to be right. I created a table in MS SQL Managament Studio

    CREATE TABLE GUIDs(

    GUID uniqueidentifier

    )

    CREATE INDEX IX_GUID ON GUIDs(GUID)

    Populated it with 10 million records with unique GUID in each record. And then tried to fetch an existent/non-existent record from the table using

    - direct

    WHERE GUID = @GUID

    - COALESCE

    WHERE GUID = COALESCE(@GUID,GUID)

    - and CASE:

    WHERE CASE

    WHEN @GUID IS NOT NULL THEN

    CASE WHEN GUID = @GUID THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    For example:

    DECLARE @GUID uniqueidentifier

    SET @GUID = '093cd7ed-db22-40fb-9f77-5be0e33aaaaa'

    SELECT * FROM GUIDS

    WHERE CASE

    WHEN @GUID IS NOT NULL THEN

    CASE WHEN GUID = @GUID THEN 1

    ELSE 0 END

    ELSE 1 END = 1

    Yes, direct "WHERE GUID = @GUID" fetched the record in no time

    "COALESCE" and "CASE" seemed to load the table in memory (~400 Mbyte) on the first run and then full-scanned the table. The loading took about 5 secs and the scan took ~1 sec. The PC was Core Duo 3.16 with 4G memory under Vista 64 Home.

    Yes, I did "Display estimated execution plan" in SQL Managament Studio and saw "table scan cost 100%". I just did not beleive my eyes the optimizer was that stupid (in my opinion).

    So, my apologies, and thanks for the oportunity to gain the knowledge.

  • Peter, I couldn't really follow the code you had. Was it supposed to be just separate batch files or was some/all of it supposed to be wrapped up in sprocs/functions/etc? Any chance you can attach it as a zip of appropriate file types?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 46 through 60 (of 68 total)

You must be logged in to reply to this topic. Login to reply