Odd Construct in a WHERE clause

  • I came across an odd construct in a WHERE clause, and I was wondering if anyone else has seen it before and could explain why it would cause a stored proc to consume 100% CPU and never return a result. It had previously been working fine, and I was able to build other stored procs with this construct without errors. It looks like this:

    WHERE (colOne = @num1 or @num1 = 0)

    and (colTwo = @num2 or @num2 = 0)

    So if you had a table such as

    colOne | colTwo

    -----------------

    1 | 2

    2 | 2

    2 | 3

    3 | 4

    And your SQL statement was:

    DECLARE @num1 int

    DECLARE @num2 int

    SET @num1 = 0

    SET @num2 = 0

    SELECT *

    FROM myTable

    WHERE (colOne = @num1 or @num1 = 0)

    and (colTwo = @num2 or @num2 = 0)

    You would return all rows in myTable. If you changed @num2 to 2, you would return only the first and second rows, (1,2) and (2,2). If you then set @num1 = 2 and @num2 = 2, you would return only the second row, (2,2).

    It seems a wonderful way to express "include all values for this column" in your WHERE clause. It isn't quite like saying "return all rows for colOne because the predicate is TRUE when @num1 = 0", but that is the way it behaves.

    I have been working with SQL Server for a number of years and never came across this before, so please pardon me if it sounds dumb to ask, but can anyone shed light on what SQL Server is doing "under the hood" to make this work? I have done some searching, but since I can't quite figure out a good way to describe it, I've not found any other postings on it.

    Also, does anyone have any insight into why this clause would cause a stored proc to max out the CPU and run indefinitely? I have created other stored procs with the example I posted, and they don't cause this behavior. The stored proc that caused this issue was evaluating a table with just over 200K rows.

    Any insights would be appreciated.

    Thank you,

    -Joseph

  • It really has nothing to do with SQL itself; it is simple logic.

    Let's look at the first part of the WHERE clause:

    (colOne = @num1 or @num1 = 0)

    This part is true when either colOne = @num1 or @num1 = 0 is true. If @num1 = 0 is true, it doesn't matter whether colOne = @num1 or not. If @num1 is not zero (0), then for this expression to be true, colOne = @num1 must be true.

    This logic can also be applied to the second part of the where clause:

    (colTwo = @num2 or @num2 = 0)

    Taking the entire WHERE clause now, both expressions must evaluate to true because of the AND between them.
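    To make the evaluation visible, here is a small sketch (using the myTable table and the variables from the original post) that selects each half of the predicate as a column instead of filtering on it:

    ```sql
    -- Sketch: expose each half of the WHERE predicate as its own column.
    -- Assumes the myTable example from the first post.
    DECLARE @num1 int
    DECLARE @num2 int
    SET @num1 = 0
    SET @num2 = 2

    SELECT colOne,
           colTwo,
           CASE WHEN colOne = @num1 OR @num1 = 0 THEN 'true' ELSE 'false' END AS first_part,
           CASE WHEN colTwo = @num2 OR @num2 = 0 THEN 'true' ELSE 'false' END AS second_part
    FROM myTable
    -- The original query returns exactly the rows where both columns read 'true'.
    ```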

    Does this help?

    edit:

    As for the way it is working in this one case, I have no idea. We'd need to see the actual structure of the table(s) and the code to really help you there.

    😎

  • Thanks for the reply, Lynn.

    That part I get. I guess what I find odd is that the second part of the expression doesn't evaluate the equality of a column but of a variable, yet it still decides what is returned for that column.

    In my example I used two expressions grouped with parentheses, but if you were to remove them and use only one of the columns in the WHERE clause, it would evaluate the same way. So I guess my question might be better worded as: "How is SQL Server associating what appears to be an unrelated equality (@num1 = 0) with a specific column, when I would think it should require using the column in the evaluation, like (colOne = @num1)?"

    -Joseph

  • To me, this makes no sense: "and (colTwo = @num2 or @num2 = 0)"

    I do not know why you are doing the second OR: "or @num2 = 0"

    I think you might be able to accomplish what you are trying to do with the COALESCE function:

    SELECT *

    FROM myTable

    WHERE colOne = COALESCE(@num1, colOne)

    and colTwo = COALESCE(@num2, colTwo)

    I think this will work for you. If you do not assign a value to the @num variables (they stay NULL), you will receive all rows. If you assign a value, the results drill down to those rows.
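    A caveat worth flagging on this suggestion: COALESCE returns its first non-NULL argument, so the variables must be left NULL (not set to 0) for the "all rows" case to kick in, and rows where the column itself is NULL will never match. A minimal sketch against the sample table from the first post:

    ```sql
    -- Sketch: COALESCE variant of the catch-all filter.
    -- Leave the variables unassigned (NULL) to get all rows back.
    DECLARE @num1 int
    DECLARE @num2 int
    -- @num1 and @num2 are NULL here, so COALESCE falls back to the column itself

    SELECT *
    FROM myTable
    WHERE colOne = COALESCE(@num1, colOne)
      AND colTwo = COALESCE(@num2, colTwo)
    -- Caveat: a row where colOne is NULL is never returned, because NULL = NULL
    -- does not evaluate to true.
    ```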

    Does this help?

  • What (colOne = @num1 or @num1 = 0) does is create a conditional WHERE clause. If @num1 = 0, you get all rows returned. If @num1 is not equal to 0, then you only get the rows where colOne = @num1.

    The same is true with (colTwo = @num2 or @num2 = 0).

    Also, (colOne = @num1 or @num1 = 0) and (colTwo = @num2 or @num2 = 0) DOES NOT EQUAL

    colOne = @num1 or @num1 = 0 and colTwo = @num2 or @num2 = 0.

    Without the parens, AND has higher precedence than OR, so @num1 = 0 and colTwo = @num2 is evaluated first, giving:

    colOne = @num1 or (@num1 = 0 and colTwo = @num2) or @num2 = 0

    Here is some test code to play with:

    create table #TestData (
        colOne int,
        colTwo int);

    insert into #TestData (colOne, colTwo)
    select 1,2 union all
    select 2,2 union all
    select 2,3 union all
    select 3,4;

    select * from #TestData;

    declare @num1 int,
            @num2 int;

    set @num1 = 0;
    set @num2 = 0;

    select colOne, colTwo
    from #TestData
    where (colOne = @num1 or @num1 = 0)
      and (colTwo = @num2 or @num2 = 0);

    set @num1 = 1;
    set @num2 = 0;

    select colOne, colTwo
    from #TestData
    where (colOne = @num1 or @num1 = 0)
      and (colTwo = @num2 or @num2 = 0);

    set @num1 = 10;
    set @num2 = 0;

    -- note: no parens this time, so the grouping changes
    select colOne, colTwo
    from #TestData
    where colOne = @num1 or @num1 = 0
      and colTwo = @num2 or @num2 = 0;

    drop table #TestData;

    😎

  • Thanks parackson, I will give the COALESCE a try in the query. Unfortunately I wasn't around when this script was produced for the report, so I can't attest to why it wasn't used to begin with, but it is basically used as you say: supply a 0 for @num1 and get all rows for that column; supply a value > 0 and get just those specific rows which = @num.

    Any ideas on why the script, written the way it is, would cause the CPU to max out at 100%? To give responders a little more background: the script is used in a report that has been around a while. It also has some "AND colXYZ not between 100 and 300" clauses in the WHERE statement. Recently another clause was added with a different set of range values, and then the stored proc was updated. At that point the CPU started hitting 100% each time the proc was called; the longest I let the proc try to run was 30 minutes, but it never returned. When I pulled the code out of the stored proc and ran it in Query Analyzer or SSMS, it ran just fine and returned the expected 56K rows in 9 seconds. I tried this both from the server having the issue and from my desktop, with the same results. Perfmon didn't show any memory issues; in fact, the only counter indicating a problem was the maxed-out CPU.

    -Joseph

  • The problem with this is that the repeated (Column = Variable OR Variable = Constant) expressions confuse the optimiser's estimates of rows affected (definitely on 2005; not sure about 2008).

    It's unlikely that a query like that will be able to use any indexes. Likely, you'll be getting a table scan.
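    For readers on newer versions: one common mitigation for this "catch-all" pattern is to force a fresh compile per execution so the optimiser can see the actual parameter values. This is a sketch, not the poster's code, and note that OPTION (RECOMPILE) exists only from SQL Server 2005 onward, so it is no help on older servers:

    ```sql
    -- Sketch (SQL Server 2005+ only): recompile the statement each run so the
    -- plan reflects the actual @num1/@num2 values instead of a generic guess.
    SELECT colOne, colTwo
    FROM myTable
    WHERE (colOne = @num1 OR @num1 = 0)
      AND (colTwo = @num2 OR @num2 = 0)
    OPTION (RECOMPILE)
    ```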

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You stated that a new construct was added to the SP. Try removing it and then re-running the SP; it sounds as if your issues started once that line of code was added (can you provide the new line of code as well?).

    Additionally, every additional clause in your WHERE statement just means more work for the processor.

  • GilaMonster (9/3/2008)


    The problem with this is that the repeated (Column = Variable or variable = constant) expressions confuse the optimiser's estimations of rows affected (on 2005 definitly, not sure about 2008)

    It's unlikely that a query like that will be able to use any indexes. Likely, you'll be getting a table scan.

    True, but I suspect more than that if it never returns a result set and it used to work (probably upgraded from SQL Server 2000?). There may actually be more going on in the proc than we know from the post.

    😎

  • Sorry, I should have stated the server version beforehand. The server is still SQL Server 2000 SP4; I know it's out of date, and it will be upgraded in the coming months.

    And to answer the question about removing the newly added clause from the WHERE statement: I did try that, and it didn't help. The stored proc still kept burning up CPU.

    This is the stored proc, though I'm not sure seeing the whole thing will help:

    CREATE PROCEDURE [dbo].[userproc_xyz] (
        @company_id int,
        @sub_no int = 0
    )
    AS
    -- produce the data for the report
    SELECT TITLE.sub_no,
        SUBSCRIBER.company_name,
        TITLE.title_no,
        TITLE_XREF.billed_date_time,
        TITLE_XREF.cancel_date_time,
        TITLE.abstractor_actl_cost,
        PRODUCT.product_id,
        TITLE.title_code,
        PROP.prop_no,
        PROP.abstractor_actl_cost prop_abstractor_actl_cost,
        CASE WHEN DateDiff(d, TITLE_XREF.billed_date_time, GetDate()) < 30 THEN '< 30 days'
             WHEN DateDiff(d, TITLE_XREF.billed_date_time, GetDate()) BETWEEN 30 AND 60 THEN '30-60 days'
             WHEN DateDiff(d, TITLE_XREF.billed_date_time, GetDate()) BETWEEN 61 AND 90 THEN '61-90 days'
             WHEN DateDiff(d, TITLE_XREF.billed_date_time, GetDate()) BETWEEN 91 AND 120 THEN '91-120 days'
             WHEN DateDiff(d, TITLE_XREF.billed_date_time, GetDate()) > 120 THEN '> 120 days'
        END interval,
        CASE WHEN TITLE_XREF.cancel_date_time Is Null THEN 'bnc'
             ELSE 'bc'
        END category
    INTO #report_data
    FROM TITLE
    LEFT JOIN TITLE_XREF ON TITLE.company_id = TITLE_XREF.company_id AND TITLE.title_no = TITLE_XREF.title_no
    LEFT JOIN SUBSCRIBER ON TITLE.company_id = SUBSCRIBER.company_id AND TITLE.sub_no = SUBSCRIBER.sub_no
    LEFT JOIN PRODUCT ON TITLE.title_code = PRODUCT.product_sub_code AND PRODUCT.product_id = 'B'
    LEFT JOIN (SELECT * FROM ORDER_DEAL) od1 ON TITLE.title_no = od1.order_no AND TITLE.company_id = od1.company_id
    LEFT JOIN (SELECT * FROM ORDER_DEAL WHERE product_id = 'P' AND product_sub_code = 123) od2 ON od1.deal_no = od2.deal_no
    LEFT JOIN PROP ON od2.order_no = PROP.prop_no
    WHERE TITLE.company_id = @company_id
    AND (TITLE.sub_no = @sub_no OR @sub_no = 0) -- all subscribers (0) or one subscriber
    AND TITLE.sub_no NOT BETWEEN 99000 AND 100000 -- test subscribers
    AND TITLE.title_code <> 110 -- Title Commitment (Pass through Agent)
    AND TITLE.title_code NOT BETWEEN 115 AND 117 -- SAFE products
    AND TITLE.title_code NOT BETWEEN 123 AND 134 -- SAFE products
    AND TITLE.title_code NOT BETWEEN 135 AND 138 -- SAFE products *** this was the new code ***
    AND TITLE.open_status = 'H'
    AND NOT TITLE_XREF.billed_date_time Is Null
    AND TITLE_XREF.title_premium_billed_date_time Is Null

    SELECT * FROM #report_data
    ORDER BY sub_no, title_no

    -- remove the temp table
    DROP TABLE #report_data

    It does perform two full table scans, consuming 31% and 29% of the plan. This code, pulled from the stored proc and run with values supplied for @company_id and @sub_no, runs in around 9 to 11 seconds even with the table scans. I was only able to get an execution plan showing the scans when not running it as a stored proc; it only causes a CPU problem when run as a stored proc. I might have to chalk this up to the SQL Server query optimizer getting confused, but I wish I knew of a way to isolate that. Again, removing the extra clause didn't help. Forcing a recompile didn't help. Clearing the proc cache didn't help either. I tried the recompile and clearing the proc cache both with the code as you see it and with the new line removed.

  • Couple of things that can help to speed it up....

    Declare a variable, set it to GETDATE(), and use the variable instead; this helps with processing time.

    Add "WITH (NOLOCK)" to each and every table in your join statements. If you can call out an actual index, this can help.

    Change this line

    LEFT JOIN (SELECT * FROM ORDER_DEAL) od1 ON TITLE.title_no = od1.order_no AND TITLE.company_id = od1.company_id

    To

    LEFT JOIN ORDER_DEAL AS od1 WITH (NOLOCK) ON TITLE.title_no = od1.order_no AND TITLE.company_id = od1.company_id

    Move "OR" query to last part of where clause

    AND (TITLE.sub_no = @sub_no OR @sub_no = 0 ) -- all subscribers (0) or one subscriber

    New code can be improved

    AND TITLE.title_code NOT BETWEEN 123 and 134 -- SAFE products

    AND TITLE.title_code NOT BETWEEN 135 and 138 -- SAFE Products *** this was the new code ***

    No need for the second line; just make it:

    AND TITLE.title_code NOT BETWEEN 123 and 138 -- SAFE products

    Is this a large table?

    (SELECT * FROM ORDER_DEAL WHERE product_id = 'P' AND product_sub_code = 123)

    Because if so, you could think about writing this to a temp table and then referencing it in the actual query.

  • First thing is that there's absolutely no need for the temp table. Just do the select straight with the order by included.
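    In miniature (using a hypothetical cut-down column list, not the full proc), the suggestion is to replace the three-step temp-table pattern with a single ordered SELECT:

    ```sql
    -- Instead of:
    --   SELECT sub_no, title_no INTO #report_data FROM TITLE WHERE open_status = 'H'
    --   SELECT * FROM #report_data ORDER BY sub_no, title_no
    --   DROP TABLE #report_data
    -- just return the result directly:
    SELECT sub_no, title_no
    FROM TITLE
    WHERE open_status = 'H'
    ORDER BY sub_no, title_no
    ```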

    Which tables is it scanning and what do the indexes on that table look like?

    While it's running, look in sysprocesses and see if there's a last wait type. If so, what is it?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/3/2008)


    First thing is that there's absolutely no need for the temp table. Just do the select straight with the order by included.

    Good catch, Gila; completely went right over that code.

  • parackson (9/3/2008)


    Add "WITH (NOLOCK)" to each and every table in your join statements. If you can call out an actual index this can help.

    Just bear in mind that NOLOCK means dirty reads, so if the data's changing a lot, you may read something you don't want to. There's also the chance of missing rows completely or reading rows twice under certain circumstances.

    Nolock essentially means to SQL "I don't care if my result set is slightly inaccurate."

    If the data is not changing a lot, then you are probably not running into lock waits and hence nolock won't help much. Nolock is not a silver bullet to be added to every select statement without careful consideration.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Can you try changing the order in your WHERE clause to 'WHERE (@num1 = 0 or colOne = @num1)'?
