NOT makes query never complete?

  • I understand it may be difficult to assist me without knowing my underlying schema and all the details. Let me begin by asking in general:

    I have a query that sums up some sales by customer number and product type, EXCLUDING 3 customers, something like:

    SELECT customers.cus_no, sales.prod_type, SUM(sales.total_sales)
    FROM dbo.sales
    INNER JOIN dbo.customers
    ON customers.cus_no = sales.cus_no
    WHERE sales.date = 20130131
    AND customers.cus_no NOT IN ('1','2','3')
    GROUP BY customers.cus_no, sales.prod_type

    So you can see customers 1, 2 and 3 are excluded.

    But say if I wanted customers 4 and 5 to be in the report, but only for the 'Bike' product_type, I did this:

    SELECT customers.cus_no, sales.prod_type, SUM(sales.total_sales)
    FROM dbo.sales
    INNER JOIN dbo.customers
    ON customers.cus_no = sales.cus_no
    WHERE sales.date = 20130131
    AND customers.cus_no NOT IN ('1','2','3')
    AND NOT ( customers.cus_no IN ('4','5')
              AND sales.prod_type <> 'Bike' )
    GROUP BY customers.cus_no, sales.prod_type

    I run this, and the query never completes. It runs forever and ever.

  • Is there a question here?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    I apologize, I thought the question was implied. It certainly was to all my co-workers.

    Why does the second statement run forever? Is there a better way to write what I'm trying to add in the second statement so that it works?

    No way to tell without seeing the indexes and exec plan. Probably the indexes don't support the additional predicates, possibly combined with row estimation errors (NOT is a little hard to estimate), producing a plan that's highly sub-optimal.

    The NOT outside the bracket can be converted to the negative of the two conditions inside, with an OR rather than an AND, and ORs require quite different indexing than ANDs do.

    WHERE sales.date = 20130131
    AND customers.cus_no NOT IN ('1','2','3')
    AND NOT ( customers.cus_no IN ('4','5')
              AND sales.prod_type <> 'Bike' )

    means (via De Morgan's laws)

    WHERE sales.date = 20130131
    AND customers.cus_no NOT IN ('1','2','3')
    AND (customers.cus_no NOT IN ('4','5') OR sales.prod_type = 'Bike')

    which can further be expanded to

    WHERE (sales.date = 20130131 AND customers.cus_no NOT IN ('1','2','3','4','5'))
    OR (sales.date = 20130131 AND customers.cus_no NOT IN ('1','2','3') AND sales.prod_type = 'Bike')

    That OR is probably what's messing the query up.
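    One standard way around an OR like that, if changing the indexes isn't an option, is to split the predicate into two disjoint, AND-only branches with UNION ALL. A sketch against the tables from the post (untested without the real schema):

    ```sql
    -- Branch 1: everyone except customers 1-5
    SELECT customers.cus_no, sales.prod_type, SUM(sales.total_sales)
    FROM dbo.sales
    INNER JOIN dbo.customers ON customers.cus_no = sales.cus_no
    WHERE sales.date = 20130131
      AND customers.cus_no NOT IN ('1','2','3','4','5')
    GROUP BY customers.cus_no, sales.prod_type

    UNION ALL

    -- Branch 2: customers 4 and 5, Bike only (disjoint from branch 1)
    SELECT customers.cus_no, sales.prod_type, SUM(sales.total_sales)
    FROM dbo.sales
    INNER JOIN dbo.customers ON customers.cus_no = sales.cus_no
    WHERE sales.date = 20130131
      AND customers.cus_no IN ('4','5')
      AND sales.prod_type = 'Bike'
    GROUP BY customers.cus_no, sales.prod_type;
    ```

    Because the two branches cover disjoint sets of customers, UNION ALL returns each group exactly once, and each branch is a plain AND chain that a single index can support.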

    Gail Shaw
  • EDIT: I notice you're adding more to your post. I'll check it again - thanks!!

    What is interesting is that I can remove the additional NOT from the WHERE clause in statement number 2 and move it into the SELECT list as a CASE expression that evaluates to 1 or 0 to flag that customer number / product type combination.

    I run this, and the new column works fine, and the statement runs quick.

    Then I wrap it as a subquery and SELECT * from that subquery where the CASE column is 0 or 1 (either one), and it also hangs indefinitely.

    Such as:

    SELECT *
    FROM
    (
        SELECT CASE WHEN cus_no IN ('1','2')
                    AND product_type <> 'BIKE' THEN 1 ELSE 0 END AS the_column
        FROM [aforementioned tables]
    ) subq
    WHERE the_column = 1

    Run just the inner subquery, it runs quick. Run the whole statement, it locks up again.

  • Because the optimiser works on the entire query in one go, not the subquery first and the outer query second, the two very likely simplify to the same form.
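    Concretely, after the subquery is flattened, the wrapped statement likely simplifies to something like this (a sketch, keeping the post's placeholder for the FROM clause):

    ```sql
    SELECT 1 AS the_column
    FROM [aforementioned tables]
    WHERE cus_no IN ('1','2')
      AND product_type <> 'BIKE';
    ```

    which is the same shape of predicate that was slow in the first place.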

    Gail Shaw
  • I'm not familiar with De Morgan's laws but your explanation of my situation using them made sense (end result the same to me).

    For now, can you think of any way this statement could be written (without any special requirements) that would potentially be more efficient regardless of the underlying schema?

    Without knowing the underlying indexes, not really. It's not the syntax; the optimiser's smart enough to convert between the forms. It's going to be the indexes that make a difference.

    Without knowing what you have, this is a guess, but I'd put one index on sales.date, another on (sales.prod_type, sales.date), and one on customers.cus_no. That would be a start; I'd use the exec plan to tweak from there.
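    Spelled out as DDL, that suggestion might look like the following (index names are made up; verify against the actual schema and execution plan before creating anything):

    ```sql
    CREATE NONCLUSTERED INDEX IX_sales_date      ON dbo.sales (date);
    CREATE NONCLUSTERED INDEX IX_sales_prod_date ON dbo.sales (prod_type, date);
    CREATE NONCLUSTERED INDEX IX_customers_cusno ON dbo.customers (cus_no);
    ```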

    Gail Shaw
    I suspect dbo.sales should really be clustered by date. If so, and you fix that, the rest of the query won't hurt the performance that much one way or the other.
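    If that diagnosis holds, the change would look something like this (hypothetical: dropping and rebuilding an existing clustered index rewrites the whole table, so test it on a copy first):

    ```sql
    CREATE CLUSTERED INDEX CIX_sales_date ON dbo.sales (date);
    ```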

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • Gentlemen, thank you for your assistance, esp. without knowing my schema.

    I forced the join on two tables to be hashed, as the query plan was showing nested loop.

    I am currently educating myself further on these internal join processes.

    The query now runs in under a second.

    Thanks!!
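    For reference, one way to force hash joins in T-SQL is a query-level hint like the one below (a sketch; the poster didn't show the exact syntax used):

    ```sql
    SELECT customers.cus_no, sales.prod_type, SUM(sales.total_sales)
    FROM dbo.sales
    INNER JOIN dbo.customers ON customers.cus_no = sales.cus_no
    WHERE sales.date = 20130131
      AND customers.cus_no NOT IN ('1','2','3')
    GROUP BY customers.cus_no, sales.prod_type
    OPTION (HASH JOIN);  -- forces hash joins for every join in the query
    ```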

  • Join hints are a bad idea in most cases. Rather see why SQL's picking a loop join (probably low row estimations on one or both tables) and fix the cause.
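    A common first step in chasing down low row estimates is to refresh the statistics the optimiser relies on (a sketch; whether this helps depends on why the estimates are off):

    ```sql
    -- Rebuild statistics from a full scan of each table rather than a sample
    UPDATE STATISTICS dbo.sales WITH FULLSCAN;
    UPDATE STATISTICS dbo.customers WITH FULLSCAN;
    ```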

    Gail Shaw
    Low row estimation is the cause. It estimated ~6k rows and ended up processing millions.

    I guess the fix is a better index, key, join, etc. However, can you take a moment to tell me what is logically going on, and how the lack of one or more of those items makes this occur?

  • Lack of what?

    Yes, low row estimation, whatever the cause, will cause very inefficient query plans. Operators that are good on smaller row counts are terrible on larger ones.

    Gail Shaw
  • holyforce (2/1/2013)


    SELECT *
    FROM
    (
        SELECT CASE WHEN cus_no IN ('1','2')
                    AND product_type <> 'BIKE' THEN 1 ELSE 0 END AS the_column
        FROM [aforementioned tables]
    ) subq
    WHERE the_column = 1

    Run just the inner subquery, it runs quick. Run the whole statement, it locks up again.

    It's most likely not exactly right.

    Make sure you've got the whole recordset in your Management Studio window before you say "the subquery runs quick".

    Because you do not have any filter, every row is going to be displayed, so SSMS starts displaying result rows as they arrive.

    It does not need to wait until the whole data set is processed.

    It's a different story when you apply the WHERE clause.

    Because there is no way any index can be used, SQL Server needs to build the recordset from the subquery as a table in memory, and then apply the filter against it.

    _____________
    Code for TallyGenerator

  • Sergiy (2/3/2013)


    Different story when you apply the WHERE clause.

    Because there is no way any index can be used SQL Server needs to build the recordset from the subquery as a table in memory, and then apply the filter against it.

    Not how filters work. Sure, that's not something where SQL can seek for a value, but it can read from the index/table and apply the filter as it goes. Secondary filters are not a blocking operation; blocking operations are sorts, hash joins, and hash aggregates, things where the entire resultset has to be available to work on.

    Gail Shaw
