February 1, 2013 at 11:15 am
I understand it may be difficult to assist me without knowing my underlying schema and all the details. Let me begin by asking in general:
I have a query that sums up some sales by customer number and product type, EXCLUDING 3 customers, something like:
SELECT customers.cus_no, sales.prod_type, SUM(sales.total_sales)
FROM dbo.sales
INNER JOIN dbo.customers
ON customers.cus_no = sales.cus_no
WHERE sales.date = 20130131
AND customers.cus_no NOT IN ('1','2','3')
GROUP BY customers.cus_no, sales.prod_type
So you can see customers 1, 2 and 3 are excluded.
But say if I wanted customers 4 and 5 to be in the report, but only for the 'Bike' product_type, I did this:
SELECT customers.cus_no, sales.prod_type, SUM(sales.total_sales)
FROM dbo.sales
INNER JOIN dbo.customers
ON customers.cus_no = sales.cus_no
WHERE sales.date = 20130131
AND customers.cus_no NOT IN ('1','2','3')
AND NOT ( customers.cus_no IN ( '4','5')
AND sales.prod_type <> 'Bike'
)
GROUP BY customers.cus_no, sales.prod_type
I run this, and the query never completes. It runs forever and ever.
February 1, 2013 at 11:44 am
Is there a question here?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 1, 2013 at 11:47 am
I apologize; I thought the question was implied. It certainly was to all my co-workers.
Why does the second statement run forever? Is there a better way to write what I'm trying to add in the second statement, so that it works?
February 1, 2013 at 11:55 am
No way to tell without seeing the indexes and exec plan. Probably the indexes don't support the additional predicates, which, possibly combined with row-estimation errors (NOT predicates are hard to estimate), produces a plan that's highly sub-optimal.
The NOT outside the bracket can be converted into the negation of the two conditions inside, joined with an OR rather than an AND, and ORs require quite different indexing than ANDs do.
WHERE sales.date = 20130131
AND customers.cus_no NOT IN ('1','2','3')
AND NOT ( customers.cus_no IN ( '4','5')
AND sales.prod_type <> 'Bike'
)
means (via De Morgan's laws)
WHERE sales.date = 20130131
AND customers.cus_no NOT IN ('1','2','3')
AND (customers.cus_no NOT IN ( '4','5') OR sales.prod_type = 'Bike')
which can further be expanded to
WHERE (sales.date = 20130131 AND customers.cus_no NOT IN ('1','2','3','4','5'))
OR (sales.date = 20130131 AND customers.cus_no NOT IN ('1','2','3') AND sales.prod_type = 'Bike')
That OR is probably what's messing the query up.
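One common workaround for an OR like that is to split the query into two disjoint, index-friendly branches combined with UNION ALL. A rough sketch only, reusing the column and table names from the posted query (untested against the actual schema):

```sql
-- Branch 1: everyone except customers 1-5, all product types
SELECT c.cus_no, s.prod_type, SUM(s.total_sales) AS total_sales
FROM dbo.sales AS s
INNER JOIN dbo.customers AS c ON c.cus_no = s.cus_no
WHERE s.date = 20130131
  AND c.cus_no NOT IN ('1','2','3','4','5')
GROUP BY c.cus_no, s.prod_type

UNION ALL

-- Branch 2: customers 4 and 5, 'Bike' sales only
SELECT c.cus_no, s.prod_type, SUM(s.total_sales) AS total_sales
FROM dbo.sales AS s
INNER JOIN dbo.customers AS c ON c.cus_no = s.cus_no
WHERE s.date = 20130131
  AND c.cus_no IN ('4','5')
  AND s.prod_type = 'Bike'
GROUP BY c.cus_no, s.prod_type;
```

The two branches can never return the same customer, so UNION ALL (which skips the duplicate-removal sort of plain UNION) is safe here, and each branch is a simple conjunction of predicates that a single index can support.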
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 1, 2013 at 12:03 pm
EDIT: I notice you're adding more to your post. I'll check it again - thanks!!
What is interesting is that I can remove the additional NOT predicate from statement number 2 and move it into the SELECT list as a CASE that evaluates to 1 or 0, flagging that customer number / product type combination.
I run this, and the new column works fine, and the statement runs quick.
Then I wrap it as a subquery and select * from it where the CASE column is 0 or 1 (either one, pick one), and it also hangs indefinitely.
Such as:
SELECT *
FROM
(
SELECT CASE WHEN cus_no IN ('1','2')
AND product_type <> 'BIKE' THEN 1 ELSE 0 END AS the_column
FROM [aforementioned tables]
) subq
WHERE the_column = 1
Run just the inner subquery, it runs quick. Run the whole statement, it locks up again.
February 1, 2013 at 12:05 pm
Because the optimiser works on the entire query in one go, not the subquery first and the outer query second, the two very likely simplify to the same form.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 1, 2013 at 12:11 pm
I'm not familiar with De Morgan's laws, but your explanation of my situation using them made sense (the end result is the same to me).
For now, can you think of any way this statement could be written (without any special requirements) that would potentially be more efficient, regardless of the underlying schema?
February 1, 2013 at 12:38 pm
Without knowing the underlying indexes, not really. It's not the syntax; the optimiser's smart enough to convert between the forms. It's going to be the indexes that make the difference.
Without knowing what you have, this is a guess, but I'd put one index on sales.date, another on (sales.prod_type, sales.date), and one on customers.cus_no. That would be a start; I'd use the exec plan to tweak from there.
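As a starting point, those three suggested indexes might look like the following. The index names are made up for illustration, and whether to add INCLUDE columns (e.g. total_sales) depends on the actual plan:

```sql
CREATE INDEX IX_sales_date       ON dbo.sales (date);
CREATE INDEX IX_sales_prod_date  ON dbo.sales (prod_type, date);
CREATE INDEX IX_customers_cus_no ON dbo.customers (cus_no);
```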
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 1, 2013 at 1:14 pm
I suspect dbo.sales should really be clustered by date. If so, and you fix that, the rest of the query won't hurt performance much one way or the other.
SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
February 1, 2013 at 1:30 pm
Gentlemen, thank you for your assistance, especially without knowing my schema.
I forced the join between the two tables to be hashed, as the query plan was showing a nested loops join.
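For readers following along: forcing a hash join in SQL Server can be done either as a join hint or as a query hint. A sketch of what this likely looked like, using the first query's shape (not the poster's exact statement):

```sql
SELECT c.cus_no, s.prod_type, SUM(s.total_sales)
FROM dbo.sales AS s
INNER HASH JOIN dbo.customers AS c   -- join hint: forces this join to hash
  ON c.cus_no = s.cus_no
WHERE s.date = 20130131
GROUP BY c.cus_no, s.prod_type;
-- Alternatively, leave the join syntax alone and append the query hint
-- OPTION (HASH JOIN);  -- forces ALL joins in the query to hash
```

Note that a join hint also forces the join order as written, which is one reason (as pointed out below) hints are usually a last resort.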
I am currently educating myself further on these internal join processes.
The query runs under a second now.
Thanks!!
February 1, 2013 at 3:04 pm
Join hints are a bad idea in most cases. Rather see why SQL's picking a loop join (probably low row estimations on one or both tables) and fix the cause.
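Two of the usual first steps for chasing bad estimates, rather than hinting the join; which (if either) applies depends on the actual cause:

```sql
-- Refresh statistics on the tables involved, with a full scan,
-- in case the estimates are off because stats are stale
UPDATE STATISTICS dbo.sales WITH FULLSCAN;
UPDATE STATISTICS dbo.customers WITH FULLSCAN;

-- If the estimates are off because of parameter sniffing, compiling
-- the statement with the actual runtime values can help:
-- append OPTION (RECOMPILE) to the problem query.
```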
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 2, 2013 at 4:54 pm
Low row estimation is the cause. It predicted ~6k rows and ended up doing millions.
I guess the fix is a better index, key, join, etc. However, can you take a moment to explain what is going on logically, and how the lack of one or more of those items causes this?
February 3, 2013 at 1:11 am
Lack of what?
Yes, low row estimation, whatever the cause, will cause very inefficient query plans. Operators that are good on smaller row counts are terrible on larger ones.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 3, 2013 at 7:01 pm
holyforce (2/1/2013)
SELECT *
FROM
(
SELECT CASE WHEN cus_no IN ('1','2')
AND product_type <> 'BIKE' THEN 1 ELSE 0 END AS the_column
FROM [aforementioned tables]
) subq
WHERE the_column = 1
Run just the inner subquery, it runs quick. Run the whole statement, it locks up again.
It's most likely not exactly right.
Make sure you've got the whole result set in your Management Studio window before you say "subquery runs quick".
Because you do not have any filter, every row is going to be displayed, so SSMS starts displaying the resulting rows as they arrive.
It does not need to wait until the whole data set is processed.
It's a different story when you apply the WHERE clause.
Because there is no way any index can be used, SQL Server needs to build the recordset from the subquery as a table in memory and then apply the filter against it.
_____________
Code for TallyGenerator
February 4, 2013 at 1:47 am
Sergiy (2/3/2013)
Different story when you apply the WHERE clause. Because there is no way any index can be used SQL Server needs to build the recordset from the subquery as a table in memory, and then apply the filter against it.
Not how filters work. Sure, that's not something where SQL can seek for a value, but it can read from the index/table and apply the filter as it goes. Secondary filters are not a blocking operation; that's sorts, hash joins, hash aggregates: things where the entire result set has to be available to work on.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability