RE: Can a strawberry query be done better?

SSC Guru

Points: 1003857

July 4, 2013 at 3:30 pm

Banana-823045 (7/4/2013)
Coming from MySQL background, one query design I learned early on was 'strawberry query', which got its odd name from the MySQL newsgroup. It is a very useful pattern for solving the problem of answering questions like "who's the best performing salesperson of month?" or similar questions.
The solution basically involves doing a "triangular join" and filtering for NULLs. Using salesperson example:
SELECT
s.SalesPerson,
s.SalesMonth,
s.SalesAmount,
s.Customer
FROM Sales AS s
LEFT JOIN Sales AS m
ON s.SalesMonth = m.SalesMonth
AND s.SalesAmount < m.SalesAmount
WHERE m.SalesID IS NULL;
Note that the query is free to include other fields from the same row because there is no GROUP BY; the grouping is implicitly done via the self-join. We are guaranteed to get only one possible row each month for a given sales which also is the greatest amount. There is no any other row that's greater than the greatest amount of given month so m.SalesID must be NULL.
This also works for getting the minimum; just reverse the inequality operator on the join criteria. Also, there is no TOP 1 ... ORDER BY which can be problematic when you need to get multiple results (e.g. you want to see all 12 months at once.)
Now, that worked well with MySQL. However, I'm also aware that T-SQL language has some features that doesn't exist in the MySQL dialect and also whether there might be a better way of doing it in T-SQL. I don't exactly trust myself to interpret the best execution plans so I'd be very interested in hearing from others whether this can be outperformed by alternatives such as using ROW_NUMBER() or whatever other approaches.
Thanks!

Do you have an example of the data that is contained in the SalesMonth column?

Also, could you provide the CREATE TABLE for the Sales table so we can see the datatypes, etc. It could very well make a difference.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)