Celko's SQL Stumper: The Data Warehouse Problem
Posted Tuesday, October 6, 2009 3:42 PM
Forum Newbie

I, for one, like the new results. I think it makes sense to use for the judging.

I've been trying to improve performance by getting rid of the sort step while continuing to use the row_number function. I tried putting the sort in a CTE (below), but it doesn't seem to help. I did manage one improvement, further below.

------------------------------------

WITH dailysalestotals1 AS
(
    SELECT customer_id, order_date, order_amt,
           ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS row_number
    FROM dailysalestotals
)
SELECT customer_id, 'dropped purchase amount on',
       last_order_date = MAX(order_date)
FROM (SELECT row_number, customer_id, order_date, order_amt
      FROM dailysalestotals1) dt
WHERE row_number IN (1, 2)
GROUP BY customer_id
HAVING SUM(CASE WHEN row_number = 1 THEN order_amt ELSE -order_amt END) < 0

-------------------------------------------------------

This is all I could come up with. It's pretty fast but requires two scans. I used WITH ROLLUP in the CTE to pull out a single record with a count of orders per customer, then compared the count to the row number of the individual order dates to pull the last two.

WITH dailysalestotals1 AS
(
    SELECT TOP 100000
           d1.customer_id, d1.order_date, d1.order_amt,
           ROW_NUMBER() OVER(PARTITION BY d1.customer_id ORDER BY (SELECT 1)) AS row_number,
           COUNT(1) AS count
    FROM dailysalestotals d1
    GROUP BY d1.customer_id, d1.order_date, d1.order_amt WITH ROLLUP
    HAVING order_amt IS NOT NULL OR (order_date IS NULL AND order_amt IS NULL)
)
SELECT dt.customer_id, 'dropped purchase amount on',
       last_order_date = MAX(dt.order_date)
FROM (SELECT customer_id, order_date, order_amt, row_number
      FROM dailysalestotals1
      WHERE order_amt IS NOT NULL) dt
CROSS APPLY (SELECT d2.customer_id, d2.count
             FROM dailysalestotals1 d2
             WHERE order_date IS NULL AND order_amt IS NULL
               AND dt.customer_id = d2.customer_id) dt1
WHERE dt.row_number + 1 >= dt1.count
GROUP BY dt.customer_id
HAVING SUM(CASE WHEN dt.row_number = dt1.count THEN order_amt
                WHEN dt.row_number = dt1.count - 1 THEN -order_amt END) < 0
Post #798845
Posted Tuesday, October 6, 2009 8:52 PM


SSCommitted

>> I also understand that many DBAs don't want to rely on undocumented and unsupported features like this. No problem: I will take care of the problem when it arises; meanwhile I think I will take all the advantages. <<

BAD PROGRAMMER!! No beer for you on Friday after work! How do you know that you would even be around to take care of the problem when it arises? Imagine a civil engineer who built bridges on that kind of assumption. Do you really want to never find another job?

Seriously, you should always favor data integrity over speed. If it does not have to be right, then the answer is 42 (Douglas Adams). Favor portable code over dialect; 80% or more of the lifetime cost is in maintaining code. This is all basic Software Engineering.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #798902
Posted Wednesday, October 7, 2009 1:42 AM


SSCertifiable

Joe Celko (10/6/2009)
>> I also understand that many DBAs don't want to rely on undocumented and unsupported features like this. No problem: I will take care of the problem when it arises; meanwhile I think I will take all the advantages. <<

BAD PROGRAMMER!! No beer for you on Friday after work! How do you know that you would even be around to take care of the problem when it arises? Imagine a civil engineer who built bridges on that kind of assumption. Do you really want to never find another job?

Seriously, you should always favor data integrity over speed. If it does not have to be right, then the answer is 42 (Douglas Adams). Favor portable code over dialect; 80% or more of the lifetime cost is in maintaining code. This is all basic Software Engineering.


Joe, I'm 100% with you. In all three cases where I applied the quirky update to a production routine, I put a long comment in the procedure to help identify the issue, just in case a tile falls from a roof onto my head...
Imagine you have a 10-minute window to run a job over a huge amount of data and a running total is involved. Any other approach takes at least 40 minutes and the quirky update takes 4 minutes: which one would you choose? In this case I decided to keep my current job rather than worry about never finding a new one.

It's also true that documented and supported features can change and stop working without notice when you apply patches and service packs (I could cite the behaviour of SQL Server 2008 parameter sniffing, just to give an example), so I think I'll take the risk in this case. I fully expect the quirky update to stop working at some point, so I always test it thoroughly when I apply patches.
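For readers following along, here is a minimal sketch of the quirky-update running-total technique under debate. The table and column names (dbo.SalesTotals, running_total) are assumed for illustration; the whole approach depends on undocumented behaviour and is exactly the kind of thing Joe is warning against.

```sql
-- Hypothetical sketch of the "quirky update" running total discussed above.
-- It relies on the UNDOCUMENTED row order SQL Server follows when updating
-- along the clustered index, hence the hints below. Illustration only;
-- a per-customer reset of the total is omitted for brevity.
DECLARE @running decimal(18,2) = 0;

UPDATE s
SET @running = running_total = @running + order_amt
FROM dbo.SalesTotals AS s WITH (TABLOCKX, INDEX(1))  -- lock the table, force the clustered index
OPTION (MAXDOP 1);                                   -- keep the update single-threaded
```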
As a side note, I always try to stick to portable code, even though complete portability, while nearer to possible nowadays than it was 10 years ago, is still very far off. I'm sure nobody on Earth knows that better than you do. I worked for many years on a multi-platform ERP product, and the biggest issue was writing portable code: lots of branches in there to make it work on different RDBMSs.

Anyway, I didn't want to turn this thread into a discussion of the pros and cons of the quirky update: SSC is full of threads on this topic and they are all heated discussions. At some point it tends to turn into some kind of religious war, and I don't want to be one of the soldiers...

My apologies to everybody on the thread for the annoyance.


--
Gianluca Sartori

Get your two-cent-answer quickly
spaghettidba.com
@spaghettidba
Post #798975
Posted Wednesday, October 7, 2009 4:34 AM


SSCertifiable

Okay, since I already lost my beer, let's be bad until the end.

DECLARE @customer_id char(10)
DECLARE @alreadyMatched int = 0
DECLARE @customers nvarchar(max) = ''
DECLARE @order_date date = GETDATE()
DECLARE @order_amt decimal(8,2) = 0


SELECT
@alreadyMatched =
CASE @customer_id
WHEN customer_id THEN @alreadyMatched + 1
ELSE 0
END,
@customers =
CASE @alreadyMatched
WHEN 1 THEN
CASE
WHEN @order_amt < order_amt
THEN
CASE @customers
WHEN '' THEN @customers
ELSE @customers + ' UNION ALL '
END
+ ' SELECT ''' + RTRIM(customer_id) + ''' as customer_id, '''+ CONVERT(char(8), @order_date ,112) +''' as order_date '
ELSE @customers
END
ELSE @customers
END,
@customer_id = customer_id,
@order_date = order_date,
@order_amt = order_amt
FROM DailySalesTotals
ORDER BY customer_id ASC, order_date DESC

SET @customers = '
SELECT customer_id, CONVERT(date, order_date, 112) AS order_date
FROM (' + @customers + ') AS data
'

EXEC sp_executesql @customers



--
Gianluca Sartori

Get your two-cent-answer quickly
spaghettidba.com
@spaghettidba
Post #799040
Posted Friday, October 9, 2009 10:11 AM
SSC Rookie

Gianluca Sartori (10/7/2009)
Okay, since I already lost my beer, let's be bad until the end

Gianluca,
You've inspired me to lose another beer and rewrite your beautiful query to avoid the sort, so that performance should be improved.

So it would be as follows:
DECLARE @customer_id char(10) 
DECLARE @prev_customer_id char(10)
DECLARE @prev_prev_customer_id char(10)

DECLARE @customers nvarchar(max) = ''
DECLARE @order_date date = GETDATE()
DECLARE @order_amt decimal(8,2) = 0
DECLARE @prev_order_amt decimal(8,2) = 0
DECLARE @prev_prev_order_amt decimal(8,2) = 0


SELECT @prev_prev_customer_id = @prev_customer_id,
@prev_customer_id = @customer_id,
@customer_id = customer_id,
@prev_prev_order_amt = @prev_order_amt,
@prev_order_amt = @order_amt,
@order_amt = order_amt,


@customers = CASE WHEN @customer_id <> @prev_customer_id
                   AND @prev_customer_id = @prev_prev_customer_id
                   AND @prev_prev_order_amt > @prev_order_amt
             THEN CASE @customers WHEN '' THEN @customers ELSE @customers + ' UNION ALL ' END
                  + ' SELECT ''' + RTRIM(@prev_customer_id) + ''' as customer_id, ''' + CONVERT(char(8), @order_date, 112) + ''' as order_date '
             ELSE @customers
             END,
@order_date = order_date

FROM DailySalesTotals
ORDER BY customer_id ASC, order_date ASC

SET @customers = '
SELECT customer_id, CONVERT(date, order_date, 112) AS order_date
FROM (' + @customers + ') AS data
'
EXEC sp_executesql @customers

Best regards,
Andriy
Post #800921
Posted Friday, October 9, 2009 10:46 AM


Mr or Mrs. 500

Apologies for taking so long to announce the winners but we wanted to go through all the entries once again to be certain there was not a winner in there that we'd overlooked. The quality of entries has been remarkable.

We have decided that the quality of the entrants was so high that we had to award two winners. The overall winner, with code that met, most closely, all Joe’s rules, and which showed a great deal of analysis and fine-tuning was, we both felt, Gianluca Sartori. It was a great contribution and he receives a $100 Amazon voucher and a round of applause.

However, Peso, the great Peter Larsson, made a great contribution to the competition and provided what seems to be about the fastest-performing solution. He came up with at least two very original ideas and so is the first recipient of… The Phil Factor Speed Phreak award, a $60 Amazon voucher and the right to display the coveted Phil Factor SQL Speed Phreak badge on his site. (OK, OK, it is still being drawn as we speak.)

And now for the details

There were ten entrants to whom we'd cheerfully have given the award. They provided good, very fast solutions that conformed to the spirit of Joe's rules. The rest were of a very high standard. I'm in awe of them all.

First off the blocks was Chris Howarth with a CTE solution that allowed him to join the latest order of every customer with the second. It was a very good solution that performed pretty well, but we were surprised by the overhead. His second solution was the first to introduce the idea of the pivot, and was strikingly ingenious. His was the first of many entries to use the 'ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC)' technique, which was subsequently chosen, with many ingenious variations, by JeriHatTrick, andriy.zabavskyy, Plamen, Ivan Yong, Herman van Midden, Eric Pratley and Alex Kuznetsov. Andriy's version was lightning fast. The PIVOT solution had probably its finest hour in a nice simple example by Gustavo that deserved to run faster than it did.
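For reference, the entries in this ROW_NUMBER family all share roughly this shape (a generic sketch against the competition's DailySalesTotals table, not any single entrant's code): number each customer's orders latest-first, keep rows 1 and 2, and flag the customers whose latest order amount dropped.

```sql
WITH numbered AS
(
    SELECT customer_id, order_date, order_amt,
           ROW_NUMBER() OVER(PARTITION BY customer_id
                             ORDER BY order_date DESC) AS rn
    FROM DailySalesTotals
)
SELECT customer_id,
       MAX(order_date) AS last_order_date        -- rn = 1 is the latest order
FROM numbered
WHERE rn <= 2                                    -- keep only the last two orders
GROUP BY customer_id
HAVING SUM(CASE WHEN rn = 1 THEN order_amt
                ELSE -order_amt END) < 0;        -- latest minus previous is negative
```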

William Brewer was the first to wonder if there was a way to get a faster result than Joe's by avoiding the correlated subquery. There wasn't. Paul Ireland later proved that Celko could be beaten with a 'traditional' approach, and there were some very fine examples such as RobertFolkerts's. Well done, Paul. Mark Marinovic had a solution that performed in the same region as Chris Howarth's but was probably too complex. GSquared came up with the idea of creating a temporary table with each customer's totals ordered, so that he could then put a primary key on both customer_id and order_date. The subsequent SQL statement was simple, but one can't help thinking that a non-clustered covering index might have performed better. GSquared's entry was important as he was the first with 'ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Order_Date)', which fuelled several top entries. It might have done better, but the time taken to create the temporary table was greater than the fastest entries' total duration. Peter Brinkhaus's entry used two CTEs and produced a solution that had a respectable speed but was never going to please Joe with its '1753/01/01' date trick.

back_of_the_fleet tried a solution somewhat similar to Chris Howarth's but decided to try to avoid the CTE. It didn't perform any better, though I can't help thinking that it could be improved. Steve Rowland's routine was snappy and beautifully simple. I felt sure that it was going to win. Suddenly, from nowhere, Peso entered the ring, and a new level of performance took over. Peso had done some careful work and his routine was very clever. He had avoided a second ROW_NUMBER by a very deft GROUP BY, and the HAVING clause was dead clever. He then went on to experiment with various other versions, but I think he got it right first time. Peso is a man who lives and breathes SQL performance. Sometimes his routines looked odd, but they always worked well. A string of variations followed, such as Ivan Yong's, but they failed to improve much on the original.

v_paronov, by contrast, tried a 'traditional' approach that I really enjoyed, but which couldn't compete, performance-wise, with what came to be a favourite: the 'ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Order_Date)' solution by GSquared, Peter Brinkhaus, Steve Rowland and Peso.

AndyM came up with a fast and ingenious solution that had us wondering if anyone could improve on it. Like Peso, he had gone for a single ROW_NUMBER followed by a GROUP BY and it is fascinating to compare the two.

I was amazed by maxdemarzi's solution. It was so compact. It was magnificent, and it made many of the previous entries seem verbose and awkward. Even now I wonder if it is possible to tweak it into contention as far as its performance goes. He followed up with a second solution that was faster, but not as fast as Peso and the other front-runners. Quan_L_Nguyen later produced a variant, likewise based on 'RANK() OVER (PARTITION BY ...)'.

John McVay came in with a variation of the two-barrelled CTE solution, which performed well, but his entry was followed by Gianluca Sartori's first SQL, which took so long that it had to be removed from the test harness. This was swiftly followed by a version that performed better than any other. What made it surprising was that it didn't use any new features from SQL Server 2005/8. It was jaw-droppingly ingenious. Gianluca had thought a lot about the problem. As Joe put it, the analysis was impressive. Things were definitely hotting up. KevRiley then came in with the first CROSS APPLY entry, and later produced some interesting variations. Just when it seemed we'd thought of all the ways around the problem, Kev had come up with a great and original solution. The CROSS APPLY technique was also used to excellent effect by mByther. It was neat and fast.
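A generic sketch of the CROSS APPLY shape (not KevRiley's or mByther's actual entries, and again using the competition's DailySalesTotals table) pairs each customer's latest order with the one before it:

```sql
SELECT c.customer_id,
       latest.order_date AS last_order_date
FROM (SELECT DISTINCT customer_id FROM DailySalesTotals) AS c
CROSS APPLY (SELECT TOP (1) order_date, order_amt
             FROM DailySalesTotals d
             WHERE d.customer_id = c.customer_id
             ORDER BY d.order_date DESC) AS latest        -- most recent order
CROSS APPLY (SELECT TOP (1) order_amt
             FROM DailySalesTotals d
             WHERE d.customer_id = c.customer_id
               AND d.order_date < latest.order_date
             ORDER BY d.order_date DESC) AS previous      -- the one before it
WHERE latest.order_amt < previous.order_amt;              -- amount dropped
```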
I liked the solutions that used a view. Joe glowed with pleasure. Girish Bhat did a great job with the first view-based solution, though it made the test harness slightly more complicated.

To the other entrants whom I haven't the space to mention: many thanks for entering; there were a lot of excellent ideas there. Reading through the two forums was an education. Many thanks once again to everyone who contributed, and stand by for a new competition! Next time... who knows, the winner may be you.



Best wishes,

Phil Factor
Simple Talk
Post #800950
Posted Saturday, October 10, 2009 4:00 AM


SSCrazy

Congratulations Gianluca! Well deserved.
Well done also to Peso - I was merely in both your shadows...

This was great fun! I went off the boil towards the end, as I couldn't keep up with all the entries.

Can't wait for the next one!


Kev
Post #801192
Posted Monday, October 12, 2009 2:10 AM
SSC Rookie

Thank you all, guys for this challenge and discussion thread.
It was very good learning experience.

Congratulations to Gianluca and Peso!
Andriy
Post #801412
Posted Monday, October 12, 2009 2:13 AM


SSCrazy

Thank you both!
Congratulations Gianluca, well done!

I guess we'll all see each other for the next stumper?



N 56°04'39.16"
E 12°55'05.25"
Post #801413
Posted Monday, October 12, 2009 2:31 AM


Mr or Mrs. 500

I have started a new Phil Factor SQL Speed Phreak Competition on the new beta site ask.sqlservercentral.com
http://ask.sqlservercentral.com/questions/92/the-subscription-list-sql-problem
The ‘Subscription List’ SQL Problem

It would be great if you would join in. I'm keen to see whether the 'StackOverflow' model is a good way of doing SQL code competitions, and whether we can use voting to determine winners if there is a tie.

The site will ask you to register, as we haven't got the links working yet to carry your login details.




Best wishes,

Phil Factor
Simple Talk
Post #801422