|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
Comments posted to this topic are about the item Celko's SQL Stumper: The Data Warehouse Problem
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, May 30, 2013 8:03 AM
Points: 266,
Visits: 1,025
|
|
First, using a CTE and the ROW_NUMBER function, assign a row number to each row, partitioned by customer_id and ordered by order date descending.
Then use two instances of the CTE in the output query - once to represent the most recent order, and again to represent the second most recent order. The respective orders are obtained by filtering on the RowNumber column.
Then join the two instances of the CTE together on customer_id and apply a filter to return only those customers whose most recent order amount was less then the amount of their second most recent order.
Here's the code:
WITH CTE AS ( SELECT ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_date DESC) AS RowNumber, customer_id, order_date, order_amt FROM DailySalesTotals ) SELECT order1.customer_id, ' dropped purchase amount on ', order1.order_date FROM CTE order1 INNER JOIN CTE order2 ON order2.customer_id = order1.customer_id --The second most recent order AND order2.rownumber = 2 WHERE --The most recent order order1.RowNumber = 1 --Recent order amount lower than second most recent order amount AND order1.order_amt < order2.order_amt;
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Wednesday, June 12, 2013 2:10 AM
Points: 535,
Visits: 2,295
|
|
Ahem... As I'm helping Joe judge this, I'll start by congratulating Chris on this entry. It is very ingenious and shows the way.. Chris, you're on the right track, but beware. On my Server, it is a third slower than Joe's original solution (using his test data) and it gives false positives (Phil in the future: I was wrong about this - I was working with a mangled version of the Simple-Talk version of Chris's entry- now fixed.) Sorry). With the test data, it says, for example, 'Adam dropped purchase amount on 2008-11-25' whereas the raw data gives....
/* Customer_id order_date order_amt ----------- ---------- --------------------------------------- Adam 2004-05-08 278.46 Adam 2005-02-26 612.43 Adam 2005-08-13 452.65 Adam 2007-07-22 499.30 Adam 2008-11-25 799.89 */
Best wishes,
Phil Factor Simple Talk
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, May 30, 2013 8:03 AM
Points: 266,
Visits: 1,025
|
|
Phil Factor (9/28/2009)
Ahem... As I'm helping Joe judge this, I'll start by congratulating Chris on this entry. It is very ingenious and shows the way.. Chris, you're on the right track, but beware. On my Server, it is a third slower than Joe's original solution (using his test data) and it gives false positives. With the test data, it says, for example, 'Adam dropped purchase amount on 2008-11-25' whereas the raw data gives.... /* Customer_id order_date order_amt ----------- ---------- --------------------------------------- Adam 2004-05-08 278.46 Adam 2005-02-26 612.43 Adam 2005-08-13 452.65 Adam 2007-07-22 499.30 Adam 2008-11-25 799.89 */
Hi Phil
Thanks for the feedback.
Unfortunately I can't replicate the inconsistency that you're witnessing between mine and Joe's (original) versions using Joe's test data. Both queries return 391 rows, the first four rows of each resultset (when ordered by customer_id) are as follows:
/* --Joe's solution: customer_id (No column name) (No column name) -------------------------------------------------------- Aaron dropped purchase amount on 2008-08-07 Abel dropped purchase amount on 2008-12-23 Abraham dropped purchase amount on 2008-01-03 Aimee dropped purchase amount on 2009-09-19
--My solution: customer_id (No column name) order_date -------------------------------------------------------- Aaron dropped purchase amount on 2008-08-07 Abel dropped purchase amount on 2008-12-23 Abraham dropped purchase amount on 2008-01-03 Aimee dropped purchase amount on 2009-09-19 */
In terms of performance, using SQL Profiler to monitor performance both queries take a mean of 100ms to complete, but the original shows ~1850 reads whereas my version shows ~24 reads.
I'm running 32-bit SQL Server 2008 SP1 CU4 (10.0.2734).
Cheers Chris
Edit: Just noticed that my second post to the Simple-Talk page (http://www.simple-talk.com/sql/t-sql-programming/celkos-sql-stumper-the-data-warehouse-problem-/) has had the second WHERE condition removed - possibly when the code was re-formatted? I guess that this could account for the inconsistency if this is where you obtained my version from.
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Wednesday, June 12, 2013 2:10 AM
Points: 535,
Visits: 2,295
|
|
Oops! Yes, looks like I was working on code that had got mangled in the code-highlighting process. Sorry about that. It gives the same result as Joe's code now! It is also giving slightly better performance than Joe's code. What a difference one line makes!
Best wishes,
Phil Factor Simple Talk
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: 2 days ago @ 3:28 AM
Points: 188,
Visits: 362
|
|
/*--------------------------------------------------------------------------+ ¦ The execution plan looks way over the top for this, but here goes anyway. ¦ ¦ The idea is to assign row numbers to the base recordset, and match it ¦ ¦ to a second recordset which has a row number offset by +1. Using a join ¦ ¦ to match row number to offset row number the order amounts from current ¦ ¦ row and previous row can be compared. ¦ +--------------------------------------------------------------------------*/
select customer_id, ' dropped purchase amount on ', order_date from ( select customer_id, row_no = row_number() over ( partition by customer_id order by order_date ), order_date, max_order_date = max(order_date) over (partition by customer_id), order_amt from DailySalesTotals ) as dst, ( select prev_row_customer_id = customer_id, offset_row_no = row_number() over ( partition by customer_id order by order_date ) + 1, prev_row_order_amt = order_amt from DailySalesTotals ) as PreviousRow_DailySalesTotals where customer_id = prev_row_customer_id and row_no = offset_row_no and order_date = max_order_date and order_amt < prev_row_order_amt
Inflatable sailors never die, they gently deflate over time.
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Today @ 1:08 AM
Points: 1,563,
Visits: 6,116
|
|
-- For each customer, number all records customer in descending order of order_date ;WITH NumberedDailySalesTotals AS ( SELECT customer_id, order_date, order_amt, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) rn FROM DailySalesTotals ) -- For each customer, collect the date, amount of the record with the highest date, -- and the date and amount of the second highest date , PivotDailySalesTotals AS ( SELECT customer_id, MAX(CASE WHEN rn = 1 THEN order_date ELSE '1753/01/01' END) last_date, SUM(CASE WHEN rn = 1 THEN order_amt ELSE 0.0 END) last_amt, MAX(CASE WHEN rn = 2 THEN order_date ELSE '1753/01/01' END) second_last_date, SUM(CASE WHEN rn = 2 THEN order_amt ELSE 0.0 END) second_last_amt FROM NumberedDailySalesTotals dst GROUP BY customer_id ) -- Filter out the records where last_amount is less then the second last amount -- Requires only a single table scan and 12 logical reads SELECT customer_id, ' dropped purchase amount on ', last_date FROM PivotDailySalesTotals WHERE last_amt < second_last_amt
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: 2 days ago @ 1:45 PM
Points: 15,442,
Visits: 9,572
|
|
I used a slight modification on the CTE solution:
SELECT *, Row_Number() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS OrderSeq INTO #T FROM ##DailySalesTotals ;
ALTER TABLE #T ADD PRIMARY KEY (customer_id, order_date) ;
CREATE INDEX IDX_OrderSeq ON #T (OrderSeq, order_amt) ;
SELECT T2.customer_id, T2.order_date as 'dropped qty on' FROM #T T1 INNER JOIN #T T2 ON T1.customer_id = T2.customer_id AND T1.OrderSeq = 2 AND T2.OrderSeq = 1 AND T1.order_amt > T2.order_amt ; On my desktop machine, the CTE solution took an average of 50 milliseconds, while this took an average of 46.
I substituted a global temp table for the original table, but otherwise kept Joe's script intact. (Just makes it easier to clean up after myself when I do this kind of testing.) Edit: Actually, I had to change the date field to datetime, because I don't have 2008 available here. I'll test on 2008 when I get home.
The query just uses a temp table with an index on it instead of a CTE. Since SQL Server creates a worktable in tempdb for the CTE anyway, the only real difference is adding the index.
I tested explicitly creating the temp table instead of Select Into, and that took it up to a 56 millisecond average.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Monday, June 10, 2013 9:00 AM
Points: 223,
Visits: 285
|
|
This is my solution. I've not used CTE's yet, but the idea is very similar - good work posted so far by others!
I first used the RANK() function on the purchase date, then eliminated all sales records except the most two recent. I turned the result set on its side (for ease of comparison and fewer table reads) and compared the most recent sale to the second most recent sale. The results are filtered for those with decreasing "most recent sale" amounts. Of course, for additional logic the CASE statements would need to be changed or added to.
My main focus was portability and (hopefully) ease of maintenance. I do not have an Oracle instance to test on, but this exact code should also work in 9i + (or whatever release had the Windowing functions available). This was a fun excercise and brought me back to my old DW days.
SELECT t1.customer_id, ' dropped purchase amount on ', MAX(CASE WHEN t1.order_ordinal = 1 THEN t1.order_date END) FROM( SELECT DST.customer_id, DST.order_date, DST.order_amt, RANK() OVER (PARTITION BY DST.customer_id ORDER BY DST.order_date DESC) AS order_ordinal, RANK() OVER (PARTITION BY DST.customer_id ORDER BY DST.order_amt ASC) AS order_amt_ordinal FROM dbo.DailySalesTotals AS DST) t1 WHERE --Only review last n orders t1.order_ordinal <= 2 GROUP BY t1.customer_id HAVING --Eliminate customers with fewer than n orders MAX(CASE WHEN t1.order_ordinal = 2 THEN t1.order_date END) IS NOT NULL --Compare most recent order amount with first order amount and return those where the purchase amount decreased --Change or add CASE statements to alter scenarios desired AND MAX(CASE WHEN t1.order_ordinal = 1 THEN t1.order_amt END) < MAX(CASE WHEN t1.order_ordinal = 2 THEN t1.order_amt END) GO Statistics with full data set in the *.zip file: SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Warning: Null value is eliminated by an aggregate or other SET operation.
(391 row(s) affected) Table 'DailySalesTotals'. Scan count 1, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 15 ms, elapsed time = 121 ms.
Mark Marinovic
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 2:29 AM
Points: 2,677,
Visits: 2,277
|
|
Mark,
when I run your code I get some CPU and more logical reads - have you tried on a clean 'server'
i.e after restarting or running dbcc dropcleanbuffers & dbcc freeproccache
[please don't run these on a live production server]
Kev
|
|
|
|