Celko's SQL Stumper: The Data Warehouse Problem
Posted Monday, September 28, 2009 12:39 AM


Comments posted to this topic are about the item Celko's SQL Stumper: The Data Warehouse Problem

Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #794383
Posted Monday, September 28, 2009 2:42 AM
First, using a CTE and the ROW_NUMBER function, assign a row number to each row, partitioned by customer_id and ordered by order date descending.

Then use two instances of the CTE in the output query - once to represent the most recent order, and again to represent the second most recent order. The respective orders are obtained by filtering on the RowNumber column.

Then join the two instances of the CTE together on customer_id and apply a filter to return only those customers whose most recent order amount was less than the amount of their second most recent order.

Here's the code:

WITH CTE AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS RowNumber,
           customer_id,
           order_date,
           order_amt
    FROM DailySalesTotals
)
SELECT order1.customer_id,
       ' dropped purchase amount on ',
       order1.order_date
FROM CTE order1
INNER JOIN CTE order2
    ON order2.customer_id = order1.customer_id
    --The second most recent order
    AND order2.RowNumber = 2
WHERE
    --The most recent order
    order1.RowNumber = 1
    --Recent order amount lower than second most recent order amount
    AND order1.order_amt < order2.order_amt;

Post #794399
Posted Monday, September 28, 2009 3:05 AM


Ahem... As I'm helping Joe judge this, I'll start by congratulating Chris on this entry. It is very ingenious and shows the way. Chris, you're on the right track, but beware. On my server, it is a third slower than Joe's original solution (using his test data) and it gives false positives. (Phil in the future: I was wrong about this - I was working with a mangled version of the Simple-Talk version of Chris's entry, now fixed. Sorry.) With the test data, it says, for example, 'Adam dropped purchase amount on 2008-11-25'
whereas the raw data gives....


/*
Customer_id  order_date  order_amt
-----------  ----------  ---------
Adam         2004-05-08  278.46
Adam         2005-02-26  612.43
Adam         2005-08-13  452.65
Adam         2007-07-22  499.30
Adam         2008-11-25  799.89
*/




Best wishes,

Phil Factor
Simple Talk
Post #794408
Posted Monday, September 28, 2009 3:35 AM
Phil Factor (9/28/2009)
Ahem... As I'm helping Joe judge this, I'll start by congratulating Chris on this entry. It is very ingenious and shows the way.. Chris, you're on the right track, but beware. On my Server, it is a third slower than Joe's original solution (using his test data) and it gives false positives. With the test data, it says, for example, 'Adam dropped purchase amount on 2008-11-25'
whereas the raw data gives....


/*
Customer_id order_date order_amt
----------- ---------- ---------------------------------------
Adam 2004-05-08 278.46
Adam 2005-02-26 612.43
Adam 2005-08-13 452.65
Adam 2007-07-22 499.30
Adam 2008-11-25 799.89
*/




Hi Phil

Thanks for the feedback.

Unfortunately I can't replicate the inconsistency that you're witnessing between mine and Joe's (original) versions using Joe's test data. Both queries return 391 rows, the first four rows of each resultset (when ordered by customer_id) are as follows:

/*
--Joe's solution:
customer_id (No column name) (No column name)
--------------------------------------------------------
Aaron dropped purchase amount on 2008-08-07
Abel dropped purchase amount on 2008-12-23
Abraham dropped purchase amount on 2008-01-03
Aimee dropped purchase amount on 2009-09-19

--My solution:
customer_id (No column name) order_date
--------------------------------------------------------
Aaron dropped purchase amount on 2008-08-07
Abel dropped purchase amount on 2008-12-23
Abraham dropped purchase amount on 2008-01-03
Aimee dropped purchase amount on 2009-09-19
*/

In terms of performance, monitoring with SQL Profiler, both queries take a mean of 100ms to complete, but the original shows ~1850 reads whereas my version shows ~24 reads.
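Profiler is one way to capture these numbers; a lighter-weight alternative is a quick sketch using SET STATISTICS IO and SET STATISTICS TIME in the query window (this assumes Joe's DailySalesTotals test table is already loaded; the reads and times appear in the Messages tab):

```sql
-- Sketch: capture logical reads and CPU/elapsed time without Profiler.
-- Assumes the DailySalesTotals table from the puzzle's test script exists.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Substitute the solution query under test here:
SELECT COUNT(*) FROM DailySalesTotals;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```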

I'm running 32-bit SQL Server 2008 SP1 CU4 (10.0.2734).

Cheers
Chris

Edit: Just noticed that my second post to the Simple-Talk page (http://www.simple-talk.com/sql/t-sql-programming/celkos-sql-stumper-the-data-warehouse-problem-/) has had the second WHERE condition removed - possibly when the code was re-formatted? I guess that this could account for the inconsistency if this is where you obtained my version from.
Post #794413
Posted Monday, September 28, 2009 4:02 AM


Oops! Yes, looks like I was working on code that had got mangled in the code-highlighting process. Sorry about that. It gives the same result as Joe's code now! It is also giving slightly better performance than Joe's code. What a difference one line makes!


Best wishes,

Phil Factor
Simple Talk
Post #794418
Posted Monday, September 28, 2009 5:02 AM


/*--------------------------------------------------------------------------
  The execution plan looks way over the top for this, but here goes anyway.
  The idea is to assign row numbers to the base recordset, and match it
  to a second recordset which has a row number offset by +1. Using a join
  to match row number to offset row number, the order amounts from current
  row and previous row can be compared.
--------------------------------------------------------------------------*/

select customer_id, ' dropped purchase amount on ', order_date
from ( select customer_id,
              row_no = row_number() over ( partition by customer_id order by order_date ),
              order_date,
              max_order_date = max(order_date) over ( partition by customer_id ),
              order_amt
       from DailySalesTotals
     ) as dst,
     ( select prev_row_customer_id = customer_id,
              offset_row_no = row_number() over ( partition by customer_id order by order_date ) + 1,
              prev_row_order_amt = order_amt
       from DailySalesTotals
     ) as PreviousRow_DailySalesTotals
where customer_id = prev_row_customer_id
  and row_no = offset_row_no
  and order_date = max_order_date
  and order_amt < prev_row_order_amt;



Inflatable sailors never die, they gently deflate over time.

Post #794430
Posted Monday, September 28, 2009 7:29 AM
-- For each customer, number all records in descending order of order_date
;WITH NumberedDailySalesTotals AS
(
    SELECT
        customer_id, order_date, order_amt,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) rn
    FROM
        DailySalesTotals
)
-- For each customer, collect the date and amount of the record with the highest date,
-- and the date and amount of the second highest date
, PivotDailySalesTotals AS
(
    SELECT
        customer_id,
        MAX(CASE WHEN rn = 1 THEN order_date ELSE '1753/01/01' END) last_date,
        SUM(CASE WHEN rn = 1 THEN order_amt ELSE 0.0 END) last_amt,
        MAX(CASE WHEN rn = 2 THEN order_date ELSE '1753/01/01' END) second_last_date,
        SUM(CASE WHEN rn = 2 THEN order_amt ELSE 0.0 END) second_last_amt
    FROM
        NumberedDailySalesTotals dst
    GROUP BY
        customer_id
)
-- Filter out the records where last_amt is less than the second last amount
-- Requires only a single table scan and 12 logical reads
SELECT
    customer_id, ' dropped purchase amount on ', last_date
FROM
    PivotDailySalesTotals
WHERE
    last_amt < second_last_amt;

Post #794497
Posted Monday, September 28, 2009 8:06 AM


I used a slight modification on the CTE solution:
SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS OrderSeq
INTO
    #T
FROM
    ##DailySalesTotals;

ALTER TABLE #T
    ADD PRIMARY KEY (customer_id, order_date);

CREATE INDEX IDX_OrderSeq ON #T (OrderSeq, order_amt);

SELECT
    T2.customer_id, T2.order_date AS 'dropped qty on'
FROM
    #T T1
    INNER JOIN #T T2
        ON T1.customer_id = T2.customer_id
        AND T1.OrderSeq = 2
        AND T2.OrderSeq = 1
        AND T1.order_amt > T2.order_amt;

On my desktop machine, the CTE solution took an average of 50 milliseconds, while this took an average of 46.

I substituted a global temp table for the original table, but otherwise kept Joe's script intact. (Just makes it easier to clean up after myself when I do this kind of testing.) Edit: Actually, I had to change the date field to datetime, because I don't have 2008 available here. I'll test on 2008 when I get home.

The query just uses a temp table with an index on it instead of a CTE. Since SQL Server creates a worktable in tempdb for the CTE anyway, the only real difference is adding the index.

I tested explicitly creating the temp table instead of Select Into, and that took it up to a 56 millisecond average.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #794512
Posted Monday, September 28, 2009 9:22 AM


This is my solution. I've not used CTE's yet, but the idea is very similar - good work posted so far by others!

I first used the RANK() function on the purchase date, then eliminated all sales records except the most two recent. I turned the result set on its side (for ease of comparison and fewer table reads) and compared the most recent sale to the second most recent sale. The results are filtered for those with decreasing "most recent sale" amounts. Of course, for additional logic the CASE statements would need to be changed or added to.

My main focus was portability and (hopefully) ease of maintenance. I do not have an Oracle instance to test on, but this exact code should also work in 9i+ (or whatever release made the windowing functions available). This was a fun exercise and brought me back to my old DW days.

SELECT
    t1.customer_id,
    ' dropped purchase amount on ',
    MAX(CASE WHEN t1.order_ordinal = 1 THEN t1.order_date END)
FROM (
    SELECT
        DST.customer_id,
        DST.order_date,
        DST.order_amt,
        RANK() OVER
            (PARTITION BY DST.customer_id ORDER BY DST.order_date DESC) AS order_ordinal,
        RANK() OVER
            (PARTITION BY DST.customer_id ORDER BY DST.order_amt ASC) AS order_amt_ordinal
    FROM dbo.DailySalesTotals AS DST) t1
WHERE
    --Only review last n orders
    t1.order_ordinal <= 2
GROUP BY
    t1.customer_id
HAVING
    --Eliminate customers with fewer than n orders
    MAX(CASE WHEN t1.order_ordinal = 2 THEN t1.order_date END) IS NOT NULL

    --Compare most recent order amount with second most recent and return those where the purchase amount decreased
    --Change or add CASE expressions to alter scenarios desired
    AND MAX(CASE WHEN t1.order_ordinal = 1 THEN t1.order_amt END) < MAX(CASE WHEN t1.order_ordinal = 2 THEN t1.order_amt END)
GO

Statistics with full data set in the *.zip file:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

(391 row(s) affected)
Table 'DailySalesTotals'. Scan count 1, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 121 ms.

Mark Marinovic
Post #794558
Posted Monday, September 28, 2009 9:46 AM


Mark,

when I run your code I get some CPU time and more logical reads - have you tried on a clean 'server'?

i.e. after restarting, or after running DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE

[please don't run these on a live production server]
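The cold-cache reset Kev describes can be sketched like this; both commands flush server-wide state, so only use them on a test server (running CHECKPOINT first is conventional, so dirty pages are written before clean buffers are dropped):

```sql
-- Reset caches so the next run measures cold-cache performance.
-- WARNING: affects the whole instance; never run on a live production server.
CHECKPOINT;                -- flush dirty pages to disk first
DBCC DROPCLEANBUFFERS;     -- empty the buffer pool (next run does physical reads)
DBCC FREEPROCCACHE;        -- clear all cached plans (next run recompiles)
```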


Kev
Post #794580