September 19, 2012 at 4:26 pm
I work for a credit card company. Our database has a customer table and a transaction table. Fields in the customer table are SSN and CustomerKey. Fields in the transaction table are CustomerKey, transaction date (Transdate), and transaction amount (TransAmt).
I need a query that can identify each ssn where the sum of any of their transaction amounts > 1000 within a two consecutive day period in 2012. If a ssn has transaction amounts > 1000 within a two day period, I need the query to return all the transactions for that ssn.
Here is an example of the raw data in the Transaction Table:
Trans#-----CustKey-----Date--------Amount
1-----------12345----01/01/12--------$600
2-----------12345----01/02/12--------$500
3-----------67890----01/03/12--------$10
4-----------98765----04/01/12--------$600
5-----------43210----04/02/12--------$600
6-----------43210----04/03/12--------$100
7-----------13579----04/02/12--------$600
8-----------24568----04/03/12--------$100
Here is an example of the raw data in the Customer Table:
CustKey-----SSN
12345------123456789
67890------123456789
98765------987654321
43210------987654321
13579------246801357
24568------246801357
Here are the results I need:
Trans#------SSN---------Date---------Amount
1--------123456789----01/01/12---------$600
2--------123456789----01/02/12---------$500
3--------123456789----01/03/12----------$10
4--------987654321----04/01/12---------$600
5--------987654321----04/02/12---------$600
6--------987654321----04/03/12---------$100
As you can see in my results included all transactions for SSN 123456789 and 987654321, and excluded SSN 246801357.
Here is what I tried.
select c.ssn, t.*
from TransDtl t
join Customer c on t.CustomerKey = c.CustomerKey
join(select c.ssn, SUM(t.TransAmt) as Total
from TransDtl t
join Customer c on t.CustomerKey = c.CustomerKey
where t.postdate between '1/4/2012' and '1/5/2012'
group by c.ssn
having SUM(t.transamt) > 1000) x
on x.ssn = c.SSN
where t.postdate between '1/4/2012' and '1/5/2012'
The problem I am having is that I can only run my above query for 2 consecutive days out of the year. I want to run the query for the whole year, and identify any transaction in that year where the sum of the transactions exceeded 1000 in any given 2 consecutive day period; and then list all the transactions for the ssn.
Thank you!
September 19, 2012 at 5:27 pm
It might be useful for you to read:
Title: Finding Gaps in a Sequential Number Sequence
http://www.sqlservercentral.com/Forums/Topic267061-176-2.aspx#bm1135582
And pay particular attention to the comments on the article by Jeff Moden, and all the additional contributors in the discussion of the article
September 19, 2012 at 7:26 pm
egerencher (9/19/2012)
Here are the results I need:Trans#------SSN---------Date---------Amount
1--------123456789----01/01/12---------$600
2--------123456789----01/02/12---------$500
3--------123456789----01/03/12----------$10
4--------987654321----04/01/12---------$600
5--------987654321----04/02/12---------$600
6--------987654321----04/03/12---------$100
Firstly, I'd like to question why trans# 3 and 6 appear in your expected results. The total charges for period 02-03 Jan is $510 for 123456789 and for 02-03 Apr is $700 for 987654321.
Secondly, the next time you post, you should help us to help you by providing some DDL and readily consumable test data like this (check the link in bitbucket's signature line for more info):
DECLARE @Trans TABLE
(Trans# INT IDENTITY, CustKey INT, Date DATETIME, Amount MONEY)
INSERT INTO @Trans
SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500
UNION ALL SELECT 67890,'01/03/12',$10 UNION ALL SELECT 98765,'04/01/12',$600
UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100
UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100
DECLARE @Cust TABLE
(CustKey INT, SSN VARCHAR(9))
INSERT INTO @Cust
SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'
UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'
UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'
If I'm right about the expected results and ignoring any performance considerations of this approach, you may be able to get the transaction list as follows:
DECLARE @StartDT DATETIME = '2012-09-20'
,@DaysBack INT = 365
;WITH Tally (n) AS (
SELECT TOP (@DaysBack) 1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns),
MyTrans AS (
SELECT Trans#, c.CustKey, SSN, Date, Amount
FROM @Trans t
INNER JOIN @Cust c ON c.CustKey = t.Custkey)
SELECT DISTINCT Trans#, SSN, Date, Amount
FROM (
SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), Trans#, SSN, Date, Amount, CustKey
,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
FROM Tally a
CROSS APPLY (SELECT StartDT=DATEADD(day, -n, @StartDt)) b
INNER JOIN MyTrans ON Date >= StartDt AND Date <= DATEADD(day, 1, StartDT)
) b
WHERE Charges >= 1000
I have chosen to use a Tally table but a Calendar table would work just as well.
Edit: Oh yes. And let's hope for the sake of privacy and legality, those SSNs are encrypted!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 19, 2012 at 9:08 pm
Very interesting problem! Thank you for posting it. There are some subtle "real-world" quirks in here that make this potentially quite complex.
Dwain is correct, posting some SQL Code to create the tables and populate them with test data is very helpful. I've done that here for anyone else who wants to try the problem. I changed the names of two fields for my own sanity: Trans# became TransID (I don't like special characters in database object names) and Date became TransDate because Date is a keyword in SQL Server. Although, as I think about it, PostDate might be a better name, since that seems to better reflect what the data represents: the date that the credit card transaction posted.
Oh, I also put a creation script for a tally table in here. I thought I would need one, but I ended up not using it. I left it for anyone else who might want to use one. I developed this solution on SQL Server 2008 R2, but I believe it would run on version 2005 or later.
CREATE TABLE Customer
( CustKey INT PRIMARY KEY
,SSN NCHAR(9) NOT NULL
);
INSERT Customer (CustKey, SSN)
SELECT 12345, '123456789'
UNION ALL
SELECT 67890, '123456789'
UNION ALL
SELECT 98765, '987654321'
UNION ALL
SELECT 43210, '987654321'
UNION ALL
SELECT 13579, '246801357'
UNION ALL
SELECT 24568, '246801357';
CREATE TABLE TransDtl
( TransID INT PRIMARY KEY
,CustKey NCHAR(9) NOT NULL
,TransDate DATETIME NOT NULL
,Amount MONEY NOT NULL
);
INSERT TransDtl (TransID,CustKey,TransDate,Amount)
SELECT 1, 12345, '01/01/12', 600.00
UNION ALL
SELECT 2, 12345, '01/02/12', 500.00
UNION ALL
SELECT 3, 67890, '01/03/12', 10.00
UNION ALL
SELECT 4, 98765, '04/01/12', 600.00
UNION ALL
SELECT 5, 43210, '04/02/12', 600.00
UNION ALL
SELECT 6, 43210, '04/03/12', 100.00
UNION ALL
SELECT 7, 13579, '04/02/12', 600.00
UNION ALL
SELECT 8, 24568, '04/03/12', 100.00;
SELECT TOP 1000001
IDENTITY(INT,0,1) AS [N]
INTO dbo.Tally
FROM master.sys.all_columns AS AC1
CROSS JOIN master.sys.all_columns AS AC2;
/* Add a tightly packed clustered Primary Key to maximize
* performance and minimize space used.
*/
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N)
WITH FILLFACTOR = 100;
I'd like to make one quick comment about the data before I dive into my solution. I noticed in the Customer table that some of the SSN's are associated with more than one CustKey. Since you said that you work for a credit card company, I assume that the SSN field represents individual customers and the CustKey field represents individual credit cards. In other words, a customer may have more than one card, but you want to create an alert when a customer charges over a certain amount of money within a certain amount of time, whether they put all of the charges on one card or spread them across multiple cards. (That's one of the "real-world quirks" to which I referred above.)
With that observation out of the way, here is my solution. I'll present the whole thing here and then walk through it step by step.
SET STATISTICS IO ON;
WITH CustomerTrans
AS
( SELECT Customer.SSN AS SSN
,TransDtl.TransID AS TransID
,TransDtl.TransDate AS TransDate
,TransDtl.Amount AS Amount
FROM Customer
JOIN TransDtl
ON Customer.CustKey = TransDtl.CustKey
),
TransGroup
AS
( SELECT A.SSN AS SSN
,ROW_NUMBER() OVER (PARTITION BY A.SSN ORDER BY A.SSN) AS GroupID
,MIN(A.TransDate) AS StartDate
,MAX(B.TransDate) AS EndDate
FROM CustomerTrans AS A
JOIN CustomerTrans AS B
ON A.SSN = B.SSN
AND DATEDIFF(Day,A.TransDate,B.TransDate) BETWEEN 0 AND 1
GROUP BY A.SSN
),
GroupTotal
AS
( SELECT TransGroup.SSN AS SSN
,TransGroup.GroupID AS TransGroupID
,SUM(CustomerTrans.Amount) AS TransGroupAmount
FROM TransGroup
JOIN CustomerTrans
ON TransGroup.SSN = CustomerTrans.SSN
AND CustomerTrans.TransDate BETWEEN TransGroup.StartDate AND TransGroup.EndDate
GROUP BY TransGroup.SSN
,TransGroup.GroupID
HAVING SUM(CustomerTrans.Amount) >= 1000.00
),
FlaggedCustomer
AS
( SELECT SSN
FROM GroupTotal
GROUP BY SSN
)
SELECT CustomerTrans.TransID
,FlaggedCustomer.SSN
,CustomerTrans.TransDate
,CustomerTrans.Amount
FROM FlaggedCustomer
JOIN CustomerTrans
ON FlaggedCustomer.SSN = CustomerTrans.SSN;
First, you'll note that I've set Statistics IO on. That's because I want to see how many times my solution scans the data tables. That will give us some clue as to how well the solution might scale. We don't have much data to work with here, but I imagine that you have a massive amount of data to query at work. My solution scans the data a total of 5 times. Ideally, we'd like for the solution to scan the data only once. There are other factors to consider, of course, but this is just a quick and dirty rule of thumb. Based on this, I would expect my solution to perform fairly well on large data sets. (I also tried Dwain's solution with statistics IO on, and his solution does a total of 6 scans. So, the performance of the two solutions would probably be very similar.)
My solution is all one query, but I've broken the "steps" I use into Common Table Expressions (or CTE's). Each CTE builds on the previous CTE's. So, below I'll re-post each CTE separately and explain what it does and how this builds into a solution to your problem.
WITH CustomerTrans
AS
( SELECT Customer.SSN AS SSN
,TransDtl.TransID AS TransID
,TransDtl.TransDate AS TransDate
,TransDtl.Amount AS Amount
FROM Customer
JOIN TransDtl
ON Customer.CustKey = TransDtl.CustKey
),
The first CTE associates all of the individual transactions with the Customer SSN's, rather than the CustKey in the TransDtl table. We query this CTE throughout the rest of the solution rather than the source TransDtl table, because we need to find Customers that posted more than $1,000 in charges over a two day period, not individual Credit Cards.
TransGroup
AS
( SELECT A.SSN AS SSN
,ROW_NUMBER() OVER (PARTITION BY A.SSN ORDER BY A.SSN) AS GroupID
,MIN(A.TransDate) AS StartDate
,MAX(B.TransDate) AS EndDate
FROM CustomerTrans AS A
JOIN CustomerTrans AS B
ON A.SSN = B.SSN
AND DATEDIFF(Day,A.TransDate,B.TransDate) BETWEEN 0 AND 1
GROUP BY A.SSN
),
This second CTE is the core part of my solution. This is loosely based on one of Itzik Ben-Gan's solutions to the "Islands" problem (finding groups of sequential sequences). I got it from the first volume SQL Server MVP Deep Dives book, but I imagine you can find it online somewhere. The SQL Server Central article that BitBucket references above probably mentions it, too.
However, another one of those "real-world quirks" in this problem is that the intervals are not fixed. We want to find the customers with transactions totaling $1,000 or more in any two day period. So, here I join the CustomerTrans CTE I just created to itself on the Customer SSN where the the post date of the transaction in the first data set is within one day of the transaction in the second data set. Because of the way the DATEDIFF function is written here, the A transactions will only join with other transactions posted on the same day or the next day, and the B transactions will only join with other transactions posted on the same day or the previous day.
Because we don't filter out results where A.TransID = B.TransID this self-join will generate duplicate rows, but that is OK. We just want the starting date for this group of transactions, the ending date and some kind of Group Identifier that we can use later. (Besides, if a customer charges over $1,000 in a single transaction, that meets the selection criteria set out in the problem description.)
That Group Identifier is generated by the ROW_NUMBER() function and it is the key to this portion of the solution. (It is also what I borrowed from Itzik Ben-Gan's solution to the "Islands" problem.)
GroupTotal
AS
( SELECT TransGroup.SSN AS SSN
,TransGroup.GroupID AS TransGroupID
,SUM(CustomerTrans.Amount) AS TransGroupAmount
FROM TransGroup
JOIN CustomerTrans
ON TransGroup.SSN = CustomerTrans.SSN
AND CustomerTrans.TransDate BETWEEN TransGroup.StartDate AND TransGroup.EndDate
GROUP BY TransGroup.SSN
,TransGroup.GroupID
HAVING SUM(CustomerTrans.Amount) >= 1000.00
),
The third CTE joins back to our first CTE, CustomerTrans, groups the transactions by Customer SSN and the Group ID we generated in the second CTE and calculates the total transaction amount for each group. Finally it filters out the groups having a total transaction amount less than $1,000.
FlaggedCustomer
AS
( SELECT SSN
FROM GroupTotal
GROUP BY SSN
)
Our final CTE before we present our final output gets a distinct list of the customers meeting the criteria set forth in the problem.
SELECT CustomerTrans.TransID
,FlaggedCustomer.SSN
,CustomerTrans.TransDate
,CustomerTrans.Amount
FROM FlaggedCustomer
JOIN CustomerTrans
ON FlaggedCustomer.SSN = CustomerTrans.SSN;
And here is our final output. We use the list of Customers with more than $1,000 in charges within any two-day period from the last CTE, join it back to the original list of Customer transactions and list all of the transactions for Customers meeting the criteria.
I'd be very interested to see what solutions others can create. I'd also like to see how the various solutions perform on large data sets. I would expect my solution to have reasonable performance on larger data sets, but it could certainly be beaten. Perhaps Phil Factor would be interested in presenting a similar problem on his Speed Phreaks site. π
Thanks again for posting your problem. I hope that this has been helpful to you. π
September 19, 2012 at 9:47 pm
David - Really nice explanation of your approach. I have to say mine falls short in that regard.
I think it's important to look at the whole picture though. That includes both IOs and timings over a large test data set. So let's put together a small test harness (about 16,000 transaction rows) and give it a whirl:
CREATE TABLE #Customer
( CustKey INT PRIMARY KEY
,SSN NCHAR(9) NOT NULL
);
CREATE TABLE #TransDtl
( TransID INT IDENTITY PRIMARY KEY
,CustKey NCHAR(9) NOT NULL
,TransDate DATETIME NOT NULL
,Amount MONEY NOT NULL
);
INSERT INTO #Customer
SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'
UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'
UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'
INSERT INTO #TransDtl
SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500
UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600
UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100
UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100
DECLARE @StartDT DATETIME = '2012-09-20'
,@DaysBack INT = 365
;WITH Tally (n) AS (
SELECT TOP 2000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #TransDtl
SELECT CustKey
,TransDate=DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)
,Amount=ABS(CHECKSUM(NEWID())) % 1000
FROM #TransDtl
CROSS APPLY Tally
-- Holders to capture output
DECLARE @TransID INT
,@SSN NCHAR(9)
,@TransDate DATETIME
,@Amount MONEY
PRINT '----------------- Dwain'
SET STATISTICS TIME, IO ON
;WITH Tally (n) AS (
SELECT TOP (@DaysBack) 1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns),
MyTrans AS (
SELECT TransID, c.CustKey, SSN, TransDate, Amount
FROM #TransDtl t
INNER JOIN #Customer c ON c.CustKey = t.Custkey)
SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount
FROM (
SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey
,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
FROM Tally a
CROSS APPLY (SELECT StartDT=DATEADD(day, -n, @StartDt)) b
INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= DATEADD(day, 1, StartDT)
) b
WHERE Charges >= 1000
SET STATISTICS TIME, IO OFF
PRINT '----------------- David'
SET STATISTICS TIME, IO ON
;WITH CustomerTrans
AS
( SELECT #Customer.SSN AS SSN
,#TransDtl.TransID AS TransID
,#TransDtl.TransDate AS TransDate
,#TransDtl.Amount AS Amount
FROM #Customer
JOIN #TransDtl
ON #Customer.CustKey = #TransDtl.CustKey
),
TransGroup
AS
( SELECT A.SSN AS SSN
,ROW_NUMBER() OVER (PARTITION BY A.SSN ORDER BY A.SSN) AS GroupID
,MIN(A.TransDate) AS StartDate
,MAX(B.TransDate) AS EndDate
FROM CustomerTrans AS A
JOIN CustomerTrans AS B
ON A.SSN = B.SSN
AND DATEDIFF(Day,A.TransDate,B.TransDate) BETWEEN 0 AND 1
GROUP BY A.SSN
),
GroupTotal
AS
( SELECT TransGroup.SSN AS SSN
,TransGroup.GroupID AS TransGroupID
,SUM(CustomerTrans.Amount) AS TransGroupAmount
FROM TransGroup
JOIN CustomerTrans
ON TransGroup.SSN = CustomerTrans.SSN
AND CustomerTrans.TransDate BETWEEN TransGroup.StartDate AND TransGroup.EndDate
GROUP BY TransGroup.SSN
,TransGroup.GroupID
HAVING SUM(CustomerTrans.Amount) >= 1000.00
),
FlaggedCustomer
AS
( SELECT SSN
FROM GroupTotal
GROUP BY SSN
)
SELECT @TransID=CustomerTrans.TransID
,@SSN=FlaggedCustomer.SSN
,@TransDate=CustomerTrans.TransDate
,@Amount=CustomerTrans.Amount
FROM FlaggedCustomer
JOIN CustomerTrans
ON FlaggedCustomer.SSN = CustomerTrans.SSN;
SET STATISTICS TIME, IO OFF
DROP TABLE #TransDtl
DROP TABLE #Customer
Here are the IO/timing results:
----------------- Dwain
Table 'Worktable'. Scan count 368, logical reads 101966, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000060'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Customer 00000000005F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 312 ms, elapsed time = 368 ms.
----------------- David
Table '#Customer 00000000005F'. Scan count 7, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000060'. Scan count 19, logical reads 665, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 10, logical reads 534164, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47565 ms, elapsed time = 41821 ms.
Looks like IOs isn't everything!
Note that I am not saying mine will rip apart the multi-million transaction rows you'd expect in a year's worth of credit card transactions for thousands of customers though. As I said, I thought there might be better performing options available out there.
We still need to hear back from the OP on my comments regarding expected results.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 19, 2012 at 9:57 pm
FYI. These were my timing stats on 960,000 transactions:
SQL Server Execution Times:
CPU time = 17815 ms, elapsed time = 99414 ms.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 19, 2012 at 10:03 pm
You guys are awesome. I can't wait to get back to work to try this out. I will report back the results tomorrow. I'm a SQL first grader, so bear with me. And yes, the SSN's are encrypted π
September 19, 2012 at 10:18 pm
egerencher (9/19/2012)
You guys are awesome. I can't wait to get back to work to try this out. I will report back the results tomorrow. I'm a SQL first grader, so bear with me. And yes, the SSN's are encrypted π
You're welcome.
The below version (usable in the test harness) appears to be slightly swifter. I made the Tally table do the work of calculating the StartDT, which allowed me to remove the CROSS APPLY.
;WITH Tally (StartDT) AS (
SELECT TOP (@DaysBack) DATEADD(day, -(1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), @StartDt)
FROM sys.all_columns),
MyTrans AS (
SELECT TransID, c.CustKey, SSN, TransDate, Amount
FROM #TransDtl t
INNER JOIN #Customer c ON c.CustKey = t.Custkey)
SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount
FROM (
SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey
,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
FROM Tally a
--CROSS APPLY (SELECT StartDT=DATEADD(day, -n, @StartDt)) b
INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= DATEADD(day, 1, StartDT)
) b
WHERE Charges >= 1000
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 19, 2012 at 10:26 pm
Many of your assumptions are correct, and I apologize for not making it clearer in the beginning.
In regards to the following:
"Firstly, I'd like to question why trans# 3 and 6 appear in your expected results. The total charges for period 02-03 Jan is $510 for 123456789 and for 02-03 Apr is $700 for 987654321.?" I thought it would make it easier to include the transaction than to exclude it. But as I am finding out, nothing is easy with my problem. So your point is well taken, and these transactions are not something I need in the results.
In regards to the field names, you are correct again. The real field names are postdate, transamt, ssn, customerkey, and the transaction table is really called the transdtl table, which is exactly what you assumed. Again, I was trying to make it easy, which again, didn't work. :/
You are dead on with the "real-world quirk". A customer may have more than one card, and I want to create an alert when a customer charges over a certain amount of money within a certain amount of time, whether they put all of the charges on one card or spread them across multiple cards. BRILLIANT
September 19, 2012 at 10:37 pm
egerencher (9/19/2012)
"Firstly, I'd like to question why trans# 3 and 6 appear in your expected results. The total charges for period 02-03 Jan is $510 for 123456789 and for 02-03 Apr is $700 for 987654321.?" I thought it would make it easier to include the transaction than to exclude it. But as I am finding out, nothing is easy with my problem. So your point is well taken, and these transactions are not something I need in the results.
Happy to hear my analysis was correct.
egerencher (9/19/2012)
You are dead on with the "real-world quirk". A customer may have more than one card, and I want to create an alert when a customer charges over a certain amount of money within a certain amount of time, whether they put all of the charges on one card or spread them across multiple cards. BRILLIANT
How about this for a quirk? Why wouldn't you want to assign a different monetary amount to each customer? That would increase the complexity slightly, but not too much to handle.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 19, 2012 at 11:24 pm
Incidentally, a slight change to the table DDL and adding an index had a nice impact on my second solution's elapsed time:
CREATE TABLE #Customer
( CustKey INT
,SSN NCHAR(9) NOT NULL
,PRIMARY KEY CLUSTERED (CustKey, SSN)
);
CREATE TABLE #TransDtl
( TransID INT IDENTITY PRIMARY KEY
,CustKey NCHAR(9) NOT NULL
,TransDate DATETIME NOT NULL
,Amount MONEY NOT NULL
);
CREATE INDEX TransDtl1
ON #TransDtl (CustKey, TransDate)
Results over 960,000 rows:
SQL Server Execution Times:
CPU time = 16942 ms, elapsed time = 34897 ms.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 20, 2012 at 2:59 am
Simpler, quicker, same results:
SELECT
t.TransID,
c.SSN,
t.TransDate,
t.Amount
FROM #Customer c
INNER JOIN #TransDtl t
ON t.CustKey = c.CustKey
CROSS APPLY (
SELECT Amount = SUM(ti.Amount)
FROM #TransDtl ti
WHERE t.CustKey = c.CustKey
AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate
) s
WHERE s.Amount >= 1000
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 20, 2012 at 4:27 am
Here are the IO/timing results:
----------------- Dwain
Table 'Worktable'. Scan count 368, logical reads 101966, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000060'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Customer 00000000005F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 312 ms, elapsed time = 368 ms.
----------------- David
Table '#Customer 00000000005F'. Scan count 7, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000060'. Scan count 19, logical reads 665, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 10, logical reads 534164, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47565 ms, elapsed time = 41821 ms.
Looks like IOs isn't everything!
Very, very nice, Dwain! I honestly thought my solution would scale better than that. :blush: Once again, the tally table kicks ... well, you know. π
I can't wait to tear this apart. How did you generate your test data? Do you have a script you could share, or did you use a third-party tool? (If it was a third-party tool, please tell us which one.
As I said before, I think this would be an excellent problem for Phil Factor's Speed Phreaks. It is exactly the sort of Real World problem they like to blow apart. :smooooth:
September 20, 2012 at 4:37 am
ChrisM@Work (9/20/2012)
Simpler, quicker, same results:
SELECT
t.TransID,
c.SSN,
t.TransDate,
t.Amount
FROM #Customer c
INNER JOIN #TransDtl t
ON t.CustKey = c.CustKey
CROSS APPLY (
SELECT Amount = SUM(ti.Amount)
FROM #TransDtl ti
WHERE t.CustKey = c.CustKey
AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate
) s
WHERE s.Amount >= 1000
Simpler - for sure!
Test harness (800K rows):
CREATE TABLE #Customer
( CustKey INT
,SSN NCHAR(9) NOT NULL
,PRIMARY KEY CLUSTERED (CustKey, SSN)
);
CREATE TABLE #TransDtl
( TransID INT IDENTITY PRIMARY KEY
,CustKey NCHAR(9) NOT NULL
,TransDate DATETIME NOT NULL
,Amount MONEY NOT NULL
);
CREATE INDEX TransDtl1
ON #TransDtl (CustKey, TransDate)
INSERT INTO #Customer
SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'
UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'
UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'
INSERT INTO #TransDtl
SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500
UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600
UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100
UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100
DECLARE @StartDT DATETIME = '2012-09-20'
,@DaysBack INT = 365
;WITH Tally (n) AS (
SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #TransDtl
SELECT CustKey
,TransDate=DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)
,Amount=ABS(CHECKSUM(NEWID())) % 1000
FROM #TransDtl
CROSS APPLY Tally
-- Holders to capture output
DECLARE @TransID INT
,@SSN NCHAR(9)
,@TransDate DATETIME
,@Amount MONEY
PRINT '----------------- Dwain New'
SET STATISTICS TIME, IO ON
;WITH Tally (StartDT) AS (
SELECT TOP (@DaysBack) DATEADD(day, -(1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), @StartDt)
FROM sys.all_columns),
MyTrans AS (
SELECT TransID, c.CustKey, SSN, TransDate, Amount
FROM #TransDtl t
INNER JOIN #Customer c ON c.CustKey = t.Custkey)
SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount
FROM (
SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey
,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
FROM Tally a
INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= DATEADD(day, 1, StartDT)
) b
WHERE Charges >= 1000
SET STATISTICS TIME, IO OFF
PRINT '----------------- Chris'
SET STATISTICS TIME, IO ON
SELECT
@TransID=t.TransID,
@SSN=c.SSN,
@TransDate=t.TransDate,
@Amount=t.Amount
FROM #Customer c
INNER JOIN #TransDtl t
ON t.CustKey = c.CustKey
CROSS APPLY (
SELECT Amount = SUM(ti.Amount)
FROM #TransDtl ti
WHERE t.CustKey = c.CustKey
AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate
) s
WHERE s.Amount >= 1000
SET STATISTICS TIME, IO OFF
DROP TABLE #TransDtl
DROP TABLE #Customer
Quicker?
----------------- Dwain New
Table 'Worktable'. Scan count 368, logical reads 5682428, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000089'. Scan count 1, logical reads 4669, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Customer 000000000088'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 14539 ms, elapsed time = 26793 ms.
----------------- Chris
Table '#Customer 000000000088'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2160, logical reads 2429734, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000089'. Scan count 6, logical reads 9338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 11590 ms, elapsed time = 8124 ms.
Yeah, OK I'll give you that too.
Even better it seems (easier on the CPU anyway) if you remove parallelism with OPTION (MAXDOP 1).
----------------- Dwain New
SQL Server Execution Times:
CPU time = 15038 ms, elapsed time = 25966 ms.
----------------- Chris
SQL Server Execution Times:
CPU time = 8517 ms, elapsed time = 9068 ms
I knew someone would come along that knew what they were doing and best me. π
Edit: Fixed the quoted section. Didn't mean to quote myself.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 20, 2012 at 4:40 am
David Moutray (9/20/2012)
Here are the IO/timing results:
----------------- Dwain
Table 'Worktable'. Scan count 368, logical reads 101966, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000060'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Customer 00000000005F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 312 ms, elapsed time = 368 ms.
----------------- David
Table '#Customer 00000000005F'. Scan count 7, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TransDtl 000000000060'. Scan count 19, logical reads 665, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 10, logical reads 534164, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47565 ms, elapsed time = 41821 ms.
Looks like IOs isn't everything!
Very, very nice, Dwain! I honestly thought my solution would scale better than that. :blush: Once again, the tally table kicks ... well, you know. π
I can't wait to tear this apart. How did you generate your test data? Do you have a script you could share, or did you use a third-party tool? (If it was a third-party tool, please tell us which one.
As I said before, I think this would be an excellent problem for Phil Factor's Speed Phreaks. It is exactly the sort of Real World problem they like to blow apart. :smooooth:
The test harness has the following that expands the row set based on the 8 transactions provided:
;WITH Tally (n) AS (
SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #TransDtl
SELECT CustKey
,TransDate=DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)
,Amount=ABS(CHECKSUM(NEWID())) % 1000
FROM #TransDtl
CROSS APPLY Tally
Yes the Tally table kicked some until ChrisM@Work jumped into the fray!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply