List Transactions that Meet Greater Than Criteria

Question

List Transactions that Meet Greater Than Criteria

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

dwain.c (9/20/2012)
...
I knew someone would come along that knew what they were doing and best me. 😀

Haha! I had no idea that this query would be any faster when I wrote it - I was just surprised that, as the simplest option - at least to me - nobody had posted it up!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Dwain Camps SSC Guru Points: 86978 More actions · Answer 2

ChrisM@Work (9/20/2012)
dwain.c (9/20/2012)
...
I knew someone would come along that knew what they were doing and best me. 😀
Haha! I had no idea that this query would be any faster when I wrote it - I was just surprised that, as the simplest option - at least to me - nobody had posted it up!

Simple yet brilliant! Wish I'd thought of it. :w00t:

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 3

dwain.c (9/20/2012)
ChrisM@Work (9/20/2012)
dwain.c (9/20/2012)
...
I knew someone would come along that knew what they were doing and best me. 😀
Haha! I had no idea that this query would be any faster when I wrote it - I was just surprised that, as the simplest option - at least to me - nobody had posted it up!
Simple yet brilliant! Wish I'd thought of it. :w00t:

Hold on - I demand a recount!

When I run this (Chris's query):

CREATE TABLE #Customer

( CustKey INT

,SSN NCHAR(9) NOT NULL

,PRIMARY KEY CLUSTERED (CustKey, SSN)

);

CREATE TABLE #TransDtl

( TransID INT IDENTITY PRIMARY KEY

,CustKey NCHAR(9) NOT NULL

,TransDate DATETIME NOT NULL

,Amount MONEY NOT NULL

);

CREATE INDEX TransDtl1

ON #TransDtl (CustKey, TransDate)

INSERT INTO #Customer

SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'

UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'

UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'

INSERT INTO #TransDtl

SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500

UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600

UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100

UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100

SELECT

t.TransID,

c.SSN,

t.TransDate,

t.Amount

FROM #Customer c

INNER JOIN #TransDtl t

ON t.CustKey = c.CustKey

CROSS APPLY (

SELECT Amount = SUM(ti.Amount)

FROM #TransDtl ti

WHERE t.CustKey = c.CustKey

AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate

) s

WHERE s.Amount >= 1000

DROP TABLE #TransDtl

I get this:

TransIDSSNTransDateAmount

21234567892012-01-02 00:00:00.000500.00

31234567892012-01-03 00:00:00.000700.00

59876543212012-04-02 00:00:00.000600.00

69876543212012-04-03 00:00:00.000100.00

72468013572012-04-02 00:00:00.000600.00

82468013572012-04-03 00:00:00.000100.00

But one of those SSNs shouldn't be there!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

dwain.c (9/20/2012)
....
But one of those SSNs shouldn't be there!

Gah! I hate it when that happens. Here's a new, completely different version;

;WITH

MyTrans AS (

SELECT TransID, c.CustKey, SSN, TransDate, Amount

FROM #TransDtl t

INNER JOIN #Customer c

ON c.CustKey = t.Custkey),

AggregatedData AS (

SELECT SSN, TransDate, SUMAmount = SUM(Amount)

FROM MyTrans

GROUP BY SSN, TransDate

)

SELECT

t.TransID, t.SSN, t.TransDate, t.Amount

FROM MyTrans t

WHERE EXISTS (

SELECT 1

FROM AggregatedData a

INNER JOIN AggregatedData b

ON b.SSN = a.SSN

AND (b.TransDate = a.TransDate+1 OR b.TransDate = a.TransDate-1)

AND b.SUMAmount + a.SUMAmount >= 1000

WHERE a.SSN = t.SSN AND a.TransDate = t.TransDate -- outer ref

)

ORDER BY t.SSN, t.TransDate

Also, I nticed that Dwain's code employs a couple of variables to determine the dates in the tally table. If the playing field is to be levelled, these should really be determined from the data, as follows;

;WITH Tally AS (

SELECT StartDT, EndDT = DATEADD(day, 1, StartDT)

FROM (

SELECT MAXTransDate = MAX(TransDate), DaysBack = 1 + DATEDIFF(dd,MIN(TransDate),MAX(TransDate))

FROM #TransDtl) d

CROSS APPLY (

SELECT TOP (DaysBack) StartDT = DATEADD(day, -(1+DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), MAXTransDate)

FROM sys.all_columns

) x

),

MyTrans AS (

SELECT TransID, c.CustKey, SSN, TransDate, Amount

FROM #TransDtl t

INNER JOIN #Customer c ON c.CustKey = t.Custkey)

SELECT DISTINCT

TransID,

SSN,

TransDate,

Amount

FROM (

SELECT StartDT, EndDT, TransID, SSN, TransDate, Amount, CustKey,

Charges = SUM(Amount) OVER (PARTITION BY SSN, StartDT)

FROM Tally a

INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= EndDT

) b

WHERE Charges >= 1000

ORDER BY SSN, TransDate, TransID

... or similar. Unfortunately, it really hammers performance.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Dwain Camps SSC Guru Points: 86978 More actions · Answer 5

Actually the OP said he wants to look back 1 year, hence:

DATEDIFF(dd,MIN(TransDate),MAX(TransDate))

So maybe your Tally table takes the performance hit for no reason when you can just use the local variable I used (@DaysBack). I also used the other local variable @StartDate so you could look back from a past point in time.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 6

Chris - Trying out your new approaches in the below test harness (800K rows):

CREATE TABLE #Customer

( CustKey INT

,SSN NCHAR(9) NOT NULL

,PRIMARY KEY CLUSTERED (CustKey, SSN)

);

CREATE TABLE #TransDtl

( TransID INT IDENTITY PRIMARY KEY

,CustKey NCHAR(9) NOT NULL

,TransDate DATETIME NOT NULL

,Amount MONEY NOT NULL

);

CREATE INDEX TransDtl1

ON #TransDtl (CustKey, TransDate)

INSERT INTO #Customer

SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'

UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'

UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'

INSERT INTO #TransDtl

SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500

UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600

UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100

UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100

DECLARE @StartDT DATETIME = '2012-09-20'

,@DaysBack INT = 365

;WITH Tally (n) AS (

SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b)

INSERT INTO #TransDtl

SELECT CustKey

,TransDate=DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)

,Amount=ABS(CHECKSUM(NEWID())) % 1000

FROM #TransDtl

CROSS APPLY Tally

-- Holders to capture output

DECLARE @TransID INT

,@SSN NCHAR(9)

,@TransDate DATETIME

,@Amount MONEY

PRINT '----------------- Dwain New'

SET STATISTICS TIME ON

;WITH Tally (StartDT) AS (

SELECT TOP (@DaysBack) DATEADD(day, -(1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), @StartDt)

FROM sys.all_columns),

MyTrans AS (

SELECT TransID, c.CustKey, SSN, TransDate, Amount

FROM #TransDtl t

INNER JOIN #Customer c ON c.CustKey = t.Custkey)

SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount

FROM (

SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey

,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)

FROM Tally a

INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= DATEADD(day, 1, StartDT)

) b

WHERE Charges >= 1000

SET STATISTICS TIME OFF

PRINT '----------------- Chris''s completely new approach'

SET STATISTICS TIME ON

;WITH

MyTrans AS (

SELECT TransID, c.CustKey, SSN, TransDate, Amount

FROM #TransDtl t

INNER JOIN #Customer c

ON c.CustKey = t.Custkey),

AggregatedData AS (

SELECT SSN, TransDate, SUMAmount = SUM(Amount)

FROM MyTrans

GROUP BY SSN, TransDate

)

SELECT

@TransID=t.TransID, @SSN=t.SSN, @TransDate=t.TransDate, @Amount=t.Amount

FROM MyTrans t

WHERE EXISTS (

SELECT 1

FROM AggregatedData a

INNER JOIN AggregatedData b

ON b.SSN = a.SSN

AND (b.TransDate = a.TransDate+1 OR b.TransDate = a.TransDate-1)

AND b.SUMAmount + a.SUMAmount >= 1000

WHERE a.SSN = t.SSN AND a.TransDate = t.TransDate -- outer ref

)

ORDER BY t.SSN, t.TransDate

SET STATISTICS TIME OFF

PRINT '----------------- Chris''s variant (approach 2)'

SET STATISTICS TIME ON

;WITH Tally AS (

SELECT StartDT, EndDT = DATEADD(day, 1, StartDT)

FROM (

SELECT MAXTransDate = MAX(TransDate), DaysBack = 1 + DATEDIFF(dd,MIN(TransDate),MAX(TransDate))

FROM #TransDtl) d

CROSS APPLY (

SELECT TOP (DaysBack) StartDT = DATEADD(day, -(1+DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), MAXTransDate)

FROM sys.all_columns

) x

),

MyTrans AS (

SELECT TransID, c.CustKey, SSN, TransDate, Amount

FROM #TransDtl t

INNER JOIN #Customer c ON c.CustKey = t.Custkey)

SELECT DISTINCT

@TransID=TransID,

@SSN=SSN,

@TransDate=TransDate,

@Amount=Amount

FROM (

SELECT StartDT, EndDT, TransID, SSN, TransDate, Amount, CustKey,

Charges = SUM(Amount) OVER (PARTITION BY SSN, StartDT)

FROM Tally a

INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= EndDT

) b

WHERE Charges >= 1000

ORDER BY SSN, TransDate, TransID

SET STATISTICS TIME OFF

DROP TABLE #TransDtl

DROP TABLE #Customer

And I get these results:

----------------- Dwain New

SQL Server Execution Times:

CPU time = 14446 ms, elapsed time = 33791 ms.

----------------- Chris's completely new approach

SQL Server Execution Times:

CPU time = 7052 ms, elapsed time = 9677 ms.

----------------- Chris's variant (approach 2)

SQL Server Execution Times:

CPU time = 16895 ms, elapsed time = 35730 ms.

So your completely new approach, in addition to generating the correct results (I checked this time :-)), also wins the performance race! And not by a small margin either.

Congratulations on a job well done!

The only possible issue is that your completely new approach doesn't limit the lookback to 1 year. Only my test data does.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

egerencher Old Hand Points: 343 More actions · Answer 7

I have been working about all day on this thing, and I have to say that everyone's time is so much appreciated. I am so happy to report that I got it to work. HELL YEAHHHHHHHH!

Chris, your revised query worked like a charm. Since I like to limit my scope period, I have to first create a temp table of all the transaction within a given period of time, let's say August 2012. Keep in mind that I am not a DBA, and can't create views or tables...but I can ask my DBA to create one for me, and I can kill the first step of creating a temp table.

I then run your query on the temp table, and POW, I get my results. I can't tell you how much this excites me. I think what ever code junkie DNA you guys have, I may have a bit of my own...cause I get a real kick out of this stuff.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 8

You don't really need a temp table. All you need to do is modify the first CTE in Chris's approach (add WHERE):

;WITH

MyTrans AS (

SELECT TransID, c.CustKey, SSN, TransDate, Amount

FROM #TransDtl t

INNER JOIN #Customer c

ON c.CustKey = t.Custkey

WHERE TransDate BETWEEN '2011-09-22' AND '2012-09-21')

But if you're happy, so are we.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

egerencher Old Hand Points: 343 More actions · Answer 9

Let me throw this out there. If I need to get the same results, but instead of sum of transamt > 1000 in 2 days, I need to the count of transactions > 10 in 2 days, Chris, how would you alter your query?

OOOOOHHHHHHHHH. Can't wait for the sequel 😀

Dwain Camps SSC Guru Points: 86978 More actions · Answer 10

egerencher (9/21/2012)
Let me throw this out there. If I need to get the same results, but instead of sum of transamt > 1000 in 2 days, I need to the count of transactions > 10 in 2 days, Chris, how would you alter your query?
OOOOOHHHHHHHHH. Can't wait for the sequel 😀

I can answer that.

See where he calculates SUMAmount? Change that to a COUNT instead of SUM and change where it is used later to be > 10 instead of > 1000.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

egerencher Old Hand Points: 343 More actions · Answer 11

D-WAINE!!!! You the man! I brought my laptop from work just to make this work. I'll go and try it right now and see if I can manage it!!!! Update you soon!

egerencher Old Hand Points: 343 More actions · Answer 12

Once again. It worked. GENIUS. OK. Here comes a curve ball. Instead of sum of transamt > 1000 in 2 days, how about sum of transamt > 5000 in 7 days.

...and here comes the pitch...

Dwain Camps SSC Guru Points: 86978 More actions · Answer 13

egerencher (9/21/2012)
Once again. It worked. GENIUS. OK. Here comes a curve ball. Instead of sum of transamt > 1000 in 2 days, how about sum of transamt > 5000 in 7 days.
...and here comes the pitch...

That one I wouldn't want to advise you on Chris's query without testing it.

I know for my version, there's 3 places where "1" appears. You would just need to change them to 7.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

egerencher Old Hand Points: 343 More actions · Answer 14

YES YES YES YES. Dwaine. Your code worked so well. Not sure why I was having trouble with it earlier today. OH MY!!!! That is awesome. Two ways to skin a cat!

THANKS AGAIN!

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 15

dwain.c (9/21/2012)
egerencher (9/21/2012)
Once again. It worked. GENIUS. OK. Here comes a curve ball. Instead of sum of transamt > 1000 in 2 days, how about sum of transamt > 5000 in 7 days.
...and here comes the pitch...
That one I wouldn't want to advise you on Chris's query without testing it.
I know for my version, there's 3 places where "1" appears. You would just need to change them to 7.

Sorry - won't have time to figure that one out today. Dwain, cheers mate for the tweaks 😀

There's a small issue with the 800,008 row test data set; the amounts are so high that every row in the entire set is returned. I've tweaked it so that the amounts fit the filter with about a third or so coming through. This changes the performance of both queries quite a bit. Here's the tweaked sample generator:

;WITH Tally (n) AS (

SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b)

INSERT INTO #TransDtl

SELECT CustKey

--,TransDate = DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)

--,Amount = (ABS(CHECKSUM(NEWID())) % 1000) -- was 1000

,TransDate = DATEADD(day, -ABS(CHECKSUM(NEWID())) % 980, @StartDT)

,Amount = (ABS(CHECKSUM(NEWID())) % 3.141) -- don't use PI(), it's float and incompatible with modulo operator

FROM #TransDtl

CROSS APPLY Tally

And here's the io/timing stats output with noise removed;

Original test script;

----------------- Dwain New

Table 'syscolpars'. Scan count 1, logical reads 6

Table 'Worktable'. Scan count 368, logical reads 5681975

Table 'Worktable'. Scan count 0, logical reads 0

Table '#TransDtl_000000000102'. Scan count 1, logical reads 4669

Table '#Customer_000000000101'. Scan count 1, logical reads 2

SQL Server Execution Times:

CPU time = 17551 ms, elapsed time = 20879 ms.

----------------- Chris

Table 'Worktable'. Scan count 3, logical reads 6512

Table '#Customer_000000000101'. Scan count 3, logical reads 6

Table '#TransDtl_000000000102'. Scan count 9, logical reads 14007

SQL Server Execution Times:

CPU time = 6239 ms, elapsed time = 12882 ms.

New test script;

----------------- Dwain New

Table 'syscolpars'. Scan count 1, logical reads 6

Table 'Worktable'. Scan count 368, logical reads 3618368

Table 'Worktable'. Scan count 0, logical reads 0

Table '#TransDtl_000000000100'. Scan count 1, logical reads 4669

Table '#Customer_0000000000FF'. Scan count 1, logical reads 2

SQL Server Execution Times:

CPU time = 8034 ms, elapsed time = 8068 ms.

----------------- Chris New

Table 'Worktable'. Scan count 0, logical reads 0

Table '#Customer_0000000000FF'. Scan count 3, logical reads 6

Table '#TransDtl_000000000100'. Scan count 9, logical reads 14007

SQL Server Execution Times:

CPU time = 3574 ms, elapsed time = 2351 ms.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden