List Transactions that Meet Greater Than Criteria

  • dwain.c (9/20/2012)


    ...

    I knew someone would come along that knew what they were doing and best me. πŸ˜€

    Haha! I had no idea that this query would be any faster when I wrote it - I was just surprised that, as the simplest option - at least to me - nobody had posted it up!

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (9/20/2012)


    dwain.c (9/20/2012)


    ...

    I knew someone would come along that knew what they were doing and best me. πŸ˜€

    Haha! I had no idea that this query would be any faster when I wrote it - I was just surprised that, as the simplest option - at least to me - nobody had posted it up!

    Simple yet brilliant! Wish I'd thought of it. :w00t:


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (9/20/2012)


    ChrisM@Work (9/20/2012)


    dwain.c (9/20/2012)


    ...

    I knew someone would come along that knew what they were doing and best me. πŸ˜€

    Haha! I had no idea that this query would be any faster when I wrote it - I was just surprised that, as the simplest option - at least to me - nobody had posted it up!

    Simple yet brilliant! Wish I'd thought of it. :w00t:

    Hold on - I demand a recount!

    When I run this (Chris's query):

    CREATE TABLE #Customer

    ( CustKey INT

    ,SSN NCHAR(9) NOT NULL

    ,PRIMARY KEY CLUSTERED (CustKey, SSN)

    );

    CREATE TABLE #TransDtl

    ( TransID INT IDENTITY PRIMARY KEY

    ,CustKey NCHAR(9) NOT NULL

    ,TransDate DATETIME NOT NULL

    ,Amount MONEY NOT NULL

    );

    CREATE INDEX TransDtl1

    ON #TransDtl (CustKey, TransDate)

    INSERT INTO #Customer

    SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'

    UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'

    UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'

    INSERT INTO #TransDtl

    SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500

    UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600

    UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100

    UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100

    SELECT

    t.TransID,

    c.SSN,

    t.TransDate,

    t.Amount

    FROM #Customer c

    INNER JOIN #TransDtl t

    ON t.CustKey = c.CustKey

    CROSS APPLY (

    SELECT Amount = SUM(ti.Amount)

    FROM #TransDtl ti

    WHERE t.CustKey = c.CustKey

    AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate

    ) s

    WHERE s.Amount >= 1000

    DROP TABLE #TransDtl

    DROP TABLE #TransDtl

    I get this:

    TransIDSSNTransDateAmount

    21234567892012-01-02 00:00:00.000500.00

    31234567892012-01-03 00:00:00.000700.00

    59876543212012-04-02 00:00:00.000600.00

    69876543212012-04-03 00:00:00.000100.00

    72468013572012-04-02 00:00:00.000600.00

    82468013572012-04-03 00:00:00.000100.00

    But one of those SSNs shouldn't be there!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (9/20/2012)


    ....

    But one of those SSNs shouldn't be there!

    Gah! I hate it when that happens. Here's a new, completely different version;

    ;WITH

    MyTrans AS (

    SELECT TransID, c.CustKey, SSN, TransDate, Amount

    FROM #TransDtl t

    INNER JOIN #Customer c

    ON c.CustKey = t.Custkey),

    AggregatedData AS (

    SELECT SSN, TransDate, SUMAmount = SUM(Amount)

    FROM MyTrans

    GROUP BY SSN, TransDate

    )

    SELECT

    t.TransID, t.SSN, t.TransDate, t.Amount

    FROM MyTrans t

    WHERE EXISTS (

    SELECT 1

    FROM AggregatedData a

    INNER JOIN AggregatedData b

    ON b.SSN = a.SSN

    AND (b.TransDate = a.TransDate+1 OR b.TransDate = a.TransDate-1)

    AND b.SUMAmount + a.SUMAmount >= 1000

    WHERE a.SSN = t.SSN AND a.TransDate = t.TransDate -- outer ref

    )

    ORDER BY t.SSN, t.TransDate

    Also, I nticed that Dwain's code employs a couple of variables to determine the dates in the tally table. If the playing field is to be levelled, these should really be determined from the data, as follows;

    ;WITH Tally AS (

    SELECT StartDT, EndDT = DATEADD(day, 1, StartDT)

    FROM (

    SELECT MAXTransDate = MAX(TransDate), DaysBack = 1 + DATEDIFF(dd,MIN(TransDate),MAX(TransDate))

    FROM #TransDtl) d

    CROSS APPLY (

    SELECT TOP (DaysBack) StartDT = DATEADD(day, -(1+DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), MAXTransDate)

    FROM sys.all_columns

    ) x

    ),

    MyTrans AS (

    SELECT TransID, c.CustKey, SSN, TransDate, Amount

    FROM #TransDtl t

    INNER JOIN #Customer c ON c.CustKey = t.Custkey)

    SELECT DISTINCT

    TransID,

    SSN,

    TransDate,

    Amount

    FROM (

    SELECT StartDT, EndDT, TransID, SSN, TransDate, Amount, CustKey,

    Charges = SUM(Amount) OVER (PARTITION BY SSN, StartDT)

    FROM Tally a

    INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= EndDT

    ) b

    WHERE Charges >= 1000

    ORDER BY SSN, TransDate, TransID

    ... or similar. Unfortunately, it really hammers performance.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Actually the OP said he wants to look back 1 year, hence:

    DATEDIFF(dd,MIN(TransDate),MAX(TransDate))

    So maybe your Tally table takes the performance hit for no reason when you can just use the local variable I used (@DaysBack). I also used the other local variable @StartDate so you could look back from a past point in time.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Chris - Trying out your new approaches in the below test harness (800K rows):

    CREATE TABLE #Customer

    ( CustKey INT

    ,SSN NCHAR(9) NOT NULL

    ,PRIMARY KEY CLUSTERED (CustKey, SSN)

    );

    CREATE TABLE #TransDtl

    ( TransID INT IDENTITY PRIMARY KEY

    ,CustKey NCHAR(9) NOT NULL

    ,TransDate DATETIME NOT NULL

    ,Amount MONEY NOT NULL

    );

    CREATE INDEX TransDtl1

    ON #TransDtl (CustKey, TransDate)

    INSERT INTO #Customer

    SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'

    UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'

    UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'

    INSERT INTO #TransDtl

    SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500

    UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600

    UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100

    UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100

    DECLARE @StartDT DATETIME = '2012-09-20'

    ,@DaysBack INT = 365

    ;WITH Tally (n) AS (

    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    INSERT INTO #TransDtl

    SELECT CustKey

    ,TransDate=DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)

    ,Amount=ABS(CHECKSUM(NEWID())) % 1000

    FROM #TransDtl

    CROSS APPLY Tally

    -- Holders to capture output

    DECLARE @TransID INT

    ,@SSN NCHAR(9)

    ,@TransDate DATETIME

    ,@Amount MONEY

    PRINT '----------------- Dwain New'

    SET STATISTICS TIME ON

    ;WITH Tally (StartDT) AS (

    SELECT TOP (@DaysBack) DATEADD(day, -(1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), @StartDt)

    FROM sys.all_columns),

    MyTrans AS (

    SELECT TransID, c.CustKey, SSN, TransDate, Amount

    FROM #TransDtl t

    INNER JOIN #Customer c ON c.CustKey = t.Custkey)

    SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount

    FROM (

    SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey

    ,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)

    FROM Tally a

    INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= DATEADD(day, 1, StartDT)

    ) b

    WHERE Charges >= 1000

    SET STATISTICS TIME OFF

    PRINT '----------------- Chris''s completely new approach'

    SET STATISTICS TIME ON

    ;WITH

    MyTrans AS (

    SELECT TransID, c.CustKey, SSN, TransDate, Amount

    FROM #TransDtl t

    INNER JOIN #Customer c

    ON c.CustKey = t.Custkey),

    AggregatedData AS (

    SELECT SSN, TransDate, SUMAmount = SUM(Amount)

    FROM MyTrans

    GROUP BY SSN, TransDate

    )

    SELECT

    @TransID=t.TransID, @SSN=t.SSN, @TransDate=t.TransDate, @Amount=t.Amount

    FROM MyTrans t

    WHERE EXISTS (

    SELECT 1

    FROM AggregatedData a

    INNER JOIN AggregatedData b

    ON b.SSN = a.SSN

    AND (b.TransDate = a.TransDate+1 OR b.TransDate = a.TransDate-1)

    AND b.SUMAmount + a.SUMAmount >= 1000

    WHERE a.SSN = t.SSN AND a.TransDate = t.TransDate -- outer ref

    )

    ORDER BY t.SSN, t.TransDate

    SET STATISTICS TIME OFF

    PRINT '----------------- Chris''s variant (approach 2)'

    SET STATISTICS TIME ON

    ;WITH Tally AS (

    SELECT StartDT, EndDT = DATEADD(day, 1, StartDT)

    FROM (

    SELECT MAXTransDate = MAX(TransDate), DaysBack = 1 + DATEDIFF(dd,MIN(TransDate),MAX(TransDate))

    FROM #TransDtl) d

    CROSS APPLY (

    SELECT TOP (DaysBack) StartDT = DATEADD(day, -(1+DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), MAXTransDate)

    FROM sys.all_columns

    ) x

    ),

    MyTrans AS (

    SELECT TransID, c.CustKey, SSN, TransDate, Amount

    FROM #TransDtl t

    INNER JOIN #Customer c ON c.CustKey = t.Custkey)

    SELECT DISTINCT

    @TransID=TransID,

    @SSN=SSN,

    @TransDate=TransDate,

    @Amount=Amount

    FROM (

    SELECT StartDT, EndDT, TransID, SSN, TransDate, Amount, CustKey,

    Charges = SUM(Amount) OVER (PARTITION BY SSN, StartDT)

    FROM Tally a

    INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= EndDT

    ) b

    WHERE Charges >= 1000

    ORDER BY SSN, TransDate, TransID

    SET STATISTICS TIME OFF

    DROP TABLE #TransDtl

    DROP TABLE #Customer

    And I get these results:

    ----------------- Dwain New

    SQL Server Execution Times:

    CPU time = 14446 ms, elapsed time = 33791 ms.

    ----------------- Chris's completely new approach

    SQL Server Execution Times:

    CPU time = 7052 ms, elapsed time = 9677 ms.

    ----------------- Chris's variant (approach 2)

    SQL Server Execution Times:

    CPU time = 16895 ms, elapsed time = 35730 ms.

    So your completely new approach, in addition to generating the correct results (I checked this time :-)), also wins the performance race! And not by a small margin either.

    Congratulations on a job well done!

    The only possible issue is that your completely new approach doesn't limit the lookback to 1 year. Only my test data does.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I have been working about all day on this thing, and I have to say that everyone's time is so much appreciated. I am so happy to report that I got it to work. HELL YEAHHHHHHHH!

    Chris, your revised query worked like a charm. Since I like to limit my scope period, I have to first create a temp table of all the transaction within a given period of time, let's say August 2012. Keep in mind that I am not a DBA, and can't create views or tables...but I can ask my DBA to create one for me, and I can kill the first step of creating a temp table.

    I then run your query on the temp table, and POW, I get my results. I can't tell you how much this excites me. I think what ever code junkie DNA you guys have, I may have a bit of my own...cause I get a real kick out of this stuff.

  • You don't really need a temp table. All you need to do is modify the first CTE in Chris's approach (add WHERE):

    ;WITH

    MyTrans AS (

    SELECT TransID, c.CustKey, SSN, TransDate, Amount

    FROM #TransDtl t

    INNER JOIN #Customer c

    ON c.CustKey = t.Custkey

    WHERE TransDate BETWEEN '2011-09-22' AND '2012-09-21')

    But if you're happy, so are we.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Let me throw this out there. If I need to get the same results, but instead of sum of transamt > 1000 in 2 days, I need to the count of transactions > 10 in 2 days, Chris, how would you alter your query?

    OOOOOHHHHHHHHH. Can't wait for the sequel πŸ˜€

  • egerencher (9/21/2012)


    Let me throw this out there. If I need to get the same results, but instead of sum of transamt > 1000 in 2 days, I need to the count of transactions > 10 in 2 days, Chris, how would you alter your query?

    OOOOOHHHHHHHHH. Can't wait for the sequel πŸ˜€

    I can answer that.

    See where he calculates SUMAmount? Change that to a COUNT instead of SUM and change where it is used later to be > 10 instead of > 1000.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • D-WAINE!!!! You the man! I brought my laptop from work just to make this work. I'll go and try it right now and see if I can manage it!!!! Update you soon!

  • Once again. It worked. GENIUS. OK. Here comes a curve ball. Instead of sum of transamt > 1000 in 2 days, how about sum of transamt > 5000 in 7 days.

    ...and here comes the pitch...

  • egerencher (9/21/2012)


    Once again. It worked. GENIUS. OK. Here comes a curve ball. Instead of sum of transamt > 1000 in 2 days, how about sum of transamt > 5000 in 7 days.

    ...and here comes the pitch...

    That one I wouldn't want to advise you on Chris's query without testing it.

    I know for my version, there's 3 places where "1" appears. You would just need to change them to 7.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • YES YES YES YES. Dwaine. Your code worked so well. Not sure why I was having trouble with it earlier today. OH MY!!!! That is awesome. Two ways to skin a cat!

    THANKS AGAIN!

  • dwain.c (9/21/2012)


    egerencher (9/21/2012)


    Once again. It worked. GENIUS. OK. Here comes a curve ball. Instead of sum of transamt > 1000 in 2 days, how about sum of transamt > 5000 in 7 days.

    ...and here comes the pitch...

    That one I wouldn't want to advise you on Chris's query without testing it.

    I know for my version, there's 3 places where "1" appears. You would just need to change them to 7.

    Sorry - won't have time to figure that one out today. Dwain, cheers mate for the tweaks πŸ˜€

    There's a small issue with the 800,008 row test data set; the amounts are so high that every row in the entire set is returned. I've tweaked it so that the amounts fit the filter with about a third or so coming through. This changes the performance of both queries quite a bit. Here's the tweaked sample generator:

    ;WITH Tally (n) AS (

    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    INSERT INTO #TransDtl

    SELECT CustKey

    --,TransDate = DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)

    --,Amount = (ABS(CHECKSUM(NEWID())) % 1000) -- was 1000

    ,TransDate = DATEADD(day, -ABS(CHECKSUM(NEWID())) % 980, @StartDT)

    ,Amount = (ABS(CHECKSUM(NEWID())) % 3.141) -- don't use PI(), it's float and incompatible with modulo operator

    FROM #TransDtl

    CROSS APPLY Tally

    And here's the io/timing stats output with noise removed;

    Original test script;

    ----------------- Dwain New

    Table 'syscolpars'. Scan count 1, logical reads 6

    Table 'Worktable'. Scan count 368, logical reads 5681975

    Table 'Worktable'. Scan count 0, logical reads 0

    Table '#TransDtl_000000000102'. Scan count 1, logical reads 4669

    Table '#Customer_000000000101'. Scan count 1, logical reads 2

    SQL Server Execution Times:

    CPU time = 17551 ms, elapsed time = 20879 ms.

    ----------------- Chris

    Table 'Worktable'. Scan count 3, logical reads 6512

    Table '#Customer_000000000101'. Scan count 3, logical reads 6

    Table '#TransDtl_000000000102'. Scan count 9, logical reads 14007

    SQL Server Execution Times:

    CPU time = 6239 ms, elapsed time = 12882 ms.

    New test script;

    ----------------- Dwain New

    Table 'syscolpars'. Scan count 1, logical reads 6

    Table 'Worktable'. Scan count 368, logical reads 3618368

    Table 'Worktable'. Scan count 0, logical reads 0

    Table '#TransDtl_000000000100'. Scan count 1, logical reads 4669

    Table '#Customer_0000000000FF'. Scan count 1, logical reads 2

    SQL Server Execution Times:

    CPU time = 8034 ms, elapsed time = 8068 ms.

    ----------------- Chris New

    Table 'Worktable'. Scan count 0, logical reads 0

    Table '#Customer_0000000000FF'. Scan count 3, logical reads 6

    Table '#TransDtl_000000000100'. Scan count 9, logical reads 14007

    SQL Server Execution Times:

    CPU time = 3574 ms, elapsed time = 2351 ms.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply