List Transactions that Meet Greater Than Criteria

  • I work for a credit card company. Our database has a customer table and a transaction table. Fields in the customer table are SSN and CustomerKey. Fields in the transaction table are CustomerKey, transaction date (Transdate), and transaction amount (TransAmt).

    I need a query that identifies each SSN whose transaction amounts sum to more than $1,000 within any two-consecutive-day period in 2012. If an SSN has transactions totaling more than $1,000 within such a two-day period, I need the query to return all of that SSN's transactions.

    Here is an example of the raw data in the Transaction Table:

    Trans#   CustKey   Date       Amount
    1        12345     01/01/12     $600
    2        12345     01/02/12     $500
    3        67890     01/03/12      $10
    4        98765     04/01/12     $600
    5        43210     04/02/12     $600
    6        43210     04/03/12     $100
    7        13579     04/02/12     $600
    8        24568     04/03/12     $100

    Here is an example of the raw data in the Customer Table:

    CustKey   SSN
    12345     123456789
    67890     123456789
    98765     987654321
    43210     987654321
    13579     246801357
    24568     246801357

    Here are the results I need:

    Trans#   SSN         Date       Amount
    1        123456789   01/01/12     $600
    2        123456789   01/02/12     $500
    3        123456789   01/03/12      $10
    4        987654321   04/01/12     $600
    5        987654321   04/02/12     $600
    6        987654321   04/03/12     $100

    As you can see, my results include all transactions for SSNs 123456789 and 987654321, and exclude SSN 246801357.

    Here is what I tried.

    select c.ssn, t.*
    from TransDtl t
    join Customer c on t.CustomerKey = c.CustomerKey
    join (select c.ssn, SUM(t.TransAmt) as Total
          from TransDtl t
          join Customer c on t.CustomerKey = c.CustomerKey
          where t.postdate between '1/4/2012' and '1/5/2012'
          group by c.ssn
          having SUM(t.transamt) > 1000) x
      on x.ssn = c.SSN
    where t.postdate between '1/4/2012' and '1/5/2012'

    The problem I am having is that I can only run my query above for two specific consecutive days out of the year. I want to run the query over the whole year, identify any point in that year where the sum of an SSN's transactions exceeded $1,000 in a two-consecutive-day period, and then list all the transactions for that SSN.

    Thank you!
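    [Editor's note] To pin down the requirement before writing SQL, the two-day-window rule can be prototyped in a few lines of Python. This is an illustrative sketch only (the row tuples and the `flagged_ssns` helper are made up for illustration, not part of any schema here); on the sample rows it flags exactly the SSNs that appear in the expected results above:

```python
from datetime import date, timedelta

# Sample rows from the transaction table: (trans_no, cust_key, date, amount)
trans = [
    (1, 12345, date(2012, 1, 1), 600), (2, 12345, date(2012, 1, 2), 500),
    (3, 67890, date(2012, 1, 3), 10),  (4, 98765, date(2012, 4, 1), 600),
    (5, 43210, date(2012, 4, 2), 600), (6, 43210, date(2012, 4, 3), 100),
    (7, 13579, date(2012, 4, 2), 600), (8, 24568, date(2012, 4, 3), 100),
]
# Customer table: CustKey -> SSN (several cards can share one SSN)
ssn_of = {12345: '123456789', 67890: '123456789', 98765: '987654321',
          43210: '987654321', 13579: '246801357', 24568: '246801357'}

def flagged_ssns(trans, ssn_of, threshold=1000):
    """SSNs whose charges total more than `threshold` in some window of two
    consecutive days, summed across all of that SSN's cards."""
    flagged = set()
    for _, key, day, _ in trans:
        ssn = ssn_of[key]
        # Only windows starting the day before or the day of a transaction matter.
        for start in (day - timedelta(days=1), day):
            total = sum(a for _, k, d, a in trans
                        if ssn_of[k] == ssn and start <= d <= start + timedelta(days=1))
            if total > threshold:
                flagged.add(ssn)
    return flagged

print(sorted(flagged_ssns(trans, ssn_of)))  # ['123456789', '987654321']
```

    On this data the sketch confirms the expected result set: SSNs 123456789 and 987654321 qualify, while 246801357 (whose best two-day window is $700) does not.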

  • It might be useful for you to read:

    Title: Finding Gaps in a Sequential Number Sequence

    http://www.sqlservercentral.com/Forums/Topic267061-176-2.aspx#bm1135582

    And pay particular attention to the comments on the article by Jeff Moden, and to all the additional contributors in the discussion of the article.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us help you - before posting a question please read the linked article.
    Before posting a performance problem please read the linked article.

  • egerencher (9/19/2012)


    Here are the results I need:

    Trans#   SSN         Date       Amount
    1        123456789   01/01/12     $600
    2        123456789   01/02/12     $500
    3        123456789   01/03/12      $10
    4        987654321   04/01/12     $600
    5        987654321   04/02/12     $600
    6        987654321   04/03/12     $100

    Firstly, I'd like to question why trans# 3 and 6 appear in your expected results. The total charges for period 02-03 Jan is $510 for 123456789 and for 02-03 Apr is $700 for 987654321.

    Secondly, the next time you post, you should help us to help you by providing some DDL and readily consumable test data like this (check the link in BitBucket's signature line for more info):

    DECLARE @Trans TABLE
        (Trans# INT IDENTITY, CustKey INT, Date DATETIME, Amount MONEY)

    INSERT INTO @Trans
    SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500
    UNION ALL SELECT 67890,'01/03/12',$10 UNION ALL SELECT 98765,'04/01/12',$600
    UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100
    UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100

    DECLARE @Cust TABLE
        (CustKey INT, SSN VARCHAR(9))

    INSERT INTO @Cust
    SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'
    UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'
    UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'

    If I'm right about the expected results and ignoring any performance considerations of this approach, you may be able to get the transaction list as follows:

    DECLARE @StartDT DATETIME = '2012-09-20'
           ,@DaysBack INT = 365

    ;WITH Tally (n) AS (
        SELECT TOP (@DaysBack) 1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns),
    MyTrans AS (
        SELECT Trans#, c.CustKey, SSN, Date, Amount
        FROM @Trans t
        INNER JOIN @Cust c ON c.CustKey = t.Custkey)
    SELECT DISTINCT Trans#, SSN, Date, Amount
    FROM (
        SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), Trans#, SSN, Date, Amount, CustKey
              ,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
        FROM Tally a
        CROSS APPLY (SELECT StartDT=DATEADD(day, -n, @StartDt)) b
        INNER JOIN MyTrans ON Date >= StartDt AND Date <= DATEADD(day, 1, StartDT)
    ) b
    WHERE Charges >= 1000

    I have chosen to use a Tally table but a Calendar table would work just as well.
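    [Editor's note] The date arithmetic the Tally CTE performs is easy to check in isolation. A small Python sketch of the same series (the anchor date and 365-day span are just the example values from the query above; `window_starts` is an illustrative name):

```python
from datetime import date, timedelta

def window_starts(anchor, days_back):
    """One two-day-window start per day, mirroring
    StartDT = DATEADD(day, -n, @StartDT) for n = days_back .. 1."""
    return [anchor - timedelta(days=n) for n in range(days_back, 0, -1)]

starts = window_starts(date(2012, 9, 20), 365)
print(len(starts), starts[0], starts[-1])  # 365 2011-09-21 2012-09-19
```

    Each start date is one day after the previous, so every possible two-consecutive-day window in the look-back range gets summed exactly once.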

    Edit: Oh yes. And let's hope for the sake of privacy and legality, those SSNs are encrypted!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Very interesting problem! Thank you for posting it. There are some subtle "real-world" quirks in here that make this potentially quite complex.

    Dwain is correct, posting some SQL Code to create the tables and populate them with test data is very helpful. I've done that here for anyone else who wants to try the problem. I changed the names of two fields for my own sanity: Trans# became TransID (I don't like special characters in database object names) and Date became TransDate because Date is a keyword in SQL Server. Although, as I think about it, PostDate might be a better name, since that seems to better reflect what the data represents: the date that the credit card transaction posted.

    Oh, I also put a creation script for a tally table in here. I thought I would need one, but I ended up not using it. I left it for anyone else who might want to use one. I developed this solution on SQL Server 2008 R2, but I believe it would run on version 2005 or later.

    CREATE TABLE Customer
    ( CustKey INT PRIMARY KEY
     ,SSN NCHAR(9) NOT NULL
    );

    INSERT Customer (CustKey, SSN)
    SELECT 12345, '123456789' UNION ALL
    SELECT 67890, '123456789' UNION ALL
    SELECT 98765, '987654321' UNION ALL
    SELECT 43210, '987654321' UNION ALL
    SELECT 13579, '246801357' UNION ALL
    SELECT 24568, '246801357';

    CREATE TABLE TransDtl
    ( TransID INT PRIMARY KEY
     ,CustKey INT NOT NULL  -- matches Customer.CustKey
     ,TransDate DATETIME NOT NULL
     ,Amount MONEY NOT NULL
    );

    INSERT TransDtl (TransID,CustKey,TransDate,Amount)
    SELECT 1, 12345, '01/01/12', 600.00 UNION ALL
    SELECT 2, 12345, '01/02/12', 500.00 UNION ALL
    SELECT 3, 67890, '01/03/12', 10.00 UNION ALL
    SELECT 4, 98765, '04/01/12', 600.00 UNION ALL
    SELECT 5, 43210, '04/02/12', 600.00 UNION ALL
    SELECT 6, 43210, '04/03/12', 100.00 UNION ALL
    SELECT 7, 13579, '04/02/12', 600.00 UNION ALL
    SELECT 8, 24568, '04/03/12', 100.00;

    SELECT TOP 1000001
           IDENTITY(INT,0,1) AS [N]
    INTO dbo.Tally
    FROM master.sys.all_columns AS AC1
    CROSS JOIN master.sys.all_columns AS AC2;

    /* Add a tightly packed clustered Primary Key to maximize
     * performance and minimize space used.
     */
    ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N)
        WITH FILLFACTOR = 100;

    I'd like to make one quick comment about the data before I dive into my solution. I noticed in the Customer table that some of the SSNs are associated with more than one CustKey. Since you said that you work for a credit card company, I assume that the SSN field represents individual customers and the CustKey field represents individual credit cards. In other words, a customer may have more than one card, but you want to create an alert when a customer charges over a certain amount of money within a certain amount of time, whether they put all of the charges on one card or spread them across multiple cards. (That's one of the "real-world quirks" to which I referred above.)

    With that observation out of the way, here is my solution. I'll present the whole thing here and then walk through it step by step.

    SET STATISTICS IO ON;

    WITH CustomerTrans AS
    ( SELECT Customer.SSN AS SSN
            ,TransDtl.TransID AS TransID
            ,TransDtl.TransDate AS TransDate
            ,TransDtl.Amount AS Amount
      FROM Customer
      JOIN TransDtl
        ON Customer.CustKey = TransDtl.CustKey
    ),
    TransGroup AS
    ( SELECT A.SSN AS SSN
            ,ROW_NUMBER() OVER (PARTITION BY A.SSN ORDER BY A.SSN) AS GroupID
            ,MIN(A.TransDate) AS StartDate
            ,MAX(B.TransDate) AS EndDate
      FROM CustomerTrans AS A
      JOIN CustomerTrans AS B
        ON A.SSN = B.SSN
       AND DATEDIFF(Day,A.TransDate,B.TransDate) BETWEEN 0 AND 1
      GROUP BY A.SSN
    ),
    GroupTotal AS
    ( SELECT TransGroup.SSN AS SSN
            ,TransGroup.GroupID AS TransGroupID
            ,SUM(CustomerTrans.Amount) AS TransGroupAmount
      FROM TransGroup
      JOIN CustomerTrans
        ON TransGroup.SSN = CustomerTrans.SSN
       AND CustomerTrans.TransDate BETWEEN TransGroup.StartDate AND TransGroup.EndDate
      GROUP BY TransGroup.SSN
              ,TransGroup.GroupID
      HAVING SUM(CustomerTrans.Amount) >= 1000.00
    ),
    FlaggedCustomer AS
    ( SELECT SSN
      FROM GroupTotal
      GROUP BY SSN
    )
    SELECT CustomerTrans.TransID
          ,FlaggedCustomer.SSN
          ,CustomerTrans.TransDate
          ,CustomerTrans.Amount
    FROM FlaggedCustomer
    JOIN CustomerTrans
      ON FlaggedCustomer.SSN = CustomerTrans.SSN;

    First, you'll note that I've set Statistics IO on. That's because I want to see how many times my solution scans the data tables. That will give us some clue as to how well the solution might scale. We don't have much data to work with here, but I imagine that you have a massive amount of data to query at work. My solution scans the data a total of 5 times. Ideally, we'd like for the solution to scan the data only once. There are other factors to consider, of course, but this is just a quick and dirty rule of thumb. Based on this, I would expect my solution to perform fairly well on large data sets. (I also tried Dwain's solution with statistics IO on, and his solution does a total of 6 scans. So, the performance of the two solutions would probably be very similar.)

    My solution is all one query, but I've broken the "steps" I use into Common Table Expressions (CTEs). Each CTE builds on the previous ones. So, below I'll re-post each CTE separately and explain what it does and how it builds into a solution to your problem.

    WITH CustomerTrans AS
    ( SELECT Customer.SSN AS SSN
            ,TransDtl.TransID AS TransID
            ,TransDtl.TransDate AS TransDate
            ,TransDtl.Amount AS Amount
      FROM Customer
      JOIN TransDtl
        ON Customer.CustKey = TransDtl.CustKey
    ),

    The first CTE associates all of the individual transactions with the customers' SSNs, rather than with the CustKey in the TransDtl table. We query this CTE throughout the rest of the solution rather than the source TransDtl table, because we need to find customers that posted more than $1,000 in charges over a two-day period, not individual credit cards.

    TransGroup AS
    ( SELECT A.SSN AS SSN
            ,ROW_NUMBER() OVER (PARTITION BY A.SSN ORDER BY A.SSN) AS GroupID
            ,MIN(A.TransDate) AS StartDate
            ,MAX(B.TransDate) AS EndDate
      FROM CustomerTrans AS A
      JOIN CustomerTrans AS B
        ON A.SSN = B.SSN
       AND DATEDIFF(Day,A.TransDate,B.TransDate) BETWEEN 0 AND 1
      GROUP BY A.SSN
    ),

    This second CTE is the core part of my solution. It is loosely based on one of Itzik Ben-Gan's solutions to the "Islands" problem (finding ranges of consecutive values in a sequence). I got it from the first volume of the SQL Server MVP Deep Dives book, but I imagine you can find it online somewhere. The SQL Server Central article that BitBucket references above probably mentions it, too.

    However, another one of those "real-world quirks" in this problem is that the intervals are not fixed. We want to find the customers with transactions totaling $1,000 or more in any two-day period. So, here I join the CustomerTrans CTE I just created to itself on the Customer SSN where the post date of the transaction in the first data set is within one day of the transaction in the second data set. Because of the way the DATEDIFF function is written here, the A transactions will only join with other transactions posted on the same day or the next day, and the B transactions will only join with other transactions posted on the same day or the previous day.

    Because we don't filter out results where A.TransID = B.TransID, this self-join will generate duplicate rows, but that is OK. We just want the starting date for this group of transactions, the ending date and some kind of Group Identifier that we can use later. (Besides, if a customer charges over $1,000 in a single transaction, that meets the selection criteria set out in the problem description.)

    That Group Identifier is generated by the ROW_NUMBER() function and it is the key to this portion of the solution. (It is also what I borrowed from Itzik Ben-Gan's solution to the "Islands" problem.)

    GroupTotal AS
    ( SELECT TransGroup.SSN AS SSN
            ,TransGroup.GroupID AS TransGroupID
            ,SUM(CustomerTrans.Amount) AS TransGroupAmount
      FROM TransGroup
      JOIN CustomerTrans
        ON TransGroup.SSN = CustomerTrans.SSN
       AND CustomerTrans.TransDate BETWEEN TransGroup.StartDate AND TransGroup.EndDate
      GROUP BY TransGroup.SSN
              ,TransGroup.GroupID
      HAVING SUM(CustomerTrans.Amount) >= 1000.00
    ),

    The third CTE joins back to our first CTE, CustomerTrans, groups the transactions by Customer SSN and the Group ID we generated in the second CTE and calculates the total transaction amount for each group. Finally it filters out the groups having a total transaction amount less than $1,000.

    FlaggedCustomer AS
    ( SELECT SSN
      FROM GroupTotal
      GROUP BY SSN
    )

    Our final CTE before we present our final output gets a distinct list of the customers meeting the criteria set forth in the problem.

    SELECT CustomerTrans.TransID
          ,FlaggedCustomer.SSN
          ,CustomerTrans.TransDate
          ,CustomerTrans.Amount
    FROM FlaggedCustomer
    JOIN CustomerTrans
      ON FlaggedCustomer.SSN = CustomerTrans.SSN;

    And here is our final output. We use the list of Customers with more than $1,000 in charges within any two-day period from the last CTE, join it back to the original list of Customer transactions and list all of the transactions for Customers meeting the criteria.
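    [Editor's note] Since the walkthrough above has several moving parts, it can help to cross-check the end result against a brute-force reference. The Python sketch below is an independent illustration (not a translation of the CTEs): it flags an SSN whenever some two-consecutive-day window totals $1,000 or more, then lists all of that SSN's transactions.

```python
from datetime import date, timedelta

# (TransID, CustKey, TransDate, Amount) plus CustKey -> SSN, as in the DDL above
trans = [
    (1, 12345, date(2012, 1, 1), 600), (2, 12345, date(2012, 1, 2), 500),
    (3, 67890, date(2012, 1, 3), 10),  (4, 98765, date(2012, 4, 1), 600),
    (5, 43210, date(2012, 4, 2), 600), (6, 43210, date(2012, 4, 3), 100),
    (7, 13579, date(2012, 4, 2), 600), (8, 24568, date(2012, 4, 3), 100),
]
ssn_of = {12345: '123456789', 67890: '123456789', 98765: '987654321',
          43210: '987654321', 13579: '246801357', 24568: '246801357'}

def alert_transactions(trans, ssn_of, threshold=1000):
    """Brute force: try every window [d, d+1]; flag SSNs at/over threshold,
    then return every transaction belonging to a flagged SSN."""
    flagged = set()
    for start in {d for _, _, d, _ in trans}:
        end = start + timedelta(days=1)
        totals = {}
        for _, k, d, a in trans:
            if start <= d <= end:
                totals[ssn_of[k]] = totals.get(ssn_of[k], 0) + a
        flagged |= {s for s, t in totals.items() if t >= threshold}
    return [(tid, ssn_of[k], d, a) for tid, k, d, a in trans if ssn_of[k] in flagged]

print([row[0] for row in alert_transactions(trans, ssn_of)])  # [1, 2, 3, 4, 5, 6]
```

    On the sample data this returns transactions 1 through 6 and excludes SSN 246801357, matching the output of the full query above.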

    I'd be very interested to see what solutions others can create. I'd also like to see how the various solutions perform on large data sets. I would expect my solution to have reasonable performance on larger data sets, but it could certainly be beaten. Perhaps Phil Factor would be interested in presenting a similar problem on his Speed Phreaks site. 🙂

    Thanks again for posting your problem. I hope that this has been helpful to you. 😎

  • David - Really nice explanation of your approach. I have to say mine falls short in that regard.

    I think it's important to look at the whole picture though. That includes both IOs and timings over a large test data set. So let's put together a small test harness (about 16,000 transaction rows) and give it a whirl:

    CREATE TABLE #Customer
    ( CustKey INT PRIMARY KEY
     ,SSN NCHAR(9) NOT NULL
    );

    CREATE TABLE #TransDtl
    ( TransID INT IDENTITY PRIMARY KEY
     ,CustKey INT NOT NULL  -- matches #Customer.CustKey
     ,TransDate DATETIME NOT NULL
     ,Amount MONEY NOT NULL
    );

    INSERT INTO #Customer
    SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'
    UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'
    UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357'

    INSERT INTO #TransDtl
    SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500
    UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600
    UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100
    UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100

    DECLARE @StartDT DATETIME = '2012-09-20'
           ,@DaysBack INT = 365

    ;WITH Tally (n) AS (
        SELECT TOP 2000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b)
    INSERT INTO #TransDtl
    SELECT CustKey
          ,TransDate=DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)
          ,Amount=ABS(CHECKSUM(NEWID())) % 1000
    FROM #TransDtl
    CROSS APPLY Tally
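    [Editor's note] For readers without SQL Server handy, the shape of that test data is easy to reproduce: `ABS(CHECKSUM(NEWID())) % 360` is effectively a uniform draw in [0, 360), and each of the 8 seed rows is multiplied by 2000 Tally rows. A rough Python analog (seeded for repeatability; unlike the SQL, it draws CustKey at random rather than copying it from each seed row):

```python
import random
from datetime import date, timedelta

random.seed(2012)  # repeatable for the sketch
anchor = date(2012, 9, 20)
cust_keys = [12345, 67890, 98765, 43210, 13579, 24568]

# 8 seed rows x 2000 Tally rows = 16,000 generated transactions
rows = [(random.choice(cust_keys),
         anchor - timedelta(days=random.randrange(360)),  # up to 359 days back
         random.randrange(1000))                          # dollars in [0, 1000)
        for _ in range(8 * 2000)]

print(len(rows))  # 16000
```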

    -- Holders to capture output
    DECLARE @TransID INT
           ,@SSN NCHAR(9)
           ,@TransDate DATETIME
           ,@Amount MONEY

    PRINT '----------------- Dwain'
    SET STATISTICS TIME, IO ON

    ;WITH Tally (n) AS (
        SELECT TOP (@DaysBack) 1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns),
    MyTrans AS (
        SELECT TransID, c.CustKey, SSN, TransDate, Amount
        FROM #TransDtl t
        INNER JOIN #Customer c ON c.CustKey = t.Custkey)
    SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount
    FROM (
        SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey
              ,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
        FROM Tally a
        CROSS APPLY (SELECT StartDT=DATEADD(day, -n, @StartDt)) b
        INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= DATEADD(day, 1, StartDT)
    ) b
    WHERE Charges >= 1000

    SET STATISTICS TIME, IO OFF

    PRINT '----------------- David'
    SET STATISTICS TIME, IO ON

    ;WITH CustomerTrans AS
    ( SELECT #Customer.SSN AS SSN
            ,#TransDtl.TransID AS TransID
            ,#TransDtl.TransDate AS TransDate
            ,#TransDtl.Amount AS Amount
      FROM #Customer
      JOIN #TransDtl
        ON #Customer.CustKey = #TransDtl.CustKey
    ),
    TransGroup AS
    ( SELECT A.SSN AS SSN
            ,ROW_NUMBER() OVER (PARTITION BY A.SSN ORDER BY A.SSN) AS GroupID
            ,MIN(A.TransDate) AS StartDate
            ,MAX(B.TransDate) AS EndDate
      FROM CustomerTrans AS A
      JOIN CustomerTrans AS B
        ON A.SSN = B.SSN
       AND DATEDIFF(Day,A.TransDate,B.TransDate) BETWEEN 0 AND 1
      GROUP BY A.SSN
    ),
    GroupTotal AS
    ( SELECT TransGroup.SSN AS SSN
            ,TransGroup.GroupID AS TransGroupID
            ,SUM(CustomerTrans.Amount) AS TransGroupAmount
      FROM TransGroup
      JOIN CustomerTrans
        ON TransGroup.SSN = CustomerTrans.SSN
       AND CustomerTrans.TransDate BETWEEN TransGroup.StartDate AND TransGroup.EndDate
      GROUP BY TransGroup.SSN
              ,TransGroup.GroupID
      HAVING SUM(CustomerTrans.Amount) >= 1000.00
    ),
    FlaggedCustomer AS
    ( SELECT SSN
      FROM GroupTotal
      GROUP BY SSN
    )
    SELECT @TransID=CustomerTrans.TransID
          ,@SSN=FlaggedCustomer.SSN
          ,@TransDate=CustomerTrans.TransDate
          ,@Amount=CustomerTrans.Amount
    FROM FlaggedCustomer
    JOIN CustomerTrans
      ON FlaggedCustomer.SSN = CustomerTrans.SSN;

    SET STATISTICS TIME, IO OFF

    DROP TABLE #TransDtl
    DROP TABLE #Customer

    Here are the IO/timing results:

    ----------------- Dwain
    Table 'Worktable'. Scan count 368, logical reads 101966, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TransDtl 000000000060'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#Customer 00000000005F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 312 ms, elapsed time = 368 ms.

    ----------------- David
    Table '#Customer 00000000005F'. Scan count 7, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TransDtl 000000000060'. Scan count 19, logical reads 665, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 10, logical reads 534164, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 47565 ms, elapsed time = 41821 ms.

    Looks like IO isn't everything!

    Note that I am not saying mine will rip through the multi-million transaction rows you'd expect in a year's worth of credit card transactions for thousands of customers, though. As I said, I thought there might be better performing options available out there.

    We still need to hear back from the OP on my comments regarding expected results.



  • FYI. These were my timing stats on 960,000 transactions:

    SQL Server Execution Times:
       CPU time = 17815 ms, elapsed time = 99414 ms.



    You guys are awesome. I can't wait to get back to work to try this out. I will report back the results tomorrow. I'm a SQL first grader, so bear with me. And yes, the SSNs are encrypted 🙂

  • egerencher (9/19/2012)


    You guys are awesome. I can't wait to get back to work to try this out. I will report back the results tomorrow. I'm a SQL first grader, so bear with me. And yes, the SSNs are encrypted 🙂

    You're welcome.

    The below version (usable in the test harness) appears to be slightly swifter. I made the Tally table do the work of calculating the StartDT, which allowed me to remove the CROSS APPLY.

    ;WITH Tally (StartDT) AS (
        SELECT TOP (@DaysBack) DATEADD(day, -(1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), @StartDt)
        FROM sys.all_columns),
    MyTrans AS (
        SELECT TransID, c.CustKey, SSN, TransDate, Amount
        FROM #TransDtl t
        INNER JOIN #Customer c ON c.CustKey = t.Custkey)
    SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount
    FROM (
        SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey
              ,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
        FROM Tally a
        --CROSS APPLY (SELECT StartDT=DATEADD(day, -n, @StartDt)) b
        INNER JOIN MyTrans ON TransDate >= StartDt AND TransDate <= DATEADD(day, 1, StartDT)
    ) b
    WHERE Charges >= 1000



  • Many of your assumptions are correct, and I apologize for not making it clearer in the beginning.

    In regards to the following:

    "Firstly, I'd like to question why trans# 3 and 6 appear in your expected results. The total charges for period 02-03 Jan is $510 for 123456789 and for 02-03 Apr is $700 for 987654321." I thought it would make it easier to include the transaction than to exclude it. But as I am finding out, nothing is easy with my problem. So your point is well taken, and these transactions are not something I need in the results.

    In regards to the field names, you are correct again. The real field names are postdate, transamt, ssn, customerkey, and the transaction table is really called the transdtl table, which is exactly what you assumed. Again, I was trying to make it easy, which again, didn't work. :/

    You are dead on with the "real-world quirk". A customer may have more than one card, and I want to create an alert when a customer charges over a certain amount of money within a certain amount of time, whether they put all of the charges on one card or spread them across multiple cards. BRILLIANT

  • egerencher (9/19/2012)


    "Firstly, I'd like to question why trans# 3 and 6 appear in your expected results. The total charges for period 02-03 Jan is $510 for 123456789 and for 02-03 Apr is $700 for 987654321.?" I thought it would make it easier to include the transaction than to exclude it. But as I am finding out, nothing is easy with my problem. So your point is well taken, and these transactions are not something I need in the results.

    Happy to hear my analysis was correct.

    egerencher (9/19/2012)


    You are dead on with the "real-world quirk". A customer may have more than one card, and I want to create an alert when a customer charges over a certain amount of money within a certain amount of time, whether they put all of the charges on one card or spread them across multiple cards. BRILLIANT

    How about this for a quirk? Why wouldn't you want to assign a different monetary amount to each customer? That would increase the complexity slightly, but not too much to handle.
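    [Editor's note] To sketch that quirk: per-customer thresholds just replace the single constant with a lookup (in SQL, presumably a join to some limits table; here a plain dict with invented limits). Illustrative Python only, reusing the original sample amounts:

```python
from datetime import date, timedelta

trans = [
    (1, 12345, date(2012, 1, 1), 600), (2, 12345, date(2012, 1, 2), 500),
    (3, 67890, date(2012, 1, 3), 10),  (4, 98765, date(2012, 4, 1), 600),
    (5, 43210, date(2012, 4, 2), 600), (6, 43210, date(2012, 4, 3), 100),
    (7, 13579, date(2012, 4, 2), 600), (8, 24568, date(2012, 4, 3), 100),
]
ssn_of = {12345: '123456789', 67890: '123456789', 98765: '987654321',
          43210: '987654321', 13579: '246801357', 24568: '246801357'}

# Hypothetical per-customer alert limits (made-up numbers for illustration)
limits = {'123456789': 1000, '987654321': 1500, '246801357': 500}

def flagged(trans, ssn_of, limits):
    out = set()
    for _, key, day, _ in trans:
        ssn = ssn_of[key]
        for start in (day - timedelta(days=1), day):
            total = sum(a for _, k, d, a in trans
                        if ssn_of[k] == ssn and start <= d <= start + timedelta(days=1))
            if total >= limits[ssn]:  # threshold now varies per customer
                out.add(ssn)
    return out

print(sorted(flagged(trans, ssn_of, limits)))  # ['123456789', '246801357']
```

    With these limits, 987654321's $1,200 window no longer trips its $1,500 limit, while 246801357's $700 window now trips its $500 one.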



  • Incidentally, a slight change to the table DDL and adding an index had a nice impact on my second solution's elapsed time:

    CREATE TABLE #Customer
    ( CustKey INT
     ,SSN NCHAR(9) NOT NULL
     ,PRIMARY KEY CLUSTERED (CustKey, SSN)
    );

    CREATE TABLE #TransDtl
    ( TransID INT IDENTITY PRIMARY KEY
     ,CustKey INT NOT NULL  -- matches #Customer.CustKey
     ,TransDate DATETIME NOT NULL
     ,Amount MONEY NOT NULL
    );

    CREATE INDEX TransDtl1
    ON #TransDtl (CustKey, TransDate)

    Results over 960,000 rows:

    SQL Server Execution Times:
       CPU time = 16942 ms, elapsed time = 34897 ms.



  • Simpler, quicker, same results:

    SELECT
        t.TransID,
        c.SSN,
        t.TransDate,
        t.Amount
    FROM #Customer c
    INNER JOIN #TransDtl t
        ON t.CustKey = c.CustKey
    CROSS APPLY (
        SELECT Amount = SUM(ti.Amount)
        FROM #TransDtl ti
        INNER JOIN #Customer ci
            ON ci.CustKey = ti.CustKey
        WHERE ci.SSN = c.SSN  -- correlate on SSN so all of a customer's cards count
        AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate
    ) s
    WHERE s.Amount >= 1000
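    [Editor's note] The CROSS APPLY computes, for each transaction, the total charged to the same customer in the two-day window ending on that transaction's date, and keeps rows whose window reaches the threshold. A Python sketch of that anchored-window logic on the original sample amounts (trans# 3 at $10); note that it surfaces the transaction that closes a qualifying window:

```python
from datetime import date, timedelta

trans = [
    (1, 12345, date(2012, 1, 1), 600), (2, 12345, date(2012, 1, 2), 500),
    (3, 67890, date(2012, 1, 3), 10),  (4, 98765, date(2012, 4, 1), 600),
    (5, 43210, date(2012, 4, 2), 600), (6, 43210, date(2012, 4, 3), 100),
    (7, 13579, date(2012, 4, 2), 600), (8, 24568, date(2012, 4, 3), 100),
]
ssn_of = {12345: '123456789', 67890: '123456789', 98765: '987654321',
          43210: '987654321', 13579: '246801357', 24568: '246801357'}

def anchored(trans, ssn_of, threshold=1000):
    """For each transaction, sum the same SSN's charges over
    [TransDate - 1 day, TransDate]; keep rows at/over threshold."""
    out = []
    for tid, key, day, amt in trans:
        ssn = ssn_of[key]
        window_total = sum(a for _, k, d, a in trans
                           if ssn_of[k] == ssn
                           and day - timedelta(days=1) <= d <= day)
        if window_total >= threshold:
            out.append((tid, ssn, day, amt))
    return out

print([r[0] for r in anchored(trans, ssn_of)])  # [2, 5]
```

    On this data only transactions 2 and 5 close a qualifying window; to emit every transaction of a flagged customer (the original requirement), the query would still need a second pass keyed on SSN.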

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

  • Here are the IO/timing results:

    ----------------- Dwain

    Table 'Worktable'. Scan count 368, logical reads 101966, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TransDtl 000000000060'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Customer 00000000005F'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 312 ms, elapsed time = 368 ms.

    ----------------- David

    Table '#Customer 00000000005F'. Scan count 7, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TransDtl 000000000060'. Scan count 19, logical reads 665, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 10, logical reads 534164, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 47565 ms, elapsed time = 41821 ms.

    Looks like IO isn't everything!

    Very, very nice, Dwain! I honestly thought my solution would scale better than that. :blush: Once again, the tally table kicks ... well, you know. πŸ™‚

    I can't wait to tear this apart. How did you generate your test data? Do you have a script you could share, or did you use a third-party tool? (If it was a third-party tool, please tell us which one.)

    As I said before, I think this would be an excellent problem for Phil Factor's Speed Phreaks. It is exactly the sort of Real World problem they like to blow apart. :smooooth:

  • ChrisM@Work (9/20/2012)


    Simpler, quicker, same results:

    SELECT
        t.TransID,
        c.SSN,
        t.TransDate,
        t.Amount
    FROM #Customer c
    INNER JOIN #TransDtl t
        ON t.CustKey = c.CustKey
    CROSS APPLY (
        SELECT Amount = SUM(ti.Amount)
        FROM #TransDtl ti
        WHERE ti.CustKey = t.CustKey
        AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate
    ) s
    WHERE s.Amount >= 1000

    Simpler - for sure!

    Test harness (800K rows):

    CREATE TABLE #Customer
    ( CustKey INT
    ,SSN NCHAR(9) NOT NULL
    ,PRIMARY KEY CLUSTERED (CustKey, SSN)
    );

    CREATE TABLE #TransDtl
    ( TransID INT IDENTITY PRIMARY KEY
    ,CustKey NCHAR(9) NOT NULL
    ,TransDate DATETIME NOT NULL
    ,Amount MONEY NOT NULL
    );

    CREATE INDEX TransDtl1 ON #TransDtl (CustKey, TransDate);

    INSERT INTO #Customer
    SELECT 12345,'123456789' UNION ALL SELECT 67890,'123456789'
    UNION ALL SELECT 98765,'987654321' UNION ALL SELECT 43210,'987654321'
    UNION ALL SELECT 13579,'246801357' UNION ALL SELECT 24568,'246801357';

    INSERT INTO #TransDtl
    SELECT 12345,'01/01/12',$600 UNION ALL SELECT 12345,'01/02/12',$500
    UNION ALL SELECT 67890,'01/03/12',$700 UNION ALL SELECT 98765,'04/01/12',$600
    UNION ALL SELECT 43210,'04/02/12',$600 UNION ALL SELECT 43210,'04/03/12',$100
    UNION ALL SELECT 13579,'04/02/12',$600 UNION ALL SELECT 24568,'04/03/12',$100;

    DECLARE @StartDT DATETIME = '2012-09-20'
    ,@DaysBack INT = 365;

    WITH Tally (n) AS (
        SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b)
    INSERT INTO #TransDtl
    SELECT CustKey
        ,TransDate = DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)
        ,Amount = ABS(CHECKSUM(NEWID())) % 1000
    FROM #TransDtl
    CROSS APPLY Tally;

    -- Holders to capture output
    DECLARE @TransID INT
    ,@SSN NCHAR(9)
    ,@TransDate DATETIME
    ,@Amount MONEY;

    PRINT '----------------- Dwain New';
    SET STATISTICS TIME, IO ON;

    WITH Tally (StartDT) AS (
        SELECT TOP (@DaysBack) DATEADD(day, -(1+@DaysBack-ROW_NUMBER() OVER (ORDER BY (SELECT NULL))), @StartDT)
        FROM sys.all_columns),
    MyTrans AS (
        SELECT TransID, c.CustKey, SSN, TransDate, Amount
        FROM #TransDtl t
        INNER JOIN #Customer c ON c.CustKey = t.CustKey)
    SELECT DISTINCT @TransID=TransID, @SSN=SSN, @TransDate=TransDate, @Amount=Amount
    FROM (
        SELECT StartDT, EndDT=DATEADD(day, 1, StartDT), TransID, SSN, TransDate, Amount, CustKey
            ,Charges=SUM(Amount) OVER (PARTITION BY SSN, StartDT)
        FROM Tally a
        INNER JOIN MyTrans ON TransDate >= StartDT AND TransDate <= DATEADD(day, 1, StartDT)
    ) b
    WHERE Charges >= 1000;

    SET STATISTICS TIME, IO OFF;

    PRINT '----------------- Chris';
    SET STATISTICS TIME, IO ON;

    SELECT
        @TransID=t.TransID,
        @SSN=c.SSN,
        @TransDate=t.TransDate,
        @Amount=t.Amount
    FROM #Customer c
    INNER JOIN #TransDtl t
        ON t.CustKey = c.CustKey
    CROSS APPLY (
        SELECT Amount = SUM(ti.Amount)
        FROM #TransDtl ti
        WHERE ti.CustKey = t.CustKey
        AND ti.TransDate BETWEEN t.TransDate-1 AND t.TransDate
    ) s
    WHERE s.Amount >= 1000;

    SET STATISTICS TIME, IO OFF;

    DROP TABLE #TransDtl;
    DROP TABLE #Customer;
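For anyone following the logic rather than the T-SQL, the "Dwain New" branch amounts to: enumerate candidate window start dates, total each SSN's charges over [StartDT, StartDT + 1 day], and flag any SSN whose total in some window reaches the threshold. A hypothetical Python paraphrase of that idea (the function and variable names are mine, not from the harness):

```python
from collections import defaultdict
from datetime import date, timedelta

def flagged_ssns(trans, threshold=1000):
    """trans: iterable of (ssn, trans_date, amount). Returns the SSNs whose
    charges inside any two-consecutive-day window reach the threshold."""
    daily = defaultdict(int)                 # (ssn, date) -> total for that day
    for ssn, d, amt in trans:
        daily[(ssn, d)] += amt
    # Candidate window starts: each transaction date and the day before it
    starts = ({d for _, d, _ in trans}
              | {d - timedelta(days=1) for _, d, _ in trans})
    flagged = set()
    for ssn in {s for s, _, _ in trans}:
        for start in starts:
            window = daily[(ssn, start)] + daily[(ssn, start + timedelta(days=1))]
            if window >= threshold:
                flagged.add(ssn)
    return flagged

# The thread's sample data, keyed by SSN
sample = [('123456789', date(2012, 1, 1), 600), ('123456789', date(2012, 1, 2), 500),
          ('123456789', date(2012, 1, 3), 10),  ('987654321', date(2012, 4, 1), 600),
          ('987654321', date(2012, 4, 2), 600), ('987654321', date(2012, 4, 3), 100),
          ('246801357', date(2012, 4, 2), 600), ('246801357', date(2012, 4, 3), 100)]

print(flagged_ssns(sample))   # 246801357 never tops $1000 in a window
```

The T-SQL does the same aggregation declaratively with `SUM(Amount) OVER (PARTITION BY SSN, StartDT)` over a Tally-driven join.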

    Quicker?

    ----------------- Dwain New

    Table 'Worktable'. Scan count 368, logical reads 5682428, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TransDtl 000000000089'. Scan count 1, logical reads 4669, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Customer 000000000088'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 14539 ms, elapsed time = 26793 ms.

    ----------------- Chris

    Table '#Customer 000000000088'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 2160, logical reads 2429734, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TransDtl 000000000089'. Scan count 6, logical reads 9338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 11590 ms, elapsed time = 8124 ms.

    Yeah, OK I'll give you that too.

    It seems even better (easier on the CPU, anyway) if you remove parallelism with OPTION (MAXDOP 1).

    ----------------- Dwain New

    SQL Server Execution Times:

    CPU time = 15038 ms, elapsed time = 25966 ms.

    ----------------- Chris

    SQL Server Execution Times:

    CPU time = 8517 ms, elapsed time = 9068 ms

    I knew someone would come along that knew what they were doing and best me. πŸ˜€

    Edit: Fixed the quoted section. Didn't mean to quote myself.



  • David Moutray (9/20/2012)


    How did you generate your test data? Do you have a script you could share, or did you use a third-party tool? (If it was a third-party tool, please tell us which one.)

    The test harness has the following, which expands the row set based on the 8 transactions provided:

    ;WITH Tally (n) AS (
        SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b)
    INSERT INTO #TransDtl
    SELECT CustKey
        ,TransDate = DATEADD(day, -ABS(CHECKSUM(NEWID())) % 360, @StartDT)
        ,Amount = ABS(CHECKSUM(NEWID())) % 1000
    FROM #TransDtl
    CROSS APPLY Tally
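The expansion trick is: cross-join the seed rows against a tally of N numbers, then randomize each copy's date (up to ~360 days back) and amount (under $1000) via `ABS(CHECKSUM(NEWID())) % n`. A hypothetical Python equivalent of that idea, not a port of the T-SQL (`expand` and its parameters are my own names):

```python
import random
from datetime import date, timedelta

def expand(seed_rows, n, start=date(2012, 9, 20)):
    """Cross-join seed rows with a tally of n numbers, randomizing the
    date (0-359 days back from start) and the amount (0-999), mirroring
    ABS(CHECKSUM(NEWID())) % 360 and % 1000 in the harness."""
    rows = []
    for cust_key, _date, _amount in seed_rows:   # seed date/amount are discarded
        for _ in range(n):
            rows.append((cust_key,
                         start - timedelta(days=random.randrange(360)),
                         random.randrange(1000)))
    return rows

seed = [(12345, '01/01/12', 600), (12345, '01/02/12', 500),
        (67890, '01/03/12', 700), (98765, '04/01/12', 600)]
data = expand(seed, 100)    # 4 seeds x 100 tally rows = 400 rows
```

With all 8 seed transactions and TOP 100000, the harness produces the ~800K rows quoted above.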

    Yes, the Tally table kicked some until ChrisM@Work jumped into the fray!


