Running Totals by Date and Account

  • Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    Jeff Moden wrote:

    I do have a way to avoid that problem that will blow the doors off of most other methods for calculating running totals but it does take a bit of setup and purists will insist that it will blow up someday (even when there's a built in test-for-success added to the code).  Let me know if you' like to use that.

    Jeff, It's not a cursor is it? 😛

    PS: that was a joke.

    Heh... ironically, it is... sort of... it's a "Pseudo Cursor".  I think you're probably already familiar with the "Quirky Update", yes?

    I have some recollection of seeing it once, but i think it went over my head .

    Maybe we could generate some test data in a table and compare it with any methods anyone can think of?

  • No problem... Here's the code (below) to build a simple table with 1,000 accounts of 10,000 rows each of constrained random data for a total of 10 million rows.  I also took the time to ensure that it's massively fragmented with an average segment size of just one page on the expected clustered index for a bit of extra fun.

    The goal would be to populate the two obviously named columns that currently contain NULLs.  There are no restrictions on what can be done/used so long as it's native to SQL Server, is done using only T-SQL, and does NOT make use of stuff like SQLCLR.  If you need to change the indexes, that won't be counted against your time.  Of course, the code must create the correct answers.  I'll be happy to act as the common point for testing so that we're not fighting differences between machines.  I'm running SQL Server 2017 Developer Edition with 24GB of RAM allocated to SQL Server.

    My machine is a Dell Alienware 17R with 32GB of RAM, 2TB of NVME SSDs (yeah, I over-provisioned it a wee bit), and has an Intel  I7-8750H CPU with 6 core threaded to 12.  It's one of "those" systems that runs between 2.2 to 4GHz (wish they'd actually make servers like that).  I'll do the tests at 4GHz.  I also have a SEAGATE ST1000LM049 BARRACUDA PRO 1TB 7200RPM SATA-6GBPS 128MB BUFFER bit of spinning rust if we need to test with that.

    /*************************************************************************************
    Create the test table with a non-clustered Primary Key and a separate clustered index
    *************************************************************************************/
    --===== If the test table already exists, drop it in case we need to rerun.
    -- The 3 part naming is overkill, but prevents accidents on real tables.
    IF OBJECT_ID('dbo.TransactionDetail') IS NOT NULL
    DROP TABLE dbo.TransactionDetail
    ;
    --===== Create the test table (TransactionDetail) with a NON clustered PK
    CREATE TABLE dbo.TransactionDetail
    (
    TransactionDetailID INT IDENTITY(1,1), --Temporal "tie-breaker"
    Date DATETIME,
    AccountID INT,
    Amount MONEY,
    AccountRunningTotal MONEY, --Running total across each account
    AccountRunningCount INT, --Like "Rank" across each account
    CONSTRAINT PK_TransactionDetail_TransactionDetailID
    PRIMARY KEY NONCLUSTERED (TransactionDetailID)
    WITH FILLFACTOR = 100
    )
    ;
    --===== Add a clustered index that will easily cause page splits because
    -- of the randomized data being inserted. This index also represents
    -- the expected sort order of most of the code examples.
    CREATE UNIQUE CLUSTERED INDEX IXC_Transaction_AccountID_Date_TransactionDetailID
    ON dbo.TransactionDetail (AccountID, Date, TransactionDetailID)
    ;
    -----------------------------------------------------------------------------------------------------------------------
    /*************************************************************************************
    Populate the table using a rather slow method but one that's sure to cause lots of
    Page splits and that will fragment the table with over 99% fragmentation.
    *************************************************************************************/
    --===== Preset the environment for appearance
    SET NOCOUNT ON
    ;
    --===== Populate the table in "segments" to force page splits.
    -- Normally this would NOT have a While loop in it.
    -- Because the While loop is there and page splits are happening,
    -- this takes about 65 seconds to create on my box.
    WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 10000000 --Yeah... 10 Million rows.
    BEGIN
    INSERT INTO dbo.TransactionDetail
    (Date, AccountID, Amount)
    SELECT TOP 10000 --Small batch size to drive fragmentation through the roof.
    --5 years worth of dates with times from 1/1/2015 to 12/31/2019
    RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2015','2020')+CONVERT(DATETIME,'2015') AS Date,
    --1000 different account numbers
    ABS(CHECKSUM(NEWID()))%1000+1,
    --Dollar amounts from -9999.99 to + 9999.99
    CAST(CHECKSUM(NEWID())%1000000 /100.0 AS MONEY)
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    ;
    END
    ;
    --===== Display the data limits and counts summary.
    SELECT AccountIDs = COUNT(DISTINCT AccountID)
    ,MinAccountID = MIN(AccountID)
    ,MaxAccountID = MAX(AccountID)
    ,MinAmount = MIN(Amount)
    ,MaxAmount = MAX(Amount)
    ,[RowCount] = COUNT(*)
    ,AvgRowsPerAcct = COUNT(*)/COUNT(DISTINCT AccountID)
    FROM dbo.TransactionDetail
    ;
    --===== Display the detailed fragmentation data to show that the CI is 99% fragmented.
    SELECT *
    FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.TransactionDetail'),NULL,NULL,'DETAILED')
    ;
    --===== Show what 10 of the rows look like.
    SELECT TOP 10 *
    FROM dbo.TransactionDetail
    ORDER BY AccountID, Date, TransactionDetailID
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I didn't mess with any indexes on the table you provided.

    Here are a few methods:

    Update Windowed SUM/COUNT using ROWS BETWEEN

    /*-- ****************************************************************************************
    -- Update Windowed SUM/COUNT using ROWS BETWEEN
    -- ******************************************************************************************/
    ;WITH CTE AS
    (
    SELECT SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningTotal,
    COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningCount,
    AccountRunningTotal,
    AccountRunningCount
    FROM dbo.TransactionDetail td
    )
    UPDATE CTE
    SET AccountRunningTotal=NewAccountRunningTotal,
    AccountRunningCount=NewAccountRunningCount
    GO

    Update Windowed SUM/COUNT using RANGE BETWEEN

    /*-- ****************************************************************************************
    -- Update Windowed SUM/COUNT using RANGE BETWEEN
    -- *****************************************************************************************/
    ;WITH CTE AS
    (
    SELECT SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningTotal,
    COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) NewAccountRunningCount,
    AccountRunningTotal,
    AccountRunningCount
    FROM dbo.TransactionDetail td
    )
    UPDATE CTE
    SET AccountRunningTotal=NewAccountRunningTotal,
    AccountRunningCount=NewAccountRunningCount
    GO

    Update Windowed SUM/COUNT

    /*-- ****************************************************************************************
    -- Update Windowed SUM/COUNT
    -- *****************************************************************************************/
    ;WITH CTE AS
    (
    SELECT SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC) NewAccountRunningTotal,
    COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC) NewAccountRunningCount,
    AccountRunningTotal,
    AccountRunningCount
    FROM dbo.TransactionDetail td
    )
    UPDATE CTE
    SET AccountRunningTotal=NewAccountRunningTotal,
    AccountRunningCount=NewAccountRunningCount
    GO

    Update Cross Apply

    /*-- ****************************************************************************************
    -- Update Cross Apply
    -- *****************************************************************************************/
    UPDATE td
    SET td.AccountRunningTotal=td2.AccountRunningTotal,
    td.AccountRunningCount=td2.AccountRunningCount
    FROM dbo.TransactionDetail td
    CROSS APPLY(SELECT SUM(td2.Amount) AccountRunningTotal,
    COUNT(*) AccountRunningCount
    FROM dbo.TransactionDetail td2
    WHERE td2.AccountID=td.AccountID
    AND td2.Date<=td.Date) td2
    GO

    Updatable Cursor

    /*-- ****************************************************************************************
    -- Updatable cursor
    -- *****************************************************************************************/
    SET NOCOUNT ON
    DECLARE @TransactionDetailID int,
    @Date datetime,
    @AccountID int,
    @Amount money,
    @AccountRunningTotal money,
    @AccountRunningCount int

    DECLARE @SaveAccountID int=-1

    DECLARE @cursor cursor

    SET @cursor = cursor FORWARD_ONLY
    FOR SELECT td.[TransactionDetailID],
    td.[Date],
    td.[AccountID],
    td.[Amount],
    td.[AccountRunningTotal],
    td.[AccountRunningCount]
    FROM [dbo].[TransactionDetail] td
    ORDER BY td.AccountId, td.Date, td.TransactionDetailID
    FOR UPDATE OF td.[AccountRunningTotal], td.[AccountRunningCount]

    OPEN @cursor
    FETCH NEXT FROM @cursor INTO @TransactionDetailID, @Date, @AccountID, @Amount, @AccountRunningTotal, @AccountRunningCount

    WHILE @@FETCH_STATUS = 0 BEGIN

    IF @AccountID <> @SaveAccountID BEGIN
    SELECT @AccountRunningTotal = @Amount,
    @AccountRunningCount = 1
    END ELSE BEGIN
    SELECT @AccountRunningTotal += @Amount,
    @AccountRunningCount += 1
    END

    UPDATE [dbo].[TransactionDetail]
    SET AccountRunningTotal = @AccountRunningTotal,
    AccountRunningCount = @AccountRunningCount
    WHERE CURRENT OF @cursor

    SELECT @SaveAccountID=@AccountID

    FETCH NEXT FROM @cursor INTO @TransactionDetailID, @Date, @AccountID, @Amount, @AccountRunningTotal, @AccountRunningCount

    END
    CLOSE @cursor
    DEALLOCATE @cursor
    GO

    Temporary Table method with ROWS BETWEEN and MERGE JOIN

    /*-- ****************************************************************************************
    -- Temporary Table method with ROWS BETWEEN and MERGE JOIN
    -- *****************************************************************************************/
    IF OBJECT_ID('tempdb..#TransactionDetail','U') IS NOT NULL DROP TABLE #TransactionDetail

    SELECT AccountID,
    Date,
    TransactionDetailID,
    SUM(Amount) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AccountRunningTotal,
    COUNT(*) OVER (PARTITION BY AccountId ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AccountRunningCount
    INTO #TransactionDetail
    FROM [dbo].[TransactionDetail] td

    CREATE UNIQUE CLUSTERED INDEX IX_#TransactionDetail_1 ON #TransactionDetail(AccountID, Date, TransactionDetailID)

    UPDATE td
    SET td.AccountRunningTotal=td2.AccountRunningTotal,
    td.AccountRunningCount=td2.AccountRunningCount
    FROM [dbo].[TransactionDetail] td
    INNER JOIN #TransactionDetail td2
    ON td2.AccountID = td.AccountID
    AND td2.Date = td.Date
    AND td2.TransactionDetailID = td.TransactionDetailID

    DROP TABLE #TransactionDetail
    GO

     

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply