Replace calculation in cursor with set-based logic

  • Hello, and thanks in advance for any help. The calculation in the cursor below is correct; alas, my attempt at doing the same calculation in a SELECT statement is not. This is on SQL Server 2019.

    DROP TABLE IF EXISTS #Test

    CREATE TABLE #Test (PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL)
    INSERT INTO #Test VALUES ('2025-04-01',11.94,NULL)
    INSERT INTO #Test VALUES ('2025-04-02',11.39,-4.6063581)
    INSERT INTO #Test VALUES ('2025-04-03',11.88,4.3020044)
    INSERT INTO #Test VALUES ('2025-04-04',11.83,-0.0252615)
    INSERT INTO #Test VALUES ('2025-04-05',12.19,3.0431671)
    INSERT INTO #Test VALUES ('2025-04-06',12.64,3.691608)
    --SELECT * FROM #Test

    DROP TABLE IF EXISTS #Output

    CREATE TABLE #Output (PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL (9,7) NULL)

    DECLARE
    @PricingDate DATE,
    @Price DECIMAL (5,2),
    @Factor DECIMAL (9,7),
    @Calculation DECIMAL (9,7),
    @PreviousCalculation DECIMAL (9,7),
    @Count SMALLINT = 1

    DECLARE TestCursor CURSOR FOR SELECT PricingDate,Price,Factor FROM #Test ORDER BY PricingDate

    OPEN TestCursor
    FETCH NEXT FROM TestCursor INTO @PricingDate,@Price,@Factor
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- SELECT @PricingDate,@Price,@Factor
        IF @Count = 1
        BEGIN
            --SET @Calculation = (1 + (COALESCE(@Factor,0)/100)) * @Price
            SET @Calculation = @Price
            SET @PreviousCalculation = @Calculation
            SET @Count = @Count + 1
            INSERT INTO #Output SELECT @PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        ELSE
        BEGIN
            SET @Calculation = (1 + (COALESCE(@Factor,0)/100)) * @PreviousCalculation
            SET @Count = @Count + 1
            SET @PreviousCalculation = @Calculation
            INSERT INTO #Output SELECT @PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        FETCH NEXT FROM TestCursor INTO @PricingDate,@Price,@Factor
    END

    CLOSE TestCursor;
    DEALLOCATE TestCursor;

    SELECT
        ROW_NUMBER() OVER (ORDER BY PricingDate) AS RowNumber,
        PricingDate,
        Price,
        LAG(Price) OVER (ORDER BY PricingDate) AS LagPrice,
        Factor,
        LAG(Factor) OVER (ORDER BY PricingDate) AS LagFactor,
        Calculation,
        --NewCalculation
        CASE
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) = 1
                THEN Price
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) >= 2
                THEN
                    (1 + (COALESCE(@Factor,0)/100)) *
                    (
                        (1 + (COALESCE(LAG(Factor) OVER (ORDER BY PricingDate),0)/100))
                        *
                        (1 + (COALESCE(@Factor,0)/100)) * (LAG(Price) OVER (ORDER BY PricingDate))
                    )
        END AS NewCalculation
    FROM #Output
  • If you're trying to create a running total, use a windowing function. Something like this?

    SUM(1 + COALESCE(Factor,0)/100) OVER (ORDER BY PricingDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    Just make sure the "window" defining which rows you want to accumulate is right.
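
    For a running product (which is what the compounding here actually is, since each row multiplies the prior value by (1 + Factor/100)), one common set-based sketch uses the identity PRODUCT(x) = EXP(SUM(LOG(x))). This computes in FLOAT, so it is close but not guaranteed to match per-step DECIMAL rounding, and it assumes every (1 + Factor/100) is positive:

    SELECT PricingDate, Price, Factor,
           -- first Price seeds the series; the windowed product compounds the factors
           FIRST_VALUE(Price) OVER (ORDER BY PricingDate)
           * EXP(SUM(LOG(1 + COALESCE(Factor,0)/100))
                 OVER (ORDER BY PricingDate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS Calculation
    FROM #Test;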

  • Thanks, I'll take a look at that windowing method. I don't think this is a running total, though, but rather a calculation which needs the calculated value from the prior row.

    Also, I think I confused matters because in my attempted NewCalculation I was referencing a local variable (@Factor) rather than the Factor column from the #Output temp table. So my attempt should have been the below, although the correction still does not get my desired result.

    SELECT
        ROW_NUMBER() OVER (ORDER BY PricingDate) AS RowNumber,
        PricingDate,
        Price,
        LAG(Price) OVER (ORDER BY PricingDate) AS LagPrice,
        Factor,
        LAG(Factor) OVER (ORDER BY PricingDate) AS LagFactor,
        Calculation,
        --NewCalculation
        CASE
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) = 1
                THEN Price
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) >= 2
                THEN
                    (1 + (COALESCE(Factor,0)/100)) *
                    (
                        (1 + (COALESCE(LAG(Factor) OVER (ORDER BY PricingDate),0)/100))
                        *
                        (1 + (COALESCE(Factor,0)/100)) * (LAG(Price) OVER (ORDER BY PricingDate))
                    )
        END AS NewCalculation
    FROM #Output

     

  • My current thinking is that this needs to be done using recursion. The calculation needs to reference the previous calculation, which needs to reference the previous calculation, and so on. I was hoping to do this in a view. I've always stayed away from recursive queries for performance reasons, and a recursive query just seems like a cursor under the hood. Still looking for help on this and would much appreciate it if someone could point me in the right direction.

  • Why are you using these data types with this numerical precision?  By making @Factor DECIMAL (9,7) and applying this percentage to a price, you add significance to decimal places to the right of the 4th place.  This means there will be an inscrutable influence on calculations where the balance is carried forward, which is what's happening.  One possible way to handle this could be to CAST the intermediate calculations to the MONEY data type.  However, I wouldn't recommend it.  Why is this precision called for?  Without knowing more it's difficult to say anything.

    Just duplicating the calculation, without worrying about issues related to precision (which in this case are a severe pita, imo), maybe this:

    with lag_cte as (
        select *, lag(price) over (order by PricingDate) lag_price
        from #Test
        cross apply (values ((1 + (isnull(Factor,0)/100)))) v(factor_pct))
    select *, lag_price*factor_pct
    from lag_cte;

     


  • A quirky update seems to work, although that requires adding a Calculation column to the original table and making sure the original table is clustered by PricingDate, viz:

    CREATE TABLE #Test (PricingDate DATE NOT NULL PRIMARY KEY CLUSTERED,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL(9, 7))

    DECLARE @Calculation decimal(9, 7);
    DECLARE @Factor decimal(9, 7);
    DECLARE @PreviousCalculation decimal(9, 7);
    DECLARE @Price decimal(5, 2)
    DECLARE @PricingDate date;

    ;WITH CteTest AS (
        /* Cte not required here, oc; I used it to try to find a way
           to not have to add a Calculation column to main table */
        SELECT *
        FROM #Test
    )
    UPDATE CteTest
    SET @PreviousCalculation = CASE WHEN Factor IS NULL THEN Price ELSE @Calculation END,
        @Calculation = Calculation = (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation
    FROM CteTest WITH (TABLOCKX)

    SELECT * FROM #Output ORDER BY PricingDate;
    SELECT * FROM #Test ORDER BY PricingDate;
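
    One hedged addition to this pattern: the quirky-update article referenced later in the thread also recommends forcing a serial plan, since the technique depends on rows being visited in clustered-index order. A sketch of the same UPDATE with that hint:

    UPDATE t
    SET @PreviousCalculation = CASE WHEN Factor IS NULL THEN Price ELSE @Calculation END,
        @Calculation = Calculation = (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation
    FROM #Test t WITH (TABLOCKX)
    OPTION (MAXDOP 1);  -- serial plan: a parallel plan could break the row-order dependency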


  • ScottPletcher wrote:

    A quirky update seems to work

    Thanks Scott, appreciate it. I am trying to build a view, since that is what my downstream processing expects. I had forgotten about the quirky update; it's always good to go back and read Jeff's article. I am studying Steve's reply and will respond in depth there.

  • Steve Collins wrote:

    Why are you using these data types with this numerical precision?   

    The precision issue is a problem. I have a legacy system with certain data types and a cursor. Then there is a new system I want to migrate to, with different data types, where I am trying to use a set-based SELECT statement. Then I have my test harness here, where I just used data types for demonstration.

    Ideally I'd have a subject matter expert to tell me what precision/rounding to use. Realistically, the legacy system is going to be considered 'correct'. My goal was to replicate the legacy system since it's the only version of the truth I have.

    The Cursor and Quirky methods return the same precision as my legacy system. They also match an Excel proof of concept I have.

    The CrossApply method strays, and I assume this straying will increase over longer periods of time.

    I guess I am struggling to understand why Cursor and Quirky replicate my legacy results and the CrossApply method won't. Can I make a set-based method consistent with my legacy results by appropriate rounding? I have tried, but it seems to get worse. Is the set-based method just correct differently, or is it inherently flawed for this type of problem? Would a windowing method have the same issues as the CrossApply?

    Notes on the revised script below. I am showing the Cursor, Quirky and CrossApply methods. I have added an EntityID column to better represent my actual data. I can't make EntityID work with the CrossApply method.

    Much thanks to this community.

    DROP TABLE IF EXISTS #Test

    --Test data
    CREATE TABLE #Test (EntityID INT NOT NULL,PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL(9, 7))--Add calculation to support Quirky
    INSERT INTO #Test VALUES (1,'2025-04-01',11.94,NULL,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-02',11.39,-4.6063581,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-03',11.88,4.3020044,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-04',11.83,-0.0252615,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-05',12.19,3.0431671,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-06',12.64,3.691608,NULL)

    INSERT INTO #Test VALUES (2,'2025-04-01',11.94,NULL,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-02',11.39,-4.6063581,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-03',11.88,4.3020044,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-04',11.83,-0.0252615,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-05',12.19,3.0431671,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-06',12.64,3.691608,NULL)

    ALTER TABLE #Test ADD PRIMARY KEY CLUSTERED (EntityID,PricingDate)

    --Table #Output is used to capture cursor output
    DROP TABLE IF EXISTS #Output
    CREATE TABLE #Output (EntityID INT NOT NULL,PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL (9,7) NULL)

    --Cursor variables
    DECLARE
        @EntityID INT,
        @EntityIDPrevious INT = 0,
        @PricingDate DATE,
        @Price DECIMAL (5,2),
        @Factor DECIMAL (9,7),
        @Calculation DECIMAL (9,7),
        @PreviousCalculation DECIMAL (9,7),
        @Count SMALLINT = 1

    --Cursor processing
    DECLARE TestCursor CURSOR FOR SELECT EntityID,PricingDate,Price,Factor FROM #Test ORDER BY EntityID,PricingDate

    OPEN TestCursor
    FETCH NEXT FROM TestCursor INTO @EntityID,@PricingDate,@Price,@Factor
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @EntityID <> @EntityIDPrevious
        BEGIN
            SET @Calculation = @Price
            SET @PreviousCalculation = @Calculation
            --SET @Count = @Count + 1
            SET @EntityIDPrevious = @EntityID
            INSERT INTO #Output SELECT @EntityID,@PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        ELSE
        BEGIN
            SET @Calculation = (1 + (COALESCE(@Factor,0)/100)) * @PreviousCalculation
            --SET @Count = @Count + 1
            SET @EntityIDPrevious = @EntityID
            SET @PreviousCalculation = @Calculation
            INSERT INTO #Output SELECT @EntityID,@PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        FETCH NEXT FROM TestCursor INTO @EntityID,@PricingDate,@Price,@Factor
    END

    CLOSE TestCursor;
    DEALLOCATE TestCursor;

    SELECT 'Cursor',* FROM #Output ORDER BY PricingDate, EntityID

    DROP TABLE IF EXISTS #Quirky
    SELECT * INTO #Quirky FROM #Test
    -- SELECT ... INTO creates #Quirky as a heap; the quirky update technique
    -- assumes a clustered index to define row order, so re-create the key here
    ALTER TABLE #Quirky ADD PRIMARY KEY CLUSTERED (EntityID,PricingDate)

    ;WITH CteTest AS (
        /* Cte not required here, oc; I used it to try to find a way
           to not have to add a Calculation column to main table */
        SELECT *
        FROM #Quirky
    )
    UPDATE CteTest
    SET @PreviousCalculation = CASE WHEN Factor IS NULL THEN Price ELSE @Calculation END,
        @Calculation = Calculation = (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation
    FROM CteTest WITH (TABLOCKX)

    SELECT 'Quirky',* FROM #Quirky ORDER BY PricingDate;


    WITH lag_cte AS
    (
        SELECT *, LAG(Price) OVER (ORDER BY PricingDate) lag_price
        FROM #Test
        CROSS APPLY (VALUES ((1 + (ISNULL(Factor,0)/100)))) v(factor_pct)
        WHERE EntityID = 1
    )
    SELECT 'CrossApply',*, lag_price*factor_pct
    FROM lag_cte;
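
    On the question of why the Cursor and Quirky methods match the legacy output while the CrossApply method strays: each SET (or quirky-update assignment) rounds the intermediate result to DECIMAL(9,7) before it is carried forward, while a single expression keeps higher intermediate precision and is rounded only once. A minimal illustration (my sketch, not from the thread; whether these two columns actually differ depends on the values, but over many compounding steps the per-step rounding is what reproduces the legacy numbers):

    DECLARE @c DECIMAL(9,7) = 11.94;
    SET @c = (1 + (-4.6063581/100)) * @c;  -- rounded to 7 decimal places here
    SET @c = (1 + (4.3020044/100)) * @c;   -- and rounded again here
    SELECT @c AS stepwise_rounded,
           CAST((1 + (4.3020044/100)) * ((1 + (-4.6063581/100)) * 11.94)
                AS DECIMAL(9,7)) AS single_expression;  -- rounded once, at the end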
  • Seems like you should be able to CAST the end result of the query to get less precision. The only problem is if you're losing precision in the CTE.

  • I would change the following:

    (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation

    to be:

    (1 + (COALESCE(Factor,0.)/100.)) * @PreviousCalculation -- notice the "dot" on the divisor (100.) and on the COALESCE default for Factor

    Depending on the values you may also get silent truncation (instead of the rounding which would be the default in other tools), so you may need to add explicit rounding to the calculation.
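
    The literal types matter because T-SQL derives the precision and scale of every decimal operation from its operands. One quick way to see what a given expression actually returns (a sketch I'm adding, not from the thread) is SQL_VARIANT_PROPERTY:

    DECLARE @f DECIMAL(9,7) = 4.3020044;
    SELECT SQL_VARIANT_PROPERTY(CAST(@f/100  AS SQL_VARIANT), 'BaseType') AS base_type,
           SQL_VARIANT_PROPERTY(CAST(@f/100  AS SQL_VARIANT), 'Scale')    AS int_divisor_scale,
           SQL_VARIANT_PROPERTY(CAST(@f/100. AS SQL_VARIANT), 'Scale')    AS decimal_divisor_scale;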

  • My query was insufficient because it applied the factor percentage to the lagged price and not to the lagged calculated price.  It seems recursion is unavoidable.  This query uses a recursive CTE:

    with recur_cte as (
        select PricingDate, Price, Factor, cast(Price as decimal(9,7)) Calculation
        from #Test
        where Factor is null
        union all
        select t.PricingDate, t.Price, t.Factor,
               cast((1 + (isnull(t.Factor,0)/100)) * r.Calculation as decimal(9,7))
        from #Test t
        join recur_cte r on t.PricingDate = dateadd(day, 1, r.PricingDate))
    select *
    from recur_cte
    order by PricingDate;
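
    A hedged variation of this query (my addition) for the multi-entity version of the test data: keying the recursion on ROW_NUMBER instead of dateadd avoids assuming consecutive pricing dates, PARTITION BY carries each EntityID independently, and the per-step CAST to DECIMAL(9,7) is what keeps it consistent with the cursor's rounding:

    WITH ordered AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY PricingDate) AS rn
        FROM #Test
    ), recur_cte AS (
        -- Anchor: the first row per entity seeds the calculation with Price.
        SELECT EntityID, PricingDate, Price, Factor, rn,
               CAST(Price AS DECIMAL(9,7)) AS Calculation
        FROM ordered
        WHERE rn = 1
        UNION ALL
        -- Each step compounds the previous step's *rounded* value.
        SELECT o.EntityID, o.PricingDate, o.Price, o.Factor, o.rn,
               CAST((1 + (ISNULL(o.Factor,0)/100)) * r.Calculation AS DECIMAL(9,7))
        FROM ordered o
        JOIN recur_cte r ON o.EntityID = r.EntityID AND o.rn = r.rn + 1
    )
    SELECT EntityID, PricingDate, Price, Factor, Calculation
    FROM recur_cte
    ORDER BY EntityID, PricingDate
    OPTION (MAXRECURSION 0);  -- allow chains longer than the default 100 steps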


  • Regarding the precision of the data types:

    cast(Price as decimal(9,7)) Calculation

    The line above CASTs the price (Price) to DECIMAL(9, 7).  Imo this is not a good practice.  When it comes to US dollars there are only 4 places of significance to the right of the decimal point.  Any digits further to the right, i.e. the 5th to infinity, don't impact rounding to the penny.  The SQL Server MONEY datatype is a weird space-saving utility, a fixed-point type with only 4 digits of significance to the right of the decimal point.  Imo the MONEY datatype is of questionable necessity and has very limited application.

    When the Factor column's row value is 4.3020044 and it is applied to $1.0000000 (an amount which is imaginary and cannot exist), the result is insignificant digits being populated with values.  Reports and various aggregated numbers might then not add up to the penny.  In my experience these sorts of discrepancies are frustrating for data users to encounter, and for the SQL developer, making the numbers add up exactly and answering questions about them can be a difficult conversation.


  • Thanks to all who responded. I'll review and post back.

  • Following up. Steve's solution using the recursive query works, but unfortunately performance is abysmal with my live data; it is worse than the cursor.  I looked at tuning with an experienced colleague and we came to the conclusion that the recursive query cannot be adequately tuned. I am going to move towards a model where daily processing stores the data and a query selects the stored data.

    I enlisted Gemini AI to see if there was a non-recursive solution. After a couple of rounds of providing wrong answers, it also threw in the towel. Its conclusion is below. Thanks all.

    You are absolutely correct, and I apologize again for the persistent inaccuracies. It seems the set-based logic I've provided so far hasn't perfectly captured the sequential dependency of the calculation within each EntityID without relying on recursion.

    You are right to question whether this specific problem can be solved with a purely non-recursive set-based approach using standard window functions in a straightforward manner. The challenge lies in the fact that the calculation for each row depends on the calculated value of the immediately preceding row within the same group, not just the original Price or Factor.

    While window functions like LAG can access the previous row's column value, directly referencing the calculated value of the previous row within the same set-based operation becomes tricky without some form of iterative processing or recursion.

    Possibility of a Very Complex Set-Based Solution (Potentially Less Performant and Harder to Maintain):

    It might be theoretically possible to construct a very complex set-based query involving multiple joins and window functions that attempts to simulate the cumulative calculation. However, such a solution would likely be:

    Extremely difficult to write and understand.

    Potentially less performant than the recursive CTE in many scenarios due to the complexity of the joins and calculations involved.

    Hard to maintain and debug.

    Why Recursion is Often Used for This Type of Problem:

    Problems where the current row's value depends on the calculated value of the previous row within a group are classic use cases for recursive CTEs. They provide a more intuitive and manageable way to express this sequential dependency in SQL.

    Considering Alternatives (If Performance is Critical):

    If the performance of the recursive CTE is truly unacceptable for your large datasets, you might need to consider alternatives outside of a single, purely set-based SQL query:

    Procedural Logic (e.g., Stored Procedures with Loops): While you wanted to avoid cursors, using a stored procedure with a loop that processes each EntityID sequentially could offer better performance than a poorly optimized recursive CTE in some cases. However, this moves away from a purely declarative set-based approach.

    Application-Level Logic: Performing this calculation in your application code (e.g., Python, C#, Java) by fetching the data ordered by EntityID and PricingDate and then iterating through it to perform the calculation might be more efficient for very large datasets, especially if you can process data in batches.

    Conclusion on Purely Non-Recursive Set-Based Solution:

    While I will continue to ponder if there's a more elegant non-recursive set-based solution I might be overlooking, based on the nature of the dependency, achieving the exact same sequential calculation without recursion or procedural logic within a single standard SQL query is likely to be very difficult and potentially counterproductive in terms of performance and maintainability.

    For now, the recursive CTE remains the most straightforward and generally recommended SQL approach for this type of problem. If performance is a critical bottleneck with the recursive CTE, exploring procedural logic or application-level processing might be necessary.

    I apologize that I haven't been able to provide a performant, purely non-recursive set-based solution that matches the cursor's output exactly. I will keep thinking about alternative approaches, but the inherent sequential dependency makes this a challenging problem for standard set-based operations.

     

  • If the quirky UPDATE works, can you modify the original table to include the new calculated column?

    Let me ask: once a value has been calc'd, would it ever change?  In other words, once you've made the calc, can you skip that row when doing future calcs, and only have to calc for new rows?

     

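    A minimal sketch of that incremental idea (my addition; dbo.PriceCalc is a hypothetical persisted table holding prior calculations): seed each EntityID from its last stored Calculation, compound only the newer rows, and append the results.

    -- Hypothetical persisted table: dbo.PriceCalc(EntityID, PricingDate, Factor, Calculation DECIMAL(9,7))
    WITH last_calc AS (
        SELECT EntityID, PricingDate, Calculation,
               ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY PricingDate DESC) AS rn
        FROM dbo.PriceCalc
    ), new_rows AS (
        -- Only rows newer than each entity's last stored calculation.
        SELECT t.EntityID, t.PricingDate, t.Factor,
               ROW_NUMBER() OVER (PARTITION BY t.EntityID ORDER BY t.PricingDate) AS rn
        FROM #Test t
        JOIN last_calc lc ON lc.EntityID = t.EntityID AND lc.rn = 1
        WHERE t.PricingDate > lc.PricingDate
    ), recur AS (
        -- Anchor: the stored value, so already-calculated rows are never reworked.
        SELECT EntityID, PricingDate, CAST(Calculation AS DECIMAL(9,7)) AS Calculation,
               CAST(0 AS BIGINT) AS rn
        FROM last_calc WHERE rn = 1
        UNION ALL
        SELECT n.EntityID, n.PricingDate,
               CAST((1 + (ISNULL(n.Factor,0)/100)) * r.Calculation AS DECIMAL(9,7)), n.rn
        FROM new_rows n
        JOIN recur r ON n.EntityID = r.EntityID AND n.rn = r.rn + 1
    )
    INSERT INTO dbo.PriceCalc (EntityID, PricingDate, Factor, Calculation)
    SELECT r.EntityID, r.PricingDate, n.Factor, r.Calculation
    FROM recur r
    JOIN new_rows n ON n.EntityID = r.EntityID AND n.rn = r.rn  -- excludes the rn = 0 anchor
    OPTION (MAXRECURSION 0);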
