Replace calculation in cursor with set-based logic

  • Hello, and thanks in advance for any help. The calculation in the cursor below is correct; alas, my attempt at doing the same calculation in a SELECT statement is not. This is on SQL Server 2019.

    DROP TABLE IF EXISTS #Test

    CREATE TABLE #Test (PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL)
    INSERT INTO #Test VALUES ('2025-04-01',11.94,NULL)
    INSERT INTO #Test VALUES ('2025-04-02',11.39,-4.6063581)
    INSERT INTO #Test VALUES ('2025-04-03',11.88,4.3020044)
    INSERT INTO #Test VALUES ('2025-04-04',11.83,-0.0252615)
    INSERT INTO #Test VALUES ('2025-04-05',12.19,3.0431671)
    INSERT INTO #Test VALUES ('2025-04-06',12.64,3.691608)
    --SELECT * FROM #Test

    DROP TABLE IF EXISTS #Output

    CREATE TABLE #Output (PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL (9,7) NULL)

    DECLARE
    @PricingDate DATE,
    @Price DECIMAL (5,2),
    @Factor DECIMAL (9,7),
    @Calculation DECIMAL (9,7),
    @PreviousCalculation DECIMAL (9,7),
    @Count SMALLINT = 1

    DECLARE TestCursor CURSOR FOR SELECT PricingDate,Price,Factor FROM #Test ORDER BY PricingDate

    OPEN TestCursor
    FETCH NEXT FROM TestCursor INTO @PricingDate,@Price,@Factor
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- SELECT @PricingDate,@Price,@Factor
        IF @Count = 1
        BEGIN
            --SET @Calculation = (1 + (COALESCE(@Factor,0)/100)) * @Price
            SET @Calculation = @Price
            SET @PreviousCalculation = @Calculation
            SET @Count = @Count + 1
            INSERT INTO #Output SELECT @PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        ELSE
        BEGIN
            SET @Calculation = (1 + (COALESCE(@Factor,0)/100)) * @PreviousCalculation
            SET @Count = @Count + 1
            SET @PreviousCalculation = @Calculation
            INSERT INTO #Output SELECT @PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        FETCH NEXT FROM TestCursor INTO @PricingDate,@Price,@Factor
    END

    CLOSE TestCursor;
    DEALLOCATE TestCursor;

    SELECT
        ROW_NUMBER() OVER (ORDER BY PricingDate) AS RowNumber,
        PricingDate,
        Price,
        LAG(Price) OVER (ORDER BY PricingDate) AS LagPrice,
        Factor,
        LAG(Factor) OVER (ORDER BY PricingDate) AS LagFactor,
        Calculation,
        --NewCalculation
        CASE
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) = 1
                THEN Price
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) >= 2
                THEN
                    (1 + (COALESCE(@Factor,0)/100)) *
                    (
                        (1 + (COALESCE(LAG(Factor) OVER (ORDER BY PricingDate),0)/100))
                        *
                        (1 + (COALESCE(@Factor,0)/100)) * (LAG(Price) OVER (ORDER BY PricingDate))
                    )
        END AS NewCalculation
    FROM #Output
  • If you're trying to create a running total, use a windowing function. Something like this?

    SUM(1 + COALESCE(Factor,0)/100) OVER (ORDER BY PricingDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    Just make sure the "window" defining which rows you want to accumulate is right.
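
    For a running product (which is what the compounding here actually is, since each row multiplies the prior value by (1 + Factor/100)), one common set-based sketch uses the identity PRODUCT(x) = EXP(SUM(LOG(x))). This computes in FLOAT, so it is close but not guaranteed to match per-step DECIMAL rounding, and it assumes every (1 + Factor/100) is positive:

    SELECT PricingDate, Price, Factor,
           -- first Price seeds the series; the windowed product compounds the factors
           FIRST_VALUE(Price) OVER (ORDER BY PricingDate)
           * EXP(SUM(LOG(1 + COALESCE(Factor,0)/100))
                 OVER (ORDER BY PricingDate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS Calculation
    FROM #Test;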

  • Thanks, I'll take a look at that windowing method. I don't think this is a running total, though, but rather a calculation which needs the calculated value from the prior row.

    Also, I think I confused matters because in my attempted NewCalculation I was referencing a local variable (@Factor) rather than the Factor column from the #Output temp table. So my attempt should have been the below, although the correction still does not get my desired result.

    SELECT
        ROW_NUMBER() OVER (ORDER BY PricingDate) AS RowNumber,
        PricingDate,
        Price,
        LAG(Price) OVER (ORDER BY PricingDate) AS LagPrice,
        Factor,
        LAG(Factor) OVER (ORDER BY PricingDate) AS LagFactor,
        Calculation,
        --NewCalculation
        CASE
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) = 1
                THEN Price
            WHEN ROW_NUMBER() OVER (ORDER BY PricingDate) >= 2
                THEN
                    (1 + (COALESCE(Factor,0)/100)) *
                    (
                        (1 + (COALESCE(LAG(Factor) OVER (ORDER BY PricingDate),0)/100))
                        *
                        (1 + (COALESCE(Factor,0)/100)) * (LAG(Price) OVER (ORDER BY PricingDate))
                    )
        END AS NewCalculation
    FROM #Output

     

  • My current thinking is that this needs to be done using recursion. The calculation needs to reference the previous calculation, which needs to reference the previous calculation, and so on. I was hoping to do this in a view. I've always stayed away from recursive queries for performance reasons, and a recursive query just seems like a cursor under the hood. Still looking for help on this and would much appreciate it if someone could point me in the right direction.

  • Why are you using these data types with this numerical precision?  By making @Factor DECIMAL (9,7) and applying this percentage to a price, you add significance to decimal places to the right of the 4th place.  This means there will be an inscrutable influence on calculations where the balance is carried forward, which is what's happening.  One possible way to handle this could be to CAST the intermediate calculations to the MONEY data type.  However, I wouldn't recommend it.  Why is this precision called for?  Without knowing more it's difficult to say anything.

    Just duplicating the calculation, without worrying about issues related to precision (which in this case are a severe pita, imo), maybe this:

    with lag_cte as (
        select *, lag(price) over (order by PricingDate) lag_price
        from #Test
        cross apply (values ((1 + (isnull(Factor,0)/100)))) v(factor_pct))
    select *, lag_price*factor_pct
    from lag_cte;

     


  • A quirky update seems to work, although that requires adding a Calculation column to the original table and making sure the original table is clustered by PricingDate, viz:

    CREATE TABLE #Test (PricingDate DATE NOT NULL PRIMARY KEY CLUSTERED,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL(9, 7))

    DECLARE @Calculation decimal(9, 7);
    DECLARE @Factor decimal(9, 7);
    DECLARE @PreviousCalculation decimal(9, 7);
    DECLARE @Price decimal(5, 2)
    DECLARE @PricingDate date;

    ;WITH CteTest AS (
        /* Cte not required here, oc; I used it to try to find a way
           to not have to add a Calculation column to main table */
        SELECT *
        FROM #Test
    )
    UPDATE CteTest
    SET @PreviousCalculation = CASE WHEN Factor IS NULL THEN Price ELSE @Calculation END,
        @Calculation = Calculation = (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation
    FROM CteTest WITH (TABLOCKX)

    SELECT * FROM #Output ORDER BY PricingDate;
    SELECT * FROM #Test ORDER BY PricingDate;
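
    One hedged addition to this pattern: the quirky-update article referenced later in the thread also recommends forcing a serial plan, since the technique depends on rows being visited in clustered-index order. A sketch of the same UPDATE with that hint:

    UPDATE t
    SET @PreviousCalculation = CASE WHEN Factor IS NULL THEN Price ELSE @Calculation END,
        @Calculation = Calculation = (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation
    FROM #Test t WITH (TABLOCKX)
    OPTION (MAXDOP 1);  -- serial plan: a parallel plan could break the row-order dependency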


  • ScottPletcher wrote:

    A quirky update seems to work

    Thanks Scott, appreciate it. I am trying to build a view, since that is what my downstream processing expects. I had forgotten about the quirky update; it's always good to go back and read Jeff's article. I am studying Steve's reply and will respond in depth there.

  • Steve Collins wrote:

    Why are you using these data types with this numerical precision?   

    The precision issue is a problem. I have a legacy system with certain data types and a cursor. Then there is a new system I want to migrate to, with different data types, where I am trying to use a set-based SELECT statement. Then I have my test harness here, where I just used data types for demonstration.

    Ideally I'd have a subject matter expert to tell me what precision/rounding to use. Realistically, the legacy system is going to be considered 'correct'. My goal was to replicate the legacy system since it's the only version of the truth I have.

    The Cursor and Quirky methods return the same precision as my legacy system. They also match an Excel proof of concept I have.

    The CrossApply method strays, and I assume this straying will increase over longer periods of time.

    I guess I am struggling to understand why Cursor and Quirky replicate my legacy results and the CrossApply method won't. Can I make a set-based method consistent with my legacy results by appropriate rounding? I have tried, but it seems to get worse. Is the set-based method just correct differently, or is it inherently flawed for this type of problem? Would a windowing method have the same issues as the CrossApply?

    Notes on the revised script below. I am showing the Cursor, Quirky and CrossApply methods. I have added an EntityID column to better represent my actual data. I can't make EntityID work with the CrossApply method.

    Much thanks to this community.

    DROP TABLE IF EXISTS #Test

    --Test data
    CREATE TABLE #Test (EntityID INT NOT NULL,PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL(9, 7))--Add calculation to support Quirky
    INSERT INTO #Test VALUES (1,'2025-04-01',11.94,NULL,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-02',11.39,-4.6063581,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-03',11.88,4.3020044,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-04',11.83,-0.0252615,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-05',12.19,3.0431671,NULL)
    INSERT INTO #Test VALUES (1,'2025-04-06',12.64,3.691608,NULL)

    INSERT INTO #Test VALUES (2,'2025-04-01',11.94,NULL,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-02',11.39,-4.6063581,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-03',11.88,4.3020044,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-04',11.83,-0.0252615,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-05',12.19,3.0431671,NULL)
    INSERT INTO #Test VALUES (2,'2025-04-06',12.64,3.691608,NULL)

    ALTER TABLE #Test ADD PRIMARY KEY CLUSTERED (EntityID,PricingDate)

    --Table #Output is used to capture cursor output
    DROP TABLE IF EXISTS #Output
    CREATE TABLE #Output (EntityID INT NOT NULL,PricingDate DATE NOT NULL,Price DECIMAL (5,2) NOT NULL,Factor DECIMAL (9,7) NULL,Calculation DECIMAL (9,7) NULL)

    --Cursor variables
    DECLARE
        @EntityID INT,
        @EntityIDPrevious INT = 0,
        @PricingDate DATE,
        @Price DECIMAL (5,2),
        @Factor DECIMAL (9,7),
        @Calculation DECIMAL (9,7),
        @PreviousCalculation DECIMAL (9,7),
        @Count SMALLINT = 1

    --Cursor processing
    DECLARE TestCursor CURSOR FOR SELECT EntityID,PricingDate,Price,Factor FROM #Test ORDER BY EntityID,PricingDate

    OPEN TestCursor
    FETCH NEXT FROM TestCursor INTO @EntityID,@PricingDate,@Price,@Factor
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @EntityID <> @EntityIDPrevious
        BEGIN
            SET @Calculation = @Price
            SET @PreviousCalculation = @Calculation
            --SET @Count = @Count + 1
            SET @EntityIDPrevious = @EntityID
            INSERT INTO #Output SELECT @EntityID,@PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        ELSE
        BEGIN
            SET @Calculation = (1 + (COALESCE(@Factor,0)/100)) * @PreviousCalculation
            --SET @Count = @Count + 1
            SET @EntityIDPrevious = @EntityID
            SET @PreviousCalculation = @Calculation
            INSERT INTO #Output SELECT @EntityID,@PricingDate,@Price,@Factor,@Calculation
            --SELECT @PricingDate AS PricingDate,@Price AS Price,@Factor AS Factor,@Calculation AS Calculation
        END
        FETCH NEXT FROM TestCursor INTO @EntityID,@PricingDate,@Price,@Factor
    END

    CLOSE TestCursor;
    DEALLOCATE TestCursor;

    SELECT 'Cursor',* FROM #Output ORDER BY PricingDate, EntityID

    DROP TABLE IF EXISTS #Quirky
    SELECT * INTO #Quirky FROM #Test
    -- SELECT ... INTO creates #Quirky as a heap; the quirky update technique
    -- assumes a clustered index to define row order, so re-create the key here
    ALTER TABLE #Quirky ADD PRIMARY KEY CLUSTERED (EntityID,PricingDate)

    ;WITH CteTest AS (
        /* Cte not required here, oc; I used it to try to find a way
           to not have to add a Calculation column to main table */
        SELECT *
        FROM #Quirky
    )
    UPDATE CteTest
    SET @PreviousCalculation = CASE WHEN Factor IS NULL THEN Price ELSE @Calculation END,
        @Calculation = Calculation = (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation
    FROM CteTest WITH (TABLOCKX)

    SELECT 'Quirky',* FROM #Quirky ORDER BY PricingDate;


    WITH lag_cte AS
    (
        SELECT *, LAG(Price) OVER (ORDER BY PricingDate) lag_price
        FROM #Test
        CROSS APPLY (VALUES ((1 + (ISNULL(Factor,0)/100)))) v(factor_pct)
        WHERE EntityID = 1
    )
    SELECT 'CrossApply',*, lag_price*factor_pct
    FROM lag_cte;
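
    On the question of why the Cursor and Quirky methods match the legacy output while the CrossApply method strays: each SET (or quirky-update assignment) rounds the intermediate result to DECIMAL(9,7) before it is carried forward, while a single expression keeps higher intermediate precision and is rounded only once. A minimal illustration (my sketch, not from the thread; whether these two columns actually differ depends on the values, but over many compounding steps the per-step rounding is what reproduces the legacy numbers):

    DECLARE @c DECIMAL(9,7) = 11.94;
    SET @c = (1 + (-4.6063581/100)) * @c;  -- rounded to 7 decimal places here
    SET @c = (1 + (4.3020044/100)) * @c;   -- and rounded again here
    SELECT @c AS stepwise_rounded,
           CAST((1 + (4.3020044/100)) * ((1 + (-4.6063581/100)) * 11.94)
                AS DECIMAL(9,7)) AS single_expression;  -- rounded once, at the end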
  • Seems like you should be able to CAST the end result of the query to get less precision. The only problem is if you're losing precision in the CTE.

  • I would change the following:

    (1 + (COALESCE(Factor,0)/100)) * @PreviousCalculation

    to be:

    (1 + (COALESCE(Factor,0.)/100.)) * @PreviousCalculation -- notice the "dot" on the divisor (100.) and on the COALESCE default for Factor

    Depending on the values you may also get silent truncation (instead of the rounding which would be the default in other tools), so you may need to add explicit rounding to the calculation.
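
    The literal types matter because T-SQL derives the precision and scale of every decimal operation from its operands. One quick way to see what a given expression actually returns (a sketch I'm adding, not from the thread) is SQL_VARIANT_PROPERTY:

    DECLARE @f DECIMAL(9,7) = 4.3020044;
    SELECT SQL_VARIANT_PROPERTY(CAST(@f/100  AS SQL_VARIANT), 'BaseType') AS base_type,
           SQL_VARIANT_PROPERTY(CAST(@f/100  AS SQL_VARIANT), 'Scale')    AS int_divisor_scale,
           SQL_VARIANT_PROPERTY(CAST(@f/100. AS SQL_VARIANT), 'Scale')    AS decimal_divisor_scale;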

  • My query was insufficient because it applied the factor percentage to the lagged price and not to the lagged calculated price.  It seems recursion is unavoidable.  This query uses a recursive CTE:

    with recur_cte as (
        select PricingDate, Price, Factor, cast(Price as decimal(9,7)) Calculation
        from #Test
        where Factor is null
        union all
        select t.PricingDate, t.Price, t.Factor,
               cast((1 + (isnull(t.Factor,0)/100)) * r.Calculation as decimal(9,7))
        from #Test t
        join recur_cte r on t.PricingDate = dateadd(day, 1, r.PricingDate))
    select *
    from recur_cte
    order by PricingDate;
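
    A hedged variation of this query (my addition) for the multi-entity version of the test data: keying the recursion on ROW_NUMBER instead of dateadd avoids assuming consecutive pricing dates, PARTITION BY carries each EntityID independently, and the per-step CAST to DECIMAL(9,7) is what keeps it consistent with the cursor's rounding:

    WITH ordered AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY PricingDate) AS rn
        FROM #Test
    ), recur_cte AS (
        -- Anchor: the first row per entity seeds the calculation with Price.
        SELECT EntityID, PricingDate, Price, Factor, rn,
               CAST(Price AS DECIMAL(9,7)) AS Calculation
        FROM ordered
        WHERE rn = 1
        UNION ALL
        -- Each step compounds the previous step's *rounded* value.
        SELECT o.EntityID, o.PricingDate, o.Price, o.Factor, o.rn,
               CAST((1 + (ISNULL(o.Factor,0)/100)) * r.Calculation AS DECIMAL(9,7))
        FROM ordered o
        JOIN recur_cte r ON o.EntityID = r.EntityID AND o.rn = r.rn + 1
    )
    SELECT EntityID, PricingDate, Price, Factor, Calculation
    FROM recur_cte
    ORDER BY EntityID, PricingDate
    OPTION (MAXRECURSION 0);  -- allow chains longer than the default 100 steps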


  • Regarding the precision of the data types:

    cast(Price as decimal(9,7)) Calculation

    The line above CASTs the price (Price) to DECIMAL(9, 7).  Imo this is not a good practice.  When it comes to US dollars there are only 4 places of significance to the right of the decimal point.  Any digits further to the right, i.e. the 5th to infinity, don't impact rounding to the penny.  The SQL Server MONEY datatype is a weird space-saving utility, a fixed-point type with only 4 digits of significance to the right of the decimal point.  Imo the MONEY datatype is of questionable necessity and has very limited application.

    When the Factor column's row value is 4.3020044 and it is applied to $1.0000000 (an amount which is imaginary and cannot exist), the result is insignificant digits being populated with values.  Reports and various aggregated numbers might then not add up to the penny.  In my experience these sorts of discrepancies are frustrating for data users to encounter, and for the SQL developer, making the numbers add up exactly and answering questions about them can be a difficult conversation.


  • Thanks to all who responded. I'll review and post back.

  • Following up. Steve's solution using the recursive query works, but unfortunately performance is abysmal with my live data; it is worse than the cursor.  I looked at tuning with an experienced colleague and we came to the conclusion that the recursive query cannot be adequately tuned. I am going to move towards a model where daily processing stores the data and a query selects the stored data.

    I enlisted Gemini AI to see if there was a non-recursive solution. After a couple of rounds of providing wrong answers, it also threw in the towel. Its conclusion is below. Thanks all.

    You are absolutely correct, and I apologize again for the persistent inaccuracies. It seems the set-based logic I've provided so far hasn't perfectly captured the sequential dependency of the calculation within each EntityID without relying on recursion.

    You are right to question whether this specific problem can be solved with a purely non-recursive set-based approach using standard window functions in a straightforward manner. The challenge lies in the fact that the calculation for each row depends on the calculated value of the immediately preceding row within the same group, not just the original Price or Factor.

    While window functions like LAG can access the previous row's column value, directly referencing the calculated value of the previous row within the same set-based operation becomes tricky without some form of iterative processing or recursion.

    Possibility of a Very Complex Set-Based Solution (Potentially Less Performant and Harder to Maintain):

    It might be theoretically possible to construct a very complex set-based query involving multiple joins and window functions that attempts to simulate the cumulative calculation. However, such a solution would likely be:

    Extremely difficult to write and understand.

    Potentially less performant than the recursive CTE in many scenarios due to the complexity of the joins and calculations involved.

    Hard to maintain and debug.

    Why Recursion is Often Used for This Type of Problem:

    Problems where the current row's value depends on the calculated value of the previous row within a group are classic use cases for recursive CTEs. They provide a more intuitive and manageable way to express this sequential dependency in SQL.

    Considering Alternatives (If Performance is Critical):

    If the performance of the recursive CTE is truly unacceptable for your large datasets, you might need to consider alternatives outside of a single, purely set-based SQL query:

    Procedural Logic (e.g., Stored Procedures with Loops): While you wanted to avoid cursors, using a stored procedure with a loop that processes each EntityID sequentially could offer better performance than a poorly optimized recursive CTE in some cases. However, this moves away from a purely declarative set-based approach.

    Application-Level Logic: Performing this calculation in your application code (e.g., Python, C#, Java) by fetching the data ordered by EntityID and PricingDate and then iterating through it to perform the calculation might be more efficient for very large datasets, especially if you can process data in batches.

    Conclusion on Purely Non-Recursive Set-Based Solution:

    While I will continue to ponder if there's a more elegant non-recursive set-based solution I might be overlooking, based on the nature of the dependency, achieving the exact same sequential calculation without recursion or procedural logic within a single standard SQL query is likely to be very difficult and potentially counterproductive in terms of performance and maintainability.

    For now, the recursive CTE remains the most straightforward and generally recommended SQL approach for this type of problem. If performance is a critical bottleneck with the recursive CTE, exploring procedural logic or application-level processing might be necessary.

    I apologize that I haven't been able to provide a performant, purely non-recursive set-based solution that matches the cursor's output exactly. I will keep thinking about alternative approaches, but the inherent sequential dependency makes this a challenging problem for standard set-based operations.

     

  • If the quirky UPDATE works, can you modify the original table to include the new calculated column?

    Let me ask: once a value has been calc'd, would it ever change?  In other words, once you've made the calc, can you skip that row when doing future calcs, and only have to calc for new rows?

     

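    A minimal sketch of that incremental idea (my addition; dbo.PriceCalc is a hypothetical persisted table holding prior calculations): seed each EntityID from its last stored Calculation, compound only the newer rows, and append the results.

    -- Hypothetical persisted table: dbo.PriceCalc(EntityID, PricingDate, Factor, Calculation DECIMAL(9,7))
    WITH last_calc AS (
        SELECT EntityID, PricingDate, Calculation,
               ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY PricingDate DESC) AS rn
        FROM dbo.PriceCalc
    ), new_rows AS (
        -- Only rows newer than each entity's last stored calculation.
        SELECT t.EntityID, t.PricingDate, t.Factor,
               ROW_NUMBER() OVER (PARTITION BY t.EntityID ORDER BY t.PricingDate) AS rn
        FROM #Test t
        JOIN last_calc lc ON lc.EntityID = t.EntityID AND lc.rn = 1
        WHERE t.PricingDate > lc.PricingDate
    ), recur AS (
        -- Anchor: the stored value, so already-calculated rows are never reworked.
        SELECT EntityID, PricingDate, CAST(Calculation AS DECIMAL(9,7)) AS Calculation,
               CAST(0 AS BIGINT) AS rn
        FROM last_calc WHERE rn = 1
        UNION ALL
        SELECT n.EntityID, n.PricingDate,
               CAST((1 + (ISNULL(n.Factor,0)/100)) * r.Calculation AS DECIMAL(9,7)), n.rn
        FROM new_rows n
        JOIN recur r ON n.EntityID = r.EntityID AND n.rn = r.rn + 1
    )
    INSERT INTO dbo.PriceCalc (EntityID, PricingDate, Factor, Calculation)
    SELECT r.EntityID, r.PricingDate, n.Factor, r.Calculation
    FROM recur r
    JOIN new_rows n ON n.EntityID = r.EntityID AND n.rn = r.rn  -- excludes the rn = 0 anchor
    OPTION (MAXRECURSION 0);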
