August 4, 2022 at 4:45 pm
Hello,
Thanks if you can offer any assistance. I am trying to use LAG to return the previous non null value but I am not getting my desired results. I added some comments in my code showing the desired results. I do not want to use a CTE.
--Microsoft SQL Server 2019
--Build a temp calendar table
DROP TABLE IF EXISTS Calendar
CREATE TABLE Calendar (CalendarDate DATE)
DECLARE @Start DATE = '2022-07-01'
DECLARE @End DATE = '2022-07-31'
WHILE ( @Start < @End )
BEGIN
INSERT INTO Calendar (CalendarDate) VALUES( @Start )
SELECT @Start = DATEADD(DAY, 1, @Start )
END
DROP TABLE IF EXISTS MyData
SELECT *
INTO MyData
FROM
(
SELECT '2022-07-01' AS ReportDate,100AS ReportValue UNION ALL
SELECT '2022-07-05' AS ReportDate,101AS ReportValue UNION ALL
SELECT '2022-07-06' AS ReportDate,102AS ReportValue UNION ALL
SELECT '2022-07-07' AS ReportDate,103AS ReportValue UNION ALL
SELECT '2022-07-08' AS ReportDate,104AS ReportValue UNION ALL
SELECT '2022-07-11' AS ReportDate,105AS ReportValue UNION ALL
SELECT '2022-07-12' AS ReportDate,106AS ReportValue UNION ALL
SELECT '2022-07-13' AS ReportDate,107AS ReportValue UNION ALL
SELECT '2022-07-14' AS ReportDate,108AS ReportValue UNION ALL
SELECT '2022-07-15' AS ReportDate,109AS ReportValue UNION ALL
SELECT '2022-07-18' AS ReportDate,110AS ReportValue UNION ALL
SELECT '2022-07-19' AS ReportDate,111AS ReportValue UNION ALL
SELECT '2022-07-20' AS ReportDate,112AS ReportValue UNION ALL
SELECT '2022-07-21' AS ReportDate,113AS ReportValue UNION ALL
SELECT '2022-07-22' AS ReportDate,114AS ReportValue UNION ALL
SELECT '2022-07-25' AS ReportDate,115AS ReportValue UNION ALL
SELECT '2022-07-26' AS ReportDate,116AS ReportValue UNION ALL
SELECT '2022-07-27' AS ReportDate,117AS ReportValue UNION ALL
SELECT '2022-07-28' AS ReportDate,118AS ReportValue UNION ALL
SELECT '2022-07-29' AS ReportDate,119AS ReportValue
) D
GO
CREATE OR ALTER VIEW TestView
AS
SELECT
*,
LAG(ReportDate,1) OVER ( ORDER BY ReportDate ASC) AS LagReportDate,
LAG(ReportValue,1) OVER ( ORDER BY ReportDate ASC) AS LagReportValue
FROM Calendar C
LEFT JOIN MyData D
ON C.CalendarDate = D.ReportDate
GO
SELECT
*,
'This is the desired result, This date has data so I get the lag values'
FROM TestView WHERE CalendarDate = '2022-07-05'
SELECT
*,
'This is not the desired result; There is no data for the data but I want LagReportDate to be 2022-07-01 and LagReportValue to be 100'
FROM TestView WHERE CalendarDate = '2022-07-04'
SELECT
* ,
'Good here'
FROM TestView WHERE CalendarDate = '2022-07-06'
SELECT
*,
'Not the desired result'
FROM TestView WHERE CalendarDate = '2022-07-09'
August 4, 2022 at 5:06 pm
--deleted
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
August 4, 2022 at 5:25 pm
I believe that this gives you your desired results. It's based on code by Itzik Ben-Gan. I prefer to use BINARY rather than CHAR except in the case of DATE fields, because DATE fields are little-endian, which throws off the calculations.
SELECT C.#CalendarDate
, D.ReportDate
, D.ReportValue
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.#CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 1, 10) AS DATE) AS LagReportDate
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.#CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 11, 10) AS INT) AS LagReportValue
FROM #Calendar C
LEFT JOIN #MyData D
ON C.#CalendarDate = D.ReportDate
CROSS APPLY (VALUES(CAST(C.#CalendarDate AS CHAR(10)) + CAST(D.ReportValue AS CHAR(10)))) v(val)
ORDER BY C.#CalendarDate
Note: I converted your permanent tables to temporary tables and did not create your view.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 4, 2022 at 5:27 pm
Instead of using LAG you can use OUTER APPLY. I've used both in this query so you can see the difference.
CREATE OR ALTER VIEW TestView AS
SELECT C.*,
D.*,
LAG(D.ReportDate,1) OVER ( ORDER BY D.ReportDate ASC) AS LagReportDate,
LAG(D.ReportValue,1) OVER ( ORDER BY D.ReportDate ASC) AS LagReportValue,
D2.ReportDate Lag2ReportDate,
D2.ReportValue Lag2ReportValue
FROM Calendar C
LEFT JOIN MyData D
ON C.CalendarDate = D.ReportDate
OUTER APPLY(SELECT TOP(1) *
FROM MyData D2
WHERE D2.ReportDate < C.CalendarDate
ORDER BY D2.ReportDate DESC) D2
August 4, 2022 at 5:33 pm
As a side note, there is no way to implement this using just LAG()
, since 2022-07-01 can only be the immediate predecessor of one record and you would need it to be the immediate predecessor of four (2022-07-02 through 2022-07-05 inclusive).
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 4, 2022 at 5:41 pm
Here is a comparison of the two methods:
/* Windowed Aggregate (MAX) */
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 1, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(30 rows affected)
/* CROSS APPLY/TOP(1) */
Table 'Worktable'. Scan count 60, logical reads 237, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 31, logical reads 60, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 3 ms.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 4, 2022 at 6:32 pm
Here is a comparison of the two methods:
/* Windowed Aggregate (MAX) */Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 1, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(30 rows affected)
/* CROSS APPLY/TOP(1) */Table 'Worktable'. Scan count 60, logical reads 237, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 31, logical reads 60, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 3 ms.Drew
Yes, your "Windowed Aggregate (MAX)" is much faster way of doing it.
You can speed up my query a lot by adding an index:
create index IX_MyData_1 on MyData(ReportDate) include (ReportValue);
But I don't think it is as fast as your method, even though the execution plan says my method is faster. Just shows you can't trust and execution plan.
August 4, 2022 at 7:00 pm
drew.allen wrote:Here is a comparison of the two methods:
/* Windowed Aggregate (MAX) */Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 1, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(30 rows affected)
/* CROSS APPLY/TOP(1) */Table 'Worktable'. Scan count 60, logical reads 237, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 31, logical reads 60, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 3 ms.Drew
Yes, your "Windowed Aggregate (MAX)" is much faster way of doing it.
You can speed up my query a lot by adding an index:
create index IX_MyData_1 on MyData(ReportDate) include (ReportValue);But I don't think it is as fast as your method, even though the execution plan says my method is faster. Just shows you can't trust and execution plan.
Did you also create an index on the Calendar table?
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 4, 2022 at 7:39 pm
Jonathan AC Roberts wrote:drew.allen wrote:Here is a comparison of the two methods:
/* Windowed Aggregate (MAX) */Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 1, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(30 rows affected)
/* CROSS APPLY/TOP(1) */Table 'Worktable'. Scan count 60, logical reads 237, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#MyData______000000010891'. Scan count 31, logical reads 60, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Calendar____000000010890'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 3 ms.Drew
Yes, your "Windowed Aggregate (MAX)" is much faster way of doing it.
You can speed up my query a lot by adding an index:
create index IX_MyData_1 on MyData(ReportDate) include (ReportValue);But I don't think it is as fast as your method, even though the execution plan says my method is faster. Just shows you can't trust and execution plan.
Did you also create an index on the Calendar table?
Drew
No, only the one I listed on MyData
Here is the query I ran:
SELECT C.CalendarDate
, D.ReportDate
, D.ReportValue,
D2.ReportDate Lag2ReportDate,
D2.ReportValue Lag2ReportValue
FROM Calendar C
LEFT JOIN MyData D
ON C.CalendarDate = D.ReportDate
OUTER APPLY(SELECT TOP(1) *
FROM MyData D2
WHERE D2.ReportDate < C.CalendarDate
ORDER BY D2.ReportDate DESC) D2;
GO
SELECT C.CalendarDate
, D.ReportDate
, D.ReportValue
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 1, 10) AS DATE) AS LagReportDate
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 11, 10) AS INT) AS LagReportValue
FROM Calendar C
LEFT JOIN MyData D
ON C.CalendarDate = D.ReportDate
CROSS APPLY (VALUES(CAST(C.CalendarDate AS CHAR(10)) + CAST(D.ReportValue AS CHAR(10)))) v(val)
August 4, 2022 at 7:48 pm
I guess the method needs the previous date to be returned. Whenever I add a criteria for a specific date I don't get the previous data.
SELECT C.CalendarDate
, D.ReportDate
, D.ReportValue
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 1, 10) AS DATE) AS LagReportDate
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 11, 10) AS INT) AS LagReportValue
FROM #Calendar C
LEFT JOIN #MyData D
ON C.CalendarDate = D.ReportDate
CROSS APPLY (VALUES(CAST(C.CalendarDate AS CHAR(10)) + CAST(D.ReportValue AS CHAR(10)))) v(val)
WHERE CalendarDate = '2022-07-04'
ORDER BY C.CalendarDate
--This is more like the view I am tring to build
SELECT
*
FROM #Calendar C
--Multiple other let joins here
LEFT JOIN
(
SELECT C.CalendarDate
, D.ReportDate
, D.ReportValue
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 1, 10) AS DATE) AS LagReportDate
, CAST(SUBSTRING(MAX(v.val) OVER(ORDER BY C.CalendarDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 11, 10) AS INT) AS LagReportValue
FROM #Calendar C
LEFT JOIN #MyData D
ON C.CalendarDate = D.ReportDate
CROSS APPLY (VALUES(CAST(C.CalendarDate AS CHAR(10)) + CAST(D.ReportValue AS CHAR(10)))) v(val)
) L
ON C.CalendarDate = L.LagReportDate
WHERE C.CalendarDate = '2022-07-04'
August 4, 2022 at 7:50 pm
Drew showed me his wonderful code to do this same thing on another thread. I did a deep dive on it and the concept is wonderfully simple and it solved the problem with using MAX() OVER that I didn't solve in one of my other attempts.
To demonstrate how insanely quick Drew's code is, I "Modenized" the sample data. Here's the code. For those unfamiliar with "fnTally", you can get that at the similarly named link in my signature line at the bottom of this post or use one of your own to replace it (unless it has a loop or rCTE or other form of RBAR in it).
--===== Drop the Temp Tables to make reruns easier.
DROP TABLE IF EXISTS #Temp,#Result;
GO
CREATE TABLE #Temp
(
ID INT NOT NULL PRIMARY KEY
,VAL INT NULL
)
;
--===== Create the Test Data in #Temp
WITH
cteGenRareVal AS
(
SELECT ID = ABS(CRYPT_GEN_RANDOM(4)%1000000)+1 --Random ID's 1 to 1 Million
,Val = ABS(CRYPT_GEN_RANDOM(4)%1000000000)+1 --Random Val's 1 to 1 Billion
FROM dbo.fnTally(1,1000)
)
,cteGenID AS
(
SELECT ID = t.N
FROM dbo.fnTally(1,1000000)t
)
INSERT INTO #Temp WITH (TABLOCK)
(ID, VAL)
SELECT gid.ID
,rv.Val
FROM cteGenID gid
LEFT JOIN cteGenRareVal rv
ON gid.ID = rv.ID
ORDER BY gid.ID
;
--===== Solve the problem and dump the results to a Temp Table
-- to take display time out of the picture.
SET STATISTICS TIME ON
;
SELECT ID
,t.VAL
,NewVal = CAST(SUBSTRING(
MAX(CAST(ID AS BINARY(5)) + CAST(t.VAL AS BINARY(5)))
OVER(ORDER BY t.ID ROWS UNBOUNDED PRECEDING)
,6,5) AS INT)
INTO #Result
FROM #Temp AS t
ORDER BY ID
;
SET STATISTICS TIME OFF
;
GO
Here are the STATISTICS from my box... insanely fast code for what it does. Haven't tried the "Quirky Update" method for this but I'm thinking it'll be real close to a dead-heat.
(1000000 rows affected)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 875 ms, elapsed time = 867 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
(1000000 rows affected)
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2022 at 7:53 pm
Drew, thanks again for the solution. It's "Spec-hacular"!
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2022 at 8:03 pm
Drew showed me his wonderful code to do this same thing on another thread. I did a deep dive on it and the concept is wonderfully simple and it solved the problem with using MAX() OVER that I didn't solve in one of my other attempts.
To demonstrate how insanely quick Drew's code is, I "Modenized" the sample data. Here's the code. For those unfamiliar with "fnTally", you can get that at the similarly named link in my signature line at the bottom of this post or use one of your own to replace it (unless it has a loop or rCTE or other form of RBAR in it).
--===== Drop the Temp Tables to make reruns easier.
DROP TABLE IF EXISTS #Temp,#Result;
GO
CREATE TABLE #Temp
(
ID INT NOT NULL PRIMARY KEY
,VAL INT NULL
)
;
--===== Create the Test Data in #Temp
WITH
cteGenRareVal AS
(
SELECT ID = ABS(CRYPT_GEN_RANDOM(4)%1000000)+1 --Random ID's 1 to 1 Million
,Val = ABS(CRYPT_GEN_RANDOM(4)%1000000000)+1 --Random Val's 1 to 1 Billion
FROM dbo.fnTally(1,1000)
)
,cteGenID AS
(
SELECT ID = t.N
FROM dbo.fnTally(1,1000000)t
)
INSERT INTO #Temp WITH (TABLOCK)
(ID, VAL)
SELECT gid.ID
,rv.Val
FROM cteGenID gid
LEFT JOIN cteGenRareVal rv
ON gid.ID = rv.ID
ORDER BY gid.ID
;
--===== Solve the problem and dump the results to a Temp Table
-- to take display time out of the picture.
SET STATISTICS TIME ON
;
SELECT ID
,t.VAL
,NewVal = CAST(SUBSTRING(
MAX(CAST(ID AS BINARY(5)) + CAST(t.VAL AS BINARY(5)))
OVER(ORDER BY t.ID ROWS UNBOUNDED PRECEDING)
,6,5) AS INT)
INTO #Result
FROM #Temp AS t
ORDER BY ID
;
SET STATISTICS TIME OFF
;
GOHere are the STATISTICS from my box... insanely fast code for what it does. Haven't tried the "Quirky Update" method for this but I'm thinking it'll be real close to a dead-heat.
(1000000 rows affected)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 875 ms, elapsed time = 867 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
(1000000 rows affected)
I don't understand what that query is doing?
Is there another query written in a more standard way that produces the same results but is slower?
August 4, 2022 at 8:09 pm
I don't understand what that query is doing?
Is there another query written in a more standard way that produces the same results but is slower?
It creates an million row test table in the #Temp table with ID and mostly null values for the VAL column. The VAL column is randomly populated with 1000 random integer values.
The test simply creates that table and then runs Drew's code against it and dumps the results into the #Results table for verification.
And, yes... my previous solution on a different thread is one of the typical solutions for this. I'll see if I can post that URL. And, apologies... I guess I posted the test on the wrong post.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2022 at 8:15 pm
Yeah... I definitely posted to the wrong thread. Here's the thread I meant to post to. The only thing my post on this thread does is show the performance. It does NOT solve Chrissy321's problem.
https://www.sqlservercentral.com/forums/topic/query-help-299
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 46 total)
You must be logged in to reply to this topic. Login to reply