February 13, 2017 at 7:16 am
Hi all, I can't think of a fast way of doing this. I would like a query which updates a date column whenever status becomes T, and for all future rows, for that Id (by Date1, ascending, in the code which follows). The complication is that if there is a later row with status T, that date should be used for all subsequent rows, unless there is another T (in which case, use that row's date) and so on.
Here is some setup code:
IF OBJECT_ID('tempdb..#SomeTab', 'U') IS NOT NULL
  DROP TABLE #SomeTab;
CREATE TABLE #SomeTab
(
  SomeId INT 
, Date1 DATETIME
, Stat CHAR(1)
, Date2 DATETIME
);
INSERT #SomeTab
(
  SomeId
, Date1
, Stat
, Date2
)
VALUES
(
  1
  ,'20150101'
  ,'A'
  ,NULL
)
,(
  1
,  '20150201'
,  'A'
,  NULL
 )
,(
  1
,  '20150301'
,  'T'
,  NULL
 )
,(
  1
,  '20150401'
,  'A'
,  NULL
 )
,(
  1
,  '20150501'
,  'A'
,  NULL
 )
,(
  1
,  '20150601'
,  'T'
,  NULL
 )
,(
  1
,  '20150701'
,  'A'
,  NULL
 );
SELECT *
FROM #SomeTab st
ORDER BY st.SomeId, st.Date1
And here is how I would like the results to look:
IF OBJECT_ID('tempdb..#Results', 'U') IS NOT NULL
  DROP TABLE #Results;
CREATE TABLE #Results
(
  SomeId INT 
, Date1 DATETIME
, Stat CHAR(1)
, Date2 DATETIME
);
INSERT #Results
(
  SomeId
, Date1
, Stat
, Date2
)
VALUES
(
  1
  ,'20150101'
  ,'A'
  ,NULL
)
,(
  1
,  '20150201'
,  'A'
,  NULL
 )
,(
  1
,  '20150301'
,  'T'
, '20150301'
 )
,(
  1
,  '20150401'
,  'A'
,  '20150301'
 )
,(
  1
,  '20150501'
,  'A'
,  '20150301'
 )
,(
  1
,  '20150601'
,  'T'
,  '20150601'
 )
,(
  1
,  '20150701'
,  'A'
,  '20150601'
 );
 SELECT * FROM #Results r
 ORDER BY r.SomeId, r.Date1
--Edit: The datetimes should be dates in the code above.
February 13, 2017 at 8:02 am
This is similar to a problem I had last week. The TOP 1 option was the answer when looking for a "sliding" result.
Don't know if this fulfills your requirement that it be fast, but it does work. And it may point you in a new direction that may lead to fast.
with cte as
(
 select SomeID, Date1, Stat 
  from #someTab
  where Stat = 'T'
  
)
 Update st SET
 st.Date2 = (Select top 1 c.Date1
      from cte c
      where st.Date1 >= c.Date1
      and st.someID = c.someID
      order by c.Date1 desc
     )
 from #someTab st;
select * from #someTab;
__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/
February 13, 2017 at 8:41 am
This doesn't use a correlated subquery, so it should perform better, especially if you have an index on Date1. Parts of it are shamelessly borrowed from this article.
WITH RangeStarts AS (
    SELECT
        SomeId
    ,    Date1
    ,    Stat
    ,    CASE
            WHEN Stat = 'A' THEN NULL 
            ELSE ROW_NUMBER() OVER (PARTITION BY SomeId, Stat ORDER BY Date1)
        END AS PtnNo
    FROM #SomeTab
    )
, Partitioned AS (
    SELECT
        SomeId
    ,    Date1
    ,    Stat
    ,    COUNT(PtnNo) OVER (PARTITION BY SomeId ORDER BY Date1 ROWS UNBOUNDED PRECEDING) AS PtnNo
    FROM RangeStarts
    )
SELECT
    SomeId
,    Date1
,    Stat
,    CASE
        WHEN PtnNo = 0 THEN NULL
        ELSE FIRST_VALUE(Date1) OVER (PARTITION BY SomeId, PtnNo ORDER BY Date1)
    END AS Date2
FROM Partitioned
ORDER BY Date1
John
February 13, 2017 at 9:27 am
Yet another option.
WITH OnlyTs AS(
  SELECT Date1, LEAD(Date1, 1, '9999') OVER(ORDER BY Date1) NextDate
  FROM #SomeTab
  WHERE Stat = 'T'
)
--SELECT *
UPDATE st SET
  Date2 = t.Date1
FROM #SomeTab st
JOIN OnlyTs t ON st.Date1 >= t.Date1 AND st.Date1 < t.NextDate;
February 13, 2017 at 12:00 pm
Thanks very much to those who responded, I appreciate the help.
I've ended up with a somewhat more complex version of Luis' query, which performed the best on my data. My version was more complex because I forgot to mention the special case where multiple rows (in Date1 order) have status 'T'. In these cases, I need to set the date to the date of the first occurrence of the status T cluster. I did this by introducing another CTE. The final version of my query looks like this:
WITH BaseData
AS
(
  SELECT
   r.PolicyNumber
  ,  r.ValuationDate
  ,  PrevStatus = LAG(r.PolicyStatus, 1, '_') OVER (PARTITION BY r.PolicyNumber
                   ORDER BY r.ValuationDate
                   )
  ,  r.PolicyStatus
  FROM #Results r
)
,  OnlyTs
AS
(
  SELECT
   bd.PolicyNumber
  ,  bd.ValuationDate
  ,  NextDate = LEAD(bd.ValuationDate, 1, '9999') OVER (PARTITION BY bd.PolicyNumber
                    ORDER BY bd.ValuationDate
                    )
  FROM BaseData bd
  WHERE
   bd.PolicyStatus = 'T' AND
   bd.PrevStatus <> 'T'
)
UPDATE r
SET  r.DecrementDate = t.ValuationDate
FROM
   #Results r
JOIN OnlyTs t ON r.ValuationDate >= t.ValuationDate AND
       r.ValuationDate < t.NextDate AND
       r.PolicyNumber = t.PolicyNumber;
This updates approximately 1 million rows of a 6.5 million row temp table in 50 seconds or so.
February 13, 2017 at 4:14 pm
The problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
;
WITH T_Dates AS
(
    SELECT *,
        CASE
            WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
            WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
        END AS t_date
    FROM #SomeTab st
)
SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
FROM T_Dates
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 13, 2017 at 4:47 pm
drew.allen - Monday, February 13, 2017 4:14 PMThe problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
;
WITH T_Dates AS
(
SELECT *,
CASE
WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
END AS t_date
FROM #SomeTab st
)
SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
FROM T_Dates
Looks like I have more testing to do tomorrow, thanks, Drew!
February 14, 2017 at 7:13 am
Phil Parkin - Monday, February 13, 2017 4:47 PMdrew.allen - Monday, February 13, 2017 4:14 PMThe problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
;
WITH T_Dates AS
(
SELECT *,
CASE
WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
END AS t_date
FROM #SomeTab st
)
SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
FROM T_DatesLooks like I have more testing to do tomorrow, thanks, Drew!
While the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:
WITH T_Dates
AS
(
  SELECT
   r.PolicyNumber
  ,  r.ValuationDate
  ,  r.PolicyStatus
  ,  t_date = CASE
        WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
                  ORDER BY r.ValuationDate
                  ) = 'T' THEN
         NULL
        WHEN r.PolicyStatus = 'T' THEN
         r.ValuationDate
      END
  FROM #Results r
)
,  DecDates
AS
(
  SELECT
   T_Dates.PolicyNumber
  ,  T_Dates.ValuationDate
  ,  T_Dates.PolicyStatus
  ,  DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
                 ORDER BY T_Dates.ValuationDate
                 ROWS UNBOUNDED PRECEDING
                 )
  FROM T_Dates
)
UPDATE r
SET  r.DecrementDate = DecDates.DecrementDate
FROM
   #Results r
JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND
       r.ValuationDate = DecDates.ValuationDate;
In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.
February 14, 2017 at 7:42 am
Phil Parkin - Tuesday, February 14, 2017 7:13 AMWhile the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:
WITH T_Dates
AS
(
SELECT
r.PolicyNumber
, r.ValuationDate
, r.PolicyStatus
, t_date = CASE
WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
ORDER BY r.ValuationDate
) = 'T' THEN
NULL
WHEN r.PolicyStatus = 'T' THEN
r.ValuationDate
END
FROM #Results r
)
, DecDates
AS
(
SELECT
T_Dates.PolicyNumber
, T_Dates.ValuationDate
, T_Dates.PolicyStatus
, DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
ORDER BY T_Dates.ValuationDate
ROWS UNBOUNDED PRECEDING
)
FROM T_Dates
)
UPDATE r
SET r.DecrementDate = DecDates.DecrementDate
FROM
#Results r
JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND
r.ValuationDate = DecDates.ValuationDate;
In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.
You can update through a CTE, you don't need the join back to the table at the end.  You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times.  Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte.  Just make sure you are only affecting columns from a single source table. WITH T_Dates
AS
(
  SELECT
   r.PolicyNumber
  ,  r.ValuationDate
  ,  r.PolicyStatus
  , r.DecrementDate
  ,  t_date = CASE
        WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
                   ORDER BY r.ValuationDate
                  ) = 'T' THEN
          NULL
        WHEN r.PolicyStatus = 'T' THEN
          r.ValuationDate
      END
  FROM #Results r
)
,  DecDates
AS
(
  SELECT
   T_Dates.PolicyNumber
  ,  T_Dates.ValuationDate
  ,  T_Dates.PolicyStatus
  , T_Dates.DecrementDate
  ,  NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
                 ORDER BY T_Dates.ValuationDate
                 ROWS UNBOUNDED PRECEDING
                  )
  FROM T_Dates
)
UPDATE DecDates
SET  DecrementDate = NewDecrementDate
Wes
(A solid design is always preferable to a creative workaround)
February 14, 2017 at 8:00 am
whenriksen - Tuesday, February 14, 2017 7:42 AMYou can update through a CTE, you don't need the join back to the table at the end. You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times. Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte. Just make sure you are only affecting columns from a single source table.
WITH T_Dates
AS
(
SELECT
r.PolicyNumber
, r.ValuationDate
, r.PolicyStatus
, r.DecrementDate
, t_date = CASE
WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
ORDER BY r.ValuationDate
) = 'T' THEN
NULL
WHEN r.PolicyStatus = 'T' THEN
r.ValuationDate
END
FROM #Results r
)
, DecDates
AS
(
SELECT
T_Dates.PolicyNumber
, T_Dates.ValuationDate
, T_Dates.PolicyStatus
, T_Dates.DecrementDate
, NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
ORDER BY T_Dates.ValuationDate
ROWS UNBOUNDED PRECEDING
)
FROM T_Dates
)
UPDATE DecDates
SET DecrementDate = NewDecrementDate
This version works well & produces a nice clean non-parallel plan. Thanks for helping (and reminding me about updateable CTEs).
February 14, 2017 at 9:12 am
So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2017 at 9:44 am
Jeff Moden - Tuesday, February 14, 2017 9:12 AMSo how long does the update through the CTE take for 1 million out of the 6.5 million rows now?
I might know where this is going.
February 14, 2017 at 10:01 am
Jeff Moden - Tuesday, February 14, 2017 9:12 AMSo how long does the update through the CTE take for 1 million out of the 6.5 million rows now?
Takes around 38 seconds.
Out of interest, if I add a WHERE DecDates.DecrementDate <> DecDates.NewDecrementDate filter to the update, it takes around 12 seconds on second and subsequent iterations.
Viewing 13 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply