Obtaining earliest change for one colum?

  • Guys,

    I've been looking at this problem on and off for a while and I can't for the life of me find a solution. Basically we have an audit system where by every time someone changes one field in one table it appends the entire row into another table, a slight simplification but you get the idea.

    What I want to get is a historical record for one field over time - so say this was a list of individuals and it had their wealth in dollars, their job title, their address, their age etc. It may be that their wealth in dollars didn't change over the course of say 10 rows in the table but other variables did change, I'm trying to build up a picture of their wealth changing. Totally random example but I thought it might help!

    With this test data:

    SELECT * INTO #test

    FROM

    (

    SELECT 1 as Audit_ID,123 as EntID,100 as Field_A,10 as Field_B,'01-Jan-12' as ActionDate UNION

    SELECT 2 as Audit_ID,123 as EntID,110 as Field_A,10 as Field_B,'15-Feb-12' as ActionDate UNION

    SELECT 3 as Audit_ID,123 as EntID,150 as Field_A,15 as Field_B,'20-Mar-12' as ActionDate UNION

    SELECT 4 as Audit_ID,123 as EntID,100 as Field_A,10 as Field_B,'12-Jun-12' as ActionDate UNION

    SELECT 5 as Audit_ID,445 as EntID,50 as Field_A,4 as Field_B,'02-Apr-12' as ActionDate UNION

    SELECT 6 as Audit_ID,445 as EntID,60 as Field_A,3 as Field_B,'15-May-12' as ActionDate UNION

    SELECT 7 as Audit_ID,445 as EntID,30 as Field_A,3 as Field_B,'18-Aug-12' as ActionDate UNION

    SELECT 8 as Audit_ID,552 as EntID,500 as Field_A,1 as Field_B,'10-Jan-12' as ActionDate UNION

    SELECT 9 as Audit_ID,552 as EntID,450 as Field_A,3 as Field_B,'15-Mar-12' as ActionDate UNION

    SELECT 10 as Audit_ID,552 as EntID,320 as Field_A,4 as Field_B,'29-Aug-12' as ActionDate

    )x

    Say I was trying to track field B over time for Entity (EntID) 123, I'd want a results set like this:

    01-Jan-12 10

    20-Mar-12 15

    12-Jun-12 10

    I thought I could do some kind of ranking whereby I partitioned on the field I was interested on and ranked on date, and chose those with rank 1, however, this doesn't take account for the same value at different time points - e.g. using the wealth example perhaps in Jan 2010 I had $200, in Feb 2010 this went up to 350 but then it fell to $200 again in September 2010, with my ranking idea I'd just have Feb 2010 and 200, not September too.

    I thought about some form of 'double join' type thing along hte lines of Field B <> Field B, I've also got pratically no experience of them but for some reason I'm wondering if a CTE is called for?!

    Hoepfully I've explained this not too bad and my test data gives you kind of an idea!

    Thanks for reading 🙂

  • You were on the right track with the self join

    SELECT t1.ActionDate, t1.Field_B

    FROM #test t1

    LEFT JOIN #test t2 ON t1.Audit_ID = t2.Audit_ID-1

    WHERE t1.Field_B != t2.Field_B OR t2.Audit_ID IS NULL

  • Forgot the entity chunk. Yes you will need a CTE and a Row_Number function.

    [Code]

    DECLARE @EntityID INT = 123;

    WITH cte

    AS (

    SELECT row_Number() OVER (ORDER BY Audit_ID) AS RN

    ,t.Field_B

    ,t.ActionDate

    FROM #test t

    WHERE @EntityID = t.EntID

    )

    SELECT t1.ActionDate

    ,t1.Field_B

    FROM cte t1

    LEFT JOIN cte t2 ON t1.RN = t2.RN - 1

    WHERE t1.Field_B != t2.Field_B

    OR t2.RN IS NULL

    [/code]

  • Here is another option:

    SELECT

    Audit_ID,

    EntID,

    Field_A,

    Field_B,

    ActionDate,

    rn3 = ROW_NUMBER() OVER (PARTITION BY EntID ORDER BY ActionDate) - ROW_NUMBER() OVER (PARTITION BY EntID, Field_B ORDER BY ActionDate)

    INTO #test

    FROM

    (

    SELECT 1 as Audit_ID, 123 as EntID, 100 as Field_A, 10 as Field_B, CAST('01-Jan-12' AS DATETIME) as ActionDate UNION

    SELECT 2 as Audit_ID, 123 as EntID, 110 as Field_A, 10 as Field_B, CAST('15-Feb-12' AS DATETIME) as ActionDate UNION

    SELECT 3 as Audit_ID, 123 as EntID, 150 as Field_A, 15 as Field_B, CAST('20-Mar-12' AS DATETIME) as ActionDate UNION

    SELECT 4 as Audit_ID, 123 as EntID, 100 as Field_A, 10 as Field_B, CAST('12-Jun-12' AS DATETIME) as ActionDate UNION

    SELECT 5 as Audit_ID, 445 as EntID, 50 as Field_A, 4 as Field_B, CAST('02-Apr-12' AS DATETIME) as ActionDate UNION

    SELECT 6 as Audit_ID, 445 as EntID, 60 as Field_A, 3 as Field_B, CAST('15-May-12' AS DATETIME) as ActionDate UNION

    SELECT 7 as Audit_ID, 445 as EntID, 30 as Field_A, 3 as Field_B, CAST('18-Aug-12' AS DATETIME) as ActionDate UNION

    SELECT 8 as Audit_ID, 552 as EntID, 500 as Field_A, 1 as Field_B, CAST('10-Jan-12' AS DATETIME) as ActionDate UNION

    SELECT 9 as Audit_ID, 552 as EntID, 450 as Field_A, 3 as Field_B, CAST('15-Mar-12' AS DATETIME) as ActionDate UNION

    SELECT 10 as Audit_ID, 552 as EntID, 320 as Field_A, 4 as Field_B, CAST('29-Aug-12' AS DATETIME) as ActionDate

    )x

    GO

    WITH BaseData AS (

    SELECT

    Audit_ID,

    EntID,

    Field_A,

    Field_B,

    ActionDate,

    rn3 = ROW_NUMBER() OVER (PARTITION BY EntID ORDER BY ActionDate) - ROW_NUMBER() OVER (PARTITION BY EntID, Field_B ORDER BY ActionDate)

    FROM

    #test

    ), IntermediateData AS (

    SELECT

    *,

    rn = ROW_NUMBER() OVER (PARTITION BY EntID, rn3 ORDER BY ActionDate)

    FROM

    BaseData

    )

    SELECT

    *

    FROM

    IntermediateData

    WHERE

    rn = 1

    ORDER BY

    EntID,

    ActionDate;

    GO

    DROP TABLE #test;

    GO

  • Thanks for the responses guys, Lynn, I went with your method in the end, it's exactly what I was looking for!

    I'm glad I was on the right track knowing some form of ranking and or self joins and CTEs were called for, I just couldn't quite get it down. When I get some time (who know's when that'll be!) i really need to get stuck into some CTE testing to see what they can do.

    Thanks again 🙂

  • Here's an alternative with no CTE or ranking function:

    SELECT

    s.*,

    y.ReasonToKeep

    FROM Sampledata s -- CI scan

    OUTER APPLY ( -- CI seek possible

    SELECT TOP 1 *

    FROM Sampledata i

    WHERE i.EntID = s.EntID

    AND i.ActionDate < s.ActionDate

    ORDER BY i.EntID, i.ActionDate DESC) x

    CROSS APPLY ( -- compute scalar

    SELECT

    ReasonToKeep = CASE

    WHEN x.Field_B IS NULL THEN 'FIRST'

    WHEN x.Field_B <> s.Field_B THEN 'CHANGED'

    END

    ) y

    WHERE y.ReasonToKeep IS NOT NULL

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • The CTE and NoCTE versions are comparable on a small data set. I'd like to try them both on a million row data set and see how they compare.

    Perhaps this evening I will have a chance to set it up.

  • Lynn Pettis (4/12/2013)


    The CTE and NoCTE versions are comparable on a small data set. I'd like to try them both on a million row data set and see how they compare.

    Perhaps this evening I will have a chance to set it up.

    Heh if it's a race Lynn, I reserve the right to change shoes!!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I'd like to have time to try the above non CTE approach, time doesn't allow at the moment but hopefully at some point.

    FYI the #test replica on my 'real' data yields about 600,000 rows so it's not an insignificant data set, however, it's probably quite a rare request (well, I've not had to dig into these tables for this info ever and we've had them on the go since 2008!)

  • Rob-350472 (4/12/2013)


    I'd like to have time to try the above non CTE approach, time doesn't allow at the moment but hopefully at some point.

    FYI the #test replica on my 'real' data yields about 600,000 rows so it's not an insignificant data set, however, it's probably quite a rare request (well, I've not had to dig into these tables for this info ever and we've had them on the go since 2008!)

    I know from recent experiments with similar Gaps'n'Islands solutions that the query I posted will work well providing there's an index to support seeks where shown - preferably unique. It performs best when the number of rows returned is significantly less than the total number of rows in the table and works well over many millions of rows.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply