Find Changed Rows (and the row prior to the changes)

  • Thank you for reading. I have a table (represented by #Events) that holds modifications made to another table. I do have some control over the table structure and indexing. I want to pull all of the change records that were made between two dates.

    The tricky part is to include the previous version of each record, which will usually be found prior to the start date in question.

    The code that I have provided below works. So you can use it to easily see what should be returned. But it's very slow in production.

    Can you think of a better method to pull this data together?

    Thank you in advance.

    -- Production version of this table has 4.5 million rows (roughly 1,000 rows per day)

    -- Primary key is on L4Ident (clustered)

    -- nonclustered index on ProcessDate, LinkRL4

    DROP TABLE dbo.#Events;

    DROP TABLE dbo.#Results;

    CREATE TABLE dbo.#Events (

    L4Ident int IDENTITY(1,1) NOT NULL,

    LinkRL4 int NOT NULL,

    ProcessDate date NOT NULL,

    UserName char(10) NOT NULL,

    SequenceNumber int NOT NULL,

    JobDescription char(10) NOT NULL,

    EntryType char(7) NOT NULL,

    ChangingField char(10) NOT NULL);

    CREATE TABLE dbo.#Results (

    ResultsID int IDENTITY(1,1) NOT NULL,

    L4Ident int NOT NULL,

    LinkRL4 int NOT NULL,

    ProcessDate date NOT NULL,

    UserName char(10) NOT NULL,

    SequenceNumber int NOT NULL,

    JobDescription char(10) NOT NULL,

    EntryType char(7) NOT NULL,

    ChangingField char(10) NOT NULL,

    RowNumber smallint NOT NULL);

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (111, '2014-06-01', 'Michael', 1, 'Daily', 'INSERT', 'GM'),

    (111, '2014-06-09', 'Michael', 12, 'Daily', 'UPDATE', 'GMC'),

    (111, '2014-06-15', 'Rachel', 19, 'Daily', 'UPDATE', 'GMotors'),

    (111, '2014-07-01', 'Rachel', 24, 'Daily', 'UPDATE', 'Motors'),

    (111, '2014-07-05', 'Rachel', 39, 'Daily', 'UPDATE', 'GenMotors');

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (112, '2014-06-04', 'Thomas', 6, 'Daily', 'INSERT', 'MB'),

    (112, '2014-06-07', 'Thomas', 10, 'Daily', 'UPDATE', 'Mercedes'),

    (112, '2014-06-15', 'Thomas', 20, 'Daily', 'UPDATE', 'MBenz'),

    (112, '2014-07-03', 'Michael', 27, 'Daily', 'UPDATE', 'Benz'),

    (112, '2014-07-05', 'Rachel', 40, 'Daily', 'UPDATE', 'Mer Benz');

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (113, '2014-07-06', 'Michael', 55, 'Daily', 'INSERT', 'BMW');

    TRUNCATE TABLE dbo.#Results;

    DECLARE @StartDate date = '2014-06-12',

    @EndDate date = '2014-07-06';

    DECLARE @L4Ident int;

    DECLARE @CurrentLinkRL4 int,

    @MaxLinkRL4 int;

    SELECT @CurrentLinkRL4 = MIN(LinkRL4),

    @MaxLinkRL4 = MAX(LinkRL4)

    FROM dbo.#Events

    WHERE ProcessDate BETWEEN @StartDate AND @EndDate;

    WHILE ((@CurrentLinkRL4 <= @MaxLinkRL4) AND (@CurrentLinkRL4 IS NOT NULL))

    BEGIN

    -- insert all records that match within the date range

    INSERT dbo.#Results (

    L4Ident,

    LinkRL4,

    ProcessDate,

    UserName,

    SequenceNumber,

    JobDescription,

    EntryType,

    ChangingField,

    RowNumber)

    SELECT L4Ident,

    LinkRL4,

    ProcessDate,

    UserName,

    SequenceNumber,

    JobDescription,

    EntryType,

    ChangingField,

    RowNumber = RANK() OVER (PARTITION BY LinkRL4 ORDER BY L4Ident)

    FROM dbo.#Events

    WHERE LinkRL4 = @CurrentLinkRL4

    AND ProcessDate BETWEEN @StartDate AND @EndDate;

    -- find the record immediately prior to the date range

    SELECT @L4Ident = MIN(L4Ident)

    FROM dbo.#Results

    WHERE LinkRL4 = @CurrentLinkRL4;

    SELECT @L4Ident = MAX(L4Ident)

    FROM dbo.#Events

    WHERE LinkRL4 = @CurrentLinkRL4

    AND L4Ident < @L4Ident;

    -- add one record immediately prior to the date range

    INSERT dbo.#Results (

    L4Ident,

    LinkRL4,

    ProcessDate,

    UserName,

    SequenceNumber,

    JobDescription,

    EntryType,

    ChangingField,

    RowNumber)

    SELECT L4Ident,

    LinkRL4,

    ProcessDate,

    UserName,

    SequenceNumber,

    JobDescription,

    EntryType,

    ChangingField,

    RowNumber = 0

    FROM dbo.#Events

    WHERE L4Ident = @L4Ident;

    -- increment loop counter

    SELECT @CurrentLinkRL4 = MIN(LinkRL4)

    FROM dbo.#Events

    WHERE ProcessDate BETWEEN @StartDate AND @EndDate

    AND LinkRL4 > @CurrentLinkRL4;

    END

    SELECT *

    FROM dbo.#Results

    ORDER BY LinkRL4, ProcessDate, RowNumber

  • You could use a single query like below. I have not tested it against a 4.5 million row input set, but it's probably faster then the loop and all the separate queries you are using right now. I recommend to include the column LinkRL4 to the non-clustered index on ProcessDate. If the zero-offset row numbering in case of a previous row outside the data range is not important, remove the CASE expression.

    SELECT

    evt.*,

    RowNumber = ROW_NUMBER() OVER (PARTITION BY evt.LinkRL4 ORDER BY evt.L4Ident) -

    CASE WHEN MIN(evt.ProcessDate) OVER (PARTITION BY evt.LinkRL4) < @StartDate THEN 1 ELSE 0 END

    FROM

    dbo.#Events evt

    WHERE

    evt.ProcessDate <= @EndDate

    AND NOT EXISTS

    (

    SELECT

    *

    FROM

    dbo.#Events evt2

    WHERE

    evt2.LinkRL4 = evt.LinkRL4

    AND evt2.ProcessDate > evt.ProcessDate

    AND evt2.ProcessDate < @StartDate

    )

    ORDER BY

    evt.LinkRL4, evt.ProcessDate, evt.L4Ident

  • Peter, I think you're on right track, but there is a flaw. The initial sample data does not show it, but when used with date range of 6/12 - 7/01 this data does:

    INSERTdbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (111, '2014-06-01', 'Michael', 1, 'Daily', 'INSERT', 'GM'),

    (111, '2014-06-09', 'Michael', 12, 'Daily', 'UPDATE', 'GMC'),

    (111, '2014-06-15', 'Rachel', 19, 'Daily', 'UPDATE', 'GMotors'),

    (111, '2014-07-01', 'Rachel', 24, 'Daily', 'UPDATE', 'Motors'),

    (111, '2014-07-05', 'Rachel', 39, 'Daily', 'UPDATE', 'GenMotors');

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (112, '2014-06-04', 'Thomas', 6, 'Daily', 'INSERT', 'MB'),

    (112, '2014-06-07', 'Thomas', 10, 'Daily', 'UPDATE', 'Mercedes'),

    (112, '2014-06-15', 'Thomas', 20, 'Daily', 'UPDATE', 'MBenz'),

    (112, '2014-07-03', 'Michael', 27, 'Daily', 'UPDATE', 'Benz'),

    (112, '2014-07-05', 'Rachel', 40, 'Daily', 'UPDATE', 'Mer Benz');

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (113, '2014-07-06', 'Michael', 55, 'Daily', 'INSERT', 'BMW');

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (114, '2014-06-01', 'Steve', 4, 'Daily', 'INSERT', 'VW'),

    (114, '2014-06-02', 'Steve', 60, 'Daily', 'UPDATE', 'Volks');

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (115, '2014-06-01', 'Stan', 2, 'Daily', 'INSERT', 'BMW'),

    (115, '2014-06-02', 'Stan', 3, 'Daily', 'UPDATE', 'Beamer'),

    (115, '2014-08-02', 'Stan', 80, 'Daily', 'UPDATE', 'BMW');

    INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)

    VALUES (116, '2014-06-25', 'Lee', 32, 'Daily', 'INSERT', 'BMW'),

    (117, '2014-08-02', 'Lee', 81, 'Daily', 'UPDATE', 'BMW');

    GO

  • I think this works:

    --first part gets events within the date range

    SELECTevt.*

    FROM dbo.#Events evt

    WHERE evt.ProcessDate BETWEEN @StartDate AND @EndDate

    --second part gets related events that occurred before

    UNION ALL

    SELECTPrior.*

    FROM (SELECT LinkRL4, MIN(ProcessDate) AS ProcessDate FROM dbo.#Events evt WHERE ProcessDate BETWEEN @StartDate AND @EndDate GROUP BY LinkRL4) AS MinDate

    CROSS APPLY (

    SELECT TOP 1 evt.*

    FROM dbo.#Events evt

    WHERE evt.LinkRL4 = MinDate.LinkRL4

    AND evt.ProcessDate < MinDate.ProcessDate

    ORDER BY evt.ProcessDate DESC) as Prior

    ORDER BY LinkRL4, ProcessDate, L4Ident

  • I'd also create the best indexes while re-working the code.

    #Events indexes:

    1) Unique Clustered on ( ProcessDate, L4Ident ) --in that order

    2) Nonclustered on ( L4Ident )

    3) NNonclustered on ( LinkRL4, L4Ident ) Include ( ProcessDate ) --yes, processdate would be auto-included anyway, but I like to explicitly specify it anyway, in case the clus index changes later.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply