Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Find Changed Rows (and the row prior to the changes) Expand / Collapse
Author
Message
Posted Tuesday, July 22, 2014 6:22 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, July 28, 2014 10:27 AM
Points: 5, Visits: 37
Thank you for reading. I have a table (represented by #Events) that holds modifications made to another table. I do have some control over the table structure and indexing. I want to pull all of the change records that were made between two dates.

The tricky part is to include the previous version of each record, which will usually be found prior to the start date in question.

The code that I have provided below works. So you can use it to easily see what should be returned. But it's very slow in production.

Can you think of a better method to pull this data together?

Thank you in advance.



-- Production version of this table has 4.5 million rows (roughly 1,000 rows per day)
-- Primary key is on L4Ident (clustered)
-- nonclustered index on ProcessDate, LinkRL4
DROP TABLE dbo.#Events;
DROP TABLE dbo.#Results;

CREATE TABLE dbo.#Events (
L4Ident int IDENTITY(1,1) NOT NULL,
LinkRL4 int NOT NULL,
ProcessDate date NOT NULL,
UserName char(10) NOT NULL,
SequenceNumber int NOT NULL,
JobDescription char(10) NOT NULL,
EntryType char(7) NOT NULL,
ChangingField char(10) NOT NULL);

CREATE TABLE dbo.#Results (
ResultsID int IDENTITY(1,1) NOT NULL,
L4Ident int NOT NULL,
LinkRL4 int NOT NULL,
ProcessDate date NOT NULL,
UserName char(10) NOT NULL,
SequenceNumber int NOT NULL,
JobDescription char(10) NOT NULL,
EntryType char(7) NOT NULL,
ChangingField char(10) NOT NULL,
RowNumber smallint NOT NULL);

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (111, '2014-06-01', 'Michael', 1, 'Daily', 'INSERT', 'GM'),
(111, '2014-06-09', 'Michael', 12, 'Daily', 'UPDATE', 'GMC'),
(111, '2014-06-15', 'Rachel', 19, 'Daily', 'UPDATE', 'GMotors'),
(111, '2014-07-01', 'Rachel', 24, 'Daily', 'UPDATE', 'Motors'),
(111, '2014-07-05', 'Rachel', 39, 'Daily', 'UPDATE', 'GenMotors');

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (112, '2014-06-04', 'Thomas', 6, 'Daily', 'INSERT', 'MB'),
(112, '2014-06-07', 'Thomas', 10, 'Daily', 'UPDATE', 'Mercedes'),
(112, '2014-06-15', 'Thomas', 20, 'Daily', 'UPDATE', 'MBenz'),
(112, '2014-07-03', 'Michael', 27, 'Daily', 'UPDATE', 'Benz'),
(112, '2014-07-05', 'Rachel', 40, 'Daily', 'UPDATE', 'Mer Benz');

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (113, '2014-07-06', 'Michael', 55, 'Daily', 'INSERT', 'BMW');




TRUNCATE TABLE dbo.#Results;


DECLARE @StartDate date = '2014-06-12',
@EndDate date = '2014-07-06';

DECLARE @L4Ident int;

DECLARE @CurrentLinkRL4 int,
@MaxLinkRL4 int;

SELECT @CurrentLinkRL4 = MIN(LinkRL4),
@MaxLinkRL4 = MAX(LinkRL4)
FROM dbo.#Events
WHERE ProcessDate BETWEEN @StartDate AND @EndDate;

WHILE ((@CurrentLinkRL4 <= @MaxLinkRL4) AND (@CurrentLinkRL4 IS NOT NULL))
BEGIN

-- insert all records that match within the date range
INSERT dbo.#Results (
L4Ident,
LinkRL4,
ProcessDate,
UserName,
SequenceNumber,
JobDescription,
EntryType,
ChangingField,
RowNumber)
SELECT L4Ident,
LinkRL4,
ProcessDate,
UserName,
SequenceNumber,
JobDescription,
EntryType,
ChangingField,
RowNumber = RANK() OVER (PARTITION BY LinkRL4 ORDER BY L4Ident)
FROM dbo.#Events
WHERE LinkRL4 = @CurrentLinkRL4
AND ProcessDate BETWEEN @StartDate AND @EndDate;

-- find the record immediately prior to the date range
SELECT @L4Ident = MIN(L4Ident)
FROM dbo.#Results
WHERE LinkRL4 = @CurrentLinkRL4;

SELECT @L4Ident = MAX(L4Ident)
FROM dbo.#Events
WHERE LinkRL4 = @CurrentLinkRL4
AND L4Ident < @L4Ident;

-- add one record immediately prior to the date range
INSERT dbo.#Results (
L4Ident,
LinkRL4,
ProcessDate,
UserName,
SequenceNumber,
JobDescription,
EntryType,
ChangingField,
RowNumber)
SELECT L4Ident,
LinkRL4,
ProcessDate,
UserName,
SequenceNumber,
JobDescription,
EntryType,
ChangingField,
RowNumber = 0
FROM dbo.#Events
WHERE L4Ident = @L4Ident;

-- increment loop counter
SELECT @CurrentLinkRL4 = MIN(LinkRL4)
FROM dbo.#Events
WHERE ProcessDate BETWEEN @StartDate AND @EndDate
AND LinkRL4 > @CurrentLinkRL4;
END


SELECT *
FROM dbo.#Results
ORDER BY LinkRL4, ProcessDate, RowNumber

Post #1595318
Posted Wednesday, July 23, 2014 4:48 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Thursday, October 9, 2014 9:46 AM
Points: 1,598, Visits: 6,646
You could use a single query like below. I have not tested it against a 4.5 million row input set, but it's probably faster then the loop and all the separate queries you are using right now. I recommend to include the column LinkRL4 to the non-clustered index on ProcessDate. If the zero-offset row numbering in case of a previous row outside the data range is not important, remove the CASE expression.

SELECT
evt.*,
RowNumber = ROW_NUMBER() OVER (PARTITION BY evt.LinkRL4 ORDER BY evt.L4Ident) -
CASE WHEN MIN(evt.ProcessDate) OVER (PARTITION BY evt.LinkRL4) < @StartDate THEN 1 ELSE 0 END
FROM
dbo.#Events evt
WHERE
evt.ProcessDate <= @EndDate
AND NOT EXISTS
(
SELECT
*
FROM
dbo.#Events evt2
WHERE
evt2.LinkRL4 = evt.LinkRL4
AND evt2.ProcessDate > evt.ProcessDate
AND evt2.ProcessDate < @StartDate
)
ORDER BY
evt.LinkRL4, evt.ProcessDate, evt.L4Ident

Post #1595411
Posted Thursday, July 31, 2014 12:09 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, October 17, 2014 1:37 PM
Points: 321, Visits: 822
Peter, I think you're on right track, but there is a flaw. The initial sample data does not show it, but when used with date range of 6/12 - 7/01 this data does:

INSERT	dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (111, '2014-06-01', 'Michael', 1, 'Daily', 'INSERT', 'GM'),
(111, '2014-06-09', 'Michael', 12, 'Daily', 'UPDATE', 'GMC'),
(111, '2014-06-15', 'Rachel', 19, 'Daily', 'UPDATE', 'GMotors'),
(111, '2014-07-01', 'Rachel', 24, 'Daily', 'UPDATE', 'Motors'),
(111, '2014-07-05', 'Rachel', 39, 'Daily', 'UPDATE', 'GenMotors');

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (112, '2014-06-04', 'Thomas', 6, 'Daily', 'INSERT', 'MB'),
(112, '2014-06-07', 'Thomas', 10, 'Daily', 'UPDATE', 'Mercedes'),
(112, '2014-06-15', 'Thomas', 20, 'Daily', 'UPDATE', 'MBenz'),
(112, '2014-07-03', 'Michael', 27, 'Daily', 'UPDATE', 'Benz'),
(112, '2014-07-05', 'Rachel', 40, 'Daily', 'UPDATE', 'Mer Benz');

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (113, '2014-07-06', 'Michael', 55, 'Daily', 'INSERT', 'BMW');

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (114, '2014-06-01', 'Steve', 4, 'Daily', 'INSERT', 'VW'),
(114, '2014-06-02', 'Steve', 60, 'Daily', 'UPDATE', 'Volks');

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (115, '2014-06-01', 'Stan', 2, 'Daily', 'INSERT', 'BMW'),
(115, '2014-06-02', 'Stan', 3, 'Daily', 'UPDATE', 'Beamer'),
(115, '2014-08-02', 'Stan', 80, 'Daily', 'UPDATE', 'BMW');

INSERT dbo.#Events (LinkRL4, ProcessDate, UserName, SequenceNumber, JobDescription, EntryType, ChangingField)
VALUES (116, '2014-06-25', 'Lee', 32, 'Daily', 'INSERT', 'BMW'),
(117, '2014-08-02', 'Lee', 81, 'Daily', 'UPDATE', 'BMW');
GO




Post #1598423
Posted Thursday, July 31, 2014 12:10 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, October 17, 2014 1:37 PM
Points: 321, Visits: 822
I think this works:

--first part gets events within the date range
SELECT evt.*
FROM dbo.#Events evt
WHERE evt.ProcessDate BETWEEN @StartDate AND @EndDate

--second part gets related events that occurred before
UNION ALL
SELECT Prior.*
FROM (SELECT LinkRL4, MIN(ProcessDate) AS ProcessDate FROM dbo.#Events evt WHERE ProcessDate BETWEEN @StartDate AND @EndDate GROUP BY LinkRL4) AS MinDate
CROSS APPLY (
SELECT TOP 1 evt.*
FROM dbo.#Events evt
WHERE evt.LinkRL4 = MinDate.LinkRL4
AND evt.ProcessDate < MinDate.ProcessDate
ORDER BY evt.ProcessDate DESC) as Prior
ORDER BY LinkRL4, ProcessDate, L4Ident




Post #1598425
Posted Friday, August 1, 2014 11:25 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 4:18 PM
Points: 2,192, Visits: 3,299
I'd also create the best indexes while re-working the code.

#Events indexes:
1) Unique Clustered on ( ProcessDate, L4Ident ) --in that order
2) Nonclustered on ( L4Ident )
3) NNonclustered on ( LinkRL4, L4Ident ) Include ( ProcessDate ) --yes, processdate would be auto-included anyway, but I like to explicitly specify it anyway, in case the clus index changes later.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1598779
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse