Comments posted to this topic are about the item Using OVER() to Fix Bad Version Numbers
Will keep this one in mind - but my hunch is, this won't sort out a situation where some field had a version 1 and two version 3s...
Thanks for taking the time to read the article. When you have (as in your example), a v1 and two v3 values, all you can do is make a best guess at which of the V3 values is the latest. When I fixed up the system the article talks about, I made sure the ORDER BY in the ROW_NUMBER() call ordered by start date. This should pick up the latest value most of the time, you'll have to go with the row returned by SQL Server if two rows have the same version number and start date (unless you can more ordering criteria to differentiate the rows).
It might be a better idea to generate the version numbers dynamically in SQL Server (either on INSERT or via computed column), which if done properly should prevent this problem from occurring.
Right there with Babe
I know it's dummy data, but what would you suggest to fix the inconsistency in the end date field? As it stands, it is u/s and would be better dropped.
You can update a CTE directly, so there's no need for the #BadRecords temp table and the MULTIPLES and BAD sub-queries:
WITH CTE AS(
SELECT ContactId, FieldTypeId, FieldVersion, StartDate,
ROW_NUMBER() OVER (PARTITION BY ContactId, FieldTypeId
ORDER BY StartDate) AS CorrectVersion
SET FieldVersion = CorrectVersion
WHERE FieldVersion != CorrectVersion;
You can easily put a StartDate and EndDate (as well as a current flag) into what you call the traditional row model of a data warehouse. I can't imagine how the attribute-based model you describe could ever be performant, especially when dealing with a large financial institution that would have tons of data.
Thanks for the refresher on the OVER() clause.
Hi Sam - yes, your method is valid too. There are plenty of ways to do this kind of thing, I'm just trying to give an overview of how the OVER() clause works.
David - again, I agree with you. And yes, the structure used was not particularly performant. But it was in place before I started working with the company, so the only option was to work (and fix) what was already there.
With a corrected Version Number, it's relatively straightforward to set the end date of the current row to the start date of the following row.
FieldVersion: 3, StartDate: 2016-04-06 09:00, EndDate: 2016-08-20 00:32
FieldVersion: 4, StartDate: 2016-04-06 10:00, EndDate: NULL
You want to set the EndDate of FieldVersion 3 to 2016-04-06 10:00. This code will hopefully start you off (worked fine for the sample records I used):
SELECT ContactId, FieldId, FieldTypeId, FieldVersion, FieldVersion - 1 AS PreviousFieldVersion, StartDate, EndDate
SELECT CF.ContactId, CF.FieldId, CF.FieldVersion, CF.StartDate, CF.EndDate AS CurrentEndDate,
MV.StartDate AS NewEndDate
FROM Scd.ContactFields CF
LEFT JOIN #MultiVersions MV
ON MV.ContactId = CF.ContactId AND MV.FieldTypeId = CF.FieldTypeId
AND MV.PreviousFieldVersion = CF.FieldVersion
ORDER BY CF.ContactId, CF.FieldTypeId, CF.FieldVersion
Viewing 9 posts - 1 through 8 (of 8 total)