Type 2 SCD

Question

Type 2 SCD

schleep

SSChampion

Points: 12582
More actions
August 25, 2015 at 12:10 pm

#320879

So it looks like we're going to re-do our OLTP database (180+ tables) using at Type 2 SCD approach.
Currently in our model, every table has a surrogate PK and at least one natural key enforced by a unique constraint. In general, on entity tables, the clustered index is on the PK; on join tables it's on the unique natural key.
All I/U/D is done via stored procedures.
I've read a bit about Type 2, here and elsewhere. What I'd like to know is, what/where are the "Gotcha"s, and the benefits?
- Some have mentioned the performance hit. I'm currently gathering I/U/D stats, but I'm not convinced that this is going to be an issue. Our biggest table, and likely the busiest, is only 900K rows. Next is 500K.
- I read another thread where it was suggested that the clustered index should be on the effective date and the primary key.
Should the expired date, which defaults to 12-31-9999 so I don't have to contend with NULLs, be included in the PK, or a separate NC index? Or a straight surrogate key PK, non-clustered, with clustered on the natural key and date field(s)?
- Our users also want to be able to future-date batches of I/U/D, so I'd rolled-my-own batch implementation system -- which seemed to be working alright, although we'd not progressed to testing at scale -- and then this new requirement came along. Anyone have any insight on this?
I've seen some implementations where there is a column indicating which rows are current, I can see that being useful. What's the best way of maintaining that?
I suspect the devil is going to be in the details.
Some of the advantages I can think of are,
- no need for a separate audit database, which should really speed up I/U/D.
- users can see the state of the DB at any point in time
I'll be happy to hear any thoughts, ideas, advice, warnings, etc...
Thanks

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Alvin Ramard SSC-Forever Points: 41190 More actions · Answer 1

You're talking about redoing an OLTP database using Type 2 SCD?

Type 2 SCD is something that is normally used in Reporting database/Data Warehouse. I've never heard of it in an OLTP environment.

Keeping a history of changes, as type 2 SCD deals with, is something that can apply to an OLTP system, but be careful about mixing up your environments.

Alvin Ramard
Memphis PASS Chapter[/url]

All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

schleep SSChampion Points: 12582 More actions · Answer 2

Hmmm. I see I've used the wrong term: I guess that should be "Temporal Database".

Although to be honest, I can't really see how they differ... tables have valid-from and valid-to columns...?

ZZartin SSC-Dedicated Points: 30894 More actions · Answer 3

schleep (8/26/2015)
Hmmm. I see I've used the wrong term: I guess that should be "Temporal Database".
Although to be honest, I can't really see how they differ... tables have valid-from and valid-to columns...?

An SCD would more typically be used in a datawarehouse where every single row you add to a table has the SCD logic applied to it, whatever SCD type that might be.

There's nothing wrong with having effective from/to columns in an OLTP database but they would typically be used in more of a functional manner as opposed to a reporting manner, for example a product table with effective from to dates that define when a product becomes available and when it stops being available.

schleep SSChampion Points: 12582 More actions · Answer 4

Where is gets complicated: our DB is effectively both OLTP and DSS/warehouse. For many years, we've been able to segregate the roles on separate servers, by refreshing the DSS server nightly from the OLTP. That is not to be the case going forward.

Our OLTP users have been asking for the ability to have information become "current" at some future date, while other information "expires" at some other future date.

Our DSS users now need to see what's "current" at any given moment -- a 24 hr lag is no longer acceptable; and they want to be able to report on everything as it appeared on the day they recorded a viewing statistic.

ZZartin SSC-Dedicated Points: 30894 More actions · Answer 5

Of course end users want everything available instantly 🙂

Unfortunately that kind of defeats the purpose of an SCD if it becomes an audit log, for example say someone goes into a record and changes it 10 times in 10 minutes do you now keep 10 records in your SCD? And from a design perspective if you're currently handling that SCD loading as part of your ETL into your datawarehouse do you now modify your front end application to handle populating the SCD properly?

Now do they actually want all the data available instantly or just a subset of the data that doesn't include the SCD? There are certainly options to populate a datawarehouse more often than once a day.

schleep SSChampion Points: 12582 More actions · Answer 6

Our product *is* the information in our database: if some of that content does in fact change 10 times in 10 minutes -- an unlikely scenario, but theoretically possible --, then yes, we do need all those records, because it is entirely possible that content related to each of those 10 iterations has been provided to clients.

At any point in time, all of the "current" rows in the DB is what is required to be available.

ZZartin SSC-Dedicated Points: 30894 More actions · Answer 7

Well the point of an SCD is to see data that is not what is currently available 🙂

But if you really need those populated instantly and available in the same DB as your OLTP data it makes things easier for you as you can just turn off your ETL to your datawarehouse and all the work of populating that SCD data now becomes the job of the application.

schleep SSChampion Points: 12582 More actions · Answer 8

That's what I'm getting at: there won't be a separate warehouse anymore. Although historical reporting will not be done against the live database.

And yes, the application will be managing the "currency?" of every row.

As I mentioned above, I think "Temporal Database" is more properly the term I should have been using.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 9

schleep (8/27/2015)
That's what I'm getting at: there won't be a separate warehouse anymore. Although historical reporting will not be done against the live database.
And yes, the application will be managing the "currency?" of every row.
As I mentioned above, I think "Temporal Database" is more properly the term I should have been using.

You're now technical talking about *bi-temporal support*, not just temporal. As in - in order to provide the correct history in your OLTP, you will need to keep track of the effectivity of the change (when does the change become effective) and the date you specifically processed the change.

I would caution against trying to run OLTP against a true SCD 2 setup. As in - it's certainly okay to record changes you make to the "live" table and convey those in SCD form to a table with a similar layout in the same local DB. But keeping the very high level of changes inside of the operational 9while maintaining bi-temporal support AND tracking whether these changes are in flight or final, etc...) is going to thoroughly thrash your data access.

Your operational table needs to be worried about allowing someone to manage the changes that are ongoing and what's in draft form vs what isn't (handle the complexities of what is changing WHILE the transaction is processed), but not try to do that AND keep it in SCD form. For one thing - there really isn't a concept of "update" or "delete" in SCD - *new* entries are created any time ANY change occurs. the only update tolerated would be the part that expires the previous version of the record you just updated.

Also - not all data *belongs* in a SCD 2 form. There are reasons why there are a LOT of SCD levels out there, so picking only one SCD level to use typically means that someone hasn't done their homework on whether it's even appropriate.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

schleep SSChampion Points: 12582 More actions · Answer 10

You're now technical talking about *bi-temporal support*, not just temporal. As in - in order to provide the correct history in your OLTP, you will need to keep track of the effectivity of the change (when does the change become effective) and the date you specifically processed the change.

When you say "date ... processed the change", do you mean the date the change comes into effect, or the date the rows were inserted with a future effective/expiry date?

I would caution against trying to run OLTP against a true SCD 2 setup. As in - it's certainly okay to record changes you make to the "live" table and convey those in SCD form to a table with a similar layout in the same local DB. But keeping the very high level of changes inside of the operational 9while maintaining bi-temporal support AND tracking whether these changes are in flight or final, etc...) is going to thoroughly thrash your data access.

That's what I'm afraid of.

there really isn't a concept of "update" or "delete" in SCD - *new* entries are created any time ANY change occurs. the only update tolerated would be the part that expires the previous version of the record you just updated.

Hadn't thought of that: it might be the proverbial nail in the coffin. We rely on single row updates to make n rows "visible", where 1 > n > ~4600.

This would result in n updates + n inserts.

Ugh. More thought required.

Thanks Matt

PS: Always loved your sig line. That sign -- without the exceptions -- hangs on my wall. 🙂 It's generally ignored anyway.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 11

schleep (8/27/2015)
You're now technical talking about *bi-temporal support*, not just temporal. As in - in order to provide the correct history in your OLTP, you will need to keep track of the effectivity of the change (when does the change become effective) and the date you specifically processed the change.
When you say "date ... processed the change", do you mean the date the change comes into effect, or the date the rows were inserted with a future effective/expiry date?
I would caution against trying to run OLTP against a true SCD 2 setup. As in - it's certainly okay to record changes you make to the "live" table and convey those in SCD form to a table with a similar layout in the same local DB. But keeping the very high level of changes inside of the operational 9while maintaining bi-temporal support AND tracking whether these changes are in flight or final, etc...) is going to thoroughly thrash your data access.
That's what I'm afraid of.
there really isn't a concept of "update" or "delete" in SCD - *new* entries are created any time ANY change occurs. the only update tolerated would be the part that expires the previous version of the record you just updated.
Hadn't thought of that: it might be the proverbial nail in the coffin. We rely on single row updates to make n rows "visible", where 1 > n > ~4600.
This would result in n updates + n inserts.
Ugh. More thought required.
Thanks Matt
PS: Always loved your sig line. That sign -- without the exceptions -- hangs on my wall. 🙂 It's generally ignored anyway.

It sounds to be that the date the change was physically processed AND the effective date are both required. By allowing future date changes you would need to account for BOTH dates (thus bi-temporal).

Example: If I were a travel company and I gave you a quote today on a flight to Aruba in 30 days from now for 600$, and a hurricane comes through tomorrow that changes that future price to 1500 before I've made my decision (half the runways are destroyed so only some flights are allowed in), you would need BOTH dates in order to be able to track your quotation changes in SCD 2 form.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

MMartin1 One Orange Chip Points: 27768 More actions · Answer 12

Currently in our model, every table has a surrogate PK and at least one natural key enforced by a unique constrain

I am wondering about the unique constraint set around the 'natural' key. If a product changes color lets say in a way that impacts sales... although the surrogate key will change to capture this change, you still want to know that this was essentially the same product that you are tracking sales measures on. I dont see that the natural key would change , unless I am missing something? Is there another business key used as well?

----------------------------------------------------

schleep SSChampion Points: 12582 More actions · Answer 13

Matt: Yes, that can be handled in my design. Although as I mentioned, I'm going to have to do some serious testing in order to determine if this is even possible..

MMartin1: in fact, I was using the incorrect terminology (again): I should have said "candidate key". I've been burned once by natural keys, when Newfoundland (NF) renamed itself to Newfoundland and Labrador (NL).