SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Deleted source records in dimension tables


Deleted source records in dimension tables

Author
Message
JustMarie
JustMarie
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2132 Visits: 1361
I know, I know. The source data shouldn't have deleted records. They should be marked inactive. However I can see that not happening due to third party systems, odd business practices, general bad practices, etc.

I've already got the three cardinal fields on all my tables regardless of what type of SCD is being used - EffectiveDt, ExpirationDt, CurrentRow.

Does anyone have a best practice for this kind of situation? Here's my thought on it and luckily I'm early into the process of building the warehouse:

Use the ExpirationDt for the date the record was deleted and leave the CurrentRow as 'Current'.

Pros and cons of this? I don't want to delete records from the warehouse since historically they did exist.
RonKyle
RonKyle
SSCertifiable
SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)

Group: General Forum Members
Points: 6832 Visits: 3620
I know, I know. The source data shouldn't have deleted records. They should be marked inactive.


It's possible to see this as being valid. I deal with deleted order details because they have in fact been deleted from the order for one reason or another. These populate a fact table, however. I can't think of a process that deletes dimension data. Assuming the tables are keyed, it would mean that they are not referenced by a related table.

If the row is referenced by a fact record, my initial thought is to have a deleted flag. It might even be exposed as an attribute. If the row is not referenced, I would consider deleting it. There are reasons to keep it, but for most companies, knowing the list of items never used is not a useful piece of information.



JustMarie
JustMarie
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2132 Visits: 1361
I'm dealing with many years of various ways of bringing in data (there's still a SQL 2k server in use) and a couple of rather unfriendly third party systems that only hold data for 30 days. So there will be and are deleted dimension records.

I really, really, really don't want to have to search all the fact tables and bridge tables that might access the deleted records. We've got plenty of space and since I don't know what tables will incur this problem a deleted flag would be iffy and needed on every ETL.

I think using the ExpirationDt will work because if there's a current record with a date that's less than 2999-12-31 then it's no longer in the source. That also indicates when it was removed and keeps the time slices accurate.

I'll leave the post open without an answer for a while (nothing against you Ron) to give other people a chance to chime in with what they do in their environments.
RonKyle
RonKyle
SSCertifiable
SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)

Group: General Forum Members
Points: 6832 Visits: 3620
nothing against you Ron


I don't take things personally. These were suggestions, but your the person on the ground.



aaron.reese
aaron.reese
SSCrazy
SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)SSCrazy (2.4K reputation)

Group: General Forum Members
Points: 2365 Visits: 907
any time you are bringing data into the DW using a Delta you will potentially have records that are not in the Delta. This may be because they are outside the scope of the Delta or because they have been physically removed from the source data. If this is important to you then you will need to run a cleanup script to identify records you expected to see in the delta but were not there and have to assume that they have been deleted and mark them appropriately.

This still leaves you an issue with how to identify records which you are not expecting to be in the Delta but have been removed from source.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search