Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Deleted source records in dimension tables Expand / Collapse
Author
Message
Posted Wednesday, February 19, 2014 8:21 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Yesterday @ 8:27 AM
Points: 277, Visits: 618
I know, I know. The source data shouldn't have deleted records. They should be marked inactive. However I can see that not happening due to third party systems, odd business practices, general bad practices, etc.

I've already got the three cardinal fields on all my tables regardless of what type of SCD is being used - EffectiveDt, ExpirationDt, CurrentRow.

Does anyone have a best practice for this kind of situation? Here's my thought on it and luckily I'm early into the process of building the warehouse:

Use the ExpirationDt for the date the record was deleted and leave the CurrentRow as 'Current'.

Pros and cons of this? I don't want to delete records from the warehouse since historically they did exist.
Post #1543056
Posted Thursday, February 20, 2014 7:28 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 10:09 AM
Points: 818, Visits: 2,075
I know, I know. The source data shouldn't have deleted records. They should be marked inactive.


It's possible to see this as being valid. I deal with deleted order details because they have in fact been deleted from the order for one reason or another. These populate a fact table, however. I can't think of a process that deletes dimension data. Assuming the tables are keyed, it would mean that they are not referenced by a related table.

If the row is referenced by a fact record, my initial thought is to have a deleted flag. It might even be exposed as an attribute. If the row is not referenced, I would consider deleting it. There are reasons to keep it, but for most companies, knowing the list of items never used is not a useful piece of information.



Post #1543488
Posted Thursday, February 20, 2014 8:07 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Yesterday @ 8:27 AM
Points: 277, Visits: 618
I'm dealing with many years of various ways of bringing in data (there's still a SQL 2k server in use) and a couple of rather unfriendly third party systems that only hold data for 30 days. So there will be and are deleted dimension records.

I really, really, really don't want to have to search all the fact tables and bridge tables that might access the deleted records. We've got plenty of space and since I don't know what tables will incur this problem a deleted flag would be iffy and needed on every ETL.

I think using the ExpirationDt will work because if there's a current record with a date that's less than 2999-12-31 then it's no longer in the source. That also indicates when it was removed and keeps the time slices accurate.

I'll leave the post open without an answer for a while (nothing against you Ron) to give other people a chance to chime in with what they do in their environments.
Post #1543522
Posted Thursday, February 20, 2014 8:46 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 10:09 AM
Points: 818, Visits: 2,075
nothing against you Ron


I don't take things personally. These were suggestions, but your the person on the ground.



Post #1543552
Posted Friday, March 21, 2014 10:15 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, July 30, 2014 10:34 AM
Points: 386, Visits: 624
any time you are bringing data into the DW using a Delta you will potentially have records that are not in the Delta. This may be because they are outside the scope of the Delta or because they have been physically removed from the source data. If this is important to you then you will need to run a cleanup script to identify records you expected to see in the delta but were not there and have to assume that they have been deleted and mark them appropriately.

This still leaves you an issue with how to identify records which you are not expecting to be in the Delta but have been removed from source.

Post #1553590
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse