Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Should a DW track data origin system? Expand / Collapse
Author
Message
Posted Wednesday, June 4, 2014 2:06 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, November 4, 2014 12:08 PM
Points: 4, Visits: 61
Hi, we are in the beginning stages of designing our data warehouse. We have a lot of history (15+ years) we'd like to move into a warehouse. The fact we are beginning our warehouse with is ticket sales and we've got our star shaped fact and dimension tables laid out to our liking. The data which will end up in our warehouse comes from 4 different SQL databases, 2 are purely historical and 2 are production but they are from 4 different pieces of software.

Our question is should we design a dimension table to keep track of the which source system the records came from or should we build the origin fields directly into each dimension table. Would it disrupt our table layout and the efficiency of the warehouse if we did design an "origin" table which every dimension touched?
Post #1577527
Posted Wednesday, June 4, 2014 2:29 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: 2 days ago @ 12:52 PM
Points: 13,636, Visits: 11,509
Personally I would just include a field in the dimension that stores the source where the record came from.
Is it possible though the dimension info for one record can come from multiple sources?

How would you use that information? Are you going to filter on it in reports? Or is it just for auditing?




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1577540
Posted Thursday, June 5, 2014 7:32 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: 2 days ago @ 7:20 AM
Points: 313, Visits: 1,151
For years we struggled with being able to identify where certain transactions came from. We finally added a field in the last year, to a few of our key tables called source_id it contain a numeric value that represents a particular source. We have a table that contains this source_id with other info about that source. If you can do it know, DO IT. It will save you some head ache down the road.
Post #1577785
Posted Friday, June 6, 2014 4:42 AM This worked for the OP Answer marked as solution
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, June 11, 2014 7:51 AM
Points: 5, Visits: 86
I think auditing the history of data is important. It's not something that can be added retrospectively so the sooner you implement a solution, the better.

My personal preference is for an AuditKey column in every dimension. This is a foreign key to an Audit dimension which contains detailed information on how that piece of data was put there, e.g. the date and time it was last changed as well as the ETL involved in moving it etc. You could go to town on this and it depends very much on how complex your ETL is. Keeping the history of the changes made to data can be very useful in tracking errors or problems down the line. By adding a step to all your ETL processes that inserts to the Audit dimension you can keep this system going for future changes.


Post #1578232
Posted Friday, June 6, 2014 10:18 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 9:48 AM
Points: 2,128, Visits: 1,488
Cliff O'Reilly (6/6/2014)
I think auditing the history of data is important. It's not something that can be added retrospectively so the sooner you implement a solution, the better.

My personal preference is for an AuditKey column in every dimension. This is a foreign key to an Audit dimension which contains detailed information on how that piece of data was put there, e.g. the date and time it was last changed as well as the ETL involved in moving it etc. You could go to town on this and it depends very much on how complex your ETL is. Keeping the history of the changes made to data can be very useful in tracking errors or problems down the line. By adding a step to all your ETL processes that inserts to the Audit dimension you can keep this system going for future changes.

+ 1
Post #1578395
Posted Friday, June 6, 2014 10:37 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, November 4, 2014 12:08 PM
Points: 4, Visits: 61
Thank you all for your replies they have been very helpful.

@Koen - Our intention is primarily for auditing but I can see scenarios where it would be useful for reporting. For instance the CFO may wish to see a summary of the ticket data from just System A, just System B or perhaps a summary of A and B omitting systems C & D.

@below86 - Yes, it is certainly easier to keep track from the beginning rather than adding later. We will definitely keep track and probably with a table as you Cliff & Lempster suggested.

@Cliff, & Lempster - I too like the idea of a separate table I just wasn't sure if it was the right way to go. I have a meeting with the vested parties this afternoon and if they agree too we'll add it as a separate table with a foreign key into each of our Ticket Fact as well as Location, Ticket Type, and Event dimension tables.
Post #1578407
Posted Friday, June 6, 2014 10:45 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, November 4, 2014 12:08 PM
Points: 4, Visits: 61
Oh and a little bit of extra information on the 4 systems, 2 of them are point of sale based data and 2 are from post POS settlement/adjustment software. Ideally the POS software would always be correct because POS operators would always do their job and void/refund excess tickets however reality is different and it is easier to make adjustments to ticket sales in the settlement/adjustment software. So when all of this data goes into the warehouse it will be POS Sales/Refunds + Adjustments = Bottom Line which should then match to our separate accounting software.
Post #1578412
Posted Friday, June 6, 2014 1:50 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, June 27, 2014 2:05 PM
Points: 287, Visits: 810
We have a sourcesystem table, each source is defined. We needed the ability to enable/disable sources and this accomplishes it, and leaves us a trail in case something strange happens (Super Storm Sandy forced a data center there to close, no problem, I disabled the source to prevent pkgs from looking for it).

Further, all the auditing carries this sourcesystemkey. We host, so it is critical that I be able to determine very quickly where a problem is coming from.

Strongly agree on the auditing, also. It has saved my butt many times. Every table has an audit key from which I can determine when it ran, what sources it hit, how many records returned from each, dimensions referenced with hits,misses, unknowns, etc etc.

Lastly, Event Handlers. While easier than expected to implement, this also has saved my butt many times. When there is a problem and we need to consider killing pkgs, I can consult my logger to see what step every executing pkg is in. We had some delete before insert pkgs so would not want to kill past delete before insert but never knew quite where the pkg was. Now we know.

Build this stuff out sooner rather than later.

Post #1578490
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse