Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Should a DW track data origin system?


Should a DW track data origin system?

Author
Message
Christopher Rannow
Christopher Rannow
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 90
Hi, we are in the beginning stages of designing our data warehouse. We have a lot of history (15+ years) we'd like to move into a warehouse. The fact we are beginning our warehouse with is ticket sales and we've got our star shaped fact and dimension tables laid out to our liking. The data which will end up in our warehouse comes from 4 different SQL databases, 2 are purely historical and 2 are production but they are from 4 different pieces of software.

Our question is should we design a dimension table to keep track of the which source system the records came from or should we build the origin fields directly into each dimension table. Would it disrupt our table layout and the efficiency of the warehouse if we did design an "origin" table which every dimension touched?
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16469 Visits: 13207
Personally I would just include a field in the dimension that stores the source where the record came from.
Is it possible though the dimension info for one record can come from multiple sources?

How would you use that information? Are you going to filter on it in reports? Or is it just for auditing?



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
below86
below86
SSChasing Mays
SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)SSChasing Mays (640 reputation)

Group: General Forum Members
Points: 640 Visits: 2121
For years we struggled with being able to identify where certain transactions came from. We finally added a field in the last year, to a few of our key tables called source_id it contain a numeric value that represents a particular source. We have a table that contains this source_id with other info about that source. If you can do it know, DO IT. :-) It will save you some head ache down the road.

-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Cliff O'Reilly
Cliff O'Reilly
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 86
I think auditing the history of data is important. It's not something that can be added retrospectively so the sooner you implement a solution, the better.

My personal preference is for an AuditKey column in every dimension. This is a foreign key to an Audit dimension which contains detailed information on how that piece of data was put there, e.g. the date and time it was last changed as well as the ETL involved in moving it etc. You could go to town on this and it depends very much on how complex your ETL is. Keeping the history of the changes made to data can be very useful in tracking errors or problems down the line. By adding a step to all your ETL processes that inserts to the Audit dimension you can keep this system going for future changes.
Lempster
Lempster
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2683 Visits: 1657
Cliff O'Reilly (6/6/2014)
I think auditing the history of data is important. It's not something that can be added retrospectively so the sooner you implement a solution, the better.

My personal preference is for an AuditKey column in every dimension. This is a foreign key to an Audit dimension which contains detailed information on how that piece of data was put there, e.g. the date and time it was last changed as well as the ETL involved in moving it etc. You could go to town on this and it depends very much on how complex your ETL is. Keeping the history of the changes made to data can be very useful in tracking errors or problems down the line. By adding a step to all your ETL processes that inserts to the Audit dimension you can keep this system going for future changes.

+ 1
Christopher Rannow
Christopher Rannow
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 90
Thank you all for your replies they have been very helpful.

@Koen - Our intention is primarily for auditing but I can see scenarios where it would be useful for reporting. For instance the CFO may wish to see a summary of the ticket data from just System A, just System B or perhaps a summary of A and B omitting systems C & D.

@below86 - Yes, it is certainly easier to keep track from the beginning rather than adding later. We will definitely keep track and probably with a table as you Cliff & Lempster suggested.

@Cliff, & Lempster - I too like the idea of a separate table I just wasn't sure if it was the right way to go. I have a meeting with the vested parties this afternoon and if they agree too we'll add it as a separate table with a foreign key into each of our Ticket Fact as well as Location, Ticket Type, and Event dimension tables.
Christopher Rannow
Christopher Rannow
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 90
Oh and a little bit of extra information on the 4 systems, 2 of them are point of sale based data and 2 are from post POS settlement/adjustment software. Ideally the POS software would always be correct because POS operators would always do their job and void/refund excess tickets however reality is different and it is easier to make adjustments to ticket sales in the settlement/adjustment software. So when all of this data goes into the warehouse it will be POS Sales/Refunds + Adjustments = Bottom Line which should then match to our separate accounting software.
herladygeekedness
herladygeekedness
Old Hand
Old Hand (318 reputation)Old Hand (318 reputation)Old Hand (318 reputation)Old Hand (318 reputation)Old Hand (318 reputation)Old Hand (318 reputation)Old Hand (318 reputation)Old Hand (318 reputation)

Group: General Forum Members
Points: 318 Visits: 813
We have a sourcesystem table, each source is defined. We needed the ability to enable/disable sources and this accomplishes it, and leaves us a trail in case something strange happens (Super Storm Sandy forced a data center there to close, no problem, I disabled the source to prevent pkgs from looking for it).

Further, all the auditing carries this sourcesystemkey. We host, so it is critical that I be able to determine very quickly where a problem is coming from.

Strongly agree on the auditing, also. It has saved my butt many times. Every table has an audit key from which I can determine when it ran, what sources it hit, how many records returned from each, dimensions referenced with hits,misses, unknowns, etc etc.

Lastly, Event Handlers. While easier than expected to implement, this also has saved my butt many times. When there is a problem and we need to consider killing pkgs, I can consult my logger to see what step every executing pkg is in. We had some delete before insert pkgs so would not want to kill past delete before insert but never knew quite where the pkg was. Now we know.

Build this stuff out sooner rather than later.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search