SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Single Source of Truth


Single Source of Truth

Author
Message
Andy Warren
Andy Warren
SSC Guru
SSC Guru (92K reputation)SSC Guru (92K reputation)SSC Guru (92K reputation)SSC Guru (92K reputation)SSC Guru (92K reputation)SSC Guru (92K reputation)SSC Guru (92K reputation)SSC Guru (92K reputation)

Group: Moderators
Points: 92670 Visits: 2893
Comments posted to this topic are about the item Single Source of Truth

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Dave Poole
Dave Poole
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60706 Visits: 3986
Single source of truth in one place is a paradigm that has scuppered many an EDW project. Once an organisation gets beyond a certain scale you are on a hiding to nothing.
To start with define "single source of truth". Is there a universally agreed truth? Would the finance department agree with the sales department on what constitutes truth? Does what constitute the truth change?

Does all the data within your organisation originate in your organisation? Do you have control over all data in your organisation?

I'm finding that the source of truth for "products" resides in system 'x', the source of truth for "customer" details resides in system 'y' and the source of truth for a large chunk of reference data resides externally.

LinkedIn Profile
www.simple-talk.com
Andy Robertson
Andy Robertson
Hall of Fame
Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)

Group: General Forum Members
Points: 3293 Visits: 605
Nice article, thanks.

In my opinion there's not really any such thing as the "truth" because data changes all the time. The focus should really be on consistency. The problem is that most clients hate inconsistency - yes really hate it - it makes them look bad to their internal clients and undermines confidence in their data/project. Most clients seem ok that things change over time, but they absolutely want reports to be consistent at least within a batch.

If data changes frequently then I find it's best to generate some reporting tables or a cache of some sort to contain a consistent set of data to produce all the reports from. This has the added advantage that you can compare your cache to future caches to see what has changed or caused changes. Clients love to know what causes change so they can explain it to the end user. The quicker you can explain any changes the happier the client is in my experience.
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21186 Visits: 5792
This is why I feel the data lake is a important component to the data warehouse. Often times, the data warehouse is not the raw source of truth. It's often the processed source of truth that is changing as the business requirements change. This means, we take raw data from a source system and conform it to a processed state for the data warehouse. Then we limit that process state with limited access and restrictions on what we can do with it (i.e.: no human intervention).

Copying the entire raw state of every source to a cheaper and more fault tolerant system before the warehouse seems to be the real source of truth. It's everything it is before you go into that processed state in the warehouse. It's the dirty and unfiltered data that can be accessed by all users to redefine the business requirements of the warehouse without actually impacting the data warehouse. It's also the one location that data can be explored and prototyped to only further enhance the data warehouse or give the data warehouse time to catch up on their own work to finally productionize what you developed.

I think in time, this will become the source of truth for many organizations just for the sheer fact it's so easy to do when in comparison of trying to develop a schema-on-write database that if harder to maintain as the raw source of truth in whatever database system you use.
Eric M Russell
Eric M Russell
SSC Guru
SSC Guru (108K reputation)SSC Guru (108K reputation)SSC Guru (108K reputation)SSC Guru (108K reputation)SSC Guru (108K reputation)SSC Guru (108K reputation)SSC Guru (108K reputation)SSC Guru (108K reputation)

Group: General Forum Members
Points: 108455 Visits: 14551
.. The goal is to have a single truth for each datapoint, if we can achieve that then we can make decisions about whether to copy that data somewhere to simplify usage or to mash it all up at a layer above the database ..

The CAP theorem applies to attempts at using a central EDW as a Single Source Of Truth. You can't reliably aggregate point in time metrics from multiple distributed data sources without tolerating a certain degree of latency, occasional unavailability, and margin for error. What you're really providing is an [Official Version Of Truth]. For a governmental or corporate enterprise, it not so much important that everyone is operating with the most accurate ideal of truth, but rather that everyone at any moment in time is operating with the same good enough version of truth. For example, it's important that society as a whole accept the official outcome of a political election, even though many folks would argue on procedural or philosophical grounds that the outcome was incorrect. Settling for a margin of error is better than remaining in a constant state of disagreement.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search