Single Source of Truth

Question

Single Source of Truth

Andy Warren

SSC Guru

Points: 119902
More actions
October 8, 2018 at 9:08 pm

#365282

Comments posted to this topic are about the item Single Source of Truth
Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

David.Poole SSC Guru Points: 75898 More actions · Answer 1

Single source of truth in one place is a paradigm that has scuppered many an EDW project. Once an organisation gets beyond a certain scale you are on a hiding to nothing.
To start with define "single source of truth". Is there a universally agreed truth? Would the finance department agree with the sales department on what constitutes truth? Does what constitute the truth change?

Does all the data within your organisation originate in your organisation? Do you have control over all data in your organisation?

I'm finding that the source of truth for "products" resides in system 'x', the source of truth for "customer" details resides in system 'y' and the source of truth for a large chunk of reference data resides externally.

LinkedIn Profile

allinadazework SSCarpal Tunnel Points: 4506 More actions · Answer 2

Nice article, thanks.

In my opinion there's not really any such thing as the "truth" because data changes all the time. The focus should really be on consistency. The problem is that most clients hate inconsistency - yes really hate it - it makes them look bad to their internal clients and undermines confidence in their data/project. Most clients seem ok that things change over time, but they absolutely want reports to be consistent at least within a batch.

If data changes frequently then I find it's best to generate some reporting tables or a cache of some sort to contain a consistent set of data to produce all the reports from. This has the added advantage that you can compare your cache to future caches to see what has changed or caused changes. Clients love to know what causes change so they can explain it to the end user. The quicker you can explain any changes the happier the client is in my experience.

xsevensinzx One Orange Chip Points: 25560 More actions · Answer 3

This is why I feel the data lake is a important component to the data warehouse. Often times, the data warehouse is not the raw source of truth. It's often the processed source of truth that is changing as the business requirements change. This means, we take raw data from a source system and conform it to a processed state for the data warehouse. Then we limit that process state with limited access and restrictions on what we can do with it (i.e.: no human intervention).

Copying the entire raw state of every source to a cheaper and more fault tolerant system before the warehouse seems to be the real source of truth. It's everything it is before you go into that processed state in the warehouse. It's the dirty and unfiltered data that can be accessed by all users to redefine the business requirements of the warehouse without actually impacting the data warehouse. It's also the one location that data can be explored and prototyped to only further enhance the data warehouse or give the data warehouse time to catch up on their own work to finally productionize what you developed.

I think in time, this will become the source of truth for many organizations just for the sheer fact it's so easy to do when in comparison of trying to develop a schema-on-write database that if harder to maintain as the raw source of truth in whatever database system you use.

Eric M Russell SSC Guru Points: 125522 More actions · Answer 4

.. The goal is to have a single truth for each datapoint, if we can achieve that then we can make decisions about whether to copy that data somewhere to simplify usage or to mash it all up at a layer above the database ..

The CAP theorem applies to attempts at using a central EDW as a Single Source Of Truth. You can't reliably aggregate point in time metrics from multiple distributed data sources without tolerating a certain degree of latency, occasional unavailability, and margin for error. What you're really providing is an [Official Version Of Truth]. For a governmental or corporate enterprise, it not so much important that everyone is operating with the most accurate ideal of truth, but rather that everyone at any moment in time is operating with the same good enough version of truth. For example, it's important that society as a whole accept the official outcome of a political election, even though many folks would argue on procedural or philosophical grounds that the outcome was incorrect. Settling for a margin of error is better than remaining in a constant state of disagreement.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho