• This is why I feel the data lake is an important component alongside the data warehouse. Oftentimes, the data warehouse is not the raw source of truth. It's the processed source of truth, and it keeps changing as the business requirements change. This means we take raw data from a source system and conform it to a processed state for the data warehouse. Then we lock that processed state down with limited access and restrictions on what we can do with it (e.g., no human intervention).

    A copy of the entire raw state of every source, kept in a cheaper and more fault-tolerant system upstream of the warehouse, seems to be the real source of truth. It's everything the data is before it goes into that processed state in the warehouse. It's the dirty, unfiltered data that all users can access to redefine the business requirements of the warehouse without actually impacting the warehouse itself. It's also the one place where data can be explored and prototyped, either to further enhance the data warehouse or to give the warehouse team time to catch up on their own work and finally productionize what you developed.

    I think in time, this will become the source of truth for many organizations, for the sheer fact that it's so easy to do compared with trying to develop a schema-on-write database, which is harder to maintain as the raw source of truth in whatever database system you use.
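
    To make that concrete, here's a rough sketch of the idea rather than a real pipeline. Everything in it is made up for illustration: a temp directory stands in for the lake, `land_raw`, `read_raw` and `conform` are hypothetical helpers, and the `orders` records are invented. The point is just that the raw copy is written once with no schema, and the "warehouse" rules can be changed and re-run against it later.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, source: str, records: list[dict]) -> Path:
    """Append source records to the lake untouched, one JSON object per line."""
    path = lake_dir / f"{source}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def read_raw(path: Path) -> list[dict]:
    """Read every raw record back; no schema is enforced here."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def conform(raw: list[dict], columns: dict) -> list[dict]:
    """Apply the current business rules: keep the named columns and cast them."""
    rows = []
    for rec in raw:
        try:
            rows.append({name: cast(rec[name]) for name, cast in columns.items()})
        except (KeyError, TypeError, ValueError):
            # The record fails today's rules, so it is left out of the
            # processed rows. It is still sitting in the lake.
            continue
    return rows


# Hypothetical lake location and invented sample data.
lake = Path(tempfile.mkdtemp())
raw_path = land_raw(lake, "orders", [
    {"id": "1", "amount": "19.99", "coupon": "SPRING"},
    {"id": "2", "amount": "n/a"},
    {"id": "3", "amount": "5", "channel": "web"},
])

# Today's warehouse requirements.
v1 = conform(read_raw(raw_path), {"id": int, "amount": float})

# Requirements change: re-run against the same raw copy, no re-extraction.
v2 = conform(read_raw(raw_path), {"id": int, "channel": str})
```

    Note that the `coupon` and `channel` fields never made it into the first "warehouse" version, but they weren't lost either. That's the whole argument for keeping the raw copy around.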