
Should the Data Lake be Immutable?

By Steve Jones

There's a concept in computer science called immutability. At a high level, this means that once something is set, it isn't changed. Various programming languages apply this to variables: a value doesn't change once it is assigned, though a variable can be destroyed and recreated.
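As a small illustration of that idea, here is a minimal Python sketch. The names `point`, `original`, and `mutated` are made up for this example. It shows an immutable value (a tuple) rejecting an in-place change, while the variable name can still be pointed at a brand new value.

```python
# A tuple is an immutable value in Python: its contents cannot be changed in place.
point = (1, 2)
original = point

try:
    point[0] = 5          # attempt an in-place update
    mutated = True
except TypeError:
    mutated = False       # tuples reject item assignment

# "Destroy and recreate": bind the name to a new value instead of editing the old one.
point = (5, 2)

print(mutated, original, point)
```

The old value is untouched; anything still holding `original` sees exactly what it saw before.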

In the PASS keynote, Dr. Ramakrishnan pointed out that we have silos of data, often in disparate systems where we keep our information. We want to query this data together, so we transfer it to a data warehouse or data lake (the future view). He also said that items in the data lake are immutable: they aren't allowed to change in the way that we update values in our relational databases. We should just read the most recent version of any data, and if there is an update, just add a new set of data.
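To make the "read the latest version, append on update" idea concrete, here is a rough in-memory sketch in Python. It is not how any particular data lake product works; the `AppendOnlyLake` and `Version` names, the `sales` dataset, and the sample rows are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of a dataset (hypothetical structure)."""
    number: int
    rows: Tuple[Row, ...]


class AppendOnlyLake:
    """Toy store: versions can be added and read, never edited or deleted."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}

    def append(self, name: str, rows: Iterable[Row]) -> Version:
        # An "update" is a new version appended after the existing ones.
        history = self._versions.setdefault(name, [])
        version = Version(number=len(history) + 1,
                          rows=tuple(tuple(r) for r in rows))
        history.append(version)
        return version

    def latest(self, name: str) -> Version:
        # Readers only ever see a complete version, here the most recent one.
        return self._versions[name][-1]

    def history(self, name: str) -> Tuple[Version, ...]:
        return tuple(self._versions[name])


lake = AppendOnlyLake()
lake.append("sales", [("widget", 10), ("gadget", 5)])

# A correction arrives: rather than editing version 1, load a new version.
lake.append("sales", [("widget", 12), ("gadget", 5)])

current = lake.latest("sales")
print(current.number, current.rows)
```

The cost is visible even in the toy: changing one value means writing the whole dataset again, which is the trade-off this editorial is asking about.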

That's an interesting concept, but I'm not sure I agree. I think that while we might often want to use a simpler process, there are cases where we do need the ability to edit. Imagine I had a large set of data, say GBs in a file. Would I want to download this and change a few values before uploading it again? Do we want a large ETL load process to repeat? Could we repeat the process and reload a file again? I don't think so, but it's hard to decide. After all, the lake isn't the source of data; that is some other system.

Maybe reloading is the simplest solution, and one that reduces complexity, downtime, or anything else that might be involved with locking and changing a file. After all, we wouldn't want a query to read the data in the gap between deleting a value and adding the new one.

If you're a data warehouse or analysis person, what do you think? Does it make sense to keep the data lake as immutable and reload data that might not be clean? Let us know today.

 