Archive to the Lake

Microsoft Fabric was announced at Build in May 2023. This is the next evolution of data warehousing from Microsoft, folding in Synapse and a number of other technologies to create a simpler location for storing and analyzing data. We've published some articles on the platform, and there's a great presentation from Mr. Paul Andrew on LinkedIn. It's worth listening to, even in the background. Paul has a nice style and a great voice.

Part of this platform is OneLake. This is a data lake for your org, just one of them, and while it's able to store data in many formats, it's mainly optimized to read tabular data in the delta parquet format. Parquet itself is a compressed, columnar file format rather than plain text, and delta adds a transaction log on top of those files that allows for some transactional changes to the original data.

I don't do a lot of work with text files, and I've been suspicious of using lots of CSV or other text files in a warehouse environment, which is what a lot of people were advocating a few years ago. Exporting tables into lots of files split on some field, like date, while easy, didn't seem like the best way to move data for reporting.

Fabric, however, is optimized for reading delta files. A few presentations I've seen have advocated for exporting your data from SQL Server (or other platforms) into parquet. While I don't know of a native way to do this (yet), I suspect one is coming. I've seen lots of articles about how to do this now. SQL Server can already read these files through external file formats, so I'm sure we'll have an easy way to write them soon.

Many of us struggle with large systems, especially with query performance. We'd love to archive off data, though that's often impractical. However, in an amazing, wonderful world, maybe we'll get lots of people doing this, writing about it in the media, and our bosses will start to let us establish an archive in the lake. We could move some data there, especially old, unchanging data. We could delete that from source systems. We could have all our users happy.
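The "move it, then delete it" idea can be sketched in a few lines. This is only an illustration under some loud assumptions: the `orders` table, the `order_date` column, and the `archive_old_rows` and `to_memory` names are all hypothetical, SQLite stands in for the source system, and the sink is an in-memory list. In a real archive the `write_batch` callback would write a parquet file to the lake instead.

```python
import sqlite3
from typing import Callable, List, Sequence, Tuple


def archive_old_rows(
    conn: sqlite3.Connection,
    cutoff: str,
    write_batch: Callable[[Sequence[str], List[Tuple]], None],
) -> int:
    """Copy rows older than `cutoff` to an archive sink, then delete them.

    The table and column names are hypothetical. Dates are assumed to be
    ISO-8601 strings, so string comparison matches date order.
    """
    cur = conn.execute(
        "SELECT * FROM orders WHERE order_date < ? ORDER BY order_date",
        (cutoff,),
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if not rows:
        return 0

    # Write to the archive first. If this raises, nothing is deleted.
    write_batch(columns, rows)

    # Delete inside a transaction: committed on success, rolled back on error.
    with conn:
        conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return len(rows)


# Demo with an in-memory database standing in for the OLTP server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2015-03-01", 10.0), (2, "2016-07-15", 25.5), (3, "2023-05-23", 99.0)],
)
conn.commit()

archived = []  # stand-in for files in the lake


def to_memory(columns, rows):
    archived.append((list(columns), list(rows)))


moved = archive_old_rows(conn, "2020-01-01", to_memory)
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(f"archived {moved} rows, {remaining} left in the source table")
```

The ordering matters more than the details: the archive write happens before the delete, so a failed export leaves the source untouched. A production version would also want to verify the archived copy before removing anything.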

I don't know if I see lots of data moving to the lake, but I certainly expect lots of it to be copied. If you haven't thought about archives, data lakes, and file formats like parquet, it's an area that seems to have a lot of growth. Perhaps it's of interest to you and you might find a new career.

Or maybe you just hope it gets widely adopted to relieve some pressure on your OLTP server.

Database Deployment with Terraform - Modules

by Wagner Crivelini

SQLServerCentral

Deploy resources in Azure using reusable code with Terraform modules.

2022-10-10

Analyze Azure Cosmos DB data using Synapse Link and Power BI

by Sucharita Das

SQLServerCentral

Introduction: In my last article, Using Azure Synapse Link for Azure Cosmos DB, I discussed creating the Synapse Link to query data present in Azure Cosmos DB from my Synapse workspace. Here, I will create a new database in the serverless SQL pool and create views based on the Cosmos DB JSON files. Then, I will […]