June 23, 2025 at 12:00 am
Comments posted to this topic are about the item The Data Warehousing Choice
June 23, 2025 at 1:09 pm
I drink Microsoft-flavored Kool-Aid, so I support Microsoft Fabric on principle. I like the idea of the Fabric SQL database. I think a "data warehouse" is a database, and I don't see why we need a lot of shiny new tools to build our "data warehouse" (maybe "analytical database" is a better term, although "decision support database" may be more descriptive). Good old relational databases and SQL are good enough to build what we need. As an industry, we spend too much time changing tools to follow fashion instead of focusing on the problems of the business.
June 24, 2025 at 6:27 am
Our business made the data lake choice two years ago, just before Fabric was widely released. Because we were heavily invested in Microsoft and Power BI, we expected Azure to be chosen, but the choice was AWS, with the hosting of other systems in mind. I must say I don't believe we would have had as close and helpful a relationship with Microsoft as we've had with AWS.
It is now going live after a successful pilot project last year. Although it requires a different set of skills (Python and Glue rather than SSIS, etc.), it's still SQL-based thanks to Athena. The harder part has been the Power BI gateway, since we chose not to replace the best reporting tool, but that's now sorted thanks to AWS staff.
I don't know whether the choice would have been Fabric had it been made now that Fabric is an established product, but I believe the financial deal and AWS's involvement played a big part, so it isn't always just about the technology.
The one thing I miss in AWS is a "SQL Server Central": an online community where I can find articles, help, and forums all about AWS. My SQL Server career has been vastly helped by this resource, and I haven't yet found an AWS equivalent.
June 24, 2025 at 10:05 am
I find this area somewhat confusing, especially since many of the products extend beyond traditional data warehousing. For example, Fabric offers real-time analytics and event streaming, including ingestion from Event Hubs. I suppose these products are designed for data analytics in general, with data warehousing as one component.
Like Steve, I think Fabric is not quite mature yet, but it seems to have the potential to hit the sweet spot in terms of functionality, ease of use, and cost. A major selling point for Fabric is its tight integration with Power BI, and many of its user-friendly features resemble Power BI’s Power Query interface.
Currently, one of the main players seems to be Databricks, which is available on all major cloud platforms. As far as I know, it was created by the same team behind Apache Spark and essentially sits on top of Spark to simplify its use, although 'simplify' is a relative term in this context!
I have read a couple of articles recommending Databricks for the Bronze and Silver layers of the Medallion Architecture, while using a Fabric Warehouse for the Gold layer to facilitate easy Power BI access. However, I got the impression that the integration is not straightforward and using two separate systems probably introduces additional costs.
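To make the Bronze/Silver/Gold split concrete, here is a minimal, hypothetical sketch in plain Python of what each Medallion layer is typically responsible for: Bronze holds raw data as landed, Silver holds cleaned, typed, de-duplicated records, and Gold holds aggregates shaped for reporting tools such as Power BI. The sample rows, field names, and functions (`to_silver`, `to_gold`) are illustrative assumptions; on Databricks or Fabric each layer would normally be a Spark/Delta table rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Bronze: raw records landed as-is from a source system (hypothetical sample data).
BRONZE = [
    {"order_id": "1", "customer": " Acme ", "amount": "100.50", "order_date": "2025-06-01"},
    {"order_id": "1", "customer": " Acme ", "amount": "100.50", "order_date": "2025-06-01"},  # duplicate load
    {"order_id": "2", "customer": "Globex", "amount": "bad", "order_date": "2025-06-02"},     # unparseable amount
    {"order_id": "3", "customer": "acme", "amount": "49.50", "order_date": "2025-06-03"},
]


@dataclass(frozen=True)
class SilverOrder:
    """A cleaned, strongly typed order record (hypothetical schema)."""
    order_id: int
    customer: str
    amount: float
    order_date: date


def to_silver(bronze_rows):
    """Clean, type, and de-duplicate raw rows; skip rows that fail validation."""
    seen = set()
    out = []
    for row in bronze_rows:
        try:
            rec = SilverOrder(
                order_id=int(row["order_id"]),
                customer=row["customer"].strip().title(),  # normalize names like " Acme " / "acme"
                amount=float(row["amount"]),
                order_date=date.fromisoformat(row["order_date"]),
            )
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine these rows instead of dropping them
        if rec.order_id not in seen:  # keep the first copy of each order
            seen.add(rec.order_id)
            out.append(rec)
    return out


def to_gold(silver_rows):
    """Aggregate into a reporting-friendly shape: total sales per customer."""
    totals = defaultdict(float)
    for rec in silver_rows:
        totals[rec.customer] += rec.amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    silver = to_silver(BRONZE)
    print(to_gold(silver))
```

Running it prints the Gold aggregate for the sample data; the duplicate row and the row with an unparseable amount never reach Silver.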
I would appreciate any recommendations for books or papers that explore this topic in more depth.
June 24, 2025 at 5:53 pm
My current company is an early adopter of Fabric. The decisions that led to that choice annoy me: in-house "architects" who lack experience and make decisions based on marketing promises, plus a global company forcing decisions down onto its regional companies. Frankly, for the amount of data we have, we could have built a product 90% as good for 10% of the cost by staying on-prem.
Anyway, it has been a mix of cool and frustrating. There are some definite advantages to the synergies between SQL Server, Azure, and Fabric. A few complicated table builds run MUCH faster in Fabric/the cloud. The way we can use table links to logically separate the data between business end users has been very handy, and the way Power BI can hook into the existing Fabric data is great. Setting up the load process, following articles here on SSC, has been fun: we use metadata tables and pipeline variables to load everything on schedules. Lots of fun!
The frustrating parts are pretty annoying, though. The CI/CD functionality is VERY clunky. We run an hourly load for frequently changing tables and get random failures without a good explanation. We found Dataflows Gen2 to be pretty useless unless you're doing a one-time load. Scheduling is clunky. Documentation is fuzzy at best, flat-out wrong at worst, and seems to change often.
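A metadata-driven load like the one mentioned above can be sketched roughly as follows. This is a minimal, hypothetical illustration in Python, not the poster's actual setup and not a Fabric API: each row of an assumed metadata table (`LoadConfig`) says which source table to copy, whether to load it fully or incrementally by a watermark column, and how often. A driver picks the tables that are due and builds the extract query that would be handed to a pipeline as a parameter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class LoadConfig:
    """One row of a hypothetical load-metadata table."""
    source_table: str
    target_table: str
    load_type: str                   # "full" or "incremental"
    watermark_column: Optional[str]  # column compared for incremental loads
    frequency: timedelta             # how often this table should be loaded
    last_loaded: Optional[datetime]  # when it was last loaded (None = never)
    last_watermark: Optional[str]    # highest watermark value already loaded


def is_due(cfg: LoadConfig, now: datetime) -> bool:
    """A table is due if it was never loaded or its interval has elapsed."""
    return cfg.last_loaded is None or now - cfg.last_loaded >= cfg.frequency


def build_source_query(cfg: LoadConfig) -> str:
    """Build the extract query a pipeline's copy step would receive as a parameter."""
    if cfg.load_type not in ("full", "incremental"):
        raise ValueError(f"Unknown load type for {cfg.source_table}: {cfg.load_type}")
    if cfg.load_type == "incremental" and not cfg.watermark_column:
        raise ValueError(f"Incremental load for {cfg.source_table} needs a watermark column")
    # Full loads, and the first run of an incremental load, read the whole table.
    if cfg.load_type == "full" or cfg.last_watermark is None:
        return f"SELECT * FROM {cfg.source_table}"
    return (f"SELECT * FROM {cfg.source_table} "
            f"WHERE {cfg.watermark_column} > '{cfg.last_watermark}'")


def plan_loads(configs: List[LoadConfig], now: datetime) -> List[Tuple[str, str]]:
    """Return (target_table, source_query) pairs for every table that is due now."""
    return [(c.target_table, build_source_query(c)) for c in configs if is_due(c, now)]


if __name__ == "__main__":
    now = datetime(2025, 6, 24, 9, 0)
    configs = [
        LoadConfig("sales.Orders", "bronze_orders", "incremental", "ModifiedDate",
                   timedelta(hours=1), datetime(2025, 6, 24, 7, 30), "2025-06-24T07:30:00"),
        LoadConfig("ref.Region", "bronze_region", "full", None,
                   timedelta(days=1), datetime(2025, 6, 24, 1, 0), None),
    ]
    for target, query in plan_loads(configs, now):
        print(target, "<-", query)
```

In a Fabric or Azure Data Factory pipeline, a common pattern is a Lookup activity reading the metadata table, feeding a ForEach loop that passes each query to a copy step; after a successful load, `last_loaded` and `last_watermark` would be written back to the metadata table. Building SQL with f-strings is only reasonable here because the table and column names come from a trusted, internally maintained metadata table.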
All that said, it's been an adventure, and I'm the type of employee who enjoys adventure more than routine.
Be still, and know that I am God - Psalm 46:10