This week was Microsoft Build. After four years, it was back in person in Seattle and available online. I didn't attend in person, but I did watch a number of sessions and also went through the Build 2023 Book of News. You can download the book if you want, as it provides a lengthy list of announcements with descriptions and additional links to resources. It's a neat way to summarize the content of the conference.
This article will highlight some of the things I think are useful or interesting for data professionals. I've got sections for these areas:
- Azure SQL Database
- Microsoft Fabric
- Power BI
- Copilot and Plugins
I'll go over these below. I'll summarize a few things, but there is plenty more to learn and dig into if you find these items relevant to your organization.
I have some thoughts on the main keynote in the editorial today. The second day's keynote was more AI, this time showing various demos with Microsoft 365, Teams, and the Microsoft Graph. There are some useful integrations for using the Copilot AI to help search and surface information, but again, as I mentioned in the editorial, we have to trust this is actually gathering the data we need and excluding data we don't need. What I worry about while watching the demos is that people will simply trust that the AI has not missed or excluded information we need. We also need to believe it isn't giving us irrelevant data, which is a problem I see with Google these days. There is so much old information, and the tech world changes so quickly, that search results often include older information I don't need.
That's a key issue and one that I worry about with AI. How do we continue to train the AI on which information is useful and which isn't? How do we age out old information and verify that the AI isn't including outdated results, especially in a fast-paced business environment? I love the natural language interfaces, which I think have been hard for developers to build. Customers and clients love them, but just as we see people copying and pasting code from online sites that isn't quite right for the problem, I bet we will see more business users doing this with AI assistants.
The demos to help you build plugins using Copilot for your data are good. I can see us incorporating these more often to make interactions with computing systems much smoother, but I do hope that we do so with the understanding that we're using a tool and we're still responsible. That's my hope, though I worry that too many people will use Copilot (or similar systems) as a crutch and blame them when something doesn't work well.
The Windows keynote portion was nice, showing productivity enhancements for users. I think that adding in Copilot might be helpful, but really, for most of us data professionals, we just need Windows to not get in the way. They are adding things that make it easier for us as developers to customize our environments, and adding more to the Terminal. I use a third-party tool, I've used Windows for 30 years, and I hate the snapping, so I think most of these announcements aren't that relevant.
However, I am biased. I found Windows 11 to be slower and more hesitant after an upgrade on one machine, and I regret doing the upgrade. I expect quick response from the Windows key and from apps, and I don't get that at times with Windows 11, especially on suspend/restart and when changing networks. That's something I do often with a laptop.
They are, however, bringing more apps to Windows, like WhatsApp, that were previously only on mobile devices. They are also improving the Windows Store experience, which helps those of us that might be interested in getting our apps on Windows desktops.
Azure SQL Database
I'm a relational person, so changes to the relational database are of interest to me, and to many of you. We had SQL Server 2022 come out late last year, and we had some interesting features like the failover to and restore from Managed Instance. This event had relatively few changes in the relational space.
The main one is that the Hyperscale edition of Azure SQL Database has an elastic pool option. Hyperscale is for large database systems, where you need 10TB+ of storage. Building a system this large with good HA and DR is hard, and Hyperscale made this easy for people. However, it was a big commitment. Now, with this announcement of the public preview, you can make a pool of up to 25 databases that can grow up to the 100TB limit. So if you have some large databases, this might be of interest to you.
Compute can be scaled up and down, and out. There are read-scale replicas if you need a higher read workload. The log has 120MB/s for the pool and up to 100MB/s for each database. I tend not to work on large systems, but if you have a few of these and are looking to get out of the hardware business, this might be of use to you.
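If you want a quick sanity check on whether a handful of databases would fit in one pool, the numbers above are enough for a back-of-the-envelope script. This is only a sketch: the limits are the preview figures quoted in this article (check the current Azure documentation before relying on them), I'm reading the 100TB figure as a per-database ceiling, which is an assumption, and the `Database` class and its `peak_log_mb_s` field are names made up for the illustration.

```python
from dataclasses import dataclass

# Preview limits as quoted in this article; they may have changed since.
MAX_DATABASES_PER_POOL = 25
MAX_DB_SIZE_TB = 100        # assumption: treated as a per-database ceiling
POOL_LOG_RATE_MB_S = 120    # log throughput shared by the whole pool
DB_LOG_RATE_MB_S = 100      # log throughput cap for any one database


@dataclass
class Database:
    """Hypothetical description of one candidate database."""
    name: str
    size_tb: float
    peak_log_mb_s: float  # peak log write rate you have measured


def pool_problems(databases):
    """Return human-readable reasons the databases would not fit in one pool."""
    problems = []
    if len(databases) > MAX_DATABASES_PER_POOL:
        problems.append(
            f"{len(databases)} databases is over the pool limit of "
            f"{MAX_DATABASES_PER_POOL}"
        )
    for db in databases:
        if db.size_tb > MAX_DB_SIZE_TB:
            problems.append(f"{db.name} is larger than {MAX_DB_SIZE_TB}TB")
        if db.peak_log_mb_s > DB_LOG_RATE_MB_S:
            problems.append(
                f"{db.name} needs more than {DB_LOG_RATE_MB_S}MB/s of log rate"
            )
    # Worst case: every database hits its peak log rate at the same moment.
    total_log = sum(db.peak_log_mb_s for db in databases)
    if total_log > POOL_LOG_RATE_MB_S:
        problems.append(
            f"combined peak log rate of {total_log}MB/s is over the pool "
            f"limit of {POOL_LOG_RATE_MB_S}MB/s"
        )
    return problems
```

An empty list means nothing in the quoted limits rules the pool out. Summing the peaks is deliberately pessimistic, since databases rarely all peak at once, so a failure on that check is a prompt to look closer rather than a hard no.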
There also are new confidential computing options in Azure, which may be of interest to those of you working with companies that are very security conscious. I think that's few of us, but these are available.
Microsoft Fabric
The big announcement for data professionals is Microsoft Fabric. This is the next evolution of the Synapse platform, IMHO, combining all the analytics tools. This puts Azure Synapse Analytics, Azure Data Factory, and Power BI together as one offering. It is designed to integrate a lot of the analytics work someone needs to do into one place. There are other parts of Microsoft Fabric, and most of my BI friends are talking about this. The platform has been in private preview for a long time, and many MVPs have been working with it and testing things out for a while.
One of the more interesting things I've learned about this is OneLake, which is analogous to OneDrive, but for your data. In this case, it's based on using the Delta and Parquet file formats to store your data. One lake for everyone in your org, with access for everyone from all their tools. In addition, they are supposed to be adding virtualized access to data from other lakes, such as those that might be in AWS or GCP. It's a neat idea, and I've seen some suggestions of archiving or exporting relational data from SQL Server and other platforms into Parquet for analysis. Rather than ETL, just export to a Parquet file.
Reza Rad has a great video that summarizes a lot of things, but his main takeaway is that this makes it simpler to build an analytics app or project because you aren't piecing things together. That alone is worth looking at if you struggle to find and learn about all these products. With the various workloads (seven of them), you can manage a project much more easily than in the past.
A few other takes that might help you learn more:
- An overview from Ben Jarvis
- Alex Whittle's description of this as Power BI + Synapse + DW + DataLake + ML
- James Serra's take on the product
- A Power BI angle on Fabric from Marc Lelijveld
- Spreading Your SQL Server Wings with Fabric
- Does the game change? from Paul Turley
- Why Prathy is excited
- A bit behind the scenes from Matthew Roche
- First look from Damian Widera
- Using Power BI with DirectLake query from Gilbert Quevauvilliers
- Enabling Fabric from Reza Rad
- An ingestion demo from Dennes Torres
Everyone's excited. I'm meh, but I'm not that into analytics currently.
Power BI
As with other things, Power BI gets Copilot. I am actually excited about this since I suck at DAX and I always need help. I don't know if this means I need to learn to prompt better instead of working on DAX skills, but I do know that this will mean I need to work with some smaller data sets so I can see if what I want to do in Power BI is actually working and producing the correct results. Then, of course, I would hope that Copilot will help me repoint all my visuals to new, larger data sets.
There is also the new Direct Lake query mode, which is like DirectQuery, but optimized for the Parquet/Delta files you'll store in OneLake.
The big thing for me here is that in preview there is a desktop developer mode that will integrate better with Git. We need better version control, and one thing I really dislike about developers that build tools is that they aren't always thinking about how the config/options/storage/etc. will work with a VCS tool. Everything needs to be integrated with VCS and CLI these days.
Copilot and Plugins
Copilot appears to be the name for using ChatGPT or other types of AI models in Microsoft products. They are adding this to everything from O365 to Teams to Windows to Azure Data Studio (ADS). I actually installed the ADS plugin the other day. I'm not sure how helpful it is yet, but I'll see.
I suspect that many of us will need to spend some time here, if for no other reason than that our coworkers, clients, customers, and friends will be asking us about things. I'm looking to spend some of my 10% time here each week, trying to better understand prompts and how things work. I do think that this can speed up some development of features that would otherwise be hard to code because the AI can ingest lots of data and spit out some simple tasks or actions that let the user click on something to turn it on, off, or change a feature. That alone can be time consuming to build.
I don't know if this will be helpful for DBAs and query writers, but I certainly can see this being useful in Power BI and analytics. The options in those areas are so wide compared to the SQL language that I suspect this will speed up the initial structure of coding. I would hope this will also help developers always create PKs and FKs and perhaps even move from embedded code to stored procs in more places, but I doubt it.
This also gets added to the Power Platform, so if you get asked to build apps there, or are doing so, you have an assistant to help.
I don't use CosmosDB, but I do find clients that are embedded in the Microsoft stack looking at it. The key-value store and MongoDB compatible APIs are useful. This platform gets burst capacity, which is nice. This gives you a cushion if you haven't provisioned enough capacity. You will still need alerts and may need to be ready to provision more, but this is welcome for those not sure how many RUs they need.
There are also hierarchical partition keys coming, which let you have up to 3 keys instead of 1. I know a few people that struggle to try and decide how to pick the partition key, so this might help.
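To make the idea a bit more concrete, here is a small sketch in plain Python. It does not use the Cosmos DB SDK at all. It only models how a key with up to three levels groups documents, and how a prefix of that key (say, one tenant, or one tenant and user) narrows down which groups you care about. The field names `tenantId`, `userId`, and `sessionId` are made up for the example.

```python
from collections import defaultdict

# Hypothetical key paths, broadest first. The announcement allows up to 3.
KEY_PATHS = ("tenantId", "userId", "sessionId")
MAX_LEVELS = 3


def partition_key(doc, paths=KEY_PATHS):
    """Build the multi-level key for one document as a tuple."""
    if not 1 <= len(paths) <= MAX_LEVELS:
        raise ValueError(f"expected 1 to {MAX_LEVELS} key paths")
    return tuple(doc[p] for p in paths)


def group_by_key(docs, paths=KEY_PATHS):
    """Group documents by their full multi-level key."""
    groups = defaultdict(list)
    for doc in docs:
        groups[partition_key(doc, paths)].append(doc)
    return dict(groups)


def matching_keys(groups, prefix):
    """Return the keys whose leading levels equal the given prefix."""
    prefix = tuple(prefix)
    return [key for key in groups if key[:len(prefix)] == prefix]


docs = [
    {"id": "1", "tenantId": "contoso", "userId": "ann", "sessionId": "s1"},
    {"id": "2", "tenantId": "contoso", "userId": "ann", "sessionId": "s2"},
    {"id": "3", "tenantId": "contoso", "userId": "bob", "sessionId": "s1"},
    {"id": "4", "tenantId": "fabrikam", "userId": "cy", "sessionId": "s9"},
]
groups = group_by_key(docs)
print(matching_keys(groups, ["contoso", "ann"]))
```

With a single key you would have to pick just one of those three fields. With the hierarchy, a busy tenant's data can be split by user and session while you can still ask for everything under that tenant.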
CosmosDB also gets materialized views, which should help people not copy data between containers and store duplicates.
There is a new change feed which has all versions and deletes in it. This helps developers get a full view of data inside the backup retention period. Useful for troubleshooting what might be happening in your app.
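Here is a rough sketch of why having every version and every delete matters for troubleshooting. The event shape below (`op`, `id`, `doc`) is invented for the illustration and is not the actual Cosmos DB change feed format. The point is only that with the full sequence you can replay what happened to an item, including the states that no longer exist.

```python
def history_for(events, item_id):
    """Return every recorded state of one item, in feed order.

    A delete shows up as None, so you can see exactly when the item vanished.
    """
    return [
        None if event["op"] == "delete" else event["doc"]
        for event in events
        if event["id"] == item_id
    ]


def current_state(events):
    """Replay the feed from the start to rebuild what the container holds now."""
    state = {}
    for event in events:
        if event["op"] == "delete":
            state.pop(event["id"], None)
        else:  # "create" or "replace" both store the new version
            state[event["id"]] = event["doc"]
    return state


# Hypothetical feed: item "a" is created, changed, then deleted.
events = [
    {"op": "create", "id": "a", "doc": {"qty": 1}},
    {"op": "replace", "id": "a", "doc": {"qty": 5}},
    {"op": "create", "id": "b", "doc": {"qty": 2}},
    {"op": "delete", "id": "a"},
]
print(history_for(events, "a"))
print(current_state(events))
```

A feed that only carried the latest version of each item would show you `b` and nothing about `a`, which is exactly the item you'd be trying to investigate.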
I've been lucky to attend Build in person in the past, and even present at the event. I would have liked to be in Seattle this week to see the event, but I've had a lot of travel, and since I'm away from home for 17 days in June, I didn't need another trip. Plus I have a lot of spring grass to cut, oil to change, and fence to fix.