Day 2 was my opportunity to be focused and dive into a few technologies that were really peaking my interest, namely Azure Synapse Analytics and SQL Server 2019. Since I heard the Synapse announcement I wanted to make sure I sought out every session on this and understood everything I could possible learn and today certainly went a long way to help me understanding this new technology and also realized there is so much more to learn, but there are still 3 more days left.
With Azure Synapse everything I keep learning about this is it’s certainly impressive and I will stand by the statement that Snowflake, Google Big Query and Amazon Redshift better be paying attention because the complete integrated story, performance benchmarks and cost model have closed some big gaps that previously existed with Azure Data Lake and Azure SQL DW. One of the sessions I had a chance to attend was “Democratizing the Data Lake with On-Demand Capabilities in Azure Synapse Analytics” with Josep Aguilar Saborit and Charles Feddersen, where I came away with two very important takeaways. First, was the ease of being able to query and import data, in a nutshell get the data into Azure Data Lake Storage and it’s available. With that said it seems that Microsoft and most of the industry are standardizing on parquet as the file format of choice. Which if you are not familiar with is an extremely easy file format to get almost any file into, Azure Data Factory can really help here. Parquet format files are first class citizens within Azure Synapse and querying or importing this data is almost effortless.
Now let’s talk a little bit about the architecture behind Synapse and while yes Azure Synapse can be thought of as Azure SQL DW 2.0 you need to take everything you may already know about Azure SQL DW architecture and throw it out the window. The most important thing to remember is that the architecture behind this new service has been reinvented from the ground up and made to be a cloud first design. If you remember Azure SQL DW was not a cloud first design as it was basically the predecessor of SQL Server PDW just ported into the cloud with very little architecture change.
Now let me caveat the next few statements with I still have a lot to learn about Azure Synapse architecture but here are a few items that I have picked up on and I have been very impressed by. Both the failure detection and hotspot detection that is baked into the solution is quite impressive. First let’s understand that queries are natively treated as distributed and alight with both Hash and User partitions with the execution nodes, see picture below (Sorry these are pictures taken during presentations)
With failure detection being handled at the node level it allows for any failure at a node level to be automatically recovered and redistributed to the other active nodes. That’s right unless for some reason you lose all nodes your queries will not fail. If you are familiar with more legacy architectures, you would then know that this was not always the case.
Now with hot spot detection imagine the scenario that a query task was partitioned in a way where one node was doing a skewed amount of the work. What Synapse will do is recognize this skewed load and automatically redistribute this load to other nodes that are currently being underutilized. Oh, wait I am not done with the amazing special sauce, if multiple nodes become overloaded and distribution can’t be done then additional nodes can just be added to achieve ultimate performance.
Now let’s move on to SQL Server 2019 and new features in Azure SQL which I am just going to lump all in together. Now this is not a full comprehensive list of all new features just some things I wanted to call out. Let’s start with Azure SQL Serverless, if you are looking for a way to use Azure SQL and not incur a crazy cost then take a look at Azure SQL Serverless for scenarios such as Dev/Test environments (I’m sure there are other use cases but this is what I am focusing on for now). Typically in Dev/Test environments you don’t have the requirement to be up 100% of the time and therefore you don’t want to have the cost of running a database 24/7, and this is where serverless comes into play as you are only charged for the time that you are actually running queries. I will go out on a limb and say this could easily be a 40-50% cost savings for existing Dev/Test environments you have that are running on current Azure SQL Database and incurring 24/7 costs.
One more feature set I want to quickly hit on is the Intelligent Query Processing. With this new feature there are quite a number of areas of improvement so let’s just hit on a quick few. The first is the memory grant feedback, where if the initial execution if the memory grant was too small and caused spill to disk the subsequent executions will be adjusted to properly set memory grants and therefore reduce spill and increase query execution time. Table Variable deferred compilation now treats a table variable just as a temp table has always been done and therefore query cardinality is now no longer assumed to be a rowcount of 1. While I am not a huge table variable fan if you do have a workload that is very table variable heavy then just upgrading to SQL Server 2019 should result in performance gains just by upgrading.
One last note is on Azure Data Studio as I had a chance to see Alan Yu’s session on Azure Data Studio focused on notebooks. If you are not using Azure Data Studio and notebooks hurry up and start learning you will thank me later. Plus, Azure Data Studio now has PowerShell notebooks and is a one stop show for almost any database developer. Azure Data Studio November release