This blog post covers the content of my Azure SQL Database Elasticity presentations and points to the code used to create the demos. As of today, I have presented this at the Minnesota SQL Server User Group (PASSMN) in September 2020 and as a webinar for 3Cloud in October 2020.
Elastic queries allow developers to interact with data from multiple databases on the Azure SQL Database platform, including Synapse. Elastic queries are often confused with PolyBase, which is currently implemented in SQL Server 2019 and Azure Synapse. The key difference is that elastic queries only allow you to interact with other Azure SQL Databases, not with Hadoop or other database platforms (e.g., Teradata or Oracle). Part of the confusion comes from the fact that the implementations look very similar: both toolsets use external tables in SQL Server to interact with the connected data sources. However, PolyBase requires additional components to run, whereas elastic queries are ready to go without additional setup.
Be aware that elastic queries are still in preview. Elastic queries are included in the cost of Azure SQL Database in the Standard and Premium tiers.
Elastic Query Strategies
Elastic queries support three key strategies, and the one you choose will influence how you implement the feature.
- Vertical partitioning. Complete tables are kept in separate databases, for example a shared date table or the dimensions in a data warehouse solution. Vertical partitioning is one way to scale out data solutions and to use Azure SQL Database for larger data solutions.
- Horizontal partitioning, or sharding. Whereas vertical partitioning keeps each table together, horizontal partitioning spreads the rows of a single table across multiple Azure SQL Databases. This is the most complex type of partitioning because it requires a shard map, and it is typically implemented with .NET or Java applications.
- Data virtualization. This concept mixes the partitioning approaches to virtualize the data: a single Azure SQL Database is used to interact with data from multiple databases. While data virtualization is currently limited because only Azure SQL Databases are supported, it is an area to watch for improvements as the product matures.
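To make the shard map mentioned in the horizontal partitioning bullet more concrete, here is a minimal Python sketch of a range shard map. It is not the Elastic Database client library (which targets .NET and Java); the `RangeShardMap` class, the key ranges, and the `PurchaseShard_A`/`PurchaseShard_B` database names are hypothetical. The vertical map reuses the table and database names from the demo below to show the contrast: in vertical partitioning a whole table maps to one database, while in horizontal partitioning each row is routed by its sharding key.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class RangeShardMap:
    """Hypothetical range shard map: each shard owns keys in [low, next_low)."""
    lows: list[int] = field(default_factory=list)
    shards: list[str] = field(default_factory=list)

    def add_range(self, low: int, shard: str) -> None:
        # Keep range starting points sorted so lookups can use binary search.
        i = bisect_right(self.lows, low)
        self.lows.insert(i, low)
        self.shards.insert(i, shard)

    def shard_for(self, key: int) -> str:
        # Find the last range whose starting point is <= key.
        i = bisect_right(self.lows, key) - 1
        if i < 0:
            raise KeyError(f"no shard owns key {key}")
        return self.shards[i]


# Vertical partitioning: each whole table lives in exactly one database.
VERTICAL_MAP = {
    "Fact.Purchase": "WideWorldDW_2",
    "dimDate": "WideWorldDW_3",
    "dimStockItem": "WideWorldDW_3",
    "dimSupplier": "WideWorldDW_3",
}

# Horizontal partitioning: rows of one table are spread by a sharding key.
# Shard names and ranges are hypothetical.
purchases = RangeShardMap()
purchases.add_range(0, "PurchaseShard_A")       # keys 0..9999
purchases.add_range(10_000, "PurchaseShard_B")  # keys 10000 and up

print(VERTICAL_MAP["Fact.Purchase"])  # WideWorldDW_2
print(purchases.shard_for(42))        # PurchaseShard_A
print(purchases.shard_for(25_000))    # PurchaseShard_B
```

A range map is only one option; list-based mappings, where each key value is assigned explicitly, are another common choice when keys are few and known in advance.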
Elastic Query Demo
The demo used in the presentations is configured as follows.
It uses three S1 Azure SQL Databases on the same Azure SQL server. I used ADF (Azure Data Factory) to move Fact.Purchase to WideWorldDW_2 and the three related dimensions (dimDate, dimStockItem, dimSupplier) to WideWorldDW_3. I then used WideWorldDW_3 to implement the external tables that work with the data. WideWorldImportersDW-Standard is the original restore of the sample database; it is the source of the data but is not used in the demos.
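The external-table setup on WideWorldDW_3 follows the general elastic query pattern for vertical partitioning: a database master key, a database scoped credential, an external data source with `TYPE = RDBMS` pointing at WideWorldDW_2, and an external table. As a hedged sketch, not the author's setup script, the Python helper below only assembles those T-SQL statements as text so they can be reviewed before being run in WideWorldDW_3. The server name, login, passwords, the `ElasticQueryCred` and `RemotePurchaseDb` names, and the local `dbo.Purchase` name are hypothetical. The three columns are an abbreviated, assumed list, so define them to match the remote Fact.Purchase table.

```python
def bracket(name: str) -> str:
    """Quote a T-SQL identifier, doubling any closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def literal(value: str) -> str:
    """Quote a T-SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def elastic_query_setup(server: str, remote_db: str, login: str,
                        secret: str, master_key_password: str,
                        columns: list[tuple[str, str]]) -> list[str]:
    """Build T-SQL for a vertical-partitioning elastic query.

    Run the statements in the database that hosts the external tables
    (WideWorldDW_3 in the demo). Object names here are hypothetical.
    """
    column_sql = ",\n    ".join(f"{bracket(n)} {t}" for n, t in columns)
    return [
        # Only needed if the database does not already have a master key.
        f"CREATE MASTER KEY ENCRYPTION BY PASSWORD = {literal(master_key_password)};",
        # Login the elastic query uses to connect to the remote database.
        f"CREATE DATABASE SCOPED CREDENTIAL ElasticQueryCred "
        f"WITH IDENTITY = {literal(login)}, SECRET = {literal(secret)};",
        # TYPE = RDBMS is the elastic query (vertical partitioning) source type.
        f"CREATE EXTERNAL DATA SOURCE RemotePurchaseDb WITH (\n"
        f"    TYPE = RDBMS,\n"
        f"    LOCATION = {literal(server)},\n"
        f"    DATABASE_NAME = {literal(remote_db)},\n"
        f"    CREDENTIAL = ElasticQueryCred\n);",
        # Local dbo.Purchase maps to Fact.Purchase in the remote database.
        f"CREATE EXTERNAL TABLE dbo.Purchase (\n    {column_sql}\n) WITH (\n"
        f"    DATA_SOURCE = RemotePurchaseDb,\n"
        f"    SCHEMA_NAME = 'Fact',\n"
        f"    OBJECT_NAME = 'Purchase'\n);",
    ]


statements = elastic_query_setup(
    server="myserver.database.windows.net",  # hypothetical server
    remote_db="WideWorldDW_2",
    login="elastic_reader",                  # hypothetical login
    secret="<password>",
    master_key_password="<strong password>",
    columns=[("Purchase Key", "bigint NOT NULL"),  # assumed columns; must
             ("Date Key", "date NOT NULL"),        # match the remote table
             ("Ordered Quantity", "int NOT NULL")],
)
print("\n\n".join(statements))
```

Using `SCHEMA_NAME` and `OBJECT_NAME` lets the local external table live in `dbo` without creating a matching `Fact` schema in WideWorldDW_3; queries against `dbo.Purchase` can then join to the local dimension tables.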
One note on the demo: I did not include the ADF pipelines. Use the Copy activity to move the tables to the target databases. You can find more information here.
The demo code to set up the environment can be found here.
Elastic jobs are the alternative to SQL Server Agent jobs in Azure SQL Database. While SQL Server Agent is included in Azure SQL Managed Instance, the rest of the platform needed a way to create jobs, and elastic jobs fill that gap. Elastic jobs are also in preview and are also included with Azure SQL Database. The only additional cost is the dedicated job database that elastic jobs require.
The best comparison is still SQL Server Agent: elastic jobs are structured as jobs that contain job steps. The only limitation at the moment is that job steps must be T-SQL. Jobs can be created in the Azure portal or with PowerShell, REST, or T-SQL.
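Because jobs can be created with T-SQL, a job can be scripted against the job database using the `jobs.sp_add_target_group`, `jobs.sp_add_target_group_member`, `jobs.sp_add_job`, and `jobs.sp_add_jobstep` stored procedures. The Python sketch below only assembles those calls as text. The target group, job, server, and credential names are hypothetical, `JobRunCred` is assumed to already exist as a database scoped credential in the job database, and the parameter names follow the preview-era documentation, so check them against the current reference before running the script.

```python
def nstr(value: str) -> str:
    """Quote a value as a T-SQL Unicode string literal."""
    return "N'" + value.replace("'", "''") + "'"


def elastic_job_script(group: str, server: str, databases: list[str],
                       job: str, credential: str, command: str) -> str:
    """Build T-SQL to run in the job database (all names are hypothetical)."""
    lines = [f"EXEC jobs.sp_add_target_group @target_group_name = {nstr(group)};"]
    # One member per target database; the job step runs against each of them.
    for db in databases:
        lines.append(
            "EXEC jobs.sp_add_target_group_member "
            f"@target_group_name = {nstr(group)}, "
            "@target_type = N'SqlDatabase', "
            f"@server_name = {nstr(server)}, "
            f"@database_name = {nstr(db)};"
        )
    # Like an Agent job, the job is a container; the T-SQL lives in its steps.
    lines.append(f"EXEC jobs.sp_add_job @job_name = {nstr(job)};")
    lines.append(
        "EXEC jobs.sp_add_jobstep "
        f"@job_name = {nstr(job)}, "
        f"@command = {nstr(command)}, "
        f"@credential_name = {nstr(credential)}, "
        f"@target_group_name = {nstr(group)};"
    )
    return "\n".join(lines)


script = elastic_job_script(
    group="DemoDatabases",                   # hypothetical target group
    server="myserver.database.windows.net",  # hypothetical server
    databases=["WideWorldDW_2", "WideWorldDW_3"],
    job="UpdateStatistics",                  # hypothetical job name
    credential="JobRunCred",                 # hypothetical credential
    command="EXEC sp_updatestats;",
)
print(script)
```

Targeting a group rather than individual databases keeps the job definition unchanged when databases are added; only the group membership changes.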
Elastic transactions fill one of the key gaps in the original Azure SQL Database rollout: cross-database transactions, which SQL Server supports through MSDTC. Elastic transactions add this functionality to Azure SQL Database and are built into the platform. The functionality is application driven and currently supported in the latest .NET libraries. While there is no hard limit on the number of databases, Microsoft currently recommends using elastic transactions only for distributed transactions that span 100 or fewer databases because of potential performance issues.
There are a few limitations to be aware of:
- Only supports Azure SQL Databases
- Only supports .NET transactions
- Does not support T-SQL distributed transactions
- Does not support WCF transactions
Microsoft continues to improve the functionality in Azure SQL Database, and these elastic features are part of that process. While I rarely have a use for distributed transactions, we have actively implemented elastic queries and elastic jobs for customers and expect to use them more in the future.
Azure SQL Elasticity References
Hopefully you too will be able to use the elastic functionality as you continue to embrace the Azure data platform.