SQLServerCentral Article

Data Warehousing in the cloud – Azure Style

,

For a few months now, we have had a SQL Server edition in the cloud known as Azure SQL Data Warehouse (ASDW). This version enables you to provision a data warehouse instance in just 3 to 5 minutes. ASDW’s main benefit is that it allows you to scale the compute of the data warehouse to provide the resources required to keep up with your organization’s data demands. The compute resources include additional SQL instances should you scale up, or the removal of these instances as you scale down. This is an oversimplification of scaling, but you could conceptually view the scaling of a data warehouse as the addition or removal of SQL Servers as you scale up and down. And this happens under the hood, with a simple change to an option within the Azure portal.

So, if you are performing heavy data loads, you can maximise the amount of compute for the duration of a data load, only to scale the compute back down once the load is complete. Furthermore, if you don’t require access to your data, you can pause the compute so that you can keep control of your costs, while still retaining the data.

These are some of the business benefits of Azure SQL Data Warehouse, but how does it work?

SQL Data Warehouse is a massively parallel processing (MPP) distributed database system. Behind the scenes, SQL Data Warehouse spreads your data across many shared-nothing storage and processing units. These are represented as separate SQL instances described above. The data is stored in a Premium locally redundant storage layer on top of which dynamically linked Compute nodes execute distributed queries. SQL Data Warehouse takes a "divide and conquer" approach to running loads and complex queries. Requests are received by a Control node, optimized for distribution, and then passed to Compute nodes that perform their work in parallel as shown in the following graphic. 

The Azure Data Warehouse platform is architected for analytical workloads. Workloads that either have many complex queries or involve statistical processing. It is able to handle analytical workloads as it can take a single complex query, and divide it into smaller queries that are handled by individual Compute nodes. This essentially means that Azure SQL Data Warehouse can parallelize a query across the multiple nodes with smaller queries, with the Control node performing a final aggregation of all the Compute nods work before returning the query result back to the requesting client. It can also interface with unstructured data stored in an Azure Blob Store or Azure Data Lake Storage. But we will discuss this benefit more in a blog post later.

Now there are downside to this architecture. If you require a data solution that requires transactional consistency and high concurrency as your requirements, then Azure SQL Data Warehouse is not the service to use. Azure SQL Database would be a more appropriate choice. To illustrate this point. Azure SQL Data Warehouse supports the transaction isolation level of Read Uncommitted and this cannot be changed. SQL Database, supports the full range of transaction isolation levels.

It only takes minutes to get Azure SQL Data Warehouse up and running, and you can either do this within the Azure Portal, or alternatively you can use PowerShell.

At this point I will give you a choice, you can either watch the following 10 minute video that demonstrates how to set up an instance of Azure SQL Data Warehouse. Alternatively, you can continue to read about the process of setting up an instance. If you’re hard core, why not do both! But I understand your time is precious.

OK, so there is information that you need to have to hand before creating an Azure SQL Data Warehouse:

User Interface

UI Element

Description

Database Name

The name of the new database. This name must be unique on the SQL server.

Subscription

A Windows Azure subscription grants you access to Windows Azure services and to the Windows Azure Platform Management Portal. In this section you select the subscription to which the Azure SQL Data Warehouse is assigned.

Resource Group

A resource group is a container that holds related resources for an Azure solution. In this area, you can select an existing resource group, or create a new resource group to host the SQL DW Instance

Select Source

Select Blank Database to create a blank database. Backup to restore a database. Select Sample to choose the sample database AdventureworksDW

Server

In this area you can either create a new SQL Server instance to host the database, or make use of an existing server.

Collation

Defines the rules that sorts and compares data. The collation cannot be changed once the database is created

Performance

Is a slider that defines the Data Warehouse Units (DWU). The higher the value, the more compute

Armed with this information, you can then go ahead and create the SQL Data Warehouse Instance.

Alternatively, you can use the same information to create a PowerShell script to sign into an Azure Subscription, create a resource group and then create a SQL Server instance, and optionally a database. The following PowerShell code creates a resource group named cto_ads_prep_rg located in North Europe using the New-AzureRmResourceGroup cmdlet.  The script then creates a SQL Server instance named ctomsftadssqlsrv with an admin account named ctesta-oneill using the New-AzureRmSqlServer cmdlet.

######################################################################
##                PART I: Creating the Azure SQL Server             ##
######################################################################
# Sign in to Azure and set the WINDOWS AZURE subscription to work with
$SubscriptionId = "XXXXXXXX-xxXX-XXxx-XXXX-xxxxxxxxxxxx"
Add-AzureRmAccount
Set-AzureRmContext -SubscriptionId $SubscriptionId
# CREATE A RESOURCE GROUP
$resourceGroupName = "cto_ads_prep_rg"
$rglocation = "North Europe"
New-AzureRmResourceGroup -Name $resourceGroupName -Location $rglocation
# CREATE A SERVER
$serverName = "ctomsftadssqlsrv"
$serverVersion = "12.0"
$serverLocation = "North Europe"
$serverAdmin = "ctesta-oneill"
$serverPassword = "P@ssw0rd"
$securePassword = ConvertTo-SecureString –String $serverPassword –AsPlainText -Force
$serverCreds = New-Object –TypeName System.Management.Automation.PSCredential –ArgumentList $serverAdmin, $securePassword
$sqlDbServer = New-AzureRmSqlServer -ResourceGroupName $resourceGroupName -ServerName $serverName -Location $serverLocation -ServerVersion $serverVersion -SqlAdministratorCredentials $serverCreds

You can also use PowerShell to configure firewall settings on the SQL Server instance using the New-AzureRmSqlServerFirewallRule cmdlet. This can be performed in the Azure Portal as well.

# CREATE A SERVER FIREWALL RULE
$ip = (Test-Connection -ComputerName $env:COMPUTERNAME -Count 1 -Verbose).IPV4Address.IPAddressToString
$firewallRuleName = 'Client IP address'
$firewallStartIp = $ip
$firewallEndIp = $ip
$fireWallRule = New-AzureRmSqlServerFirewallRule -ResourceGroupName $resourceGroupName -ServerName $serverName -FirewallRuleName $firewallRuleName -StartIpAddress $firewallStartIp -EndIpAddress $firewallEndIp

With the firewall rules defined, you will then be able to access to Azure SQL Server using tools such as Visual Studio and SQL Server Management Studio, where you could run T-SQL scripts to create and manage your database, although this can be done in PowerShell using the New-AzureRmSqlDatabase cmdlets as well. The following code creates a data warehouse named ContosoRetailDW.

# CREATE A SQL DATABASE
$databaseName = "ContosoRetailDW"
$databaseEdition = "DataWarehouse"
$RequestedServiceObjectiveName = "DW400"
$sqlDatabase = New-AzureRmSqlDatabase -ResourceGroupName $resourceGroupName -RequestedServiceObjectiveName $RequestedServiceObjectiveName -ServerName $serverName -DatabaseName $databaseName -Edition $databaseEdition.

Once a database has been created, you can then understand how to scale and Pause a database in Azure SQL Data Warehouse, but that will be the next blog.

Rate

4.67 (6)

You rated this post out of 5. Change rating

Share

Share

Rate

4.67 (6)

You rated this post out of 5. Change rating