SQLServerCentral Article

Azure DWH part 1: Common questions about Azure SQL Data Warehouse


Introduction

This article is a compilation of common questions and answers about Azure SQL Data Warehouse.

What is Azure SQL Data Warehouse (ASDW)?

It is an MPP database in Azure designed to process massive volumes of data.

What is MPP (Massively Parallel Processing)?

It is a mechanism that splits a query across several nodes, each of which processes its share of the data in parallel, so that huge amounts of data can be handled. With Azure SQL Data Warehouse, some queries that used to take hours or days (and sometimes weeks) in traditional systems can be executed in minutes using MPP.
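As a rough illustration of the idea only (not of how ASDW is implemented), the following Python sketch splits a list of values into chunks, lets each hypothetical "node" aggregate its own chunk in parallel, and then combines the partial results. The function names, the use of threads and the number of nodes are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, node_count):
    """Deal the rows out into node_count chunks, one per hypothetical node."""
    return [rows[i::node_count] for i in range(node_count)]


def node_sum(chunk):
    """Work done by one node: aggregate only its own slice of the data."""
    return sum(chunk)


def parallel_total(rows, node_count=4):
    """Aggregate the rows by combining partial sums computed in parallel."""
    chunks = split_into_chunks(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_sum, chunks))
    # Final step: combine the partial results into one answer.
    return sum(partials)


sales = list(range(1, 1001))
print(parallel_total(sales))
```

Each chunk is processed independently, so adding nodes shortens the time spent on the per-chunk work, while the final combination step stays small.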

What are the ASDW components?

The following diagram describes the main components:

The components are described below:

  • Control node - Manages the connections and distributes the workload among the compute nodes. It is handled by a SQL Database.

  • Compute nodes - They are SQL databases. The workload is distributed to the compute nodes and they process the workload in parallel.

  • Storage - All the data is stored in Azure Blob Storage.

  • Data Movement Service (DMS) - The DMS is the service in charge of moving data between the compute nodes.

I heard that it is very easy to scale up or scale down with ASDW. Is it true?

Yes. It may take you a few minutes to scale using the Azure Portal. This is one of the main advantages of using ASDW over its competitors.

How can I automatically scale up or scale down?

You can use T-SQL, REST API or PowerShell.
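As a minimal sketch of the T-SQL route: scaling is requested by changing the service objective of the database with an ALTER DATABASE statement. The Python helper below only builds that statement as text. The helper name and the database name MySampleDW are hypothetical, and actually running the statement requires a connection to your server, which is not shown here.

```python
def build_scale_statement(database_name, dwu):
    """Return the T-SQL text that requests a new performance level."""
    if dwu <= 0:
        raise ValueError("dwu must be a positive number")
    # Square brackets quote the name; a closing bracket is doubled to escape it.
    safe_name = database_name.replace("]", "]]")
    return f"ALTER DATABASE [{safe_name}] MODIFY (SERVICE_OBJECTIVE = 'DW{dwu}');"


# MySampleDW is a placeholder database name used only for illustration.
print(build_scale_statement("MySampleDW", 400))
```

A scheduled job could run a statement like this before the heavy reporting hours and a second one with a lower DWU value afterwards.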

What are the REST API and PowerShell?

PowerShell is a shell for administration tasks in Windows, SQL Server, Exchange, SharePoint, Azure, etc.

We have nice articles about PowerShell in SQLServerCentral here

REST API stands for Representational State Transfer Application Programming Interface. It is a programming interface designed to handle HTTP requests. It is ideal for working with web services and in the cloud.

Is it expensive to have an ASDW?

In the Portal UI, the cheapest service level is 100 DWU at 1.21 USD per hour and the most expensive is 6000 DWU at 72.58 USD per hour. You are charged for DWUs and for storage. For more information about pricing, please refer to this link: ASDW Pricing

What is a DWU?

DWU stands for Data Warehouse Unit. DWUs are used to measure performance. More DWUs mean more performance (and higher costs).

What does pausing mean in terms of cost in ASDW?

You can pause your ASDW when you are not using it. Pausing is crucial because, while the ASDW is paused, you are charged only for storage. For example, if you use the reports only from 8:00 to 12:00 and from 14:00 to 18:00, you will save a lot of money by pausing the ASDW outside those hours.
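To make the saving concrete, here is a back-of-the-envelope calculation in Python. It assumes the 100 DWU rate of 1.21 USD per hour quoted earlier in this article and the two working windows from the example. Storage charges are left out because they apply whether or not the ASDW is paused. The function name is made up for this sketch.

```python
HOURLY_RATE_USD = 1.21  # 100 DWU rate quoted in this article


def daily_compute_cost(active_hours, hourly_rate=HOURLY_RATE_USD):
    """Compute charge for one day, given the hours the ASDW is running."""
    return round(active_hours * hourly_rate, 2)


# Working windows from the example: 8:00 to 12:00 and 14:00 to 18:00.
active_hours = (12 - 8) + (18 - 14)

always_on = daily_compute_cost(24)
with_pause = daily_compute_cost(active_hours)
saving = round(always_on - with_pause, 2)

print(f"Always on: {always_on} USD per day")
print(f"Paused when idle: {with_pause} USD per day")
print(f"Saving: {saving} USD per day")
```

With these figures the compute charge drops from 29.04 USD to 9.68 USD per day, a saving of 19.36 USD per day before storage.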

How can I start with ASDW?

Creating your first database takes about 5 minutes. You can start using this tutorial: Create an ASDW

If it uses T-SQL, is it identical to SQL Server T-SQL?

No. It is similar, but there are some differences in the syntax.

Do the ASDW tables require a special design?

ASDW tables usually have a star schema design with fact and dimension tables. If you are familiar with Business Intelligence, it is the same idea. If you are not familiar with fact tables, dimensions, star schemas and snowflake schemas, you can read these articles: 

How can I create a table in ASDW?

You can use the following link to create your tables: Create Table (ASDW) 

It is similar to a traditional table in SQL Server, but it usually has a COLUMNSTORE INDEX. In addition, there is a distribution option.

What is the distribution option?

When you create a table in ASDW, you can distribute its data, and therefore the workload, across the different compute nodes. There are two options to distribute:

  1. Round Robin, which distributes the data evenly and randomly.
  2. Hash Distributed, which distributes the rows according to the hash value of a chosen column.
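The difference between the two options can be sketched in a few lines of Python. This is only a conceptual model: the CRC32 function stands in for whatever hash ASDW really uses, the round robin here simply takes turns instead of choosing randomly, and the four distributions, the column names and the function names are assumptions made for the example.

```python
import zlib
from collections import Counter


def round_robin(rows, distributions):
    """Assign rows to distributions in turn, ignoring the row content."""
    return [i % distributions for i, _ in enumerate(rows)]


def hash_distributed(rows, key, distributions):
    """Assign each row by hashing the value in its distribution column."""
    return [zlib.crc32(str(row[key]).encode()) % distributions for row in rows]


# Hypothetical fact rows: 12 sales belonging to 3 customers.
rows = [{"customer_id": i % 3, "amount": 10 * i} for i in range(12)]

rr = round_robin(rows, 4)
hd = hash_distributed(rows, "customer_id", 4)

print("Round robin rows per distribution:", dict(Counter(rr)))
print("Hash rows per distribution:", dict(Counter(hd)))
```

Round robin spreads the rows evenly regardless of their content. Hashing always sends rows with the same customer_id to the same distribution, which keeps related rows together but can leave the data unevenly spread when a few key values dominate.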

I've heard that it is possible to access NoSQL data in ASDW. Is that true?

Yes. Using PolyBase, for example, you can query CSV files stored in HDInsight or data in Azure Storage Accounts without importing them into ASDW tables. This is extremely useful if you have tables with terabytes of information. You do not need to create an ETL process to import the data; you can access the information directly.

Is there a tutorial to learn PolyBase?

Yes. PolyBase can be used in ASDW and SQL Server 2016. Here is a tutorial: Guide for using PolyBase in ASDW

It is also possible to query data in HDInsight using PolyBase.

Is there a Microsoft certification in ASDW?

No. However, there are some related certifications that may be useful to you. Here is a list of some of the certifications related to Microsoft Azure: Microsoft Azure Certifications

Some of the related certifications are:

Do you recommend any book?

We found only one book available on the web:

This book has valuable information in chapters 1 to 5. The rest of the chapters (6 to 23) contain T-SQL information that can be found in other SQL books. The final chapter (24) is about Nexus, which is very interesting but out of the scope of this article.

Conclusions

ASDW is a new technology designed to produce reports faster by distributing the workload among several compute nodes. It is an Azure technology that is very easy to implement and to scale.
