Azure data warehouse - Index maintenance

Question

Azure data warehouse - Index maintenance

SQL!$@w$0ME

SSChampion

Points: 12349
More actions
May 22, 2017 at 12:35 pm

#321614

Hello Folks,
How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
Many thanks!

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 1

SQL!$@w$0ME - Monday, May 22, 2017 12:35 PM
Hello Folks,
How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
Many thanks!

I see MS article stating that statistics need to be created/maintained manually in Azure sql data warehouse. But not mentioned about maintaining index fragmentation.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-best-practices

Lowell SSC Guru Points: 323506 More actions · Answer 2

the structure of a database and dmv's are all exactly the same, whether Azure or SQL
any script that would generate scripts for indexes or users, or anything like that will work on azure with no changes.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

Lowell - Monday, May 22, 2017 12:41 PM
the structure of a database and dmv's are all exactly the same, whether Azure or SQL.

Not Azure SQLDW. It's a different beast, DMVs differ, most T-SQL works but not all and it's managed very differently.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Lowell SSC Guru Points: 323506 More actions · Answer 4

whoops i was assuming Azure SQL, and not DW, my fault, thank you for the clarification Gail!
I read the request too quickly.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

SQL!$@w$0ME - Monday, May 22, 2017 12:35 PM
Hello Folks,
How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
Many thanks!

To be honest, I probably wouldn't bother. With the data distributed, possibly partitioned and spread out across all the nodes, fragmentation probably isn't a huge concern.
You MUST update statistics manually (and create them manually too). The control node does not create or update statistics automatically and it having correct statistics is critically important to get good distributed plans.

Are you using mostly columnstore indexes, as is the recommendation I've seen a lot, or rowstore clustered indexes?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 6

GilaMonster - Monday, May 22, 2017 1:07 PM
SQL!$@w$0ME - Monday, May 22, 2017 12:35 PM
Hello Folks,
How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
Many thanks!
To be honest, I probably wouldn't bother. With the data distributed, possibly partitioned and spread out across all the nodes, fragmentation probably isn't a huge concern.
You MUST update statistics manually (and create them manually too). The control node does not create or update statistics automatically and it having correct statistics is critically important to get good distributed plans.
Are you using mostly columnstore indexes, as is the recommendation I've seen a lot, or rowstore clustered indexes?

Thanks Gail! We are using default clustered column store index. We have statistics created for each column. No non-clustered indexes created.

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 7

GilaMonster - Monday, May 22, 2017 1:03 PM
Lowell - Monday, May 22, 2017 12:41 PM
the structure of a database and dmv's are all exactly the same, whether Azure or SQL.
Not Azure SQLDW. It's a different beast, DMVs differ, most T-SQL works but not all and it's managed very differently.

Thanks! I see I'm not able to use dmv for index physical stats... according to ms article this dmv is not valid for azure sql dw.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

SQL!$@w$0ME - Monday, May 22, 2017 1:47 PM
Thanks Gail! We are using default clustered column store index. We have statistics created for each column. No non-clustered indexes created.

Fragmentation as a concept only applies to rowstore indexes, columstores don't have a logical order to them, so they're unable to get fragmented in the traditional sense.
Segment elimination may suffer over time, or may not, but that's got nothing to do with 'fragmentation'

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 9

SQL!$@w$0ME - Monday, May 22, 2017 1:49 PM
Thanks! I see I'm not able to use dmv for index physical stats... according to ms article this dmv is not valid for azure sql dw.

Correct. sys.dm_db_index_physical_stats does not exist on a SQL Datawarehouse.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 10

GilaMonster - Monday, May 22, 2017 2:14 PM
SQL!$@w$0ME - Monday, May 22, 2017 1:47 PM
Thanks Gail! We are using default clustered column store index. We have statistics created for each column. No non-clustered indexes created.
Fragmentation as a concept only applies to rowstore indexes, columstores don't have a logical order to them, so they're unable to get fragmented in the traditional sense.
Segment elimination may suffer over time, or may not, but that's got nothing to do with 'fragmentation'

Thanks Gail!

Grant Fritchey SSC Guru Points: 398712 More actions · Answer 11

A whole bunch of stuff around indexes and index maintenance in Azure SQL Data Warehouse.

First off, by default, the columns are columnstore, so you're not going to have fragmentation the way we used to think about it. However, what you do have is that columnstore indexes store data two ways, as the index is loaded (either with the data load or during a rebuild) it takes 100k row chunks and puts them into compressed storage. As data changes over time, the data is stored in the uncompressed delta store. If you are doing lots of trickle loads and updates (neither of which is recommended for ADW), you may need to schedule an index rebuild in order to remove deleted data (which is only logically deleted and stored in the delta) and to clean up from updates (which are logically deleted and then inserted into the delta, you can see how messy this all gets in the delta store). The delta groups are automatically closed after one million rows and moved to compressed row groups, but remember, that's by distribution.

Next, you need to be considering the size of your data. If it's fewer than 60 million rows, you may be better off using traditional clustered indexes rather than columnstore. This is because ADW tables are not one table, but 60 tables (distributions) and a columnstore, creating those 100k chunks, needs about a million rows to create a good set of columnstore data. Multiply that by 60 and you can see where the cutoff is. Obviously experimentation and testing with your data set is key, but these are the general guidelines from Microsoft.

To see the space used, DBCC PDW_SHOWSPACEUSED. Also, maybe even better, use sys.dm_pdw_nodes_db_partition_stats.

I hope that helps.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 12

Grant Fritchey - Tuesday, May 23, 2017 8:02 AM
A whole bunch of stuff around indexes and index maintenance in Azure SQL Data Warehouse.
First off, by default, the columns are columnstore, so you're not going to have fragmentation the way we used to think about it. However, what you do have is that columnstore indexes store data two ways, as the index is loaded (either with the data load or during a rebuild) it takes 100k row chunks and puts them into compressed storage. As data changes over time, the data is stored in the uncompressed delta store. If you are doing lots of trickle loads and updates (neither of which is recommended for ADW), you may need to schedule an index rebuild in order to remove deleted data (which is only logically deleted and stored in the delta) and to clean up from updates (which are logically deleted and then inserted into the delta, you can see how messy this all gets in the delta store). The delta groups are automatically closed after one million rows and moved to compressed row groups, but remember, that's by distribution.
Next, you need to be considering the size of your data. If it's fewer than 60 million rows, you may be better off using traditional clustered indexes rather than columnstore. This is because ADW tables are not one table, but 60 tables (distributions) and a columnstore, creating those 100k chunks, needs about a million rows to create a good set of columnstore data. Multiply that by 60 and you can see where the cutoff is. Obviously experimentation and testing with your data set is key, but these are the general guidelines from Microsoft.
To see the space used, DBCC PDW_SHOWSPACEUSED. Also, maybe even better, use sys.dm_pdw_nodes_db_partition_stats.
I hope that helps.

Thank you very much Grant!