Azure data warehouse - Index maintenance

  • Hello Folks,

    How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
    Many thanks!

  • SQL!$@w$0ME - Monday, May 22, 2017 12:35 PM

    Hello Folks,

    How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
    Many thanks!

    I see MS article stating that statistics need to be created/maintained manually in Azure sql data warehouse. But not mentioned about maintaining index fragmentation.
    https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-best-practices

  • the structure of a database and dmv's are all exactly the same, whether Azure or SQL
    any script that would generate scripts for indexes or users, or anything like that will work on azure with no changes.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell - Monday, May 22, 2017 12:41 PM

    the structure of a database and dmv's are all exactly the same, whether Azure or SQL.

    Not Azure SQLDW. It's a different beast, DMVs differ, most T-SQL works but not all and it's managed very differently.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • whoops i was assuming Azure SQL, and not DW, my fault, thank you for the clarification Gail!
    I read the request too quickly.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • SQL!$@w$0ME - Monday, May 22, 2017 12:35 PM

    Hello Folks,

    How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
    Many thanks!

    To be honest, I probably wouldn't bother. With the data distributed, possibly partitioned and spread out across all the nodes, fragmentation probably isn't a huge concern.
    You MUST update statistics manually (and create them manually too). The control node does not create or update statistics automatically and it having correct statistics is critically important to get good distributed plans.

    Are you using mostly columnstore indexes, as is the recommendation I've seen a lot, or rowstore clustered indexes?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Monday, May 22, 2017 1:07 PM

    SQL!$@w$0ME - Monday, May 22, 2017 12:35 PM

    Hello Folks,

    How do you perform/schedule index maintenance on Azure Data warehouse. Is there an option to view index fragmentation details like in SQL Server using sys.dm_db_index_physical_stats?
    Many thanks!

    To be honest, I probably wouldn't bother. With the data distributed, possibly partitioned and spread out across all the nodes, fragmentation probably isn't a huge concern.
    You MUST update statistics manually (and create them manually too). The control node does not create or update statistics automatically and it having correct statistics is critically important to get good distributed plans.

    Are you using mostly columnstore indexes, as is the recommendation I've seen a lot, or rowstore clustered indexes?

    Thanks Gail! We are using default clustered column store index. We have statistics created for each column. No non-clustered indexes created.

  • GilaMonster - Monday, May 22, 2017 1:03 PM

    Lowell - Monday, May 22, 2017 12:41 PM

    the structure of a database and dmv's are all exactly the same, whether Azure or SQL.

    Not Azure SQLDW. It's a different beast, DMVs differ, most T-SQL works but not all and it's managed very differently.

    Thanks! I see I'm not able to use dmv for index physical stats... according to ms article this dmv is not valid for azure sql dw.

  • SQL!$@w$0ME - Monday, May 22, 2017 1:47 PM

    Thanks Gail! We are using default clustered column store index. We have statistics created for each column. No non-clustered indexes created.

    Fragmentation as a concept only applies to rowstore indexes, columstores don't have a logical order to them, so they're unable to get fragmented in the traditional sense.
    Segment elimination may suffer  over time, or may not, but that's got nothing to do with 'fragmentation'

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • SQL!$@w$0ME - Monday, May 22, 2017 1:49 PM

    Thanks! I see I'm not able to use dmv for index physical stats... according to ms article this dmv is not valid for azure sql dw.

    Correct. sys.dm_db_index_physical_stats does not exist on a SQL Datawarehouse.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Monday, May 22, 2017 2:14 PM

    SQL!$@w$0ME - Monday, May 22, 2017 1:47 PM

    Thanks Gail! We are using default clustered column store index. We have statistics created for each column. No non-clustered indexes created.

    Fragmentation as a concept only applies to rowstore indexes, columstores don't have a logical order to them, so they're unable to get fragmented in the traditional sense.
    Segment elimination may suffer  over time, or may not, but that's got nothing to do with 'fragmentation'

    Thanks Gail!

  • A whole bunch of stuff around indexes and index maintenance in Azure SQL Data Warehouse.

    First off, by default, the columns are columnstore, so you're not going to have fragmentation the way we used to think about it. However, what you do have is that columnstore indexes store data two ways, as the index is loaded (either with the data load or during a rebuild) it takes 100k row chunks and puts them into compressed storage. As data changes over time, the data is stored in the uncompressed delta store. If you are doing lots of trickle loads and updates (neither of which is recommended for ADW), you may need to schedule an index rebuild in order to remove deleted data (which is only logically deleted and stored in the delta) and to clean up from updates (which are logically deleted and then inserted into the delta, you can see how messy this all gets in the delta store). The delta groups are automatically closed after one million rows and moved to compressed row groups, but remember, that's by distribution.

    Next, you need to be considering the size of your data. If it's fewer than 60 million rows, you may be better off using traditional clustered indexes rather than columnstore. This is because ADW tables are not one table, but 60 tables (distributions) and a columnstore, creating those 100k chunks, needs about a million rows to create a good set of columnstore data. Multiply that by 60 and you can see where the cutoff is. Obviously experimentation and testing with your data set is key, but these are the general guidelines from Microsoft.

    To see the space used, DBCC PDW_SHOWSPACEUSED. Also, maybe even better, use sys.dm_pdw_nodes_db_partition_stats. 

    I hope that helps.  

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Tuesday, May 23, 2017 8:02 AM

    A whole bunch of stuff around indexes and index maintenance in Azure SQL Data Warehouse.

    First off, by default, the columns are columnstore, so you're not going to have fragmentation the way we used to think about it. However, what you do have is that columnstore indexes store data two ways, as the index is loaded (either with the data load or during a rebuild) it takes 100k row chunks and puts them into compressed storage. As data changes over time, the data is stored in the uncompressed delta store. If you are doing lots of trickle loads and updates (neither of which is recommended for ADW), you may need to schedule an index rebuild in order to remove deleted data (which is only logically deleted and stored in the delta) and to clean up from updates (which are logically deleted and then inserted into the delta, you can see how messy this all gets in the delta store). The delta groups are automatically closed after one million rows and moved to compressed row groups, but remember, that's by distribution.

    Next, you need to be considering the size of your data. If it's fewer than 60 million rows, you may be better off using traditional clustered indexes rather than columnstore. This is because ADW tables are not one table, but 60 tables (distributions) and a columnstore, creating those 100k chunks, needs about a million rows to create a good set of columnstore data. Multiply that by 60 and you can see where the cutoff is. Obviously experimentation and testing with your data set is key, but these are the general guidelines from Microsoft.

    To see the space used, DBCC PDW_SHOWSPACEUSED. Also, maybe even better, use sys.dm_pdw_nodes_db_partition_stats. 

    I hope that helps.  

    Thank you very much Grant!

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply