Query Performance Tuning – A Methodical Approach

  • FootyRef (6/8/2015)


    Say my source data is only looking at April, May, June, July and August. I want to use that information to swap those partitions from the fact table into a staging table, do all of the processing against that and then swap them back into the fact.

    No, you don't want to do that.

    Switching is for loading into the table or removing data, not something you do while running queries against the table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/8/2015)


    FootyRef (6/8/2015)


    Say my source data is only looking at April, May, June, July and August. I want to use that information to swap those partitions from the fact table into a staging table, do all of the processing against that and then swap them back into the fact.

    No, you don't want to do that.

    Switching is for loading into the table or removing data, not something you do while running queries against the table.

    I was only going to do the switching while doing the ETL processes early in the morning.

  • FootyRef (6/8/2015)


    GilaMonster (6/8/2015)


    FootyRef (6/8/2015)


    Say my source data is only looking at April, May, June, July and August. I want to use that information to swap those partitions from the fact table into a staging table, do all of the processing against that and then swap them back into the fact.

    No, you don't want to do that.

    Switching is for loading into the table or removing data, not something you do while running queries against the table.

    I was only going to do the switching while doing the ETL processes early in the morning.

    As I said, switching is not something you do when you're querying the table. It's something you do to load the fact table or to remove old data you no longer want in the fact table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks, Gail, for a nice article.

  • GilaMonster (6/8/2015)


    FootyRef (6/8/2015)


    GilaMonster (6/8/2015)


    FootyRef (6/8/2015)


    Say my source data is only looking at April, May, June, July and August. I want to use that information to swap those partitions from the fact table into a staging table, do all of the processing against that and then swap them back into the fact.

    No, you don't want to do that.

    Switching is for loading into the table or removing data, not something you do while running queries against the table.

    I was only going to do the switching while doing the ETL processes early in the morning.

    As I said, switching is not something you do when you're querying the table. It's something you do to load the fact table or to remove old data you no longer want in the fact table.

    Am I not explaining something correctly? This is the second time you have made this statement. I only want to do the switching of partitions when running the daily ETL processes. I originally had the understanding that partitioning the data in a mart would improve report performance, but your article has convinced me otherwise. The partition switching would only happen in the data warehouse when no one else is in the data.

  • FootyRef (6/9/2015)


    I only want to do the switching of partitions when running the daily ETL processes.

    You've talked about having monthly partitions and switching partitions out of the fact table, into a staging table, running some processing against the data you've switched out and then switching it back in.

    That is NOT what partition switching is for. It's not so you can switch portions of the table out to process them. It is so that you can load data into your fact tables without needing the long-duration locks that an insert would take. If you have partitioning by month, you'll switch data in and/or out at most once a month, load the new month's data in and move the earliest month out to archive or to be deleted if applicable.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
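
    As a rough illustration of that pattern, here is a minimal T-SQL sketch. All object names are hypothetical (dbo.FactBenefits partitioned by month, a staging table dbo.FactBenefits_Load with identical structure, aligned indexes and a CHECK constraint matching the target partition's boundary, and an empty archive table dbo.FactBenefits_Archive on the same filegroup), and the partition numbers are illustrative only.

    -- Switch the fully loaded new month IN from the staging table
    -- (a metadata-only operation; no data is physically moved).
    ALTER TABLE dbo.FactBenefits_Load
        SWITCH TO dbo.FactBenefits PARTITION 13;

    -- Switch the oldest month OUT to the empty archive table,
    -- to be archived or deleted afterwards.
    ALTER TABLE dbo.FactBenefits
        SWITCH PARTITION 1 TO dbo.FactBenefits_Archive;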
  • GilaMonster (6/9/2015)


    FootyRef (6/9/2015)


    I only want to do the switching of partitions when running the daily ETL processes.

    You've talked about having monthly partitions and switching partitions out of the fact table, into a staging table, running some processing against the data you've switched out and then switching it back in.

    That is NOT what partition switching is for. It's not so you can switch portions of the table out to process them. It is so that you can load data into your fact tables without needing the long-duration locks that an insert would take. If you have partitioning by month, you'll switch data in and/or out at most once a month, load the new month's data in and move the earliest month out to archive or to be deleted if applicable.

    So, I can't switch, say, four partitions out into a staging table, do inserts and updates against data in those partitions and switch them back into the fact table? For example, we have a fact table that keeps track of benefits issued to and taken away from our participants. We get new data every day, so let's say it is the second of the month and we received data for the last day of last month and for yesterday, the first day of this month. I want to grab the partitions in the fact table that cover last month and this month, switch them into a staging table, and run the ETL processes that will either add the new benefits issued or update the benefits previously issued to participants. Once the processes have completed, the partitions will be switched back from the staging table into the fact table in the data mart.

    I have read many, many documents saying that this is how it is done, but perhaps I have misunderstood them all. I am just trying to figure out a way to do it dynamically, based on the months of data (by Benefit Issuance Date) I happen to be processing that day. That way, I only need to switch out those partitions and do not need to stage the whole table and then load and index it into the mart.
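
    If it helps, the "which months are in today's source data" part can be answered with a simple query; a sketch, assuming a hypothetical staging table dbo.StagingBenefits with a BenefitIssuanceDate column:

    -- Distinct first-of-month values present in today's load,
    -- which could then drive any per-month processing.
    SELECT DISTINCT DATEADD(MONTH, DATEDIFF(MONTH, 0, BenefitIssuanceDate), 0) AS MonthStart
    FROM   dbo.StagingBenefits;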

  • FootyRef (6/9/2015)


    So, I can't switch, say, four partitions out into a staging table, do inserts and updates against data in those partitions and switch them back into the fact table?

    You can do that. It's technically possible, though more complex than what you described.

    Whether it's a good idea is entirely another matter. Whether it'll gain you anything other than massive complexity and a chance of something going wrong is still another.

    The point of partition switching is to let you do the initial loads from a staging table into the fact table quickly, and to allow you to remove large numbers of rows, either to an archive table or to be deleted quickly, not so you can play musical chairs with portions of the table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/9/2015)


    FootyRef (6/9/2015)


    So, I can't switch, say, four partitions out into a staging table, do inserts and updates against data in those partitions and switch them back into the fact table?

    You can do that. It's technically possible, though more complex than what you described.

    Whether it's a good idea is entirely another matter. Whether it'll gain you anything other than massive complexity and a chance of something going wrong is still another.

    The point of partition switching is to let you do the initial loads from a staging table into the fact table quickly, and to allow you to remove large numbers of rows, either to an archive table or to be deleted quickly, not so you can play musical chairs with portions of the table.

    OK, I see where you are coming from, but, like I said, I got the idea to do this from many different articles that discussed not having time to bulk load and index the whole table from an ETL staging area into a data warehouse. They would switch out the partitions they needed to process, run their ETL against only those partitions, and switch them back into the warehouse. I just haven't found anything that would automatically read our source data to figure out which months we needed to process, do the actual switching out of those partitions, and then switch them back into the mart. I just need to keep looking and was hoping you might have heard of a way to do it. Maybe when I figure it out I'll share it here if anyone has an interest in doing it.

  • One thing you don't mention in the article is whether or not the Columnstore index processed in row or batch mode. Just a guess, but it probably was in row mode. If you forced the plan to go parallel (pick your poison for that) so that it would process in batch mode, I'm curious how much performance gain you might see then? Not that performance is the only consideration, as you say in the article. Just interested.

    Oh, and thanks for talking about the dangers of seeing partitioning as performance tuning. That particular meme just won't go away.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
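
    One way to check, sketched under the assumption that the plan is still in cache: actual plans expose an "Actual Execution Mode" property on the columnstore scan operator, and cached plans can be searched for operators estimated to run in batch mode.

    -- Find cached plans that contain at least one batch-mode operator.
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT cp.plan_handle,
           qp.query_plan
    FROM   sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE  qp.query_plan.exist('//RelOp[@EstimatedExecutionMode="Batch"]') = 1;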

  • Indeed I didn't. My bad.

    The plan's cost is over 200, so it doesn't need any help going parallel.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
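
    For reference, the setting that matters here is the server's 'cost threshold for parallelism' (default 5); a quick way to check it, using nothing beyond standard sp_configure:

    -- Show the parallelism-related server settings; a plan costed above the
    -- threshold is eligible to go parallel, provided 'max degree of parallelism'
    -- is not set to 1.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism';
    EXEC sp_configure 'max degree of parallelism';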
  • Grant Fritchey (6/9/2015)


    One thing you don't mention in the article is whether or not the Columnstore index processed in row or batch mode. Just a guess, but it probably was in row mode. If you forced the plan to go parallel (pick your poison for that) so that it would process in batch mode, I'm curious how much performance gain you might see then? Not that performance is the only consideration, as you say in the article. Just interested.

    Oh, and thanks for talking about the dangers of seeing partitioning as performance tuning. That particular meme just won't go away.

    Wouldn't they get some improvement just from having their ETL dynamically select sets of records to update based on their partitioning strategy?

    You always have to account for corrections and late-arriving data that do not match the assumption that all updates will be in the partitions you are swapping in and out in their scenario.

    Part of the performance benefit of partitioning might come from SSAS.

    There I think you will see gains, plus the advantage of processing the partitions one at a time to give users current data when processing is running late.

  • FootyRef (6/9/2015)


    GilaMonster (6/9/2015)


    FootyRef (6/9/2015)


    So, I can't switch, say, four partitions out into a staging table, do inserts and updates against data in those partitions and switch them back into the fact table?

    You can do that. It's technically possible, though more complex than what you described.

    Whether it's a good idea is entirely another matter. Whether it'll gain you anything other than massive complexity and a chance of something going wrong is still another.

    The point of partition switching is to let you do the initial loads from a staging table into the fact table quickly, and to allow you to remove large numbers of rows, either to an archive table or to be deleted quickly, not so you can play musical chairs with portions of the table.

    OK, I see where you are coming from, but, like I said, I got the idea to do this from many different articles that discussed not having time to bulk load and index the whole table from an ETL staging area into a data warehouse. They would switch out the partitions they needed to process, run their ETL against only those partitions, and switch them back into the warehouse. I just haven't found anything that would automatically read our source data to figure out which months we needed to process, do the actual switching out of those partitions, and then switch them back into the mart. I just need to keep looking and was hoping you might have heard of a way to do it. Maybe when I figure it out I'll share it here if anyone has an interest in doing it.

    Sounds like a bad idea.

    Like Gail said, partitioning is not a performance feature; it is aimed at easier maintenance. Sometimes, in very targeted scenarios, it can bring some performance improvements as a side effect. Data warehousing is definitely one of those scenarios, with faster partition loading.

    Queries will hardly see any benefit from partitioning, especially when ad hoc querying is allowed: compared to canned reports, those queries will include predicates on the partitioning key in very few cases.

    Anyway, I don't think that the solution you are looking for is a smart use of partition switching. What will users see while you're updating the partition? No data for that partition?

    Moreover, if you're strictly working on one partition at a time, any DML that contains a predicate on the partitioning key will perform partition elimination anyway, so I think it's totally pointless to have a process that requires switching the partition out before working on it.

    Going down this path will also complicate your data loading process, forcing you to loop over the partitions rather than updating data in bulk when it spans multiple partitions. In my opinion it's a totally unneeded complication.

    BTW, if you have any articles that support this idea, I would be happy to read them.

    Just because there's an article out there recommending something, it doesn't necessarily mean that it's a good idea. You're fortunate enough to have the advice of Gail on this subject and, trust me, you couldn't be luckier.

    That said, we're not behind your monitor and we can't see the whole project you're working on: it might also be a good strategy, but based on what you posted in this discussion, I agree with Gail that it doesn't sound like a good idea.

    -- Gianluca Sartori
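
    To illustrate the partition-elimination point, a sketch only; the table and column names (dbo.FactBenefits, dbo.StagingBenefits, BenefitAmount, ParticipantKey, BenefitIssuanceDate) are assumptions based on the scenario described earlier in the thread:

    -- An UPDATE with a predicate on the partitioning key (BenefitIssuanceDate)
    -- reads and writes only the partitions covered by the range;
    -- no switching is required.
    UPDATE f
    SET    f.BenefitAmount = s.BenefitAmount
    FROM   dbo.FactBenefits AS f
    INNER JOIN dbo.StagingBenefits AS s
            ON  s.ParticipantKey      = f.ParticipantKey
            AND s.BenefitIssuanceDate = f.BenefitIssuanceDate
    WHERE  f.BenefitIssuanceDate >= '20150501'
      AND  f.BenefitIssuanceDate <  '20150701';  -- May and June partitions only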

  • Cool. Thanks.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Greg Edwards-268690 (6/9/2015)


    Grant Fritchey (6/9/2015)


    One thing you don't mention in the article is whether or not the Columnstore index processed in row or batch mode. Just a guess, but it probably was in row mode. If you forced the plan to go parallel (pick your poison for that) so that it would process in batch mode, I'm curious how much performance gain you might see then? Not that performance is the only consideration, as you say in the article. Just interested.

    Oh, and thanks for talking about the dangers of seeing partitioning as performance tuning. That particular meme just won't go away.

    Wouldn't they get some improvement just from having their ETL dynamically selecting sets of records to update based on their partitioning strategy?

    You always have to account for corrections and late arriving data that do not match your assumption that all updates will be in the partition you are swapping in and out in their scenario.

    Part of the partitioning as a performance might come from SSAS.

    There I think you will see gains, plus advantages of processing the partitions one at a time to give users current data when processing is running late.

    There are places where partitioning can, and does, improve performance. But that's not its primary reason for existence. It's meant for data management. As Gail has said, it's for preparing data offline, which can then be added to a table, or for removing data quickly from a table (rolling off old data), that sort of thing.

    It just so happens that, in some cases, if the query can focus on a single partition, you can see serious performance enhancements. Most of the time, in my experience, though, even with very carefully partitioned data and very carefully written queries, there is some performance degradation from partitioning (not major, but measurable). So the trade-off is in the ease of data management and in avoiding the seriously negative impact that traditional INSERT/DELETE would have on performance with very large data stores.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
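
    For anyone wanting to see how their data actually lines up with the partitions, a small sketch; the table name and the partition function name pfMonthly are hypothetical:

    -- Row count per partition of the fact table.
    SELECT p.partition_number, p.rows
    FROM   sys.partitions AS p
    WHERE  p.object_id = OBJECT_ID('dbo.FactBenefits')
      AND  p.index_id IN (0, 1);      -- heap or clustered index only

    -- Which partition a given date maps to, via the partition function.
    SELECT $PARTITION.pfMonthly('20150601') AS partition_number;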
