New Index Maintenance Methods Recommended by Microsoft

  • Sergiy wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

     Do you again think I’m wrong on it?

    Yes.

    OK then.

    My recent experience with "flat-earthers" taught me that it's almost useless trying to prove anything to true believers. For them facts matter only when they fit their concept.

    Thanks for the ad hominem. That makes this so much more pleasant.

     

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jeff Moden wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

     Do you again think I’m wrong on it?

    Yes.

    Unfortunately, that actually proves that they don't know.  Bob Ward is still talking about the use of reorganize for index maintenance.  Is it really possible that even he doesn't know the train wreck known as Reorganize? 😀  And is there really a good reason why they don't support minimal logging on managed instances?  DBAs do it every day and they know the ramifications and so they work around that in special databases... you know... sandboxes that simply aren't backed up and similar "tricks"?    Yes, there's a trade off but why did they take only the easy way out on managed instances?

    Heh... and consider some of the open items on the feedback site and look at what they did for String_Split and Pivot and Format and TempDB overusage with inserting into IDENTITY columns, and making is so that you cannot temporarily disable the equivalent of TF1117 in TempDB, and making it so that LOBs default to in-row, and making it so that out of row pointers are "expansive" and, and, and...

    Yes... they do a lot of really cool stuff.  Yes, they also do a whole lot of some really shoddy and uniformed design and work and ignore other/previous problems such as how both index REBUILD and REORGANIZE work and don't work.  And then there's the "hidden price to pay" on some really useful stuff because they really didn't implement things quite right.  One of my favorite areas there is Partitioned Tables.

    The changed article about index recommendations is a good proof of it all... they left a mistake for more than 2 decades, they've done nothing to fix the root problems (they really need to fix rebuild, reorganize, and provide a better set of tools than the gross averages produced by sys.dm_db_index_physical_stats, and their new recommendation/suggestion is just as bad as the original except that it's even more complicated and the one new tool for page examination is seriously broken when it comes to previous/next page and the column for page free space doesn't actually work for what you need it to work for the most.

    I'm sure, though, that MS isn't the only software provider that has such issues.  They all have a major suck factor one way or another. 😀

    Yes. They are grossly imperfect people working on a grossly imperfect product.

    Me too.

    Pretty sure, everyone here, in an honest moment, can say the same.

    However, Jeff, in all seriousness, their mistakes & shortcomings aside, would you say that "the last people you should be listening to are Microsoft employees"? Would you really paint with that broad a brush? Sorry, but we can honestly say that they got stuff wrong and are still trustworthy people. I sure hope so, because, as imperfect as I am, and I am, I'd like to think I'm still reasonably trustworthy.

    For example, my first book on execution plans was riddled with horrific errors. So was the second, just less so. Third one, pretty good, but still has some errors in it as I keep finding. Does this mean I'm utterly untrustworthy?

    This is a ridiculous discussion.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    Sergiy wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

     Do you again think I’m wrong on it?

    Yes.

    OK then.

    My recent experience with "flat-earthers" taught me that it's almost useless trying to prove anything to true believers. For them facts matter only when they fit their concept.

    Thanks for the ad hominem. That makes this so much more pleasant.

    Yeah, whatever label you want to put on me. If I try makes you feel warmer.

    It would not line up the facts with your believed anyway.

    _____________
    Code for TallyGenerator

  • Grant Fritchey wrote:

    Jeff Moden wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

     Do you again think I’m wrong on it?

    Yes.

    Unfortunately, that actually proves that they don't know.  Bob Ward is still talking about the use of reorganize for index maintenance.  Is it really possible that even he doesn't know the train wreck known as Reorganize? 😀  And is there really a good reason why they don't support minimal logging on managed instances?  DBAs do it every day and they know the ramifications and so they work around that in special databases... you know... sandboxes that simply aren't backed up and similar "tricks"?    Yes, there's a trade off but why did they take only the easy way out on managed instances?

    Heh... and consider some of the open items on the feedback site and look at what they did for String_Split and Pivot and Format and TempDB overusage with inserting into IDENTITY columns, and making is so that you cannot temporarily disable the equivalent of TF1117 in TempDB, and making it so that LOBs default to in-row, and making it so that out of row pointers are "expansive" and, and, and...

    Yes... they do a lot of really cool stuff.  Yes, they also do a whole lot of some really shoddy and uniformed design and work and ignore other/previous problems such as how both index REBUILD and REORGANIZE work and don't work.  And then there's the "hidden price to pay" on some really useful stuff because they really didn't implement things quite right.  One of my favorite areas there is Partitioned Tables.

    The changed article about index recommendations is a good proof of it all... they left a mistake for more than 2 decades, they've done nothing to fix the root problems (they really need to fix rebuild, reorganize, and provide a better set of tools than the gross averages produced by sys.dm_db_index_physical_stats, and their new recommendation/suggestion is just as bad as the original except that it's even more complicated and the one new tool for page examination is seriously broken when it comes to previous/next page and the column for page free space doesn't actually work for what you need it to work for the most.

    I'm sure, though, that MS isn't the only software provider that has such issues.  They all have a major suck factor one way or another. 😀

    Yes. They are grossly imperfect people working on a grossly imperfect product.

    Me too.

    Pretty sure, everyone here, in an honest moment, can say the same.

    However, Jeff, in all seriousness, their mistakes & shortcomings aside, would you say that "the last people you should be listening to are Microsoft employees"? Would you really paint with that broad a brush? Sorry, but we can honestly say that they got stuff wrong and are still trustworthy people. I sure hope so, because, as imperfect as I am, and I am, I'd like to think I'm still reasonably trustworthy.

    For example, my first book on execution plans was riddled with horrific errors. So was the second, just less so. Third one, pretty good, but still has some errors in it as I keep finding. Does this mean I'm utterly untrustworthy?

    This is a ridiculous discussion.

    You see,I don't deal with you personally. I never met you and probably I never will.

    Same applies to the readers of your books.

    They read your books. And they have a choice either to trust the facts you put out in your book, or, after discovering several "horrific errors" put the book aside and look for another, more trustful, source of knowledge.

    That book had your name on it. So, the readers, when looking for a better book, will most likely avoid any ones from the author which gave them wrong information before. I'm pretty sure you'd take the same approach yourself.

    It does not matter - was it a deliberate facts fabrication or honest mistakes - consumers still cannot trust the facts published under you name. Which makes you as an author (the readers don't know you in any other way) untrustworthy.

    Same works for those MS guys. They may be nice guys with best intentions. But whatever is published by the company they work for is full of rubbish. People simply cannot put their trust into information published under that name. As far as I'm familiar with English language, it's what is called "untrustworthy".

    If you think it's harsh - try to publish unverified facts in a real scientific magazine. Once misinformation is discovered you'd be on their black list, and it would take years and a lot of effort to restore you reputation.

    _____________
    Code for TallyGenerator

  • Sergiy wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

     Do you again think I’m wrong on it?

    Yes.

    OK then.

    My recent experience with "flat-earthers" taught me that it's almost useless trying to prove anything to true believers. For them facts matter only when they fit their concept.

    Thanks for the ad hominem. That makes this so much more pleasant.

    Yeah, whatever label you want to put on me. If I try makes you feel warmer.

    It would not line up the facts with your believed anyway.

    Last word on this. I didn't put a label on you. You put one on me. That's the definition of an ad hominem attack, to the man. Meaning, attack the person. I absolutely didn't do this. You did.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • One must also consider the enormous difficulty of what the MS folks are trying to do.

    I get asked all the time about "(simple) rules for tuning indexes" since that is my specialty.  I have to disappoint people with the answer, "there aren't any simple rules that can be applied to all indexing situations to get you good results (almost) every time."  So don't fault MS too harshly for trying to do something which is basically impossible to do anyway.

    I can tell people that the clustered index is far and away the single most important factor for overall table performance, so focus on getting the best possible clustered index on every table.  Don't worry about nonclus indexes until after that.

    A big hint: that means that an identity column is most often not the best clustered index for a table.  No, really, seriously.  So get out of the brain-dead habit of automatically slapping a clus index on identity on every table!

    For example, for child tables, like say dbo.order_items, obviously a child of the dbo.orders table.  The best key for such child tables is almost always the parent clus key followed by something to make the key unique.  An identity is great for that.  And, in properly designed dbs (design is way too neglected, but that's a different discussion), there are a lot of child tables, which is the main reason why most tables should not use just identity as the clus key.

    Therefore, the clus index on the dbo.order_items should be ( order_id, $IDENTITY ) and not just ( $IDENTITY ).  Since the parent key is sequential, this key will be nearly all sequential as well.  Keep in mind, parent and child tables are joined a lot, and in the big majority of cases, this pattern of clus indexes allows MERGE joins to be used, which are the most efficient join when applicable.  And they are not derailed by the "tipping point" problem.  You could read 75% (or more) of the table by clus key value and still not "tip" to a full table scan.

    As another example, log tables should be clustered first on log datetime (which should be a datetime2 type) and only then on $IDENTITY (again, to make the key inherently unique, you don't want to force SQL to add a "uniquifier").

    Beyond that, it takes practice and expertise to properly tune indexes.  Hallelujah!!  That's MY job, and I wanna keep it! 🙂

    Oh, btw, you can literally use "$IDENTITY" as the key name (or in a SELECT).  Go ahead, try it, it works just fine.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Grant Fritchey wrote:

    Sergiy wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

    Grant Fritchey wrote:

    Sergiy wrote:

     Do you again think I’m wrong on it?

    Yes.

    OK then.

    My recent experience with "flat-earthers" taught me that it's almost useless trying to prove anything to true believers. For them facts matter only when they fit their concept.

    Thanks for the ad hominem. That makes this so much more pleasant.

    Yeah, whatever label you want to put on me. If I try makes you feel warmer.

    It would not line up the facts with your believed anyway.

    Last word on this. I didn't put a label on you. You put one on me. That's the definition of an ad hominem attack, to the man. Meaning, attack the person. I absolutely didn't do this. You did.

    I cannot possibly attack you personally. Because I simply don't know you in person and I have no clue about your personality. The only personal features I can take from your photo are that you're male and wearing some facial hair. But attacking those would be quite odd for me, as I own those features myself.

    I could  (and I did) only point out that you style of argument on the subject is exactly the same as of "true believers" - repeat what you think is true and dismiss all the facts which don't fit that "truth".

    It's you who brought "ad hominem" into the discussion, not me. Probably because you've run out of professional arguments.

    _____________
    Code for TallyGenerator

  • ScottPletcher wrote:

    Therefore, the clus index on the dbo.order_items should be ( order_id, $IDENTITY ) and not just ( $IDENTITY ).  Since the parent key is sequential, this key will be nearly all sequential as well.  Keep in mind, parent and child tables are joined a lot, and in the big majority of cases, this pattern of clus indexes allows MERGE joins to be used, which are the most efficient join when applicable.  And they are not derailed by the "tipping point" problem.  You could read 75% (or more) of the table by clus key value and still not "tip" to a full table scan.

    I must say, it's also a quite questionable choice.

    It's works perfectly only when you need to display details for a single selected order.

    But if you need to find out which products been ordered last week - it's most certainly a full table scan, unless there was only a handful of orders lastweek, and bookmark lookups appear cheaper for the optimiser.

    _____________
    Code for TallyGenerator

  • Grant Fritchey wrote:

    However, Jeff, in all seriousness, their mistakes & shortcomings aside, would you say that "the last people you should be listening to are Microsoft employees"?

    See... that's a problem for me.  I can't put all their mistakes and shortcomings and "brush-offs" and frequently horrible design/feature decisions aside because those things have burned me and (frequently without their knowledge) they've burned the general SQL community way too many times.  So, with some but few exceptions to all that, I'm getting to be that way.  As I envision them saying about Erland's 11 year old "connection/feedback" item for a sequence generator, it's nothing personal... it's just business.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sergiy wrote:

    ScottPletcher wrote:

    Therefore, the clus index on the dbo.order_items should be ( order_id, $IDENTITY ) and not just ( $IDENTITY ).  Since the parent key is sequential, this key will be nearly all sequential as well.  Keep in mind, parent and child tables are joined a lot, and in the big majority of cases, this pattern of clus indexes allows MERGE joins to be used, which are the most efficient join when applicable.  And they are not derailed by the "tipping point" problem.  You could read 75% (or more) of the table by clus key value and still not "tip" to a full table scan.

    I must say, it's also a quite questionable choice.

    It's works perfectly only when you need to display details for a single selected order.

    But if you need to find out which products been ordered last week - it's most certainly a full table scan, unless there was only a handful of orders lastweek, and bookmark lookups appear cheaper for the optimiser.

    You don't have much practical experience tuning things like this, do you?

    It's easy enough to have a nonclustered index that will support getting the range of order_ids for the last week (or month or quarter or whatever) and that range can then be used to query those order_ids from the main table.  That structure does NOT just support a "single" lookup as you falsely claim, rather it very easily supports a range lookup, and, again, with a guarantee that it will NOT escalate into a full table scan, since, as I noted above, there's no tipping for these types of index patterns.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Sergiy wrote:

    ScottPletcher wrote:

    Therefore, the clus index on the dbo.order_items should be ( order_id, $IDENTITY ) and not just ( $IDENTITY ).  Since the parent key is sequential, this key will be nearly all sequential as well.  Keep in mind, parent and child tables are joined a lot, and in the big majority of cases, this pattern of clus indexes allows MERGE joins to be used, which are the most efficient join when applicable.  And they are not derailed by the "tipping point" problem.  You could read 75% (or more) of the table by clus key value and still not "tip" to a full table scan.

    I must say, it's also a quite questionable choice.

    It's works perfectly only when you need to display details for a single selected order.

    But if you need to find out which products been ordered last week - it's most certainly a full table scan, unless there was only a handful of orders lastweek, and bookmark lookups appear cheaper for the optimiser.

    You don't have much practical experience tuning things like this, do you?

    It's easy enough to have a nonclustered index that will support getting the range of order_ids for the last week (or month or quarter or whatever) and that range can then be used to query those order_ids from the main table.  That structure does NOT just support a "single" lookup as you falsely claim, rather it very easily supports a range lookup, and, again, with a guarantee that it will NOT escalate into a full table scan, since, as I noted above, there's no tipping for these types of index patterns.

    This comes from VERY personal, first hands experience.

    Range of order_ids might help when you build queries yourself and are ready to jump through and extra hoop to put the ranges of those ID's in use. But when it's in hands of analysts and developers, some of which you don't even get to know - you cannot expect them to think about that workaround every time they launch another query.

    And the only reason I've got involved in that situation- continuous table scans were killing the servers.

    _____________
    Code for TallyGenerator

  • Sergiy wrote:

    ScottPletcher wrote:

    Sergiy wrote:

    ScottPletcher wrote:

    Therefore, the clus index on the dbo.order_items should be ( order_id, $IDENTITY ) and not just ( $IDENTITY ).  Since the parent key is sequential, this key will be nearly all sequential as well.  Keep in mind, parent and child tables are joined a lot, and in the big majority of cases, this pattern of clus indexes allows MERGE joins to be used, which are the most efficient join when applicable.  And they are not derailed by the "tipping point" problem.  You could read 75% (or more) of the table by clus key value and still not "tip" to a full table scan.

    I must say, it's also a quite questionable choice.

    It's works perfectly only when you need to display details for a single selected order.

    But if you need to find out which products been ordered last week - it's most certainly a full table scan, unless there was only a handful of orders lastweek, and bookmark lookups appear cheaper for the optimiser.

    You don't have much practical experience tuning things like this, do you?

    It's easy enough to have a nonclustered index that will support getting the range of order_ids for the last week (or month or quarter or whatever) and that range can then be used to query those order_ids from the main table.  That structure does NOT just support a "single" lookup as you falsely claim, rather it very easily supports a range lookup, and, again, with a guarantee that it will NOT escalate into a full table scan, since, as I noted above, there's no tipping for these types of index patterns.

    This comes from VERY personal, first hands experience.

    Range of order_ids might help when you build queries yourself and are ready to jump through and extra hoop to put the ranges of those ID's in use. But when it's in hands of analysts and developers, some of which you don't even get to know - you cannot expect them to think about that workaround every time they launch another query.

    And the only reason I've got involved in that situation- continuous table scans were killing the servers.

    If your developers aren't allowed to look at query plans, you should change that, they really need to be able to do that.

    If they can look at query plans yet they still put queries into prod that fully scan very large tables when they don't need to, you need to train them more.  If, ultimately, they are still incapable of learning how to use existing indexes properly, you should likely get rid of them.

    Most often the optimizer itself should be able to figure out how to leverage the indexes, but I'll admit, sometimes it can't, and you have to help it along by adjusting your own code.  That's true quite often when using SQL.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • One thing I am disappointed not to be hearing more about is the use of resumable index rebuilds.

    I've stopped using reorganize on huge tables where we can't rebuild because we can't afford the space / log file growth and become a fan of online resumable index rebuilds.

    Would anyone else care to comment on that?

  • Toby Harman wrote:

    One thing I am disappointed not to be hearing more about is the use of resumable index rebuilds.

    I've stopped using reorganize on huge tables where we can't rebuild because we can't afford the space / log file growth and become a fan of online resumable index rebuilds.

    Would anyone else care to comment on that?

    I cover that in my "GUIDs vs Fragmentation - They're not the problem... WE ARE!" presentation.

    I don't specifically cover resumable indexes but I couldn't agree more about not using REORGANIZE in such cases.  In fact, I pretty much destroy the idea of using REORGANIZE for anything because it's a myth that it always uses fewer resources especially where the log file is concerned and prove it all.

    The only thing I would ever use REORGANIZE for, especially since I also prove that it doesn't work the way people think it does (the documentation is correct but not well written either), is to compress LOBs and it can easily take 10 passes (runs of REORGANIZE) to get that job done and because of the mess it leaves behind, you really need to do a REBUILD after all that.  This is especially true for GUIDs but is also applicable to just about anything that fragments with only a couple of exceptions but the log file usage still doesn't justify the use of REORGANIZE even if you have only the Standard Edition.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Toby Harman wrote:

    One thing I am disappointed not to be hearing more about is the use of resumable index rebuilds.

    I've stopped using reorganize on huge tables where we can't rebuild because we can't afford the space / log file growth and become a fan of online resumable index rebuilds.

    Would anyone else care to comment on that?

    @Toby,

    I forgot to mention... have you tried the idea of moving your large indexes to a separate file group?  And, when I say that, I mean ONE index per file/filegroup pair.

    One of the things I do with my largest clustered indexes (and any non-clustered indexes that also fragment on a regular basis for the one table) is...

    1. Create a file group and file named the same as the table but with a suffix of "1".  For example, if the name of the table is "Customer", name the file group "Customer1" and the single file that it contains also as "Customer1".  If having two things name the same really bugs you, add to the suffix.  For example, the file group would be named "Customer_FG1" and the file name would be named "Customer_F1".  That will keep all the items for the "Customer" table grouped together in alphabetical order just to make things easy to read.  As an alternative, you could use a "Hungarian Notation" prefix of "FG_" and "F_" if that's your preference.  The only key that I recommend is that you should be consistent in your naming.

    2. Move the desired Clustered Index and the desired related Non_Clustered Indexes (the ones that fragment) for the "Customer" table to the new "Customer1" filegroup/file using CREATE INDEX REBUILD WITH (DROP_EXISTING = ON).  As a bit of a sidebar, I REALLY wish MS had included an opting to "KEEP_INDEX_DEFINITION = ON" so that you don't actually have to know all the columns and other options to re-create the index but that's something I'm going to have to take up with Microsoft.  I expect that'll go over as well as Erland's suggestion for an built-in, high speed, machine language "Tally" function (which has been open for 11 bloody years with no action).  Of course, that action is going to free up a whole lot of space in the PRIMARY file group, if that's where it used to live, not to mention that it's going to rebuild the indexes and get rid of the existing fragmentation (both logical and "usually" {there is an exception that takes too long to explain here} physical fragmentation {page density}).

    3.  When you need to rebuild the CI and the related NCIs, repeat steps 1 and 2 above but step 1 should be to create the "Customer2" filegroup/file pair (or whatever naming convention you've used for the names).

    4.  Once that's completed, the "Customer1" filegroup/file will be empty.  You can just drop it and you won't have all the "free space" used that you normally would.

    5.  Next time around, wash-rinse-repeat going back and forth between "Customer1" and "Customer2" filegroup/file pairs.

    As a necessary sidebar, if such moves to new file groups leave way too much free space in the Primary File Group, you may have to do a shrink file and the followup to repair any "Index Inversion" (my name for what happens to some/many of the indexes during a shrink file).  If you have the time, it's actually beneficial to do an index rebuild on all the indexes from smallest to largest to get as many of the pages as possible to move to the beginning of the file before the shrink, which will also make the shrink faster in most cases.  If not, then you can deal with the fragmentation caused by the "Index Inversion" for only those indexes affected but I'd still do them from smallest to largest so that the large stuff has the greatest change of being rebuilt into contiguous extents, which will increase your index "segment size" to help indexes used during large range scans on the larger indexes.

    And, of course, this is all only for the ROWSTORE indexes.  I've not yet had the mixed pleasure/pain of using COLUMNSTORE indexes.

    Last but not least, if you hold your mouth just right when doing this, it'll also work for PARTITIONED TABLES and PARTITIONED VIEWS.  Of course, you also probably already know that anything that's partitioned based on a temporal partitioning key will have a large amount of data that will never be updated and those partitions should live in READ_ONLY filegroups that no longer need to be backed up after the backup taken after they're set to READ_ONLY.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 45 total)

You must be logged in to reply to this topic. Login to reply