How to optimize INSERT, DELETE, and UPDATE SQL statements

  • Usually we talk about how to optimize SELECT statements in SQL Server, but are there any tips for improving the performance of INSERT, DELETE, and UPDATE statements in a highly concurrent system? If so, how do you optimize them? Could you please share some practical experience? Thanks!

  • I think the 4 most important things are...

    1. Correctly designed and indexed DRI.
    2. Proper design of tables.
    3. A serious understanding of page splits and all that goes with them, including but not limited to Clustered Index design, a very good understanding of in-row vs out-of-row LOBs and large VARCHAR()/NVARCHAR() columns, identifying columns that will suffer "ExpAnsive" updates and how to prevent them from being expansive, a really good understanding of Insert/Update patterns, and the right kind of index maintenance which, in many cases, might be "none" and, in other cases, might be "every damned night".  You also need to understand how the datatype of each column is going to affect things (ever-increasing Clustered Indexes are NOT a panacea of performance and can make things quite bad when "ExpAnsive" updates are present, while GUIDs might be the way to go for very rapid-fire inserts... provided that you understand the correct type of index maintenance for both).
    4. Understanding that there are at least 6 different Insert/Update patterns and that 4 of them are seriously affected by "ExpAnsive" updates.  For example, would you think it possible to have totally massive fragmentation during inserts even though NO bad page splits occur and the pages are as close to 100% full as possible?  Do you know how to make indexes with GUIDs for leading columns go for weeks and even months of inserts (and some "ExpAnsive" updates) with absolutely no page splits (not even good ones)?  And, no... just lowering the Fill Factor won't do it... you also have to maintain the indexes correctly and, I'm here to tell you, what people have adopted as a "Best Practice" in that area is actually a worst practice, especially (but not limited to) when it comes to GUIDs.  (There's a sketch of one way to watch page-split activity right after this list.)
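    A minimal sketch of one way to watch page-split activity, using a standard DMF (nothing here is specific to any particular system, and note that leaf_allocation_count includes both "good" end-of-index splits and "bad" mid-index splits, so treat it as an indicator rather than a verdict):

        -- Rough page-split indicator per index since the database came online.
        SELECT OBJECT_NAME(ios.object_id)     AS table_name,
               i.name                         AS index_name,
               ios.leaf_allocation_count,     -- leaf-level page allocations (splits, good and bad)
               ios.nonleaf_allocation_count   -- intermediate-level page allocations
          FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
          JOIN sys.indexes AS i
            ON  i.object_id = ios.object_id
            AND i.index_id  = ios.index_id
         ORDER BY ios.leaf_allocation_count DESC;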

    There's a whole lot more but, since (as you suggest) inserts and updates don't happen as often as SELECTs, many folks don't pay attention to Inserts, Updates, or Deletes until one of those things takes a minute or two to operate on just one row.

    Sorry... didn't mean to ramble on that but, to be honest, that's all just scratching the surface.


    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you, Jeff Moden, for your kind help!

    You said, "Understanding that there are at least 6 different Insert/Update patterns and that 4 of them are seriously affected by 'ExpAnsive' updates."  Could you please explain what you mean by that? Thanks a lot!

  • Additionally, understanding that the WHERE clause and JOIN criteria (if any) of data manipulation queries are subject to all the same rules as a SELECT query. Something like a function in the WHERE clause that would cause a scan in a SELECT query will also cause a scan in an UPDATE query, hurting performance in each.
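    As a minimal sketch of that point (the table, column, and index here are hypothetical): both UPDATEs below target the same rows, but the first wraps the column in a function, so an index with OrderDate as its leading column can't be used for a seek.

        -- Non-sargable: the function on the column forces a scan,
        -- in an UPDATE just as it would in a SELECT.
        UPDATE dbo.Orders
           SET Status = 'Archived'
         WHERE YEAR(OrderDate) = 2019;

        -- Sargable rewrite: a half-open date range lets the optimizer
        -- seek on an index keyed on OrderDate.
        UPDATE dbo.Orders
           SET Status = 'Archived'
         WHERE OrderDate >= '20190101'
           AND OrderDate <  '20200101';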

    Also, the tools used to understand the behavior of a data modification query are the same as those for tuning SELECT queries. You'll be capturing query metrics with Extended Events, Query Store, or the DMVs. You'll be interpreting behaviors using execution plans. All the same stuff.
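    For example, one quick (and by no means the only) way to surface the most write-intensive cached statements is the plan-cache DMVs; Query Store or Extended Events would give you richer history. This is a standard DMV pattern, not anything specific to this thread:

        -- Top cached statements by logical writes (data modification work).
        SELECT TOP (20)
               qs.execution_count,
               qs.total_logical_writes,
               qs.total_logical_writes / qs.execution_count AS avg_logical_writes,
               SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                         ((CASE qs.statement_end_offset
                                WHEN -1 THEN DATALENGTH(st.text)
                                ELSE qs.statement_end_offset
                           END - qs.statement_start_offset) / 2) + 1) AS statement_text
          FROM sys.dm_exec_query_stats AS qs
         CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
         WHERE qs.total_logical_writes > 0
         ORDER BY qs.total_logical_writes DESC;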

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    Additionally, understanding that the WHERE clause and JOIN criteria (if any) of data manipulation queries are subject to all the same rules as a SELECT query. Something like a function in the WHERE clause that would cause a scan in a SELECT query will also cause a scan in an UPDATE query, hurting performance in each.

    Also, the tools used to understand the behavior of a data modification query are the same as those for tuning SELECT queries. You'll be capturing query metrics with Extended Events, Query Store, or the DMVs. You'll be interpreting behaviors using execution plans. All the same stuff.

    Grant Fritchey, thank you for the great and enthusiastic help, and thanks for your time!

  • 892717952 wrote:

    Thank you, Jeff Moden, for your kind help!

    You said, "Understanding that there are at least 6 different Insert/Update patterns and that 4 of them are seriously affected by 'ExpAnsive' updates."  Could you please explain what you mean by that? Thanks a lot!

    First, Grant is spot on in what he says.  For what he said, he's completely accurate.  But there's a lot he didn't say (not his fault) about Inserts, Updates, and Deletes.  So here's the proverbial "rest of the story"... and this is just the "introduction" to the problems that a lot of people (including at least all those using the current supposed "Best Practice" index maintenance) are having but don't know about.

    "Expansive" Updates

    Starting with your question above (and many people ARE actually aware of this problem)...

    Take, for instance, a table containing only integer and date columns plus two variable-width columns called "Created_By" and "Modified_By".  I'm citing these two very specific columns because they frequently occur as a bit of "poor man's" auditing in many people's table designs.  Both of these columns are typically VARCHAR(n), with "n" typically being a length of 50 or 100 (or some other value... the value doesn't matter here... it's still a variable-width column).

    During an INSERT into such a table, all columns of the table are affected.  Typically, there's a default on the "Created_By" column of ORIGINAL_LOGIN() or some other function to automatically populate the row(s) being inserted with who or whatever inserted the rows.

    Not so with the Modified_By column.  Typically, it has no default.

    Even more typical is the fact that people use an "ever increasing" key column, such as an IDENTITY column or a date column, for their Clustered Index (CI from here on) key.

    So, here's what happens with all of that...

    1. An INSERT is done (one or many rows... it doesn't matter).  On such an "ever increasing" CI, the newly inserted rows will ALWAYS be inserted into the last logical page according to the sort order of the CI.  It doesn't matter what the Fill Factor of the CI is... these new rows will always go to the last page and fill it as close to 100% full as the row width will allow.  Once that page becomes full, a supposed "good" page split will occur (no rows are moved, but a new page is created that is logically superior according to the CI sort order) and more rows will be added to that new "last page" in the CI.  The process repeats itself until all the rows of the INSERT have been inserted.  The end result is that all the pages created by that and other INSERTs, all of which are in a string of pages near the logical end of the CI, are completely filled to the extent (no pun intended) that the row size will allow.
    2. What typically happens then is that some process is executed on the newly inserted rows.  For example, an invoice detail table will have a whole bunch of new rows for line items on an invoice that need to be filled and shipped.  The rows related to that process are typically updated before any kind of index maintenance can occur, so all those pages are sitting there and it wouldn't take much extra data to cause them to split (which would be the "bad" kind of page split that moves rows and makes a log file entry or two or three for every row that moves to the new split page).  So the rows are updated, and part of the process is to update the "Modified_By" column with the name of whomever or whatever did the update, along with updates to the fixed-width columns.  If only the fixed-width columns were updated, they wouldn't change the size of the row even if they started out as NULL, because they're all fixed width.  There would be no increase in the size of the row and it would all still fit on the same page, which means the UPDATE would NOT cause one or more page splits.  Unfortunately, that bloody "Modified_By" column gets updated.  It starts as NULL and gets updated to something, which means extra bytes that need to be stored in the row, which causes the row size to "expand" (hence the name "ExpAnsive UPDATEs"), and even just one such "expanded" row can cause the page to split.

      The Created_By column will never cause that problem because its content goes in during the original insert.

    To summarize, those inserts on the "ever increasing" CI ignore the Fill Factor and pack the pages full.  Those are the most recent pages and are usually updated soon after they are inserted and, because of the "ExpAnsive" update of the "Modified_By" column (or other variable-width columns that have had extra data added to them), you end up with relatively massive page splits, which can cause a whole lot of blocking and a huge amount of totally unnecessary log file activity that seriously (by a factor of 40 or more) slows down the updates.
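    Here's a minimal sketch of that whole pattern; the table and column names are made up purely for illustration:

        -- Hypothetical table: ever-increasing clustered key plus the
        -- "poor man's auditing" pair described above.
        CREATE TABLE dbo.InvoiceDetail
        (
            InvoiceDetailID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
            InvoiceID       INT          NOT NULL,
            Quantity        INT          NOT NULL,
            StatusDate      DATETIME     NULL,   -- fixed width
            Created_By      VARCHAR(100) NOT NULL DEFAULT ORIGINAL_LOGIN(),
            Modified_By     VARCHAR(100) NULL    -- starts out NULL on every insert
        );

        -- Inserts land on the last logical page and pack it ~100% full,
        -- regardless of the Fill Factor.
        INSERT dbo.InvoiceDetail (InvoiceID, Quantity)
        VALUES (1, 10), (1, 20), (1, 30);

        -- The follow-up process: the fixed-width update alone would not grow
        -- the rows, but Modified_By going from NULL to a value is the
        -- "ExpAnsive" update that splits those fully packed pages.
        UPDATE dbo.InvoiceDetail
           SET StatusDate  = GETDATE(),         -- no change in row size
               Modified_By = ORIGINAL_LOGIN()   -- row expands
         WHERE InvoiceID = 1;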

    And now... "The Rest of the Story" 😀

    For the record, I've currently identified the following types of Insert/Update patterns (which also have some sub-patterns); the type number is also the value I set the Fill Factor to.  I have code that assigns the Fill Factors automatically, and a few new tools I've built to help automatically figure out what to do to fix some of the indexes, but I'm not ready to release them to the public yet because of the 90/10 rule of developing something, and I want to knock that 10% down to 1 or 2%...

    Type 100 - Totally or near totally static.  Almost never changes and almost never suffers an "ExpAnsive" update.

    Type 99 - "Ever increasing" index that suffers no "ExpAnsive" updates.  Marked differently from "100" just to identify the type of index using only the Fill Factor.

    Type 98 - "Sequential Silos".  These are a strange pattern where the leading column of the CI might be something like "Server Name" and the second column is an ever increasing column like a date or User ID followed by a last date logged in or an invoice number followed by a status date or whatever.  These indexes very quickly suffer massive logical fragmentation but virtually no physical fragmentation (page density) because they suffer virtually no page splits.  It's the only "Insert/Update" pattern that I know of that has this trait.  They are considered to be "permanently fragmented" even though they typically never suffer "ExpAnsive" updates due to multiple sequential insert points.

    Note that these "Type 98" indexes may have to be changed to "Type 2" (described further below) indexes if they also suffer from "ExpAnsive" updates.  If they suffer a lot from "Expansive" updates, they may actually fit the category of being "Random Silos" and may actually need to be treated as "Type 1" indexes (also described further below).

    Type 97 - The "7" here has the same rough shape as a "2", which stands for "to do".  These indexes have an ever increasing key that would otherwise require virtually no index maintenance but suffer from "ExpAnsive" updates in the process I first described at the beginning of this post.  A 97 Fill Factor is used on these because lowering the Fill Factor does nothing to fix them and would be a complete waste of data buffer space (memory) and disk space and backup space, etc., and we need "to do" something to get rid of the "ExpAnsive" updates that cause the fragmentation.

    Type "2" - These indexes are assigned a Fill Factor of 72, 82, or 92, all ending with a "2" as in "to do".  The cause of their fragmentation is either unknown, or is known but can't be fixed, or there isn't time to fix it (yet).  The reason they are rebuilt at a lower Fill Factor is that, even though we don't know the cause or don't have time to fix it, we still want to do one or both of two things.  The first might be because they've done enough page splits to cause a "very low" page density and we simply want to recover some disk space.  The problem is that we don't want to rebuild these at the default Fill Factor of "0", because that's actually the same as "100", and we don't actually know whether the fragmentation was due to out-of-order inserts or "ExpAnsive" updates or what, but we do know they did page splits (or maybe just DELETEs, which also cause logical fragmentation), so we also want to leave a little headroom for those things.

    This helps prevent the "morning after index maintenance" problem that a lot of people are aware of (index maintenance caused performance to go down) but haven't figured out the cause of.  The cause is that they Rebuild or (hopefully not) Reorganize indexes that have a "0" Fill Factor, so all "free space" is removed from the index when it needs it the most, and EVERY out-of-order insert or "ExpAnsive" update causes massive page splitting, which seriously slows down ALL activity having to do with the index.

    So, the bottom line for Type "2" indexes is that you wanted to defragment the index or simply recover disk space, but you want to allow some room for expansion and you still have something "to do" to fix the underlying problem (including "Random Silos", where just one part or several small parts of an index are suffering from page splits).  These might also end up being rather permanently fragmented.
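    Mechanically, marking an index as Type "2" is nothing more than a rebuild with an explicit Fill Factor; the index and table names here are hypothetical:

        -- Rebuild with headroom instead of inheriting the default
        -- Fill Factor of 0 (which behaves the same as 100).
        ALTER INDEX IX_InvoiceDetail_StatusDate
            ON dbo.InvoiceDetail
            REBUILD WITH (FILLFACTOR = 92);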

    Type 1 - Random GUIDs and other evenly distributed Inserts (in particular) or random Updates (inherently covered).  I assign a Fill Factor ending in "1" for these, which indicates the "threshold" of fragmentation where they need to be rebuilt (these types must NOT ever be Reorganized, because Reorganize simply removes most free space and does nothing for pages that sit between the Fill Factor and 100% full).  Once those bad boys go over 1% fragmentation, they need to be Rebuilt, because it's an early indication that ALL of the pages in the index are getting ready to split (they go from 1% to 30% virtually overnight if active).

    While all that sounds real bad and sounds like a really good reason why people say you should never use Random GUIDs for the CI, that's completely false (there are a lot of reasons not to use them, but fragmentation shouldn't be one of them).  If you establish the correct Fill Factor AND you REBUILD at 1% fragmentation, the CI can go for weeks and even months (yep... I have a set of lengthy tests that prove it) with absolutely no page splits (not even supposed "good" ones), so absolutely no fragmentation (logical or page-density wise) occurs.  Of course, no fragmentation also means no need for index maintenance during those weeks or months.  And the memory above the Fill Factor isn't wasted... it slowly fills up and so, just like pre-allocating disk space (especially for log files), it prevents a huge number of "growth" problems and will get used.
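    A hedged sketch of that approach (the object names are hypothetical; the 1% threshold and the trailing-"1" Fill Factor come straight from the description above):

        -- Rebuild a random-GUID clustered index as soon as logical
        -- fragmentation crosses 1%; never REORGANIZE these.
        IF EXISTS (SELECT 1
                     FROM sys.dm_db_index_physical_stats
                          (DB_ID(), OBJECT_ID('dbo.Customer'), 1, NULL, 'LIMITED')
                    WHERE avg_fragmentation_in_percent > 1.0)
            ALTER INDEX PK_Customer
                ON dbo.Customer
                REBUILD WITH (FILLFACTOR = 81);  -- Fill Factor ending in "1"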

    Type "0/5" - These are indexes that have either not been assigned a Fill Factor (indicated by a Fill Factor of "0") or have a non-zero 2 digit Fill Factor assigned by someone else or, perhaps, a 3rd party chunk of software that someone used.

    To summarize, NEVER rebuild an index with a "0" Fill Factor without changing the Fill Factor.  This is the leading cause of the "morning after index maintenance" slowdowns caused by the underlying massive page splits that occur.  Either assign them one of the Fill Factors above or leave them alone, PERIOD.

    If you have indexes with 2-digit Fill Factors that end in 0 or 5 that are becoming fragmented, those are other people's ideas of what the Fill Factor should be, and I would change them all to Type "2" Fill Factors just to mark them as "to do".  I would be seriously leery of rebuilding any of those that have a Fill Factor of 85 or above (remember... only 2-digit Fill Factors ending in 0 or 5... don't include "100" in this).

    REORGANIZE

    I'll say it again, as I have on quite a few posts now.  Although Books Online (BOL/MS Documentation) correctly states exactly what REORGANIZE does and doesn't do, most people I know have misinterpreted that meaning, and so I'll say that most people are using it the wrong way.  The supposed "Best Practice" (and BOL doesn't call it that... read footnote #1 in the documentation, where it says that you need to experiment with and judge every index) of reorganizing between 5 and 30% fragmentation and rebuilding above 30% is actually a WORST practice in nearly every case.

    And, no... there is no insult intended to Paul Randal, the person who wrote that "recommendation".  He'll tell you right up front that MS pressured him into stating some sort of general guideline and so he chose the one that was best at the time.  He also wrote footnote #1, which states that you really need to examine every index.

    It also turns out that REORGANIZE isn't the tame little kitty that everyone makes it out to be, and it is the frequent cause of the log file being larger (sometimes almost twice as large... yeah... I've got proof of that, as well) than your largest CI (which can be really tough if you have large CIs), even when logical fragmentation is quite low.  Even REBUILDs in the FULL Recovery Model on such indexes have much less of an impact on the log file (there are also some cool tricks to keep them from blowing out the data (MDF/NDF) files, as well).

    With that, I'll say stop using REORGANIZE for general index maintenance even if you're stuck with the Standard Edition, because it's actually setting up almost all of your indexes for perpetual fragmentation.  IMHO, the only time you should use REORGANIZE is to compact LOBs that you've deleted from (and there's even a "fix" for that).
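    For that one use, the syntax would be something like the following (the table and index names are hypothetical, and LOB_COMPACTION is actually ON by default; it's spelled out here only for clarity):

        -- Compact LOB space after large deletes; the one case where
        -- REORGANIZE earns its keep.
        ALTER INDEX CI_Documents
            ON dbo.Documents
            REORGANIZE WITH (LOB_COMPACTION = ON);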

    REBUILDs

    Waiting for 30% or more fragmentation might be (I really mean "usually is") a silly thing to do, depending on the Insert/Update pattern.  Most of the "wait" damage due to rampant page splits occurs between 1 and 10% logical fragmentation (again, I have proof in the form of tests).  If you're going to wait until an index is 30% fragmented, it might actually be better to not waste resources by doing any index maintenance at all, especially if the index usually only suffers "single row lookups".  If that's the case, then you're actually measuring fragmentation the wrong way... you should be more concerned with wasted space and be looking at page density instead of logical fragmentation in such cases.
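    A minimal sketch of measuring page density rather than just logical fragmentation (avg_page_space_used_in_percent is only populated in 'SAMPLED' or 'DETAILED' mode; 'LIMITED' returns NULL for it):

        -- Low page density = wasted space on disk and in the buffer pool.
        SELECT OBJECT_NAME(ips.object_id)           AS table_name,
               ips.index_id,
               ips.avg_fragmentation_in_percent,    -- logical fragmentation
               ips.avg_page_space_used_in_percent,  -- page density
               ips.page_count
          FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
         WHERE ips.alloc_unit_type_desc = 'IN_ROW_DATA'
         ORDER BY ips.avg_page_space_used_in_percent ASC;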

    The Bottom Line

    You can do a shedload of damage with index maintenance insofar as the "morning after" goes, especially if you use what people have mistakenly adopted as a "Best Practice".  In many cases, their index maintenance is actually setting up the pages for immediate and long-term fragmentation and, like a bad drug, the more you do it, the more you need to do it.  In other words, it can and frequently does perpetuate a ton of page splits, especially on the "morning after".

    With that, I'll say that it's actually better to do no index maintenance than to do it incorrectly and the current supposed "Best Practices" for index maintenance are doing it incorrectly.  There is NO PANACEA when it comes to index maintenance even if the indexes themselves have been perfectly designed.

    And, yeah... although my proof here is a bit anecdotal, I went from January 2016 'til about June of this year with virtually no index maintenance, and CPU dropped (all by itself) from an average "normal usage" of 22% to 8% (with similar reductions in reads and writes).  In other words, performance didn't suffer... it actually got better.

    Do I recommend NOT doing index maintenance?  No.  You need to at least recover disk and memory (buffer) space but you need to do it without causing the "morning after" effect.

    Remember that fixing logical fragmentation really only fixes "read aheads", which don't actually matter once the data resides in memory (with a hopefully high PLE).  There's a whole lot more to be concerned about like page splits and the resulting blocking and wasted space in memory (buffers) because of the other type of fragmentation, low page density.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden, thank you for your help and guidance!
