Rebuilding/Reorganizing indexes on a very transactional table

  • Hello masters,

    It's been quite some time since I last posted here, but I've encountered a really hard opponent.

    We've got this table with around 50-60M rows of transactions, and that's just one month's worth. A daily process moves transactions from the main table to a historic one (it takes everything older than one month, so since it runs every day it effectively moves one day each run), and there are far too many application processes and nightly jobs, each taking a few hours, that use this table. It's hit by the whole application, SQL Agent jobs and SSIS packages at all times, 24/7, 365 days a year, so we don't have a maintenance window.

    The table has 55 columns and 4 indexes, 1 clustered and 3 non-clustered (it's only 31GB in size: 15GB data and 16GB indexes). The thing is, the software provider gave us "their scripts", a few procedures they use to reorganize or rebuild the indexes on this table and a few other tables, but today we're focusing on the transactions table because it takes the cake.

    The server has 150GB of RAM and 48 processors; it's SQL Server 2019 Enterprise.

    Their procedure queries the index metadata to get this table's fragmentation (that query alone takes about 30 minutes; I've been thinking of removing it and just hard-coding the rebuilds instead of checking fragmentation first), inserts the results into a temporary table, and opens a cursor to iterate through every index. Based on the fragmentation percentage it either reorganizes (5-30%) or rebuilds (>30%) using the scripts below.
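
    For reference, the kind of fragmentation check I'd expect to be much cheaper than the vendor's 30-minute query, if I keep a check at all, is something like this (just a sketch; the table name is a placeholder and LIMITED mode avoids reading the leaf level):

    SELECT i.name AS index_name,
           ps.avg_fragmentation_in_percent,
           ps.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Transactions'), NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id
     AND i.index_id  = ps.index_id
    WHERE ps.page_count > 1000;   -- skip tiny indexes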

    Because all of those processes run early in the morning alongside the index maintenance, the maintenance spends a lot of its time blocked by other processes. It's been taking 9+ hours, and by the time everybody gets to the office it still hasn't finished and everything is slow.

    These are the scripts they use. The reorganize:

    ALTER INDEX ' + @index + ' ON ' + @schema + '.' + @table + ' REORGANIZE

    And the rebuild:

    ALTER INDEX ' + @index + ' ON ' + @schema + '.' + @table + ' REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON)

    I've stopped doing the reorganize and now always rebuild; this is how I modified it:

    ALTER INDEX ' + @index + ' ON ' + @schema + '.' + @table + ' REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON, MAXDOP = 8)
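
    Something I've been meaning to test, since we're on 2019 Enterprise, is making the rebuild resumable and having it wait at low priority instead of holding/taking blocks; roughly this (only a sketch, the index/table names and durations are placeholders, I haven't run it in production):

    -- Sketch only: names and durations are examples.
    ALTER INDEX IX_Transactions_1 ON dbo.Transactions REBUILD
    WITH (DATA_COMPRESSION = PAGE,
          ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 5 MINUTES, ABORT_AFTER_WAIT = SELF)),
          RESUMABLE = ON, MAX_DURATION = 60 MINUTES,  -- pauses itself instead of running all morning
          MAXDOP = 8);                                -- note: SORT_IN_TEMPDB = ON isn't allowed with RESUMABLE = ON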

    Has it improved after making the changes? Well, it finished where the other one didn't, but it still took quite a few hours.

    The other issue with this job is that it does the index maintenance before moving the old data from the transactions table to the historic table, which I believe reintroduces fragmentation, since it deletes data from the transactions table right after the rebuild (even if it's only one day's worth).

    I've been thinking of figuring out when this table is used the least throughout the day and running the index maintenance in that window, even though users will be on the application.

    I even did a rebuild between 10:00 AM and 12:00 PM while everyone was using it, and it only took 1 hour and 14 minutes (just one index; I didn't rebuild the others, I was only testing). I still got blocked by many user/app processes, and since everything had been slow for the whole week I asked the server guys to restart the services to clear the blocking; it looked like the rebuild was in the final phase of the online rebuild with SORT_IN_TEMPDB, and that's where most of the blocking was happening.

    On my development SQL Server these rebuilds took less than 10 minutes for the largest index and less than 5 minutes for all the others, obviously without any transactions happening.

    That server hosts an instance of every SQL Server version (2014, 2016, 2017, 2019 and 2022), all of them running; we use it for internal development and for restoring production databases so the DBA team can test scripts before they go to production. The timings from those tests:

    -- INDEX_1 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON, MAXDOP = 1) -> killed it while testing; 26+ minutes, never finished
    -- INDEX_1 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON) -> 5:15
    -- INDEX_1 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = OFF, MAXDOP = 8) -> 6 minutes
    -- INDEX_2 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = OFF, MAXDOP = 8) -> 1:49 minutes
    -- INDEX_2 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON, MAXDOP = 8) -> 1:37 minutes
    -- INDEX_2 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON) -> 1:36 minutes
    -- INDEX_3 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON, MAXDOP = 8) -> 1:35 minutes
    -- INDEX_3 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = OFF) -> 1:34 minutes
    -- INDEX_3 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON) -> 1:31 minutes
    -- INDEX_4 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON, MAXDOP = 8) -> 43 seconds
    -- INDEX_4 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = OFF, MAXDOP = 8) -> 43 seconds
    -- INDEX_4 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = OFF) -> 39 seconds
    -- INDEX_4 WITH (DATA_COMPRESSION = PAGE, ONLINE = ON, SORT_IN_TEMPDB = ON) -> 43 seconds

    Is there a way to find when a table is used the least, so we can run the maintenance in that small window instead of running it while a lot of internal processes are running?
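
    The only idea I've had so far is to snapshot the usage DMV for the table on a schedule and diff the counters to see which hours touch it the least, roughly like this (a rough sketch; the table and logging names are placeholders):

    -- Run on a schedule (e.g. hourly); the counters are cumulative since the last restart,
    -- so the difference between consecutive samples shows activity per interval.
    IF OBJECT_ID(N'dbo.IndexUsageLog') IS NULL
        CREATE TABLE dbo.IndexUsageLog
        (
            sample_time  datetime2(0) NOT NULL DEFAULT SYSDATETIME(),
            index_id     int          NOT NULL,
            user_seeks   bigint       NOT NULL,
            user_scans   bigint       NOT NULL,
            user_lookups bigint       NOT NULL,
            user_updates bigint       NOT NULL
        );

    INSERT dbo.IndexUsageLog (index_id, user_seeks, user_scans, user_lookups, user_updates)
    SELECT index_id, user_seeks, user_scans, user_lookups, user_updates
    FROM sys.dm_db_index_usage_stats
    WHERE database_id = DB_ID()
      AND object_id   = OBJECT_ID(N'dbo.Transactions');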

    Do you guys think Ola's index maintenance would be a better choice, while still using SORT_IN_TEMPDB and ONLINE? (MAXDOP is also set to 8 in the server settings, but I still specify it manually in the index rebuild.)
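
    For context, the kind of call to Ola's IndexOptimize I had in mind looks something like this (parameter values are just examples, not a recommendation):

    EXECUTE dbo.IndexOptimize
        @Databases = 'MyDatabase',                      -- placeholder database name
        @Indexes = 'MyDatabase.dbo.Transactions',       -- limit it to the one table
        @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE',
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE',
        @FragmentationLevel1 = 5,
        @FragmentationLevel2 = 30,
        @SortInTempdb = 'Y',
        @MaxDOP = 8,
        @TimeLimit = 3600,                              -- stop issuing new commands after an hour
        @LogToTable = 'Y';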

    We have also been thinking about partitioning this table and doing the rebuilds on a per-partition basis. If you have any advice or a different approach, it's always welcome.

    Best regards,

  • The first question I have is - what issue are you trying to resolve by rebuilding the indexes on this table?

    If the answer has anything to do with performance - then rebuilding the index almost certainly won't improve the performance.  Instead, you should focus on making sure statistics are updated frequently - and with a large enough sampling rate (ideally a full scan - but definitely not the default sampling rate).
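
    For example, something along these lines on the big table (the table and index names are placeholders):

    -- Full scan on every statistic for the table...
    UPDATE STATISTICS dbo.Transactions WITH FULLSCAN;
    -- ...or, if that's too heavy during the day, a higher-than-default sample on a specific statistic:
    UPDATE STATISTICS dbo.Transactions IX_Transactions_1 WITH SAMPLE 50 PERCENT;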

    There are many more items to consider here - do you have the right clustered index for this table?  Is the clustered index an identity column with frequent updates to varchar columns where the data in the columns is expanded?  Can't list all possible issues/concerns without actually seeing the table and indexes.

    As for partitioning - modifying the table to use partitioning would allow for easier archiving if you can use partition switching, and would definitely help with deleting the archived data since you can truncate the oldest partition and merge the boundary.  However, to do so will require rebuilding the indexes - or the creation of a new table - and most likely changing the clustered index.
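
    Very roughly, the shape of that would be something like this (monthly boundaries; all names and dates are placeholders):

    -- Monthly partition function/scheme; the clustered index would be created on the scheme.
    CREATE PARTITION FUNCTION pf_TransactionsMonthly (datetime2(0))
        AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

    CREATE PARTITION SCHEME ps_TransactionsMonthly
        AS PARTITION pf_TransactionsMonthly ALL TO ([PRIMARY]);

    -- Archiving then becomes a metadata operation instead of row-by-row deletes:
    ALTER TABLE dbo.Transactions SWITCH PARTITION 2 TO dbo.Transactions_Archive PARTITION 2;
    -- or, on 2016+, truncate just the old partition and merge the empty boundary away:
    TRUNCATE TABLE dbo.Transactions WITH (PARTITIONS (2));
    ALTER PARTITION FUNCTION pf_TransactionsMonthly() MERGE RANGE ('2024-01-01');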

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

    Apart from saving space, it is unlikely that rebuilding or reorganising the index will improve the performance of anything; in fact, it might make it worse.

    It sounds like the entire job is more trouble than it's worth. You could just stop running the job.

    I'm sure Jeff will come along and add his views.

  • Alejandro Santana wrote:

    The server has 150GB of RAM And 48 processors, its 2019 Enterprise.

    First, that's an insanely expensive server and you good folks have crippled it with 150GB of RAM.  You need to bump that up to at least 512GB and then make sure that your MAXDOP for the server is set to 4.  You can try 8 but I've found that's overkill and 4 gives us the biggest bang for the buck in a mixed environment of OLTP and massive imports/bulk processing.

    Second, have you ever tried to prove whether or not all the index maintenance you're doing based on logical fragmentation is actually helping?  Most people have not.  Most people blindly adopt the old 5/30 supposed "Best Practices" and then wonder why their index maintenance periods are so long and why their log files are so big.

    Here's your chance... do your index maintenance one more time and then measure the performance of your workloads.  Then, stop doing your index maintenance for a month and measure the performance of your workloads again.  You might be in for a shock because most workloads won't see much of a change in either direction and a bunch of the workloads will be working a whole lot faster and more efficiently.

    The reason why is because the supposed "Best Practices" are actually a worst practice that seriously increases page splits on the proverbial morning after and REORGANIZE perpetuates the mistake.

    I went 4 years without doing any index maintenance on my production box with no ill effects other than some of the indexes using a bit too much space.  That's the only reason why I rebuilt those after 4 years and I was very careful about how I did it for each index.

    Seriously... go for a month without doing any index maintenance and measure the before and after performance of your workloads. DO REBUILD STATISTICS ON A REGULAR BASIS, THOUGH!
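
    If Query Store is turned on for the database, one rough way to compare the "with maintenance" and "without maintenance" periods is to aggregate duration per interval (just a sketch; it assumes the default interval settings):

    -- Workload-level average duration per Query Store interval (avg_duration is in microseconds).
    SELECT rsi.start_time,
           SUM(rs.count_executions) AS executions,
           SUM(rs.avg_duration * rs.count_executions)
               / NULLIF(SUM(rs.count_executions), 0) / 1000.0 AS avg_duration_ms
    FROM sys.query_store_runtime_stats AS rs
    JOIN sys.query_store_runtime_stats_interval AS rsi
      ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
    GROUP BY rsi.start_time
    ORDER BY rsi.start_time;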

    And feed that awesome machine some memory!!!

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    I don't see why you'd need more than 150GB of RAM for a 31GB table.  If that's "the big table" in the db, presumably the entire database is only 50 or 60GB ("only" when comparing it to the 150GB available, not because 60GB is a "small" db, per se).

    Cluster the table first on datetime/datetime2 -- the same column you base the history on -- and make the key unique by including a $IDENTITY (or other unique column if applicable).
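
    Roughly something like this (column names are placeholders; keep in mind that changing the clustering key also rebuilds all the nonclustered indexes, since they carry the clustering key):

    -- DROP_EXISTING works if the new definition keeps the same index name;
    -- otherwise drop and recreate the clustered index explicitly.
    CREATE UNIQUE CLUSTERED INDEX CIX_Transactions
        ON dbo.Transactions (TransactionDate, TransactionID)  -- date first, so the daily archive sweep reads one contiguous range
        WITH (DROP_EXISTING = ON, ONLINE = ON, DATA_COMPRESSION = PAGE);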

    If the current data is very active and older data is not, then you should analyze partitioning the data, so that the current portion can be its own partition and could be rebuilt if needed.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

    Thanks, guys, for taking the time to read the post. I'm going to look into all the options and run some tests, and I'm also going to pay more attention to the statistics like some of you said.

    Two questions about stats

    How often do you guys update the stats?

    Do you guys go for full sampling or a specific %?

    Scott, thanks for the reply, but that's actually one of the smallest tables; it's just one of the most used. The DB is TBs in size.

  • I rebuild stats once per week.  Because most of the clustered indexes are "ever increasing" in nature, I use the "if it moved, shoot it" approach.  I also use full scan.  It was a long time ago but I repeated some experiments that I saw in an old article that demonstrated that even a sample of just 21% took about the same time as a Full Scan... so why bother with the reduction?  Use the Full Scan.

    That was a while back, though.  I've not retested since then.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the quick reply Jeff!

    Do you use Auto update stats? And for critical tables full scan with norecompute? And update these weekly? Or just full scan all stats with norecompute for all tables?

    I guess it depends.

  • Alejandro Santana wrote:

    Thanks for the quick reply Jeff!

    Do you use Auto update stats? And for critical tables full scan with norecompute? And update these weekly? Or just full scan all stats with norecompute for all tables?

    I guess it depends.

    Funny you should bring those up because there's a possible change in the wind for me coming up.  I'm researching for the "first time in a long time".  We'll see.

    In the meantime and up through now, I left Auto Update and Auto Create of stats on for everything.  That may change in the near future because the tables in two of the main databases have gotten large enough where I need to pay attention to such a thing to make sure they don't kick in during the busiest times of the day or at least do them asynchronously.  It will likely become a "do it by size/criticality/index type" thing in the next couple of months.

    In the meantime and like I said, because of the nature of the tables having mostly ever-increasing clustered indexes, we've done the "if it moved, shoot it" type of thing (based on the modification counter) for full scan rebuilds every week.  There is a restriction on that... if the statistic has been rebuilt in the last 3 days for any reason, don't do it this time.  That auto-magically accounts for any index rebuilds for space recovery that we may have done.
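
    In rough T-SQL, the selection amounts to something like this (just a sketch of the idea, not our actual job):

    -- "If it moved, shoot it": anything with modifications since its last update gets a FULLSCAN,
    -- unless the statistic was already updated in the last 3 days for any reason.
    SELECT OBJECT_SCHEMA_NAME(s.object_id) AS schema_name,
           OBJECT_NAME(s.object_id)        AS table_name,
           s.name                          AS stats_name,
           sp.modification_counter,
           sp.last_updated
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND sp.modification_counter > 0
      AND (sp.last_updated IS NULL OR sp.last_updated < DATEADD(DAY, -3, SYSDATETIME()));
    -- each qualifying row then gets an UPDATE STATISTICS <schema>.<table> <stat> WITH FULLSCAN;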

    As a bit of a sidebar, we don't do any index maintenance based on Logical Fragmentation and haven't since the 18th of January, 2016.

    I do not use the last time a stat was rebuilt for anything else because if nothing changed in the table (for reference and other static tables), there's simply no need to rebuild any of the statistics on it.

    That may change and "when to rebuild a stat" may be calculated differently for non-ever-increasing indexes.  It's just "easier" to do the not-so-thoughtful thing of "if it moved, shoot it" because it's been effective up 'til now and still seems to be.  You know the story there for such claims... YMMV.   That also goes for whether or not to continue to do full scans.  The consensus is that you should for ever-increasing columns, especially on stats for indexes that have such a leading column.  The jury is still out on how long lesser sample rates take.  Previous testing said that sample rates even as low as 21% take the same time.  It may be that lower "default" sample rates will do but we'll see.

    I also have MAXDOP set to 4 at the server level.  I've not tested to see if a statistics rebuild using a temporary MAXDOP of 8 would be any better but that's in the plan for the near future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
