Dealing with huge heap tables

  • Recently, our dev teams approach me for advice on improving their huge heap table which are causing issues on their DBs IO usage. This particular DB is hosting an old in-house application and the dev teams has been doing a lot of work modernizing & improving it's table structure. They hit the wall when trying to create a clustered index on this large heap table (approx 400GB), running index creation with ONLINE parameter will cause some locking, same goes to table rebuild.

    So now, with almost no way out, they're asking DBA for help as that particular DB aren't monitored or managed by DBAs. I provided some solution such as doing a side by side table migration, where a new table with clustered index be created then move the data in batches into the new table before setting a short downtime to cutover. This could work normally but various rows in the table will be updated from time to time, so there might be changes even during the copying period. Due to it's size it's not possible for production to actually freeze for days just to allow the copy to complete. Any ideas how to work around on this?

    Another issue is the clustered index key, that table doesn't have any PK or unique keys. Furthermore most columns contains VARCHAR, datetime data type which will not be a suitable candidate to be a clustered index key.

  • Hi Jason1,

    before you start creating a clustered index on this table ask yourself (or the devs), WHY they want it.

    • Why is a C. I. better than a heap?
    • What is their expectation?

    Note: If you only want to create a C. I. because it is recommended, you deal with a penalty of 4 Bytes for each row if your C. I. is not unique! When a C.I. is not unique, Microsoft SQL Server will add a UNIQUIFIER (4 Bytes) to each row.

    My honest recommendation: If there is no real reason for a C. I. than leave it as a heap.

     

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • According to them, the server's IO is getting worst and they trace it to that particular heap table.

    Also i ran checks for the forwarded records and they're massive! So I assume their findings on this particular heap table is sucking IO is true.

    forwarded record count 14,580,400

    page count 36,563,700

  • Hi Jason,

    if you are using a C. I. you don't have any more Forwarded Records but Page Splits.

    Does your table have additional nonclustered indexes?

    Yes: A clustered index will blow your nonclustered indexes because every nonclustered index will hold the Clustered Key + UNIQUIFIER

    No: Just rebuild the table to get rid of forwarded records.

    To prevent Forwarded Records your Team should analyse the most common attributes in the relation which gets updated.

    Make these columns fixed length columns (if feasable) 🙂

    But yeah - it's complex without knowing more about the table, workloads, etc...

    The high number of Forwarded Records are an indice for heavy UPDATE activity on the rows.

    If you have covering nonclustered indexes and every SELECT return only a few rows than Forwarded Records should not be the problem.

    Best from Germany, Uwe

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • For a 400GB heap table, the safest approach is usually a side by side migration rather than creating a clustered index directly on the existing table.

    Create a new table with a clustered index (ideally on a new BIGINT IDENTITY surrogate key), then copy data in batches while capturing ongoing changes using Change Tracking, CDC, or triggers. Once the tables are nearly synchronized, you may schedule a short downtime for final sync and cutover.

    CREATE TABLE dbo.BigTable
    (
    BigTableID BIGINT IDENTITY(1,1) NOT NULL,
    ...
    );

    Direct CREATE CLUSTERED INDEX ... ONLINE = ON on the heap can still cause blocking, huge log growth, and long rollback risk.

    Deepesh Dhake
    Database Administrator

  • A possible solution for you, although I'm pretty sure there's going to be issues with it, there's going to be issues with any solution to this.

    Start with your idea of creating a new table and batch copying the existing heap into it.  As it sounds like there's no good candidate for a true key column for the clustered index, you're probably going to have to do something like create an ID column for your clustered index.  Before you start the copy process, but after creating the new table, create a trigger (I know, I know) on the heap that, after an insert / update / delete that copies the row in question (or updates it or deletes it) in the new table.

    Yes, this will almost certainly have an impact on the performance of the heap, but it likely would also help keep the new table up-to-date until you can schedule a downtime window to do a final cutover.

  • A leading datetime clustering key would not be a problem at all, i.e., there's no reason to exclude a datetime from a clustering key.  As you, and others, have noted, if you have don't have a unique set of column values, SQL will have to add a "uniquifier" that will significantly increase overhead for the table.

    But before picking the clustering key, you should review the "index missing" stats on the table to see what, if any, queries are being issued against the table.  Naturally if you have requests searching for a particular column(s), you'd want it(them) as the leading key(s) in any clustering index.  I can provide a query to summarize that info if you want it.

    > most columns contains VARCHAR <<

    Then you should strongly consider compressing the table.  That will increase the time to rebuild the table, but it will also likely dramatically decrease the size (and the reduced disk writes of the size reduction gains back some of the time required to compress the data).   MS provides a proc to estimate compression savings, but its results may not be too accurate with a lot of forwarded rows:

    EXEC sys.sp_estimate_data_compression_savings '<schema>', '<table>', 0 /*index#*/, NULL, 'PAGE'

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply