Small index blows huge logfile during online index rebuild

  • Hello,

Today I ran into a very unusual case, and I was quite surprised to see it happen.

    The case:

3-key compound clustered index on nvarchar fields.

Record count: 1.2 billion records.

Index size: 24 GB

Log file: 240 GB

Table structure:

CREATE TABLE [DemoTable] (
    [MANDT] [nvarchar](3) NOT NULL DEFAULT (N'000'),
    [IN_RECNO] [nvarchar](22) NOT NULL DEFAULT (N' '),
    [IN_SEGMCNT] [nvarchar](4) NOT NULL DEFAULT (N'0000'),
    [SYMBOL_ID] [nvarchar](12) NOT NULL DEFAULT (N'000000000000'),
    [ASSTYP] [nvarchar](1) NOT NULL DEFAULT (N' '),
    [ATAUT] [nvarchar](1) NOT NULL DEFAULT (N' '),
    [MSEHI_LOW] [nvarchar](3) NOT NULL DEFAULT (N' '),
    [MSEHI_HIGH] [nvarchar](3) NOT NULL DEFAULT (N' '),
    [ATSRT] [nvarchar](4) NOT NULL DEFAULT (N'0000'),
    [ATZIS] [nvarchar](3) NOT NULL DEFAULT (N'000'),
    [ATIMB] [nvarchar](10) NOT NULL DEFAULT (N'0000000000'),
    CONSTRAINT [DemoTable-0] PRIMARY KEY CLUSTERED
    (
        [MANDT] ASC,
        [IN_RECNO] ASC,
        [IN_SEGMCNT] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Never mind the nvarchar(1) data types; I have already explained to the responsible people that this is not good practice (2 bytes of length overhead per column), and they always use 2-byte-per-character storage (Unicode) anyway.

    The problem:

For an online index rebuild I check the database's free internal space and the free log space against a margin.

I keep a rather big margin: I calculate the index size * 2.8 for an online index rebuild.

However, on this particular table the rebuild takes about 4 hours and consumes the whole 240 GB.

I can reproduce this any time, even on a standby system with no other processing happening.
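
For what it's worth, the pre-check boils down to something like this sketch; the 2.8 factor is just my own margin and the sizes are this table's numbers, nothing official:

```sql
-- Hypothetical pre-check sketch: is there enough free log space
-- for an online rebuild, using a margin of index size * 2.8?
DECLARE @index_gb    decimal(10, 2) = 24.0;   -- measured index size
DECLARE @free_log_gb decimal(10, 2) = 240.0;  -- free space in the log file

SELECT @index_gb * 2.8 AS required_log_gb,
       CASE WHEN @index_gb * 2.8 <= @free_log_gb
            THEN 'rebuild' ELSE 'skip / reorganize instead' END AS decision;
```

By that estimate 24 GB * 2.8 = 67.2 GB of log should have been more than enough, which is exactly why the 240 GB blow-out surprised me.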

Data from the DMV:

row_count: 1175602219

in_row_data_page_count: 2922244

reserved_page_count: 3119211

in_row_used_page_count: 2987246
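
Those counter names match sys.dm_db_partition_stats, so the figures above can be pulled with something like:

```sql
-- Aggregate the clustered index (index_id = 1) over all partitions
SELECT SUM(row_count)              AS row_count,
       SUM(in_row_data_page_count) AS in_row_data_page_count,
       SUM(reserved_page_count)    AS reserved_page_count,
       SUM(in_row_used_page_count) AS in_row_used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'DemoTable')
  AND index_id = 1;
```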

As a workaround I issued an index reorganize to bring the fragmentation level down;

very slow, but controllable.
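
The workaround itself is just the standard statement (REORGANIZE is always an online operation and is logged in small, interruptible steps):

```sql
-- Reorganize the clustered index instead of rebuilding it
ALTER INDEX [DemoTable-0] ON [DemoTable] REORGANIZE;
```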

Does anybody have a good idea why this table of 24 GB would consume 240 GB of log space (and still be hungry for more)?

I suspect the compound key, in combination with the huge record count and the multiple sorts involved, is the problem,

but I need to be sure so I can work out a calculated way to avoid this situation.

    Kind regards,

    Eddy

  • "clustered index" means the base table, so you are essentially rebuilding or reorging the base table.

  • Hi Patrick,

I think you might be misinterpreting the question.

The clustered index is the "table": the data sorted by the key columns.

No problem there.

The question I have is:

Why does a table need more than 240 GB (at least a factor of 10) of log space for an online index (table) rebuild when the clustered index size is only 24 GB?

This is so far from the figures I was expecting that I wonder whether the compound key (3 columns) could have anything to do with it.

Just for completeness: there are no additional nonclustered indexes on this table.

    wkr,

    Eddy

I'm just having trouble seeing how you could reorganize a table that size and get away with only 24 GB of t-log activity.

When I look at the table and add up the column sizes, I get 132 bytes. If there are a billion of those, then it's going to be 132 GB, right? And that's not including the overhead SQL Server would use in maintaining this table.

To expect a rebuild or reorg to finish with only 24 GB means your process could write to less than 25 percent of those pages; is that realistic? Maybe in the best case, if the table is not that fragmented, but by nature I'd probably expect the worst case. If I saw a table that size and needed to rebuild the clustered index with only a budget of 24 GB of log space, I'd be a bit worried.

Of course, I'm a bit of a pessimist. I'm probably not the expert on this either, so there's that to consider too! It would be cool to learn how to reorg or rebuild a table that size and spend only 24 GB of disk updates, so I'd be interested in hearing any theories here; maybe somebody knows the internals and can shed some light on the issue!

  • Hi Patrick,

No, you still misunderstand.

The log space free for use = 240 GB.

The index size, according to the used pages, is very small: 24 GB,

so a rebuild should fit perfectly into that log space.

In execution, though, even the 240 GB of free log space turned out not to be enough.

However, in the meantime I have found the reason for this.

You were right: 1 billion rows could never fit in 24 GB, as each record should be at least 114 bytes.

The reason my 240 GB of log space is not enough is that PAGE compression is set on this index.

And as the rows have a lot of blanks like ' ' and recurring values, the compression ratio is well above average.

Uncompressed, the data should have been:

1.2 billion * 114 bytes = at least 127 GB.

Now it makes sense that it blows through my 240 GB of log space.
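
For anyone hitting the same thing: you can confirm the compression setting from sys.partitions, and sp_estimate_data_compression_savings gives a sampled estimate of what the index would occupy without compression:

```sql
-- Is the clustered index page-compressed?
SELECT partition_number, data_compression_desc, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'DemoTable')
  AND index_id = 1;

-- Sampled estimate of the size with compression removed
EXEC sp_estimate_data_compression_savings
     @schema_name = N'dbo',
     @object_name = N'DemoTable',
     @index_id = 1,
     @partition_number = NULL,
     @data_compression = N'NONE';
```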

It also makes me ponder what extra checks might be needed for automatic maintenance.

Have some thinking to do this weekend 😉

Thanks for your input 🙂

    Wkr,

    Eddy

  • Van Heghe Eddy:

    Keep in mind that when SQL creates an index on a table, there are basically two types of pages in that index: upper-level (non-leaf) index pages and lowest-level (leaf) index pages. The non-leaf level(s) store only key values, not raw data from the table. The leaf pages store all data from the index, including any included columns in a nonclustered index.

    When SQL shows you "index" size for a clustered index, it's the size of the non-leaf pages only. But when SQL rebuilds a clustered index, it has to use the entire row, including leaf pages.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • An index rebuild is a fully logged operation.

    Your clustering key is 62 bytes wide and you have 1.2B rows. That's just the data that gets distributed across the non-leaf levels.

The leaf level of the CI contains 286 bytes per row, including 2x the max length because of Unicode and the 2 extra bytes per column to store the length of each variable-length string. Multiply that by 1.2B rows. It's not a trivial table.

    To see how much space your table is consuming, you can use sp_spaceused:

    execute sp_spaceused 'dbo.DemoTable';

    Is it really any wonder why rebuilding the CI consumes log space?
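
Putting numbers on that: at 286 bytes per row, the uncompressed leaf level alone works out to roughly 313 GB, which by itself already exceeds the 240 GB of free log space. A quick sanity check:

```sql
-- Uncompressed leaf-level estimate: rows * bytes per row, in GB
SELECT CAST(1175602219 AS bigint) * 286 / 1073741824.0 AS uncompressed_leaf_gb;
-- roughly 313 GB
```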

You might want to take a look at the design of that table. Things like NVARCHAR(1), or using any character-based data type to hold only digits, is a pretty bad waste of space, can become a performance problem, and certainly lends itself to difficulties like the one you're having when rebuilding the CI.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How many rows do you add in a day? Is the fragmentation really significant enough that you need to rebuild a table with 1.2B rows?
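
One cheap way to answer that: sys.dm_db_index_physical_stats in LIMITED mode reports fragmentation without touching the leaf pages, so it's inexpensive enough to run even against a 1.2B-row table:

```sql
-- Fragmentation of the clustered index, cheapest scan mode
SELECT index_id,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'DemoTable'), 1, NULL, 'LIMITED');
```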

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

