Small index blows huge logfile during online index rebuild

  • Hello,

Today I ran into a very unusual case, and I was quite surprised to see it happen.

    The case:

3-key compound clustered index on nvarchar fields.

Record count: 1.2 billion records.

Index size: 24 GB

Log file: 240 GB

Table structure:

CREATE TABLE [DemoTable] (
    [MANDT] [nvarchar](3) NOT NULL DEFAULT (N'000'),
    [IN_RECNO] [nvarchar](22) NOT NULL DEFAULT (N' '),
    [IN_SEGMCNT] [nvarchar](4) NOT NULL DEFAULT (N'0000'),
    [SYMBOL_ID] [nvarchar](12) NOT NULL DEFAULT (N'000000000000'),
    [ASSTYP] [nvarchar](1) NOT NULL DEFAULT (N' '),
    [ATAUT] [nvarchar](1) NOT NULL DEFAULT (N' '),
    [MSEHI_LOW] [nvarchar](3) NOT NULL DEFAULT (N' '),
    [MSEHI_HIGH] [nvarchar](3) NOT NULL DEFAULT (N' '),
    [ATSRT] [nvarchar](4) NOT NULL DEFAULT (N'0000'),
    [ATZIS] [nvarchar](3) NOT NULL DEFAULT (N'000'),
    [ATIMB] [nvarchar](10) NOT NULL DEFAULT (N'0000000000'),
    CONSTRAINT [DemoTable-0] PRIMARY KEY CLUSTERED
    (
        [MANDT] ASC,
        [IN_RECNO] ASC,
        [IN_SEGMCNT] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Never mind the nvarchar(1) data types; I have already explained to the responsible people that this is not good practice (2 bytes of length overhead per column), and they always use 2-byte-per-character storage (Unicode) anyway.

    The problem:

For an online index rebuild I check the database's free internal space and the free log space against a margin.

I keep a rather big margin: I calculate the index size * 2.8 for an online index rebuild.

However, on this particular table the rebuild takes about 4 hours and consumes the whole 240 GB.

I can reproduce this any time, even on a standby system with no other processing happening.
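
For what it's worth, the pre-check boils down to something like this sketch; the 2.8 factor is just my own margin and the sizes are this table's numbers, nothing official:

```sql
-- Hypothetical pre-check sketch: is there enough free log space
-- for an online rebuild, using a margin of index size * 2.8?
DECLARE @index_gb    decimal(10, 2) = 24.0;   -- measured index size
DECLARE @free_log_gb decimal(10, 2) = 240.0;  -- free space in the log file

SELECT @index_gb * 2.8 AS required_log_gb,
       CASE WHEN @index_gb * 2.8 <= @free_log_gb
            THEN 'rebuild' ELSE 'skip / reorganize instead' END AS decision;
```

By that estimate 24 GB * 2.8 = 67.2 GB of log should have been more than enough, which is exactly why the 240 GB blow-out surprised me.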

Data from the DMV:

row_count: 1175602219

in_row_data_page_count: 2922244

reserved_page_count: 3119211

in_row_used_page_count: 2987246
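
Those counter names match sys.dm_db_partition_stats, so the figures above can be pulled with something like:

```sql
-- Aggregate the clustered index (index_id = 1) over all partitions
SELECT SUM(row_count)              AS row_count,
       SUM(in_row_data_page_count) AS in_row_data_page_count,
       SUM(reserved_page_count)    AS reserved_page_count,
       SUM(in_row_used_page_count) AS in_row_used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'DemoTable')
  AND index_id = 1;
```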

As a workaround I issued an index reorganize to bring the fragmentation level down;

very slow, but controllable.
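
The workaround itself is just the standard statement (REORGANIZE is always an online operation and is logged in small, interruptible steps):

```sql
-- Reorganize the clustered index instead of rebuilding it
ALTER INDEX [DemoTable-0] ON [DemoTable] REORGANIZE;
```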

Does anybody have a good idea why this table of 24 GB would consume 240 GB of log space (and still be hungry for more)?

I suspect the compound key, in combination with the huge record count and the multiple sorts involved, is the problem,

but I need to be sure so I can work out a calculated way to avoid this situation.

    Kind regards,

    Eddy

  • "clustered index" means the base table, so you are essentially rebuilding or reorging the base table.

  • Hi Patrick,

I think you might be misinterpreting the question.

The clustered index is the "table": the data sorted by the key columns.

No problem there.

The question I have is:

Why does a table need more than 240 GB (at least a factor of 10) of log space for an online index (table) rebuild when the clustered index size is only 24 GB?

This is so far from the figures I was expecting that I wonder whether the compound key (3 columns) could have anything to do with it.

Just for completeness: there are no additional nonclustered indexes on this table.

    wkr,

    Eddy

I'm just having trouble seeing how you could reorganize a table that size and get away with only 24 GB of t-log activity.

When I look at the table and add up the column sizes, I get 132 bytes. If there are a billion of those, then it's going to be 132 GB, right? And that's not including the overhead SQL Server would use in maintaining this table.

To expect a rebuild or reorg to finish with only 24 GB means your process could write to less than 25 percent of those pages; is that realistic? Maybe in the best case, if the table is not that fragmented, but by nature I'd probably expect the worst case. If I saw a table that size and needed to rebuild the clustered index with only a budget of 24 GB of log space, I'd be a bit worried.

Of course, I'm a bit of a pessimist. I'm probably not the expert on this either, so there's that to consider too! It would be cool to learn how to reorg or rebuild a table that size and spend only 24 GB of disk updates, so I'd be interested in hearing any theories here; maybe somebody knows the internals and can shed some light on the issue!

  • Hi Patrick,

No, you still misunderstand.

The log space free for use = 240 GB.

The index size, according to the used pages, is very small: 24 GB,

so a rebuild should fit perfectly into that log space.

In execution, though, even the 240 GB of free log space turned out not to be enough.

However, in the meantime I have found the reason for this.

You were right: 1 billion rows could never fit in 24 GB, as each record should be at least 114 bytes.

The reason my 240 GB of log space is not enough is that PAGE compression is set on this index.

And as the rows have a lot of blanks like ' ' and recurring values, the compression ratio is well above average.

Uncompressed, the data should have been:

1.2 billion * 114 bytes = at least 127 GB.

Now it makes sense that it blows through my 240 GB of log space.
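
For anyone hitting the same thing: you can confirm the compression setting from sys.partitions, and sp_estimate_data_compression_savings gives a sampled estimate of what the index would occupy without compression:

```sql
-- Is the clustered index page-compressed?
SELECT partition_number, data_compression_desc, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'DemoTable')
  AND index_id = 1;

-- Sampled estimate of the size with compression removed
EXEC sp_estimate_data_compression_savings
     @schema_name = N'dbo',
     @object_name = N'DemoTable',
     @index_id = 1,
     @partition_number = NULL,
     @data_compression = N'NONE';
```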

It also makes me ponder what extra checks might be needed for automatic maintenance.

Have some thinking to do this weekend 😉

Thanks for your input 🙂

    Wkr,

    Eddy

  • Van Heghe Eddy:

    Keep in mind that when SQL creates an index on a table, there are basically two types of pages in that index: upper-level (non-leaf) index pages and lowest-level (leaf) index pages. The non-leaf level(s) store only key values, not raw data from the table. The leaf pages store all data from the index, including any included columns in a nonclustered index.

    When SQL shows you "index" size for a clustered index, it's the size of the non-leaf pages only. But when SQL rebuilds a clustered index, it has to use the entire row, including leaf pages.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • An index rebuild is a fully logged operation.

    Your clustering key is 62 bytes wide and you have 1.2B rows. That's just the data that gets distributed across the non-leaf levels.

The leaf level of the CI contains 286 bytes per row, including 2x the max length because of Unicode and the 2 extra bytes per column to store the length of each variable-length string. Multiply that by 1.2B rows. It's not a trivial table.

    To see how much space your table is consuming, you can use sp_spaceused:

    execute sp_spaceused 'dbo.DemoTable';

    Is it really any wonder why rebuilding the CI consumes log space?
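
Putting numbers on that: at 286 bytes per row, the uncompressed leaf level alone works out to roughly 313 GB, which by itself already exceeds the 240 GB of free log space. A quick sanity check:

```sql
-- Uncompressed leaf-level estimate: rows * bytes per row, in GB
SELECT CAST(1175602219 AS bigint) * 286 / 1073741824.0 AS uncompressed_leaf_gb;
-- roughly 313 GB
```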

You might want to take a look at the design of that table. Things like NVARCHAR(1), or using any character-based data type to hold only digits, is a pretty bad waste of space, can become a performance problem, and certainly lends itself to difficulties like the one you're having when rebuilding the CI.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How many rows do you add in a day? Is the fragmentation really significant enough that you need to rebuild a table with 1.2B rows?
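
One cheap way to answer that: sys.dm_db_index_physical_stats in LIMITED mode reports fragmentation without touching the leaf pages, so it's inexpensive enough to run even against a 1.2B-row table:

```sql
-- Fragmentation of the clustered index, cheapest scan mode
SELECT index_id,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'DemoTable'), 1, NULL, 'LIMITED');
```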

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

