Alter table performance

  • Hi all,

    I have a question regarding performance. I need to alter several tables, changing columns from decimal(18,4) to decimal(19,8). Some of the tables have millions of rows. So, in your opinion/experience, what's the best way to do this?

    1) Using ALTER TABLE

    2) Creating a new one, updating with old value, delete old column and rename the new one?

    Keep in mind things like speed, log file size, etc.

    Thanks for any advice.

    Mauricio

  • Use ALTER TABLE
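
    For what it's worth, the straight in-place version is a single statement, something along these lines (dbo.BigTable and Amount are made-up names, so substitute your own, and repeat the column's current nullability because leaving it off can quietly change it):

    -- In-place type change; assumes the column is currently NOT NULL
    ALTER TABLE dbo.BigTable
        ALTER COLUMN Amount decimal(19,8) NOT NULL;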

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help

  • Mauricio_ (1/19/2015)


    Hi all,

    I have a question regarding performance. I need to alter several tables, changing columns from decimal(18,4) to decimal(19,8). Some of the tables have millions of rows. So, in your opinion/experience, what's the best way to do this?

    1) Using ALTER TABLE

    2) Creating a new one, updating with old value, delete old column and rename the new one?

    Keep in mind things like speed, log file size, etc.

    Thanks for any advice.

    Mauricio

    If you have several million rows then ALTER TABLE could cause your log file to grow beyond what you would want, because it will update all rows inside one transaction. There's also locking to consider while it is doing this, if the table is also being updated during this process.

    You could add a new column of the type you require and then batch the updates, to keep the transactions smaller. Swap the column names using sp_rename once completed.
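
    As a rough sketch of what that could look like (table, column and batch size are all made up here and would need adjusting for your schema and system):

    -- 1) Add the new column as NULLable (the heavy lifting happens in the updates below)
    ALTER TABLE dbo.BigTable ADD Amount_new decimal(19,8) NULL;

    -- 2) Copy values across in batches to keep each transaction small
    --    (driving the batches off the clustering key instead of this WHERE clause scales better on very big tables)
    DECLARE @rows int = 1;
    WHILE @rows > 0
    BEGIN
        UPDATE TOP (50000) dbo.BigTable
        SET    Amount_new = Amount
        WHERE  Amount_new IS NULL
          AND  Amount IS NOT NULL;
        SET @rows = @@ROWCOUNT;
    END;

    -- 3) Swap the column names once the copy is complete
    EXEC sp_rename 'dbo.BigTable.Amount', 'Amount_old', 'COLUMN';
    EXEC sp_rename 'dbo.BigTable.Amount_new', 'Amount', 'COLUMN';
    -- Drop Amount_old once everything checks out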

    If the possibility of fragmentation is something that concerns you, then you could create the table again with the column in its new definition and then copy the data across, renaming the tables afterwards.

  • BrainDonor (1/19/2015)


    Mauricio_ (1/19/2015)


    Hi all,

    I have a question regarding performance. I need to alter several tables, changing columns from decimal(18,4) to decimal(19,8). Some of the tables have millions of rows. So, in your opinion/experience, what's the best way to do this?

    1) Using ALTER TABLE

    2) Creating a new one, updating with old value, delete old column and rename the new one?

    Keep in mind things like speed, log file size, etc.

    Thanks for any advice.

    Mauricio

    If you have several million rows then ALTER TABLE could cause your log file to grow beyond what you would want, because it will update all rows inside one transaction. There's also locking to consider while it is doing this, if the table is also being updated during this process.

    You could add a new column of the type you require and then batch the updates, to keep the transactions smaller. Swap the column names using sp_rename once completed.

    If the possibility of fragmentation is something that concerns you, then you could create the table again with the column in its new definition and then copy the data across, renaming the tables afterwards.

    I would only go down that route if there is a real issue with log size and locking. Otherwise, keep it simple. SQL Server should handle most "millions of rows" cases just fine.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help

  • Test, test, test, test, test.

    I agree that keeping it simple and using ALTER TABLE is the right answer most of the time. I also agree that you may see situations where you need to break this down, especially the data movement, in order to avoid a single massive transaction. But, you're not really going to know the difference based on a number of rows. You'll need to test for your situation and then use the appropriate solution.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (1/19/2015)


    Test, test, test, test, test.

    I agree that keeping it simple and using ALTER TABLE is the right answer most of the time. I also agree that you may see situations where you need to break this down, especially the data movement, in order to avoid a single massive transaction. But, you're not really going to know the difference based on a number of rows. You'll need to test for your situation and then use the appropriate solution.

    The answer that Grant provided ABOVE is the one that I'd do, especially since each system has a "tipping point" in what it can do without going nuts.

    Create a test database... copy the rows just for the table in question into a test table and give it the same indexes... shrink the log file to 0 (for testing only)... back up the database... run a test... record the log file size and the backup size... do a physical stats check on any indexes to see what their fragmentation is (probably won't be affected for this change because 18 and 19 both require the same number of bytes underneath) and record them... do a restore on the test database for the next test... lather, rinse, repeat until you've tested each method.
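
    As a rough sketch of the "record the numbers" part, run against the restored copy (TestDB and dbo.BigTableTest are made-up names):

    -- Log usage before and after each test run
    DBCC SQLPERF(LOGSPACE);

    -- Index fragmentation on the test table (run while connected to the test database)
    SELECT  i.name, ips.index_type_desc, ips.avg_fragmentation_in_percent, ips.page_count
    FROM    sys.dm_db_index_physical_stats(DB_ID('TestDB'), OBJECT_ID('dbo.BigTableTest'), NULL, NULL, 'LIMITED') AS ips
    JOIN    sys.indexes AS i
            ON  i.object_id = ips.object_id
            AND i.index_id  = ips.index_id;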

    Yeah... it's going to take a little time up front. Far better there than a massive blackout on the table in production that you make the mistake of trying to kill and roll back, etc, etc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Yes, testing is best, I agree. The only problem is that I underestimated the size of the table; it has 3 billion rows.

  • Mauricio_ (1/19/2015)


    Yes, testing is best, I agree. The only problem is that I underestimated the size of the table; it has 3 billion rows.

    That's gonna leave a mark. 😉 Are the powers that be absolutely positive that this change needs to be done?

    If so, schedule some downtime, make sure that you take a full backup and that you've tested a bit for how long it might take, and use ALTER TABLE.

    Estimates aren't going to work here. You have to know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/19/2015)


    Mauricio_ (1/19/2015)


    Yes, testing is best, I agree. The only problem is that I underestimated the size of the table; it has 3 billion rows.

    That's gonna leave a mark. 😉 Are the powers that be absolutely positive that this change needs to be done?

    Yes, after hours of meetings. 🙂

  • Estimates aren't going to work here. You have to know how long this is going to take and you'll likely need to schedule some downtime.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mauricio_ (1/19/2015)


    Yes, testing is best, I agree. The only problem is that I underestimated the size of the table; it has 3 billion rows.

    Well that changes the picture somewhat.

    I have to agree with Jeff - is it essential? If it is then are you sure the new column size is enough? You really don't ever want to repeat this.

    Is this a Data Warehouse, and does this table get updated on a fixed schedule, or is it a customer-facing DB that is constantly updated?

    If it is a Data Warehouse, is it Fast Track, where fragmentation is something that really, really isn't wanted?

    Is there enough storage to duplicate this data onto another table, or are you going to have to use the least amount of disc storage possible?

    What timescales have you been given?

    Do you have somewhere large enough to make a sensible test of this?

    We have tables of this size here and these are some of the things to be considered for a change such as this.

  • Hi Mauricio,

    I've just issued such a command:

    ALTER TABLE dbo.VMPGlobalLogArgument ADD VMPGlobalLogArgumentID BIGINT NOT NULL IDENTITY (1, 1)

    on a table with 1.4 billion rows.

    This time I can afford the down time, and I know the log file is big enough to cope with such a change.

    For some of the other tables we have, we cannot afford several hours/days of downtime, so we developed a set of stored procedures to copy all rows into a table with the new format.

    When the initial copy has finished, we take the downtime to a) copy the (new) remaining rows, and b) rename the tables so that the new table replaces the old one.

    This approach requires a lot of extra disk space in the database instead of in the log file.
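
    A bare-bones sketch of that table-swap pattern, with invented names and just two columns (the real procedures also have to handle the rest of the columns, indexes, constraints and permissions):

    -- New table with the column in its new definition
    CREATE TABLE dbo.BigTable_new
    (
        Id     bigint        NOT NULL PRIMARY KEY,
        Amount decimal(19,8) NULL
    );

    -- Initial copy while the old table is still in use (batched in practice)
    INSERT INTO dbo.BigTable_new (Id, Amount)
    SELECT Id, Amount
    FROM   dbo.BigTable;

    -- In the downtime window: copy the rows added since the first pass, then swap names
    EXEC sp_rename 'dbo.BigTable', 'BigTable_old';
    EXEC sp_rename 'dbo.BigTable_new', 'BigTable';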

    So try it out on your test server, estimate the time needed for both solutions (both run time and programming time), and make your choice.

    Best regards,

    Henrik Staun Poulsen

    Understanding the way that SQL Server stores things goes a long way toward explaining why some table alterations seem to take no time at all and others will bring the server to its knees.

    I read a wonderful book on SQL Server 2008 internals. The key to this is understanding how things get stored in a row (any row) and what changes mean. As you will remember, when you add a column it is added to the end of the list of columns and you have no choice about it. There is actually a good reason for that, as this causes fewer rewrites than other arrangements would. It does not avoid them all, though.

    If you are increasing the length of an NVARCHAR column then all that needs to be changed is the metadata and there are no row rewrites. Very fast. Since a variable width column that is NULL does not actually exist in the row, adding a nullable variable width column also does not cause rewrites. Things change when we get to fixed width columns.

    In a row all of your fixed width elements are packed one after the other in the order that they are defined in the column list. No delimiters. Columns that are NULL still take up the space. Yes this means that DATETIME always takes up 8 bytes in the row regardless of contents, even NULL. NULL is handled by the NULL map but fixed width data can always be found as X number of bytes from the beginning of the row.

    Variable width columns are found after all of the fixed width columns. This means that adding a fixed width column will cause every row in the table to be rewritten. Likewise dropping a fixed width column will cause all rows to be rewritten.

    Changing the type of a fixed width column will likely cause rewrites (going from BIGINT to INT, for example). Almost all of these changes will cause rewrites.

    So starting out by adding a fixed width column is not likely to make things better. That is almost guaranteed to rewrite every row, and since the row will now take up even more space there will be page splits. Since the non-clustered indexes use the clustering key rather than the RID they should not be much affected, but the clustered index will get a thrashing.

    I think that the best way to go about this is the ALTER TABLE ALTER COLUMN route. Yes, this will rewrite all of your millions of rows, but it will do so only once. The other approach will rewrite because of (1) adding a column, (2) doing the UPDATE, and (3) dropping the column. Renaming a column is metadata only (it may have impacts elsewhere).
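
    On that specific change, it's easy to confirm Jeff's earlier point that decimal(18,4) and decimal(19,8) occupy the same number of bytes on disk, so the rewrite at least shouldn't make the rows any wider:

    -- Both should return 9 (decimal precision 10-19 is stored in 9 bytes)
    SELECT  DATALENGTH(CAST(1.5 AS decimal(18,4))) AS bytes_18_4,
            DATALENGTH(CAST(1.5 AS decimal(19,8))) AS bytes_19_8;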

    Restore a backup onto your oldest, slowest, dog breath machine in the place and do timing tests on ALTER TABLE ALTER COLUMN.

    ATBCharles Kincaid

  • Charles Kincaid (1/21/2015)


    Understanding the way that SQL Server stores things goes a long way toward explaining why some table alterations seem to take no time at all and others will bring the server to its knees.

    I read a wonderful book on SQL Server 2008 internals. The key to this is understanding how things get stored in a row (any row) and what changes mean. As you will remember, when you add a column it is added to the end of the list of columns and you have no choice about it. There is actually a good reason for that, as this causes fewer rewrites than other arrangements would. It does not avoid them all, though.

    If you are increasing the length of an NVARCHAR column then all that needs to be changed is the metadata and there are no row rewrites. Very fast. Since a variable width column that is NULL does not actually exist in the row, adding a nullable variable width column also does not cause rewrites. Things change when we get to fixed width columns.

    In a row all of your fixed width elements are packed one after the other in the order that they are defined in the column list. No delimiters. Columns that are NULL still take up the space. Yes this means that DATETIME always takes up 8 bytes in the row regardless of contents, even NULL. NULL is handled by the NULL map but fixed width data can always be found as X number of bytes from the beginning of the row.

    Variable width columns are found after all of the fixed width columns. This means that adding a fixed width column will cause every row in the table to be rewritten. Likewise dropping a fixed width column will cause all rows to be rewritten.

    Changing the type of a fixed width column will likely cause rewrites (going from BIGINT to INT, for example). Almost all of these changes will cause rewrites.

    So starting out by adding a fixed width column is not likely to make things better. That is almost guaranteed to rewrite every row, and since the row will now take up even more space there will be page splits. Since the non-clustered indexes use the clustering key rather than the RID they should not be much affected, but the clustered index will get a thrashing.

    I think that the best way to go about this is the ALTER TABLE ALTER COLUMN route. Yes, this will rewrite all of your millions of rows, but it will do so only once. The other approach will rewrite because of (1) adding a column, (2) doing the UPDATE, and (3) dropping the column. Renaming a column is metadata only (it may have impacts elsewhere).

    Restore a backup onto your oldest, slowest, dog breath machine in the place and do timing tests on ALTER TABLE ALTER COLUMN.

    Awesome post, Charles. Just to add to that, if you add a fixed width column (for example) or a NOT NULL variable width column (with a default, of course) to a table with a clustered index on it, you'll end up rewriting all the rows once for that and again when you re-index to get rid of all the massive page splits that it'll cause.

    Because it doesn't happen often, I've not thought about what happens if you change the column width of a fixed width column that appears in an NCI. My assumption is that it's going to do splits at the page and extent levels but I don't know if that's true or not. Got any info on that?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Thanks, Jeff. No actual test data, but you would be right in presuming that if you change the type of a column that is indexed, either as the key or part of the key, you will cause the index to be affected. Although this was not the point of the original question, it is a noteworthy consideration.

    Avoiding the kind of log pressure that would result might be the reason that some folks have taken to dropping affected indexes, doing the change, and then recreating the index. Also, this might affect indexes that most folks don't think of as indexes: foreign keys.
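
    A minimal sketch of that drop/alter/recreate pattern, with made-up names (script out the real index definition first so nothing gets lost):

    -- Drop the nonclustered index that contains the column being changed
    DROP INDEX IX_BigTable_Amount ON dbo.BigTable;

    -- Change the column type
    ALTER TABLE dbo.BigTable ALTER COLUMN Amount decimal(19,8) NOT NULL;

    -- Recreate the index afterwards
    CREATE NONCLUSTERED INDEX IX_BigTable_Amount
        ON dbo.BigTable (Amount);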

    ATBCharles Kincaid
