ALTER TABLE / ALTER COLUMN datatype I/O considerations / measurement

  • All,

    I was wondering what considerations (if any) there are, along with any mitigating operational procedures, when altering the datatype of a column from INT to BIGINT on a relatively large partitioned and indexed table in SQL 2008. If there is an I/O consideration, what's a good benchmark for predicting the I/O and duration of such an operation?

  • I'm not sure there's an easy way to assess it accurately other than to try it in a test environment. It's going to be an intensive operation: not only will the data size for the column double, you'll also need to drop all indexes/constraints referencing it first, alter the column and then recreate the indexes (a rough sequence is sketched below).

    If it's part of the partition key, as far as I'm aware you cannot alter its data type; you'd have to create a new table and migrate the data.
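
    A minimal sketch of that sequence, assuming hypothetical names (dbo.BigTable, column ID, index IX_BigTable_ID); any foreign keys or check constraints referencing the column would need the same drop/recreate treatment:

        -- Drop indexes (and constraints) that reference the column first
        DROP INDEX IX_BigTable_ID ON dbo.BigTable;

        -- Widen the column; this rewrites every row (a size-of-data operation) and is fully logged.
        -- NOT NULL is assumed here; match the column's existing nullability.
        ALTER TABLE dbo.BigTable
            ALTER COLUMN ID BIGINT NOT NULL;

        -- Recreate the index afterwards (on the partition scheme, if it was partitioned)
        CREATE INDEX IX_BigTable_ID ON dbo.BigTable (ID);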

  • Fortunately, in this case it's not part of the partition key.

    Unfortunately, because of the size of the tables involved, getting a representative "test" environment set up for a dry run is extremely difficult. However, I think you have already answered my first question about whether there is an I/O consideration.

    In the case of an INT to BIGINT conversion, it sounds like there will at least be an I/O consideration. I do have I/O statistics and average/max IOPS throughput for the disks in the environment. If there were a way to calculate how many IOPS were required for each row involved in this change, I could calculate how long it would take based on the rowcount of each table (a back-of-envelope sketch follows below). Perhaps this is the change I can do in my lower lanes while measuring.

    Does this approach make sense for predicting / calculating the I/O time consideration?
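
    A back-of-envelope sketch of that calculation; every figure below is a hypothetical placeholder to be replaced with measured values, and the result should be treated as an order-of-magnitude guess rather than a prediction:

        -- Rough duration estimate: bytes read and rewritten, divided by sustained throughput
        DECLARE @rows            BIGINT = 500000000;  -- table rowcount (placeholder)
        DECLARE @avg_row_bytes   INT    = 200;        -- average row size after the change (placeholder)
        DECLARE @throughput_mbps INT    = 150;        -- sustained sequential MB/s from your I/O stats (placeholder)

        -- Each row is read and rewritten, so count the data roughly twice;
        -- transaction log writes will add to this on top.
        SELECT (CAST(@rows AS FLOAT) * @avg_row_bytes * 2)
               / (@throughput_mbps * 1024.0 * 1024.0) / 60.0 AS estimated_minutes;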

  • I'm not sure you'd be able to do that accurately; it depends on a lot of factors. It may not be IOPS-dependent so much as overall throughput, as it should be mostly sequential reads and writes.

    You'd also need to factor in the transactional overhead of all the changes.

    One way or another, you'd be crazy not to test this (at least at a smaller scale) in a test environment, not least to make sure you've accounted for all constraints/indexes that need to be dropped and recreated.

    I would take a portion of the data, run a test with that, and then scale up (maybe applying a factor to take into account faster I/O in production).
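
    One way to capture the I/O of such a scaled-down run, sketched with sys.dm_io_virtual_file_stats (available in SQL 2008); snapshot it before and after the test and take the delta:

        -- Snapshot cumulative file I/O for the current database before the test
        SELECT database_id, file_id,
               num_of_bytes_read, num_of_bytes_written
        INTO   #io_before
        FROM   sys.dm_io_virtual_file_stats(DB_ID(), NULL);

        -- ... run the ALTER COLUMN test here ...

        -- Delta = I/O attributable to the test window (plus any concurrent activity)
        SELECT a.file_id,
               a.num_of_bytes_read    - b.num_of_bytes_read    AS bytes_read,
               a.num_of_bytes_written - b.num_of_bytes_written AS bytes_written
        FROM   sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS a
        JOIN   #io_before AS b
          ON   a.database_id = b.database_id
         AND   a.file_id     = b.file_id;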

  • Totally agree. Perhaps I'll take 1/4 of the data, run the change, and watch I/O and total time, as well as t-log growth (a couple of simple checks for that are sketched below). Multiply that by 4 and I should have a fairly safe estimate, aside from factoring in the faster disk in the upper lane.

    These things always make me nervous....
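
    Two simple ways to watch the t-log during that quarter-sized run (both available in SQL 2008):

        -- Log size and percent used for every database; capture before and after the test
        DBCC SQLPERF(LOGSPACE);

        -- Or, for the current database only, the log file size in MB
        SELECT name, size * 8 / 1024 AS size_mb
        FROM   sys.database_files
        WHERE  type_desc = 'LOG';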

