Expanding column length on very large table

  • Bob Hovious (8/5/2009)


    But changing from a char(1) to a varchar, I'm sure pointers would have to be established on the page to support the varchar. It wouldn't necessarily all have to be done at once. Maybe it's smart enough to do that on a row-by-row basis (or page-by-page) as data is updated.

    Altering tables is a pretty dumb operation in SQL Server 2005/2008 - at least compared to what you might expect.

    For example, dropped columns are just marked as dropped - the data remains in place in the row, and continues to count toward the 8060-byte row size limit. The only way to reclaim space from dropped fixed-length columns is to rebuild the clustered index (or create and then drop one if the table is a heap). You can run DBCC CLEANTABLE to reclaim space used by dropped variable-length columns, but it is fully logged, resource-intensive, and makes no attempt to 'tidy up' the on-page structures. (Both operations are sketched below.)

    Paul
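
    A minimal sketch of the two clean-up operations Paul mentions, using placeholder table, index, and database names (dbo.BigTable, PK_BigTable, and MyDatabase are not objects from this thread):

    -- Reclaim space left by dropped fixed-length columns:
    -- rebuilding the clustered index rewrites every row.
    ALTER INDEX PK_BigTable ON dbo.BigTable REBUILD;

    -- Reclaim space left by dropped variable-length columns.
    -- Fully logged and resource-intensive, as noted above; the optional
    -- batch size (5000 rows here) limits the work done per transaction.
    DBCC CLEANTABLE ('MyDatabase', 'dbo.BigTable', 5000);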

  • Wouldn't a varchar(2) actually use more space than a char(2)? I thought varchar storage is the number of characters + 2 bytes, so for instance '' would use 2 bytes, 'N' would use 3 bytes, and 'No' would use 4 bytes, whereas with a char(2) all three examples would use 2 bytes. So in terms of performance/table width I think it would make sense to stick with char for this. If the length of a field can be 3 or more and is variable, then go with a varchar. Obviously the performance impact of this might be negligible depending on your table width.
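
    A quick way to see the padding difference Josh describes - this is just a sketch, and note that DATALENGTH reports only the stored data bytes, not the extra 2-byte entry each varchar column adds to a row's variable-length offset array:

    DECLARE @c char(2), @v varchar(2);
    SELECT @c = 'N', @v = 'N';

    -- char(2) is always padded to two bytes; varchar stores only what it holds.
    SELECT DATALENGTH(@c) AS char_bytes,     -- 2
           DATALENGTH(@v) AS varchar_bytes;  -- 1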

  • Thanks for your input, Paul. I'll take your suggestion and do some testing in our DEV environment.

  • Josh (8/7/2009)


    Wouldn't a varchar(2) actually use more space than a char(2)? I thought varchar storage is the number of characters + 2 bytes, so for instance '' would use 2 bytes, 'N' would use 3 bytes, and 'No' would use 4 bytes, whereas with a char(2) all three examples would use 2 bytes. So in terms of performance/table width I think it would make sense to stick with char for this. If the length of a field can be 3 or more and is variable, then go with a varchar. Obviously the performance impact of this might be negligible depending on your table width.

    Hey Josh,

    So yeah - that's true - though not the whole story by any means. I have found that people tend to resist the CHAR data type where the column is not truly fixed length - it's probably the padding behaviour that annoys people.

    For most columns you won't lose much by using VARCHAR instead of CHAR. In fact you may not lose anything at all - it all comes down to how many rows (including all the overheads) fit on an 8KB page. Even if the VARCHAR uses 2 extra bytes per row, that makes no odds if the same number of rows still fit on a page. Either way, the difference is usually small.

    For larger columns, it depends on the data. Remember that VARCHAR only uses the number of bytes required, whereas CHAR always uses the fixed amount. For a (VAR)CHAR(255) column with an average data length of 16 bytes, VARCHAR will pack far more rows onto a page. Hence the general advice is to use VARCHAR if the data length varies by much (for some value of 'much'!) - a quick way to test this is sketched below.

    Paul
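
    A rough way to see the rows-per-page effect for yourself - just a sketch with made-up table names and throwaway data; sys.dm_db_index_physical_stats reports the average record size and page count for each version:

    -- Two throwaway tables: one fixed-length column, one variable-length.
    CREATE TABLE dbo.FixedTest (col1 char(255)    NOT NULL);
    CREATE TABLE dbo.VarTest   (col1 varchar(255) NOT NULL);

    -- Load both with the same short (~16 character) values.
    INSERT dbo.FixedTest (col1)
    SELECT TOP (10000) 'sixteen chars...'
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

    INSERT dbo.VarTest (col1)
    SELECT TOP (10000) 'sixteen chars...'
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

    -- Compare average record size and pages used by each table.
    SELECT OBJECT_NAME(object_id) AS table_name,
           avg_record_size_in_bytes,
           page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.FixedTest'), NULL, NULL, 'DETAILED')
    UNION ALL
    SELECT OBJECT_NAME(object_id),
           avg_record_size_in_bytes,
           page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.VarTest'), NULL, NULL, 'DETAILED');

    DROP TABLE dbo.FixedTest, dbo.VarTest;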

  • Hey Paul, I was aware of how dropped columns are handled. I was just theorizing about how Michael got his one-second update. His later statement that his original column was varchar(1) explained that.

    I generally use char(1) columns instead of BIT because Y/N statuses often evolve into more subtle distinctions. For any length of ten or over, I go with varchar every time. The tricky calls for me are codes with a length of around three or four. Although char is the more efficient use of storage, I prefer varchar behavior and have a sneaking suspicion that the requirements will change one day to longer strings. As has already been discussed, increasing the length of a varchar column is a trivial effort - see the example below.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
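
    For reference, the 'trivial effort' Bob mentions looks like this in practice - widening a varchar(n) column to a larger varchar(n) is a metadata-only change, so a statement along these lines (hypothetical table and column names) completes almost instantly regardless of row count:

    -- Hypothetical table and column; the only change is the declared length.
    ALTER TABLE dbo.Codes
        ALTER COLUMN StatusCode varchar(10) NOT NULL;  -- was varchar(3) NOT NULL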

  • Hi,

    I have 140,453,701 rows in a SQL Server 2008 table. I need to change one column from varchar(max) to nvarchar(max). How can I estimate the size of the transaction log and the time required to execute the query? I am using the following query:

    alter table table_name
    alter column column_name nvarchar(max) null;

    How can I optimize it to run faster than normal?

    Thanks

    Vaibhav
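
    One rough way to size the change above (a sketch only, using the placeholder names from the question): nvarchar stores two bytes per character, so the rewritten data alone will need roughly twice the bytes DATALENGTH reports today, and the transaction log will need at least roughly that much headroom while the rewrite runs (the exact figure depends on recovery model and row overhead).

    SELECT SUM(CAST(DATALENGTH(column_name) AS bigint))     AS current_column_bytes,
           SUM(CAST(DATALENGTH(column_name) AS bigint)) * 2 AS approx_bytes_as_nvarchar
    FROM table_name;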

  • How can I check the integrity of the data change?
