What is happening under the hood when I change a columns data type?

  • After going through the Microsoft® SQL Server® 2008 Internals book I am now looking for any reason to pull out the DBCC IND and DBCC PAGE commands.

    So on my test database I have a table with 1517436 rows.

    There are 27 columns with the one I am referring to a NVARCHAR(max) .

    I thought I would see what happened if I changed the NVARCHAR to VARCHAR(max).

    I view the data using DBCC Page [as well as other DMV's] before the change and see the following as my baseline:

    Record Count: 1517436

    Page Count: 635870

    Defrag %: 0

    Avg Row Size: 2745 bytes

    Avg Page Space Used: 80%

    Total DB data size: 11099MB

    The NVARCHAR(max) field has "[BLOB Inline Data] Slot 0 Column 12 Offset 0x9d Length 2670 Length (physical) 2670" in DBCC page.

    So the actual data is 1335 bytes x 2 for unicode.

    I am unsure what [BLOB Inline Data] means (I am sure I have read that if a VARCHAR(max) field is less than the page size it is stored the same as a VARCHAR rather than moved elsewhere)???

    I use T-SQL to alter the field and I was expecting a smaller record size, therefore more records per page therefore less pages and data size -- I appear to be very wrong!!!

    ALTER TABLE XYZ ALTER column html_format varchar(MAX)

    I knew it would use some server resources and possibly lock the table for a few minutes (so have done this on prod in the past not understanding what I think I know now).

    The column change took a couple of minutes with the following results:

    Record Count: 1517436

    Page Count: 1135273

    Defrag %: 83

    Avg Row Size: 2745 bytes

    Avg Page Space Used: 60%

    Total DB data size: 14729 MB

    So major defrag, 3GB of extra data in an extra 499403 pages.

    So doing this on production seems a bit more severe that just a few minutes of overhead.

    I assume some queries will be slower due to increase frag and pages and / or the database may autogrow with this large size increase?

    I noticed via DBCC Page that the NVARCHAR(max) column still exists as well as the new VARCHAR(max) and because of this sometimes two rows were on a page and sometime only one (depending the total size of the VARCHAR(max) data which is on average between 1000 and 2000 characters).

    For a random row I see a total size of 3398 bytes which now consists of 2160 bytes [the original NVARCHAR(max) data] AND 1080 [the new VARCHAR(max) data].

    The NVARCHAR(max) slot now has the text: DROPPED = [BLOB Inline Data] Slot 0 Column 67108865 Offset 0x9f Length 2160 Length (physical) 2160

    The VARCHAR(max) slot now has the text: HTML_FORMAT = [BLOB Inline Data] Slot 0 Column 67108865 Offset 0x9f Length 1080 Length (physical) 1080 (HTML_FORMAT is the field name)

    If I then do a rebuild on this table everything goes back to normal:

    Page Count: 309249

    Total DB data size: 6476MB

    Avg Row Size: 1475 bytes

    This reduced page count and data size is what I was expecting given that the space used to store the data was now going to half.

    Maybe somebody who knows more in this field than me (which is not that hard!) can confirm or deny that this is indeed what happens and the increased page count, fragmentation, db size and performance decrease would / could occur in this scenario.

    thanks

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply