XML is 17GB on disk... 4MB when cast as varchar(max)

  • Hi folks,

    I have a table that always contains only 1 row which has a column defined as XML data type. The size on disk for this table with only 1 row is 17GB. When I cast it to a varchar(max) the size is only 4MB.

    It is my understanding that the size of an XML column is limited to 2GB. Is SQL Server doing something behind the scenes (like keeping its own internal index for traversing the nodes in the XML document)?

    I am running the following query to check space used:

    http://www.databasejournal.com/features/mssql/article.php/3733996/Find-space-Usage-by-Table--Schema-in-SQL-Server-2005-and-2008.htm

    Here are the results:

    schemanametablenamerow_countreserved_KBdata_KBindex_size_KBunused_KB

    BadSchemaBadTable1170156881701529632360

    I ran the following command:

    SELECT CAST(BadColumn AS VARCHAR(MAX))

    INTO BadSchema.BadTable_VarcharMax

    FROM BadSchema.BadTable;

    And then I checked the space again. Here are the results:

    schemanametablenamerow_countreserved_KBdata_KBindex_size_KBunused_KB

    BadSchemaBadTable1170156881701529632360

    BadSchemaBadTable_VarcharMax147204688824

    Can anyone explain this?

    Thanks,

    SQL Dude

  • my first guess: the table is a heap, and does not have a clustered index.

    in the case of a heap, if you update data, or delete and reinsert individual rows (without getting a table lock), a heap never frees up the space previously occupied by the pages of data, unless you got a table lock on the table for the delete/update operation.

    my suggestion:

    create a clone of the table featuring a clustered index (ie BadTable_tmp)

    insert the data into that table,

    rename the old table to something else

    and rename the clone table to the original name

    i'll bet you consistently see the data is much smaller.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.

    😎

  • Lowell (4/30/2014)


    my first guess: the table is a heap, and does not have a clustered index.

    in the case of a heap, if you update data, or delete and reinsert individual rows (without getting a table lock), a heap never frees up the space previously occupied by the pages of data, unless you got a table lock on the table for the delete/update operation.

    my suggestion:

    create a clone of the table featuring a clustered index (ie BadTable_tmp)

    insert the data into that table,

    rename the old table to something else

    and rename the clone table to the original name

    i'll bet you consistently see the data is much smaller.

    Thank you for the reply. The table is not a heap, there is an identity column with a clustered index primary key.

  • Eirikur Eiriksson (4/30/2014)


    To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.

    😎

    Thanks for your reply.

    The value returned for select datalength(BadColumn) from BadSchema.BadTable is 3345897 (so like... 3.19 MB?) Now I'm really confused.

  • SQLdude412 (4/30/2014)


    Eirikur Eiriksson (4/30/2014)


    To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.

    😎

    Thanks for your reply.

    The value returned for select datalength(BadColumn) from BadSchema.BadTable is 3345897 (so like... 3.19 MB?) Now I'm really confused.

    Just as I thought and what Lowell said earlier, bloated heap! Follow his instructions.

    😎

  • Eirikur Eiriksson (4/30/2014)


    SQLdude412 (4/30/2014)


    Eirikur Eiriksson (4/30/2014)


    To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.

    😎

    Thanks for your reply.

    The value returned for select datalength(BadColumn) from BadSchema.BadTable is 3345897 (so like... 3.19 MB?) Now I'm really confused.

    Just as I thought and what Lowell said earlier, bloated heap! Follow his instructions.

    😎

    Thanks again for your reply. As I said before, the table is most definitely not a heap. It has the following structure:

    CREATE TABLE BadSchema.BadTable (

    primaryKeyID uniqueidentifier not null default newsequentialid(), -- I was wrong before I thought this was an identity column

    alternateKeyID int not null,

    BadColumn XML null,

    datevalue1 datetimeoffset not null,

    datevalue2 datetimeoffset not null,

    CONSTRAINT PK_BadTable PRIMARY KEY CLUSTERED (primaryKeyID),

    CONSTRAINT AK1_BadTable UNIQUE NONCLUSTERED (alternateKeyID)

    );

    Is there anything causing SQL Server to treat the table like a heap even though it has a clustered index? Is is possible this is a bug?

    I appreciate your help so far and hope you can help me further.

  • More evidence that it is not a heap

    SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'BadSchema.BadTable');

    object_idnameindex_idtypetype_descis_uniquedata_space_idignore_dup_keyis_primary_keyis_unique_constraintfill_factoris_paddedis_disabledis_hypotheticalallow_row_locksallow_page_lockshas_filterfilter_definition

    981578535PK_BadTable11CLUSTERED110100000110NULL

    981578535AK1_BadTable22NONCLUSTERED110100000110NULL

  • He he, somehow my phone didn't show the earlier reply, kind of awkward :hehe:

    My first approach would be creating a table with a different name but same structure, copy the data and then rename (sp_rename) to switch the two.

    The cause could be an update related problem but must say, I haven't seen this kind of bloating on an xml column before.

    😎

  • Just a heads up, I followed your advice despite my assertion that the table is not a heap, and the size is now 3 MB.

    I am continuing to investigate this since I am 100% certain this table is not a heap.

    Thanks again for all your help. I'll continue to update this thread with any findings.

  • What happens if you rebuild the clustered index?

  • arnipetursson (4/30/2014)


    What happens if you rebuild the clustered index?

    Good question! I already switched the table out in production but I should be able to restore a backup and try it out. I will let you know!

  • Quick update.

    After switching out the table for a newly created one (double-checked the clustered index), the initial size was 3MB but it has been growing since then and is now around 50MB.

    Here are some facts that may help narrow this down:

    * We are running SQL Server 2008 SP3 on Windows Server 2008 R2

    * We are running with READ_COMMITTED_SNAPSHOT turned ON

    * The data in this table is updated by the following statement:

    BEGIN TRANSACTION

    DELETE FROM BadSchema.BadTable WHERE alternateKeyID = @Foo;

    INSERT INTO BadSchema.BadTable VALUES (DEFAULT, @Foo, @foo2, @foo3, @foo4);

    COMMIT TRANSACTION

  • arnipetursson (4/30/2014)


    What happens if you rebuild the clustered index?

    ALTER INDEX PK_BadTable ON BadSchema.BadTable REBUILD;

    The command completed successfully.

    The size of the data is unchanged. (Currently 66MB)

  • Update: I can reproduce this with READ_COMMITTED_SNAPSHOT turned ON, but not with it OFF.

    The results are inconsistent; sometimes it goes up and never goes down... other times it seems to clean up OK. I will keep this thread updated as I gather more information.

    EDIT: Nevermind, I believe what I was seeing on my test system was a "lazy" ghost cleanup. It eventually did shrink back down. See follow up post for more info.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply