XML is 17GB on disk... 4MB when cast as varchar(max)

Question

XML is 17GB on disk... 4MB when cast as varchar(max)

SQLdude412

SSC Enthusiast

Points: 179
More actions
April 30, 2014 at 10:53 am

#292850

Hi folks,
I have a table that always contains only 1 row which has a column defined as XML data type. The size on disk for this table with only 1 row is 17GB. When I cast it to a varchar(max) the size is only 4MB.
It is my understanding that the size of an XML column is limited to 2GB. Is SQL Server doing something behind the scenes (like keeping its own internal index for traversing the nodes in the XML document)?
I am running the following query to check space used:
http://www.databasejournal.com/features/mssql/article.php/3733996/Find-space-Usage-by-Table--Schema-in-SQL-Server-2005-and-2008.htm
Here are the results:
schemanametablenamerow_countreserved_KBdata_KBindex_size_KBunused_KB
BadSchemaBadTable1170156881701529632360
I ran the following command:
SELECT CAST(BadColumn AS VARCHAR(MAX))
INTO BadSchema.BadTable_VarcharMax
FROM BadSchema.BadTable;
And then I checked the space again. Here are the results:
schemanametablenamerow_countreserved_KBdata_KBindex_size_KBunused_KB
BadSchemaBadTable1170156881701529632360
BadSchemaBadTable_VarcharMax147204688824
Can anyone explain this?
Thanks,
SQL Dude

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323504 More actions · Answer 1

my first guess: the table is a heap, and does not have a clustered index.

in the case of a heap, if you update data, or delete and reinsert individual rows (without getting a table lock), a heap never frees up the space previously occupied by the pages of data, unless you got a table lock on the table for the delete/update operation.

my suggestion:

create a clone of the table featuring a clustered index (ie BadTable_tmp)

insert the data into that table,

rename the old table to something else

and rename the clone table to the original name

i'll bet you consistently see the data is much smaller.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Eirikur Eiriksson SSC Guru Points: 182881 More actions · Answer 2

To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.

😎

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 3

Lowell (4/30/2014)
my first guess: the table is a heap, and does not have a clustered index.
in the case of a heap, if you update data, or delete and reinsert individual rows (without getting a table lock), a heap never frees up the space previously occupied by the pages of data, unless you got a table lock on the table for the delete/update operation.
my suggestion:
create a clone of the table featuring a clustered index (ie BadTable_tmp)
insert the data into that table,
rename the old table to something else
and rename the clone table to the original name
i'll bet you consistently see the data is much smaller.

Thank you for the reply. The table is not a heap, there is an identity column with a clustered index primary key.

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 4

Eirikur Eiriksson (4/30/2014)
To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.
😎

Thanks for your reply.

The value returned for select datalength(BadColumn) from BadSchema.BadTable is 3345897 (so like... 3.19 MB?) Now I'm really confused.

Eirikur Eiriksson SSC Guru Points: 182881 More actions · Answer 5

SQLdude412 (4/30/2014)
Eirikur Eiriksson (4/30/2014)
To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.
😎
Thanks for your reply.
The value returned for select datalength(BadColumn) from BadSchema.BadTable is 3345897 (so like... 3.19 MB?) Now I'm really confused.

Just as I thought and what Lowell said earlier, bloated heap! Follow his instructions.

😎

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 6

Eirikur Eiriksson (4/30/2014)
SQLdude412 (4/30/2014)
Eirikur Eiriksson (4/30/2014)
To further on Lowell's point, what is the output of "select datalength(xml_column) from bad_table"? Should be about 70% of the nvarchar conversion normally.
😎
Thanks for your reply.
The value returned for select datalength(BadColumn) from BadSchema.BadTable is 3345897 (so like... 3.19 MB?) Now I'm really confused.
Just as I thought and what Lowell said earlier, bloated heap! Follow his instructions.
😎

Thanks again for your reply. As I said before, the table is most definitely not a heap. It has the following structure:

CREATE TABLE BadSchema.BadTable (

primaryKeyID uniqueidentifier not null default newsequentialid(), -- I was wrong before I thought this was an identity column

alternateKeyID int not null,

BadColumn XML null,

datevalue1 datetimeoffset not null,

datevalue2 datetimeoffset not null,

CONSTRAINT PK_BadTable PRIMARY KEY CLUSTERED (primaryKeyID),

CONSTRAINT AK1_BadTable UNIQUE NONCLUSTERED (alternateKeyID)

);

Is there anything causing SQL Server to treat the table like a heap even though it has a clustered index? Is is possible this is a bug?

I appreciate your help so far and hope you can help me further.

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 7

More evidence that it is not a heap

SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'BadSchema.BadTable');

object_idnameindex_idtypetype_descis_uniquedata_space_idignore_dup_keyis_primary_keyis_unique_constraintfill_factoris_paddedis_disabledis_hypotheticalallow_row_locksallow_page_lockshas_filterfilter_definition

981578535PK_BadTable11CLUSTERED110100000110NULL

981578535AK1_BadTable22NONCLUSTERED110100000110NULL

Eirikur Eiriksson SSC Guru Points: 182881 More actions · Answer 8

He he, somehow my phone didn't show the earlier reply, kind of awkward :hehe:

My first approach would be creating a table with a different name but same structure, copy the data and then rename (sp_rename) to switch the two.

The cause could be an update related problem but must say, I haven't seen this kind of bloating on an xml column before.

😎

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 9

Just a heads up, I followed your advice despite my assertion that the table is not a heap, and the size is now 3 MB.

I am continuing to investigate this since I am 100% certain this table is not a heap.

Thanks again for all your help. I'll continue to update this thread with any findings.

arnipetursson SSCertifiable Points: 6557 More actions · Answer 10

arnipetursson

SSCertifiable

Points: 6557

April 30, 2014 at 12:23 pm

#1709965

What happens if you rebuild the clustered index?

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 11

arnipetursson (4/30/2014)
What happens if you rebuild the clustered index?

Good question! I already switched the table out in production but I should be able to restore a backup and try it out. I will let you know!

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 12

Quick update.

After switching out the table for a newly created one (double-checked the clustered index), the initial size was 3MB but it has been growing since then and is now around 50MB.

Here are some facts that may help narrow this down:

* We are running SQL Server 2008 SP3 on Windows Server 2008 R2

* We are running with READ_COMMITTED_SNAPSHOT turned ON

* The data in this table is updated by the following statement:

BEGIN TRANSACTION

DELETE FROM BadSchema.BadTable WHERE alternateKeyID = @Foo;

INSERT INTO BadSchema.BadTable VALUES (DEFAULT, @Foo, @foo2, @foo3, @foo4);

COMMIT TRANSACTION

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 13

arnipetursson (4/30/2014)
What happens if you rebuild the clustered index?

ALTER INDEX PK_BadTable ON BadSchema.BadTable REBUILD;

The command completed successfully.

The size of the data is unchanged. (Currently 66MB)

SQLdude412 SSC Enthusiast Points: 179 More actions · Answer 14

Update: I can reproduce this with READ_COMMITTED_SNAPSHOT turned ON, but not with it OFF.

The results are inconsistent; sometimes it goes up and never goes down... other times it seems to clean up OK. I will keep this thread updated as I gather more information.

EDIT: Nevermind, I believe what I was seeing on my test system was a "lazy" ghost cleanup. It eventually did shrink back down. See follow up post for more info.