November 24, 2005 at 4:34 pm
Hi,
I have a table with the following structure (in SQL Server 2000):
TestTable (col1 numeric, col2 tinyint, col3 text, col4 datetime, col5
datetime, col6 varchar(32))
The primary key is col1 and col2 (clustered, default fill factor).
After I insert 1000 records with 2 KB of data each in the text column (col3),
the sp_spaceused command shows data + index usage of around 4 MB. Similarly,
for 1000 inserts of 3 KB each, the storage size stays around 4 MB, but for 1000
inserts of 4 KB, 5 KB, 6 KB, 7 KB and 8 KB, the size is shown as around 8 MB.
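For reference, here is a sketch of the test setup (the WHILE loop is just one way to generate the inserts; the exact values are illustrative):

```sql
-- Test table as described above (SQL Server 2000).
CREATE TABLE TestTable (
    col1 numeric      NOT NULL,
    col2 tinyint      NOT NULL,
    col3 text         NULL,
    col4 datetime     NULL,
    col5 datetime     NULL,
    col6 varchar(32)  NULL,
    CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED (col1, col2)
);

-- Insert 1000 rows with ~2 KB of text each.
DECLARE @i int; SET @i = 1;
WHILE @i <= 1000
BEGIN
    INSERT INTO TestTable (col1, col2, col3, col4, col5, col6)
    VALUES (@i, 1, REPLICATE('x', 2048), GETDATE(), GETDATE(), 'test');
    SET @i = @i + 1;
END;

-- Then check the space usage:
EXEC sp_spaceused 'TestTable';
```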
1. Can you tell me why this extra space is used?
2. Is there some parameter/setting to avoid wasting so much free space? If this is the way data is stored, then with our actual
data of 3 million rows, the storage space requirements are going to be huge!
Please help
Thanks in advance
November 24, 2005 at 11:32 pm
Hi!!!
That extra space may be due to page and row overhead (headers and so on)...
You can get more information at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdb/cm_8_des_02_92k3.asp
Regards,
Papillon
November 25, 2005 at 9:27 am
The storage jumps when you hit 4 KB per text value because only a single text record of that size can be stored on a page. Remember that a page is 8192 bytes, minus 96 bytes for the page header and 2 bytes for at least one entry in the record offset table, leaving 8094 bytes available for actually storing data - so two of these 4 KB values won't fit on one page. At one page per value, 1000 * 8 KB is 8 MB. The text records will not be split between pages during the initial insert; subsequent updates may cause them to be split and use the space more efficiently.
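To spell out the arithmetic (a quick sketch using the figures above):

```sql
-- Page math behind the jump at 4 KB per value:
SELECT 8192 - 96 - 2 AS usable_bytes_per_page;  -- 8094: two 4096-byte values need 8192 bytes, so they can't share a page
SELECT 1000 * 8192 / 1048576.0 AS approx_mb;    -- one page per value: ~7.8 MB, which sp_spaceused reports as ~8 MB
```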
Check out the varchar(max) datatype in SQL Server 2005 - it has much better behavior.
Regards
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog: www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
November 25, 2005 at 12:56 pm
Hmmm... my recollection is that a text column will always allocate at least one page of text data per populated row, regardless of how little data you put into it. That's one of the downsides of text columns. However, you can configure your table to store some of the text data within the actual row itself (subject to available space) rather than as a pointer to a text page. It all depends on how large your data is: if it's always less than about 8000 bytes (approx - it depends on the rest of the row), use a varchar instead and the overall table space will be less.
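As a sketch, that option can be enabled with sp_tableoption (the in-row limit can be set anywhere from 24 to 7000 bytes, or 'ON' for the default of 256):

```sql
-- Store text values of up to 1000 bytes directly in the data row;
-- only longer values get a separate text page.
EXEC sp_tableoption 'TestTable', 'text in row', '1000';

-- Or use the default in-row limit of 256 bytes:
EXEC sp_tableoption 'TestTable', 'text in row', 'ON';
```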
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
November 25, 2005 at 5:36 pm
Hi Paul,
Thanks a lot for the info.
I am still confused as the BOL says - "Although the data for text, ntext, and image columns is arranged logically in a B-tree, both the root node and the individual blocks of data are spread throughout the chain of text, ntext, and image pages for the table. They are placed wherever there is space available. The size of each block of data is determined by the size written by an application. Small blocks of data will be combined to fill a page. If there is less than 64 bytes of data, it is all stored in the root structure.
For example, if an application first writes 1 KB of image data, this is stored as the first 1-KB block of image data for the row. If the application then writes 12 KB of image data, then 7 KB is combined with the first 1-KB block so the first block becomes 8 KB. The remaining 5 KB forms the second block of image data. (The actual capacity of each ntext, text, or image page is 8080 bytes of data.)"
1. So why is this packing of data within a page not happening during the initial data load?
2. If it's going to allocate 8 KB pages, how do I estimate the storage space required? (We have around 3 million rows with an average text size of 6 KB.) I could find only articles related to non-text datatypes. Is there any article on how to estimate the size of a table with a text datatype?
3. For now, we have to manage with SQL Server 2000. Is there any other alternative (design-wise)?
Thanks a lot
warm regards
Anitha
November 27, 2005 at 11:14 pm
Not always. There's a complicated algorithm that determines whether to use a new page or not. For instance, inserting two 2 KB values into an empty table will use a single page, but inserting a 2 KB value and a 5 KB value will take two pages. I just tried it on SQL Server 2005 too, and examining the on-disk layout confirms this.
I don't remember the exact calculations involved, unfortunately, and I don't have the source code on my laptop. If you're interested I'll find out from the dev team.
Thanks
Paul Randal
November 27, 2005 at 11:14 pm
1. That's just the way the design works. Any value that doesn't fit in the space remaining on a page will have a new page allocated for it.
2. It's really not possible to say deterministically - so much depends on the order in which the values are inserted. A rough worst-case estimate would be 8 KB * 3 million (around 23 GB).
3. Do you know the maximum text value size? If the entire record will always fit within the 8060-byte row limit, you could use varchar instead of text, or use the 'text in row' option - both as Colin also suggests above.
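The rough estimate in point 2 works out like this (worst case of one full page per text value):

```sql
-- 3 million rows, one 8 KB page each; CAST to bigint to avoid int overflow.
SELECT CAST(3000000 AS bigint) * 8192 / 1073741824.0 AS approx_gb;  -- ~22.9 GB
```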
Thanks
Paul Randal
November 29, 2005 at 5:59 pm
Hi,
Thanks for the valuable information.
warm regards
Anitha