NULL Storage Space

Question

NULL Storage Space

Undebtedly

SSCrazy

Points: 2938
More actions
January 17, 2003 at 1:15 pm

#159116

In the nvarchar and nvarbinary data types, storage is 2 * n characters. Does NULL take any storage space? Does it increase the size of an index?
Might seem like a silly question, but the amount of bytes we backup is a constant concern with this project. Adding hardware is not an option.
Steve Miller

Steve Miller

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Antares686 SSC Guru Points: 125444 More actions · Answer 1

Ok when the data is stored and looked at in hex at the page level there are a few things that occurr.

First the row contains a header that tells if any variable length columns exist in the row with data. NULLable variable length columns if left out will produce header mark for no variable length columns in data.

Next all the fixed length data is stored in the fron of the row under the hood and the variable at the back. Even though you define a table as

col1 int

col2 nvarchar(5)

col3 varchar(5)

col4 char(5)

col5 nvarchar(8)

it will be stored as

col1 int

col4 char(5)

col2 nvarchar(5)

col3 varchar(5)

col5 nvarchar(8)

In order in the table, bet some didn't realize that but that makes storage easier.

Now if there is data in the variable length columns of course the header has the marker set to indentify that.

Then this is what transpiers.

Say I had two nvarchar columns in my table.

And in one insert I insert into the value for only the first.

Reading the data you see a column count marker at the point of the variable length columns being added on.

This column counter is 2 extra bytes long not accounted for in the setup of the table.

Next you see the offset in the row of data for the end of the column.

Also this is 2 bytes long and not counted for in your table def.

SO going further, if you populate both variable length columns you get the column counter showing 2 and then the next 2 bytes represent col1s offset and the next 2 bytes after that represent col2s offset.

In other words you have 6bytes extra plus the data.

Finally the really neat part. If say varible length col1 is NULL and col2 is filled in then you get the following situation.

You have column count 2, col1 offset and col2 offset. But col1 offset points to the position right after col2s offset value. In other words the column count lets the DB engine know to expect 4 bytes for 2 columns then the col1 offset right after that returns a zero characters or length or in this case NULL. And the offset for col2 of course combined with col1s offset tells column 2s data.

So if I created a table like so

CREATE TABLE [Table1] (

[col1] [int],

[col2] [nvarchar] (5),

[col3] [int],

[col4] [char] (10),

[col5] [nvarchar] (3)

)

and fill only the fixed lenght items in I use a length of the byte lengths of those combined plus the header bytes which are 4 and an additional 3 bytes I haven't figured out completely yet.

But if you fill out col2 only in the described table you get

total length of fixed items

header

piece i am not sure about

column count bytes

col2 offset bytes

and actual data length of col2 entered

if you fill out both col2 and col5 then you get

total length of fixed items

header

piece i am not sure about

column count bytes

col2 offset bytes

col5 offset bytes

actual data length of col2 entered

and actual data length of col5 entered

then if you fill out col5 but not col2 you get

total length of fixed items

header

piece i am not sure about

column count bytes

col2 offset bytes

col5 offset bytes

col2 length = 0 bytes (no data no bytes)

and actual data length of col5 entered

So keeping that in mind if you really want to pinch space out make sure your column order is most used variable length column first to least used. And as a gotcha say you add a NON-NULL variable length column to the end. Your column count will include any NULLable that have not been field. So NON-NULL should be before nullable if you are byte conscience.

Hopefully that makes sense and sheds some light on under the hood.

kknudson SSCarpal Tunnel Points: 4986 More actions · Answer 2

OK Antares

"Ok when the data is stored and looked at in hex at the page level there are a few things that occurr."

Do you have any special tools used to view the MDF at the physical level ?

Is there any documentation on the physical structure ?

I'm just curious, I used to know IMS structures (yea I'm an old guy).

KlK, MCSE

KlK

Antares686 SSC Guru Points: 125444 More actions · Answer 3

DBCC PAGE is what I use. Then just a lot of time figuring the data out.

Andy Warren SSC Guru Points: 119902 More actions · Answer 4

If its just backup space that concerns you, you should take a look at both SQLZip and SQL Litespeed, both advertise here on the site. You can really cut the amount of space needed for backup. Steve Jones has a review of Litespeed, I reviewed SQLZip a while back.

Andy

http://www.sqlservercentral.com/columnists/awarren/

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Undebtedly SSCrazy Points: 2938 More actions · Answer 5

Antares, thanks for sharing your research.

>> In other words you have 6 bytes extra plus the data.

If I'm reading you right, 2 of the 6 bytes is in the column header, 2 is is for one column, and 2 is for the other column.

So if I have one nvarchar in the table, I would add 4 bytes to the table, plus whatever is stored in the columns. Right?

>> If its just backup space that concerns you

Andy, backup space does concern us, but it is broader than that. We're using MSDE as a object repository for a C++/C# front end. The software will be distributed primarily on stand-alone machines all over the world. We don't have the ability to control the hardware it runs on, and much of the hardware will be old. We try to get every ounce of performance we can.

Steve

Steve Miller

Antares686 SSC Guru Points: 125444 More actions · Answer 6

Actually it is an extra 4 bytes for the row header. Plus 2 bytes for the variable length column count in the row. Plus with a single nvarchar 2 bytes for the offset for the end of the data for the column. Plus 1 byte for an end of record or something that seems to always be 00 that I have not figured out.

So if you have 1 nvarchar column and an int identity field (int is 4 bytes) in the table it would be

4+4+2+2+1+(lengthofdatainnvarcharfield*2)

OR

RecordHeader+INTLen+ColCount+Col1Offset+(lengthofdatainnvarcharfield*2)

If NVARCHAR field is null you would have

4+4+1

Or

RecordHeader+INTLen+EndRec

due to no non-null variable length column and the header would note it as such.

Hope that makes sense.

Edited by - antares686 on 01/21/2003 4:04:51 PM

Undebtedly SSCrazy Points: 2938 More actions · Answer 7

I think it makes sense.

Bottom line for me is this: those records that have a null value in the nvarchar column will not eat any storage. That's what I wanted, and I'm happy.

Thanks,

Steve

Steve Miller