NULL Storage Space

  • In the nvarchar and nvarbinary data types, storage is 2 * n characters. Does NULL take any storage space? Does it increase the size of an index?

    Might seem like a silly question, but the amount of bytes we backup is a constant concern with this project. Adding hardware is not an option.

    Steve Miller



    Steve Miller

  • Ok when the data is stored and looked at in hex at the page level there are a few things that occurr.

    First the row contains a header that tells if any variable length columns exist in the row with data. NULLable variable length columns if left out will produce header mark for no variable length columns in data.

    Next all the fixed length data is stored in the fron of the row under the hood and the variable at the back. Even though you define a table as

    col1 int

    col2 nvarchar(5)

    col3 varchar(5)

    col4 char(5)

    col5 nvarchar(8)

    it will be stored as

    col1 int

    col4 char(5)

    col2 nvarchar(5)

    col3 varchar(5)

    col5 nvarchar(8)

    In order in the table, bet some didn't realize that but that makes storage easier.

    Now if there is data in the variable length columns of course the header has the marker set to indentify that.

    Then this is what transpiers.

    Say I had two nvarchar columns in my table.

    And in one insert I insert into the value for only the first.

    Reading the data you see a column count marker at the point of the variable length columns being added on.

    This column counter is 2 extra bytes long not accounted for in the setup of the table.

    Next you see the offset in the row of data for the end of the column.

    Also this is 2 bytes long and not counted for in your table def.

    SO going further, if you populate both variable length columns you get the column counter showing 2 and then the next 2 bytes represent col1s offset and the next 2 bytes after that represent col2s offset.

    In other words you have 6bytes extra plus the data.

    Finally the really neat part. If say varible length col1 is NULL and col2 is filled in then you get the following situation.

    You have column count 2, col1 offset and col2 offset. But col1 offset points to the position right after col2s offset value. In other words the column count lets the DB engine know to expect 4 bytes for 2 columns then the col1 offset right after that returns a zero characters or length or in this case NULL. And the offset for col2 of course combined with col1s offset tells column 2s data.

    So if I created a table like so

    CREATE TABLE [Table1] (

    [col1] [int],

    [col2] [nvarchar] (5),

    [col3] [int],

    [col4] [char] (10),

    [col5] [nvarchar] (3)

    )

    and fill only the fixed lenght items in I use a length of the byte lengths of those combined plus the header bytes which are 4 and an additional 3 bytes I haven't figured out completely yet.

    But if you fill out col2 only in the described table you get

    total length of fixed items

    header

    piece i am not sure about

    column count bytes

    col2 offset bytes

    and actual data length of col2 entered

    if you fill out both col2 and col5 then you get

    total length of fixed items

    header

    piece i am not sure about

    column count bytes

    col2 offset bytes

    col5 offset bytes

    actual data length of col2 entered

    and actual data length of col5 entered

    then if you fill out col5 but not col2 you get

    total length of fixed items

    header

    piece i am not sure about

    column count bytes

    col2 offset bytes

    col5 offset bytes

    col2 length = 0 bytes (no data no bytes)

    and actual data length of col5 entered

    So keeping that in mind if you really want to pinch space out make sure your column order is most used variable length column first to least used. And as a gotcha say you add a NON-NULL variable length column to the end. Your column count will include any NULLable that have not been field. So NON-NULL should be before nullable if you are byte conscience.

    Hopefully that makes sense and sheds some light on under the hood.

  • OK Antares

    "Ok when the data is stored and looked at in hex at the page level there are a few things that occurr."

    Do you have any special tools used to view the MDF at the physical level ?

    Is there any documentation on the physical structure ?

    I'm just curious, I used to know IMS structures (yea I'm an old guy).

    KlK, MCSE


    KlK

  • DBCC PAGE is what I use. Then just a lot of time figuring the data out.

  • If its just backup space that concerns you, you should take a look at both SQLZip and SQL Litespeed, both advertise here on the site. You can really cut the amount of space needed for backup. Steve Jones has a review of Litespeed, I reviewed SQLZip a while back.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Antares, thanks for sharing your research.

    >> In other words you have 6 bytes extra plus the data.

    If I'm reading you right, 2 of the 6 bytes is in the column header, 2 is is for one column, and 2 is for the other column.

    So if I have one nvarchar in the table, I would add 4 bytes to the table, plus whatever is stored in the columns. Right?

    >> If its just backup space that concerns you

    Andy, backup space does concern us, but it is broader than that. We're using MSDE as a object repository for a C++/C# front end. The software will be distributed primarily on stand-alone machines all over the world. We don't have the ability to control the hardware it runs on, and much of the hardware will be old. We try to get every ounce of performance we can.

    Steve



    Steve Miller

  • Actually it is an extra 4 bytes for the row header. Plus 2 bytes for the variable length column count in the row. Plus with a single nvarchar 2 bytes for the offset for the end of the data for the column. Plus 1 byte for an end of record or something that seems to always be 00 that I have not figured out.

    So if you have 1 nvarchar column and an int identity field (int is 4 bytes) in the table it would be

    4+4+2+2+1+(lengthofdatainnvarcharfield*2)

    OR

    RecordHeader+INTLen+ColCount+Col1Offset+(lengthofdatainnvarcharfield*2)

    If NVARCHAR field is null you would have

    4+4+1

    Or

    RecordHeader+INTLen+EndRec

    due to no non-null variable length column and the header would note it as such.

    Hope that makes sense.

    Edited by - antares686 on 01/21/2003 4:04:51 PM

  • I think it makes sense.

    Bottom line for me is this: those records that have a null value in the nvarchar column will not eat any storage. That's what I wanted, and I'm happy.

    Thanks,

    Steve



    Steve Miller

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply