Updating null value increases size

  • I have a table with 1 lac rows with id and bit columns. The size of the table is 13.523MB.

    If I add the 10 int column to this table without any data it will not increase size of the table.

    But when i update the int col with "NULL" value it increases the size to 189.227 MB

    UPDATE test SET int2 = NULL

    Why update query increasing the size ?

    Some of the test cases listed below with size:

    table - id col (auto increment) - 10 lac rows , bit col - add data 13.523MB
    add 5 varchar(8000) cols - no data 13.523MB
    add 5 varchar(max) cols - no data 13.523MB
    add 2 date cols - no data 13.523MB
    add 2 xml cols - no data 13.523MB
    add 10 int cols - no data 13.523MB
    add some data - 10 rows - all cols 13.531 MB
    add 2 decimal(15,2) cols - no data 13.531 MB
    add 1 cahr(10) col - no data 13.531 MB
    UPDATE test SET int2 = NULL 189.227 MBdelete from [test] 0.016 MB
    INSERT INTO [test11].[dbo].[sfitest]([bitcol]) VALUES (1) 9.766 MB
    delete from [test] 0.008 MB
    INSERT INTO [test11].[dbo].[test] ([bitcol],[varcol1],[varcol2],[varcol3],[varcol4],[varcol5],[varcol6],[varcol7],[varcol8],[varcol9],[varcol10],[datecol],[datecol2],[xmcol1],[xmlcol2],[int1],[int2],[int3],[int4],[int5],[int6],[int7],[int8],[int9],[int10],[d1],[d2],[int11],[char1],[varcol11]) VALUES (1 ,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL ) 9.766 MB
  • I created a script that replicates this behavior but I do not have a good "reason" why this behavior exists, just a theory.  Sample script:

    DROP TABLE IF EXISTS test1
    GO
    CREATE TABLE test1 (id INT IDENTITY(0,1), UserValue BIT)
    GO
    EXEC sp_spaceused @objname = N'test1'
    GO
    INSERT INTO test1 (uservalue) VALUES (1)
    GO 1000
    EXEC sp_spaceused @objname = N'test1'
    GO
    ALTER TABLE test1 ADD newcolumn CHAR(10)
    GO
    EXEC sp_spaceused @objname = N'test1'
    GO
    UPDATE test1
    SET UserValue = NULL
    WHERE 1=1
    GO
    EXEC sp_spaceused @objname = N'test1'
    GO

    Script was tested on SQL Server 2016.

    I tried changing the UPDATE from NULL to DEFAULT and got the same result which leads me to believe it is unrelated to what value is going into the field.

    Doing the test above without adding the column results in no change in the data size.  I repeated the test above with a variable length column size (VARCHAR) and data size didn't change.  This leads me to believe it is unrelated to changing a value, that is more of a side effect.

    Based on the above tests what I believe is happening is NOT that updating int2 (in your case) to NULL causing problems BUT the CHAR(10) field existing but not using any space on the table prior to updating the rows.

    Something that is interesting to note - if you have a HEAP (which I do), using the above example, I have 48 KB of data used at the end of the query and 16 KB unused with 72 KB reserved.  If I rebuild the heap, the data drops down to 24 KB.  If I skip the UPDATE step, but rebuild my heap after adding the columns, I get the same result as doing the UPDATE and REBUILD.  This leads me to believe that the behavior is more of a metadata behavior issue.

    What I expect is happening under the hood (I am not a SQL Server internals expert, so the following is all speculation) is SQL sees the columns being added with NULL being allowed to be in the column so there is no data changes.  Since no data is changed (only metadata), it sees no need to update things on disk at this time.  Likely it is doing this for performance reasons.  Once the data is modified, it needs to persist everything in there to disk as the change on the row MAY result in any of the columns having their data changed.  All of the variable length columns would use 1 additional byte of space (for the NULL marker) and all fixed length columns would use up their allocated disk space requirements (such as nullable CHAR(10) using 11 (ish) bytes per row).

    Basically, I expect that adding a nullable column doesn't impact a row until the row is changed even if the change in the row results in no data changes (such as rebuilding an index/heap).

  • I am not an internals expert, but I believe that if a table has any nullable columns it always stores a 'null bitmap' for each row to indicate the null values. So if you have a table with no nullable columns, and then you add nullable columns, the table will grow due to the creation of the null bitmap.

    Also what is "1 lac"?

  • William, that was my thought as well.  But as you can see with my test scenario, adding nullable columns doesn't seem to affect table size until a fixed-width column is added (such as CHAR(10)).  And even then, the size isn't impacted until data is added or changed in the table (heap in my test case) OR the heap is rebuilt.

    Also, 1 lac is 100,000.  I had to google that to figure it out.  I was not familiar with the term.

  • You can read about this behavior in the book

    "Pro SQL Server Internals" by Dmitri Korotkevitch, Chapter 1

    Furthermore if the majority of the rows will contain NULL values, the sparse column feature could be advantageous (https://www.sqlshack.com/optimize-null-values-storage-consumption-using-sql-server-sparse-columns/)

  • I don't think that the server is creating new column to an existing table as sparse columns.     Creating column as sparse columns has a big effect on the storage and there are specific scenarios that it will be beneficial.  In most scenarios it will not be beneficial, so I don't think that Microsoft will create sparse columns as default behavior when we are adding new columns to an existing tables.

    From what I know when you add a new column that is nullable or has a default, the column will not be added to each row.  Instead it will be added to the table's metadata as  a runtime constant.  The data will be moved to the row when the row is updated (even if it is updated with the same data that exists in the runtime constant or the column is not referenced by the update statement).   It will also be moved to the table if you rebuild the table.   You can read more about it in this link - https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql?view=sql-server-ver15 (under the section "Adding not null columns as an online operation").

    Adi

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply