Table growing excessively when updating column

  • I have got an issue that is driving me nuts.

    There is a table in an Azure database containing some 8,800,000 rows. This table is created and loaded in a batch process that loads data from an external source file. Once the loading is done, the table has a size of roughly 3 GB.
    Now, the next step in the batch process is dropping a column from the table, then adding a column to the table, and finally doing an update of that column. Sometimes, but not all the time, after this update the table has grown in size to 300 GB.
    Now, the column added and updated is not a varchar column: it is a decimal(13,0) column, so I cannot comprehend why the table should grow to 100 times its original size. Has anyone had the same issue, or does anyone have a theory why this might happen?
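
    A minimal sketch of the sequence (the real table and column names differ; these are placeholders):

    -- Placeholder illustration of the batch steps described above
    ALTER TABLE dbo.LoadedData DROP COLUMN OldValue;             -- step 1: drop a column
    ALTER TABLE dbo.LoadedData ADD NewValue decimal(13,0) NULL;  -- step 2: add a column (metadata-only)
    UPDATE dbo.LoadedData SET NewValue = SourceValue;            -- step 3: populate it (placeholder expression); this is when the size explodes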

  • That's.... a lot.

    From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table?  Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Regardless of what is causing the issue, have you considered not updating the table, but instead doing the insert into a new table where you add the column on the fly with the correct value, and also drop the column that is not needed?

    Do it in a single transaction, or even in batches if required; it will probably perform better and the table won't grow as much.
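
    Something like this, as a rough sketch (all names are placeholders):

    -- Rough sketch of the insert-into-a-new-table approach; all names are placeholders
    SELECT  src.KeepThis,
            src.KeepThat,                                        -- simply leave out the column to be dropped
            CAST(src.SourceValue AS decimal(13,0)) AS NewValue   -- add the new column with the correct value
    INTO    dbo.LoadedData_New
    FROM    dbo.LoadedData AS src;

    DROP TABLE dbo.LoadedData;
    EXEC sp_rename 'dbo.LoadedData_New', 'LoadedData';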

  • Is there maybe a persisted computed column based on that column?
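
    For anyone unfamiliar, that would look something like this (a sketch; the names are made up):

    -- Hypothetical example of a persisted computed column based on the new column
    ALTER TABLE dbo.LoadedData
        ADD NewValueCheck AS (NewValue * 2) PERSISTED;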

  • Grant Fritchey - Monday, October 15, 2018 12:27 PM

    That's.... a lot.

    From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table?  Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?

    The updates are being done in a batch.  It's VERY possible that you can get "skewed" page splits where many parts of the table drop to nearly trivial page densities (like 10% or less).
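
    One way to check for that (a sketch; substitute your own table name):

    -- Sketch: check page density for the table and all its indexes
    SELECT  index_id, index_level,
            avg_page_space_used_in_percent,  -- low values here = mostly empty pages
            page_count
    FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.LoadedData'), NULL, NULL, 'DETAILED');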

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And stop adding and deleting columns.  Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated.  If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.
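
    In code, that advice looks something like this (a sketch; all names are made up):

    -- Sketch: create the table with the "new" column already in place and pre-padded
    CREATE TABLE dbo.LoadedData (
        Id       int          NOT NULL,
        SomeText varchar(100) NOT NULL
                 CONSTRAINT DF_LoadedData_SomeText DEFAULT (SPACE(40))  -- ~max width of 80% of the rows
    );

    -- And, for MAX LOB columns, force them out of row
    EXEC sp_tableoption 'dbo.LoadedData', 'large value types out of row', 1;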

    --Jeff Moden

  • gerben.scherpenzeel - Monday, October 15, 2018 8:25 AM

    I have got an issue that is driving me nuts.

    There is a table in an Azure database containing some 8,800,000 rows. This table is created and loaded in a batch process that loads data from an external source file. Once the loading is done, the table has a size of roughly 3 GB.
    Now, the next step in the batch process is dropping a column from the table, then adding a column to the table, and finally doing an update of that column. Sometimes, but not all the time, after this update the table has grown in size to 300 GB.
    Now, the column added and updated is not a varchar column: it is a decimal(13,0) column, so I cannot comprehend why the table should grow to 100 times its original size. Has anyone had the same issue, or does anyone have a theory why this might happen?

    Adding a column like this makes the "new" column just a placeholder: the rows don't actually grow until it is populated.

    --Jeff Moden

  • Grant Fritchey - Monday, October 15, 2018 12:27 PM

    That's.... a lot.

    From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table?  Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?

    Hi Grant,

    Yes, that is a lot 🙂. There are indexes involved, but I separate the indexes and the table when measuring size, so the size really is only the table.
    To measure the table size, I use:

    -- reserved_page_count is a count of 8 KB pages: * 8.0 = KB, / 1024 = MB, / 1024 = GB
    select s.name as schema_name, o.name as object_name,
           sum(ps.reserved_page_count) * 8.0 / 1024 / 1024 as SizeInGB
    from sys.dm_db_partition_stats ps
    join sys.objects o on ps.object_id = o.object_id
    join sys.schemas s on o.schema_id = s.schema_id
    group by s.name, o.name
    order by SizeInGB desc

    Yes, the table is a heap, but I did not yet look at index fragmentation; that might be a good one to check the next time the problem surfaces (did I mention that it does not happen all the time?).
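
    When it does surface, something along these lines might show what happened to the heap (a sketch; the table name is a placeholder). On a heap, updated rows that no longer fit on their page are moved and leave forwarded records behind, and a large count of those usually goes hand in hand with very low page density:

    -- Sketch: heap health check; the table name is a placeholder
    SELECT  index_id,                         -- 0 = the heap itself
            forwarded_record_count,           -- rows moved because they no longer fit on their page
            avg_page_space_used_in_percent,   -- page density; very low values = a bloated heap
            page_count
    FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.LoadedData'), 0, NULL, 'DETAILED');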

  • frederico_fonseca - Monday, October 15, 2018 12:35 PM

    Regardless of what is causing the issue, have you considered not updating the table, but instead doing the insert into a new table where you add the column on the fly with the correct value, and also drop the column that is not needed?

    Do it in a single transaction, or even in batches if required; it will probably perform better and the table won't grow as much.

    Hi Frederico,

    Yes, I have already developed another way of handling the data and suggested this solution to the customer. However, he wants to know the root cause, so he can be certain that the solution I suggested will indeed prevent the sudden growth from happening again....

    And to be honest, I am curious myself what causes this behaviour...

  • ZZartin - Monday, October 15, 2018 12:42 PM

    Is there maybe a persisted computed column based on that column?

    Interesting suggestion, but no....

  • Jeff Moden - Monday, October 15, 2018 2:48 PM

    And stop adding and deleting columns.  Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated.  If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.

    Hi Jeff,

    I agree, deleting and adding the column is nuts, and I suggested as much to the customer. However, it is not a variable-width column: it is a decimal(13,0) column.

    And what is puzzling me is that it does not happen all the time....

  • gerben.scherpenzeel - Thursday, October 18, 2018 2:34 AM

    Jeff Moden - Monday, October 15, 2018 2:48 PM

    And stop adding and deleting columns.  Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated.  If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.

    Hi Jeff,

    I agree, deleting and adding the column is nuts, and I suggested as much to the customer. However, it is not a variable-width column: it is a decimal(13,0) column.

    And what is puzzling me is that it does not happen all the time....

    Still, if it's a column newly added to the table, the space for the column isn't actually allocated at the page level until you start to populate it with data, and that's what causes the massive page splits.
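
    A small-scale repro along these lines should let you watch it happen (a sketch; all names are made up):

    -- Sketch: reproduce the effect on a small scale; all names are made up
    CREATE TABLE dbo.GrowthDemo (Id int NOT NULL, Payload char(200) NOT NULL);

    INSERT INTO dbo.GrowthDemo (Id, Payload)
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
    FROM   sys.all_columns ac1 CROSS JOIN sys.all_columns ac2;

    -- Adding the nullable column is a metadata-only change; no page is touched yet
    ALTER TABLE dbo.GrowthDemo ADD NewValue decimal(13,0) NULL;

    -- Populating it grows every row in place; rows on full pages have to move
    UPDATE dbo.GrowthDemo SET NewValue = Id;

    -- Run this before and after the UPDATE and compare
    SELECT reserved_page_count * 8.0 / 1024 AS SizeInMB
    FROM   sys.dm_db_partition_stats
    WHERE  object_id = OBJECT_ID('dbo.GrowthDemo');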

    --Jeff Moden
