Is there such a thing as database file bloating?

  • Hello,

    I'm still fairly new to SQL Server.  We have a production database at our company that has grown from about 40gb to 100gb over the course of about five years.  What I want to better understand is if the data file can become "bloated" with nonexistent data, or if the file size is an accurate reflection of how large the database needs to be.  From what I understand, a DBA would almost never shrink a DB file, but are there other bits of maintenance that one might do to decrease the size of this file or remove unnecessary bloat?

    Thank you for any assistance!

  • That is a bit of a loaded question, as it can depend on what types of tables and indexes you have in this database.  At a very high level you can see total vs "used" space in the files with a query like this run within the db in question:

    SELECT DB_NAME() AS database_name, f.file_id, f.name AS file_name, f.type_desc,
    f.physical_name, f.size/128 AS size_MB,
    FILEPROPERTY(f.name, 'SpaceUsed')/128 AS used_MB,
    CASE WHEN f.is_percent_growth = 1 THEN f.growth ELSE f.growth / 128 END AS growth,
    f.is_percent_growth
    FROM sys.database_files f;

    The used value here is what is allocated to objects within the database.  To see more details you can use this:

    SELECT t.object_id, s.name AS schema_name, t.name AS table_name, i.name AS index_name, i.type_desc, ps.size_MB, ps.used_page_count
    FROM sys.indexes i
    INNER JOIN sys.tables t ON i.object_id = t.object_id
    INNER JOIN sys.schemas s on t.schema_id = s.schema_id
    LEFT OUTER JOIN (SELECT object_id, index_id, SUM(used_page_count) / 128 AS size_MB, SUM(used_page_count) AS used_page_count FROM sys.dm_db_partition_stats GROUP BY object_id, index_id) ps ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    WHERE i.object_id > 100
    ORDER BY ps.size_mb DESC;

    From there, you can look at the function sys.dm_db_index_physical_stats to see how much of those used pages are actually used.

    https://www.red-gate.com/simple-talk/blogs/quick-look-at-dm_db_index_physical_stats/

     

  • This was removed by the editor as SPAM

  • chris is correct

    this is complex (mostly on the fact that we don't know your environment)

    normally we see bloat in the log file (which is easy to fix)

    but bloat in data files can be a lack of index optimisation (empty pages etc) - or also deletes from Heaps where the pages are not released.

    MVDBA

  • If you do a large scale data purge, you may also see a very empty database file.

    Shrinking a file is not the end of the world. If you have, let's say given your pages the wrong free space, so that you have a lot of emptiness, or you've done a purge, a one-time shrink is fine. Even three or four shrinks over a two or three year period is not a big deal. What happens is, people start shrinking weekly or daily. That is where huge problems come up. Don't be scared of shrinks, just use them in a highly judicious manner.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • tarr94 wrote:

    Hello,

    I'm still fairly new to SQL Server.  We have a production database at our company that has grown from about 40gb to 100gb over the course of about five years.  What I want to better understand is if the data file can become "bloated" with nonexistent data, or if the file size is an accurate reflection of how large the database needs to be.  From what I understand, a DBA would almost never shrink a DB file, but are there other bits of maintenance that one might do to decrease the size of this file or remove unnecessary bloat?

    Thank you for any assistance!

    There are a whole bunch of things that can consume file space in SQL Server.  Some of it will be filled.  Some of it will be empty.  A lot of it is totally unnecessary.

    Lets first start with indexes.  If your not doing index maintenance, you can quite easily be wasting 50% of both your disk space and memory just due to the bad form of page splits.  If you're doing index maintenance incorrectly (which, oddly enough, is advertised as "Best Practices"), the same or worse can be true.  And, no matter what happens, if you do a REBUILD on larger indexes (anything over 128 extents, which is only 8MB), you could have quite a bit of unwanted/unnecessary free space in your data files because an index rebuild makes a full copy of the index before dropping the original.  If you're using REORGANIZE on larger indexes, your log files can explode to more than 155% of the size of the largest index (it was a helluva surprise to me when a 146GB clustered index with only 12% fragmentation grew the log file to 227GB from only 20GB).

    Another huge waste of space are work tables that were supposed to be temporary but that no one has deleted.  If you check sys.dm_db_index_usage_stats for tables that haven't been written to or read from in a month, you can develop a good list of suspects.  Never just drop these tables.  Just rename them by adding a suffix of "_ToBeDeleted" to them and wait a couple of months before you delete them.  Renaming tables will change the "modified" date for the table and you can key off of that..  We use a "scratch" schema in our databases where we work and have an automatic "sweeper" that renames any such table after a week of existence and automatically drops them a week after that.

    As if that's not bad enough, Microsoft decided that they'd help us with performance in version 2016.  They made it so that "insert bulk" (not to be confused with "BULK INSERT"), which a lot of front ends use to add data to tables, would automatically reserve (IIRC) 4 extents up use and without checking to see if there were any extents that were already allocated and had free space.  It also doesn't matter how many rows are being inserted.  If it's just one row at a time, the 4 extents (32KB) are being consumed and most of the space will be "unused".   The WhatsUp database in our system went kinda of nuts... it contained only 8GB of data but had 47GB of "unused" space.  Don't confuse that with the much more benign "Unallocated Space", which can be used by any table or index.  "Unused space" is assigned to indexes and cannot be used by anything else except that one index.

    If you see databases with a large amount of "Unused" space, that's probably the issue and you're going to first need to enable Trace Flag 692 and then rebuild your indexes to recover the space as "Unallocated".

    You'll also find various audit and history tables in any given database.  As single such table can grow to be as larger than the rest of the entire database.  You need to ferret those out and have people identify when you can archive the data out of those tables.

    tarr94 wrote:

    I'm still fairly new to SQL Server.

    Welcome to the data equivalent of "Ripley's Believe It or Not". 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • tarr94 wrote:

    Welcome to the data equivalent of "Ripley's Believe It or Not". 😀

    I think of it more like "stranger things" , but I still get  visions of aliens bursting out of my chest every time I see a query plan that sql management studio can't handle and it locks up management studio. for an hour.

    tarr94 - you have jeff, grant and chris on board,  three of the best in our trade...(i'm just a minion) .. give us a bit more to work with

    MVDBA

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply