huge free spaces in data file can make query slower?

  • Dear All,

    I need some info for the following 2 questions. Any idea please?
    1. Keeping huge free spaces in the data file can make my query performance slower?
    2. If I stop DB shrink function, how to de-allocate the allocated spaces which is occurred after delete/truncate functions? SQL server DB can remove itself?

    Currently we are planning to add more spaces to our DB data files. According to all the suggestions from SQL server DB experts, I decided to increase my DB file size manually instead of auto growth every time. Now I become confusing with the above 2 questions. Any idea or suggestions please? I couldn't find those 2 answer in any where. My DB is used for OLAP. So query performance is crucial for us.

    Thanks in advance,
    Cool

  • 1) No. It doesn't work like that. The only issue possible is if, let's say, you created a giant table, then added data to other tables, and then deleted the giant table. This would result in some of your data being scattered a bit across the disk. Assuming this is a one-off situation, this is where a shrink of the database helps things. However, if it's a standard operating procedure, creating a giant table, delete it, repeat over & over again, shrinking is a very bad issue. Assuming this case (and only this one or very similar), I'd suggest adding a file (or filegroup) just for the giant table. That way you don't have to worry about the scatter effects (which really will only hurt scans anyway).

    2) I don't generally recommend stopping a shrink. Let it finish. If there is cleanup necessary when you do stop it, the OS & SQL Server will handle it.

    Otherwise, as you add data to a file, it gets added at the beginning of the file. It doesn't really matter that there is empty space. It's empty. It doesn't hurt you.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Thursday, November 1, 2018 5:15 AM

    1) No. It doesn't work like that. The only issue possible is if, let's say, you created a giant table, then added data to other tables, and then deleted the giant table. This would result in some of your data being scattered a bit across the disk. Assuming this is a one-off situation, this is where a shrink of the database helps things. However, if it's a standard operating procedure, creating a giant table, delete it, repeat over & over again, shrinking is a very bad issue. Assuming this case (and only this one or very similar), I'd suggest adding a file (or filegroup) just for the giant table. That way you don't have to worry about the scatter effects (which really will only hurt scans anyway).

    2) I don't generally recommend stopping a shrink. Let it finish. If there is cleanup necessary when you do stop it, the OS & SQL Server will handle it.

    Otherwise, as you add data to a file, it gets added at the beginning of the file. It doesn't really matter that there is empty space. It's empty. It doesn't hurt you.

    Dear Grant Fritchey,
    Thank you so much for your reply and advices.
    I would like to give brief description about my DB environment.
    Our DB is a Data warehouse. We have monthly and adhoc loading only. There is no daily loading. Sometime we need to reload the data. That time we deleted the whole month data and re-insert for that client. There are 50 over clients. Monthly record count for each client is around 10k max. Total monthly increment around 600 MB. We have weekly database maintenance as follows. 
    - ShrinkDB with truncate only
    - index rebuilding, Index reorg, update statistic.
    - Our DB growthsize is still default value (1MB).

    According to google, I am thinking to remove ShrinkDB from weekly schedule since it can lead a lot of disk fragmentation due to DB autogrowth and ShrinkDB together.
    Instead I am thinking to add 10GB to my DB data files manually every year.
    My question here is:
    1. Since the ShrinkDB remove the free spaces from Data files, my added free spaces will be removed also. Am I correct?
    2) If we do not schedule for ShrinkDB, is there any way to de-allocate those free spaces after delete the rows?
    3) Currently we are using Shrink DB with Truncate only. So if the spaces are not at the end of the file, but in the middle of the data rows, will it be remove by Shrink DB with Trucate Only function? If not, SQL server can remove it automatically after some time? If I am wrong, there is. Right?
    4. Is there any function which can replace ShrikDB function?

    Thank you & Best Regards,
    Cool

  • 1) Yes.
    2) Not really. Rebuilding indexes, probably something you'd want to do after data loads, will defragment the data storage internally, moving everything to one area of the file. That is the process you'd want.
    3) Nothing removes or changes the file size except shrink. As I said, rebuild will move where the data is stored (depending on what gets rebuilt, when, etc.).
    4) Nope.

    Shrink is fine for one time events. It's really bad for what you're trying to do, shrink and shrink and shrink. It absolutely will lead to all sorts of performance problems, storage and management issues, all the stuff I hope you've read about. There is no need for most systems to shrink the database over and over. If there's not enough disk space, get more disks. Shrinking repeatedly is just an indication of mismanagement, not good resource use. Set the database to an appropriate size and leave it there. Also, the 1mb growth is going to lead to all sorts of issues as well. There's nothing wrong with automated growth, but set it to a reasonable size for your actual database. Otherwise, you have lots and lots of file extents which are also going to cause similar problems to those from repeatedly shrinking your database. In short, your current system is the worst possible set of choices.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • It's worth bearing in mind that the shrinking currently is probably only harming your database. You delete a month of data and then reload it - that reloading will be perfectly happy to use the free space within the file, unless there is enormous variation each time then any space that is "free" is unlikely to stay that way for very long anyway and you're just forcing SQL Server to spend more time re-allocating the space you just reclaimed (even before all the other problems caused by shrinkdb)

  • Cool2018 - Thursday, November 1, 2018 9:18 AM

    - ShrinkDB with truncate only
    - index rebuilding, Index reorg, update statistic.
    - Our DB growthsize is still default value (1MB).

    Since this is OLAP/Data Warehouse - remove the step for index rebuild/reorg/update statistics.  As part of your load processes you should rebuild those indexes anyways...one way of performing this is to:

    1) Disable non-clustered indexes on the destination table
    2) Load the data
    3) Rebuild all indexes (which updates statistics with full scan)

    There is no reason to schedule index maintenance on tables loaded once a month - they should be rebuilt after loading the data and left alone until the next load since no changes will be made to the table until the next load.  Rebuilding/Reorganizing indexes is just wasting processing time and IO time for no benefit...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • There IS a form of freespace that WILL cause your code to run slower and also consume more memory.  If you do a bunch of deletes but it doesn't actually delete all the data from pages, then you can end up with a very low page density which wastes sometimes tons of memory.  And, it's possible that you can have this waste with virtually zero logical fragmentation.

    If your index maintenance routines aren't looking at avg_page_space_used_in_percent, you may be missing out.  Heh... even if you do, you may still be missing out because it's an average.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Grant Fritchey - Thursday, November 1, 2018 10:15 AM

    1) Yes.
    2) Not really. Rebuilding indexes, probably something you'd want to do after data loads, will defragment the data storage internally, moving everything to one area of the file. That is the process you'd want.
    3) Nothing removes or changes the file size except shrink. As I said, rebuild will move where the data is stored (depending on what gets rebuilt, when, etc.).
    4) Nope.

    Shrink is fine for one time events. It's really bad for what you're trying to do, shrink and shrink and shrink. It absolutely will lead to all sorts of performance problems, storage and management issues, all the stuff I hope you've read about. There is no need for most systems to shrink the database over and over. If there's not enough disk space, get more disks. Shrinking repeatedly is just an indication of mismanagement, not good resource use. Set the database to an appropriate size and leave it there. Also, the 1mb growth is going to lead to all sorts of issues as well. There's nothing wrong with automated growth, but set it to a reasonable size for your actual database. Otherwise, you have lots and lots of file extents which are also going to cause similar problems to those from repeatedly shrinking your database. In short, your current system is the worst possible set of choices.

    Dear Grant,

    Highly appreciated and thank you so much for your answer and advices.
    Actually we have a lot of Hard Disk spaces and server is dedicated server for SQL server DB. We use shrink because we thought it will make smaller the DB and improve the performance. Also we would like to remove free spaces for those deleted rows so to defrag.
    Now I decided to remove the weekly shrinkDB task. I will set the autogrowth size to 1GB and add 10 GB free spaces to data file every 1 year.
    I have few more questions here.
    1. if we add 10GB to our data file, file extent will be 10GB page or still 8kb pages but spaces until 10 GB? 
    Because you said currently we keep 1mb autogrowth, so a lot of file extent. Is it ok to explain me a bit more?  

    2. Currently our DB is partitioned by year. Partition can make our query performance slower since it need to go another places to take the data? Our query is retrieving the whole table based on roles. Role is in another table. So currently our table has 9mil + rows with 300+ col. When we have 9 roles, there are 9*9=81mil + rows in the result set. By using partition, there are 20 partitions since we have 20 years of data in the table. When we run query, it need to extract from 20 partition places. So we are thinking our query slow performance is due to DB partition.

    3. After I remove weekly shrink function, any idea when and how many times should we use DBShrink function?  If we do shrink function once a year, do you think it is good idar? Can there be a lot of garbage space inside the file if we do once a year? But I think it can resolve by rebuild index. Am I right?

    Thank you so much & Best Regards,
    Cool

  • andycadley - Thursday, November 1, 2018 1:33 PM

    It's worth bearing in mind that the shrinking currently is probably only harming your database. You delete a month of data and then reload it - that reloading will be perfectly happy to use the free space within the file, unless there is enormous variation each time then any space that is "free" is unlikely to stay that way for very long anyway and you're just forcing SQL Server to spend more time re-allocating the space you just reclaimed (even before all the other problems caused by shrinkdb)

    Dear Andycadley,

    Thank you so much for your advice.
    1. According to your explanation, I do understand as when we do reloading, insert record will reuse the spaces which made by delete function. Am I right?
    2. May I know what do you mean this "unless there is enormous variation each time then any space that is "free" is unlikely to stay that way for very long anyway"?

    Thank you & Best Regards,
    Cool

  • Jeffrey Williams 3188 - Thursday, November 1, 2018 4:08 PM

    Cool2018 - Thursday, November 1, 2018 9:18 AM

    - ShrinkDB with truncate only
    - index rebuilding, Index reorg, update statistic.
    - Our DB growthsize is still default value (1MB).

    Since this is OLAP/Data Warehouse - remove the step for index rebuild/reorg/update statistics.  As part of your load processes you should rebuild those indexes anyways...one way of performing this is to:

    1) Disable non-clustered indexes on the destination table
    2) Load the data
    3) Rebuild all indexes (which updates statistics with full scan)

    There is no reason to schedule index maintenance on tables loaded once a month - they should be rebuilt after loading the data and left alone until the next load since no changes will be made to the table until the next load.  Rebuilding/Reorganizing indexes is just wasting processing time and IO time for no benefit...

    Dear Jeffrey,

    Noted with thanks. Really a good idea.
    Highly appreciated and thank you for your advice.
    Currently our fact tables are using column stored index.
    We do disable and enable in our update job only. We haven't do it for insert jobs.
    I will take your advice.

    Thank you so much & Best Regards,
    Cool

  • Jeff Moden - Thursday, November 1, 2018 5:12 PM

    There IS a form of freespace that WILL cause your code to run slower and also consume more memory.  If you do a bunch of deletes but it doesn't actually delete all the data from pages, then you can end up with a very low page density which wastes sometimes tons of memory.  And, it's possible that you can have this waste with virtually zero logical fragmentation.

    If your index maintenance routines aren't looking at avg_page_space_used_in_percent, you may be missing out.  Heh... even if you do, you may still be missing out because it's an average.

    Dear Jeff,

    Thank you so much for your advice.
    I am sorry I have below 3 questions. Would you please share your answer?
    1. To make actual delete is to use commit function. Am I right? But currently our sp did not have commit statement. We delete and insert. So it is not actual delete? If we don't commit, when the SQL server will really delete? Should we add commit in our script but our DB is currently auto commit.
    2. How to looking at and resolve avg_page_space_used_in_percent during our index maintenance routine job?
    3. You are right. Currently our SQL server DB memory usage very high. It reached almost 100%. Does it really due to the spaces? But our table has 9mil + rows with 300+ columns and we run by role based. So when we have 9 roles, our data extraction has 9*9= 81 mil +. Currently our DB has partition by year. Partition can make our query slower?

    Thank you & Best Regards,
    Cool

  • Cool2018 - Thursday, November 1, 2018 8:17 PM

    andycadley - Thursday, November 1, 2018 1:33 PM

    It's worth bearing in mind that the shrinking currently is probably only harming your database. You delete a month of data and then reload it - that reloading will be perfectly happy to use the free space within the file, unless there is enormous variation each time then any space that is "free" is unlikely to stay that way for very long anyway and you're just forcing SQL Server to spend more time re-allocating the space you just reclaimed (even before all the other problems caused by shrinkdb)

    Dear Andycadley,

    Thank you so much for your advice.
    1. According to your explanation, I do understand as when we do reloading, insert record will reuse the spaces which made by delete function. Am I right?
    2. May I know what do you mean this "unless there is enormous variation each time then any space that is "free" is unlikely to stay that way for very long anyway"?

    Thank you & Best Regards,
    Cool

    1) Yes, if there is empty space in a file because you deleted something, SQL Server will re-use that space before it starts growing the file further.
    2) What I mean is, if each customer has roughly the same amount of data on a monthly basis, then the space released by deleting the current months data is going to be approximately the same as the space required to store the next months. So, for example, if a customer has 10MB one month, then over subsequent months 9MB, 11MB and back to 10MB - you are going to have an 11MB data storage requirement, it might be a little empty some months but it will fill back up again the next. Shrinking the database can only really make a noticeable difference if the amount of data you have to store in total is considerably less over time (which is almost always not true).

    Ordinary usage patterns tend towards either a reasonably consistent overall data size or one that grows steadily over time and therefore the only time you can reclaim a notable amount of storage is if you have deleted a significant proportion of the data you hold and aren't going to be replacing it - which is why shrinking, if done at all, should be a one-off process in response to a very specific need.

  • Cool2018 - Thursday, November 1, 2018 8:39 PM

    I am sorry I have below 3 questions. Would you please share your answer?

    1. To make actual delete is to use commit function. Am I right? But currently our sp did not have commit statement. We delete and insert. So it is not actual delete? If we don't commit, when the SQL server will really delete? Should we add commit in our script but our DB is currently auto commit. 

    SQL Server only requires a COMMIT statement if a transaction is in effect, either as the result of a BEGIN TRANSACTION or because IMPLICIT_TRANSACTIONS has been set to ON (don't do this, it's a terrible idea) directly or via setting ANSI_DEFAULTS to ON.

    Usually a DELETE (or any other SQL statement) takes immediate effect and the results are committed when the statement completes, assuming it does not error.

  • andycadley - Thursday, November 1, 2018 10:03 PM

    Cool2018 - Thursday, November 1, 2018 8:17 PM

    andycadley - Thursday, November 1, 2018 1:33 PM

    It's worth bearing in mind that the shrinking currently is probably only harming your database. You delete a month of data and then reload it - that reloading will be perfectly happy to use the free space within the file, unless there is enormous variation each time then any space that is "free" is unlikely to stay that way for very long anyway and you're just forcing SQL Server to spend more time re-allocating the space you just reclaimed (even before all the other problems caused by shrinkdb)

    Dear Andycadley,

    Thank you so much for your advice.
    1. According to your explanation, I do understand as when we do reloading, insert record will reuse the spaces which made by delete function. Am I right?
    2. May I know what do you mean this "unless there is enormous variation each time then any space that is "free" is unlikely to stay that way for very long anyway"?

    Thank you & Best Regards,
    Cool

    1) Yes, if there is empty space in a file because you deleted something, SQL Server will re-use that space before it starts growing the file further.
    2) What I mean is, if each customer has roughly the same amount of data on a monthly basis, then the space released by deleting the current months data is going to be approximately the same as the space required to store the next months. So, for example, if a customer has 10MB one month, then over subsequent months 9MB, 11MB and back to 10MB - you are going to have an 11MB data storage requirement, it might be a little empty some months but it will fill back up again the next. Shrinking the database can only really make a noticeable difference if the amount of data you have to store in total is considerably less over time (which is almost always not true).

    Ordinary usage patterns tend towards either a reasonably consistent overall data size or one that grows steadily over time and therefore the only time you can reclaim a notable amount of storage is if you have deleted a significant proportion of the data you hold and aren't going to be replacing it - which is why shrinking, if done at all, should be a one-off process in response to a very specific need.

    Dear Andycadley,

    Highly appreciated and thank you so much for your explanation and advices.
    1) I just got some idea. If the space was refilled by SQL server, the page order will not be in order. For example: Jan, Apr, Dec, Feb, etc. Instead of Jan, Feb, Mar, Apr, etc. Can it also be lead it performance issue? But we use index and it is not an issue. Am I right to say that?  
    If we use column stored index, can it be handle the same way as other index? Any idea?
    2) After your explanation, I recalled that the data page sizes are all equal. So if we add less data in one and more in another that will lead to spaces in the page in less data page and affect the performance. Am I right? If yes, any idea how can we handle it?
    3) Currently we will have one new requirement which need to truncate the whole table and then reinsert the rows to get latest update. For this case, inserted rows will be filled to the space which produced by the truncate. Right? So I do not need DB Shrink although the space size is big.
    I am sorry if I ask you stupid questions. I have very limited knowledge on this.

    Thank you so much & Best Regards,

  • andycadley - Thursday, November 1, 2018 10:12 PM

    Cool2018 - Thursday, November 1, 2018 8:39 PM

    I am sorry I have below 3 questions. Would you please share your answer?

    1. To make actual delete is to use commit function. Am I right? But currently our sp did not have commit statement. We delete and insert. So it is not actual delete? If we don't commit, when the SQL server will really delete? Should we add commit in our script but our DB is currently auto commit. 

    SQL Server only requires a COMMIT statement if a transaction is in effect, either as the result of a BEGIN TRANSACTION or because IMPLICIT_TRANSACTIONS has been set to ON (don't do this, it's a terrible idea) directly or via setting ANSI_DEFAULTS to ON.

    Usually a DELETE (or any other SQL statement) takes immediate effect and the results are committed when the statement completes, assuming it does not error.

    Dear Andycadley,

    Thank you so much for your explanation.
    I got one question here. When we do the commit, committed rows will be removed from transaction log. Am I correct?

    Thank you so much & Best Regards,
    Cool

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply