Growing tables: same number of records

  • I have SQL Server 2000 running on a Windows 2000 server (fully patched).  I have a process that adds 120K to 500K records to a table throughout the day.  I have a job that runs once a day and deletes records older than 45 days.  My record counts seem to have leveled off (daily reports), but my data files keep growing.  I have auto shrink on.

    In Oracle, I would export the data, drop the table (only 1 table in the database), coalesce the tablespace, and import the data; that would reclaim all unused space in the table and indexes.  Is there an equivalent process?

    Thanks

    Joseph

  • Do you have a clustered index on the table?

    If yes, then you can rebuild that index, and that should solve the problem.

    If not, create a clustered index.
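
    For example, something along these lines (the table, index, and column names are hypothetical placeholders; in DBCC DBREINDEX an empty index name means all indexes on the table, and a fill factor of 0 keeps the existing setting):

    -- Rebuild every index on the table:
    DBCC DBREINDEX ('dbo.MyTable', '', 0)

    -- If the table has no clustered index yet, create one:
    CREATE CLUSTERED INDEX IX_MyTable_InsertDate ON dbo.MyTable (InsertDate)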

  • Does SQL Server reuse the space from deleted rows in the table? In the indexes?

    I will check free space before and after the rebuild.
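
    One way to check it (the table name is a placeholder; @updateusage forces the space counts to be refreshed first):

    EXEC sp_spaceused 'dbo.MyTable', @updateusage = 'TRUE'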

  • Rebuild still running after 40 minutes.

    Is there a fill factor theory out there?  A high fill factor for a table with a low percentage of changes, and a lower fill factor for one with a greater amount of changes?

  • SQL Server reuses space if data is removed.

    A couple of things to check. First, are you sure the data file is growing and not the log file? Some people check the total space rather than the data file alone. Adding or deleting data causes the log file to grow. You should be backing up the log in addition to the data files.
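
    A quick way to see which files are growing (standard SQL 2000 commands; sysfiles reports sizes in 8 KB pages):

    -- Log usage for every database:
    DBCC SQLPERF(LOGSPACE)

    -- Data and log file sizes in the current database:
    SELECT name, filename, size * 8 / 1024 AS size_mb FROM sysfiles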

    The rebuild can take time depending on data size, disk type/#, etc.

    The idea behind fill factor is that you want the most data per page (high fill), but balanced against the amount of updates you'll make per page to prevent page splits (low fill). If the data is fairly static within a time period, then a high fill factor works well. If you have lots of changes occurring, you might want a lower fill factor.

    Note, changes can be new data as well; new rows are inserted in clustered index order, so inserts that land in the middle of that order can also cause page splits.
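
    A sketch of putting that into practice (hypothetical names; a fill factor of 80 leaves roughly 20% free space per page for incoming rows):

    -- Check fragmentation and average page density first:
    DBCC SHOWCONTIG ('dbo.MyTable')

    -- Rebuild all indexes with a lower fill factor to absorb out-of-order inserts:
    DBCC DBREINDEX ('dbo.MyTable', '', 80)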

  • Thanks.

    The logs are backed up 4 times a day.  They stay relatively small.  On a 16 GB database, they total only a couple hundred MB to 500 MB (not part of the 16 GB).

    I am running the DBREINDEX and it is still running.  The table in question has 2 indexes and contains approximately 14,000,000 rows.  The data changes are inserts throughout the day, up to 500K/day, and 1 daily delete.

    I will post the results when it finishes.  Thanks for the information.

    Joseph

  • DBREINDEX is useful for regular maintenance, but as you're finding out, it can be a very long process for a large table that is heavily fragmented.

    The equivalent to the Oracle process you mentioned is to use the BCP utility to export the table contents to a flat file, truncate the table, and then use BCP to reload it.  It will run fastest if executed directly on the server, and if the BCP file is on a separate physical drive from the SQL data and log files.

    Generate CREATE scripts for nonclustered indexes and foreign keys

    Run a full backup

    BCP "SELECT * FROM <table> ORDER BY <clustered index fields>" QUERYOUT <file> -n -T

    ALTER DATABASE <db> SET RECOVERY SIMPLE

    Drop foreign keys and nonclustered indexes

    Truncate the table

    BCP <table> IN <file> -n -T -h "TABLOCK, ORDER(<clustered index fields>), ROWS_PER_BATCH=5000"

    Recreate foreign keys and nonclustered indexes

    ALTER DATABASE <db> SET RECOVERY FULL

    Run another full backup

    There are some other BCP switches to consider.  If there is an identity field in the table, add "-E" to the import command to reuse the input values; otherwise new identity values are generated.  If any field with a default constraint has NULL values, and you want them to stay that way, add "-k" to the import command.  BCP option letters are case-sensitive.
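
    As a concrete sketch of the two BCP commands (the database, table, column, server, and file names are all hypothetical placeholders; run from a command prompt on the server):

    bcp "SELECT * FROM MyDB.dbo.MyTable ORDER BY InsertDate" queryout D:\bcp\MyTable.dat -n -T -S MYSERVER
    bcp MyDB.dbo.MyTable in D:\bcp\MyTable.dat -n -T -E -S MYSERVER -h "TABLOCK, ORDER(InsertDate), ROWS_PER_BATCH=5000"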

    This process defragments a table, but it doesn't do anything about file fragmentation caused by file growth and autoshrink.  If you have the time, you could stop SQL Server and defragment the drives the data files are on.

  • Thanks, I will try that.  My DBREINDEX caused problems with the SQL Server instance and I had to cancel it.  I tried running it overnight (no end users), and it failed.  I will try this method after hours.

    Thanks,

    Joseph

  • Not sure if this may apply to you, but there is a bug that prevents space from being released after a delete if the table doesn't have a clustered index.  This bug exists in SQL Server 7.0, 2000, and 2005 (I'm not sure why this never gets fixed):

    http://support.microsoft.com/kb/913399/en-us

  • It doesn't get fixed because it isn't a bug; it's the way a heap table must be managed.  The page in Books Online for DELETE spells it out quite clearly: if you want the pages of a heap table to be released by a DELETE, you must use the TABLOCK hint.  Empty space in heap tables is reused by INSERTs, so if you don't lock the table, another thread may have uncommitted inserted rows in the free pages.

    It is also worth mentioning that DBCC DBREINDEX is a waste of time for a table with no clustered index because the rows have no defined order.  These are two of the reasons why every non-trivial table should have a clustered index.
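
    For example (the heap table name, date column, and 45-day cutoff are hypothetical):

    -- The TABLOCK hint lets the emptied pages be deallocated and released:
    DELETE FROM dbo.MyHeap WITH (TABLOCK)
    WHERE InsertDate < DATEADD(day, -45, GETDATE())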

  • It's not always possible to use TRUNCATE since you may not want to remove all rows, and it's not always feasible to use a TABLOCK hint due to blocking concerns.  If not a bug, it's definitely a product design flaw.

    In fact, SQL 2000 Books Online states this:

    "Both DELETE and TRUNCATE TABLE make the space occupied by the deleted rows available for the storage of new data."  Oops...I guess not...at least this is updated in SQL 2K5 BOL.

    I do agree that every table should have a clustered index for these and other reasons.

  • The space occupied by deleted rows is available for storage of new data in the same table when new rows are inserted, so technically the documentation is correct, although misleading.  This "bug" has been corrected in the SQL 2005 documentation, which has a "Deleting Rows from a Heap" section in the Remarks for the main DELETE (T-SQL) topic that spells out the issue.  There were some ancient database systems that only flagged rows as deleted without freeing them; no space was reclaimed or reused until a maintenance table rebuild was performed.  So saying the space from deleted rows is available for reuse is saying something, even if the action doesn't live up to our expectations.

    "It's not always feasible to use a TABLOCK hint due to blocking concerns" - exactly my point.  If the appllication developer can't use TABLOCK because they know there may be concurrent activity, how is the system supposed to know when it is safe to release the pages?  If the system can't lock the table, it can't release any pages because another thread may have already scanned the IAM and decided to insert rows into those pages.  Any multithreaded system has to either be able to lock shared resources during a change, or avoid making changes that will corrupt other threads.  The only way to make it safe to release the pages of a heap would be to introduce another level of locking, possibly in the PFS scan, that would cause an unacceptable performance hit in all other operations.
