Deleting Large Volumes of records

  • I have a logging table that has 130+million rows in it and it is taking up over 11gb in disk space between the data and indexes.

    Ideally I want to reduce the database size and improve my index performance by deleting obsolete data over a year old, which would leave me with about 20m records.

    If I delete the redundant data, my understanding is that my overall database size won't reduce unless I shrink the database, which I want to avoid. Is that correct?

    What happens to the re-allocated space then? Is it made available across the database for re-use or is it only used for growth on my original table?

  • Today's ssc headlines include an article by Hugo Cornelis on partitioning logging tables to enable fast deletes. It's a step by step how-to guide and covers partitioning an existing table. Right up your street.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • If this is something like an audit log that's accumulates GB of data and is only occasionally queried, then you want to consider inserting these records into an external file, rather than containing them in a database table. Database storage is a lot more expensive and problematic to manage than the file system, and this would keep your backups a lot smaller.

    Below is a T-SQL solution that uses openrowset and the OLEDB provider for text files. In this example, the file sysobjects.csv should pre-exist in a folder on the database server or a network folder that is accessible by the SQL service account. The first row should contain the header, in this case "A,B". Also, to use openrowset, you need to have the server option 'Ad Hoc Distributed Queries' enabled.

    insert into openrowset

    (

    'Microsoft.Jet.OLEDB.4.0',

    'Text;Database=C:\TEMP\;HDR=Yes;',

    'select A, B from sysobjects.csv'

    )

    select object_id, schema_id

    from sys.objects

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply