Indexing a large table

  • Hello, on my development server I have a table that has over 200 million records.

    CREATE TABLE [dbo].[P2_SYS_HistoryNotes](

    [IDX] [int] NOT NULL PRIMARY KEY CLUSTERED,

    [MatterNum] [varchar](10) NULL,

    [NoteType] [varchar](10) NULL,

    [CreateDate] [datetime] NULL DEFAULT (GETDATE()),

    ) ON [PRIMARY]

    I tried to add an index

    CREATE NONCLUSTERED INDEX IX_P2_SYS_HistoryNotes_NoteType

    ON [dbo].[P2_SYS_HistoryNotes] ([NoteType])

    INCLUDE ([MatterNum],[CreateDate])

    Which ran for over 15 hours before I stopped it. We generally do not have large tables but this one is for auditing. (Client auditors requested). Anyway I am not use to working with this large of table.

    Is it normal to take that many hours to index a large table?

    Thank you,

    djj

  • 200 million rows isn't really that big and that's a long time for an index to run. I suspect you may have been experiencing blocking. The thing to do would be to understand why it's either not running at all or running so long. You could use sys.dm_exec_requests to monitor the creation as it runs to see if it's being blocked or if it's waiting on resources.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thank you for the reply.

    I did not think it was too big, but as I say I do not have experience with large amount of data.

    As to blocking, I doubt there was a lot as the table was one I just created and am the only one using it. However, I will check during the next try.

  • Then you might be running into resource issues, not enough memory, possibly contention in tempdb, the log file was growing over and over? Something along those lines.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Do you really search this table by NoteType without specifying a CreateDate range? Could you specify CreateDate range?

    This table should be clustered on CreateDate, not on identity; this is a typical dopey "default" clustering.

    Unfortunately, it could take quite a while to make that change. If you have the disk space to duplicate this table temporarily, you can reduce the downtime dramatically:

    1) Create a new table with identical columns:

    CREATE TABLE [dbo].[P2_SYS_HistoryNotes__New](

    [IDX] [int] NOT NULL,

    [MatterNum] [varchar](10) NULL,

    [NoteType] [varchar](10) NULL,

    [CreateDate] [datetime] NULL DEFAULT (GETDATE())

    ) ON [PRIMARY]

    CREATE CLUSTERED INDEX P2_SYS_HistoryNotes__CL ON dbo.P2_SYS_HistoryNotes ( CreateDate ) WITH ( FILLFACTOR = 100 ) ON [PRIMARY]

    /* if you need the PK for lookup */

    ALTER TABLE P2_SYS_HistoryNotes

    ADD CONSTRAINT P2_SYS_HistoryNotes__PK

    PRIMARY KEY NONCLUSTERED ( IDX ) WITH ( FILLFACTOR = 100 ) ON [PRIMARY]

    2) Capture the max IDX value for the current table.

    DECLARE @IDX int

    SELECT @IDX = MAX(IDX) FROM P2_SYS_HistoryNotes

    3) Copy rows from the existing table into the new table, but, for now, ending at the high IDX, to (help) not interfere with current activity:

    SET IDENTITY_INSERT 'dbo.P2_SYS_HistoryNotes' ON

    INSERT INTO P2_SYS_HistoryNotes__CL ( [IDX], [MatterNum], [NoteType], [CreateDate] )

    SELECT [IDX], [MatterNum], [NoteType], [CreateDate]

    FROM dbo.P2_SYS_HistoryNotes

    WHERE IDX <= @IDX

    4) When that's done, take exclusive control of the original table within a transaction that:

    A) copies rows > @IDX into the new table

    B) renames the current table to __OLD (for example)

    C) renames the __NEW table to the current table name

    D) releases any exclusive lock(s) on table(s)

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Unfortunately the date is along for the ride in this case. Normally the MatterNum and Date are the important information to the auditors, however in this case we want all data for a set of MatterNum. Of course this is driven off another table which has the client.

    Scott, I will look into doing as you suggest. How long should the copy of data take? Notice despite my requests we have limited resources on the machines. It is the old "we can save money", which is followed by "why is the system so slow?".

    Thanks for the information.

    P.S. a rebuild of the index I originally tried to build finished in under 9 minutes.

  • That is not a big table, regardless of the 200 million rows. It's size isn't bigger than 5GB. You'd better do the copy in batches of let's say 1 million rows per batch, looping it of course.

    Maintenance of it should not be more than 1 hour on presumably not a powerful server with about 4-8 cores and 16-32GB RAM.

    Igor Micev,My blog: www.igormicev.com

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply