RE: Very large table - performance issues

SSC Guru

Points: 1003863

May 23, 2013 at 7:34 pm

Abu Dina (5/21/2013)
We have a tall table that contains 2.6 billion rows
Table structure:

Gosh. A couple of people have mentioned a couple of things seriously wrong with this already but let me summarize some of the things I see wrong even if some of them are repeats.

1. 2.6 Billion rows using an INT for a PK. Most people don't have the forsight to start such a column at [font="Arial Black"]-[/font]2,147,483,648 and that's about the only way this table could have 2.6 Billion rows in it. If you're going to keep adding 400,000 rows on a regular basis, now would be a good time to convert the PK column (DID) to a BIGINT (heh... while the table is still "small" 😛 )

2. The Clustered Index is on the DTYPE column. I suspect there are relatively few unique values that go into that column. If you added 400,000 rows, you will likely have caused some huge page splits throughout the table which means that you have to read a lot more pages than you probably need to because many of the pages might only be half full due to the page splits. To wit, I strongly agree that the clustered index is on the wrong column. I don't know if your DID column is an IDENTITY column (or similar in operation) or not or whether the Created column appears in many queries but, if they are, I'd be tempted to make a UNIQUE Clustered Index using the Created and DID columns and in that order. Every index will benefit from the uniqueness of the clustered index.

The rule of thumb for a clustered index is "Narrow, ever increasing, and unique". If you build it correctly, you should almost never have to defrag the clustered index.

It would also be highly beneficial if the Created column was based on GETDATE() just to keep the "ever increasing" thing going.

3. I don't know if the table is partitioned or not. For the sake of easier index maintenance, you should strongly consider it if it's not. It'll take some planning and maybe some code changes if the DID column is an IDENTITY column, but it will absolutely be worth it insofar as index maintenance goes. With a bit more planning, you can make also make it so you only need to backup a very small portion of the table and you'll be able to suddenly do that on a nightly basis with Point-in-Time recovery. It'll also allow you (if you use a partitioned view across 2 databases) to restore the more recent parts of the table separately incase of a DR event to get the system back online much more quickly than trying to restore a 2.6 Billion row table in one fell swoop.

Also, keep in mind that partitioning isn't done for performance. The only thing that partitioning has to do with performance is 1) it'll make index management a whole lot faster so it could be done every night if necessary and 2) properly managed indexes will help performance. Depending on which method of partioning you chose, it may also reduce backup times, DR restore times, and tape space.

4. I can't tell from your graphic, but it would also be a good thing to find out if there are any triggers on the table. Those won't hurt SELECTs but they could really put the whammy on INSERTs, UPDATEs, and DELETES.

5. Consider archiving some of that data in the table. Do you REALLY need all 2.6 Billion rows to be accessable on an instantaneous basis? Even if you do, using partioned views will allow you to move a great deal of the data out of the main database and into an archive database. Again, you have to plan for this carefully especially if you want to do INSERTs via the partitioned view.

Of course, this is all just an opinion on my part and isn't complete (they're suggestions to look into, not complete solutions). I just wanted to give you some things to look at that I've had pretty good success with in the past.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)