
Very large table - performance issues
Posted Thursday, May 23, 2013 10:22 AM
Eric M Russell (5/22/2013)
Abu Dina (5/21/2013)
We have a tall table that contains 2.6 billion rows

Table structure:
[screenshot not preserved in this copy]
The application which uses this table has been running slow for the last couple of days and it seems to have happened following the addition of about 400 million rows last weekend.

I think it's because of index fragmentation, although I'm not sure how to check whether this is the case without affecting the application.

So my first question is, how do I check to see if the indexes are fragmented and whether the stats need updating on such a large table?


I notice that your clustered index is on IX_dType. Why was that column chosen to cluster the table? Especially for tables with a large number of rows, you typically want to cluster on a column with unique, sequential values. There could very well be fragmentation.
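One low-impact way to check, as a sketch: `sys.dm_db_index_physical_stats` in `LIMITED` mode reads only the parent-level pages of each B-tree, so it should barely touch the running application. The table name `dbo.BigTable` is assumed for illustration.

```sql
-- Fragmentation per index, LIMITED mode to minimize impact on the live workload
SELECT i.name AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.BigTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id;

-- When were the statistics last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.BigTable');
```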


I don't know, I didn't create the table but now I'm looking after it!


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1456083
Posted Thursday, May 23, 2013 7:34 PM
Abu Dina (5/21/2013)
We have a tall table that contains 2.6 billion rows

Table structure:
[screenshot not preserved in this copy]
Gosh. A couple of people have already mentioned things that are seriously wrong here, but let me summarize what I see, even if some of it repeats.

1. 2.6 billion rows using an INT for a PK. Most people don't have the foresight to start such a column at -2,147,483,648, and that's about the only way this table could hold 2.6 billion rows. If you're going to keep adding 400 million rows on a regular basis, now would be a good time to convert the PK column (DID) to BIGINT (heh... while the table is still "small").
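As a sketch only (table and constraint names are assumed, and widening a column is a size-of-data operation on 2.6 billion rows, so it needs a maintenance window or a batched migration into a new table):

```sql
-- The PK constraint depends on the column, so it has to come off first.
ALTER TABLE dbo.BigTable DROP CONSTRAINT PK_BigTable;

-- Size-of-data operation: every row is rewritten.
ALTER TABLE dbo.BigTable ALTER COLUMN DID BIGINT NOT NULL;

-- Re-add the PK; NONCLUSTERED here, leaving the clustered slot free.
ALTER TABLE dbo.BigTable ADD CONSTRAINT PK_BigTable
    PRIMARY KEY NONCLUSTERED (DID);
```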

2. The clustered index is on the DTYPE column. I suspect there are relatively few unique values in that column. If you added 400 million rows, you likely caused huge page splits throughout the table, which means you have to read far more pages than necessary because many of them may be only half full after the splits. To wit, I strongly agree that the clustered index is on the wrong column. I don't know whether your DID column is an IDENTITY column (or similar in operation), or whether the Created column appears in many queries, but if so, I'd be tempted to make a UNIQUE clustered index on the Created and DID columns, in that order. Every index will benefit from the uniqueness of the clustered index.

The rule of thumb for a clustered index is "Narrow, ever increasing, and unique". If you build it correctly, you should almost never have to defrag the clustered index.

It would also be highly beneficial if the Created column was based on GETDATE() just to keep the "ever increasing" thing going.
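A sketch of that re-cluster, with the existing clustered index name taken from the post and the table name assumed. Note that dropping the old clustered index rebuilds the table as a heap and the new CREATE rebuilds it again, so this is two full passes over the data; plan accordingly.

```sql
-- Remove the old clustered index on DTYPE (table becomes a heap).
DROP INDEX IX_dType ON dbo.BigTable;

-- Re-cluster on (Created, DID): narrow, ever increasing, and unique.
CREATE UNIQUE CLUSTERED INDEX CIX_BigTable_Created_DID
    ON dbo.BigTable (Created, DID)
    WITH (SORT_IN_TEMPDB = ON);  -- spill sort runs to tempdb, easing log pressure
```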

3. I don't know if the table is partitioned or not. For the sake of easier index maintenance, you should strongly consider it if it's not. It'll take some planning, and maybe some code changes if the DID column is an IDENTITY column, but it will absolutely be worth it insofar as index maintenance goes. With a bit more planning, you can also make it so you only need to back up a very small portion of the table, and you'll suddenly be able to do that on a nightly basis with point-in-time recovery. It'll also allow you (if you use a partitioned view across 2 databases) to restore the more recent parts of the table separately in case of a DR event, getting the system back online much more quickly than trying to restore a 2.6 billion row table in one fell swoop.

Also, keep in mind that partitioning isn't done for performance. The only things partitioning has to do with performance are that 1) it makes index maintenance a whole lot faster, so it can be done every night if necessary, and 2) properly maintained indexes help performance. Depending on which method of partitioning you choose, it may also reduce backup times, DR restore times, and tape space.
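The moving parts, as a sketch: a partition function defines the boundaries, a scheme maps partitions to filegroups, and rebuilding the clustered index on the scheme partitions the table. All names and boundary dates below are illustrative.

```sql
-- Monthly boundaries on the Created column; RANGE RIGHT puts each
-- boundary value into the partition to its right.
CREATE PARTITION FUNCTION pf_Created (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-02-01', '2013-03-01');

-- Map everything to PRIMARY here; map to separate filegroups instead
-- if you want piecemeal backup/restore per partition.
CREATE PARTITION SCHEME ps_Created
    AS PARTITION pf_Created ALL TO ([PRIMARY]);

-- Rebuilding the clustered index on the scheme partitions the table;
-- the partitioning column (Created) must be part of a unique key.
-- CREATE UNIQUE CLUSTERED INDEX CIX_BigTable_Created_DID
--     ON dbo.BigTable (Created, DID) ON ps_Created (Created);
```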

4. I can't tell from your graphic, but it would also be good to find out whether there are any triggers on the table. Those won't hurt SELECTs, but they could really put the whammy on INSERTs, UPDATEs, and DELETEs.
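A quick way to find out, assuming the table name, is to query `sys.triggers`:

```sql
-- Triggers on the table, whether enabled, and which DML they fire on
SELECT t.name,
       t.is_disabled,
       OBJECTPROPERTY(t.object_id, 'ExecIsInsertTrigger') AS fires_on_insert,
       OBJECTPROPERTY(t.object_id, 'ExecIsUpdateTrigger') AS fires_on_update,
       OBJECTPROPERTY(t.object_id, 'ExecIsDeleteTrigger') AS fires_on_delete
FROM sys.triggers AS t
WHERE t.parent_id = OBJECT_ID('dbo.BigTable');
```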

5. Consider archiving some of the data in that table. Do you REALLY need all 2.6 billion rows to be accessible on an instantaneous basis? Even if you do, partitioned views will let you move a great deal of the data out of the main database and into an archive database. Again, you have to plan for this carefully, especially if you want to do INSERTs via the partitioned view.
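The shape of such a view, as a sketch with assumed database and table names and the column names from the thread. Each base table needs a CHECK constraint on the partitioning column so the optimizer can eliminate the table a query doesn't need (and so the view is updatable).

```sql
-- Partitioned view spanning a live database and an archive database.
CREATE VIEW dbo.BigTableAll
AS
SELECT DID, DTYPE, Created
FROM LiveDB.dbo.BigTable          -- CHECK (Created >= '2013-01-01')
UNION ALL
SELECT DID, DTYPE, Created
FROM ArchiveDB.dbo.BigTableArchive; -- CHECK (Created <  '2013-01-01')
```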

Of course, this is all just an opinion on my part and isn't complete (they're suggestions to look into, not complete solutions). I just wanted to give you some things to look at that I've had pretty good success with in the past.


--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1456275
Posted Tuesday, May 28, 2013 10:20 AM
Jeff, thanks so much for your ideas and suggestions. I've spent a few days on this problem and the table has been restructured and a new indexing strategy has been implemented.

I'm glad to report that performance has significantly improved.

Unfortunately I'm a little busy at the moment, so I can't go into too much detail about what I've done, but I will keep this thread alive and post back with more detail. Watch this space!

Once again thank you and thanks to all who have contributed to this thread. I couldn't have done this without you guys!


Post #1457433