RE: Tables, Primary Keys and Clustered Indexes

SSC Guru

Points: 1003863

June 25, 2016 at 9:31 am

henrik staun poulsen (6/24/2016)
We have several tables above the 10 billion rows mark. Luckily we have the Enterprise Edition, so we have been able to partition the tables.
That helps maintainability, it does not automatically improve the speed of your code.

First of all... THANK YOU for pointing out that partitioning isn't a performance panacea. Way too many people have been mislead to thinking that it is and don't understand that it can actually slow things down a bit because each partition essentially has it's own B-Tree to traverse. Hat's off to you, good Sir!

On the subject of Partitioned Tables, I've learned to loath them (seriously... what the hell was Microsoft thinking when they wrote those?). As you've correctly pointed out, they require the "Expensive Edition" 😛 and that can get quite expensive, indeed. The reason that I prefer the Enterprise Edition is only because of the "online" capabilities for certain index maintenance such a rebuilds (although I've intentionally NOT done any index maintenance on my big databases since 18 Jan 2016 and have suffered no performance problems, but that's a whole 'nuther discussion).

The reason why I loath Partitioned Tables is because of some major usage faults that a lot of people haven't had to consider or just put up with. For example, if you have some very large temporally-based (by month) audit tables (they necessarily take up 75% of a 600GB database we have), it would be nice to restore, say, only the last 3 months of those tables when doing a "refresh" of Development databases from Prod. That can't actually be done in a straight forward manner using restores even if, like our tables, each month is in a separate file in a separate file group. Oh sure, it'll let you restore the required file groups but you can't actually "RECOVER" the database until ALL of the file groups have been restored. That also means you can't do backups, etc, etc, and, for me, backups in Dev are just as important and backups in Prod. And, no, you can't change the Partition Schema or Partition Function to "ignore" the non-loaded filegroups until you've loaded the very file groups you want to ignore and not load.

Then there's also the subject of data use. The newer partitions would perform better (especially for inserts) if they were indexed one way and while the older partitions will never be inserted into ever again and would perform better if they were indexed a different way. The only way to do that with Partitioned Tables is to have non-aligned indexes and that defeats nifty functionality such as SWITCH and other things.

Then there's the idea of doing a real restore for DR. Yeah... it's handy that Partitioned Tables will allow the database to be used while you're loading older, non-critical filegroups to quickly "get back in business" BUT... there's that nasty little problem of not being able to do backups until ALL the filegroups have been restored. For our larger systems, that could actually take an entire day, which is way too long for us to go without Point-in-Time backups.

It would take too long to explain here but I'm prepping our systems to convert our Partitioned Tables to Partitioned Views. There are methods unique to Partitioned Views that will suffer none of the previously mentioned disadvantages of Partitioned Tables while bringing advantages not possible in Partitioned Tables to bear.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)