July 26, 2011 at 7:39 am
We have a database with 3 main tables representing 98% of the database. Those 3 tables have 1,311,289,428 rows and will continue to grow each month. Right now the tables are partitioned on company number into 6 partitions. We have hundreds of customers that we run the reports on. The primary key is maccid(identity), Company#, Call year, call month, call day. And that is clustered.
We are in the process of moving to emc virtual machine that will allow for the growth we need. We have to keep 5 years worth of data for reporting purposes. We are looking at changing the partitions but are having issues with what the key should be.
I would like to see Call Year, Call Month, Company number and create a partition for each month. It would help with backups, restores, index maintenance. Some months could be marked read only since they would not be getting new data in. However another DBA wants it to be Call Year, Call Month, Company number and create a partition but really divide partitions based on the company number. He feels query performance would be affected unless we keep a years worth of data all together in one partition. All reports run based on company number and some sort of date range. If we spread the partition by month and a report needs to cover 6 months of partitions will performance take a huge hit versus hitting just the one partition that holds 12 month of data for that company?
Any help setting this up or reference to setting up the correct partitioning key would be appreciated.
Thanks Grace
July 26, 2011 at 9:50 am
You should get better performance if you partition by month rather than by year, because the query will only cover the partitions (and thus data) it needs to, so 6 months rather than 12.
If you use different file groups, and better still disks, you'll have some advantage of improved i/o as well, though I'm not sure how well SQL Server handles the parallel access to all the disks.
July 26, 2011 at 9:58 am
If it was by company though then 1 whole company would be in 1 partition and contain a whole calendar year of data in that partition. But if by month then you would have to hit more partitions depending on range.
Do you know what the impact is the more partitions you have to get data from?
July 26, 2011 at 10:09 am
Partitioning is presented more as an admin tool than a performance enhancer, but it definitely improves range queries, because you scan only the range you need. Covering one partition as opposed to many in a query ought not to make any difference, because it would be the same data, except for the year, it's twice as much.
It also depends on your indexes, and how they are aligned with the data, because that will also determine how the optimiser goes about it in the first place. I have found this to be one of those, "it depends - try it and see" issues!
I've found this white paper very helpful, even tho' I'm on 2008, as it gives a good explanation of how the whole thing works: http://msdn.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx#sql2k5parti_topic11
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply