|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Saturday, May 10, 2008 9:35 AM
Points: 21,
Visits: 99
|
|
I'm wondering if there is really a benefit to splitting a db into multiple files on smaller db's? If I'm running 16-way 64 bit Win2003/Sql2005 Enterprise on a serious workhorse, do I really need to split a 200Meg db into 16 files?
At what size db do I split the db into a one-db-file-per-processor-core setup?
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Wednesday, July 01, 2009 9:20 AM
Points: 1,323,
Visits: 940
|
|
Doesn't make sense to split any database into multiple files for per-cpu performance, except tempdb which can suffer from multiple-CPUs all trying to modify the same allocation bitmaps under high load with small temp tables being created/deleted (see http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx for details).
Now, saying that, there's an exception - and that's when a non-tempdb database has the same allocation bitmap contention issues, but that only happens on the very high-end with thousands of single-row inserts per second happening on each CPU. It's pretty rare. I've never seen it but Kimberly has.
What you should be considering is splitting a database into filegroups to be able to isolate critical tables within separate filegroups, or splitting a partitioned table across multiple filegroups. For SS2005, these aren't for performance but instead for manageability (e.g. being able to restore a single filegroup offline while the rest of the database remains online, or setting several partitions of a table to be read-only). For SS2008, there are improvements to query plan generation over multiple partitions that could allow for better parallelism, but its still not optimal compared to non-partitioned data.
So - complex topic and these are simple guidelines. Hope they help.
Paul Randal Managing Director, www.SQLskills.com Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine Author of DBCC CHECKDB/repair for SQL Server 2005
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: 2 days ago @ 6:49 PM
Points: 1,747,
Visits: 1,273
|
|
It's not about the size of the database. It's usually about the number of physical disks you have (or SAN LUNs) available.
Splitting a DB into multiple files doesn't make sense if they all get stuck on the same LUN or physical disk. The idea behind multiple files is to separate out tables, etc. to get better I/O performance. You can't get better I/O if all your files are using the same disk controllers.
The big caveat to the above statement is when you have a bunch of read only tables that never get written to. You can use a separate Read Only filegroup and do Read/Write file backups more often than Read Only filegroup backups and save backup space & time. But I haven't used this feature because we don't have a database that has Read Only tables.
Regardless, it isn't about size. It's about performance, partitioning, and backups. If you don't have a good reason to set up multiple files for your DB, don't do it. You'll just create maintenance headaches for yourself if there's no tradeoff.
Brandie Tarvin, MCITP Database Administrator, MCDBA, MCSA
Webpage: http://www.BrandieTarvin.net LiveJournal Blog: http://brandietarvin.livejournal.com/ Now a member of LinkedIn!
Contributing Author: Transformers: Legends, Pirates of the Blue Kingdoms, Blue Kingdoms: Shades & Specters
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Saturday, May 10, 2008 9:35 AM
Points: 21,
Visits: 99
|
|
Ok, now I'm really stumped. Microsoft has setting up to one data file per cpu in their top 10 best practices. (# 8) http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx
I have also read in one of Dell's papers that it is recommended to set one virtual disk per processor core as well, and put each data file in its own separate drive letter. I'm assuming that the OS will equally and automatically span each virtual disk and it's data file across the physical drives?
I certainly do not have a db taking in thousands of inserts per second, but at peak times it is definately doing a hundred+. I'm running a 4x quad core with the data & indexes housed on a Raid 1+0 having 6 spindels (12 total). I appreciate the info and will be reading up on putting heavy tables into separate filegroups, but is there a reason not to follow MS best practices on this?
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: 2 days ago @ 6:49 PM
Points: 1,747,
Visits: 1,273
|
|
Keep in mind that one data file per CPU does you little good unless you've got separate disk controllers accessing the data. You might notice a difference due to processor threads being able to access the files at different times, but at a certain point, you have to combine good I/O practices with your CPU set up before you see any solid returns on performance.
If you read carefully the first several bullet points on the link you suppied, you'll see that Microsoft is also talking I/O (that is multiple disks) as the first consideration, then the separate files for TEMPDB (specifically) per CPU as a secondary consideration. You have to combine the two methods for it to work efficiently and to be worth the extra maintenance headaches.
And a virtual disk is not the same as a physical disk or LUN. You can have 1 hard drive divided into multiple virtual disks and still only have one disk controller handling the I/O on all files, which dings your performance (My opinion only!) more than it helps it.
Does this clarify matters? Or did I just confuse you more?
Brandie Tarvin, MCITP Database Administrator, MCDBA, MCSA
Webpage: http://www.BrandieTarvin.net LiveJournal Blog: http://brandietarvin.livejournal.com/ Now a member of LinkedIn!
Contributing Author: Transformers: Legends, Pirates of the Blue Kingdoms, Blue Kingdoms: Shades & Specters
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Saturday, May 10, 2008 9:35 AM
Points: 21,
Visits: 99
|
|
I think what confuses me the most is why it is recommended to put one file per core on TEMPDB knowing that 9 times out of 10 (I'm guessing) it will be housed on a Raid 1, but saying it makes little sense to do it on data files that have a greater chance of being Raid 5 or 1+0. If the TEMPDB generally has one spindle and the Data more than three, why would multiple files benefit the one but not the other? I have my TEMPDB on it's own controller (away from data files) with a 4-disk Raid 1+0.
That being said, my server does have dual external controllers going into the drive shelf. This being the case, should I follow both Microsoft & Dell's recomendation and configure it as one virtual disk and one data file per core? Here is the Dell paper on the virtual disk config: http://www.dell.com/downloads/global/power/ps4q07-20070555-Chen.pdf
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: 2 days ago @ 6:49 PM
Points: 1,747,
Visits: 1,273
|
|
Tom,
It's not the number of disks involved in the RAID per say. It's the way the files are divided.
Think of it this way.
In RAID 1, you have two drives. Each drive has its own full file. Each disk is separate and self-contained. You lose one drive, no big deal, you still have the whole file on the second drive.
In Raid 0, you have two or more drives, but a single file is split (or striped) into the multiple drives (we'll say 2). So half the file is on one drive and the other half is on the other drive. Which means you still only have 1 file with a pointer between the two halves telling SQL Server where the rest of the data is. When SQL Server searches the striped data file, it starts at one point and moves through the file (regardless of what disk the current section is) in a logical and orderly manner. It does not search all sections of the striped file simultaneously because it's following the pointers. (Someone correct me if I'm wrong about this).
Raid 1+0 is a combination of the above two RAID methods.
Raid 5 is three or more disks with the file split (or striped) on all the drives. You still have only one file total and each drive contains a little bit of parity/checksum stuff for redundancy.
The key here is, even if you have multiple files in a RAID 5 or RAID 1+0, they will all be on the same sets of drives. Since each disk only has (usually) one controller, you can't search multiple files at the same time. The controller will only search for your data one file at a time then go onto the next file. It's terribly inefficient.
Now if you have multiple RAID 5 sets and stick a file on each different set, that's a different story. But this particular hardware config is not something I've seen IRL and you'd have to be pretty rich to afford a different RAID 5 / RAID 1+0 setup for each data file.
Does that help?
Brandie Tarvin, MCITP Database Administrator, MCDBA, MCSA
Webpage: http://www.BrandieTarvin.net LiveJournal Blog: http://brandietarvin.livejournal.com/ Now a member of LinkedIn!
Contributing Author: Transformers: Legends, Pirates of the Blue Kingdoms, Blue Kingdoms: Shades & Specters
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Saturday, May 10, 2008 9:35 AM
Points: 21,
Visits: 99
|
|
I understand how the various Raid's work, but what I still do not understand is why would I not want to follow MS best practice (as well as what Dell has recommended)? I have to assume that if the number of disk controllers is paramount to wether or not having a multi-file config makes sense or not, Microsoft would have stated this. They did not state this however. So, knowing that MS has said to create up to 1 data file per prosessor core, and Dell say's to do the very same thing (as well as one virtual disk per core), why wouldn't I want to do this? I'm not saying anyone is wrong, I'm just wondering why anyone would say not to follow MS best practices?
I have a 4x quad core (16 cores) server with dual controllers connected to a 12-drive (300G drives = 1.8TB) Raid 1+0 shelf. Why wouldn't I want to create 16 virtual disks and 16 data files for every filegroup, as both Microsoft & Dell recommend I do? That's really the question I'm trying to get answered. Both companies are telling me to do it, so why wouldn't I follow their instructions? Or am I totally misreading what they tell me?
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: 2 days ago @ 6:49 PM
Points: 1,747,
Visits: 1,273
|
|
You're misreading what they are saying. They are not saying "create a file for each CPU" only. They are saying to do it based on your disk configuration and how many disks you have.
When you read these white papers, you can't just concentrate on one line. You have to see the whole picture. Which, if you go back to that Microsoft link you provided, you'll see the picture MS is painting for you. As I said, they talk about I/O first and then the CPU issue for TempDB. These are not separate issues. They are related issues and both must be taken into consideration for the best practice to work effectively.
Brandie Tarvin, MCITP Database Administrator, MCDBA, MCSA
Webpage: http://www.BrandieTarvin.net LiveJournal Blog: http://brandietarvin.livejournal.com/ Now a member of LinkedIn!
Contributing Author: Transformers: Legends, Pirates of the Blue Kingdoms, Blue Kingdoms: Shades & Specters
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Saturday, May 10, 2008 9:35 AM
Points: 21,
Visits: 99
|
|
I respectfully disagree that they are saying "do it based on your disk configuration and how many disks you have". I'm not saying you are wrong, but I have to believe that if Microsoft was recommending we create one file per CPU core "based on the number of disk's", they would have specifically said that. Unless I'm missing something, I don't see anything that says it's conditional.
Here is another page from Microsoft that says the very same thing:
http://www.microsoft.com/technet/prodtechnol/sql/2005/physdbstor.mspx
Quote: "The number of data files within a single filegroup should equal to the number of CPU cores."
That bullet is not followed up with a asterisk saying it is conditional to number of disks.
Again, I'm not saying you are wrong, but everything I read from Microsoft indicates the opposite. Either that or their team forgot to point this out. Thoughts?
|
|
|
|