Is it considered good practice to have one database file per CPU?

  • CPU (I didn't know there was a length limit for subjects!)

    I came across this blog entry from a (former?) Microsoftie who suggests that it is a good practice to have one database file per CPU, so on my quad boxes, I should have four files per database? And would that apply to transaction logs also?

    I don't think I've seen this discussed previously, but I could have easily overlooked such a discussion.

    http://blogs.technet.com/mat_stephen/archive/2005/02/02/365325.aspx

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • If taken literally, that is some really awful advice; at best, it's some extremely ambiguous advice. Either way, the advice is bad for its intended audience.

    Creating multiple files/file groups is a way of boosting SQL Server performance if you have hardware that is configured to allow SQL Server to take advantage of the additional files/file groups.  The simple act of creating multiple files/file groups does virtually nothing for performance, and it certainly won't make any difference in performance on a multi-processor system UNLESS you somehow have a disk subsystem which is faster than your processor.  I've personally never seen a disk subsystem which could outperform an 8088 microprocessor, much less a multiprocessor server filled with Pentiums.  If you are like most mortals, your server is connected to a single disk subsystem which is far slower than a single processor in your PC.  If you are lucky, your server is connected to multiple disk subsystems which are independent of one another and are connected either to independent controllers in your server or to independent channels on the same controller.  If you happen to have the latter configuration, then creating multiple files/file groups across those independent disk subsystems can result in a significant performance gain if you place the right data on the right disks.

    SQL Server, like any database, spends much of its time waiting for the disk subsystem to respond.  By creating multiple files/file groups and spreading them across multiple independent disks you can eliminate some of that waiting.  For example, if SQL Server is asked to execute a complex query with multiple table joins and multiple criteria, it might be able to read both tables/indexes at the same time using different processors in a multiprocessor box IF the data/indexes are on independent disks.  If the data/indexes are on the same disk, SQL Server might attempt to use multiple processors, but each processor will have to wait for the other while it is reading data because, again, there's no such thing as a disk subsystem which is faster than a processor/RAM combination.
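
    To make that concrete, here's a minimal T-SQL sketch (the drive letters, names and sizes are all hypothetical) of splitting the data, the indexes and the log across separate disk subsystems:

        -- Sketch: data, indexes and log each placed on a different disk
        -- subsystem (E:, F: and G: are assumed to be independent arrays).
        CREATE DATABASE SalesDemo
        ON PRIMARY
            (NAME = SalesDemo_Data, FILENAME = 'E:\SQLData\SalesDemo_Data.mdf', SIZE = 2GB),
        FILEGROUP FG_Indexes
            (NAME = SalesDemo_Idx, FILENAME = 'F:\SQLData\SalesDemo_Idx.ndf', SIZE = 2GB)
        LOG ON
            (NAME = SalesDemo_Log, FILENAME = 'G:\SQLLogs\SalesDemo_Log.ldf', SIZE = 1GB);

    The extra filegroup only helps if those paths really are independent spindles/controllers; on a single array it buys you nothing.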

    I did notice the article was written in 2005 but that's no excuse for the poor advice.

    Unfortunately the blog is inactive and frozen, so there's no easy way to ask him his reasoning directly. I agree: unless your disk system is optimized to take advantage of the distributed design, you might be taking a hit on the storage side. For whatever reason, my mind was thinking that I could get more work out of my CPUs with it; but you're right: you're going to be more disk-bound than CPU-bound. And I'm going to be seriously disk-bound in the near future: whoever configured my second-biggest SQL box gave it just two drive partitions, so I have little choice except to keep my data and logs on the same drive area.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • I think you've missed the latter half of the equation, which is that SQL Server must also have a corresponding I/O path (e.g. RAID array or HBA) to really take advantage of multiple CPUs and I/O paths.  Ideally you will have an equal number of CPUs, I/O paths and files.  For example, if you have 4 CPUs and 4 I/O paths (e.g. 4 RAID channels plus associated disks), it would then make sense to break your database up into 4 files, one on each RAID channel.  In this circumstance SQL Server would spin up 4 threads for read/write, one per CPU/I/O path, and your throughput would go up dramatically.

    On the other hand, a 4-CPU machine with only one I/O path can spend a great deal of time and disk I/O (aka "thrashing") "fighting" for resources, resulting in lower performance than a single- or dual-CPU machine, or a SQL Server configured to use only 1 CPU, would give.  Filegroups or files without a direct I/O path, combined with multiple CPUs, can actually cause "worse" performance.

    Joe

     

    Unfortunately my box has four drives, all in a RAID-5 config, so I'm fighting from the get-go (it was that way when I got here). The case has two more slots, so I'm hoping I can get two more drives and mirror the OS and transaction logs on them so that at least the RAID won't be churning itself to bits. I've got three production databases to move to this box, plus I'm probably going to start replication with this box as the distributor and receiver of a transaction-based system, so I/O load will definitely be going up.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • There's a recommendation from MS to have 1 file per CPU for TempDB.

    I've never heard that suggestion applied to user databases and I don't think it's a good idea.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • The 1 file per CPU does apply to user databases. But normally I don't follow this recommendation, for maintenance considerations. Bear in mind you'll have to create all the filegroups with the same disk space allocation in your user DB.

    Having said that, I do have 1 file per CPU on a few of my DBs. I've found the way SQL uses up the filegroup space has been pretty consistent so far.

    But I would make sure my server has enough disk arrays to support this filegroup structure, not blindly create filegroups per CPU.

    Simon

    Simon Liew
    Microsoft Certified Master: SQL Server 2008

  • The 1 file per CPU does apply to user databases.

    Why? What's the logical reason behind it?

    For TempDB, at least on SQL 2000, it was due to the round-robin algorithm for temp table creation. Even then it was only really necessary if there was significant blocking on the SGAM page (2:1:3).
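
    If you want to check whether you're actually hitting that, here's a rough sketch (SQL 2005 DMVs) that looks for latch waits on the tempdb allocation pages:

        -- Sketch (SQL 2005+): tasks waiting on latches for tempdb allocation
        -- pages: PFS (2:1:1), GAM (2:1:2), SGAM (2:1:3).
        SELECT session_id,
               wait_type,
               wait_duration_ms,
               resource_description   -- e.g. '2:1:3' = tempdb, file 1, page 3 (SGAM)
        FROM sys.dm_os_waiting_tasks
        WHERE wait_type LIKE 'PAGELATCH%'
          AND resource_description LIKE '2:1:%';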

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • Mate,

    How SQL fills up a user database is also round-robin. The reason for 1 filegroup per CPU is so that each CPU can have a thread doing a job on each filegroup at the same time; it's the same logic as creating one filegroup per CPU so that each CPU can work on a particular filegroup. But obviously, you'll need the hardware to improve the I/O and not get stuck when the CPUs are flogging the disks.

    From BOL:

    Placing Tables on Filegroups

    A table can be created on a specific filegroup rather than the default filegroup. If the filegroup comprises multiple files spread across various physical disks, each with its own disk controller, then queries for data from the table will be spread across the disks, thereby improving performance. The same effect can be accomplished by creating a single file on a RAID (redundant array of independent disks) level 0, 1, or 5 device.

    If the computer has multiple processors, Microsoft® SQL Server™ 2000 can perform parallel scans of the data. Multiple parallel scans can be executed for a single table if the filegroup of the table contains multiple files. Whenever a table is accessed sequentially, a separate thread is created to read each file in parallel. For example, a full scan of a table created on a filegroup comprising of four files will use four separate threads to read the data in parallel. Therefore, creating more files per filegroup can help increase performance because a separate thread is used to scan each file in parallel. Similarly, when a query joins tables on different filegroups, each table can be read in parallel, thereby improving query performance.

    Additionally, any text, ntext, or image columns within a table can be created on a filegroup other than the one that contains the base table.

    Eventually, there is a saturation point when there are too many files and therefore too many parallel threads causing bottlenecks in the disk I/O subsystem. These bottlenecks can be identified by using Windows NT® Performance Monitor to monitor the PhysicalDisk object and Disk Queue Length counter. If the Disk Queue Length counter is greater than three, consider reducing the number of files. For more information, see Monitoring Disk Activity.

    It is advantageous to get as much data spread across as many physical drives as possible in order to improve throughput through parallel data access using multiple files. To spread data evenly across all disks, first set up hardware-based disk striping, and then use filegroups to spread data across multiple hardware stripe sets if needed.
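
    As an illustration of that BOL passage, here's a sketch (the database, filegroup and table names are made up) of putting a table on its own filegroup, with its text data elsewhere:

        -- Sketch: add a filegroup with its own file, then create the table on
        -- it; text/ntext/image data can live on a different filegroup again.
        ALTER DATABASE MyDB ADD FILEGROUP FG_Archive;
        ALTER DATABASE MyDB ADD FILE
            (NAME = MyDB_Archive1, FILENAME = 'F:\SQLData\MyDB_Archive1.ndf', SIZE = 1GB)
        TO FILEGROUP FG_Archive;

        CREATE TABLE dbo.OrderHistory
        (
            OrderID   int      NOT NULL,
            OrderDate datetime NOT NULL,
            Notes     text     NULL
        )
        ON FG_Archive            -- base table on the archive filegroup
        TEXTIMAGE_ON [PRIMARY];  -- text column stored on the PRIMARY filegroup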

    Simon Liew
    Microsoft Certified Master: SQL Server 2008

  • Sure on that?

    If you have multiple files in the primary filegroup, then the writes are proportional to the free space in each file; nothing to do with CPU affinity to data files. See this excerpt from BOL:

    Filegroups use a proportional fill strategy across all the files within each filegroup. As data is written to the filegroup, the database engine writes an amount proportional to the free space in the file to each file within the filegroup, instead of writing all the data to the first file until full. It then writes to the next file. For example, if file f1 has 100 MB free and file f2 has 200 MB free, one extent is allocated from file f1, two extents from file f2, and so on. In this way, both files become full at about the same time, and simple striping is achieved.

    As soon as all the files in a filegroup are full, the database engine automatically expands one file at a time in a round-robin manner to allow for more data, provided that the database is set to grow automatically. For example, a filegroup is made up of three files, all set to automatically grow. When space in all the files in the filegroup is exhausted, only the first file is expanded. When the first file is full and no more data can be written to the filegroup, the second file is expanded. When the second file is full and no more data can be written to the filegroup, the third file is expanded. If the third file becomes full and no more data can be written to the filegroup, the first file is expanded again, and so on.
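
    If you want to see the proportional fill for yourself, something like this sketch (SQL 2005 syntax) shows size against space used per file; with proportional fill the files should fill at roughly the same rate:

        -- Sketch (SQL 2005+): size vs. space used for each file in the
        -- current database. 'size' and SpaceUsed are in 8 KB pages.
        SELECT name,
               physical_name,
               size * 8 / 1024 AS size_mb,
               FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024 AS used_mb
        FROM sys.database_files;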

    As for the CPU threads, read the following from the PSS engineers:

    http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

    SQL Server Urban Legends Discussed

    SQL Server Uses One Thread Per Data File...

    The Legend grew from the following SQL Server behavior -   “When SQL Server is creating a database file (.mdf, .ndf, .ldf) it writes zero’s to all bytes in the file and stamps the appropriate allocation structures.  SQL Server can use a thread per file when creating a database.”   This is a true statement but leaves room for interpretation so allow me to clarify with the following examples.

    <snipped for brevity>

    The Truth

    SQL Server uses asynchronous I/O, allowing any worker to issue an I/O request regardless of the number and size of the database files or what scheduler is involved.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • Gail,

    First off, I think you were right to say one filegroup per CPU does not give a performance gain; I'll have to agree with that now. Well, at least my user DBs are aligned properly in an I/O context and the filegroup sizes are spread out.

    At any time, I can clearly see my DBs span across the disks in a uniform manner, therefore maximising the parallel work streams. I initially thought user DBs might get the same performance gain as tempdb, but apparently they don't.

    Still, the number of tempdb files should match the number of CPUs, although the explanation of one CPU per filegroup is not valid. The performance gain is due to a different reason!

    Thanks for the info, mate.

    Cheers,

    Simon

    Tempdb

    The immediate question I get is – "What about all the KB articles on tempdb, a file for scheduler and such?"     This is again a misinterpretation of the legend as tempdb tuning is all about the allocation activity, not a number of threads per file (no such thing).   Workers use tempdb like any other database, any worker can issue I/O to and from tempdb as needed. 

    Tempdb is the database with the highest level of create and drop actions and under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks.    SQL Server 2005 reduces contention with the 'cached temp table' feature and allocation contention skip ahead actions.

    When multiple workers are attempting to create or drop objects in tempdb you can decrease the bottleneck by having

    •         As many files in tempdb as schedulers => means no of filegroup = no of CPU

    •         All files sizes are equal

    •         Uniform allocations enabled (-T1118)

    By configuring in this way each new allocation occurs in round-robin fashion across the files and only on uniform extents.    SQL Server only allows a single worker to be active on each scheduler at any point in time so if every 'running' worker is attempting tempdb allocations they will logically partition to a separate file in tempdb, reducing the allocation contentions.

    Microsoft has seen that tempdb is an exception to the common rules here.   Tempdb can be a high contention point for internal tracking structures.   It is often better to create multiple files for tempdb so the internal latching and other activities achieve better separation (per file) so all the workers don't cause a resource contention on a single tempdb file.    This type of contention would be extremely rare to see on a user database and I have not seen that issue in a user database.
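
    For what it's worth, the setup that article describes looks roughly like this in T-SQL on a 4-CPU box (a sketch; the file names, paths and sizes are hypothetical, and trace flag 1118 goes in the startup parameters rather than in a script):

        -- Sketch: four equally sized tempdb data files, one per scheduler.
        -- -T1118 (uniform allocations) is set as a startup parameter.
        ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev,  SIZE = 1GB);
        ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 1GB);
        ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 1GB);
        ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 1GB);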

    Simon Liew
    Microsoft Certified Master: SQL Server 2008

  • I'm sorry, but I just don't buy the whole idea of creating multiple file groups on a single disk.  Sustained transfer rates on the fastest DASDs, like the latest/greatest SCSI drives, have trouble approaching 250 MB/second while doing sequential reads.  More importantly, if you have a high-volume environment there's not going to be that much sequential activity, so you have to factor in disk head movement, which is going to cause sustained transfer rates to drop to 10 MB/second or 20 MB/second.  There's no disk subsystem which can keep up with a multiple-processor server, so the bottleneck will always be the disk subsystem unless we are talking about a system which does incredibly complex calculations with the data being processed, or a very poorly designed system.

    I can explain how having multiple file groups spread across independent disk subsystems buys you a performance advantage.  A terrific example is the case of a table which has a non-clustered index which the optimizer has chosen to use to access the data.  One thread could, theoretically, read from the index (located on the first disk subsystem) using the first processor in the server, while a second thread could read the data pages pointed to by the nodes in the non-clustered index (located on the second disk subsystem) using the second processor in the server.  If the disk subsystems are truly independent then SQL Server would gain a significant performance boost in that scenario and you would see true parallel processing.
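
    A concrete sketch of that scenario (the names are made up; it assumes FG_Data and FG_Indexes sit on separate, independent disk subsystems):

        -- Sketch: base table (and its clustered index) on one filegroup,
        -- the non-clustered index on another, so the index read and the
        -- bookmark lookups can hit different sets of spindles.
        CREATE TABLE dbo.Orders
        (
            OrderID    int      NOT NULL PRIMARY KEY,  -- clustered, lives on FG_Data
            CustomerID int      NOT NULL,
            OrderDate  datetime NOT NULL
        )
        ON FG_Data;

        CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
            ON dbo.Orders (CustomerID)
            ON FG_Indexes;   -- second, independent disk subsystem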

    How exactly does having multiple file groups improve performance when all the threads are waiting on the same disk subsystem to become available?

    I don't care how many drives you add to a RAID array; there's a limit to the amount of data you can transfer over the bus which connects the array to the server.  A 500-drive RAID array is no faster than a 7-drive RAID array (assuming all drives used can achieve a sustained transfer rate of 60 MB/second).  Attaching 500 drives to a single RAID array is a waste of spindles.  You would get far superior performance out of seventy 7-drive RAID arrays.

    Or am I missing something here?

  • Ed,

    Assuming you've got 2 physical CPUs, have you tried to load test with tempdb on RAID 1+0 on a SAN with its own spindles? The transfer rate is 2 GB each way via fibre channel, and the controller cache is normally 4 GB. It can all go to cache before it hits the disks, unless you've got a really large transaction which is larger than 4 GB per transaction, in which case this will cause a major issue. If a highly used system is badly designed, it will still cause issues for the disks, CPU, memory, etc.

    If this doesn't convince you, you should try running many concurrent queries (aggregates, grouping, etc.) on SQL with the number of tempdb filegroups equal to the number of physical CPUs, on a RAID 1+0 array of its own. There are "catches", and it's not simply a matter of creating more filegroups (e.g. making sure the filegroups are the same size, adding flags to the startup parameters, etc.).

    In summary, the filegroup-per-CPU advice is for tempdb. User DBs would benefit more from the disk layout than from more filegroups.

    Simon

    Simon Liew
    Microsoft Certified Master: SQL Server 2008

  • I don't know if you are agreeing with me or disagreeing with me.  But it appears you are confusing giga-bits with giga-bytes or someone has developed a fiber channel card that is currently 800% faster than anything on the market today.  So I'll take this opportunity to discuss a couple of things. 

    Assuming the fastest fiber channel card can actually achieve a sustained throughput of 4Gbps (bps = bits per second not bytes per second) then at most you would want to have 9 disks in a single array in the SAN assuming each drive could achieve a sustained throughput of 60 MBps (Bps = bytes per second).  The theoretical maximum throughput for the fastest fiber channel today is 512 MBps so in reality you can count on getting a maximum of about 75% of that performance, 384 MBps, in sustained read/write scenarios.  I'm assuming about 25% of the bandwidth will be consumed by protocol overhead.

    Something else to remember in designing hardware is that the PCI and PCI-X buses have limitations as well.  A 64-bit, 133 MHz PCI bus can achieve burst transfer rates of about 1 GBps.  At some point your server's bus will become saturated, so no matter how many arrays you attach to it you won't be able to achieve better performance.

    In my opinion, the ideal server has at least 4 independent disk subsystems attached to it all operating at burst transfer rates of about 240 MBps.  Better yet, the server has 6 disk subsystems attached to it operating at burst rates of about 240 MBps.  I'm of that opinion because the reality is that the enemy is more often disk head movement rather than sustained/sequential transfer rates so the better you can group like activities on a disk subsystem the better chance you have of avoiding having to move the disk head half way across the disc platter.

  • I do agree with your disk theory, and I'm not confused about Gbps and GBps. It sounds to me like your point of view is from a hardware/disk perspective and mine is from tuning SQL. You must be wondering how on earth SQL can give any improvement with a limited disk transfer rate.

    The 4 Gbps transfer rate is to memory. Yes, the transfer rate is still limited between memory and disk, but that's what memory is there for: it takes the heat off the disks when things get done in memory. The SAN has battery-backed cache so it can flush uncommitted work in a power outage. Unless some genius designs a disk that can go faster than memory, the dependency on memory isn't going away. The theory you're talking about is transaction <-> disk and assumes everything is committed to disk. FYI, SQL doesn't flush all cache to disk all the time; try issuing DBCC MEMORYSTATUS and check it out yourself. All your reads/writes are from/to MEMORY pages and not always against disk whenever you need them. You read 100 MB into memory once and SQL doesn't fetch it again unless the pages are dirty or there's memory pressure. If you have a tremendous load, put in more memory, hardware, etc. and tune the DB accordingly. That's why memory is such an important thing for SQL. Disk plays a part in SQL, but a limitation at the disk head doesn't mean you can't make SQL run faster.
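
    For example, a sketch of that (the DMV is SQL 2005+): the memory breakdown, and how much of each database is already sitting in the buffer pool and so never touches the disks on a re-read:

        -- Sketch: overall memory breakdown, then buffer pool pages per
        -- database (SQL 2005+). Reads served from here never hit the disks.
        DBCC MEMORYSTATUS;

        SELECT DB_NAME(database_id) AS database_name,
               COUNT(*) * 8 / 1024  AS buffer_cache_mb
        FROM sys.dm_os_buffer_descriptors
        GROUP BY database_id
        ORDER BY buffer_cache_mb DESC;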

    I wouldn't say the disk head is not the enemy, but what I've seen in the SQL world is that more hardware power doesn't mean you'll get the most out of your system if you don't tune it properly. Tuning means working out how best to match the hardware with SQL; i.e. an OLTP system encountering CPU pressure doesn't mean adding more CPUs would solve the problem. It's a question of how you make SQL make better use of all the CPUs, rather than just putting in more hardware, or saying you can't do more because industry CPUs only go up to such and such a speed.

    The point of creating a few more filegroups is not to get "faster" I/O out of the disk heads; it's for better/optimised use of tempdb against the CPU resources. If you have:

    1 disk with multiple tempdb filegroups => maybe not much performance gain

    multiple disks and multiple tempdb filegroups (per CPU) => M$ is saying this would bring a benefit by reducing allocation contention.

    multiple disks and one tempdb filegroup => not optimised

    On the disk setup, I agree that at least 4 independent disks would be preferred, as this would allow me to configure RAID 1+0. And as always, the more disks the better for I/O, as more disk spindles are available. But from experience, I've noticed it's the SAN controller cache size that matters more than the number of disk spindles for getting a better ROI; which is not the case in a medium-sized server such as an HP ProLiant DL380, where the more disks the better. Well, as I've explained, in a SAN things get done in memory first.

    By the way, do you understand the statement below? If you don't agree, what's your reason? If you can prove the statement below has no value, I'd be glad to submit your findings to M$ to clear things up. I too don't want to lose hundreds of thousands of dollars buying and architecting something that has no value.

    Tempdb

    The immediate question I get is – "What about all the KB articles on tempdb, a file for scheduler and such?"     This is again a misinterpretation of the legend as tempdb tuning is all about the allocation activity, not a number of threads per file (no such thing).   Workers use tempdb like any other database, any worker can issue I/O to and from tempdb as needed. 

    Tempdb is the database with the highest level of create and drop actions and under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks. SQL Server 2005 reduces contention with the 'cached temp table' feature and allocation contention skip ahead actions.

    When multiple workers are attempting to create or drop objects in tempdb you can decrease the bottleneck by having

    • As many files in tempdb as schedulers => means no of filegroup = no of CPU

    • All files sizes are equal

    • Uniform allocations enabled (-T1118)

    By configuring in this way each new allocation occurs in round-robin fashion across the files and only on uniform extents. SQL Server only allows a single worker to be active on each scheduler at any point in time so if every 'running' worker is attempting tempdb allocations they will logically partition to a separate file in tempdb, reducing the allocation contentions.

    Microsoft has seen that tempdb is an exception to the common rules here. Tempdb can be a high contention point for internal tracking structures. It is often better to create multiple files for tempdb so the internal latching and other activities achieve better separation (per file) so all the workers don't cause a resource contention on a single tempdb file. This type of contention would be extremely rare to see on a user database and I have not seen that issue in a user database.

    Simon Liew
    Microsoft Certified Master: SQL Server 2008
