64KB-formatted disk vs 4KB-formatted disk

  • Hi everyone!

    Here is an interesting test I would like to share with you.

    I was handed these SQL Servers a few weeks ago. The first thing I noticed: the customer had not formatted the disks with a 64KB allocation unit size; the disks use the default 4KB unit size. I explained to the customer that it is best practice to format the disks with 64KB because SQL Server performs disk I/O in extents, and an extent is 8 pages of 8KB each, for a total of 64KB, so formatting the SQL disks with 64KB should significantly improve performance.
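
    For reference, (re)formatting a data volume with a 64KB allocation unit looks roughly like the sketch below; the drive letter is only a placeholder, and the command wipes the volume, so it is shown purely as an illustration.

        # Reformat E: as NTFS with a 64KB allocation unit (destructive; placeholder drive letter)
        Format-Volume -DriveLetter E -FileSystem NTFS -AllocationUnitSize 65536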

    I wanted to demonstrate the performance improvement by running a storage testing tool against the disks and comparing metrics such as IOPS, MB/s, and latency.

    TESTS

    TEST #1 – Disk formatted with a 4KB allocation unit

    TEST #2 – Disk formatted with a 64KB allocation unit

    Parameters (same for both tests; an equivalent command-line sketch follows the list):

    • Block size of the I/O: 64KB

    • Duration: 30 seconds

    • Queue depth: 32

    • Threads: 4

    • 75% read, 25% write

    • Random I/O

    • Hardware and software buffering both disabled
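
    Assuming the tool was DiskSpd (a diskspd test is mentioned later in the thread), those parameters map roughly onto a command line like the one below; the target file path is a placeholder, and note that -o sets outstanding I/Os per thread, so adjust it if the queue depth of 32 was meant as a total.

        # 64KB blocks, 30s duration, 32 outstanding I/Os per thread, 4 threads, random,
        # 25% writes, software and hardware buffering disabled, latency captured
        diskspd.exe -b64K -d30 -o32 -t4 -r -w25 -Sh -L E:\iotest\testfile.dat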

    Result

    My expectation was that I should get more IOPS and lower latency when I read/write a 64KB block on a disk formatted with a 64KB allocation unit. My assumption is based on the idea that one operation is faster than 16 operations (16 x 4KB) for the same amount of data (64KB) with SCSI disks. However, my tests show the same results, meaning there is no performance improvement from a benchmark standpoint. What am I missing? Is my test wrong? Am I misunderstanding SQL Server? Misunderstanding storage? How can I prove that formatting disks with 64KB improves SQL Server performance?

    I was hoping someone could give me a clue.

    Thank you

    because SQL Server performs disk I/O in extents, and an extent is 8 pages of 8KB each, for a total of 64KB

    Unfortunately that's not true.

    http://blog.purestorage.com/what-is-sql-servers-io-block-size/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The allocation unit size has much less effect on the size of SQL Server IO than is commonly believed.

    Consider writes to the transaction log. Each write is a minimum of 512 bytes and a maximum of 60kb. That minimum and maximum are the same whether the filesystem au is 4kb or 64kb.

    Similarly, a 64kb au by itself won't prevent SQL Server from performing a single 8kb page read from a data file if required. A 4kb au won't necessarily prevent a 64kb, 256kb or 512kb physical read from a data file.
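
    One way to see the I/O sizes SQL Server actually issues, regardless of au, is the LogicalDisk performance counters, which report the average size of the physical reads and writes hitting the volume. A minimal PowerShell sketch, with the drive letter as a placeholder:

        # Sample the average physical I/O size on the SQL volume while a workload runs
        $counters = '\LogicalDisk(E:)\Avg. Disk Bytes/Read',
                    '\LogicalDisk(E:)\Avg. Disk Bytes/Write'
        Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12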

    The au is a provisioning/tracking unit for the filesystem. It is the smallest initial allocation that can be made to a file. Going to have thousands of XML files smaller than 8kb in a filesystem? Using a 4kb au in that filesystem will allow file allocations of 4kb to 60kb for those small files (as well as any other multiple of 4kb), while a 64kb au wouldn't allow any file size below 64kb or incremental file growth smaller than 64kb.
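
    If you want to confirm which au a volume actually has, fsutil reports it as "Bytes Per Cluster" (4096 for a 4kb au, 65536 for 64kb). A quick check from an elevated prompt; the drive letter is a placeholder:

        # Show NTFS volume details, including 'Bytes Per Cluster' (the allocation unit size)
        fsutil fsinfo ntfsinfo E: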

    In high-performance database layout recommendations, I typically recommend one busy file (transaction log or data file) per NTFS filesystem/Windows volume. This prevents files from interleaving with each other in the filesystem as they grow. Interleaved files result in filesystem fragmentation (mainly in the sense of loss of contiguity, but also sometimes in the sense of mixed-up-ness or wasted space). They also require more filesystem metadata overall to track.

    Interleaved files can result in performance degradation if the breaks in file contiguity prevent large reads which would otherwise be possible. The extra metadata in a 4k vs 64k au filesystem can have a performance impact, especially as individual file size or occupied filesystem space grows. More metadata for the same contents means more disk reads/writes to retrieve or maintain the metadata. It also means more server RAM to hold the metadata, and more memory accesses to traverse the maps from file offset to location within the filesystem.
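
    To see whether a file's clusters have stayed contiguous or have become interleaved/fragmented, recent Windows versions (Windows 8/Server 2012 and later, as far as I recall) can dump the extent runs for a file; Sysinternals contig.exe -a gives a similar summary. A sketch, with a placeholder file path:

        # List the extent runs (VCN/LCN/cluster count) for a data file; many short runs
        # indicate fragmentation or interleaving, a few long runs indicate contiguity
        fsutil file queryextents E:\MSSQL\DATA\MyDatabase.mdf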

    All that said, I suspect that the greatest performance impact of 64k vs 4k au was likely seen on 32-bit systems, given the RAM constraints and the larger share of RAM consumed by more metadata.

    I think a very well-designed test could show the difference between 64k and 4k au even on a 64bit OS. But it would have to include file interleaving and large enough file sizes for the differences in metadata size and file contiguity to present themselves in the results.

    I still recommend 64k au on new systems. But on existing systems with significant data already present I often advise to wait until a storage tech refresh or other system migration to *change* from 4K to 64k.

  • Thank you very much, it certainly helps me understand SQL Server. I now understand that SQL Server does not only perform 64K operations, as pointed out by GilaMonster.

    The point about interleaved files is also very interesting!

    However, if we put SQL Server aside for one second, would you agree or not that reading/writing a 64KB block from a disk formatted with a 64KB au should provide better performance than reading/writing a 64KB block from a disk formatted with a 4KB au?

    I thought yes, but the numbers say no.

  • Let's step back a bit. By 'disk', do you mean old-style spinning disks directly attached to the machine, or do you mean a SAN LUN presented as a drive?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • SAN LUN presented as a drive to VMware. I am doing my test on a virtual disk.

  • Unless a test stresses the metadata differences or potential file interleave/contiguity differences between 4k and 64k au, I'd expect no measurable difference in a disk IO test.

    For example, with a single 64kb file (filesystem cache disabled) all alone on a filesystem I expect no difference in the maximum read rate against that file for reads or writes of 8k or 64k regardless of 4k/64k au.

    I expect that to remain true until some combination of file size and file fragment placement stress metadata size and/or split IO, resulting in a performance difference.

    It would take careful design of a test & monitoring to ensure a repeatable, measurable difference (although that doesn't preclude the possibility of measuring the difference with a simpler test).
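
    As a rough illustration of the setup such a test would need, you could deliberately interleave two files by growing them in alternating chunks on an otherwise empty volume, then compare large reads against the same files grown one at a time on each au size. A PowerShell sketch; the paths, chunk size, and file sizes are arbitrary placeholders:

        # Grow two files in alternating 64KB chunks so their clusters interleave on disk
        $a = [System.IO.File]::OpenWrite('E:\interleave_a.dat')
        $b = [System.IO.File]::OpenWrite('E:\interleave_b.dat')
        $chunk = New-Object byte[] (64KB)
        for ($i = 0; $i -lt 16384; $i++) {   # roughly 1GB per file
            $a.Write($chunk, 0, $chunk.Length); $a.Flush()
            $b.Write($chunk, 0, $chunk.Length); $b.Flush()
        }
        $a.Dispose(); $b.Dispose()

    In principle a 64k au guarantees every allocation run is at least 64KB, while a 4k au allows runs as small as 4KB, so an interleaved layout like this is where a difference would have a chance to show up.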

  • lonnyniederstadt (7/27/2016)

    You will notice a significant drop in throughput with a 4K block size on random reads. Attached is the graph from a diskspd test that I did; this is how I found that the drives were formatted with a 4K allocation unit. The one in purple is on a server with drives formatted with 64K and the other one with 4K. These tests were executed against two different LUNs on different physical servers at the same time.

  • Gamleur84,

    So if I understand you correctly, you have a SAN LUN presented to VMware that you have formatted with VMFS. Is that VMFS5 with the default 1MB block size?

    You then carve out two VMDK files and present them to your VM and format one of them 4K and the other 64k?

    If all of this is true, that's why you aren't seeing a difference: the datastore block size is 1MB.

    Can you raw map a SAN LUN to the virtual machine and format that with 64k and compare it to the 4K VMDK drive?

    What SAN are you using and what is its block size?

    Hope this helps!!

    Jon

  • @SeniorITGuy

    VMFS version is 5.61 and the block size is 1 MB

    SAN sector size = 512 bytes

    There is a conversation about block sizes across the different layers (OS, VMFS, SAN) here: https://communities.vmware.com/thread/461129?start=0&tstart=0

    I also thought that could be the reason (VMFS 1MB), but I find no indication on the internet that a 64KB au is useless on virtual disks. Microsoft also recommends formatting SQL Server disks with 64KB on Azure VMs... (maybe the storage is optimized for 64KB...)

    I cannot test a raw disk for now, but I also believe I would see a difference in that case.

    @SSCommitted

    Did you mean: "You will notice a significant drop in throughput with 4KB block size on random reads"?

    @lonnyniederstadt

    Can you explain why you "expect no difference in the maximum read rate against that file for reads or writes of 8k or 64k regardless of 4k/64k au" ?

    Thank you

  • @SSCommitted

    Did you mean: "You will notice a significant drop in throughput with 4KB block size on random reads"?

    Good catch, I fixed my original comment. Yes, I mean you will notice a drop in throughput if the volume au is 4K.

  • I like this image of the Windows storage driver stack.

    https://i-technet.sec.s-msft.com/dynimg/IC347746.gif

    The allocation unit is an attribute of the filesystem, governing the initial and incremental file growth possible.

    Although it's old, I like this explanation of how the NTFS filesystem works as a whole.

    https://technet.microsoft.com/en-us/library/cc781134(v=ws.10).aspx#w2k3tr_ntfs_how_dhao

    How an individual file is accommodated in the NTFS filesystem is explained very thoroughly in these two posts.

    https://blogs.technet.microsoft.com/askcore/2009/10/16/the-four-stages-of-ntfs-file-growth/

    https://blogs.technet.microsoft.com/askcore/2015/03/12/the-four-stages-of-ntfs-file-growth-part-2/

    Once clusters are allocated to a file for growth, the work of the allocation unit is done. When a physical read is issued against a file, at the filesystem layer the file-based address is resolved for lower levels of the Windows stack based on the cluster allocations.

    Once the file has grown, the remaining questions are how compact its metadata is and how contiguous its data is.

    A single file on a new volume should have completely contiguous data. Unless the NTFS metadata was fragmented and bloated by a lot of prior activity like massive amounts of small file creation and deletion, the metadata should also be very compact.

    With contiguous clusters and compact metadata, whether allocations were made to the file in 4k clusters or 64k clusters can end up falling out of the equation.
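
    A quick way to sanity-check whether a volume is in that ideal state (contiguous data, compact metadata) is the built-in defrag analysis, which reports fragmentation statistics and, with verbose output, MFT usage. A sketch from an elevated prompt; the drive letter is a placeholder, and the exact report fields vary by Windows version:

        # Analyze the volume without defragmenting it; /V adds verbose detail
        defrag E: /A /V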

    If that file is a transaction log, sequential writes of 512 bytes to 60k will occur regardless of 4k/64k cluster size. If it's a database file, reads of 8k to 512k (the default Windows maximum physical IO size, and the maximum physical IO size for the VMware pvscsi vhba) will also occur regardless of 4k/64k cluster size. On a VMware VM, the VMFS block size will have minimal effect outside the first write to thin-provisioned volumes. The 1MB VMFS block size doesn't force small physical reads to become larger.

    https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003565

    On real systems there are often departures from the ideal. Multiple files within a filesystem increase the odds of fragmentation of both data and metadata. Sometimes that's not even intentional. If a SQL Server data file grows while DBCC checkdb is running, the sparse file for the database snapshot can result in fragmentation of the base database file. Sparse files themselves are very susceptible to fragmentation, as is discussed in the "four stages part 2" post above.

    In the past, NetApp indicated observed performance improvements on systems with 4k allocation units vs 64k allocation units. I don't doubt their results, but I am skeptical of their conclusions. I bet closer examination of the test systems would reveal that something other than cluster size was responsible for the observed performance difference. Maybe file fragmentation within the filesystem. Maybe filesystem metadata fragmentation or bloat. Maybe 4k vs 512-byte disk sector size...

    4k allocation units allow large numbers of small files to be stored in an NTFS filesystem without a lot of empty space in the allocated clusters. Although there have been claims of observed performance gains for SQL Server with 4k au, I've always suspected there's another cause in those cases.

    64k allocation units allow much larger maximum Windows volume sizes than 4k allocation units. For systems like SQL Server, where small numbers of large files are expected within an NTFS filesystem, 64k au seems a natural choice. It's much less likely to experience very badly fragmented metadata and will hopefully (if there is more than one file in the filesystem) experience greater contiguity. But predicting a performance gain on any given system by converting from 4k to 64k au AND pinpointing the reason for the improved performance is a very tricky business. I'm not overly skeptical when I see evidence of gains on some systems. But I'm also not surprised to see legitimate tests on other systems that show no measurable difference.

  • Gamleur84,

    You are correct, MS still recommends a 64k block size for SQL Server drives whether they are VMDKs or not. In the link you pasted, the author doesn't think you will see much, if any, difference in performance between 4k and 64k block-sized VMDKs because the VMFS volume has a 1MB block size.

    One thing not mentioned is the design of having multiple VMDKs attached to a single VM:

    When you attach each new drive to a virtual machine, you should add it to a new SCSI controller: 0:0 for the OS disk, 1:0 for the second disk, 2:0 for the third disk, 3:0 for the fourth disk, and then I like to go back to 1:1 for the fifth disk (or pick the SCSI controller with the smallest drive on it), and so on. Then, for each new SCSI controller added, make it VMware Paravirtual instead of the default LSI Logic SAS; this will get you better throughput and performance in the virtual machines (a rough sketch follows below). We stick to no more than 8 drives total for a virtual machine. I have no idea if this will give you different results for your IO tests between the VMDKs, but it is something to consider. This method for adding new drives holds true for Hyper-V VMs as well; I can speak from experience that 5 VHDX drives on a single virtual SCSI controller will bring a virtual machine to its knees when it starts accessing files on those drives.
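
    If it helps, here is a rough PowerCLI sketch of that layout idea; the VM name, disk size, and storage format are placeholders, and you would repeat the pattern for each additional disk/controller:

        # Add a data disk to the VM and place it on its own paravirtual SCSI controller
        $vm   = Get-VM -Name 'SQLVM01'
        $disk = New-HardDisk -VM $vm -CapacityGB 200 -StorageFormat EagerZeroedThick
        New-ScsiController -HardDisk $disk -Type ParaVirtual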

    I also read an article recommending, when building new VMs, to set the firmware to UEFI and to change the first SCSI controller to VMware Paravirtual at deployment, so the OS drive is formatted GPT instead of MBR. I have no experience with doing this yet, but it is something I will be trying to see if it makes any difference in VM performance in the future.

    One thing of note from the book "Virtualizing SQL Server with VMware: Doing IT Right" by Michael Corey, Jeff Szastak and Michael Webster:

    "Having a small (default) Allocation Unit Size means there are many more times the number of blocks at the file system level that need to be managed by the operating system. For file systems that hold thousands or millions of small files, this is fine because there is a lot of space savings by having a smaller Allocation Unit in this scenario. But for a SQL Server database that consists of very few, very large files, having a much larger Allocation Unit is much more efficient from a file system, operating system management, and performance perspective."

    Hope this helps!!

    Jon

  • Thank you very much guys !

  • Gamleur84 (7/27/2016)


    SAN LUN presented as a drive to VMware. I am doing my test on a virtual disk.

    Then it's meaningless; a virtual disk does not mimic an underlying RAID volume that would be attached to the server. Also, VHDs don't handle random I/O too well. Take a look at my article on this site at the link below.

    http://www.sqlservercentral.com/articles/iscsi/75856/

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
