Block Size on MSSQL on Linux

Question

Post reply

Block Size on MSSQL on Linux

gwellbrock

Ten Centuries

Points: 1130
More actions
March 17, 2021 at 2:52 pm

#3859006

Do I need to set my block size of drives on linux at 64k allocation size like I do on windowsOS?

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 1

I am not an expert on this, but my understanding is that it is recommended.

Same thing in windows, it isn't that you HAVE to set it to 64k, it just gets a performance benefit.

SQL requests data in Extents. Each extent is 8 pages long. Each page is 8KB in size. So an extent is 64kb. So when SQL asks for an extent from disk, if it needs to do a random read to get those 8 pages, you may be jumping from start to end of the disk if you 8 times if you use 8K block size and more if you use smaller.

Using 64k block size means that when you ask for an extent, you are doing 1 continuous read from disk.

As for as I know, this recommendation applies to Linux as well as Windows.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

gwellbrock Ten Centuries Points: 1130 More actions · Answer 2

Thanks I guess this might be more of a Ubuntu forum question but ty for any response -

Based on MS documentation they are allocating the 64k chunk by implementing software raid

# To locate the devices (for example /dev/sdc) for RAID creation, use the lsblk command

# For Data volume, using 4 devices, in RAID 5 configuration with 8KB stripes mdadm --create --verbose /dev/md0 --level=raid5 --chunk=8K --raid-devices=4 /dev/sdc /dev/sdd /dev/sde /dev/sdf

# For Log volume, using 2 devices in RAID 10 configuration with 64KB stripes mdadm --create --verbose /dev/md1 --level=raid10 --chunk=64K --raid-devices=2 /dev/sdg /dev/sdh # For tempdb volume, using 2 devices in RAID 0 configuration with 64KB stripes mdadm --create --verbose /dev/md2 --level=raid0 --chunk=64K --raid-devices=2 /dev/sdi /dev/sdj

On the actual drive creation you can't do chunk size or won't let you mount. Doesn't make sense to do software Raid as we would be duplicating our storage when we have hardware raid.

mkfs.xfs -L mssql.data -b size=65536 /dev/nvme3n1p1

root@ip-??????:/var/opt/mssql# mount -a

mount: /var/opt/mssql/data: mount(2) system call failed: Function not implemented.

https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-performance-best-practices?view=sql-server-ver15

gwellbrock Ten Centuries Points: 1130 More actions · Answer 3

added note when creating the xfs drive it won't allow any size above 4k per ubuntu documentation

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 4

Quick read of the documentation, for an XFS filesystem, it seems you can SET the block size larger, but the kernel will only let you mount it if the partition block size is at or smaller than the memory block size. Ubuntu has a memory block size of 4K.

link - https://serverfault.com/questions/246640/mount-xfs-partition-with-4k-block-size

If you need it higher, you'd need to compile your own kernel or find a kernel with larger memory block size.

If you have spinning disks for your SQL instance, you will want the block size as large as possible to reduce your random read time. If you have SSD's or flash storage, I THINK the block size is less relevant as the disk controller has "magic" baked into it so even if your OS requests sectors 1 through 10, on a spinning disk these are going to be sequential on the physical disk. On SSD's they may not be (or so is my understanding). On SSD's they MIGHT be on the same physical chip or it may be spread out across all of the memory chips.

And if you have hardware RAID set up, the request to read or write from blocks 1-10 (again, as an example) may exist on one single disk OR may span multiple disks. The RAID controller and your RAID setup may do some magic on the back end to get you better performance for sequential reads.

NOTE - I know it isn't "Magic" that happens, but it is something that is transparent to the application and (generally) not conrollable by the DBA.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.