Blog Post

Understanding File System IO, Lessons Learned From SQL Server

,

I do more than just SQL Server. I enjoy programming. In my former life I have worked with C/C++ and Assembler. I also spent quite a bit of time in Visual Basic. I loved it for prototyping and what we now call RAD development efforts.

As I spent more and more time with SQL Server my programming really took a back seat career wise. Having that background though really helps me day in and day out understanding why SQL Server does some of the things it does at the system level.

When .Net launched, I found it lacking for a lot of the stuff I like to do. It didn’t replace C/C++ for me on the speed side and didn’t replace VB 6 on the ease of development side. During the time after VB6 and before C# 2.0 I was using non Microsoft languages to do my systems work that mirrored VB6 like REALBasic.

But, like all things Microsoft .Net improved and 2.0 fit very nicely in my world view. I didn’t fully understand that at the time though and cussed up a storm when they announced CLR for what would become SQL Server 2005.

Fast forward several years and I’ve moved away from VB/C++ and spent the last few years learning C#.

Now that I work mostly in C# I do look up solutions for my C# dilemmas on sites like http://www.codeplex.com and http://www.codeproject.com. I love the internet for this very reason, hit a road block do a search and let the collective knowledge of others speed you on your way. But, it can be a trap if you don’t do your own homework.

I write mostly command line or service based tools these days not having any real talent for GUI’s to speak of. Being a person obsessed with performance I build these things to be multi-threaded, especially with today’s computers having multiple cores and hyper threading it just makes since to take advantage of the processing power.

This is all fine and dandy until you want to have multiple threads access a single file and all your threads hang out waiting for access.

So, I do what I always do, ask by best friend Google what the heck is going on. As usual, he gave me several quality links and everything pointed to the underlying file not being set in asynchronous mode.

Now having done a lot of C++ I knew about async IO buffered and un-buffered and could have made unmanaged code calls to open or create the file and pass the handle back, but just like it sounds it is kind of a pain to setup and if you are going down that path you might as well code it all up in C++ anyway.

Doing a little reading on the mighty MSDN I found all the little bits I needed to set everything to rights. All the async flags were set and I started my test run again. It ran just like it had before slow and painful.

Again, I had Mr. Google go out and look for a solution for me, sometimes being lazy is a bad thing, and he came back with several hits where people had also had similar issues. I knew I wasn’t the only one!

The general solution? Something I consider very, very .Net, use a background thread and a delegate to keep the file access from halting your main thread, so your app “feels” responsive but really hasn’t gained anything.

I spit out my carrot stick and shot diet coke from my nose when I read that.

Sure, it solved the issue of program “freezing” up on file access but doesn’t really solve the problem of overlapped asynchronous IO that I am really hoping to use to speed up my application.

Drawing back on my past again, I remembered that SQL Server uses async IO to get some of it’s performance boost. I did some refresh reading on the MSDN site again and struck gold.

Writes to the file system may OR may not be async depending on several factors, one of which is if the file must be extended everything goes back to sync IO for that.

Well, since I was working with a filestream and a newly created file every time I was pretty much guaranteeing that I would be sync no matter what.

At this point I almost just gave up and dropped back to C++. Actually I did start to code it up when I realized I was doing things differently in my C++ version.

I was manually creating the file and doing an initial allocation growing it out to the size of one full buffer call.

I then switched over to one of my favorite tools processmon, from the Sysinternals guys now at Microsoft, and used it to see what was going on at the file level.

I watched my C++ version and lo it was doing async in the very beginning then switching as the file started growing naturally.

I fired up my instance of SQL Server and watched as the async trucked right along…. until a file growth happened and everything went sync for the duration of the growth.

AH HA!

So, taking that little extra knowledge I manually created my file in C# set an initial default size and wouldn’t you know the async IO kicked right in until it had to grow the file.

I had to do a little extra coding watching for how much free space was in the file when I get close I now block all threads extend the file by some amount and then release the lock letting things async away.

So, there you go my little adventure and how, yet again, SQL Server helped me solve something even though it didn’t have a direct tie to it.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating