We often take the advice given to us on forums or in articles at face value. Even though the authors almost always say things like “your mileage may vary” or “may not apply to your situation” people still assume it is the gospel. Sometimes it is lack of experience. Other times it is just lack of knowledge on how to verify these things on your own. In this article I’m going to give you a tool to look at what SQL Server is doing at the disk level and allow you to make better decisions on how to configure your underlying disks.
There are several things you need to know about how SQL Server accesses the database files and the implications of that before you can construct a proper testing methodology.
http://technet.microsoft.com/en-us/library/cc966500.aspx covers the basics. There are a few things I will highlight here.
ACID (Atomicity, Consistency, Isolation, and Durability) is what makes our database reliable. The ability to recover from a catastrophic failure is key to protecting your data.
WAL (Write-Ahead Logging) is how ACID is achieved. Basically, the log record must be flushed to disk before the data file is modified.
Stable media isn’t just the disk drive. A controller with a battery backed cache is also considered stable. Since SQL Server can request blocks as big as 64KB make sure your controller can handle that block size request in cache. Some older controllers only do a 16KB block or smaller.
With the requirement of stable media SQL Server creates and opens all files with a specific set of flags. FILE_FLAG_WRITETHROUGH tells the underlying OS not to use write caching that isn’t considered stable media. So, the local disk cache is normally bypassed. Not all hard drives honor the flag though, Some SATA/IDE drives ignore it. Usually, the drive manufacturer provides a tool to turn off write caching. If you are using desktop drives in a mission critical situation be aware of the potential for data loss. FILE_FLAG_NO_BUFFERING tells the OS not to buffer the file ether. At this point the only cache available will be the battery backed or other durable cached on the controller.
SQL Server uses asynchronous access for data and log files. This allows IO request to queue up and use the IO system as efficiently as possible. The main difference between the two are SQL Server will try and gather writes to the data file into bigger blocks but the log is always written to sequentially.
All of these rules apply to everything but tempdb. Since tempdb is recreated at restart every time recoverability isn’t an issue.
Searching around you will find these generalities about SQL Server’s IO patterns
Log Writes
Sequential 512 bytes to 64KB
Data File Read/Writes
8KB
Read ahead – more important to Enterprise Edition
8KB to 125KB
Bulk Insert
8KB to 128KB
Create Database
512 byte – full initialize on log file only.
Backup Sequential Read/Write
1 MB
Restore Sequential Read/Write
64K
DBCC – CHECKDB
Sequential Read 8K – 64K
DBCC – DBREINDEX
(Read Phase) Sequential Read (see Read Ahead)
(Write Phase) Sequential Write
Any multiple of 8K up to 128K
DBCC – SHOWCONTIG
Now that we have an idea of what SQL Server is suppose to be doing its time to verify our assumptions.
There are a few tools that will allow you to capture the file activity at the system level. Process Monitor is a free tool from Microsoft that I will use to collect some base line information. In it’s standard configuration Process Monitor captures a ton of stuff and uses the page file to spool the info to. So, before we begin we need to change the default configuration.
Capturing IO data using process monitor.
Filter to apply
process is sqlservr.exe Operation is Read Operation is Write
Columns to choose.
Process Name PID PATH Detail Date & Time Time of Day Relative Time Duration TID Category
Change Backing File.
The maximum number of events it will capture is 199 million. This is enough on my system to capture 12 hours of activity easily. Once we have a good sample you can save it off as an XML file or CSV. Choosing CSV it is pretty easy to import the data into SQL Server using SSIS or your tool of choice.
I import the CSV into a raw table first.
Raw table to import into.
CREATE TABLE [SQLIO].[dbo].[pm_imp] ( [Process Name] VARCHAR(12), [PID] SMALLINT, [Path] VARCHAR(255), [Detail] VARCHAR(255), [Date & Time] DATETIME, [Time of Day] VARCHAR(20), [Relative Time] VARCHAR(50), [Duration] REAL, [TID] SMALLINT, [Category] VARCHAR(6))
Next I create a cleaner structure with some additional information separated from the detail provided.
SELECT [Process Name] AS ProcessName, PID AS ProcessID, PATH AS DatabaseFilePath, Detail, [Date & Time] AS EventTimeStamp, [Time of Day] AS TimeOfDay, [Relative Time] AS RelativeTime, [Duration], TID AS ThreadID, Category AS IOType, substring(detail,charindex('Length: ',detail,0) + 8, (charindex(', I/O',detail,0) - charindex('Length: ',detail,0) - 8)) AS IOLength, CASE reverse(left(reverse(PATH),3)) WHEN 'mdf' THEN 'Data' WHEN 'ndf' THEN 'Data' WHEN 'ldf' THEN 'Log' END AS FileTypeINTO SQLIODataFROM dbo.pm_impWHERE reverse(left(reverse(PATH),3)) IN ('mdf','ndf','ldf')
Once we have the data cleaned up a bit we can now start doing some analysis on it.
Queries for interesting patterns.
This query gives us our read and write counts.
SELECT count(* ) IOCount, IOTypeFROM SQLIODataGROUP BY IOTypeORDER BY count(* ) DESC
This one shows us the size of the IO and what type of operation it is.
SELECT count(* ) IOCount, IOLength, IOTypeFROM SQLIODataGROUP BY IOLength,IOTypeORDER BY count(* ) DESC
This is a look at activity by file type data or log.
SELECT count(* ) IOCount, FileTypeFROM SQLIODataGROUP BY FileTypeORDER BY count(* ) DESC
Since we are capturing the thread id we can see how many IO’s by thread.
SELECT count(* ) IOCount, ThreadID FROM SQLIOData GROUP BY ThreadID ORDER BY count(* ) DESC
We can also look at IO types, sizes and count by file helping you see which ones are hot.
SELECT count(* ) IOCount, databasefilepath, iotype, iolengthFROM SQLIODataWHERE databasefilepath LIKE '%filename%'GROUP BY databasefilepath, iotype, iolengthHAVING count(* ) > 10000ORDER BY databasefilepath, count(* ) DESC
Now that we see exactly what our IO patterns are we can make adjustments to the disk subsystem to help scale it up or tune it for a particular pattern.
This is just another tool in your tool belt. This is a supplement to using fn_virtualfilestats to track file usage. I use it to get a better idea of the size of the IO’s being issued.Using these two tools I can see the size of the IO’s in a window of time that is reported by my fn_virtualfilestats capture routine.
Always verify your assumptions, or advice from others.
I’ve been dealing with change control and source code repositories for most of my professional career. While I’ve seen change control and integration advance steadily for writing programs it feels like the database part of things is just stuck in the stone age. For months now I’ve been researching solutions for source control, change management, and deployment of database objects. The conclusion I’ve come to is there is no solution. Well, no easy solution. I was very happy in the early days of SQL Server 2005 when they announced source control integration into management studio. It was a great pain for me personally to have Visual Studio, and the solution architecture it offered and not have that on the database side of things. Alas, it wasn’t meant to be. What they meant buy source control was using the previous generation of integration and then crippling it.
Really?
This doesn’t look like much of a solution to me.
I know what most of you are thinking. If you have Visual Studio use it. That works for me but not the people on my team that only have access to SSMS. It also means I have to jump between two tools to do one thing, work with SQL Server. I have been told that Microsoft is basically pushing you to Visual Studio for all of your development needs. Leaving SSMS as a management tool only. If Visual Studio did everything SSMS did it wouldn’t be that big a deal for me personally.
I tried several things to work around the limitations SSMS has. I found you could manually edit the solution file to get extra folders. The only problem with that is they all show up as ether Queries or Miscellaneous. Other than that one and the old fix for sorting files by name there aren’t any other hacks I can find.
Generally has a nice look and feel.It has all the development and management features to be a true replacement for management studio. I tried all the normal things that I do in SSMS in Toad and several things were better. The debugger was nice and the statement optimizer is also a nice addition. It does fall down flat in some basic key areas. I never could get it to display an execution plan. As a T-SQL guy the plan is a must. I know it is a bug somewhere. Having something this fundamental during and evaluation is a big red light though.
The only down side is it doesn’t support Sourcegear Vault/Fortress which is a real shame. Lots of SMB’s use Vault for source control since it is miles better than visual source safe and much cheaper than team system.
That left only one other contender in this fight. ApexSQL Edit has been around quite a while as well. Initially, it has a similar look and feel to Toad. I know there isn’t a lot that you can do to since both look like Office. I is also missing the management features but I can live with that. The goal is to get the developers a tool they can develop in and use our code repository easily. ApexSQL Edit did include support for Vault and it worked as expected. Again, I started using it daily like I would SSMS. Everything I tried worked, for the most part. 95% of the time it would generate an execution plan. Not as clean as SSMS but it had more options on how to display the plan, which I liked. I did have a few crashes, but this was a beta build and I will let that go until I test the full release. Since this was a beta I did provide feedback and initially the folks at ApexSQL were very responsive. Eventually though everything just went quiet accept for the sales guys asking me how things were going. Right now they are a no go until the stability issues are addressed and the RTM is out so I can do a full evaluation again.
What I hoped would be a pretty easy exercise turned out to be a real work out. For all of SSMS’s problems it is stable and familiar. I was really hoping that ether Toad or ApexSQL Edit would solve my problems. I haven’t given up on ApexSQL Edit yet, we will just have to play the waiting game and keep using an inadequate solution until someone comes up with something better.
As promised and update on what has happened so far. A correction needs to be made. the P800 is a PCIe 1.0 card so the bandwidth is cut in half from 4GB/sec to 2GB/sec.
My CDW rep did get me in contact with an HP technical rep who actually knew something about the hardware in question and its capabilities. It was one of those good news, bad news situations. We will start with the bad news. The performance isn’t off. My worst fears were confirmed.
The HP Guy (changing the names to protect the innocent) told me their rule of thumb for the performance of the 2.5” 73GB 15K drives is 10MB/Sec. I know what you are thinking, NO WAY! But, I’m not surprised at all. What I was told is the drives ship with the on board write cache disabled. They do this for data integrity reasons. Since the cache on the drive isn’t battery backed if there was any kind of failure the potential for data loss is there. There are three measurements of hard disk throughput, disk to cache, cache to system and disk to system. Disk to cache is how fast data can be transferred from the internal data cache to the disk usually sequentially. On our 15k drive this should be on average 80MB/sec. Disk to system, also referred to burst speed, is almost always as fast as our connection type. Since we are using SAS that will be close to 250MB/sec. Disk to system is no caching at all. Without the cache several IO reordering schemes aren’t used, there is no buffer between you and the system, so you are effectively limited by the Areal Density and the rotational speed of the disk. This gets us down to 10 to 15 megabytes a second. Write caching has a huge impact on performance. I hear you saying the controller has a battery backed cache on it, and you would be right.
The P800 controller was the top of the line that HP had for quite a while. It is showing its age now though. The most cache you can get at the moment is 512MB. It is battery backed so if there is a sudden loss of power the data in cache will stay there for as long as the battery holds out. When the system comes back on the controller will attempt a flush to disk. The problem with this scheme is two fold. The cache is effectively shared across all your drives since I have 50 drives total attached to the system that is around 10.5 megabytes per drive. Comparable drives ship with 16 to 32 megabytes of cache on them normally. The second problem is the controller can’t offload the IO sorting algorithms to the disk drive effectively limiting it’s throughput. It does support native command queuing and elevator sorting but applied at the controller level just isn’t as fast as at the disk level.If I had configured this array as a RAID 6 stripe the loss of performance from that would have masked the other bottlenecks in the controller. Since I’ve got this in a RAID 10 the bottleneck is hit much sooner with fewer drives. On the P800 this limit appears to be between 16 and 32 disks. I won’t know until I do some additional testing.
If you have been following my blog or coming to the CACTUSS meetings you know I tell you to test before you go into production. With the lack of documentation I went with a set of assumptions that weren’t valid in this situation. At that point I should have stopped and done the testing my self. In a perfect world I would have setup the system in a test lab run a series of controlled IO workloads and come up with the optimal configuration. I didn’t do as much testing as normal and now I’m paying the price for that. I will have to bring a system out of production as I run benchmarks to find the performance bottlenecks.
I have two P800’s in the system and will try moving one of the MSA70’s to the other controller. This will also allow me to test overall system performance across multiple PCIe busses. I have another system that is an exact duplicate of this one and originally had the storage configured in this way but ran into some odd issues with performance as well.
HP has a faster external only controller out right now the P411. This controller supports the new SASII 6G protocols, has faster cache memory and is PCIe 2.0 complainant. I am told it also has a faster IO processor as well. We will be testing these newer controllers out soon. Also, there is a replacement for the P800 coming out next year as well. Since we are only using external chassis with this card the P411 may be a better fit.
We are also exploring a Fusion-io option for our tempdb space. We have an odd workload and tempdb accounts for half of our write operations on disk. Speeding up this aspect of the system and moving tempdb completely away from the data we should see a marked improvement over all.
Faced with the lack of documentation, don’t make assumptions based on past experiences. Test your setup thoroughly. If you aren’t getting the information you need, try different avenues early. Don’t assume your hardware vendor has all the information. In my case, HP doesn’t tell you that the disks come with the write cache disabled. They also don’t give you the full performance specifications for their disk controllers. Not even my HP Guy had that information. We talked about how there was much more detailed information on the EVA SAN than there was on the P800.
Again, I can’t tell you how awesome CDW was in this case. My rep, Dustin Wood, went above and beyond to get me as much help as he could, and in the end was a great help. It saddens me I couldn’t get this level of support directly from HP technical support. You can rest assured I will be giving HP feedback to that effect. By not giving the customer and even their own people all the information sets everyone up for failure.
I’m not done yet. There is a lot of work ahead of me, but at least I have some answers.You can bet I’ll be over at booth #414 next week at PASS asking HP some hard questions!
I love visiting Seattle, even in November. Getting to catch up with old friends and meet new people is always the highlight of my trip. Don’t get me wrong, the training is awesome and I always learn so much, but I am a social creature. I can always go to specific training events to hear Kalen or Itzik. Getting to meet them in a social setting is something else all together. I learn as much talking to as many of the 2000 people there as I do attending sessions. Never underestimate the wisdom of the crowd!
I’m also a bingo square, if you aren’t playing SQLBingo shame on you! It is another great way to meet some smart people and expand your circle of friends.Even if you aren’t playing still come and find me, I’m always willing to meet new people.
See you next week!
I have had the pleasure of being a vendor, and technical support for both hardware and software products. I know it isn’t easy. I know it isn’t always possible to fix everything. The level of support I’ve received from HP on my current issue is just unacceptable. This is made more frustrating by the lack of documentation. The technical documents show capacity. How many drives in an array, Maximum volume size but nothing on throughput.Every benchmark they have seems to be relative to another product with no hard numbers. For example, the P800 is 30% faster than the previous generation.
I’m not working with a complicated system. It’s a DL380 G5 with a P800 and two MSA70’s fully populated with 15k 73GB hard drives. 46 of them are in a RAID 10 array with 128k stripe. Formatted it NTFS with a 64k block size and sector aligned the partition. Read/Write cache is set at 25%/75%. This server originally just had one MSA70. We added the second for capacity expansion and expected to see a boost in performance as well. As you can probably guess, there wasn’t any increase in performance at all.
Here is what I have as far as numbers. Some of these are guesses based on similar products.
P800 using two external miniSAS 4x connectors maximum throughput of 2400 MB/sec (2400Mbit per link x 4 per connector x 2 connectors). The P800 uses a PCIe x8 connection to the system at 4,000 MB/Sec (PCIe 2.0 2.5GHz 4GB/sec each direction). Attached to the controller are 15k 73GB 2.5” hard drives 46 of them for a raw speed 3680 MB/Sec of sequential read or write speed (23x80MB/sec write sequential 2 MSA70's RAID 10 46 Drives total based on Seagate 2.5 73GB SAS 15.1k)
Expected write speed should be around 1200 megabytes a second.
We get around 320 MB/Sec sequential write speed and 750MB/sec in reads.
Ouch.
Did I mention I also have a MSA60 with 8 7.2k 500GB SATA drives that burst to 600MB/sec and sustain 160MB/Sec writes in a RAID 10 array? Yeah, something is rotten in the state of Denmark.
With no other options before me I picked up the phone and called.
I go through HP’s automated phone system, which isn’t that painful at all, to get to storage support. Hold times in queue were very acceptable. A level one technician picked up the call and started the normal run of questions. It only took about 2 minutes to realize the L1 didn’t understand my issue and quickly told me that they don’t fix performance issues period. He told me to update the driver, firmware, and reboot. Of course none of that had worked the first time but what the heck, I’ll give it the old college try. Since this is a production system I am limited on when I can just do these kinds of things. This imposed lag makes it very difficult to keep an L1 just sitting on the phone for five or so hours on hold while they wait for me to complete the assigned tasks. I let him go with the initial action plan in place with an agreement that he would follow up.Twice I got automated emails that the L1 had tried to call and left voicemails for me. Twice, there were no voicemails. I sent him my numbers again just to be on the safe side. Next, I was told to run the standard Array Diagnostic Utility and a separate utility that they send you to gather all the system information and logs, think a PSSDiag or SQLDiag. After reviewing the logs he didn’t se anything wrong and had me update the array configuration utility. I was then told they would do a deeper examination of the logs I had sent and get back to me. Three days later I got another email saying the L1 had tried to call and left me a message. Again there was no voicemail on my cell or my desk phone. I sent a note back to the automated system only to find the case had been closed!
I called back in to the queue and gave the L1 who answered my case number, he of course told me it was closed. He read the case notes to me, the previous L1 had logged it as a network issue and closed the case. If I had been copying files over the network and not to another local array I can see why it had been logged that way. I asked to open a new case and to speak to a manager. I was then told the manager was in a meeting. No problem, I’ll stay on the line. After 45 minutes I was disconnected. Not one to be deterred, I called back again. The L1 that answered was professional and understanding. Again, I was put on hold while I waited for the manager to come out of his meeting. About 10 minutes later I was talking to him. He apologized and told me my issues would be addressed.
I now had a new case number and a new L1. Again, we dumped the diagnostic logs and started from the beginning. This time he saw things that weren’t right. There was a new firmware for the hard drives, a new driver for the P800, and a drive that was showing some errors. Finally, I felt like I was getting somewhere! At this point it has been ten days since I opened the previous case. We did another round of updates. A new drive was dispatched and installed. The L1 did call back and actually managed to ether talk to me or leave a message. When nothing had made any improvement he went silent. I added another note to the case requesting escalation.
That was eight days ago. At this point I have sent seven sets of diagnostic logs. Spent several hours on the phone. And worked after hours for several days. The last time I talked to my L1, the L2’s were refusing to accept the escalation. It was clearly a performance problem and they don’t cover that. The problem is, I agree. Through this whole process I have begged for additional documentation on configuration and setup options, something that would help me configure the array for maximum performance.
They do offer a higher level of support that covers performance issues, for a fee of course. This isn’t a cluster or a SAN. It is a basic setup in every way. The GUI walks you through the setup, click, click, click, monster RAID 10 array done. What would this next level of paid support tell me?
My last hope is CDW will be able to come through with documentation or someone I can talk to. They have been very understanding and responsive through this whole ordeal.
Thirty one days later, I’ve still got the same issue. I now have ordered enough drives to fill up the MSA60. The plan is to transfer enough data to free up one of the MSA70’s. Through trial and error, I will figure out what the optimum configuration is. Once I do I’ll post up my findings here.
If any of you out there in internet-land have any suggestions I’m all ears.
I’ll be on the Twitter Bingo card along with a lot of other great SQL Server folks at Pass! Quest Software/SQLServerPedia are sponsoring this event. This is an opportunity for you to meet lots of folks and expand your circle of SQL friends. It’s also a great opportunity for me to meet you! I spend quit a bit of time at the PASS conference meeting new people and making friends. The rules are here. They will have some cards at the Quest booth, if you need one just let me know I’ll make sure you get in the game. I’ll be updating my location via twitter through the day. To be honest though you can usually hear me from several blocks away, so you shouldn’t have any problems finding me!
This isn’t exactly related to SQL Server. I did effect my productivity. I am a monitor junkie. I love multi-monitor setups and have had them for a very long time. Today, having two monitors is trivial. Almost every video card on the market supports two displays. If you want more than that you have to ether step up to a professional graphics card, which can be very expensive, or add another video card to your system. On my rig at home two video cards isn’t a problem I’ve got multiple PCIe slots and run two cards for a four monitor setup. My workstation at work only has one PCIe slot and a PCI slot. Getting a modern card using PCI is harder and harder.
I did some digging and found a card that wouldn’t break the bank and should work with my primary card without having two sets of drivers so everything looked good. I slotted the card in and powered up the computer. Not only did I not have anything on my third display, my network card dropped out too! So, I tried several other older PCI cards to no avail. I knew there were other adaptors out that connected a display via USB. Last time I had checked they were very expensive and didn’t work all that well. I made a trip down to my local Fry’s store. There I saw something truly amazing. Not only did they have USB to VGA they had USB to DVI, and they were under 100 bucks! I settled on the UV Plus+ 16 since it would do 1650x1080 with 32 bit color. The price was right at 60 bucks.
The UV Plus+ 16 is based on the Display Link DL-160 chip. Several other manufacturers also have adaptors based on ether the DL-160 or DL-120 and should be similar in performance and installation. The DL-120 doesn’t support the higher resolutions that I needed to run my 22” monitors.
Out of the box you get everything you need.
This thing is small! I was also expecting something that wasn’t in the package, a power supply. You know, one of those lovely wall warts that take up way to much space. To my surprise, the UV Plus gets its power directly from the USB port. They include a carry case for the adaptor but not the USB cable or the DVI to VGA adaptor. To me this makes the case useless. The unit is sturdy and shouldn’t have any issues floating around in my laptop bag when I travel. The name, why Plus+? One plus not good enough? It is going to be hard selling these on eBay. “I give the Plus+ an A+++++.”
Installing the adaptor is simple.
That’s it.
Once installed changing the options is just like working with any other monitor attached to your system.
If you don’t want to go that route you can always use the try icon control application that comes with it.
I actually moved the UV Plus to power a monitor I have that rotates from portrait to landscape since the tray utility handles it so well.
There are other benefits to running these adaptors. You can have six, yes six attached to a single system. Unless you are running an Nvidia card then you are limited to four adaptors. This is going to allow me to move up to six displays and remove the second video card in my primary workstation at home. With the free PCIe slot I’m putting in a hardware RAID controller, you can never have enough IO!
Has been pretty uneventful, generally it just works. I did have an issue with opening a PDF from inside adobe reader in full screen mode. The display became corrupted with blocks and color bands. I just used the tray tool to turn off the display then back on and it was fine again.
The UV Plus+ 16 is EXACTLY what I needed. It is cheap, easy to use, and small. All of these things add up to a solid little device with lots of possibilities. If you need a multi-monitor setup for work this is a great solution for you. If you need a multi-monitor setup for gaming then you need to go with the right PCIe video card setup.
In my last post The Basics of Spinning Disks we covered hard drives, the back bone of our IO system.
This installment we will cover what connects the controller to the computer.
Disk controllers use a system bus to talk to your CPU and memory. It also determines the maximum speed your disk can talk to the computer. There may be as many as six different system busses in your computer. We are only interested in the ones that directly connect your disk controllers.
The oldest bus still in general use is PCI. You can still find them in your desktop and in servers though it is really on the way out. We are only covering PCI 2.0 32 bits wide running at 33 MHz. This allows for a theoretical top speed of 133.33 MB/Sec. In reality after overhead and other limitations you end up around 86 MB/Sec throughput. A single modern disk can achieve this speed. You generally don’t see PCI disk controllers with more than 4 ports. Adding more disk controllers to a system may not yield a direct increase in performance. Even if you have multiple PCI slots, they may only actually run through a single PCI bus. Limiting your bandwidth to the system to 133.33 MB/Sec.
Credit:Jonathan Zander
IBM, HP, and Compaq came together to standardize a faster bus for servers, specifically for disk controllers and network interface cards. PCI-X build on the PCI standard and was backwards compatible. It extended the PCI bus to 64 bits wide and a speed of 66 MHz in its initial launch. We go from 133.33 MB/Sec to 533.3 MB/Sec, a 4x improvement. The next generation brought us two more implementations, PCI-X 64 bit/100 MHz and PCI-X 64 bit/133 MHz at 800 MB/Sec and 1067 MHz respectively. This was a major step up but had several flaws. The physical size of the connector was huge. It also carried over the shortcomings of PCI. Signal noise across slots, errors could be caused by having several cards next to each other. Communication was half-duplex bidirectional, It couldn’t send and receive data at the same time. You are only as fast as the slowest card on the bus, If you had a 66 MHz card your 133 MHz card was reduced to match.
Yikes!
Credit: Snickerdo
We have moved on to a completely new standard, PCI Express (PCIe). Some people confuse PCI-X with PCIe, but they are completely different. The new PCIe standard was introduced in 2004 and was quickly adopted in main stream computers for video cards, but it is a general system bus. There are several key differences between PCIe and the buses that came before it. It is a fully serial and bidirectional bus, you can have multiple cards at multiple speeds reading and writing data at the same time. It also introduced the concept of lanes. PCIe card will use between 1 to 16 lanes. Each lane in the 1.0 specification was rated at 250 MB/sec. The 2.0 specification introduced in 2007 doubled that to 500 MB/Sec. In 2011 the 3.0 Specification will double that again to 1 GB/Sec. PCIe is also rated by how many transfers a second it can handle. Measuring transfers in Gigatransfers or Megatransfers has been around for a while, though not commonly used. See Gigatransfers at Wikipedia for a better explanation. One thing to be aware of the 1.0 and 2.0 standard loose speed due to the way data is encoded on the bus. Like the PCI bus, the 250 MB/Sec is a maximum you won’t see in the real world. You will loose about 20%. The 3.0 specification reduces that to around 1.5%. The most common sizing of PCIe slots is 1x, 4x, 8x, and 16x. It is also downwards compatible so a 1x, 4x, and 8x card will all work in a 16x slot. Just because a card is physically a 16x it may be a 8x or slower internally. That applies for the slot as well, it may be a 16x slot but only operate at 8x speeds.
PCI Express slots (from top to bottom: x4, x16, x1 and x16), compared to a traditional 32-bit PCI slot (bottom).
So, what does all this mean? If you have an older server, don’t use the PCI slots. Be careful with the PCI-X cards and placement. If you have PCIe you need to know if it is a 1.0 or 2.0 capable and what speed the physical connectors actually operate at.
Now that we have a basic understanding of how things connect to the computer we will cover disk controllers in the next installment!
I’ve been blogging for a few months. During that time I’ve received comments about my grammar, sentence construction and paragraph construction. I have taken this criticism to heart. I have only had two specific comments about my writing style so I have a point to start from. Like most of you, I do not write for a living. I do write every day of my life. I have had basic writing instruction during my high school an college days, but nothing during my time as a professional. The most training I’ve ever had communicating to others has been in the form of speech and theater. I have had some success as a speaker, and I feel very good about that end of my skill set.
So, I am attempting to correct this lack of education the only way I know how. I did some research and came up with three books to start my literary education.
The great thing about a blog is it isn’t as static as a printed book. As I learn I can make revisions to the articles. I can make revisions to correct the broken bits and make them easier to read.
If you have any recommendations leave them in the comments section and I’ll give it a read. I’ll also try to learn something in the process!
UPDATE:
A couple of friends that write also suggested
All three are on the way!
I’ve toyed with the CLR in SQL Sever 2005 off and on since the first Yukon beta had it enabled. And I’ll be honest with you, I was not a fan.It wasn’t like “YOU got chocolate in my peanut butter!” kind of moment for me. I really thought it was going to be a disaster of biblical proportions. As SQL Server DBA’s we caught a break, adoption wasn’t exactly stellar. The problem was there are enough restrictions and little gotchas to keep developers from whole sale abandoning Transact SQL for something more familiar. Fast forward a few years and now I’m not so scared.My biggest worry back then was memory usage. I’m still not very comfortable with it, but on a 64-bit platform you can mitigate those issues by adding more memory. On a 32-bit platform you could cause all kinds of damage by squeezing the lower 4GB memory space to the point you could have connection and backup failures due to lack of memory. Oh and the fix is usually restarting SQL Server. An example of this comes directly from http://msdn.microsoft.com/en-us/library/ms131075.aspx
Scalable Memory Usage
In order for managed garbage collection to perform and scale well in SQL Server, avoid large, single allocation. Allocations greater than 88 kilobytes (KB) in size will be placed on the Large Object Heap, which will cause garbage collection to perform and scale much worse than many smaller allocations. For example, if you need to allocate a large multi-dimensional array, it is better to allocate a jagged (scattered) array.
88KB!
This memory thing is serious.
The other biggie is what you can, or cannot do using the CLR.
Again from MSDN http://msdn.microsoft.com/en-us/library/ms131047.aspx
SAFE is the most reliable and secure mode with associated restrictions in terms of the allowed programming model. SAFE assemblies are given enough permission to run, perform computations, and have access to the local database. SAFE assemblies need to be verifiably type safe and are not allowed to call unmanaged code.
UNSAFE is for highly trusted code that can only be created by database administrators. This trusted code has no code access security restrictions, and it can call unmanaged (native) code.
EXTERNAL_ACCESS provides an intermediate security option, allowing code to access resources external to the database but still having the reliability guarantees of SAFE.
Most restrictive to least restrictive permissions. Something you don’t worry about in general as a C# programmer but in the database its always an issue in some way.
What it boils down to:
If you are just talking to SQL Server using basic C# stuff leave it in SAFE which is the default.
If you need access to the file system or the registry and some other limited stuff EXTERNAL_ACCESS is the way to go.
IF you want to have the ability to completely tank a production SQL Server UNSAFE puts it all into your hands. You can call unmanaged code via P/Invoke, all bets are off.
Some additional light reading on what libraries can and can’t be called in the CLR.
http://msdn.microsoft.com/en-us/library/ms403273.aspx
Fun stuff, no Finalizers or static fields, read-only static fields are ok though. You will see why this is important to me a little later on.
http://msdn.microsoft.com/en-us/library/ms403279.aspx
T-SQL vs. CLR
The other thing I had been promoting, and not always correctly, is putting complicated math functions in CLR. Generally, I’ve found that most math problems run faster in the CLR over native T-SQL. And I’ve found for the most part that holds true for the core algorithm. Once you add data retrieval into the mix things shift back in T-SQL’s favor for a lot of operations. Like everything else, test your ideas using real world scenarios or as close as you can before deciding on one technology over another. I prime example for me was coding up Pythagorean and Haversine equations for the classic distance between two zip codes in T-SQL and C# via CLR. Running test data through an array in the C# solution it ran rings around the T-SQL function I had coded up but once it had to start pulling and pushing data back to the database the T-SQL solution was the clear winner.
Another aspect where the CLR can be much better is string manipulation. I’ve written a couple of small UDF’s to handle some of this since using the LIKE ‘%’ would cause a table scan anyway the CLR UDF was faster internally when dealing with the string than T-SQL was using all the string handling functions.
I’m also seeing quite a bit on using the CLR for rolling aggregates and other kinds of aggregation problems. I don’t have any personal experience in that yet with the CLR.
There are also some things that aren’t practical at all using T-SQL, some would say you shouldn’t be using the database for some of this stuff in the first place but that is an argument for a different post.
I’ve recently started working on my most complex project using the CLR, some aspects have been covered by other folks like Adam Machanic, Robin Dewson and Jonathan Kehayias but there was some specific requirements that I needed.
Thus was born….
SQL Server File System Tools
This is a codeplex hosted project and all the source code is available there for your viewing pleasure.
I’ve done a lot of C# stuff but this was my first hard core CLR app for SQL Server.
What the assembly does is pretty simple, store files in the database ether native, encrypted or compressed.Yoel Martinez wrote up a nice UDF that does blob compression using the CLR. Between this and examples in Pro SQL Server 2005 on storing files in the database I knew I could do what I needed to do.
The wrinkle in my project was not just reading the file and storing it compressed it was putting it back on disk compressed as well. Enter #ziplib (SharpZipLib). This library allows you to pretty easily create standard zip files that even Windows Explorer can open and extract from. So with all the bits in place I set out to build my little tool.
The first thing I did was put together all the samples I’d found build them up as a set of stored procedures instead of UDF’s and just got the file in and out working. Next I added compression via C#’s DeflateStream to see what it would take to get the data flowing in and out and what the performance hit in memory and time would start looking like. At this point I was pretty optimistic I could knock this thing out in a day or two tops. That was all fine and dandy until I started integrating the #ziplib library. My initial goal was to have the assembly set to EXTERNAL_ACCESS since that was the most restrictive security model.
Since the guys that wrote #ziplib didn’t have the CLR in mind there are several things that break without UNSAFE set. As I mentioned earlier the use of finalizers and static fields were the two big ones. I will at some point recode those parts but for now they are still in place. The second thing is the library covers a lot more functionality that I actually need, So I’ve removed the bits I can without refactoring the library. The resulting DLL isn’t horribly big at this point but I figure when I get around to coding up the finalizers I’ll refactor down to what I need then. One big plus for me though is #ziplib is all managed code written in C# so it is pretty easily added directly into my DLL so I don’t have to register two assemblies or call down to the file system to a unmanaged DLL. Compression is handled by RijndaelManaged which is a built in .net 2.0 libraries.
The big downer for me was trying to debug the the code in Visual Studio 2008, when it did work it was ok but It would fail to connect or fail to register the assemblies so I just fell back to injecting debug messages and running tests manually in SSMS.
One thing I really like about programming languages like C# is method overloading, I really wished you could do that with stored procedures! Since I can’t there were only two options, a stored proc that had lots of flags and variables that may or may not be used and handle it all under the covers or just build each option into a proc with simple callers and a descriptive name. I voted for option two. Some of the T-SQL procedures are used internally by the CLR procedures while all the CLR procedures are called by the user.
Here is the list procedures and what they do.
Called by CLR procedures as helpers InsertFile Is called by every proc that inserts a file into the database. SavePassphrase Called by StorePassPhrase to handle insert into database. RetrievePassPhrase Called by any proc that has to decrypt a file stream RetrieveFile Called by any proc that retrieves a file from the database
Called by Users
T-SQL Procedures
RetrieveFileDetails Get details on a single file or every file stored in the database.
CLR Procedures
StorePassPhrase Give it a password and it generates a secure hash stored into the database for symmetric encryption
Below all store a file from the file system into the database. StoreFile StoreFileEncrypted StoreFileCompressed StoreFileEncryptedCompressed
Below all retrieve a file from the database back to the file system. ExtractFileToDisk ExtractFileToDiskEncrypted ExtractFileToDiskCompressed ExtractFileToDiskEncryptedCompressed
Below all retrieve a file from the database but returns a record set with the file name and the BLOB data. ExtractFileToRecord ExtractFileToRecordEncrypted ExtractFileToRecordCompressed ExtractFileToRecordEncryptedCompressed
And lastly, I put in an xp_getfiledetails clone since I wanted a way to verify the file is on disk and get attributes it seemed pretty straight forward since I’m getting the same details when i write the file to the database anyway.
This project isn’t done yet. there are a few more things to be added other than the code cleanup I mentioned already.
Off line decryption tool so the files dumped to disk still encrypted can be worked with.
Additional stored procedures for searching for files by tag or by attributes like name, size, etc.
A real installer and not a zip file with T-SQL scripts.
After that it goes into maintained mode with no new features but work on speeding it up, reducing the memory impact and fixing any bugs that are found. I really want to avoid this growing into a huge library, Keep it simple, do one thing and do it well.
Here are some things that helped me along the way.
Visual Studio 2008 could have used notepad but hey I’m getting lazy in my old age.
JetBrains ReSharper 4.5 If you are using Visual Studio ReSharper is a must. I feel like I’m programming the the stone age without it.
GhostDoc Free tool to help you document your C# code using XMLDoc. Yet something else I wished I could do with stored procedures
Doxygen If you are building documentation and have XMLDoc in your code this can make it easier to gather it all together. It isn’t perfect but it is free.
Both solid text Pro SQL Server 2005 has a chapter on CLR
This one is dedicated to just CLR and was also invaluable to me.
http://www.sqlclr.net Devoted to the CLR with some nice resources.
My Twitter buddies are always there to answer a question or two!
Until next time!
-Wes
I’ve been reading through this book and it really does have something for everyone at any level.
I think it is truly awesome that this many people have stepped up and donated not only their time but their knowledge to help http://www.WarChild.org!
I’ll have a more in-depth review later this week when I’ve read it cover to cover.
From Paul Nielson’s blog:
http://sqlblog.com/blogs/paul_nielsen/archive/2009/09/29/53-mvps-warchild-org-and-sqlservermvpdeepdives-com.aspx
GO GET IT! :)
-wes
Your servers are only as fast as the slowest part, hard drives.To feed other parts of the system we have to add lots of drives to get the desired IO single server can consume.
The basics of how hard drives work has been fundamentally static since the 70’s only refinements in technique and the core technologies have improved. You have a shaft or “spindle” attached to a motor. Disks or "platters” are attached to the spindle. The motor spins the spindle and the platters. Read/write heads controlled by actuator motors move across the surface with very precise motion and access the information stored on the platters. Generally, there is one read/write head per platter surface that is useable.
Simple.
This configuration has worked so well for the last 45 years that every claim to date that X new technology would unseat it just hasn’t happened. That’s not to say it won’t happen, just that hard drives have been “good enough” for the bulk of our storage needs for a very, very long time.
Since this is the core of our permanent storage in our database world it is important to have a basic understanding of them.
Description Six hard disk drives with cases opened showing platters and heads; 8, 5.25, 3.5, 2.5, 1.8, and 1 inch disk diameters are represented. Date 1 March 2008(2008-03-01) Author Paul R. Potts
http://commons.wikimedia.org/wiki/File:SixHardDriveFormFactors.jpg
I love this picture. Smaller and faster yet still the same.
To give you an idea of what you are up against lets compare the growth rate of your hard drive VS. your CPU.
Our 1981 machine has a the veteran Seagate ST-412 and a Intel 8088. Our new computer has a Seagate Cheetah 15k.6 ST3146356SS and a Core i7 965 from Intel of course.
At first glance we can say WOW what an improvement! Right up until you see how far the processors have come.Everyone is familiar with Moore’s law (Often quoted, rarely understood) loosely applied says CPU transistor counts double roughly every 18 to 24 months.Up until recently, hard drive capacity has been growing almost at the same rate doubling in size around every 18 months (Kryder’s Law).
Hard disks haven’t come close to keeping up with that pace, performance wise. Again, the problem isn’t size is speed.
“You cannot change the laws of physics” – Scotty
As I stated in the previous section hard drives have remained relatively unchanged since the IBM Winchester drive. Lets take a closer look at the physical structure.
So, we have a spindle one or more platters and one or more read/write head, all of that spinning and jittering about at a pretty good clip. So, just how does the computer know where your data is? The platter is broken up into a map of sorts.
Simplistic view:
The platter is broken up into concentric rings and pie slices that allow the drive controller to find the region where the data is.
The heads all move in unison and present a view through the platters that make up a cylinder. I won’t go into great detail on how we have advanced sector and track layouts and the advent of Logical Block Addressing there are plenty of articles on the web that get into those nuts and bolts. What I’m after is to show you physically what has to happen to read the data from the disk and why that is the limiting factor.
With the disk spinning at 15,000 RPM the sectors are flying by pretty quickly so the head has to be positioned above the sector and then read or write to it as the platter moves underneath it. The spinning disk, moving the heads and waiting for the data to be read all add up to latency.
Rotational latency is how long it takes the sector we are after to move under the head to be read or written to. Average rotational latency is expressed as half the time it takes for the platter to make one revolution.
For our 15k hard drive that number is 2 milliseconds, 60 seconds divided by 15000 RPM divided by 2.
Seek Time is how quickly the disk head can be positioned over a sector to start reading data.
There are to kinds of seek we are interested in, average random seek time and sequential or track-to-track seek times.
In our top of the line Seagate Cheetah our random read seek time is 3.4ms that is the time it takes to get from any one sector to any other sector, usually half the distance from the inner track to the outer most track. Random write seek time is 3.9ms. It is longer due to the process of actually effecting the sector its at before moving on to the next random sector.
sequential is much much faster. If the head only has to move to the next track it can usually do so in under a millisecond.
All this adds up to an average access time. basically, you take the rotational latency plus the average random seek time and any command processing time overhead I usually throw in an additional millisecond. Our Cheetah has a random access time of 6.4ms. Sometimes it may be much faster sometimes it may be much slower but this is a good number to work with as far as planning our storage needs.
The flip side of operations per second is throughput usually expressed in megabytes a second.
This is a direct correlation to the amount of data that can be squeezed into a sector. As drive densities go up so does the average megabytes per second. There is something you should know, the inner tracks are slower on throughput but higher on IO’s and the outer tracks are higher on through put and lower on IO’s. This is just a function of the diameter of the platter getting larger the farther out you go.
It isn’t unusual to see sequential throughput average of around 110 MB/sec and that is only getting better.
Random throughput is not so rosy a picture. I haven’t seen any drive manufactures advertise these numbers from my own testing it can be as little as 15MB/sec up to 40MB/sec. You should test your system to get more accurate numbers.
This boils down to how many I/O operations a single disk can give us. In SQL Server land random IO is king and generally one of the biggest bottlenecks on our data files.For log files, things are a little better. Since logs are written to sequentially you can effectively double the available I/O’s a drive can provide since you have cut our the random access and are much closer to the sequential or track-to-track access.
To calculate the maximum number of random operations we use 1000ms / (seek time[ms] + latency[ms]+overhead[ms])= input/output operations per second.
or
1000/(3.4+2+1) = 155 IOps
Sequential reads get much better since seek times go down from 3.4 to around 0.2.
1000(.7+2+1) = 270 IOps
Almost twice as much! Now you know why we keep our database log files separate from each other and from the data. The amount of disks needed to get the performance is about half. We do the same thing for writes and they will be a little less.
Hard drives suffer from what is known as the “hockey stick” effect the closer they get to 100% utilization the performance falls off dramatically.
Since running a disk at 100% capacity for IO’s introduces the maximum possible latency. The knee of the curve is around 80% we back that off a little more to 75% and that gives us the number of IO’s we have available per hard drive in the storage system in general. This reduces Queuing and keeps latency low, at the cost of maximum number of IO’s.
Now our available read IO’s is down to about 117 IOps for random access and 216 IOps for sequential.
This number will get better as seek times get better and the command overhead gets better. But remember it will never ever be better than the 2.0ms for the rotational latency.
Physics can be a real bummer sometimes.
Along with physical spindle speed there have been large improvements with how the drive handles incoming and outgoing request. Through IO Prioritization and advanced command queuing algorithms (Native Command Queuing on SAS/SATA) access times and latencies are kept predictable and as fast as possible.
Now that you are armed with the fundamentals of how a hard drive works we can move on to disk controllers and RAID.
Stay tuned!
At least once a year I give a large talk on disk subsystems, IO and SQL Server. It’s a ground up from the nuts and bolts of how a hard drive works through SAN’s and Solid State Disks.
The reasons I give this presentation so often is it is one of the most requested topics and one of the most misunderstood.
The problem often lies in the fact the DBA may not know that much about different storage systems but they do know that it is very important do their jobs.
With the rise of SAN, iSCSI and other storage solutions DBA’s have less and less control over the disk system that their SQL Server relies on. It’s my goal to give them, or you, the tools they need to effectively present their needs to the storage teams hopefully without a major amount of fuss and arguments. If you know how and why it works they way it works you can make logical requests in the language that your storage folks understand.
To that end, the presentation is meant to lay the foundation that can then be built upon and expand your knowledge off all things I/O.
This article series will be slightly expanded over what my presentation normally covers, since I’m only restricted by your willingness to read what I write. It will still be a condensed version of storage systems but I’ll put up as many reference links as I can.
This series will cover:
The basics of spinning disks. Disk Controllers. RAID. Storage Area Networks Network Attached Storage/iSCSI Solid State Disks. SQL Server and The File System. Understanding Mean Time to Failure and Other Failure Metrics. Tools and Techniques To Monitor SQL Server and I/O.
Some topics may be a single post some may span several I won’t know for sure until I get done writing them. As request come in I may try to post on specific questions, or at a minimum point you in the right direction.
Stay Tuned….
Hello all!
We will be having our normal meeting at the Microsoft technology center sponsored by:
http://www.cactuss.org/Supporters/tabid/59/Default.aspx
Microsoft http://www.cactuss.org/MeetingInformation/tabid/63/Default.aspx Stonebridge Plaza, Building One 9606 N. Mopac Expressway, Suite 200 Austin, TX 78759
5:30PM to 6:00PM – Q&A and DBA 101 presentation 6:10PM to 7:00PM – Feature presentation 7:00PM 7:30PM – Q&A follow up and discuss next meeting.
The feature presentation will be given by Bjorn Hovd, Systems Engineer, Idera over Auditing Considerations: Native SQL 2008 and Third Party Auditing Technologies. As usual we will have some goodies to give away and food for everyone!
See you there!
I’ve known Joe for a number of years and have a lot of respect for his experience and knowledge around relational database design and the SQL language.
Joe is a prolific writer and has been writing about technology since I was in grade school with articles going back into the early 80’s. Not to mention his ten years serving on the ANSI board for SQL standards.
I would put Joe’s books between the purely academic text on relational and set based theory and the more popular books out today that don’t always cover some of the dryer materials that newcomers to SQL may find to hard to digest at the start of their career.
If you have been working with SQL for a while and want to take the next step Joe’s books are generally the way to go.
Joe’s style is humorous at times and completely unflinching at others. When it comes to things that he thinks is the right way to model and develop using the SQL language, and that is against what popular or easy methods call for, he makes his feelings known on the subject.
I hope you enjoy these books as much as I have, over the years I have relied on Joe and others like him to build my own knowledge base and skills. I also have tried to share that knowledge like Joe has for so many years to others who want it.