SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQL Man of Mystery

Add to Technorati Favorites Add to Google
More Posts Next page »
All Posts

Out Of My Comfort Zone: Building a Web App

By Wesley Brown in SQL Man of Mystery 01-19-2010 4:12 PM | Categories: Filed under: , , , , , ,
Rating: (not yet rated) Rate this |  Discuss | 916 Reads | 916 Reads in Last 30 Days |2 comment(s)

If you read Fundamentals of Storage Systems – Stripe Size, Block Size, and IO Patterns you know I built a little web tool to help you with sizing and estimating your RAID array’s performance. This is way out of my area of expertise. Luckily for me I like a challenge and had a guiding hand from some friends. I haven’t done any web programming since I wrote a photo album mod for Snitz! forum package in 2002, using classic ASP. I still get thank you emails from folks that have been running it for years. Needless to say my skills are a little rusty. My first instinct was to fire up Visual Studio 2008 and build an ASP.net page tied to a back end SQL Server 2008 database. Well, ASP.net is just different enough that I was struggling to do the most basic things. So, I thought it would be good to “get back to basics”. A very good friend of mine has been a professional web developer for the same company since about the time my photo album came out. He lives and breathes web technology. The problem was he doesn’t do ASP.net at all. Everything he does is standards compliant HTML and JavaScript. I told him about my spread sheet of calculations and my desire to turn it into a web page. Joe quickly begged me to leave him alone. Once I calmed him down, he did a little sample page to get me on the right track. I took a look at it and thought it would be easy to goof around in this JavaScript stuff. I was wrong. What a convoluted world web developers live in.

I hacked on it the weekend before Christmas and was pretty happy with my handy work, right up until I showed it to some people. Comments like “Kill it with fire!” were common. I explained what it was and that took some of the hostility out of the feedback. The next Monday I asked the two lead web heads at work to look at it. Once Ben had washed his eyes out with bleach he told me under no certain terms could he fix it. My powers of persuasion, and a threat to never approve another schema change, helped bring him around. He explained several new technologies to me I wasn’t aware off, like CSS. Ben showed me quickly how to format the page so peoples heads wouldn’t burst into flames when they saw it. He also spoke of things like Ajax and JSON. Being the clueless data guy that I am I turned to my trusty friend yet again, Google.

Quick Definition List

JavaScript, the underpinning of the modern dynamic web 2.0 world.

Ajax (asynchronous JavaScript and XML, a group of interrelated web technologies that allow for dynamic web design.

JSON (JavaScript Notation), a lightweight data-interchange format, a way to structure data like XML implementing name/value pairs and ordered lists.

JQuery, a JavaScript library used to traverse HTML elements and build dynamic content easily.

JavaScript Implementation

What struck me was just how much had changed, yet stayed the same. One of the things I’ve always hated is no two browsers render the page the same way. If it worked in Firefox it didn’t work in IE. The other thing I didn’t realize for quite a while was any of the browsers would happily run the worlds worst code. I’m not talking poorly written code, I mean flat wrong. They will let any old JavaScript run. JavaScript is defined as a loosely typed language. The way it is implemented it goes from loosely to sloppy very quickly. Don’t get me wrong here. The language specification EMCAScript, is solid. The implementations are very poor in most browsers and that is where the problems developing for JavaScript really come in. I’m not saying JavaScript is slow, quite the opposite in fact. It is just difficult to make sure you are actually writing the correct code. Having worked with Visual Studio for quite some time, I’ve become spoiled with having my IDE tell me when the syntax is wrong before I hit the compile button. So, I was back to Google again looking for tools to help me out.

The Tools

I quickly settled on Aptana Studio 2.0 to do the actual code work. I also used Notepad ++, which is by far the best text editor in the world. with Aptana Studio allowing me to see errors in real time and use the preview function to look at page renders without having to have browser windows open. I did look at Microsoft’s new Expression Suite 3.0. It is nice. The designer is slick, but in typical Microsoft fashion it mangles some parts of the code that I then have to fix by hand. Aptana Studio also has a built in JavaScript minimizer that shrinks the size of your JavaScript files for quicker load times. Aptana Studio doesn’t enforce the use of line terminators and when I compacted my JavaScript it wouldn’t load. I did some additional digging and found an awesome Lint tool for validating JavaScript JSLint. My favorite thing about it is the quote and link on the front page “Warning: JSLint will hurt your feelings.” I knew I had found the right tool. After several passes I got my code to validate and minimize without any errors. I quickly looked for other Lint based tools and found one for JSON as well JSONLint.

The Work

Over the course of a few weeks I tinkered with the page, adding features, correcting mistakes and cleaning up the layout. The biggest thing that I learned is just how fast JavaScript can be. Even after I did the initial page I had no doubt a database would be involved at some point. I had compiled a small database of hard drives and their characteristics and wanted to add that to the page. The problem was I would basically have to redo the page again in asp.net or some other dynamic language to pull the data from the database. So, I cheated. As a proof of concept I built a JSON structure with a sample of the data to load the drop down list box and all the variables dynamically. I was stunned at just how fast the page was to respond. I hacked together some T-SQL that would generate the JSON object from my database, all 1200 entries. I cleaned it up a bit made sure that JSONLint validated it ran the minimizer on it and plugged it in. Holy cow, it worked! It was also still very vast. I just couldn’t believe it. The only drawback is having to update the JSON if I add new drives to the database.

 

What I’ve Learned

The main takeaways for me on this project are simple, web development stinks. Now I know why web folks have a hard time with T-SQL, they already have to be experts in five or six different technologies across two or more platforms. I use to introduce Joe as “My friend who does JavaScript”, now I know is is a hard core developer! What I build is simple and works. I think the layout is tolerable. If you have any suggestions or feed back just drop me a note.


Fundamentals of Storage Systems – Stripe Size, Block Size, and IO Patterns

By Wesley Brown in SQL Man of Mystery 01-04-2010 9:00 AM | Categories: Filed under: , , , , , , ,
Rating: |  Discuss | 1,523 Reads | 1091 Reads in Last 30 Days |1 comment(s)

If you have been following this series we have covered system buses, hard disks, host bus adapters and RAID. Along the way we also covered how to capture your IO patterns and the SQLIO tool. Now we will pull it all together.We move up the stack even further to the actual layout of the RAID stripe and the file system. How the stripe and file system are laid out on your disks has a huge impact on performance. One of the things that has really gotten some traction over the last few years is sector alignment. This one thing, if not done, could cost you 30% to 40% of your IO potential. Jimmy May has covered sector alignment in depth So I won’t hash it here again. Kendal Van Dyke also has a good series that covers offset, stripe size, and allocation units with different raid levels.

It Don’t Add Up…

Something I’ve seen, and been guilty of, is taking a drives base specifications and just multiplying out. Say the manufacturer says the drive will to 79MB/Sec minimum throughput, we have 10 drives so that is 790MB/Sec of throughput! We all know from experience that this isn’t so. What eats us up is how much slower it really can be. As we have seen throughout this series there is overhead associated to everything. Before we just throw a bunch of disks in an enclosure and press it into service it would be nice to have an idea of what the performance should be. It’s also recommended to do some of this work before you actually buy anything so you don’t have to go back to your boss and beg for more money and explain to him that your wild guess was wrong.

Always add a pinch of salt to whatever the disk manufacturer puts in the specifications. Most of the time they will be close enough. The problem lies in the fact they don’t always disclose the methods for archiving those numbers. For instance, when they report minimum and maximum throughput they are usually talking about a scan of the entire disk including all meta data stored between tracks, the best possible throughput possible. You won’t see those results in every day life. They also give you numbers that can be completely irrelevant like single sector read rates. very rarely do you read a single sector at a time. Personally, I would love if the drive makers gave the engineering specifications. I know that won’t happen, it would make my life easier though. The disk characteristics that are important are, sector size,spindle speed, seek times read and write, sequential times read and write. To a lesser extent sequential throughput in megabytes per second. With the single disk numbers we can move on to the RAID configuration.

Configuring your RAID Array

There are several factors that impact the RAID arrays ability to perform. The RAID level, size of the IO request, and stripe size. RAID level is the easy one, what kind of hits do you take on writes vs. capacity of the array. On the stripe size there is a direct corollary with the size of the IO request. If the IO request is bigger than the stripe size it will have to seek across another disk to satisfy the data request. If the IO request size is very small and random you may loose some IO performance if the requests pile up on one disk causing a hot spot. There are established calculations that you can perform to get an idea of how to configure you array. I’ve built a web page that you can use to do all the basic calculations, Disk Drive RAID Configuration Tool. These equations are base line estimates so you aren’t working completely in the dark. You can enter your own drive statistics or pick from one of 1100 hard drives in the database. This web calculator is based off of Peter Chen’s equations for estimating RAID performance and best stripe size. I’ll add more to it as I get time.

SQL Server IO Patterns and Array Performance

SQL Server works with two specific IO request size 8K and 64K in general. If you did your due diligence earlier you could also add any other request size that you saw come through. Focusing on the page size and extent size is a good place to start. Using the raid calculator tool I chose a Seagate Savvio 15K.2 drive as my base. One of the things my calculator can’t take into consideration is your system and RAID HBA. This is where testing is essential. You will find there are anomalies in every card, physical limits on throughput and IO’s. Since my RAID card won’t do a stripe bigger than 256k that is my cap for size. Reading through several IO white papers on SQL Server the general recommendation is for 2000/2005 a 64k or 128k stripe size and for SQL Server 2008 a 256k stripe size. I’ve found as general guidance, this is a good place to start as well. The calculator tells me for a RAID 10 array with 24 drives at a 256k stripe size and 8k IO request I should get 9825 IOs/Sec and 76.75 MB/Sec on average, across reads, writes, sequential and random IO requests. That’s right, 76 MB/Sec throughput for 24 drives rated at 122 MB/sec minimum. That is 2.5 MB/Sec per drive. The same array at a 64k IO request size yields 8102 IOs/Sec and 506 MB/Sec. A huge difference in throughput just based on the IO request size. Still, not anywhere near 122 MB/Sec. As an estimate, I find that these numbers are “good enough” to start sizing my arrays. If I needed to figure out how big the array needs to be to support say 150 MB/sec throughput or 10000 IOs/Sec you can do that with the calculator as well. Armed with our estimates it’s time to actually test our new RAID arrays. I use SQLIO to do synthetic benchmarking before running any actual data loads.

After doing a round of testing I found that in some cases the numbers were a little high or a little low. Other factors that are hard to calculate are cache hit ratios. Enterprise RAID HBA’s usually disable the write cache on the local disk controller and just use their own batter backed cache for all write operations. This is safer but with more and more disks on a single controller the amount of cache per disk can get pretty low. The HBA will also want you to split that between read and write operations. On my HP RAID HBA’s the default is 25% read and 75% write. In an older study I found on disk caches and cache size saw diminishing returns above 2 MB gaining between 1 and 2 percent additional cache hits per megabyte of cache. I expect that to flatten out even more as the caches get larger, you simply can’t get 100% cache ratios that would mean the whole drive fit in the ram cache or your IO request are the same over and over. Generally if that is the case you will find SQL Server won’t have to go to disk it will have what it needs in the buffer pool for reads. I find that if you have less than 20 percent write activity leaving the defaults is fine. If I do have a write heavy load I will set the cache to 100% writes.

The Results

Having completed my benchmarking I found that 128k or 256k stripe size was fine on average. Just realize that if you optimize for one IO pattern the others will suffer. Latency is also important and I have included it here as well. You find that the larger the IO request and the smaller the stripe size latency gets worse. Here are the results from my tests on a DL380 G5 with a P411 and 24 drives in a MSA 70 enclosure. I’ve included tests for an 8k to 256k stripe sizes.

As a footnote I’d like to thank Joe Handley, Ben Poliakoff, David Gosslin and Dale Davis for helping me get the Disk Drive RAID Configuration Tool together. I’m not a web guy!

WARNING! Lots of charts below!

 

Read 8K IO Request 24 73GB 15K Drives RAID 10 64K File System Cluster Size 1 Outstanding IO's 8 Threads
Random Sequential
image image
image image
image image

 

Write 8K IO Request 24 73GB 15K Drives RAID 10 64K File System Cluster Size 1 Outstanding IO's 8 Threads
Random Sequential
image image
image image
image image

 

Read 64K IO Request 24 73GB 15K Drives RAID 10 64K File System Cluster Size 1 Outstanding IO's 8 Threads
Random Sequential
image image
image image
image image

 

Write 64K IO Request 24 73GB 15K Drives RAID 10 64K File System Cluster Size 1 Outstanding IO's 8 Threads
Random Sequential
image image
image image
image image

A Year In Review

By Wesley Brown in SQL Man of Mystery 01-02-2010 11:43 PM | Categories: Filed under: ,
Rating: (not yet rated) Rate this |  Discuss | 1,052 Reads | 659 Reads in Last 30 Days |no comments

It’s been a busy year for me. Last year I quit traveling and took a 8 to 5 job so I could stay at home with my new baby boy Theo and my wife Elizabeth, who is now a stay at home mom. This was a pretty big shift for me, I’d been on the road most of the last five years. After I settled into my new life, I started taking stock of what I had given up to be on the road so much. What I found was my role in the SQL Server community had suffered quite a bit. My personal company was also a casualty. I hadn’t done any writing at all. I decided that 2009 would be a rebuilding year. I set several goals, some met with varying degrees of success.

  1. Spend quality time with my family.
  2. Write more.
  3. Get healthier.
  4. Reengage the user community
  5. Build up a new company with some friends.

Most of the goals required a skill I am weak in, time management. I have tried to follow the GTD methodology. My big issue is not having the right tool to keep me in sync easily with my available time. Because of that I get on and off that band wagon every few months. I have structured my day so that I get time with the family and still have some time to do the other things on my list. On writing more, that meant starting a blog and sticking to it. I think I have done OK on that front averaging 4 posts a month. I also wanted to write at least one published article but fell short of that goal. Focusing on quality blog posts has allowed me to hone my writing skills, so when I do write a published piece it is of the highest quality I can muster. My health has been a life long struggle. I got my blood pressure under control, a CPAP machine to help me sleep, and lost some weight. All of these things have made me feel years younger. I still need to add regular workout besides walking and keep on my new eating plan. I had let the Austin user group languish in 2008. I put up a new web site. Made sure we had a quality meeting every month. Got some sponsorships so I could boost the give-a-ways. I also hang out on twitter quite a bit and really try to help folks out. I have given up on posting up on news groups or forums, there are a lot of good folks able to answer all of the questions I could and in a timely manor. Also, the mob rules mentality of large forums is really soured me in some ways.  Most of you know I am a part of Nitrosphere and that is gaining some traction. It is easier to do a start up when you aren’t the only guy shouldering the load.

All in all a good year.

2010 Goals

For 2010 it is pretty much more of the same. Build on 2009 and push it forward.

  1. Family time.
  2. Write a article or two for SSC or other web outlet.
  3. Submit to speak at SQL Saturday events.
  4. Organize a SQL Saturday event in Austin.
  5. Speak at non SQL Server user groups.
  6. Release second product with Nitrosphere, NitroFS.
  7. Be more active in “Stackoverflow” style sites.

Longer list and more specific. I’m sure it will change. I measure my success by two rules. Did I bring value to the world? Did it make me happy to do so?

Try not to become a man of success, but rather try to become a man of value.  Albert Einstein


Fundamentals of Storage Systems – RAID and Hard Disk Reliability, Under the Covers

By Wesley Brown in SQL Man of Mystery 12-08-2009 9:00 AM | Categories: Filed under: , , , ,
Rating: (not yet rated) Rate this |  Discuss | 2,309 Reads | 783 Reads in Last 30 Days |4 comment(s)

In the last RAID article we covered the basics. This is a little deeper dive into the underlying mechanics of RAID. Exactly what it does, how it does it and what it doesn’t do that people assume it does. I sited David Patterson, Garth Gibson, and Randy Kats and their work at UC Berkley on RAID. They show something I’ve talked about before the “Pending I/O Crises”. Of course it isn’t pending anymore, its here. One of the concerns has to do with Amdah’s Law and speeding up execution with parallel operations. As processors and memory speed up hard disks are still an order of magnitude slower. Another aspect is Kryder’s Law, which like Moore’s Law, is a estimation of capacity growth of hard disks over time. Kryder’s Law is starting to slow down just as Moore’s law is. The problem with hard drives has never really been capacity, its speed. As areal density increases you do get an increase in data throughput, there is simply more data per square inch on the disk. You also get an improvement in I/O’s, tracks are closer together.  We haven’t broken past the 15k barrier yet. I’ve still got Seagate Cheetah 15k.3 drive from 2002. It has a max sequential throughput around 80 MB/sec. I doubt we will see spinning disks faster than 15k. This is a real problem for scaling I/O up. Enter RAID. It’s simple get a bunch of disks and then stripe data across them. One little problem creeps up.  Reliability goes down for each drive you add to the array. Using RAID 0 pretty much guarantees you will have an array failure. To overcome this We start adding some way to make the data more redundant.

Hard Disk Reliability

People make a lot of assumptions about hard drives and their reliability. Hard disks break down into two classes consumer grade, the drive you have in your desktop and enterprise, the kind usually in your servers. There are misconceptions around both. Recently, Google and others have written papers based on long term large batch sample failure rates and found the enterprise class drives don’t last any longer than consumer class. This study is perfectly valid from a physical reliability point of view. Most drives are manufactured the same way in the same plants. Not like the poor misunderstood lemming, hard disks do all jump off a cliff together. Studies have shown that there is a strong corollary to disk failure and a shared manufacturing batch. Simply put, if they are made around the same time if one has a failure there is a likelihood, around 30%, other drives in that batch will also suffer failures. So, what are we paying for with an enterprise drive besides speed? Data reliability. Enterprise level drives have more robust error correction than their consumer counterparts. On a normal hard drive the smallest piece of data that can be written is 512 bytes. This is the size of a sector. Enterprise drives usually have 520 byte sector 8 bytes are used to verify the data in that sector, this is the Data Integrity Field. DIF isn’t 100% ether. It is more reliable than a consumer drive without it. You can still have write corruption for several reasons. Misdirected writes occur when data is written to the wrong location on disk and reported as a successful write. When the system goes to access again you get a read fault. Torn pages, which we are familiar with, is when an 8k page write is requested but only part of the 8k is actually reported. Corruption outside the drive where the controller makes a bad request to write but it is a perfectly legitimate I/O request at the hard drive level. With larger drives the odds of hitting one of these errors becomes a real possibility. Enterprise drives add this extra layer of protection. Your RAID HBA may also have additional error correction. The last thing I would like to touch on is write catching. Without a battery backup, or if the cache non-volatile in nature, you will loose data on a power failure if a write is in progress.

RAID Host Bus Adapter Reliability

The adapter is as reliable as any other component in your system. Normally, the cache on the controller is ECC based. Also, you usually have the option of a battery module to supply the cache with power incase of an outage so the data in cache can be written to the array when everything comes back up. Most of the issues I have seen with RAID HBAs is almost always driver or firmware related. You may also see inconsistent performance due to write catching and the battery backup unit. The unit has to be taken off line and conditioned to keep it in top condition. The side effect is a temporary disabling of the write cache on the controller. You can override this setting on some controllers but it is dangerous proposition. I personal anecdote from my days at a large computer manufacturer, we started getting a larger volume of failed drive calls into support. We started doing failure analysis. It all pointed back to a particular batch of hard drives. That was when the drive manufacturer made a change in its drives removing very small component. It shaved a few cents off the cost but had a dramatic effect. All the drives were technically good and would pass validation. Under a enough load and attached to a particular RAID HBA they would randomly fall off line. It came down to the little component. It provided a little bit of electrical noise suppression on the SCSI bus. Some cards were effected and others chugged along just fine. This is also confirmed by the Google paper, they observed the same behavior. They also point out that 20% to 30% of all returned drives have no detectible problems. The point is validate your entire I/O stack. Any single component may be within specification but may not play well with others.

RAID Parity, Mirroring, and Recoverability

Not to belabor the point, RAID isn’t bullet proof. People rap RAID round themselves like Superman’s cape. There are several issues that all the RAID schemes in the world don’t protect against. With current hard disks in the two terabyte range it is possible to build even a small RAID 5 array and have potential for complete failure. The problem is the amount of data that has to be read for the rebuild process. Having a hot spare available reduces the time to replace a failed drive to zero but that is only part of the equation. The much larger part is rebuild time. Lets say you have a 14 drive RAID 5 array with the new two terabyte drives installed and suffer a failure. If you have no activity on the array and all the IO is detected to the rebuilt it could still take two or three days to rebuild the array. During that time you are effectively running on a RAID 0 array that is now under load. Your chance of total array failure is near 100%. RAID by its very nature assumes a failure is a hard failure. A drive goes off line and the redundant part of the system takes over. It also makes the assumption that if a write succeeds then, barring a hardware failure, the read will also be valid. Data is only validated on writes not on reads. If it was RAID 5 would be twice as slow on reads and four times as slow on writes as a single drive or RAID 0. With all the potential hidden write failures it is completely possible to have hidden corruption and not know it until it is way to late. RAID levels with striped parity are most susceptible to this kind of silent creeping corruption. It is possible that the corrupted data is in the parity stripe making it completely unusable for data reconstruction. If that particular piece of data doesn’t change you can go a very long time with a RAID 5 array with polluted parity. You know how to recover from a polluted parity stripe? Simple, copy all the data off the array, figure out which files are now corrupt and restore them. RAID 6 with its dual stripes makes it more likely to recover your data from a single parity stripe becoming corrupt. You do pay a price in write speed for that extra level of protection. RAID 1 and RAID 10 aren’t perfect ether. On a mirrored pair if the write is assumed good there is no way to validate that on read. Without a third piece of information, like a checksum, it would be a coin toss. If the read is successful there is no way to tell which drive has the bad data. It is possible to have a mirrored pair run just fine with one giving you corrupted data on reads all day long. It would manifest itself as file corruption or some other anomaly that could be difficult to track down. We are back to relying on the disk to tell us all is well. We often recommend RAID 10 over everything else for speed and reliability, and I still hold to that. RAID 10 can still suffer from a catastrophic failure due to a single mirrored pair failing at the same time. With the probability of correlated disk failures it can’t be ignored.

What Can We Do?

There are a few tools available to us that can help predict the failure of a drive or that something is wrong with the array. All modern drives support the SMART protocol. Even though Google found it wasn’t as useful and wasn’t 100% reliable, closer to 30%, some warning is better than none in my opinion. All modern RAID HBA’s also come with tools to detect parity errors. You do take a hit when you run these internal consistency checks. Just like you run maintenance on your databases via DBCC your RAID arrays need checkups too. They are a necessary evil if you don’t want any surprises one day when you have a failed drive in your RAID 5 array and can’t rebuild it. If you have intermittent problems with a drive, don’t mess around, replace it. The HBA almost always has the ability to send SNMP messages to something like nagios or HP Openview, Use it. If you aren’t running something like that usually you can configure email alerts on error to go out. Proactive is the name of the game.

Don’t take my word for it….

Short list of papers to get you started on your path to paranoia.

Google Disk Failure analysis

Original RAID Paper

NetApp disk failure analysis

CERN data corruption tests

Silent Data Corruption in SATA arrays


Fundamentals of Storage Systems – RAID, An Introduction

By Wesley Brown in SQL Man of Mystery 12-07-2009 8:30 AM | Categories: Filed under: , , , , , , , , ,
Rating: |  Discuss | 6,214 Reads | 1377 Reads in Last 30 Days |17 comment(s)

In previous articles, we have covered the system bus, host bus adapters, and disk drives. Now we will move up the food chain at take a look at getting several disks to operate as one.

In 1988 David A. Patterson, Garth Gibson, and Randy H. Katz authored a seminal paper, A Case for Redundant Arrays of Inexpensive Disks (RAID). The main concept was to use off the shelf commodity hardware to provide better performance and reliability and a much lower price point than the current generation of storage. Even in 1988, we already knew that CPUs and memory were outpacing disk drives. To try to solve these issues Dr. Patterson and his team laid out the fundamentals of our modern RAID structures almost completely RAID levels 1 through 5 all directly come from this paper. There have been improvements in the error checking but the principals are the same. In 1993, Dr. Patterson along with his team released a paper covering RAID 6.

 

RAID Level

Description

Disk Requirements

Usable Disks

Diagram

RAID 0

RAID 0 is striping without parity. Technically, not a Redundant array of disks just an array of disks but lumped in since it uses some of the same technical aspects. Other hybrid raid solutions utilize RAID 0 to join other RAID arrays together. Each disk in the array holds data and no parity information. Without having to calculate parity, there are no penalties on reads or writes. This is the fastest of all the RAID configurations. It is also the most dangerous. One drive failure means you lose all your data. I don’t recommend using RAID 0 unless you are 100% sure losing all your data is completely OK.

2

N

 325px-RAID_0.svg

RAID 1

RAID 1 is mirroring two disks. RAID 1 writes and reads to both disks simultaneously. You can lose one disk and still operate. Some controllers allow you to read data from both disks; others return only data from the disk that delivers it first. Since there are no parity calculations, it is generally the easiest RAID level to implement. Duplexing is another form of RAID 1 where each disk has its own controller.

2

N/2

 325px-RAID_1.svg

RAID 5

RAID 5 is a striped array with distributed parity. This is similar to RAID 0 in that all data is striped across all available disks. Where it differs is one stripe holds parity information. If a drive fails, the data contained on that drive is recreated on the fly using the parity data from the other drives. More than one disk failure equals total data loss. The more drives you have in a RAID 5 array the greater the risk of having a second disk failure during the rebuild process from the first disk failure. The general recommendation at this time is 8 drives or less. In general, the larger the drive the fewer of them you should have in a RAID 5 configuration due to the rebuild time and the likely hood of a second drive failure.

3

N-1

 675px-RAID_5.svg

RAID 6

RAID 6 is a striped array with dual distributed parity. Like RAID 5 it is a distributed block system with two parity stripes instead of one. This allows you to sustain a loss of two drives dramatically reducing the risk of a total stripe failure during a rebuild operation. Also known as, P+Q redundancy using Reed-Solomon isn’t practical to implement in software due to the math intensive calculations that have to take place to write parity data to two different stripes. The current recommendation is to use 8 drives or more.

4

N-2

 800px-RAID_6.svg

RAID 10

RAID 10 is a hybrid or nested striping scheme combining RAID 1 mirrors with a RAID 0 stripe. This is for high performing and fault tolerant systems. Like RAID 1, you lose half your available space. You could lose N/2 drives and still have a functioning array. Duplexing each mirror between two drive chassis is common. You could lose a drive chassis and still function. The absence of parity means write speeds are high. Along with excellent redundancy, this is probably the best option for speed and redundancy.

4

N/2

 180px-RAID_10

RAID 0+1

RAID 0 + 1 is not interchangeable with RAID 10. There is one huge difference and that is reliability. You can lose only one drive and have a functioning array. With the more drives in a single RAID 0 stripe the greater the chance you take. Speed characteristics are identical to RAID 10. I have never implemented RAID 0 + 1 when RAID 10 was available.

4

N/2

 180px-RAID_0 1

RAID 50

Since RAID 5 becomes more susceptible to failure with more drives in the array keeping the RAID 5 stripe small, usually under 8 drives and then striping them with RAID 0 increases the reliability while allowing you to expand capacity. You will lose a drive per RAID 5 stripe but that is a lot less than loosing half of them in a RAID 10. Before RAID 6, this was used to get higher reliability in very large arrays of disks.

6

(N-1)*R

 320px-RAID_50

RAID 60

RAID 60 is the exact same concept as RAID 50. Generally, a RAID 6 array is much less susceptible to an array failure during a rebuild of a failed drive due to the nature of the dual striping that it uses. It still is not bullet proof though the RAID 6 array sizes can be much larger before hitting the probability of a dual drive failure and then a failure during rebuild than RAID 5. I do not see many RAID 60 configurations outside of SAN internal striping schemes. You do lose twice as many drives worth of capacity as you do in a RAID 50 array.

8

(N-2)*R

 400px-RAID_60

RAID 100

RAID 100 is RAID 10 with and additional RAID 0 stripe. Bridging multiple drive enclosures is the most common use of RAID 10. It also reduces the number of logical drives you have to maintain at the OS level.

8

N/2

 320px-RAID_100

Speed, Fault Tolerance, or Capacity?

You can’t have your cake and eat it too. In the past, it was hard to justify the cost of RAID 10 unless you really needed speed and fault tolerance. RAID 5 was the default because in most situations it was good enough. Offering near raid 0 read speeds. If you had a heavy write workload, you took a penalty due to the parity stripe. RAID 6 suffers from this even more so with two parity stripes to deal with. Today, with the cost of drives coming down and the capacity going up RAID 10 should be the default configuration for everything.

Here is a breakdown of how each RAID level handles reads and writes in order of performance.

 

RAID Level

Write Operations

Notes

Read Operations

Notes

RAID 0

1 operation

High throughput, low CPU utilization.
No data protection

1 operation

High throughput, low CPU utilization.

RAID 1

2 IOP’s

Only as fast as a single drive.

1 IOP

Two read schemes available. Read data from both drives, or data from the drive that returns it first. One is higher throughput the other is faster seek times.

RAID 5

4 IOP’s

Read-Modify-Write requires two reads and two writes per write request. Lower throughput higher CPU if the HBA doesn’t have a dedicated IO processor.

1 IOP

High throughput low CPU utilization normally, in a failed state performance falls dramatically due to parity calculation and any rebuild operations that are going on.

RAID 6

6 IOP’s

Read-Modify-Write requires three reads and three writes per write request. Do not use a software implementation if it is available.

1 IOP

High throughput low CPU utilization normally, in a failed state performance falls dramatically due to parity calculation and any rebuild operations that are going on.

 

Choosing your RAID level

This is not as easy as it should be. Between budgets, different storage types, and your requirements, any of the RAID levels could meet your needs. Let us work of off some base assumptions. Reliability is necessary, that rules out RAID 0 and probably RAID 0+1. Is the workload read or write intensive? A good rule of thumb is more than 10% reads go RAID 10. In addition, if write latency is a factor RAID 10 is the best choice. For read workloads, RAID 5 or RAID 6 will probably meet your needs just fine. One of the other things to take into consideration if you need lots of space RAID 5 or RAID 6 may meet your IO needs just through sheer number of disks. Take the number of disks divide by 4 for RAID 5 or 6 for RAID 6 then do your per disk IO calculations you may find that they do meet your IO requirements.

Separate IO types!

The type of IO, random or sequential, greatly affects your throughput. SQL Server has some fairly well documented IO information. One of the big ones folks overlook is keeping their log separate from their data files. I am not talking about all logs on one drive and all data on another, which buys you nothing. If you are going to do that you might as well put them all on one large volume and use every disk available. You are guaranteeing that all IO’s will be random. If you want to avoid this, you must separate your log files from data files AND each other! If the log file of a busy database is sharing with other log files, you reduce its IO throughput 3 fold and its data through put 10 to 20 fold.

 

RAID Reliability and Failures


Correlated Disk Failures


Disks from the same batch can suffer similar fate. Correlated disk failures can be due to a manufacturing defect that can affect a large number of drives. It can be very difficult to get a vendor to give you disks from different batches. Your best bet is to hedge against that and plan to structure your RAID arrays accordingly.

Error rates and Mean Time Between Failures

As hard disks get larger the chance for an uncorrectable and undetected read or write failure. On a desktop drive, that rate is 10^14 bits read there will be an unrecoverable error. A good example is an array with the latest two-terabyte SATA drives would hit this error on just one full pass of a 6 drive RAID 5 array. When this happens, it will trigger a rebuild event. The probability of hitting another failure during the rebuild is extremely high. Bianca Schroeder and Garth A. Gibson of Carnegie Mellon University have written an excellent paper on the subject. Read it, it will keep you up at night worrying about your current arrays. Enterprise class drives are supposed to protect against this. No study so far proves that out. That does not mean I am swapping out my SAS for SATA. Performance is still king. They do boast a much better error rate 10^16 or 100 times better. Is this number accurate or not is another question all together. Google also did a study on disk failure rates, Failure Trends in a Large Disk Drive Population. Google also found correlated disk failures among other things. This is necessary read as well. Eventually, RAID 5 just will not be an option, and RAID 6 will be where RAID 5 is today.

What RAID Does Not Do

RAID Doesn’t back your data up. You heard me. It is not a replacement for a real backup system. Write errors do occur.As database people we are aware of atomic operations, the concept of an all or nothing operation, and recovering from a failed transaction. People assume the file system and disk is also atomic, it isn’t. NTFS does have a transaction system now TxF I doubt SQL Server is using it. Disk drives limit data transfer guarantees to the sector size of the disk, 512 bytes. If you have the write cache enabled and suffer a power failure, it is possible to write part of the 8k block. If this happens, SQL Server will read new and old data from that page, which is now in an inconsistent state. This is not a disk failure. It wrote every 512-byte block it could successfully. When the disk drive comes back on line, the data on the disk is not corrupted at the sector level at all. If you have turned off torn page detection or page checksum because you believe it is a huge performance hit, turn it back on. Add more disks if you need the extra performance don’t put your data at risk.

Final Thoughts

  1. Data files tend to be random reads and writes.
  2. Log files have zero random reads and writes normally.
  3. More than one active log on a drive equals random reads and writes.
  4. Use Raid 1 for logs or RAID 10 if you need the space.
  5. Use RAID 5 or RAID 6 for data files if capacity and read performance are more important than write speed.
  6. The more disks you add to an array the greater chance you have for data loss.
  7. Raid 5 offers very good reliability at small scale. Rule of thumb, more than 8 drives in a RAID 5 could be disastrous.
  8. Raid 6 offers very good reliability at large scales. Rule of thumb, less than 9 drives you should consider RAID 5 instead.
  9. Raid 10 offers excellent reliability at any scale but is susceptible to correlated disk failures.
  10. The larger the disk drive capacity should adjust your number of disks down per array.
  11. Turn on torn page for 2000 and checksum for 2005/08.
  12. Restore Backups regularly,
  13. RAID isn’t a backup solution.

Up next, A deeper look into IO types, stripe size, disk alignment, and file system cluster sizes.


Fundamentals of Storage Systems – Disk Controllers, Host Bus Adapters, and Interfaces

By Wesley Brown in SQL Man of Mystery 12-03-2009 9:00 AM | Categories: Filed under: , , , , , , , , ,
Rating: |  Discuss | 3,786 Reads | 1048 Reads in Last 30 Days |2 comment(s)

We have covered the Hard Disk and the System Bus. This time around we will cover disk controllers and host bus adapters.

In The Beginning…

There were three distinct components to your IO subsystem, the disk, controller, and the host bus adapter. Today there are still three distinct components but the arrangement has changed. The physical disk we have covered and you know about. What you may not realize is the disk controller is actually the circuit board on the back of the hard drive. In the past this board may have been an add-in card, a back plane that the drives plugged into or even an add-in card with the hard disk mounted on it! It took a little time for the configuration we take for granted today to settle out. Once the form factor for a hard drive and the controller was done there was still the issue of what a host bus adaptor was suppose to do. Some of you may remember the old days of MFM, RLL, and proprietary disk layouts. Having to do a low level format, setting the interleave, even having to park the drive when you were done with the computer. Those days are long gone. Now, low level formatting is done at the factory, there is no need for interleaving, and all drives auto-park. Whoa, what a time warp. All of these things were eliminated mostly due to the advancement in disk controllers and host bus adapters.

The Disk Controller

The card that slots into your system and is connected via cable to your hard drive isn’t the disk controller. The disk controller resides on the hard drive and handles all the low level operations. From spinning the disk, moving the heads and transferring the data the disk controller does most of the heaving lifting. Once the data has been read it finally makes its way down the wire to the host bus adapter.

DiskController

Interface Protocols

There have been several data encoding and signaling schemes over the years. We have touched on MFM and RLL as the first wide spread standards used early on. The two standards that have stood the test of time are IDE/ATA and SCSI.  These standards can be implemented on top of other protocols like IP and Fibre Channel both network protocols.There are ATA implementations on FC and IP but nether are as popular as SCSI. Fibre Channel is pretty much the domain of Storage Area Networks(SAN) which we will cover in a future article.

A breakdown of speeds.

Bus Type

Speed MB/Sec

ATA/133

133

SATA 150

150

SATA/SAS 300

300

SATA/SAS 600

600

SCSI U160

160

SCSI U320

320

 

Alternate SCSI/ATA implementations

Fibre Channel 1GFC

106

Fibre Channel 2GFC

212

Fibre Channel 4GFC

425

Fibre Channel 8GFC

850

iSCSI Gigabit Ethernet

125

iSCSI 10 Gigabit Ethernet

1250

 

A modern spinning disks would have a hard time using even ATA/133’s available bandwidth all by itself. The older parallel ATA (PATA) and SCSI standards are giving way to their newer serial counterparts SATA and SAS. The previous generation had several marked differences between them. ATA could only have two drives per channel while SCSI could have up to 15. ATA was unidirectional, only able to read or write, to the drive while SCSI was bidirectional. This has carried over to the new standards as well.

If you have a SAS HBA it will accept both SAS and SATA drives. Another great feature is the reliability of the connectors. Both ATA and SCSI relied on large ribbon cables and in the case of SCSI termination to the cable chain. I have been kept up at night troubleshooting faulty SCSI cabling running down the chain to try and figure out which drive was causing the problem or if it was a termination issue. The new cables are much smaller and are all point to point, no daisy chaining or termination issues to worry about. The last boon added was the idea of using expanders in the case of SAS or port multipliers for SATA only arrays. The old SCSI standard with 15 drives in a single chain was limiting. You also had the issue that 15 drives could easily saturate a single U320 channel. The biggest SCSI RAID HBA’s usually shipped with 4 channels. In contrast, the new SAS HBA’s may have 4 times that amount. With the SAS expanders you can aggregate SAS channels and have more drives in a single chain. With the SAS 300 standard you could have 4 drives saturate a single channel. With a single 4 drive expander you could have 4 drives on that single channel making the most use of the available bandwidth. You can also have up to 128 drives on a edge expander and up to an astounding 16,384 SAS devices in a single SAS domain. This gives you a lot of flexibility when it comes to configuring your storage and utilizing the bandwidth available.

As you plan your configuration you must be mindful of how many channels you have, what kind of bus the HBA uses and how much bandwidth is available through the entire stack. For example, If you have a PCIe RAID controller with 28 ports that is a theoretical throughput of 8.4 gigabytes a second of available bandwidth via the SAS 300 protocol. The drive may be able to deliver 80 megabytes a second if you only use one drive per port and no expanders that is 2.2 gigabytes a second. If the HBA isn’t plugged into a PCIe 2.0 x8 slot or PCIe 1.0 x16 slot you aren’t going to get that 2.2 GB/Sec of throughput. You should still get the IO’s available but sustained throughput will be limited. Just because an HBA says it can support 108 drives doesn’t mean you will get all the throughput of those drives. You may have an HBA that only supports PCIe 1.0 and only has 4 lanes for a total of 1GB/sec of throughput to the system. Again, you get the IO increase and for SQL Server sometimes that is exactly what you are after.

Host Bus Adapter

This is what most people think of as the disk controller or controller card. In its simplest form it transfers data to and from the system board to the hard disk controller. Of course there are other things that can happen on the HBA. It can have intergraded RAID functions, additional caching, or other things that are not appropriate to do at the disk controller level. There are several types of HBA’s from the ones built into your computers motherboard to high end SAS RAID controllers.

Cache, Disk Controllers, and HBA’s

Almost all enterprise class HBA’s usually have caching as an option or built into the card. This is a particular interest to us and SQL Server. Your data will be safe, SQL Server guarantees this over all else. In my post on capturing IO patterns I discuss why and how SQL Server does this and the concept of stable media. SQL Server assumes that it is talking to a single physical disk and opens the data files in such a way that write caching isn’t used even if it is available. SAS and SCSI drives honor this request normally. But, one of the options more advanced HBA’s offer you are the ability to use the cache and still have stable media. This is usually accomplished through a battery backup unit mounted on the card that keeps the cache memory active during a system failure. Some controllers will gladly let you shoot yourself in the foot by letting you turn the write cache on without a battery and also enable the local write cache on the disk drive as well. In this situation if you have a sudden power failure, data loss is going to happen if there are any writes at that time. Currently, there isn’t a disk drive on the market with a battery backed cache that I know of. There is a new possibility of using fast NAND flash instead of DRAM to act as the cache on drives and HBA’s. Since NAND is non-volatile it doesn’t need to have constant power. To make up for the slower speed of NAND over DRAM caches, they are making them two or more times the size.

 

Just in case you haven’t had a chance to peek into your servers here is an assortment of HBA’s from yesterday and today.

 

 800px-KL_Tandon_HDD_MFM 761px-ATA_Controller_Board 800px-Adaptec
An MFM add-on card with disk drive, disk controller and host bus adapter. A PCI ATA/133 IDE host bus adapter A PCI SCSI host bus adapter
800px-QLA_2200F 800px-Adaptec_RAID_52445_0 lsi_sas_8480_e
A PCI-X Fibre Channel host bus adapter PCIe SAS host bus adapter with 28 ports available. A PCIe SAS host bus adapter with 2 4x external ports. Notice the memory module?

Until Next Time

I hope you know a little more about HBA’s now and have a better understanding what they are and what they do. Up next, RAID!


Fundamentals of Storage Systems – Testing IO Systems

By Wesley Brown in SQL Man of Mystery 12-02-2009 8:30 AM | Categories: Filed under: , , , , , , ,
Rating: (not yet rated) Rate this |  Discuss | 2,405 Reads | 845 Reads in Last 30 Days |2 comment(s)

12/03/2009 - UPDATE! There were a couple of bugs in the SQLIOCommandGenerator new SQLIOTools.zip has been updated.

------------------------------------

I often tell people one of the greatest things about SQL Server is that anyone can install it. I also tell people what the worst things about SQL Server is that anyone can install it. Microsoft fostered a "black-box" approach to SQL Server in 7.0 and 2000. Thankfully, they are reversing this course. As a follow-on to my last article, capturing I/O patterns, we will take a quick look at building some synthetic tests based on those results. There are several tools on the market test I/O systems, some of them free some of the not. SQLIO has been around for several years. There are lots of good articles already on the web describing various uses for this tool.SQLIO was specifically designed to test the limits of your I/O system at different workloads. The problem is people tend to run this tool, will look at the best results, and assume that they will see the same results when the server goes live. But, without understanding your current workloads that is an unreasonable expectation at best. What ends up happening, is a misconfigured I/O system, lots of headaches, with no idea why the system performs so poorly.

I always advocate testing new systems before they go into production. I also understand that it always isn't an option. Having found myself in that exact situation recently, I've decided to take my own advice and pull the new storage off-line to do the proper testing. I'm also taking this opportunity to refine my testing methodology and gather as many data points before the system goes live.

The Test Scripts

With my IO patterns in hand I set out to build a couple of little tools to help me generate all the test scripts and manage the data. As usual, I built these as command line tools since I have no skill at all with GUI’s. It is all in C# and I will be posting them up to Codeplex. You can download the tools here SQLIOTools.zip, this zip has the two tools, they are beta and don’t have a ton of error checking built into them yet. The first tool, SQLIOCommandGenerator does just that, generates the batch file that has all the commands. I does depend on the SQLIO.exe being in the same directory as well as having already defined a parameter file for it to use.

params.txt

X:\SQLIO_testfile0.dat 8 0x0 150240

The first parameter is the test file name that SQLIO will create on start up or use if it already exists. Second is the number of threads that will access that file. Third is the affinity mask. Fourth is the file size in megabytes. Make sure and size the file large enough to be representative of a real database you would be housing on the system. If it is too small it will simply fit in the RAID controllers cache and give you inflated results. I also tend to use one thread per physical CPU core. Be careful though, if you are using a lot of files, having too many threads can cause SQLIO to run out of memory.

Calling SQLIOCommandGenerator:

SQLIOCommandGenerator 0.10 

We assume -F<paramfile> -LS
-d,-R,-f,-p,-a,-i,-m,-u,-S,-v, -t not implemented

Usage: SQLIOCommandGenerator [OPTIONS]
Generates the command line syntax for the SQLIO.exe program output into a batch file.

Options:
-f, --iopattern[=VALUE] Random, Sequential or Both
-k, --iotype[=VALUE] Read,Write or Both
-s, --seconds[=VALUE] Number of seconds to run each test 1(60) to
10(600) minutes is normal
-c, --cooldown[=VALUE] Number of seconds pause between tests suggested
minimum is 5 seconds.
--os, --outstandingiostart[=VALUE]
Starting number of outstanding IOs 1
--oi, --outstandingioincrament[=VALUE]
Multiply Outstanding IO start by X i.e 2
--oe, --outstandingioend[=VALUE]
Ending Number of outstanding IOs i.e. 64
--ol, --outstandingiolist[=VALUE]
Specific Outstanding IO List i.e. 1,2,4,8,16,32,64,128,256,512,1024
--oss, --iosizestart[=VALUE]
Starting Size of the IO request in kilobytes i.-
e. 1
--osi, --iosizeincrament[=VALUE]
Multiply IO size by X in kilobytes i.e. 2
--ose, --iosizeend[=VALUE]
Ending number of outstanding IOs in kilobytes -
i.e. 1024
--osl, --iosizeList[=VALUE]
Specific IO Sizes in kilobytes i.e. 1,2,4,8,16,32,64,128,256,512,1024
-b, --buffering[=VALUE] Set the type of buffering None, All, Hardware,
Software. None is the default for SQL Server
--bat, --sqliobatchfilename[=VALUE]
The name of the output batch file that will be
created
-?, -h, --help show this message and exit

So I passed it this command:

SQLIOCommandGenerator.exe -k=Both -s=600 -c=5 --os=1 --oi=2 --oe=256 --oss=1 --osi=2 --ose=1024 -b=all --bat=c:\wes_sqlio_bat.txt -f=both

That generates this sample:

:: Generated by SQLIOCommandGenerator 
:: This relies on SQLIO.exe being in the same directory.

:: c:\wes_sqlio_bat.txt c:\paramfile.txt c:\outputfile.csv "description of the tests"
:: param1 sqlio parameter file, param2 output of each test to single csv file, param3 test description

SET paramfile=%1
SET outfile=%2
SET runtime=600
SET cooloff=5
SET desc=%3
@ECHO OFF

ECHO ComputerName: %COMPUTERNAME% > %OUTFILE%
ECHO Date: %DATE% %TIME% >> %OUTFILE%
ECHO Runtime: %RUNTIME% >> %OUTFILE%
ECHO Cool Off: %COOLOFF% >> %OUTFILE%
ECHO Parameters File: %PARAMFILE% >> %OUTFILE%
ECHO Description: %DESC% >> %OUTFILE%
ECHO Test Start >> %OUTFILE%

ECHO Command Line: sqlio -kW -s%RUNTIME% -frandom -b1 -o1 -LS -BY -F%PARAMFILE% >> %OUTFILE%
sqlio -kW -s%RUNTIME% -frandom -b1 -o1 -LS -BY -F%PARAMFILE% >> %OUTFILE%
timeout /T %COOLOFF%

ECHO End Date: %DATE% %TIME% >> %OUTFILE%
:: This batch will take approximately 264.0014 Hours to Execute.

The batch file has the instructions for calling it and what parameters you can pass into it. You can omit seconds and cooldown if you want to generate a more generic batch file. This tool is flexible enough for my needs. I can generate specific targeted tests when I have data back that up, or I can generate more general tests to feel out the performance edges.

You may have noticed the estimate run time, that is pretty accurate. This is a worst case scenario where you have chosen pretty much every possible test to run. I wouldn’t recommend this. With the data we have already we can narrow down our testing to just a few IO sizes and queue depths to keep the test well within reason.

SQLIOCommandGenerator.exe -k=Both -s=600 -c=5 --ol=2 --osl=8,64 -b=None --bat=c:\wes_sqlio_bat.txt -f=both

This batch will take approximately 80.08334 Minutes to Execute.

Much better! by focusing on our IO targets we now have a test that is meaningful and repeatable. Why would you want to repeat this test over and over? Simple, not all RAID controllers are created equal. You may need to adjust several options before you hit the optimal configuration.

Running The Tests

Now that I have my tests defined I need to start running them and gathering information. There are some constants I always stay with. One, use diskpart.exe to sector align your disks. Two, format NTFS with a 64k block size. Since I”m doing these tests over and over I wrote a little batch file for that too. Diskpart can take a command file to do its work. Once the RAID controller is in I create an array and look what disk number is assigned to it. As long as you don’t make multiple arrays you will always get the same disk number. After that I format the volume accordingly. WARNING, I do use the /Y so the format happens without prompting for permission!

diskpart.txt

select disk 2 
create partition primary align = 64
assign letter = X

testvol.bat

diskpart /S z:\diskpart.txt 
format x: /q /FS:NTFS /V:TEMP /A:64K /Y

I I also use the RAID controllers command line interface if it has one to make it easier to construct the tests and just let them run using a batch file as a control file. If that isn’t possible don’t worry, the bulk of your time will be waiting for the test to complete anyway.

Gathering The Data

As you have guessed, I have a tool to parse the output of the tests and import them into SQL Server or export it as a CSV file for easy access in Excel. SQLIOParser is also pretty simple to use.

SQLIOParser 0.20 
Usage: SQLIOParser [OPTIONS]
Process output of the SQLIO.exe program piped to a text file.

Options:
-c, --computername[=VALUE] The comptuer name that the test was executed on.
-s, --sqlserver[=VALUE] The SQL Server you want to import the data into.
-u, --sqluser[=VALUE] If using SQL Server authentication specify a user
-p, --sqlpass[=VALUE] If using SQL Server authentication specify a
password
-t, --tablename[=VALUE] The table you want to import the data into.
-d, --databasename[=VALUE] The database you want to import the data into.
-f, --sqliofilename[=VALUE]
The file name you want to import the data from.
-a, --sqliofiledirectory[=VALUE]
The directory containing the files you want to
import the data from.
-o, --csvoutputfilename[=VALUE]
The file name you want to export the data to.
-?, -h, --help show this message and exit

It will work with a single file or import a set of files in a single directory. If you are importing to SQL Server you need to have the table already created.

CREATE TABLE [dbo].[SQLIOResults](
[ComputerName] [varchar](255) NULL,
[TestDescription] [varchar](255) NULL,
[SQLIOCommandLine] [varchar](255) NULL,
[SQLIOFileName] [varchar](255) NULL,
[ParameterFile] [varchar](255) NULL,
[TestDate] [datetime] NULL,
[RunTime] [int] NULL,
[CoolOff] [int] NULL,
[NumberOfFiles] [int] NULL,
[FileSize] [int] NULL,
[NumberOfThreads] [int] NULL,
[IOOperation] [varchar](255) NULL,
[IOSize] [varchar](255) NULL,
[IOOutstanding] [int] NULL,
[IOType] [varchar](255) NULL,
[IOSec] [decimal](18, 2) NULL,
[MBSec] [decimal](18, 2) NULL,
[MinLatency] [int] NULL,
[AvgLatency] [int] NULL,
[MaxLatency] [int] NULL
) ON [PRIMARY]

This is the same structure the CSV is in as well.

Analyzing The Results

I will warn you that the results you get will not match your performance 100% once the server is in production. This shows you the potential of the system. If you have horrible queries hitting your SQL Server those queries are still just as bad as before. Generally, I ignore max latency and min latency focusing on the average. That is what I am most worried about as the IO load changes or queue depth increases how will the system respond. Remember raw megabytes a second isn’t always king. Number of IO’s at a given IO block size is also very important. I will go into great detail in the next article as I walk you through analyzing the results from my own system so stay tuned for that.

Final Thoughts

These tests aren’t the end of your road. I still advocate playing back traces and seeing how the system responds with your exact workload whenever possible. If you can’t do that then using tools like SQLIO is better than nothing at all. We are also working under the assumption that we are upgrading or replacing an existing production server. If that isn’t the case and this is a brand new deployment using SQLIO will help you know what your I/O system is capable of before you have a problem with bad queries or other issues that always crop up on new systems.

You can always to more testing. It is almost a never ending process, my goal isn’t to give you the end solution just to give you another tool to pull out when you need it. As always, I look forward to your feedback!


OT: Maximizing My Time, Using Technology

By Wesley Brown in SQL Man of Mystery 11-30-2009 8:00 AM | Categories: Filed under: ,
Rating: (not yet rated) Rate this |  Discuss | 1,913 Reads | 680 Reads in Last 30 Days |3 comment(s)

Well I've taken the plunge. I have finally broken down and decided to dictate blog post, articles, and other documents. Speech recognition has come a long way over the years, and it's time I make use of it. Does that mean I'm getting typing altogether? Not exactly. One of the biggest problems is learning how to speak in a way that isn't exactly natural. It also changes the way I write.

One of the things that people who know me know that I am dyslexic. My hand writing is horrible and my spelling is so poor that it is not unusual for the spellchecker in each article I write to completely miss a word or two. It also means that even when I see a word misspelled it looks perfectly okay to me. So, between the spellchecker missing words, and my inability to spot incorrect words means I spend a lot of time editing and proofing all my work. For short e-mails this isn’t a big problem. But, on average my blog posts are between 1000 and 2000 words, this leaves a lot of room for error. So, here we are leaning on technology. It also means I can “write” while away from my computer via my Sony recorder.

I don’t know how well this is going to go but, with a minimum of training I was able to dictate this post via the crappy head set they ship with the software :)


Fundamentals of Storage Systems – Capturing IO Patterns

By Wesley Brown in SQL Man of Mystery 11-20-2009 10:13 AM | Categories: Filed under: , , , , ,
Rating: |  Discuss | 4,413 Reads | 907 Reads in Last 30 Days |6 comment(s)

We often take the advice given to us on forums or in articles at face value. Even though the authors almost always say things like “your mileage may vary” or “may not apply to your situation” people still assume it is the gospel. Sometimes it is lack of experience. Other times it is just lack of knowledge on how to verify these things on your own. In this article I’m going to give you a tool to look at what SQL Server is doing at the disk level and allow you to make better decisions on how to configure your underlying disks.

The Basics

There are several things you need to know about how SQL Server accesses the database files and the implications of that before you can construct a proper testing methodology.

http://technet.microsoft.com/en-us/library/cc966500.aspx covers the basics. There are a few things I will highlight here.

ACID and WAL

ACID (Atomicity, Consistency, Isolation, and Durability) is what makes our database reliable. The ability to recover from a catastrophic failure is key to protecting your data.

WAL (Write-Ahead Logging) is how ACID is achieved. Basically, the log record must be flushed to disk before the data file is modified.

Stable Media

Stable media isn’t just the disk drive. A controller with a battery backed cache is also considered stable. Since SQL Server can request blocks as big as 64KB make sure your controller can handle that block size request in cache. Some older controllers only do a 16KB block or smaller.

FUA (Forced Unit Access)

With the requirement of stable media SQL Server creates and opens all files with a specific set of flags. FILE_FLAG_WRITETHROUGH tells the underlying OS not to use write caching that isn’t considered stable media. So, the local disk cache is normally bypassed. Not all hard drives honor the flag though, Some SATA/IDE drives ignore it. Usually, the drive manufacturer provides a tool to turn off write caching. If you are using desktop drives in a mission critical situation be aware of the potential for data loss. FILE_FLAG_NO_BUFFERING tells the OS not to buffer the file ether. At this point the only cache available will be the battery backed or other durable cached on the controller.

File Access

SQL Server uses asynchronous access for data and log files. This allows IO request to queue up and use the IO system as efficiently as possible. The main difference between the two are SQL Server will try and gather writes to the data file into bigger blocks but the log is always written to sequentially.

All of these rules apply to everything but tempdb. Since tempdb is recreated at restart every time recoverability isn’t an issue.

SQL Server data access patterns

Searching around you will find these generalities about SQL Server’s IO patterns

Log Writes

Sequential 512 bytes to 64KB

Data File Read/Writes

8KB

Read ahead – more important to Enterprise Edition

8KB to 125KB

Bulk Insert

8KB to 128KB

Create Database

512 byte – full initialize on log file only.

Backup Sequential Read/Write

1 MB

Restore Sequential Read/Write

64K

DBCC – CHECKDB

Sequential Read 8K – 64K

DBCC – DBREINDEX

(Read Phase) Sequential Read (see Read Ahead)

DBCC – DBREINDEX

(Write Phase) Sequential Write

Any multiple of 8K up to 128K

DBCC – SHOWCONTIG

Sequential Read 8K – 64K

Now that we have an idea of what SQL Server is suppose to be doing its time to verify our assumptions.

Capturing IO activity

There are a few tools that will allow you to capture the file activity at the system level. Process Monitor is a free tool from Microsoft that I will use to collect some base line information. In it’s standard configuration Process Monitor captures a ton of stuff and uses the page file to spool the info to. So, before we begin we need to change the default configuration.

ProcessMon1

Capturing IO data using process monitor.

Filter to apply

process is sqlservr.exe
Operation is Read
Operation is Write

ProcessMon2

Columns to choose.

ProcsessMon5

Process Name
PID
PATH
Detail
Date & Time
Time of Day
Relative Time
Duration
TID
Category

 

Change Backing File.

ProcessMon3

The maximum number of events it will capture is 199 million. This is enough on my system to capture 12 hours of activity easily. Once we have a good sample you can save it off as an XML file or CSV. Choosing CSV it is pretty easy to import the data into SQL Server using SSIS or your tool of choice.

ProcessMon4

I import the CSV into a raw table first.

Raw table to import into.

CREATE TABLE [SQLIO].[dbo].[pm_imp] (
[Process Name] VARCHAR(12),
[PID] SMALLINT,
[Path] VARCHAR(255),
[Detail] VARCHAR(255),
[Date & Time] DATETIME,
[Time of Day] VARCHAR(20),
[Relative Time] VARCHAR(50),
[Duration] REAL,
[TID] SMALLINT,
[Category] VARCHAR(6))

 

Next I create a cleaner structure with some additional information separated from the detail provided.

SELECT 
[Process Name] AS ProcessName,
PID AS ProcessID,
PATH AS DatabaseFilePath,
Detail,
[Date & Time] AS EventTimeStamp,
[Time of Day] AS TimeOfDay,
[Relative Time] AS RelativeTime,
[Duration],
TID AS ThreadID,
Category AS IOType,
substring(detail,charindex('Length: ',detail,0) + 8,
(charindex(', I/O',detail,0) - charindex('Length: ',detail,0) - 8)) AS IOLength,
CASE reverse(left(reverse(PATH),3))
WHEN 'mdf'
THEN 'Data'
WHEN 'ndf'
THEN 'Data'
WHEN 'ldf'
THEN 'Log'
END AS FileType
INTO SQLIOData
FROM
dbo.pm_imp
WHERE reverse(left(reverse(PATH),3)) IN ('mdf','ndf','ldf')

Once we have the data cleaned up a bit we can now start doing some analysis on it.

Queries for interesting patterns.

This query gives us our read and write counts.

SELECT   
count(* ) IOCount,
IOType
FROM
SQLIOData
GROUP BY IOType
ORDER BY count(* ) DESC

This one shows us the size of the IO and what type of operation it is.

SELECT   
count(* ) IOCount,
IOLength,
IOType
FROM
SQLIOData
GROUP BY IOLength,IOType
ORDER BY count(* ) DESC

This is a look at activity by file type data or log.

SELECT   
count(* ) IOCount,
FileType
FROM
SQLIOData
GROUP BY FileType
ORDER BY count(* ) DESC

Since we are capturing the thread id we can see how many IO’s by thread.

SELECT  
  count(* )        IOCount,
  ThreadID
FROM    
  SQLIOData
GROUP BY ThreadID
ORDER BY count(* ) DESC

We can also look at IO types, sizes and count by file helping you see which ones are hot.

SELECT   
count(* ) IOCount,
databasefilepath,
iotype,
iolength
FROM
SQLIOData
WHERE databasefilepath LIKE '%filename%'
GROUP BY databasefilepath,
iotype,
iolength
HAVING count(* ) > 10000
ORDER BY databasefilepath,
count(* ) DESC

Now that we see exactly what our IO patterns are we can make adjustments to the disk subsystem to help scale it up or tune it for a particular pattern.

This is just another tool in your tool belt. This is a supplement to using fn_virtualfilestats to track file usage. I use it to get a better idea of the size of the IO’s being issued.Using these two tools I can see the size of the IO’s in a window of time that is reported by my fn_virtualfilestats capture routine.

Always verify your assumptions, or advice from others.


Why does change control for SQL Server have to be so hard?

Rating: (not yet rated) Rate this |  Discuss | 2,395 Reads | 740 Reads in Last 30 Days |2 comment(s)

I’ve been dealing with change control and source code repositories for most of my professional career. While I’ve seen change control and integration advance steadily for writing programs it feels like the database part of things is just stuck in the stone age. For months now I’ve been researching solutions for source control, change management, and deployment of database objects. The conclusion I’ve come to is there is no solution. Well, no easy solution. I was very happy in the early days of SQL Server 2005 when they announced source control integration into management studio. It was a great pain for me personally to have Visual Studio, and the solution architecture it offered and not have that on the database side of things. Alas, it wasn’t meant to be. What they meant buy source control was using the previous generation of integration and then crippling it.

Really?

image

This doesn’t look like much of a solution to me.

I know what most of you are thinking. If you have Visual Studio use it. That works for me but not the people on my team that only have access to SSMS. It also means I have to jump between two tools to do one thing, work with SQL Server. I have been told that Microsoft is basically pushing you to Visual Studio for all of your development needs. Leaving SSMS as a management tool only. If Visual Studio did everything SSMS did it wouldn’t be that big a deal for me personally.

 

Options Available

SQL Server Management Studio Hacks

I tried several things to work around the limitations SSMS has. I found you could manually edit the solution file to get extra folders. The only problem with that is they all show up as ether Queries or Miscellaneous. Other than that one and the old fix for sorting files by name there aren’t any other hacks I can find.

Toad for SQL Server

Toad1

Generally has a nice look and feel.It has all the development and management features to be a true replacement for management studio. I tried all the normal things that I do in SSMS in Toad and several things were better. The debugger was nice and the statement optimizer is also a nice addition. It does fall down flat in some basic key areas. I never could get it to display an execution plan. As a T-SQL guy the plan is a must. I know it is a bug somewhere. Having something this fundamental during and evaluation is a big red light though.

The only down side is it doesn’t support Sourcegear Vault/Fortress which is a real shame. Lots of SMB’s use Vault for source control since it is miles better than visual source safe and much cheaper than team system.

ApexSQL Edit

apexsql1

That left only one other contender in this fight. ApexSQL Edit has been around quite a while as well. Initially, it has a similar look and feel to Toad. I know there isn’t a lot that you can do to since both look like Office. I is also missing the management features but I can live with that. The goal is to get the developers a tool they can develop in and use our code repository easily. ApexSQL Edit did include support for Vault and it worked as expected. Again, I started using it daily like I would SSMS. Everything I tried worked, for the most part. 95% of the time it would generate an execution plan. Not as clean as SSMS but it had more options on how to display the plan, which I liked. I did have a few crashes, but this was a beta build and I will let that go until I test the full release. Since this was a beta I did provide feedback and initially the folks at ApexSQL were very responsive. Eventually though everything just went quiet accept for the sales guys asking me how things were going. Right now they are a no go until the stability issues are addressed and the RTM is out so I can do a full evaluation again.

 

Final Thoughts

What I hoped would be a pretty easy exercise turned out to be a real work out. For all of SSMS’s problems it is stable and familiar. I was really hoping that ether Toad or ApexSQL Edit would solve my problems. I haven’t given up on ApexSQL Edit yet, we will just have to play the waiting game and keep using an inadequate solution until someone comes up with something better.


When Technical Support Fails You – UPDATE and Answers!

By Wesley Brown in SQL Man of Mystery 10-31-2009 4:25 AM | Categories:
Rating: (not yet rated) Rate this |  Discuss | 2,637 Reads | 601 Reads in Last 30 Days |no comments

As promised and update on what has happened so far. A correction needs to be made. the P800 is a PCIe 1.0 card so the bandwidth is cut in half from 4GB/sec to 2GB/sec.

My CDW rep did get me in contact with an HP technical rep who actually knew something about the hardware in question and its capabilities. It was one of those good news, bad news situations. We will start with the bad news. The performance isn’t off. My worst fears were confirmed.

The Hard Disks

The HP Guy (changing the names to protect the innocent) told me their rule of thumb for the performance of the 2.5” 73GB 15K drives is 10MB/Sec. I know what you are thinking, NO WAY! But, I’m not surprised at all. What I was told is the drives ship with the on board write cache disabled. They do this for data integrity reasons. Since the cache on the drive isn’t battery backed if there was any kind of failure the potential for data loss is there. There are three measurements of hard disk throughput, disk to cache, cache to system and disk to system. Disk to cache is how fast data can be transferred from the internal data cache to the disk usually sequentially. On our 15k drive this should be on average 80MB/sec. Disk to system, also referred to burst speed, is almost always as fast as our connection type. Since we are using SAS that will be close to 250MB/sec. Disk to system is no caching at all. Without the cache several IO reordering schemes aren’t used, there is no buffer between you and the system, so you are effectively limited by the Areal Density and the rotational speed of the disk. This gets us down to 10 to 15 megabytes a second. Write caching has a huge impact on performance. I hear you saying the controller has a battery backed cache on it, and you would be right.

The Disk Controller

The P800 controller was the top of the line that HP had for quite a while. It is showing its age now though. The most cache you can get at the moment is 512MB. It is battery backed so if there is a sudden loss of power the data in cache will stay there for as long as the battery holds out. When the system comes back on the controller will attempt a flush to disk. The problem with this scheme is two fold. The cache is effectively shared across all your drives since I have 50 drives total attached to the system that is around 10.5 megabytes per drive. Comparable drives ship with 16 to 32 megabytes of cache on them normally. The second problem is the controller can’t offload the IO sorting algorithms to the disk drive effectively limiting it’s throughput. It does support native command queuing and elevator sorting but applied at the controller level just isn’t as fast as at the disk level.If I had configured this array as a RAID 6 stripe the loss of performance from that would have masked the other bottlenecks in the controller. Since I’ve got this in a RAID 10 the bottleneck is hit much sooner with fewer drives. On the P800 this limit appears to be between 16 and 32 disks. I won’t know until I do some additional testing.

Its All My Fault

If you have been following my blog or coming to the CACTUSS meetings you know I tell you to test before you go into production. With the lack of documentation I went with a set of assumptions that weren’t valid in this situation. At that point I should have stopped and done the testing my self. In a perfect world I would have setup the system in a test lab run a series of controlled IO workloads and come up with the optimal configuration. I didn’t do as much testing as normal and now I’m paying the price for that. I will have to bring a system out of production as I run benchmarks to find the performance bottlenecks.

The Good News

I have two P800’s in the system and will try moving one of the MSA70’s to the other controller. This will also allow me to test overall system performance across multiple PCIe busses. I have another system that is an exact duplicate of this one and originally had the storage configured in this way but ran into some odd issues with performance as well.

HP has a faster external only controller out right now the P411. This controller supports the new SASII 6G protocols, has faster cache memory and is PCIe 2.0 complainant. I am told it also has a faster IO processor as well. We will be testing these newer controllers out soon. Also, there is a replacement for the P800 coming out next year as well. Since we are only using external chassis with this card the P411 may be a better fit.

We are also exploring a Fusion-io option for our tempdb space. We have an odd workload and tempdb accounts for half of our write operations on disk. Speeding up this aspect of the system and moving tempdb completely away from the data we should see a marked improvement over all.

Lessons Learned or Relearned

Faced with the lack of documentation, don’t make assumptions based on past experiences. Test your setup thoroughly. If you aren’t getting the information you need, try different avenues early. Don’t assume your hardware vendor has all the information. In my case, HP doesn’t tell you that the disks come with the write cache disabled. They also don’t give you the full performance specifications for their disk controllers. Not even my HP Guy had that information. We talked about how there was much more detailed information on the EVA SAN than there was on the P800.

Now What?

Again, I can’t tell you how awesome CDW was in this case. My rep, Dustin Wood, went above and beyond to get me as much help as he could, and in the end was a great help. It saddens me I couldn’t get this level of support directly from HP technical support. You can rest assured I will be giving HP feedback to that effect. By not giving the customer and even their own people all the information sets everyone up for failure.

I’m not done yet. There is a lot of work ahead of me, but at least I have some answers.You can bet I’ll be over at booth #414 next week at PASS asking HP some hard questions!


Heading back to Seattle

By Wesley Brown in SQL Man of Mystery 10-30-2009 2:54 PM | Categories:
Rating: (not yet rated) Rate this |  Discuss | 2,627 Reads | 600 Reads in Last 30 Days |4 comment(s)

I love visiting Seattle, even in November. Getting to catch up with old friends and meet new people is always the highlight of my trip. Don’t get me wrong, the training is awesome and I always learn so much, but I am a social creature. I can always go to specific training events to hear Kalen or Itzik. Getting to meet them in a social setting is something else all together. I learn as much talking to as many of the 2000 people there as I do attending sessions. Never underestimate the wisdom of the crowd!

I’m also a bingo square, if you aren’t playing SQLBingo shame on you! It is another great way to meet some smart people and expand your circle of friends.Even if you aren’t playing still come and find me, I’m always willing to meet new people.

 

See you next week!


When Technical Support Fails You

By Wesley Brown in SQL Man of Mystery 10-30-2009 2:18 PM | Categories:
Rating: (not yet rated) Rate this |  Discuss | 2,664 Reads | 600 Reads in Last 30 Days |no comments

I have had the pleasure of being a vendor, and technical support for both hardware and software products. I know it isn’t easy. I know it isn’t always possible to fix everything. The level of support I’ve received from HP on my current issue is just unacceptable. This is made more frustrating by the lack of documentation. The technical documents show capacity. How many drives in an array, Maximum volume size but nothing on throughput.Every benchmark they have seems to be relative to another product with no hard numbers. For example, the P800 is 30% faster than the previous generation.

I’m not working with a complicated system. It’s a DL380 G5 with a P800 and two MSA70’s fully populated with 15k 73GB hard drives. 46 of them are in a RAID 10 array with 128k stripe. Formatted it NTFS with a 64k block size and sector aligned the partition. Read/Write cache is set at 25%/75%. This server originally just had one MSA70. We added the second for capacity expansion and expected to see a boost in performance as well. As you can probably guess, there wasn’t any increase in performance at all.

Here is what I have as far as numbers. Some of these are guesses based on similar products.

P800 using two external miniSAS 4x connectors maximum throughput of 2400 MB/sec (2400Mbit per link x 4 per connector x 2 connectors).
The P800 uses a PCIe x8 connection to the system at 4,000 MB/Sec (PCIe 2.0 2.5GHz 4GB/sec each direction).
Attached to the controller are 15k 73GB 2.5” hard drives 46 of them for a raw speed 3680 MB/Sec of sequential read or write speed (23x80MB/sec write sequential 2 MSA70's RAID 10 46 Drives total based on Seagate 2.5 73GB SAS 15.1k)

Expected write speed should be around 1200 megabytes a second.

We get around 320 MB/Sec sequential write speed and 750MB/sec in reads.

Ouch.

Did I mention I also have a MSA60 with 8 7.2k 500GB SATA drives that burst to 600MB/sec and sustain 160MB/Sec writes in a RAID 10 array? Yeah, something is rotten in the state of Denmark.

With no other options before me I picked up the phone and called.

I go through HP’s automated phone system, which isn’t that painful at all, to get to storage support. Hold times in queue were very acceptable. A level one technician picked up the call and started the normal run of questions. It only took about 2 minutes to realize the L1 didn’t understand my issue and quickly told me that they don’t fix performance issues period. He told me to update the driver, firmware, and reboot. Of course none of that had worked the first time but what the heck, I’ll give it the old college try. Since this is a production system I am limited on when I can just do these kinds of things. This imposed lag makes it very difficult to keep an L1 just sitting on the phone for five or so hours on hold while they wait for me to complete the assigned tasks. I let him go with the initial action plan in place with an agreement that he would follow up.Twice I got automated emails that the L1 had tried to call and left voicemails for me. Twice, there were no voicemails. I sent him my numbers again just to be on the safe side. Next, I was told to run the standard Array Diagnostic Utility and a separate utility that they send you to gather all the system information and logs, think a PSSDiag or SQLDiag. After reviewing the logs he didn’t se anything wrong and had me update the array configuration utility. I was then told they would do a deeper examination of the logs I had sent and get back to me. Three days later I got another email saying the L1 had tried to call and left me a message. Again there was no voicemail on my cell or my desk phone. I sent a note back to the automated system only to find the case had been closed!

I called back in to the queue and gave the L1 who answered my case number, he of course told me it was closed. He read the case notes to me, the previous L1 had logged it as a network issue and closed the case. If I had been copying files over the network and not to another local array I can see why it had been logged that way. I asked to open a new case and to speak to a manager. I was then told the manager was in a meeting. No problem, I’ll stay on the line. After 45 minutes I was disconnected. Not one to be deterred, I called back again. The L1 that answered was professional and understanding. Again, I was put on hold while I waited for the manager to come out of his meeting. About 10 minutes later I was talking to him. He apologized and told me my issues would be addressed.

I now had a new case number and a new L1. Again, we dumped the diagnostic logs and started from the beginning. This time he saw things that weren’t right. There was a new firmware for the hard drives, a new driver for the P800, and a drive that was showing some errors. Finally, I felt like I was getting somewhere! At this point it has been ten days since I opened the previous case. We did another round of updates. A new drive was dispatched and installed. The L1 did call back and actually managed to ether talk to me or leave a message. When nothing had made any improvement he went silent. I added another note to the case requesting escalation.

That was eight days ago. At this point I have sent seven sets of diagnostic logs. Spent several hours on the phone. And worked after hours for several days. The last time I talked to my L1, the L2’s were refusing to accept the escalation. It was clearly a performance problem and they don’t cover that. The problem is, I agree. Through this whole process I have begged for additional documentation on configuration and setup options, something that would help me configure the array for maximum performance.

They do offer a higher level of support that covers performance issues, for a fee of course. This isn’t a cluster or a SAN. It is a basic setup in every way. The GUI walks you through the setup, click, click, click, monster RAID 10 array done. What would this next level of paid support tell me?

My last hope is CDW will be able to come through with documentation or someone I can talk to. They have been very understanding and responsive through this whole ordeal.

Thirty one days later, I’ve still got the same issue. I now have ordered enough drives to fill up the MSA60. The plan is to transfer enough data to free up one of the MSA70’s. Through trial and error, I will figure out what the optimum configuration is. Once I do I’ll post up my findings here.

If any of you out there in internet-land have any suggestions I’m all ears.


Yes, I am a square, but not the way my wife thinks.

By Wesley Brown in SQL Man of Mystery 10-21-2009 3:53 PM | Categories:
Rating: (not yet rated) Rate this |  Discuss | 2,798 Reads | 599 Reads in Last 30 Days |1 comment(s)

I’ll be on the Twitter Bingo card along with a lot of other great SQL Server folks at Pass! Quest Software/SQLServerPedia are sponsoring this event. This is an opportunity for you to meet lots of folks and expand your circle of SQL friends. It’s also a great opportunity for me to meet you! I spend quit a bit of time at the PASS conference meeting new people and making friends. The rules are here. They will have some cards at the Quest booth, if you need one just let me know I’ll make sure you get in the game. I’ll be updating my location via twitter through the day. To be honest though you can usually hear me from several blocks away, so you shouldn’t have any problems finding me!


Hardware Review – EVGA UV Plus+ 16

By Wesley Brown in SQL Man of Mystery 10-21-2009 2:08 PM | Categories:
Rating: (not yet rated) Rate this |  Discuss | 2,828 Reads | 598 Reads in Last 30 Days |no comments

This isn’t exactly related to SQL Server. I did effect my productivity. I am a monitor junkie. I love multi-monitor setups and have had them for a very long time. Today, having two monitors is trivial. Almost every video card on the market supports two displays. If you want more than that you have to ether step up to a professional graphics card, which can be very expensive, or add another video card to your system. On my rig at home two video cards isn’t a problem I’ve got multiple PCIe slots and run two cards for a four monitor setup. My workstation at work only has one PCIe slot and a PCI slot. Getting a modern card using PCI is harder and harder.

I did some digging and found a card that wouldn’t break the bank and should work with my primary card without having two sets of drivers so everything looked good. I slotted the card in and powered up the computer. Not only did I not have anything on my third display, my network card dropped out too! So, I tried several other older PCI cards to no avail. I knew there were other adaptors out that connected a display via USB. Last time I had checked they were very expensive and didn’t work all that well. I made a trip down to my local Fry’s store. There I saw something truly amazing. Not only did they have USB to VGA they had USB to DVI, and they were under 100 bucks! I settled on the UV Plus+ 16 since it would do 1650x1080 with 32 bit color. The price was right at 60 bucks.

image

The UV Plus+ 16 is based on the Display Link DL-160 chip. Several other manufacturers also have adaptors based on ether the DL-160 or DL-120 and should be similar in performance and installation. The DL-120 doesn’t support the higher resolutions that I needed to run my 22” monitors.

What You Get

Out of the box you get everything you need.

image

This thing is small! I was also expecting something that wasn’t in the package, a power supply. You know, one of those lovely wall warts that take up way to much space. To my surprise, the UV Plus gets its power directly from the USB port. They include a carry case for the adaptor but not the USB cable or the DVI to VGA adaptor. To me this makes the case useless. The unit is sturdy and shouldn’t have any issues floating around in my laptop bag when I travel. The name, why Plus+? One plus not good enough? It is going to be hard selling these on eBay. “I give the Plus+ an A+++++.”

Installation

Installing the adaptor is simple.

  1. Install the drivers
  2. Plug it in using the supplied USB cable
  3. Attach Monitor
  4. Adjust settings via windows display properties or tray utility.

That’s it.

Once installed changing the options is just like working with any other monitor attached to your system.

image

If you don’t want to go that route you can always use the try icon control application that comes with it.

image

image

I actually moved the UV Plus to power a monitor I have that rotates from portrait to landscape since the tray utility handles it so well.

There are other benefits to running these adaptors. You can have six, yes six attached to a single system. Unless you are running an Nvidia card then you are limited to four adaptors. This is going to allow me to move up to six displays and remove the second video card in my primary workstation at home. With the free PCIe slot I’m putting in a hardware RAID controller, you can never have enough IO!

 

Experience So Far

Has been pretty uneventful, generally it just works. I did have an issue with opening a PDF from inside adobe reader in full screen mode. The display became corrupted with blocks and color bands. I just used the tray tool to turn off the display then back on and it was fine again.

Conclusion

The UV Plus+ 16 is EXACTLY what I needed. It is cheap, easy to use, and small. All of these things add up to a solid little device with lots of possibilities. If you need a multi-monitor setup for work this is a great solution for you. If you need a multi-monitor setup for gaming then you need to go with the right PCIe video card setup.

More Posts Next page »