Fix Failures Fast

  • Comments posted to this topic are about the item Fix Failures Fast

  • There was another interesting point that said CPUs and RAM in white box (non-OEM) machines were less reliable than those in brand name systems. I struggle to see how a CPU would be less reliable, but that was part of the analysis. I'm not sure that would stop me from purchasing my own parts in the future, but it is something to keep in mind. Perhaps you should burn in the machine as soon as you can to allow for a parts return if you have issues.

    I agree. My guess is not that CPUs destined for branded machines are inherently any more reliable, but that it's the quality control process that makes the difference. Or, more simply, that I suspect the same percentage of CPUs fail irrespective of destination, but that brand names ensure most failures occur before the kit is bought by customers.

    Semper in excretia, suus solum profundum variat

  • Very interesting topic this. There are lots of servers out there running consumer grade equipment for sure, so it's definitely worth looking out for.

    Although I'm not sure that server grade stuff is really THAT much better. Seagate manufacture (or at least they used to) pretty much all of the HDDs that go into both HP and IBM enterprise class servers, and they are also a very popular choice in consumer equipment. I may be wrong but, if they have a 7200RPM motor/bearing/platter setup that works well in low power consumption servers and they manufacture 400,000 units at a time, are they going to design a completely different, less reliable 7200RPM motor/bearing/platter setup for their consumer drives? I would have thought it would be cheaper for them just to manufacture more of the server grade stuff, as they already have the machinery, and stick different labels and different firmware revisions on them. I may be completely wrong on that though.

    I know this isn't just about hard drives, but we use almost entirely 15k 2.5" SAS600 IBM/Seagate drives in our servers here and we recently had a catastrophic disk failure in an x3650. One 15k disk decided that its platter had maintained molecular cohesion for just far too long and it exploded. So violent was the shock of the exploding platter that it took out the drives on either side of it and overloaded the power on the backplane. In a non-server class machine the likelihood is that the hard drives would not have been jammed in so tight together and the flimsier metal construction would have absorbed some of the shock of the initial disk failure. The chance of that bad disk taking out 3 drives in a RAID 5 and therefore destroying the data (requiring a tape restore) would potentially be lower, but that is just one issue out of potentially hundreds, for most of which you'd do better having server grade stuff.

    I can't help but wonder whether the issue of OEM PCs vs home built machines has more to do with user demographic than the equipment itself. I reckon 95% of PC users buy a PC from a supermarket or the likes of Dell, use it for a year or two at the most, then when it is full of junk/badly fragmented throw it away because 'it got slow' and go buy a new one, long before the hardware itself is worn out enough to fail.

    Those of us who build our own machines maintain them and therefore keep the hardware long enough for it to reach the end of its serviceable life. I only just 2 weeks ago replaced my Athlon X2 4200+ /2GB of OCZ RAM/ nVidia 7950PCIe that I bought in September 2006 - it was still working fine, perfectly reliable and still fast because it was maintained properly. I just wanted faster graphics. How many consumers of OEM PCs do you know who have a 6 year old PC that isn't slow?

    Most home builders also don't have proper anti-static equipment and run the risk of causing static-shock damage to their hardware that might not immediately become apparent. Some might mix different brands/timings of RAM etc. too.

    Ben

    ^ That's me!

    ----------------------------------------
    01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
    ----------------------------------------

  • I'd like to see the Linux/Unix community do a similar study on their installed user base. Better not hold your breath: the Unix guys will complain they can't do it because their OS never crashes, so please consult the open source license covering that, and here is a BASH pound Tar Bol file you can extract to read the read.me file of instructions for a cool scientific calculator app that might help.

  • I'd suspect that the higher failure rate for white-box (versus brand name) PCs can be attributed to how the components are installed and paired up with other components. The assembly line style methods used by Dell and HP, where they purchase components of the same type in bulk and have workers who specialize in performing specific tasks, would in my belief probably result in fewer defects and better overall hardware configurations.

    The guys who set up shop in a strip mall, buying parts from eBay or the local BestBuy, and then assembling them on a fold up table; I doubt the quality is there.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • thadeushuck (7/3/2012)


    I'd like to see the Linux/Unix community do a similar study on their installed user base. Better not hold your breath: the Unix guys will complain they can't do it because their OS never crashes, so please consult the open source license covering that, and here is a BASH pound Tar Bol file you can extract to read the read.me file of instructions for a cool scientific calculator app that might help.

    LOL

    Then the legal department of the Santa Cruz Operation will attempt to sue Microsoft for including a scientific calculator in Windows 8 that looks a little bit like their app.

    Ben

    ^ That's me!

    ----------------------------------------
    01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
    ----------------------------------------

  • Summing it up for you upper management types:

    1. Buy your DBAs laptops; they fail less

    2. You need to burn in desktops longer

    3. If anything failed, replace it with new

    4. Everything crashes; your DBA will need Google glasses as an alternative

  • Eric M Russell (7/3/2012)


    I'd suspect that the higher failure rate for white-box (versus brand name) PCs can be attributed to how the components are installed and paired up with other components. The assembly line style methods used by Dell and HP, where they purchase components of the same type in bulk and have workers who specialize in performing specific tasks, would in my belief probably result in fewer defects and better overall hardware configurations.

    The guys who set up shop in a strip mall, buying parts from eBay or the local BestBuy, and then assembling them on a fold up table; I doubt the quality is there.

    HP/Dell probably have higher and stricter "burn-in" tests, or their buyers, being businesses, tend to do it for them. The fact that once a memory error crash happens, the likelihood of another memory crash happening again drops to 2, is scary. I dread memory replacement with my 8 year old awesome laptop; it seems like opening it up just causes more crap to fail, and digging thru the pile of dead laptops for parts is a drag...

  • It would be interesting if a study has ever been made into why IT guys crash and burn.

    Sometimes I think that new DBAs and developers need to be "burned in" longer before they are "deployed to production".

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • thadeushuck (7/3/2012)


    Eric M Russell (7/3/2012)


    I'd suspect that the higher failure rate for white-box (versus brand name) PCs can be attributed to how the components are installed and paired up with other components. The assembly line style methods used by Dell and HP, where they purchase components of the same type in bulk and have workers who specialize in performing specific tasks, would in my belief probably result in fewer defects and better overall hardware configurations.

    The guys who set up shop in a strip mall, buying parts from eBay or the local BestBuy, and then assembling them on a fold up table; I doubt the quality is there.

    HP/Dell probably have higher and stricter "burn-in" tests, or their buyers, being businesses, tend to do it for them. The fact that once a memory error crash happens, the likelihood of another memory crash happening again drops to 2, is scary. I dread memory replacement with my 8 year old awesome laptop; it seems like opening it up just causes more crap to fail, and digging thru the pile of dead laptops for parts is a drag...

    It's probably been ten years since I've tinkered with the inside of one of my personal PCs or laptops. However, long ago I was in the habit of popping the top on my PC and installing a new HD, upgrading the RAM, etc. about once a year. I recall from way back then doing stupid things like accidentally installing the RAM chip backward or dropping the screwdriver and watching it bounce around inside the component case. I don't think I ever wore rubber gloves to prevent static charge or anything like that. Obviously someone who installs hardware for a living would make a lot fewer mistakes, but when they're building one-off white boxes, it's bound to happen.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (7/3/2012)


    Sometimes I think that new DBAs and developers need to be "burned in" longer before they are "deployed to production".

    I don't know about that. I know people that have been in this business for years and still can't fill out Change Control requests properly. Common sense does not have a break-in period. They either got it or they don't. Time in has nothing to do with it. 😀

    "Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"
