SQLServerCentral is supported by Red Gate Software Ltd.
 
The Decision to Fail
Posted Thursday, June 6, 2013 8:07 PM


SSC-Dedicated


Group: Administrators
Comments posted to this topic are about the item The Decision to Fail






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1460934
Posted Thursday, June 6, 2013 9:33 PM


SSC-Dedicated


Group: General Forum Members
I don't work for a huge company and we don't have terabyte-sized databases, but we do have multiple mission-critical 120GB databases and an awesome NetOps group. Instead of buying everything brand spanking new, they bought some awesome hardware that had been refurbished. That allowed us to afford not just 2 but 3 identical systems. 2 of them are clustered onsite. We regularly test them by forcing a failover. The "outage" is usually less than 5 seconds. The third system is in another state about 500 miles away. It's the DR system. I don't know what tools they're using for all of this, but the failover to the DR site is also measured in very few seconds and it's all automatic. They've made my job as a DBA a proverbial cakewalk when it comes to HA and DR.

During the "failover", most users lose no work and most don't even know the failover occurred.

Living on the bleeding edge is expensive. Instead of buying the latest and greatest, which also commands the highest price, they bought the latest and greatest of the refurb world. For what most people would have paid a little more for just 1 system, we have 3. And, I have to tell you, these systems aren't some beat-up ol' relics. They might not beat "state of the art" but they do a damn fine job of keeping up.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1460939
Posted Friday, June 7, 2013 3:56 AM


SSCertifiable


Group: General Forum Members
I figure it's not a bad idea to chime in here. My current DBA (I'm playing little ol' developer at the moment) is beating his head on the walls creating a proper DR system. I don't often respect restrictive DBAs, but he has my respect; it makes sense. With that in mind...

It's been a political nightmare. The gear is not the issue. The setup is not the issue. The volume of wtf outside of our domain is the issue.

In some ways I want to go back to the 1990s... when you could piss on the sysadmins and they ran so you could get your DB set up properly. One LUN goes sideways now and you've got four parties screaming 'Not ME!' until you can nail one to the floor with a sledgehammer.

And then they cry.

I want to go back to when we OWNED our damned work. I want to go back to the time when three people stood up at a meeting and said "Whoops, that's mine, sorry."

You ask about the decision of when to fail. I have an issue with that, but not with the question. My issue is "Can we fail?!" Too often, the answer is "No. We die."

I realize this seems like a bit of a soapbox, but this is five companies deep where I've seen this. Seriously, own up.

I own my failures. They should too. DR is not a one-system show. I can't DR anything that doesn't have proper backup from all parties. Currently DR is like pissing uphill and upwind for anywhere that doesn't have staff dedicated to their components. There's no way for things to end well.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1461014
Posted Saturday, June 8, 2013 8:23 PM


SSChasing Mays


Group: General Forum Members
Evil Kraig F (6/7/2013)
I own my failures. They should too. DR is not a one-system show. I can't DR anything that doesn't have proper backup from all parties. Currently DR is like pissing uphill and upwind for anywhere that doesn't have staff dedicated to their components. There's no way for things to end well.


My current company thinks that the backup tapes that they send to Iron Mountain are enough. We have over 500 hosted companies that use RDP to access their data/apps.

I've been doing my best to create alternate backup solutions. But if our building were to be hit by a tornado, we are screwed.




----------------
Jim P.

A little bit of this and a little byte of that can cause bloatware.
Post #1461323
Posted Monday, June 10, 2013 7:44 AM


SSCommitted


Group: General Forum Members
Failover clusters are a good idea, but don't let your SAN be a single point of failure. Years ago, I worked at a place that had actually invested in a failover cluster for all of their production SQL Server instances. Then there was a SAN failure, and we were down for a week. That's how long it took them to get a replacement SAN from the vendor and restore 10 TB of data from backup tape. I'm not a hardware / server guy, so I don't know exactly what they did wrong to set themselves up for that disaster.
Post #1461556