'Load and Swap' functionality

  • We have a 1TB reporting database. We load data into it in batch once per day. The timing of the data loads is such that we need to find a way to do the load during the day without making the database unavailable to the users while the load runs. The data is not valid between the time the load begins and the time it ends, because we start with a lot of delete statements for data that will be replaced. So, what we need to do is 1) perform the massive daily data load, 2) keep the current (read-only) data available until the new data is completely loaded, and 3) swap the newly loaded copy of the database with the read-only copy.

    We would repeat this daily, with one copy always available while the other is offline being loaded. I know that we can do the offline loading pretty easily. What I'm not real clear on is the best method for doing the 'swap' so that there is no down time and it's transparent to the consumers of the data; in other words, so that it wouldn't cause currently running processes to break. Is this even doable? I've heard about warm backups and log shipping methods, but I'm looking for guru input on the most bulletproof method.

    Thanks in advance for any enlightenment anyone can provide.

  • You are going to need some down time, even if it is just a couple of minutes, in order to rename the databases to effect the swap of the data.
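
    For what it's worth, that rename swap can be scripted so the window is as short as possible. This is only a sketch, and the database names (ReportDB for the live copy, ReportDB_Staging for the freshly loaded one) are illustrative, not from the original post:

    ```sql
    -- Sketch only: swap a freshly loaded staging database in for the live one.
    -- Names are hypothetical: ReportDB = live read-only copy,
    -- ReportDB_Staging = copy that just finished the daily load.
    USE master;

    -- Kick off remaining connections so the rename can take its locks.
    ALTER DATABASE ReportDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- The swap itself: the old copy moves aside, the new copy takes its name.
    ALTER DATABASE ReportDB MODIFY NAME = ReportDB_Old;
    ALTER DATABASE ReportDB_Staging MODIFY NAME = ReportDB;

    -- Reopen the old copy so tomorrow's load can reuse it.
    ALTER DATABASE ReportDB_Old SET MULTI_USER;
    ```

    Anyone connected at that moment is disconnected by the ROLLBACK IMMEDIATE, which is exactly the few seconds of downtime being discussed in this thread.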

  • What about some method (presumably replication or something replication-like) where we synchronize the read-only database with the data from the writeable one after the load is completed? I'm not going to be able to sell any solution that includes a break in availability of the data. The design is not mine; I'm just trying to work with what I have.

  • I gather that the data is read only and the end users cannot modify it. Is that correct?

    "Keep Trying"

  • We've been doing an A/B switch with a database rename. Downtime is measured in seconds, but it's absolutely there. I'm not aware of a way to do this without impact to the end user.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The data is read-only to the users. The only thing that updates the data is the daily data load we do.

    I was thinking about an 'Active\Active' clustering configuration, where once I get the daily load done on node A, I go into Cluster Manager and force a failover to node A, then load node B offline with the same data to sync it up. We'd do the reverse the next day, and so on. It would require two separate loads, but since they're both done offline, it isn't that big a deal. With cluster failover, in my understanding, there is no downtime.

  • There will still be downtime with the cluster, or at least there might be. As people try to connect, they'll get moved to the new node, but anyone connected to the old one will be dropped when you fail over.

    How do clients connect? Do you have any say over that? If so, you could have them read the connection string from a central location, such as a file share, and then alter that. You might have a small period of time where people are connected to both machines: current users on one machine, users making new sessions on the other. But within a short period of time, as people ran new queries, they'd all move to the new machine.

    I'd push back on the 0% downtime requirement as well. Often someone insists on it without really understanding what it means. Usually they can handle a few minutes of downtime.

    The way we used to load a large DW in the past was with a join table that was used in most queries. Essentially it held the load date, and you joined it into your queries. We'd load data into the same DW, stamping the new rows with the new load date, and then at the end we'd add a row to the join table. That made all the new data appear instantly.
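
    A minimal sketch of that load-date pattern; the table and column names (CurrentLoad, Fact, LoadDate) are hypothetical, not from the post:

    ```sql
    -- Sketch of the load-date visibility trick. Queries always join through
    -- CurrentLoad, so rows stamped with a LoadDate are invisible until that
    -- date is published into CurrentLoad.
    CREATE TABLE CurrentLoad (LoadDate date NOT NULL PRIMARY KEY);
    CREATE TABLE Fact (
        FactID   int   NOT NULL,
        LoadDate date  NOT NULL,
        Amount   money NOT NULL
    );

    -- Daily load: insert new rows stamped with today's date.
    -- They do not show up in reports yet.
    INSERT INTO Fact (FactID, LoadDate, Amount)
    VALUES (1, '2009-08-28', 100.00);

    -- Flip the switch: one small insert makes the whole load visible at once.
    INSERT INTO CurrentLoad (LoadDate) VALUES ('2009-08-28');

    -- Report queries only ever see published loads:
    SELECT f.FactID, f.Amount
    FROM Fact AS f
    JOIN CurrentLoad AS c ON c.LoadDate = f.LoadDate;
    ```

    Because the final insert is a single atomic statement, readers never see a half-loaded day, and there is no rename or failover involved.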

  • I'm beginning to see that a limited, minimal amount of downtime during the switch is inevitable. So, I'll just have to say, "if you want the new data sooner, rather than later, you'll have to accept a few seconds of downtime for the swap". I've talked to a number of people much smarter than I (not a big stretch 🙂 ), and none of us can think of a better way. Thanks to everyone for your help.

  • Steve S (8/28/2009)


    The data is read-only to the users. The only thing that updates the data is the daily data load we do.

    I was thinking about an 'Active\Active' clustering configuration, where once I get the daily load done on node A, I go into Cluster Manager and force a failover to node A, then load node B offline with the same data to sync it up. We'd do the reverse the next day, and so on. It would require two separate loads, but since they're both done offline, it isn't that big a deal. With cluster failover, in my understanding, there is no downtime.

    Failover just doesn't work like that. Say you have two active instances, A and B, hosted on two different machines in an active/active cluster. When you tell A to fail over, it stops and restarts on the server that is currently hosting B. It's not that your connections switch to B while A stays available.

    Funny thing is, just last night I was editing an article on the myths and misunderstandings around clusters and this was one of them. I'll have to send this link to the author. They might find it useful.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The switch I mentioned is the only way I have seen that work, and that's only if you can co-locate the data in one system. Other than that, there's no way that I know of. Every other HA system moves connections to another server, but they are severed and re-connected.

  • As long as I can provide reasonably reliable assurances that the downtime would be on the order of a few seconds, I think I can sell it. The severing-and-reconnecting aspect is not ideal, but probably will impact only those few web requests that happen to be hitting it at the exact time the failover is being done. The people actively querying the data on an ad hoc basis can be coordinated around. All in all, I think that's the best way to go.

  • There are always games you can play with changing where IP addresses point, but even that is a bit risky. In other words, force the reporting tool to talk to a new IP and/or simply re-route the IP somewhere else (in your routing table).

    The co-location scenario Steve described is probably best (it's actually the scheme IIS7 and above use to recycle worker processes: start up a new pool, point new requests at it, and let the old one die off as its requests complete).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right, what was my emergency again?
