Failed SQL Server 2005 Cluster Upgrade to SP4

  • I am looking for suggestions on a problem we ran into while upgrading our SQL Server 2005 cluster to SP4.

    We have a 5-node active-active cluster.

    While we were applying SP4 to the active node of one of the SQL Server 2005 instances, the instance failed over to its primary preferred node. Rather than trying to intervene, we let the patch complete.

    It appears that all the binaries were upgraded on every node in the cluster, passive and active, but I question the integrity of any updates made to the system databases. The reason is that Database Mail stopped working: the new binaries expected an additional column in an msdb table, returned by a stored procedure, and neither the table nor the stored procedure had that column. I compared this against another instance in the cluster where the upgrade succeeded and was able to repair the issue, but if this object was missed, other system objects may have been missed as well.

    I had seen the suggestion to reapply SP4 to only the active node, so I tried pausing all the nodes in the cluster and applying SP4 to just the active node again. The installer flew right through and reported a successful upgrade, but it didn't appear to apply any updates. I have reapplied SP4 to a non-clustered instance before and it went through the whole install, so maybe this has something to do with the instance being part of a cluster.

    I am hoping someone can give me some ideas on how to make sure the system databases get upgraded without reinstalling any instances, as a reinstall wouldn't go over well in our environment.

  • I ended up opening a support case with Microsoft. That did not yield a solution: they were not able to determine that the system database upgrade step never happened, even though it was clearly missing from the logs for this instance while it was present in the logs for the successful instances.

    They contend that the patch was successful, regardless of the 20 to 30 stored procedures that didn't get updated on the instance. The patch cannot have been successful if the instance failed over before those updates ran, and that appears to be what happened.

    I'm still looking for useful suggestions. At this point, I have manually upgraded every object the patch would have updated that I could confirm differed between a successfully upgraded instance and the failed one.
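
    The manual comparison described above could be made more systematic with a small script. What follows is only a rough sketch, assuming the object definitions have already been exported from each instance into a name-to-text mapping (for example by querying `sys.sql_modules` in msdb; the export step is not shown and the object names in the usage note are hypothetical). It reports which objects are missing on the suspect instance and which differ from the reference.

```python
import hashlib


def definition_hash(sql_text):
    """Hash a definition after normalizing line endings and trailing whitespace."""
    lines = sql_text.replace("\r\n", "\n").split("\n")
    normalized = "\n".join(line.rstrip() for line in lines).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_definitions(reference, suspect):
    """Compare two {object_name: definition_text} mappings.

    reference: definitions from an instance where the upgrade succeeded.
    suspect:   definitions from the instance that failed over mid-patch.
    """
    missing = sorted(name for name in reference if name not in suspect)
    extra = sorted(name for name in suspect if name not in reference)
    changed = sorted(
        name
        for name in reference
        if name in suspect
        and definition_hash(reference[name]) != definition_hash(suspect[name])
    )
    return {"missing": missing, "changed": changed, "extra": extra}
```

    Calling `diff_definitions(good, bad)` with the two exported mappings returns three sorted lists of object names. Only whitespace at line ends and line-ending style are normalized, so any other textual difference is reported as a change and still needs a manual look.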

  • I saw your post. I don't think I have an answer, but I do have some questions.

    Just wanted to confirm:

    Your post mentioned there was nothing in the bootstrap logs? (I'm curious whether there really are none.)

    Also, when running SELECT @@VERSION, I'm assuming it shows the SP4 build number even though you mentioned that some sprocs that are part of the upgrade were not updated on the server?

    By the way, are the sqlservr.exe executables the same size as on a known successfully patched server, such as your successfully patched stand-alone server? You may want to check this to confirm 100% that the patch succeeded on the cluster.
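
    A size check like the one suggested here can be tightened by also comparing a checksum, since two files of equal size can still differ in content. This is a minimal sketch, assuming copies of the executable from the two servers are reachable as ordinary file paths; the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1024 * 1024):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def same_binary(path_a, path_b):
    """True only if both files have the same size and the same SHA-256 hash."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)
```

    Matching binaries only confirm that the files were replaced. They say nothing about whether the upgrade scripts ran against the system databases.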


  • 1. The logs are present, but the part where the system databases are upgraded is missing from the logs for the instance that failed.

    2. Correct. It shows that it is upgraded to SP4 on every node when you fail it over to any node in the cluster, but none of the system database upgrades happened.

    The binaries are all patched and match, but I know the patch wasn't 100% successful since the system database updates were missed. I am taking a chance that, with the binaries patched and the system databases updated manually, we won't run into any issues, but there is no guarantee that other things weren't missed.
