DB not available on all nodes

Question

Post reply

DB not available on all nodes

Mitch Beu

SSC Enthusiast

Points: 109
More actions
December 15, 2008 at 7:55 am

#128364

Hi
I have a problem I have never seen before. Can anyone please help?
We have a four node SQL Server 2005 cluster running on Windows 2003 servers. In an instance with two user DBs one of the DBs will not start when the instance is on two of the nodes but starts just fine when the instance is on either of the other two nodes.
DB1 DB2
------ ------
Node A OK Error
Node B OK Error
Node C OK OK
Node D OK OK
The SQL Server log file gives the following errors:
Error: 5173, Severity: 16, State: 1.
One or more files do not match the primary file of the database. If you are attempting to attach a database, retry the operation with the correct files. If this is an existing database, the file may be corrupted and should be restored from a backup.
Log file 'G:\log\Connect_log.ldf' does not match the primary file. It may be from a different database or the log may have been rebuilt previously.
Like I said this error only happens on two of the four nodes on one of the two databases.
TIA

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 1

Is there a possibility of fixed disk on those nodes causing a problem?

Are all the machines configured exactly the same? And do the disk numbers match up correctly?

Could you post some of the relevant lines from the cluster log? Those are a pain to read, but I've found they have the best information when trying to troubleshoot.

Shamless self promotion - read my blog http://sirsql.net

Mitch Beu SSC Enthusiast Points: 109 More actions · Answer 2

Thanks for the quick reply.

The log file for the DB which always works (DB1) is on the same drive as the one which doesn't so I would think the drive isn't causing the problem.

All the machines are identical as far a I can tell. Is there a setting in particular you are thinking about?

I am afraid I don't know what you mean by disk number. :blush:

I can post the info from the cluster.log but I will have to wait until after hours to get it since this is a production instance. Any entries from the last time it happened have rolled off the file from new entries being added to the file.

Thanks again.

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 3

That'd would be great. Anything from the cluster log.

In the meantime, are there entries in the event logs as regards the problem? Also check the SQL Server Error Logs.

Shamless self promotion - read my blog http://sirsql.net

RBarryYoung SSC Guru Points: 143329 More actions · Answer 4

Do they have the same Drive letter/name on all three nodes? Is there any Full Text indexing on this database?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Mitch Beu SSC Enthusiast Points: 109 More actions · Answer 5

They are the same Drive letter/name on all four nodes. It is all on the drive in the cluster group. So it moves from node to node as the drive letter defined in the CG.

I don't about the Full Text indexing. These are third party applications with their database on our server.

But I think I have bigger issues. As I was moving the instances (there are two, each w/ two DBs) around the nodes, various DBs became "suspect" going from node to node. Luckily (or unluckily, depending on how you look at it) all but one recovered.

Just to give a little more detail. The cluster drives are made up of three Lefthand Networks SAS drives mounted to one drive letter in the cluster group.

TIA

Mitch Beu SSC Enthusiast Points: 109 More actions · Answer 6

Here are some of the errors found in the cluster.log

thanks

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 7

SQL Error 5125 not good

File '%ls' appears to have been truncated by the operating system. Expected size is %I64d KB but actual size is %I64d KB.

Shamless self promotion - read my blog http://sirsql.net

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 8

Is this a majority node set cluster? Or a regular failover cluster, with a quorum drive?

Shamless self promotion - read my blog http://sirsql.net

Mitch Beu SSC Enthusiast Points: 109 More actions · Answer 9

It is a regular failover cluster, with a quorum drive.

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 10

Looks like your MSDTC is going offline it shows a physical disk failure.

Where does that resource live?

Shamless self promotion - read my blog http://sirsql.net

Mitch Beu SSC Enthusiast Points: 109 More actions · Answer 11

Back to your Full Text indexing question. This DB does not but the other three (including the one that went and stayed suspect) do.

Mitch Beu SSC Enthusiast Points: 109 More actions · Answer 12

MSDTC is in its own cluster group. I could live on any of the Nodes at any given time.

That is just one instance of that error though. It also happened to the quorum drive and the drive for the sql instance.

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 13

Not sure if this is best practice or not, but I usually put that resource on the same cluster group as the quorum and have them share the same drive. That way if the quorum fails, or fails over the DTC is restarted as well.

Do you have disk caching enabled?

Shamless self promotion - read my blog http://sirsql.net

Mitch Beu SSC Enthusiast Points: 109 More actions · Answer 14

The SAS has hardware disk caching. What was your concern?