December 15, 2008 at 7:55 am
Hi
I have a problem I have never seen before. Can anyone please help?
We have a four node SQL Server 2005 cluster running on Windows 2003 servers. In an instance with two user DBs one of the DBs will not start when the instance is on two of the nodes but starts just fine when the instance is on either of the other two nodes.
DB1 DB2
------ ------
Node A OK Error
Node B OK Error
Node C OK OK
Node D OK OK
The SQL Server log file gives the following errors:
Error: 5173, Severity: 16, State: 1.
One or more files do not match the primary file of the database. If you are attempting to attach a database, retry the operation with the correct files. If this is an existing database, the file may be corrupted and should be restored from a backup.
Log file 'G:\log\Connect_log.ldf' does not match the primary file. It may be from a different database or the log may have been rebuilt previously.
Like I said this error only happens on two of the four nodes on one of the two databases.
TIA
December 15, 2008 at 8:27 am
Is there a possibility of fixed disk on those nodes causing a problem?
Are all the machines configured exactly the same? And do the disk numbers match up correctly?
Could you post some of the relevant lines from the cluster log? Those are a pain to read, but I've found they have the best information when trying to troubleshoot.
December 15, 2008 at 10:35 am
Thanks for the quick reply.
The log file for the DB which always works (DB1) is on the same drive as the one which doesn't so I would think the drive isn't causing the problem.
All the machines are identical as far a I can tell. Is there a setting in particular you are thinking about?
I am afraid I don't know what you mean by disk number. :blush:
I can post the info from the cluster.log but I will have to wait until after hours to get it since this is a production instance. Any entries from the last time it happened have rolled off the file from new entries being added to the file.
Thanks again.
December 15, 2008 at 10:56 am
That'd would be great. Anything from the cluster log.
In the meantime, are there entries in the event logs as regards the problem? Also check the SQL Server Error Logs.
December 15, 2008 at 1:22 pm
Do they have the same Drive letter/name on all three nodes? Is there any Full Text indexing on this database?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 16, 2008 at 7:52 am
They are the same Drive letter/name on all four nodes. It is all on the drive in the cluster group. So it moves from node to node as the drive letter defined in the CG.
I don't about the Full Text indexing. These are third party applications with their database on our server.
But I think I have bigger issues. As I was moving the instances (there are two, each w/ two DBs) around the nodes, various DBs became "suspect" going from node to node. Luckily (or unluckily, depending on how you look at it) all but one recovered.
Just to give a little more detail. The cluster drives are made up of three Lefthand Networks SAS drives mounted to one drive letter in the cluster group.
TIA
December 16, 2008 at 8:37 am
Here are some of the errors found in the cluster.log
thanks
December 16, 2008 at 8:41 am
SQL Error 5125 not good
File '%ls' appears to have been truncated by the operating system. Expected size is %I64d KB but actual size is %I64d KB.
December 16, 2008 at 8:59 am
Is this a majority node set cluster? Or a regular failover cluster, with a quorum drive?
December 16, 2008 at 9:01 am
It is a regular failover cluster, with a quorum drive.
December 16, 2008 at 9:04 am
Looks like your MSDTC is going offline it shows a physical disk failure.
Where does that resource live?
December 16, 2008 at 9:06 am
Back to your Full Text indexing question. This DB does not but the other three (including the one that went and stayed suspect) do.
December 16, 2008 at 9:10 am
MSDTC is in its own cluster group. I could live on any of the Nodes at any given time.
That is just one instance of that error though. It also happened to the quorum drive and the drive for the sql instance.
December 16, 2008 at 9:13 am
Not sure if this is best practice or not, but I usually put that resource on the same cluster group as the quorum and have them share the same drive. That way if the quorum fails, or fails over the DTC is restarted as well.
Do you have disk caching enabled?
December 16, 2008 at 9:27 am
The SAS has hardware disk caching. What was your concern?
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply