• Well,

    Couple of things. I ran the validation test and it passed with a few warnings. Some of them say that I have multiple NICs on the same network segment, but this is normal, as I use "teaming" for the public NICs. It also says the patching level is not the same on both nodes, but that's not true, it is. So I'm not sure whether this is a bug or the Cluster Validation is not reading the registry entries properly.

    I also took the whole SQL failover group offline, all resources. I moved it to the other node and brought the resources online one by one, and it worked. I took the whole group offline again, moved it back, and brought the services and resources online one by one, and it worked. However, when I attempt to fail over on the fly from one node to the other, all resources come online, including SQL, but after a few seconds SQL fails. So I dug into the SQL log and found these entries just before it went down:

    2013-04-17 18:09:40.42 spid11s Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.

    2013-04-17 18:09:40.42 spid11s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.

    2013-04-17 18:09:40.45 Logon Error: 18456, Severity: 14, State: 38.

    The only other error during startup was this:

    2013-04-17 18:09:26.69 Server The SQL Server Network Interface library could not register the Service Principal Name (SPN) [ MSSQLSvc/xxxxxx] for the SQL Server service. Windows return code: 0x2098, state: 20. Failure to register a SPN might cause integrated authentication to use NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies and if the SPN has not been manually registered.

    But I do not believe that is the issue here.

    I guess my question now is:

    Is tempdb too small for my SQL 2012 cluster, so that it fails to initialize and then brings the SQL service down? How can I calculate or measure that? tempdb is cleared after a restart, and I've checked it during normal hours and it looks fine.