I have a total of 11 years of IT experience in application development, database development and database administration. I have worked with different versions of SQL Server from 7.0 to 2008. I started my career as a VB, VC++ and database developer in the banking sector, implementing a core banking solution. I currently work as a Database Administrator with wide knowledge of performance tuning, high availability solutions, troubleshooting and server monitoring. This blog is my humble attempt to share my knowledge and what I have learned from my day-to-day work.
In any system, logs are essential for troubleshooting. In a Windows cluster, the cluster log acts like a black box that records detailed information about cluster failures. This is important information that we need when raising a case with Microsoft. This post covers only cluster log behavior and how to generate the cluster log. A later post will discuss deciphering the cluster log.
When you troubleshoot a cluster issue, in most cases the cluster log will point you to the root cause. You can correlate it with the Windows event log entries for further analysis. In Windows Server 2008, cluster logs are captured through the new eventing and diagnostic infrastructure called ETW (Event Tracing for Windows). You can see this trace session in the Reliability and Performance Monitor under "Data Collector Sets\Event Trace Sessions". Below is a snapshot of the same.
The log files generated by ETW are stored in the folder %windir%\system32\winevt\logs.
Each time the server is rebooted, a new file such as ClusterLog.etl.002 is created, and logging continues in that file until the server is rebooted again. Up to 3 log files are kept, so after the third consecutive reboot you start losing old log entries. Below is a snapshot of the ETL files in the folder.
For example, in the above case, after two consecutive restarts on 29th September, we lost all the logs up to 12th September. If we had one more reboot on the 29th, we would have lost all the logs up to the 29th. The ETL file name is incremented each time and has a 00X suffix appended to it. Once the maximum number of log files (3) is reached, logging starts overwriting the first one. At any point in time, only one log file is actively used.
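The rotation described above can be modelled in a few lines of Python. This is only a sketch of the behavior as the post describes it, not how the cluster service is implemented. The function name is hypothetical, and the file names are assumed to follow a uniform 00X suffix pattern.

```python
MAX_FILES = 3  # the post states that up to 3 ETL files are kept


def active_file_after(reboots, max_files=MAX_FILES, base="ClusterLog.etl"):
    """Return the assumed name of the active ETL file after `reboots` reboots.

    Hypothetical model: slots cycle 1, 2, 3, 1, ... so the fourth file
    reuses (overwrites) the first slot.
    """
    slot = reboots % max_files + 1
    return f"{base}.{slot:03d}"


for reboots in range(5):
    print(reboots, active_file_after(reboots))
```

After the third reboot the model returns to the first slot, which is the point where the oldest entries are lost.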
The default log file size is 100 MB for each ETL file. Once a file reaches the 100 MB limit, the oldest entries at the beginning of the file are deleted to make room for new ones. In our case the active log is ClusterLog.etl.003, which was created on 29th September 2012 at 1:41 AM and has already reached the 100 MB limit. The entries at the beginning of ClusterLog.etl.003 are now being deleted to make room for new entries.
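The size-limit behavior can be illustrated with a small size-capped buffer in Python. This is a toy model under stated assumptions: the class name CappedLog and the tiny 30-byte limit are made up for illustration, and the real ETL format is not modelled.

```python
from collections import deque


class CappedLog:
    """Toy model of a log that drops its oldest entries once a byte limit is hit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.entries = deque()
        self.used = 0

    def append(self, entry):
        self.entries.append(entry)
        self.used += len(entry.encode("utf-8"))
        # Delete from the beginning until the log fits within the limit again.
        while self.used > self.max_bytes and self.entries:
            oldest = self.entries.popleft()
            self.used -= len(oldest.encode("utf-8"))


log = CappedLog(max_bytes=30)  # stand-in for the 100 MB limit
for i in range(6):
    log.append(f"event-{i:02d}")  # each entry is 8 bytes

print(list(log.entries))
```

Only the three most recent entries survive here, which is the same effect that removes the oldest entries from the active ETL file.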
Now we have the cluster log spread across three ETL files. The easiest way to read the ETL files is to use the cluster log command from the command line. The syntax is:
cluster log /g
This merges the three ETL files on each node into an output file named cluster.log. The output file is stored in the %windir%\Cluster\Reports directory on each node of the cluster.
Another useful switch for the cluster log command is /Span:<minutes>, which generates the log only for the last given number of minutes. For example, /Span:15 covers only the last 15 minutes. This helps us quickly troubleshoot recent issues.
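If the log is generated from a script, the command line can be assembled as in the sketch below. It uses only the /g and /Span switches named in this post. The helper name cluster_log_command is hypothetical, and the function only builds the argument list without running anything.

```python
def cluster_log_command(span_minutes=None):
    """Build the argument list for generating the cluster log.

    span_minutes is optional; when given, only the last N minutes are requested.
    """
    args = ["cluster", "log", "/g"]
    if span_minutes is not None:
        if span_minutes <= 0:
            raise ValueError("span_minutes must be a positive number of minutes")
        args.append(f"/Span:{span_minutes}")
    return args


print(" ".join(cluster_log_command()))    # full log
print(" ".join(cluster_log_command(15)))  # last 15 minutes only
```

On a cluster node, the returned list could be passed to subprocess.run to execute the command.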
When you generate the log using the cluster log command, you might notice a gap in the output file, that is, no log entries for some days in between. This happens because the log is truncated once it reaches the 100 MB limit. In our example, the file ClusterLog.etl.003, which was created on 29th September 2012 at 1:41 AM, has already reached the 100 MB limit and has lost some data from its beginning. So when you merge the ETL files using the cluster log command, you may notice a gap after 29th September 2012 1:41 AM, lasting anywhere from a few hours to days or weeks. The output file might contain entries from 12th September 5:21 AM (the start of that file may itself have been truncated at 100 MB) until 29th September 2012 1:41 AM. These entries come from ClusterLog.etl and ClusterLog.etl.002. After that you might see a gap of a few hours or days, because the data at the beginning of ClusterLog.etl.003 was truncated to make room for current log entries.
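A gap like this can be spotted by comparing consecutive timestamps. The Python sketch below assumes the timestamps have already been extracted from cluster.log into datetime values. The parsing step is not shown, the sample values are invented for illustration, and find_gaps is a hypothetical helper.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(hours=1)):
    """Return (before, after) pairs where consecutive entries are further apart than threshold."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


# Invented sample: entries stop at the reboot and resume days later.
sample = [
    datetime(2012, 9, 29, 1, 30),
    datetime(2012, 9, 29, 1, 41),
    datetime(2012, 10, 3, 8, 0),
    datetime(2012, 10, 3, 8, 5),
]

for before, after in find_gaps(sample):
    print(f"gap from {before} to {after}")
```

The threshold is a judgment call. A quiet cluster may legitimately log nothing for a while, so a larger value reduces false alarms.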
We can change the default configuration of the cluster log. To change the default size, use the command below.
cluster log /size:400
This will change the default size of the log to 400 MB.
You can check this change using the command