We have found the root trigger of an issue that others may want to know about. With its default settings, the 6.0.667.0 Windows Server 2008 Operating System (Monitoring) management pack (Performance, Windows Server 2008 Logical Disk) checks the fragmentation level of all logical disks on a schedule (every Saturday at 3 a.m. by default). This behavior has caused a BSOD (mini dump BUGCHECK_STR: 0x9E) on Windows Server 2008 R2 Datacenter servers that are Hyper-V hosts with an iSCSI Cluster Shared Volume (CSV).
The sequence appears to be as follows. The management pack's monitoring triggers Event ID 7036 (the Disk Defragmenter service entered the running state) on both Hyper-V hosts that share the CSV, at nearly the same time on Saturday morning. One of the Hyper-V host nodes apparently controls the CSV, and the other node reports Event ID 5120 (see below). Some time later, the node that controls the CSV reports Event ID 1230 and, after about 20 to 30 minutes, crashes and reboots.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Hyper-VHost01
Description:
Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: Hyper-VHost02
Description:
Cluster resource 'Cluster Disk 1' (resource type '', DLL 'clusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
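For anyone who wants to check exported event text from their own nodes for the same pairing, here is a minimal Python sketch. It assumes the events have been copied out as plain text in the same "Key: value" layout as the two records above. The function names (parse_events, matches_csv_pattern) and the shortened sample descriptions are made up for illustration and are not part of any Microsoft tool.

```python
import re

# Sample text in the same "Key: value" layout as the records above
# (descriptions shortened for the example).
SAMPLE = """\
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Hyper-VHost01
Description:
Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: Hyper-VHost02
Description:
Cluster resource 'Cluster Disk 1' either crashed or deadlocked.
"""

# Only these field names are treated as record fields; anything else
# after "Description:" is taken as description text.
FIELD = re.compile(
    r"^(Log Name|Source|Event ID|Task Category|Level|Keywords|User|Computer|Description):\s*(.*)$"
)


def parse_events(text):
    """Split pasted event text into one dict per event record."""
    events, current, in_description = [], None, False
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            key, value = match.groups()
            if key == "Log Name":  # each record starts with "Log Name"
                current = {}
                events.append(current)
            if current is None:
                continue
            current[key] = value.strip()
            in_description = key == "Description"
        elif in_description and current is not None:
            current["Description"] = (current["Description"] + " " + line.strip()).strip()
    return events


def matches_csv_pattern(events):
    """True if some node logged 1230 while a different node logged 5120."""
    hosts = {}
    for event in events:
        hosts.setdefault(event.get("Event ID"), set()).add(event.get("Computer"))
    lost_path = hosts.get("5120", set())
    deadlocked = hosts.get("1230", set())
    return bool(lost_path) and bool(deadlocked - lost_path)


if __name__ == "__main__":
    parsed = parse_events(SAMPLE)
    for event in parsed:
        print(event["Event ID"], event["Computer"])
    print("CSV pattern present:", matches_csv_pattern(parsed))
```

This only checks that both event IDs appear on different nodes. It does not compare timestamps, since the records as pasted here carry none.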
For now, we have created a group containing the Hyper-V hosts and an override that disables the Logical Disk Fragmentation Level monitor for that group.
Reference: Why is my 2008 Failover Clustering node blue screening with a Stop 0x0000009E?