I have a system with two clusters, each consisting of shared storage and 2 hosts. We had a power blip, and one of the clusters will not come online now. This is a VRTX with a Windows Server 2016 DC and 2 Hyper-V Server 2016 hosts. Failover Cluster Manager shows the following error:
Event ID: 1795
Cluster physical disk resource terminate encountered an error.
Physical Disk Resource name: Cluster Disk 1
Device Number: 1
Device Guid: <guid>
Error Code: 1168
Additional reason: ReleaseDiskPRFailure
Failover Cluster Manager also shows the two hosts continually starting and stopping the cluster service.
The hosts are failing to mount the cluster shared volume. They are showing the following errors:
Event ID: 7031
The Cluster Service service terminated unexpectedly. It has done this <x> time(s). The following corrective action will be taken in 15000 milliseconds: Restart the service.
Event ID: 137
The default transaction resource manager on volume \\?\Volume{<Quorum guid>} encountered a non-retryable error and could not start. The data contains the error code.
Event ID: 1795
Cluster physical disk resource terminate encountered an error.
Physical Disk resource name: Cluster Disk 1
The other cluster is online and all of my VM replicas took over. I have tried rebooting the system (shutdown order: hosts, DC, storage; boot order: the reverse). Based on the cluster events, it looked like a persistent reservation on the storage, so I tried clearing the persistent reservation with Clear-ClusterDiskReservation -Force -Disk <disk>.
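For reference, the reservation-clearing step can be sketched roughly as follows. This is only an illustration under assumptions: the disk number 1 is taken from the "Device Number" field in event 1795 and should be confirmed first, and "Cluster Disk 1" is the resource name from the same event. It needs an elevated PowerShell session on one of the hosts, and the last command only returns anything while the cluster service is running.

```powershell
# Sketch only: run elevated on one of the cluster hosts.
Import-Module FailoverClusters

# Assumption: disk number 1, as reported by "Device Number" in event 1795.
$diskNumber = 1

# Confirm which physical disk that number refers to before touching it.
Get-Disk -Number $diskNumber |
    Format-List Number, FriendlyName, OperationalStatus, IsOffline

# Clear the SCSI persistent reservation on that disk without prompting.
Clear-ClusterDiskReservation -Disk $diskNumber -Force

# Check the state of the disk resource afterwards
# (this fails if the cluster service is not running on this node).
Get-ClusterResource -Name "Cluster Disk 1"
```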
I'm not quite sure what else to do on this one. Is it possible to destroy the cluster and just recreate it (I'm not sure how to do this)? I feel like that may be easier at this point.