We have had DPM 2012 SP1 backing up VMs on a 3-node Windows Server 2012 R2 SP1 Hyper-V cluster directly to tape for many months without issues.
However, this month we had a very sad scenario.
The backup starts at 12:00 AM. At around 3:30 AM, the VMs on two of the three nodes failed because they were unable to access storage.
We got events reporting that the virtual machine configuration was not available for all VMs hosted on these two nodes. Less than two minutes later, we got events reporting that the VM configuration was accessible again.
During the two minutes of failure, 5 VMs were reported in Event Viewer as degraded on both nodes. These 5 VMs stayed down after the configuration became accessible.
The other VMs on both nodes restarted unexpectedly. However, one Windows Server 2003 VM showed a boot failure and one XP VM showed a blue screen with stop code 0xF4.
When we arrived on site at 5:00 AM, we started the 5 degraded VMs without issues, and resetting the 2003 and XP VMs also worked.
However, we found the CSV in redirected access mode for the backup, and the backup was still running. Needless to say, the VMs were very slow.
Stopping the backup restored the CSV to the online state.
Any insights into the cause of this trouble?
PS: We are planning to restart all the nodes and the DPM server, then try the backup again to see how it goes.