After several days of testing, I have come to the conclusion that the MPIO and iSCSIPrt event messages I have been seeing on my Hyper-V Server 2012 R2 host are directly connected to the host starting up a VM that is running domain services.
As best as I can explain it, in this scenario Hyper-V, in conjunction with the AD DS services in the VM, will attempt to disable write cache flushing on the controller. I believe this article tells the tale: http://support.microsoft.com/kb/285395
as does this: http://support.microsoft.com/kb/2801713
However, since in this setup the VM's VHDX file is presented to the Hyper-V host via iSCSI using MPIO, neither the host nor the VM has the ability to disable this capability. It is completely controlled on the iSCSI Target server (which does in fact have write caching enabled, but it doesn't matter... or it shouldn't matter in my opinion).
It all starts on the Hyper-V host with Disk Event 32 (a write cache warning) during boot. During the boot phase of the VM, while the host and VM attempt to disable the cache, MPIO will lose the path to the iSCSI Target and then attempt to recover it. The result is a slew of warnings and critical events logged in Hyper-V. iSCSIPrt will throw these events (20, 7, 34...), with Event 7 being the ugliest and most repetitive. MPIO has its own, and of course there is the initial warning from the disk source (Event 32).
I have noticed that some tweaking can be done to prevent the path loss in MPIO by adjusting the PDORemovePeriod and enabling a CustomPathRecoveryTime. But this only stops the events that MPIO logs for the path loss. iSCSIPrt will continue to post events 20, 7, and 34, while MPIO (with the adjusted timeouts) will just acknowledge the temporary loss and say it was able to recover the path.
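For anyone following along, the MPIO timer tweak described above can be sketched roughly as below. This assumes the MPIO PowerShell module (Get-MPIOSetting / Set-MPIOSetting) is available on the host, and the numeric values are illustrative placeholders only; they are not recommendations and should be checked against the storage vendor's guidance. A restart may be needed before the new timers take effect.

```powershell
# Show the current MPIO timers before changing anything.
Get-MPIOSetting

# Assumed placeholder values, not tested recommendations.
$pdoRemovePeriod      = 60   # seconds MPIO keeps the pseudo-LUN after all paths are lost
$pathRecoveryInterval = 30   # seconds between path recovery attempts

# Lengthen the PDO remove period and enable a custom path recovery interval.
Set-MPIOSetting -NewPDORemovePeriod $pdoRemovePeriod `
                -CustomPathRecovery Enabled `
                -NewPathRecoveryInterval $pathRecoveryInterval

# Confirm the values that are now configured.
Get-MPIOSetting
```

As noted above, this only quiets the MPIO side; it does nothing about the iSCSIPrt events.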
The only way I have been able to mitigate the iSCSIPrt events is by actually "enabling" write cache in the VM guest (which is in reality disabling flushing). No longer will iSCSIPrt spew a dozen Event 7 critical messages and the others...
MPIO, totally silent.
I still get the event 1539 in the VM guest (running directory services) which acknowledges and warns of not being able to disable write cache... but it couldn't do it before anyway!
So, I guess my question really is... what is the true risk here if I am not able to disable write cache on the Hyper-V host and VM in this setup?
Based on that article linked above:
"However, Active Directory requests all database updates be completed without caching, which the Hyper-V storage subsystem ensures in order to prevent data loss from a power failure or other unexpected reboot."
Since that isn't possible when using a SAN that presents the VMs to Hyper-V... am I correct in saying that the risk is really on my shoulders then? Hasn't it always been, in a way, up to me to make sure things are shut down appropriately during a power outage?
How can I mitigate this better? Is there a tool I should be using between these two systems to coordinate the VM shutdowns, and eventually the host shutdowns? Or do I basically need to write some PowerShell scripts to run between the hosts?