I recently replaced a PERC H310 RAID controller with a PERC H710p RAID controller on a Dell PowerEdge R820 running Windows Server 2012 R2 and started experiencing this problem afterwards. Nothing I've investigated so far points to the controller swap being the cause, but I wanted to mention it in case someone has encountered something similar.
We were having terrible disk performance until the RAID card was swapped out, and as a precaution I moved all existing VMs off of this Hyper-V host server. Once everything was back up and running, I began a live migration back to the affected host server and it blue screened. I tried again and got the same result. I then tried copying the VM images manually through a UNC share and hit the same problem. It doesn't happen at a consistent point during the copy: I've had blue screens 4-5 GB into a transfer and 200 GB into a transfer.
I've updated the RAID controller driver and firmware to the latest available from Dell, and have installed the latest BIOS and chipset driver. The server has Broadcom 5720 series NICs, updated with the latest drivers and firmware provided by Dell. All Windows/Microsoft
updates have been applied.
After all these firmware/driver updates, the blue screens still kept occurring during network transfers. All the minidumps show a 0x133 DPC_WATCHDOG_VIOLATION bug check, where the DPC time allotment is 500 ticks and the blue screen happens at 501 ticks. Running the minidumps through Windows Debugger initially pointed to tcpip.sys, netio.sys, and vmswitch.sys. Since tcpip.sys and netio.sys aren't typically the culprit, I looked around for anything related to vmswitch.sys being the problem.
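For anyone reading the dumps, here is a minimal Python sketch of how the first three 0x133 arguments can be read. The function name `describe_dpc_watchdog` is hypothetical, and the sketch assumes the first argument in these dumps is 0 (the single DPC/ISR case), which the post does not state. WinDbg prints bug check arguments in hexadecimal, so the 500 and 501 quoted above may really be 0x500 and 0x501; the sketch parses them that way.

```python
def describe_dpc_watchdog(arg1, arg2, arg3):
    """Summarize bug check 0x133 arguments given as hex strings from WinDbg."""
    a1, a2, a3 = (int(value, 16) for value in (arg1, arg2, arg3))
    if a1 == 0:
        # Single DPC/ISR case: arg2 is the tick count, arg3 the allotment.
        return (f"a single DPC or ISR ran for {a2} ticks "
                f"against an allotment of {a3} ticks")
    if a1 == 1:
        # Cumulative case: no single routine is necessarily to blame.
        return ("the system cumulatively spent too long at "
                "DISPATCH_LEVEL or above")
    return f"unrecognized first argument {a1:#x}"


# Assumed argument values; only 500 and 501 come from the post itself.
print(describe_dpc_watchdog("0", "0000000000000501", "0000000000000500"))
```

If the first argument is 0, the stack in the dump usually shows the routine that overran, which is why the module named on the "Probably caused by" line is worth a closer look.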
I disabled the Hyper-V vSwitch in the OS and completed a transfer successfully, with the traffic running through the same NIC the vSwitch was configured to use. Once I re-enabled the vSwitch and transferred more files, the blue screen came back.
In researching, I found reports of this issue when a Hyper-V host was using NIC Teaming. We don't have any of the NICs on this server teamed, but I figured it wouldn't hurt to apply the latest hotfix that addressed the issue, 3031598. Even after applying it I was still getting blue screens. I couldn't find a way to use the updated vmswitch.sys that came with the hotfix (6.3.9600.17714); I tried deleting and recreating the vSwitch, but the old driver (6.3.9600.16384) is what gets applied, and searching the OS for an updated driver doesn't turn up anything. I also can't find any info online about manually updating the driver after applying a hotfix.
I'm fairly certain vmswitch.sys is the culprit, but I don't know where to go from here. Are there any NIC or vSwitch settings I can adjust to help with this? Has anyone encountered a similar issue? Can anyone lend a hand in diagnosing it? I found some good resources on debugging and troubleshooting this further (2 URLs below), but this has gone from "this is a good learning experience" to "this needs to get done" in the few weeks I've been troubleshooting.
http://blogs.msdn.com/b/ntdebugging/archive/2012/12/07/determining-the-source-of-bug-check-0x133-dpc-watchdog-violation-errors-on-windows-server-2012.aspx
http://blogs.msdn.com/b/ntdebugging/archive/2009/12/11/test.aspx
All the minidumps from every time this has happened can be viewed here: http://1drv.ms/1R6CfAO.
Here are all the suspects from those minidumps:
Probably caused by : vmswitch.sys ( vmswitch!VmsPlcApplyPolicy+26d )
Probably caused by : vmswitch.sys ( vmswitch!VmsPlcApplyPolicy+1da )
Probably caused by : NETIO.SYS ( NETIO!ProcessCallout+772 )
Probably caused by : tcpip.sys ( tcpip!TcpTcbReceive+d9 )
Probably caused by : vmswitch.sys ( vmswitch!VmsRouterForwardPackets+27a )
Probably caused by : NETIO.SYS ( NETIO!NetioAllocateAndReferenceCopyNetBufferListEx+4c )
Probably caused by : tcpip.sys ( tcpip!TcpValidateReceive+14 )
Probably caused by : klim6.sys ( klim6+3013 )
Probably caused by : vmswitch.sys ( vmswitch!VmsPtNicPvtPacketRouted+cf )
Probably caused by : NETIO.SYS ( NETIO!NetioAllocateAndReferenceCopyNetBufferListEx+7 )
Probably caused by : vmswitch.sys ( vmswitch!VmsMpNicPvtPacketForward+184 )
Probably caused by : tcpip.sys ( tcpip!Ipv4pFragmentPacketHelper+6d1 )
Probably caused by : ntkrnlmp.exe ( nt! ?? ::FNODOBFM::`string'+ad68 )
Probably caused by : ntkrnlmp.exe ( nt! ?? ::FNODOBFM::`string'+ad68 )
Probably caused by : tcpip.sys ( tcpip!InetInspectReceiveTcpDatagram+b0 )
Probably caused by : vmswitch.sys ( vmswitch!VmsRouterForwardPackets+1f3 )
Probably caused by : tcpip.sys ( tcpip!TcpTcbFastDatagram+1150 )
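To see which module is blamed most often, the suspect lines above can be tallied. This is a minimal Python sketch; the names `tally_modules` and `SUSPECTS` are made up for illustration, and the data is just the list above pasted in.

```python
import re
from collections import Counter

# Captures the module name that follows "Probably caused by :".
SUSPECT_RE = re.compile(r"Probably caused by\s*:\s*(\S+)")

SUSPECTS = """\
Probably caused by : vmswitch.sys ( vmswitch!VmsPlcApplyPolicy+26d )
Probably caused by : vmswitch.sys ( vmswitch!VmsPlcApplyPolicy+1da )
Probably caused by : NETIO.SYS ( NETIO!ProcessCallout+772 )
Probably caused by : tcpip.sys ( tcpip!TcpTcbReceive+d9 )
Probably caused by : vmswitch.sys ( vmswitch!VmsRouterForwardPackets+27a )
Probably caused by : NETIO.SYS ( NETIO!NetioAllocateAndReferenceCopyNetBufferListEx+4c )
Probably caused by : tcpip.sys ( tcpip!TcpValidateReceive+14 )
Probably caused by : klim6.sys ( klim6+3013 )
Probably caused by : vmswitch.sys ( vmswitch!VmsPtNicPvtPacketRouted+cf )
Probably caused by : NETIO.SYS ( NETIO!NetioAllocateAndReferenceCopyNetBufferListEx+7 )
Probably caused by : vmswitch.sys ( vmswitch!VmsMpNicPvtPacketForward+184 )
Probably caused by : tcpip.sys ( tcpip!Ipv4pFragmentPacketHelper+6d1 )
Probably caused by : ntkrnlmp.exe ( nt! ?? ::FNODOBFM::`string'+ad68 )
Probably caused by : ntkrnlmp.exe ( nt! ?? ::FNODOBFM::`string'+ad68 )
Probably caused by : tcpip.sys ( tcpip!InetInspectReceiveTcpDatagram+b0 )
Probably caused by : vmswitch.sys ( vmswitch!VmsRouterForwardPackets+1f3 )
Probably caused by : tcpip.sys ( tcpip!TcpTcbFastDatagram+1150 )
"""


def tally_modules(lines):
    """Count how many times each module is named, ignoring case."""
    counts = Counter()
    for line in lines:
        match = SUSPECT_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


for module, hits in tally_modules(SUSPECTS.splitlines()).most_common():
    print(f"{hits:2d}  {module}")
```

Across the 17 dumps this gives vmswitch.sys 6, tcpip.sys 5, netio.sys 3, ntkrnlmp.exe 2, and klim6.sys 1, so every suspect apart from the two ntkrnlmp.exe entries is a network stack component or a third-party driver.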