Afternoon
***** Background Info *****
We have recently carried out an upgrade on our LeftHand P4000 SAN. We updated the Hyper-V host servers and the SAN using SPP 2013.020.2013. After the initial upgrade we realised that Failover Cluster Manager was unable to communicate with the Witness or the Cluster Shared Volumes.
We logged a call with both HP (SAN provider) and Microsoft Premier Support. After a lengthy discussion with both parties we came to the realisation that the issue was being caused by the DSM for MPIO. On Microsoft's recommendation we removed the HP DSM for MPIO software from the 4 SAN Nodes. After removing the software from the first node we noticed an immediate improvement, as we were able to talk to certain Shared Volumes and the Witness. Once the software had been removed from all nodes, our SAN appeared to be fully functional.
At this stage both Microsoft and HP recommended that we install the latest version of the HP DSM, so we installed “HP StoreVirtual DSM for MPIO Installer AT004 10503”, which brought it up to version 10.0.0.1486.1. The SAN appeared to remain stable: we were able to talk to the Witness and all CSVs, and we spot-checked Live Migrate, Quick Migrate and Move VM to another Node. All appeared to work at this point.
***** Issue *****
The following day we carried out a full health check to confirm that all servers could migrate using all 3 methods. During our testing we discovered that several of the VMs were unable to migrate from one particular node, Node2. We were able to migrate VMs onto this node but not off it. Several other VMs also seemed to have issues migrating onto other nodes.
The affected VMs run different operating systems, live in different CSVs and are managed by different nodes. Although these VMs are unable to Live Migrate, the majority of them will Quick Migrate or can be moved to another node.
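Since the affected VMs do not share an OS, CSV or owner node, it may help to record every migration attempt and count the failures per source node and per method. Below is a minimal Python sketch of that tally. The VM names, the node names other than Node2, and the individual results are all hypothetical placeholders, not our actual test data.

```python
from collections import Counter

# Hypothetical test log: (vm, source_node, target_node, method, succeeded).
# Replace these rows with the results of the real health check.
results = [
    ("VM-A", "Node2", "Node1", "live", False),
    ("VM-A", "Node2", "Node1", "quick", True),
    ("VM-B", "Node1", "Node2", "live", True),
    ("VM-C", "Node2", "Node3", "live", False),
    ("VM-C", "Node3", "Node4", "live", True),
]

def failures_by(rows, field_index):
    """Count failed attempts grouped by the given tuple field."""
    return Counter(row[field_index] for row in rows if not row[4])

by_source = failures_by(results, 1)  # failures per source node
by_method = failures_by(results, 3)  # failures per migration method

print("Failures by source node:", dict(by_source))
print("Failures by method:", dict(by_method))
```

If the failures cluster on one source node and one method, as they appear to in our case, that points at the node's live migration path rather than at the individual VMs.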
***** Testing *****
During our testing we have tried several things to see if we could resolve the issue. None of them has made any difference when live migrating:
- Restarted the Hyper-V host server
- Reinstalled the virtual guest services
- Removed the network connection from the VM
- Removed the virtual network adapter
- Re-added the virtual adapter (synthetic adapter)
- Re-created the Live Migration team
***** Questions *****
- Has anyone else come across these issues after applying the latest SPP version?
- Are there any best-practice methods for troubleshooting this issue?
- Which is the best DSM for MPIO for a LeftHand P4000 SAN?
- We are thinking of reverting to the Microsoft DSM for MPIO. Is this likely to cause any additional issues?
- What is the best way to revert to the Microsoft DSM for MPIO (one host server at a time, or all at the same time)?
At this point it might be relevant to mention that we have set up an HP team for Live Migration. All teams have been configured identically: same NICs (one onboard NIC and one external NIC), all set to LACP, same naming conventions. We have noticed that when we try to migrate from Node2, traffic on the Live Migration team fluctuates between 0.1% and 0.3% of the NIC capacity, while on the other host servers it transfers at 35-40%.
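To make the comparison above repeatable, the utilisation readings can be checked against the median of all hosts. The following Python sketch flags any host whose reading is far below it. The readings are placeholders loosely based on the figures above, the node names other than Node2 are hypothetical, and the 10% threshold is an arbitrary assumption.

```python
from statistics import median

# Placeholder readings: Live Migration team utilisation as a percentage
# of NIC capacity. Only Node2 is named in the post; the rest are made up.
utilisation = {"Node1": 37.0, "Node2": 0.2, "Node3": 40.0, "Node4": 35.0}

def slow_nodes(readings, ratio=0.1):
    """Return nodes whose utilisation is below `ratio` times the median."""
    typical = median(readings.values())
    return sorted(node for node, value in readings.items()
                  if value < typical * ratio)

print("Nodes well below the median:", slow_nodes(utilisation))
```

With these placeholder figures only Node2 is flagged, which matches what we are seeing on the Live Migration team.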