Hi all,
I've spent days at this point trying to diagnose this one, so I'm opening it up for discussion/a bug report.
Windows Server 2019 / Hyper-V Server 2019 is ignoring the "Performance options" selection on the hosts and using SMB for live migration regardless of any other administrative choice. The hosts use Kerberos authentication.
2 hosts, non-clustered, local storage. Hyper-V Server 2019 (I've installed 2019 Standard as well, with the same behavior)
I originally assumed that I must have a config problem. I blew two hosts away, clean installs of 2019, no configuration scripts. Just the LBFO network config and a converged fabric network design. Same problem.
Next I assumed it had to be networking. So drop the teaming, drop the config scripts, drop the VLAN configs, drop the LAGs. One physical NIC with access to the domain controllers for non-migration traffic, and another single physical NIC - on a crossover cable - for live migration. Same problem.
After this, I tried clean installs with all group policy blocked from applying to the hypervisors. I've tried clean installs with the Microsoft in-box Intel NIC drivers and with Intel's v23 and v24 release NIC drivers. Same problem. I've tried the June 2019 re-release of Hyper-V Server 2019 and even a direct-from-DVD install (no language packs, etc.), so a 100% vanilla source.
Here is the problem in its simplest form (the two physical 1GbE NIC setup):
Live Migration: Kerberos & Compression
A test VM with a VHDX containing a couple of GB of junk data to push back and forth
All configs are verified (a PowerShell sketch of the intended settings follows this list)
Windows Firewall profiles in use are domain/private - and it makes no difference if Windows Firewall is off
Windows Defender is uninstalled and no other software (let alone security software) is installed, period. These are fresh installs
The ONLY live migration network in the Incoming Live Migration networks list is 10.0.1.1/32 on HV1 and 10.0.1.2/32 on HV2
Migration LAN: 10.0.1.0/24 (point to point via a crossover cable)
All other traffic: 192.168.1.0/24 (via a switch configured as a flat LAN, i.e. it's in dumb mode)
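For reference, the above was applied with roughly the following PowerShell on each host (a sketch, not a paste of my exact commands; the subnet shown is HV2's, with 10.0.1.1/32 used on HV1):

# Kerberos authentication, Compression performance option, no "use any network" fallback
Enable-VMMigration
Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos `
           -VirtualMachineMigrationPerformanceOption Compression `
           -UseAnyNetworkForMigration $false

# Only accept incoming live migrations on the point-to-point migration NIC
Get-VMMigrationNetwork | ForEach-Object { Remove-VMMigrationNetwork -Subnet $_.Subnet }
Add-VMMigrationNetwork 10.0.1.2/32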
VMMS listens ON THE CORRECT LAN
netstat -an | findstr 6600
TCP 10.0.1.2:6600 0.0.0.0:0 LISTENING
When performing an online/offline migration, VMMS connects correctly over the >correct< LAN
netstat -an | findstr 6600
TCP 10.0.1.2:6600 0.0.0.0:0 LISTENING
TCP 10.0.1.2:54397 10.23.103.1:6600 ESTABLISHED
All fine!
Using Packet Capture on the 10.0.1.0/24 migration LAN, there is plenty of chatter to/from TCP 6600. You can see the VMCX configuration state being transmitted in XML over TCP 6600 and lots of successful back-and-forth activity for 0.35 seconds. Then traffic on TCP 6600 stops.
Traffic now starts up on the non-Migration network, the 192.168.1.0 network, which is NOT in the Migration networks list. A large block transfer occurs. Packet monitoring this connection shows an SMB transfer occurring. This block transfer is, of course, the VHDX file.
As soon as the block transfer completes on the 192.168.1.0 network (~16 seconds), traffic picks up again over TCP 6600 on the 10.0.1.0 network for about 0.5 seconds and the Live Migration completes.
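If anyone wants to confirm the same behaviour without a packet capture, the SMB cmdlets on the source host show which addresses SMB actually binds to while the migration is in flight (a rough sketch; HV2 is just this lab's destination hostname):

# Run on the source host during the migration
Get-SmbConnection | Where-Object ServerName -like 'HV2*' |
    Format-List ServerName, ShareName, Dialect

# Which client/server IP pairs SMB multichannel selected for the transfer
Get-SmbMultichannelConnection |
    Format-Table ServerName, ClientIpAddress, ServerIpAddress, Selected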
The only way that I can get the hosts to transfer over the 10.0.1.0 network is to add their respective FQDN entries to each server's local Hosts file.
Re-doing the transfer now uses the correct 10.0.1.0 network. You can clearly see the VMCX transfer over TCP 6600, then an SMB 2.0 session is established between source and destination over 10.0.1.0 using the value from the Hosts file. An SMB transfer of the VHDX occurs on the forced 10.0.1.0 network before, finally, the process is concluded via traffic on TCP 6600 (still on the 10.0.1.0 network) and the transfer completes successfully.
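For completeness, the Hosts file workaround is nothing more exotic than the following (contoso.local is a placeholder here - substitute the real domain suffix):

# On HV1 - pin the destination's FQDN to its migration-LAN address
Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value "10.0.1.2 HV2.contoso.local"

# On HV2 - the mirror entry pointing back at the source
Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value "10.0.1.1 HV1.contoso.local"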
Without the Hosts file entries, Hyper-V seems to be using NetBIOS to find the migration target; it can't, so it defaults to whatever network it can find an SMB connection on. However, I say again, the 192.168.1.0 network is not in the Live Migration networks list - Hyper-V should be failing the transfer, not unilaterally deciding to "use any available network for live migration". PowerShell on both hosts confirms that this is correctly configured:
get-vmhost | fl *
....
MaximumStorageMigrations : 2
MaximumVirtualMachineMigrations : 2
UseAnyNetworkForMigration : False
VirtualMachineMigrationAuthenticationType : Kerberos
VirtualMachineMigrationEnabled : True
VirtualMachineMigrationPerformanceOption : Compression
...
Get-VMMigrationNetwork
Subnet : 10.0.1.2/32
Priority : 0
CimSession : CimSession: .
ComputerName : HV2
IsDeleted : False
Something is causing it to ignore the Compression setting, but only for VHDX transfers. Other VM data is being sent correctly over TCP 6600. As the 10.0.1.0 network isn't registered in DNS, Hyper-V isn't "aware" that it can find the destination host over that link. Of course, in this test I do not want it to use SMB to perform this transfer at all, so it should not be using SMB in the first place. What I want is for migration traffic to occur over a private 9K jumbo-frame network - as I've always used - and not to bother the 1.5K-frame management network.
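The DNS part is easy to verify for yourselves (a sketch; 'Migration' and contoso.local are placeholders for my interface alias and domain suffix):

# The destination FQDN should only resolve to its 192.168.1.0 management address
Resolve-DnsName HV2.contoso.local -Type A

# Confirm the migration NIC isn't registering its 10.0.1.x address in DNS
Get-DnsClient | Format-Table InterfaceAlias, RegisterThisConnectionsAddress

# Confirm jumbo frames on the migration NIC
Get-NetAdapterAdvancedProperty -Name 'Migration' -RegistryKeyword '*JumboPacket'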
I've clean-installed Windows Server so many times trying to diagnose this that I've gone dizzy! Does anyone have any bright ideas?
Thanks