Good morning!
I'm running into a problem with Hyper-V and I don't have a lot of information to troubleshoot off of. I'm hoping the experience and knowledge here will help me zero in on a solve for this.
Basics:
I'm trying to enable replication of a single VM'd server from my client's office in Dothan to their office in Auburn (War Eagle!). I'm also trying to re-enable replication of two user VMs from Auburn to Dothan. Dothan's hypervisor is MS Hyper-V running in Windows Server 2019 DE. Auburn's hypervisor is MS Hyper-V running in WS2016 DE. The VM to be replicated from Dothan is WS2019 DE acting as a DC and a file server. The VMs being replicated from Auburn are Win10 Pro workstations. The purpose of all replication is simple redundancy; offsite backup of each VM in case of catastrophe. The WAN infrastructure is VPN over broadband. The organization uses ADDS with all users and machines in one domain.
History:
I'd successfully set up replication of the two VMs from Auburn to Dothan a few weeks ago. I ran into issues with Kerberos that were solved by editing the registry on each hypervisor so that Kerberos had to use TCP, and eliminating packet fragmentation at the VPN routers. Those VMs were replicating consistently until this past weekend.
During the weekend, I had to reformat the volume where the VMs are stored in Dothan. The Auburn server is a mess that I inherited from the last IT guy; nothing is virtualized on this machine and I want some redundancy, so I was going to use DFS to replicate critical files from this server to Dothan. However, when I built Dothan I used ReFS for the storage volume, and since DFS won't run on ReFS, I decided to back up Dothan, burn down the storage volume, reformat with NTFS and restore. I used Windows Server Backup in WS2019 to perform a bare metal (full) backup to an external disk. Before running the backups, I merged all checkpoints back into each VHD and created new thin-provisioned VHDs from my thick-provisioned VHDs. I successfully tested each VM with the thin-provisioned VHDs attached before shutting all VMs down and running the backup set.
Problem:
After formatting the storage volume as NTFS, I ran into issues restoring Hyper-V directly from WSB. Instead, I restored the VHDs, built new VMs for my 1 server VM and 2 replicated user VMs, and reconnected the VHDs. Everything boots up in Hyper-V and the machines run normally. The server VM is up and running with no issues. All the data is there and I can remote into each machine. Once the dust had settled, I attempted to enable replication of the server VM from Dothan to Auburn but received the following error: https://1drv.ms/u/s!Av-G2MTFk-0ZgupOcDu6lQhOgOmLUQ?e=52vnr2
This is proving to be maddening for me because there's no data associated with the error. No code or whatnot. I'm not receiving any errors from the replication wizard in terms of "unable to determine configuration", authentication errors, or anything during the process to make me think these two servers can't talk to each other. They're talking about this replication, they're just not kicking it off.
In an attempt to control for other variables, I decided to re-enable replication of the two VMs from Auburn to Dothan. When I tried, I got this error: https://1drv.ms/u/s!Av-G2MTFk-0ZgupPTJzbNDNF_oiPgw?e=hWtNbE
Unspecified errors are a joy...
I've done my rounds on search engines and tried everything I can think of, but I just keep running into this wall. These VMs were replicating fine before all of this.
Misc:
The Dothan server uses two volumes for Hyper-V. Hyper-V configuration files and small VHDs are stored on E:, which is SSD-based. The Dothan server VM uses a small VHD on E: for its operating system, and its smart paging file is parked there too.
Large VHDs and all Hyper-V Replica VMs (inbound-replicated VMs) are stored on D:. The Dothan server VM uses a VHD stored on D: for data storage in its role as a file server. Both my Auburn VMs are stored on D: when they replicate to Dothan, which is exactly how they were before I made all these changes over the weekend.
I've rebooted both hypervisors and I've checked all the VSS writers. No errors there. I can create standard and production checkpoints for each VM: the two VMs in Auburn and the server VM in Dothan.
If anybody can help point me in the right direction on this, I would greatly appreciate it, and I'd be happy to take you for a beer anytime you're in this part of the American South!
-MB