I have a three-node HP cluster, with each node directly attached to a JBOD enclosure. Each node shows four CSVs, all of them healthy.
I upgraded two of the nodes from 2012 R2 by draining each node, reinstalling it with 2016, and then adding it back to the cluster, and everything worked fine.
Today I started the same process on the last node, and all the disks went offline. So did every machine in the cluster, apart from two that were still on the original host because they had failed to migrate.
I got everything back by unpausing the node, failing the machines back, and restarting all the failed migrations.
Needless to say, that was not my favourite part of the day.
The event details for the drives show entries such as:
===============
Cluster Shared Volume 'Volume1' ('Cluster Virtual Disk (Volume 1)') has entered a paused state because of '(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Cluster resource 'Cluster Virtual Disk (Volume 1)' of type 'Physical Disk' in clustered role '1912d37e-0360-434e-9212-3083db0d23fb' failed. The error code was '0x2' ('The system cannot find the file specified.').
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Cluster physical disk resource online failed.
Physical Disk resource name: Cluster Virtual Disk (Volume 1)
Device Number: 4294967295
Device Guid: {00000000-0000-0000-0000-000000000000}
Error Code: 3224895541
Additional reason: AttachSpaceFailure

Cluster resource 'Cluster Virtual Disk (Volume 1)' of type 'Physical Disk' in clustered role '1912d37e-0360-434e-9212-3083db0d23fb' failed. The error code was '0xc0380035' ('The pack does not have a quorum of healthy disks.').
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
=======================
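For anyone comparing these entries: the decimal Error Code in the "online failed" event (3224895541) is the same value as the hex code 0xc0380035 in the last resource failure, and the Device Number 4294967295 is 0xFFFFFFFF, which looks like a placeholder rather than a real device number. The Python sketch below shows the conversion. The helper names are hypothetical, and the field split assumes the standard NTSTATUS bit layout (severity in bits 31-30, customer flag in bit 29, facility in bits 27-16, code in bits 15-0).

```python
# Hypothetical helpers for reading 32-bit status codes out of the cluster event log.
# Assumes the standard NTSTATUS layout: severity (bits 31-30), customer flag (bit 29),
# facility (bits 27-16), code (bits 15-0).


def to_hex(value: int) -> str:
    """Render a 32-bit value (often logged in decimal) in 0xXXXXXXXX form."""
    return f"0x{value & 0xFFFFFFFF:08X}"


def split_ntstatus(value: int) -> dict:
    """Split a 32-bit status value into its NTSTATUS fields."""
    value &= 0xFFFFFFFF
    return {
        "severity": value >> 30,            # 0 success, 1 informational, 2 warning, 3 error
        "customer": (value >> 29) & 1,      # 1 = customer-defined code
        "facility": (value >> 16) & 0xFFF,  # 12-bit facility field
        "code": value & 0xFFFF,             # facility-specific code
    }


if __name__ == "__main__":
    # Decimal "Error Code" from the "Cluster physical disk resource online failed" event.
    online_failure = 3224895541
    print(to_hex(online_failure))  # 0xC0380035, same as the last resource failure event
    print(split_ntstatus(online_failure))

    # "Device Number" from the same event.
    print(to_hex(4294967295))  # 0xFFFFFFFF, i.e. -1 as a signed 32-bit value

    # Reason the CSV entered the paused state (already hex in the event text).
    print(split_ntstatus(0xC000020C))
```

Matching the decimal and hex forms this way suggests that the AttachSpaceFailure and the "pack does not have a quorum of healthy disks" failure describe the same underlying problem.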
The storage pools are owned by this last node.
After bringing the last node back online, I saw that all the CSVs showed an operational status of Regenerating for a while before returning to Healthy.
Has anyone seen this behaviour before, or can anyone point me to anything I might have missed in the process?
http://absoblogginlutely.net