Hi All,
Thanks in advance to anyone who can shed some light on this issue.
We have an EMC storage array that serves both ESXi and Hyper-V hosts. During a firmware update (VAAI-related), the ESXi hosts had no issues, but a Hyper-V host lost all of its LUNs except one. Our Hyper-V hosts are in a cluster managed by Failover Cluster Manager. On one of the Hyper-V hosts, BL01, only one LUN was still present. After BL01 was restarted manually, all VMs were back up and running.
The storage support people advised that the whole upgrade was clean according to their logs, and the ESXi hosts were running fine throughout. The Hyper-V hosts are running OK now, but I would still like to know what happened. My guess is that BL01 locked the LUNs for some reason and could not provide service in the meantime either. Any comments or ideas on this? Thanks.
BL01 system log:
10:21 am
Event ID: 37
The description for Event ID 37 from source mpio cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:
\Device\MPIODisk2
Microsoft DSM
The resource loader failed to find MUI file
10:30:42 am
Event ID: 70
Initiator failed to connect to the target. Target IP address and TCP Port number are given in dump data.
10:42:54 am
Event ID: 5120
Cluster Shared Volume 'Volume5' ('Cluster Disk 4') is no longer available on this node because of 'STATUS_VOLUME_DISMOUNTED(c000026e)'. All I/O will temporarily be queued until a path to the volume is reestablished.
10:43:06 am
Event ID: 20
Connection to the target was lost. The initiator will attempt to retry the connection.
11:46:32 am
Event ID: 113
Failed to allocate VMQ for NIC 4B8980C7-7EAC-449D-B614-5B0B00993C8D--449703B5-1D3A-4F4D-86F4-AD1147583C35 (Friendly Name: VFILE) on switch 6813D4F6-D891-4A4C-8037-CDF2B5DF5219 (Friendly Name: Villa Cluster Logical Switch). Reason - Maximum number of VMQs supported on the Protocol NIC is exceeded. Status = Insufficient system resources exist to complete the API.
12:24:45 pm
Event ID: 5120
Cluster Shared Volume 'Volume3' ('Cluster Disk 5') is no longer available on this node because of 'STATUS_MEDIA_WRITE_PROTECTED(c00000a2)'. All I/O will temporarily be queued until a path to the volume is reestablished.
12:27:36 pm
Event ID: 39
Initiator sent a task management command to reset the target. The target name is given in the dump data.
12:28:24 pm
Event ID: 153
The IO operation at logical block address 0xed8c5b50 for Disk 3 was retried.
12:28:24 pm
Event ID: 140
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: LUN3, DeviceName: \Device\HarddiskVolume586.
(STATUS_DEVICE_NOT_CONNECTED)
12:28:24 pm
Event ID: 15
The device, \Device\Harddisk4\DR4, is not ready for access yet.
12:28:24 pm
Event ID: 5120
Cluster Shared Volume 'Volume4' ('Cluster Disk 6') is no longer available on this node because of 'STATUS_DEVICE_NOT_CONNECTED(c000009d)'. All I/O will temporarily be queued until a path to the volume is reestablished.
12:28:24 pm
Event ID: 5121
Cluster Shared Volume 'Volume5' ('Cluster Disk 4') is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network to the node that owns the volume. If this results in degraded performance, please troubleshoot this node's connectivity to the storage device and I/O will resume to a healthy state once connectivity to the storage device is reestablished.
12:28:24 pm
Event ID: 140
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: LUN3, DeviceName: \Device\HarddiskVolume586.
(A device which does not exist was specified.)
12:30:16 pm
Event ID: 1230
Cluster resource 'SCVMM TSR1' (resource type 'Virtual Machine', DLL 'vmclusres.dll') did not respond to a request in a timely fashion. Cluster health detection will attempt to automatically recover by terminating the Resource Hosting Subsystem (RHS) process running this resource. This may affect other resources hosted in the same RHS process. The resources will then be restarted.
The suspect resource 'SCVMM TSR1' will be marked to run in an isolated RHS process to avoid impacting multiple resources in the event that this resource failure occurs again.
Please ensure services, applications, or underlying infrastructure (such as storage or networking) associated with the suspect resource is functioning properly.
12:30:16 pm
Event ID: 1146
The cluster Resource Hosting Subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually associated with recovery of a crashed or deadlocked resource. Please determine which resource and resource DLL is causing the issue and verify it is functioning properly.
12:42:19 pm
Event ID: 21502
'SCVMM TSR1 Configuration' failed to unregister the virtual machine configuration during the initialization of the resource: The wait operation timed out. (0x00000102).
1:28:51 pm
Event ID: 1074
The process Explorer.EXE has initiated the restart of computer BL01 on behalf of user COMPANYABC\Pepole1 for the following reason: Other (Unplanned)
Reason Code: 0x5000000
Shutdown Type: restart
Comment:
1:58:56 pm
Event ID: 6008
The previous system shutdown at 1:51:55 PM on 26/11/2015 was unexpected.
1:58:23 pm
Event ID: 41
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.