We have run into this in a few environments, so I thought I would write it up on my blog. First, I’d like to share the symptoms we saw before we discuss the fix.
One of our teams informed us that Virtual Machines were not responding and that the OS status could not be checked from the VMware console. It was a bad situation. We checked a couple of the Virtual Machines and found the same error message in every console: “Unable to connect to the MKS: Virtual machine config file does not exist.” We also noticed that the affected Virtual Machines all resided on the same ESXi host and the same LUN. Here is the message we could see in the Virtual Machine.
We tried to browse the datastore, but its contents were not displayed properly.
After that, we checked the vmkernel log and found IO reservation errors for a few LUNs on the ESXi host.
Here is the complete output of the log
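To look for similar entries on your own host, a minimal sketch is shown below. It assumes the log lives at /var/log/vmkernel.log (the usual location on recent ESXi versions; older releases may differ) and that the relevant lines contain the word "reservation". The function name find_reservation_lines is made up for this example.

```shell
# Hypothetical helper: print every line that mentions "reservation"
# (case-insensitive) from a vmkernel log file.
# Assumption: the default path below matches your ESXi version.
find_reservation_lines() {
    log_file="${1:-/var/log/vmkernel.log}"
    grep -i "reservation" "$log_file"
}

# Usage on the ESXi shell:
#   find_reservation_lines
#   find_reservation_lines /path/to/a/copied/vmkernel.log
```

The matching lines should include the device IDs of the LUNs involved, which are needed for the reset later on.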
These LUNs had been locked by another ESXi host, and the lock was not released properly, which led to this unresponsive state. So we had to reset the LUNs to fix it.
One of the great tools that we use for lots of reasons is the “vmkfstools” command-line utility, so it was the natural choice here.
We issued “vmkfstools -L lunreset /vmfs/devices/disks/<naa_id>” to fix the issue. The naa IDs of the affected LUNs were displayed in the vmkernel log, so you can easily find them there. Here is the command example
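As a rough sketch, the reset can be wrapped in a small function that prints the command first, so you can double-check the device ID before running it. The function name lun_reset and the DRY_RUN variable are made up for this example; only the vmkfstools invocation itself comes from the steps above.

```shell
# Hypothetical wrapper around the LUN reset command.
# By default (DRY_RUN=1) it only prints the command; set DRY_RUN=0 to run it.
lun_reset() {
    device_id="$1"
    device_path="/vmfs/devices/disks/${device_id}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "vmkfstools -L lunreset ${device_path}"
    else
        vmkfstools -L lunreset "${device_path}"
    fi
}

# Usage (replace <naa_id> with the ID taken from the vmkernel log):
#   lun_reset "<naa_id>"              # prints the command only
#   DRY_RUN=0 lun_reset "<naa_id>"    # actually issues the reset
```

The same function accepts any device ID that exists under /vmfs/devices/disks/, so it is not limited to naa IDs.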
Unfortunately, the situation did not improve, so we used the vml IDs instead of the naa IDs. To find the vml IDs, we used the “esxcfg-scsidevs -l” command. Here is similar output; you need to identify the affected LUN in it.
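Because the full listing is long, a small filter can help to match each naa ID with its vml ID. This is only a sketch: it assumes each device block starts with a line beginning with "naa." and that the vml ID appears on an indented line containing "vml.", which may not hold for every ESXi version or device type. The function name filter_vml_ids is made up for this example.

```shell
# Hypothetical filter: keep only the device header lines (starting with
# "naa.") and the lines that contain a vml ID. Reads from standard input.
filter_vml_ids() {
    grep -E '^naa\.|vml\.'
}

# Usage on the ESXi shell:
#   esxcfg-scsidevs -l | filter_vml_ids
```

Each vml line in the result belongs to the naa header printed just above it, so you can pick the vml ID for the affected LUN from there.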
We then executed the LUN reset with the vml ID.
Finally, the LUN came online and we were able to browse the datastore. The Virtual Machines started to respond, and some Virtual Machines were rebooted to fix remaining issues.
Reference KB: 1000044