ESXi Hosts Disconnecting From vCenter - Investigating ESXi Heartbeats

If you’re seeing ESXi hosts disconnect from vCenter every ~60 seconds for a very brief period and then immediately reconnect, with no impact on VM network traffic (i.e., nothing else is going offline or dropping packets), read on. Chances are your issue relates to vCenter not receiving the heartbeat packets your hosts are sending.

ESXi hosts send UDP heartbeats to vCenter (destination port 902) every 10 seconds. By default, if vCenter doesn’t receive a heartbeat from a host for 60 seconds, the host is placed into a Not Responding/Disconnected state, though it typically recovers almost immediately unless there are other network issues.
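The timeout behaviour described above can be sketched in a few lines of Python. This is purely illustrative (it is not vpxd’s actual implementation): given the times at which heartbeats from one host arrived, it reports any gap long enough to trip the 60-second threshold.

```python
HEARTBEAT_INTERVAL = 10   # ESXi sends a heartbeat every 10 seconds
TIMEOUT = 60              # vCenter's default not-responding timeout

def missed_windows(arrival_times, timeout=TIMEOUT):
    """Return (start, end) pairs where no heartbeat arrived for > timeout seconds."""
    gaps = []
    for prev, curr in zip(arrival_times, arrival_times[1:]):
        if curr - prev > timeout:
            gaps.append((prev, curr))
    return gaps

# A host that went silent between t=30 and t=110 (an 80-second gap):
arrivals = [0, 10, 20, 30, 110, 120]
print(missed_windows(arrivals))  # the 80 s gap exceeds the 60 s timeout
```

With six heartbeats expected per timeout window, a single dropped packet is harmless; it takes sustained loss to trigger a disconnect, which is why intermittent filtering can look like clockwork 60-second flaps.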

Confirming the issue

We can verify that this is the root cause of the disconnections in multiple ways, but these two are my preferred methods:

  1. vCenter Web UI
  2. tcpdump

Verifying ESXi host disconnections via vCenter Web UI

Navigate to vCenter > Monitor > Events and filter for not responding in the Description field. You should see the host(s) in question disconnecting every 60 seconds.

Verifying ESXi host disconnections via tcpdump

SSH into vCenter (you will need shell access enabled) and run tcpdump -n udp dst portrange 902-902. This shows all received UDP packets destined for port 902, which is the heartbeat traffic.

In this case we have two groups of hosts: the first on 10.250.6.0/24, which is working as expected, and the second on 10.250.7.0/24, which is experiencing the disconnection issues. Let’s have a look at the output. In this example our vCenter appliance is at 10.250.10.25.

root@vcenter [ ~ ]# tcpdump -n udp dst portrange 902-902
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:25:20.831491 IP 10.250.6.12.42007 > 10.250.10.25.902: UDP, length 334
16:25:21.073519 IP 10.250.6.11.16964 > 10.250.10.25.902: UDP, length 334
16:25:30.837316 IP 10.250.6.12.11110 > 10.250.10.25.902: UDP, length 334
16:25:31.075792 IP 10.250.6.11.34471 > 10.250.10.25.902: UDP, length 334

We can see vCenter is receiving heartbeat packets from two hosts on the 10.250.6.0/24 network, but none from 10.250.7.0/24.
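If the capture is long, eyeballing it gets tedious. A small hedged helper like the one below (my own sketch, not part of any VMware tooling) can pull the source IPs out of saved tcpdump output so you can see at a glance which hosts vCenter is actually hearing from.

```python
import re

# Matches tcpdump lines like:
#   16:25:20.831491 IP 10.250.6.12.42007 > 10.250.10.25.902: UDP, length 334
# and captures the source IP of each heartbeat sent to port 902.
LINE = re.compile(r'IP (\d+\.\d+\.\d+\.\d+)\.\d+ > \S+\.902: UDP')

def heartbeat_sources(capture_lines):
    """Return the set of source IPs seen sending heartbeats to port 902."""
    return {m.group(1) for line in capture_lines if (m := LINE.search(line))}

sample = [
    "16:25:20.831491 IP 10.250.6.12.42007 > 10.250.10.25.902: UDP, length 334",
    "16:25:21.073519 IP 10.250.6.11.16964 > 10.250.10.25.902: UDP, length 334",
]
print(heartbeat_sources(sample))  # only the 10.250.6.0/24 hosts appear
```

Any host missing from this set is one whose heartbeats never reached vCenter.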

Next let’s verify our ESXi hosts are sending the heartbeat packets (and that they’re being sent to the correct IP).

Verifying ESXi hosts are sending heartbeat packets

SSH into the ESXi host(s) and once again run tcpdump, this time using the ESXi-bundled tcpdump-uw: tcpdump-uw -n udp dst portrange 902-902

[root@esxi:~] tcpdump-uw -n udp dst portrange 902-902
tcpdump-uw: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmk0, link-type EN10MB (Ethernet), capture size 262144 bytes
06:16:42.248122 IP 10.250.7.11.36024 > 10.250.10.25.902: UDP, length 332
06:16:52.254367 IP 10.250.7.11.31275 > 10.250.10.25.902: UDP, length 332

So we can definitely see the packets are being sent out, and to the correct IP; however, they’re not being received by vCenter.
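As a quick sanity check on the ESXi-side capture, you can also confirm the host is emitting heartbeats on the expected 10-second cadence before going hunting for where they’re lost. A minimal sketch, assuming tcpdump’s default HH:MM:SS.microseconds timestamps:

```python
from datetime import datetime

def intervals(timestamps, fmt="%H:%M:%S.%f"):
    """Seconds elapsed between consecutive tcpdump timestamps."""
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]

# The two timestamps from the capture above are roughly 10 seconds apart:
print(intervals(["06:16:42.248122", "06:16:52.254367"]))
```

If the gaps are ~10 seconds, the sender is healthy and the problem lies somewhere on the path to vCenter.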

At this point the issue was resolved by an update to the firewall rules between the ESXi hosts and vCenter. Your root cause may be different, but either way you’ll be able to identify whether missing heartbeats are the cause of the disconnects.
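If you want to test the UDP path independently of the heartbeat traffic itself, one option is a simple probe: run a listener on a scratch box on the vCenter network segment (vCenter itself already owns port 902, so use a different port) and send a datagram from the ESXi side. The loopback sketch below just demonstrates the mechanics; the host, port, and payload are all placeholders for your environment.

```python
import socket

HOST, PORT = "127.0.0.1", 9902  # placeholder test endpoint, not vCenter's 902

# Listener side (in a real test: a scratch box next to vCenter)
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind((HOST, PORT))
listener.settimeout(5)

# Sender side (in a real test: run from the ESXi host's network)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"probe", (HOST, PORT))

data, addr = listener.recvfrom(1024)
print(data)  # the datagram arrived, so the UDP path is open
listener.close()
sender.close()
```

Because UDP is connectionless, a sender alone tells you nothing; you need the capture or listener on the receiving side to prove the path is open, which is exactly what the tcpdump steps above do.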

Confirming the issue is resolved

Once the root cause has been identified and addressed in your environment, we can once again look at a packet dump on vCenter and confirm the packets are being received (as well as the logs no longer indicating disconnects).

root@vcenter [ ~ ]# tcpdump -n udp dst portrange 902-902
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:49:11.519034 IP 10.250.6.11.11804 > 10.250.10.25.902: UDP, length 334
16:49:11.560495 IP 10.250.6.12.14014 > 10.250.10.25.902: UDP, length 334
16:49:12.272670 IP 10.250.7.12.27974 > 10.250.10.25.902: UDP, length 332
16:49:12.345234 IP 10.250.7.11.15032 > 10.250.10.25.902: UDP, length 332

There we have it, heartbeats being received from hosts on both ESXi networks.
