So setting up VMware HA for vCenter 2.5 and ESX 3.5 can be finicky sometimes. Here is what to check if you are getting the dreaded “HA agent has an Error.” This is an add-on to my last post about VMware HA. I do this all the time at work and figured I’d just put it out there. Thanks to the guys at work for writing most of this; I just prettied it up a bit and made it easier to read and understand.
Most of this is done on the console of ESX.
- Verify that host name is lowercase and fully qualified:
- Verify that hostname is shortname only and lowercase:
- Verify that the correct service console IP is displayed:
- Verify that the host name in /etc/hosts is lowercase and that both the FQDN and shortname are present
- Verify that the search domain in /etc/resolv.conf is present and lowercase
- Verify that the host name in /etc/sysconfig/network is the FQDN, all lowercase
- Verify that the host name in /etc/vmware/esx.conf is the FQDN, all lowercase
- Verify that the host name in the output of uname -a is lowercase
- Verify that host name is in your DNS server and is lowercase
nslookup hostname should respond with the service console IP
nslookup FQDN should respond with the service console IP
nslookup ipaddress should respond with the FQDN
- Make sure the route for the service console is correct. (Ping from each host to the others)
- Verify that all primary service consoles have the same name (e.g. Service Console).
- Verify that all primary service consoles are in the same IP subnet.
- If the VMotion vmkernel port is on the same vSwitch as the primary Service Console, use das.allowVmotionNetworks=1 in the Advanced HA Settings of the cluster.
- If the host has multiple service consoles, use KB 1006541 and the das.allowNetwork0 setting in the Advanced HA Settings of the cluster to ensure that only the primary service console is allowed.
- Verify that you have the appropriate licensing for HA and that licenses are available: in LM Tools, perform a status enquiry and verify that you have VC_DAS licenses available.
- Once you have met all of the above criteria, enable HA.
- If you have verified all of the above and HA still won’t configure:
a. On the host, stop vpxa:
service vmware-vpxa stop
b. The host will show as not responding in VC after a while
c. Uninstall aam:
rpm -qa | grep aam
rpm -e (package names output from command above)
rpm -e (other package names output from command above)
find / -name aam
Remove any directories that the find command above returns.
d. Disconnect the host from VC
e. Re-connect the host to VC
f. This forces both the VPXA package and the HA packages to re-deploy.
g. Re-configure all the hosts for HA
- Upgrade to ESX 3.5 U4 (or above) and VC 2.5 U4 (or above)
- After upgrading, add das.bypassNetCompatCheck=true to the Advanced HA Settings of the cluster if it continues to be a pain.
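If you get tired of eyeballing the host name checks from the top of the list, they can be roughly scripted. This is only a sketch: the function names (is_lowercase, is_fqdn, hosts_has_both) are made up for illustration and are not part of ESX, and treating "has a dot in it" as "fully qualified" is a simplification.

```shell
#!/bin/sh
# Sketch of the host name checks from the list above.
# All function names here are illustrative, not ESX tools.

# Succeeds if the string contains no uppercase letters.
is_lowercase() {
    case "$1" in
        *[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeeds if the name has a dot with text on both sides.
# Assumption: that is "fully qualified" enough for this check.
is_fqdn() {
    case "$1" in
        ?*.?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if one non-comment line of the given hosts file lists
# both the FQDN and its shortname after the IP address field.
# Usage: hosts_has_both /etc/hosts esx01.example.com
hosts_has_both() {
    hosts_file=$1
    fqdn=$2
    short=${fqdn%%.*}   # everything before the first dot
    awk -v f="$fqdn" -v s="$short" '
        /^[ \t]*#/ { next }
        {
            has_f = 0; has_s = 0
            for (i = 2; i <= NF; i++) {
                if ($i == f) has_f = 1
                if ($i == s) has_s = 1
            }
            if (has_f && has_s) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}
```

On a host you would feed these the real values, for example is_lowercase "$(hostname)" followed by hosts_has_both /etc/hosts "$(hostname)". That assumes hostname prints the FQDN on your build, so check that by hand first.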
If you have done all that and it still fails, check your networking, storage, or something else, because it isn’t VMware!