vSphere Performance – vCenter Server Appliance (VCSA) monitoring

This post is a (late) follow-up to my previous post on exploring the monitoring endpoints of the vCenter Server Appliance (VCSA), and an addition to the vSphere Performance blog series.

Now we will add performance metrics and the health status of the VCSA to our monitoring solution. We’ll use the REST APIs in vCenter, feed the data into our Influx database, and visualize it in Grafana.

In vCenter we have the Appliance Management page, also referred to as the VAMI. We will use this as a blueprint for what we want to visualize, but we’ll try to fit the important parts into a single Grafana dashboard. ...
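As a rough illustration of the "feed the data into Influx" step, here is a minimal Python sketch that turns one health status into an InfluxDB line-protocol record. It makes no network calls. The measurement name `vcsa_health`, the tag names, the numeric scores, and the host name in the usage note are all assumptions made for this example, as is the set of status colours, which should be checked against what your vCenter version actually returns.

```python
# Hypothetical mapping from health colours to numbers so Grafana can graph
# and threshold them. The colour set and the scores are assumptions.
HEALTH_SCORES = {"green": 0, "yellow": 1, "orange": 2, "red": 3, "gray": -1}


def _escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces in a line-protocol tag value."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def health_to_line(vcenter: str, component: str, status: str, timestamp_ns: int) -> str:
    """Build one InfluxDB line-protocol record for a single health status.

    The result has the form: measurement,tags fields timestamp
    """
    status = status.lower()
    score = HEALTH_SCORES.get(status, -1)  # unknown colours are scored as -1
    # String field values are double-quoted; quotes and backslashes are escaped.
    quoted = status.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"vcsa_health,vcenter={_escape_tag(vcenter)},component={_escape_tag(component)} "
        f'status="{quoted}",score={score}i {timestamp_ns}'
    )
```

Calling `health_to_line("vcsa01.example.local", "system", "green", 1538000000000000000)` gives a single line that can be sent to the Influx write endpoint. Storing a numeric score next to the text status is a design choice that makes thresholds and colour mapping in Grafana easier.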

VMworld 2018 comes to an end

VMworld 2018 is over. As always, I’m leaving with lots of great impressions and plenty of content to digest and explore further over the coming weeks.

I think this year has made it even clearer that VMware is focusing on its Cloud strategy together with partners like AWS and IBM, that vSAN is the storage solution they want you to go forward with, and that together with NSX this will be the base for the future.

It was also interesting to see how much the focus on Containers and Kubernetes has picked up pace since last year, with lots of new offerings and solutions as well as the acquisition of Heptio...

HPE iLO affects ESXi management agents – hosts in “not responding”

Over the last few months we have had several issues with ESXi hosts going into a “Not responding” status. The VMs are still active and online in this scenario, but the ESXi host cannot be managed. This also affects backup, as it won’t be able to reach the VMs through the APIs.

Previously we have normally just restarted the management agents on the host, after which it has been able to reconnect to vCenter and we have managed to migrate the VMs off the host. Lately this hasn’t worked, and we have been forced to reboot the host, with the result that the VMs are restarted by HA and eventually started on a different host.

Almost all of our ESXi hosts are HPE servers. In many of these cases we have also seen that the iLO (Integrated Lights-Out) management interface has been inaccessible or unresponsive. ...

Limiting disk i/o in vSphere

As a service provider, we need some way to keep individual VMs from using too much of our shared resources.

When it comes to CPU and Memory this is rarely an issue, as we try not to over-commit these resources, at least not the Memory. For CPU we closely monitor counters like CPU Ready and Latency to ensure that our VMs have access to the resources they need.

For storage this can be more difficult. Where we usually have 50-60 VMs on a host, we will probably have hundreds on a Storage Array (SAN). Of course the SAN should be spec’ed to handle the IOPS and Throughput you need, but you also need to balance the amount of disk space available and, maybe most importantly, the cost. Add to this that storage utilization is often intermittent and bursty, and hence even more difficult to plan and control. ...