vSphere Performance – vCenter Server Appliance (VCSA) monitoring

This post is a (late) follow-up on a previous post I did about exploring the monitoring endpoints of the vCenter Server Appliance (VCSA), and an addition to the vSphere Performance blog series.

Now we will add performance metrics and health status of the VCSA to our monitoring solution. We’ll use the REST APIs in vCenter to feed the data into our InfluxDB database and visualize it in Grafana.

In vCenter we have the Appliance Management page, also referred to as the VAMI. We will use this as a blueprint of what we want to visualize, but we’ll try to fit the important parts into a single Grafana dashboard. ...  continue reading

vSphere Performance data – New vSphere plugin for Telegraf

Recently there was a new release of Telegraf, a monitoring agent from the guys that built InfluxDB. This new version, 1.8.0, comes with a plugin for vSphere which I’m pretty excited about!

Previously I’ve tested Telegraf for monitoring some Linux VMs and my InfluxDB servers. The agent works as expected, and it’s as easy to use as the other products in the TICK stack from Influx.

If you’ve followed my blog series about building a monitoring solution for vSphere and other infrastructure components, you know that I’ve pulled metrics with PowerCLI scripts. With this new plugin for Telegraf I want to see if I can use it as a replacement. ...  continue reading

Limiting disk I/O in vSphere

As a service provider we need some way of limiting individual VMs from utilizing too much of our shared resources.

When it comes to CPU and Memory this is rarely an issue, as we try not to over-commit these resources, at least not the Memory. For CPU we closely monitor counters like CPU Ready and Latency to ensure that our VMs will have access to the resources they need.

For storage this can be more difficult. Where we usually have 50-60 VMs on a host, we will probably have hundreds on a Storage Array (SAN). Of course the SAN should be spec’ed to handle the IOPS and Throughput you need, but you also need to balance the amount of disk space available and, maybe most importantly, the cost. Add to this that storage utilization will often be intermittent and bursty, and hence even more difficult to plan and control. ...  continue reading

Slides and scripts from VMUG sessions

I had the privilege of delivering 3 sessions at VMUG Norway this week in Oslo, Trondheim and Bergen.

With the extremely nice weather in Norway this week in mind, the attendance was great, and as always the discussions were valuable.

My session on vSphere Performance monitoring was the short version of the blog series I did about how we built our solution for doing performance monitoring of vSphere with InfluxDB and Grafana, and how we can easily customize it by adding metrics and data sources. ...  continue reading

vSphere Performance data – Monitoring VMware vSAN performance

In my blog series on building a solution for monitoring vSphere Performance we have scripts for pulling VM and Host performance. I made some changes to those recently, mainly adding more metrics, for instance for VDI hosts.

This post will be about how we included our vSAN environments in the performance monitoring. This has gotten a great deal easier after the Get-VSANStat cmdlet came along in recent versions of PowerCLI.

We will build with the same components as before: a PowerCLI script pulling data and pushing it to an InfluxDB time-series database, and finally visualizing it in some Grafana dashboards. ...  continue reading

Running Grafana on the Red Hat OpenShift Container Platform

Last year we started building our own solution for Performance Monitoring of our Infrastructure platform, with the focus on the VMware vSphere environment. The components used for this solution are PowerCLI for extracting the metrics, InfluxDB for storing the metrics, and Grafana for presenting the metrics.

I did a blog series on this project which explains in detail what we did when building the solution.

The solution has been very well received and is used daily by many of my colleagues, and we frequently update the solution with new metrics and dashboards. ...  continue reading

More DRS group automation

Following up on my last post on Automating DRS Groups with PowerCLI I found that we also need to automatically remove VMs and Hosts from a given DRS Group.

Although I could have included this in the previous script, which creates the groups and adds members, I wanted to separate them. There could, for instance, be times when you would like to run the removal script on a different interval than the one that adds members, and there are other scenarios as well. I believe it’s also good practice to build smaller scripts and functions that have more specific tasks. You could argue that the creation script could also be split into a part that creates groups and a part that adds the members, and maybe even split further into Hosts and VMs, but that would be a future task.

Anyways, the removal of entities like Hosts and VMs from a DRS group is as easy as putting them in. ...  continue reading