vSphere Performance data – Part 3 – Get-Stat

This is Part 3 in my series on vSphere performance data.

Part 1 discussed the project, and Part 2 looked at the different methods of retrieving data, ending with me deciding to use Get-Stat against all (4000) VMs. Part 2 was posted over a month ago, as I have been busy preparing for the VCP 6.5 DCV exam (which I passed, by the way) and upgrading/migrating our vCenter servers. I have still managed to do a lot of work on this project, though, so there will be some updates over the next couple of days.

Previously I had benchmarked retrieving data from VMs using PowerCLI and the Get-Stat cmdlet, landing on roughly 1 second per VM to retrieve and process the metrics I wanted. As I discussed in Part 2, that would mean 4000 seconds to retrieve the data I needed, and with my goal of retrieving all 20-second metrics within 5 minutes (300 seconds), I would need around 14 scripts running simultaneously (4000 / 300 ≈ 13.3).

There are (at least) a couple of things about this that worry me. The first is the management and operation of 14 scripts running every 5 minutes; the second is the potential extra load this will put on vCenter and the environment.

Anyway, I started out building a script…

There are lots of resources on using Get-Stat. One that does a great job of explaining the basics is this one from LucD; even though it's from 2009, it's still valid. Another that describes a project similar to mine is this one from orchestration.io.

I already had some thoughts on the need for parallelization, but decided to start with a script that didn't take that into account.

(To begin exploring Get-Stat and the other stat cmdlets, you should read the blog post from LucD referred to above.)

First I checked the different stats that are available for a VM:
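A quick sketch of what that looks like (the VM name here is just a placeholder); Get-StatType lists the counters available for an entity:

```powershell
# Placeholder VM name - pick any VM in your environment
$vm = Get-VM -Name "TestVM01"

# List the stat types available for this VM (default: historical intervals)
Get-StatType -Entity $vm | Sort-Object
```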

As I’ve described in the previous posts in this series we already have some performance dashboards so we had a fairly clear understanding on which metrics we would want to use:

  • CPU usage/utilization
  • CPU Ready & Latency
  • MEM usage/utilization
  • Network throughput
  • Disk throughput (kBps & IOPS)
  • Storage latency

The list above is missing several of these. It turns out that you need to add the -Realtime switch to get access to the missing metrics (and a lot more):
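The call is the same, just with the -Realtime switch added (again a sketch, with the placeholder VM from before):

```powershell
# -Realtime exposes the 20-second counters, including ready/latency ones
Get-StatType -Entity $vm -Realtime | Sort-Object
```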

So, with access to all of these, we mapped the desired list of metrics to the corresponding Get-StatType names. With that, retrieving metrics is as easy as:
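A sketch of that call. The counter names below are my mapping of the metric list above and should be verified against Get-StatType output in your own environment:

```powershell
# Counter names mapped from the metric list above (assumed - verify with Get-StatType)
$metrics = "cpu.usage.average",
           "cpu.ready.summation",
           "cpu.latency.average",
           "mem.usage.average",
           "net.usage.average",
           "disk.numberReadAveraged.average",
           "disk.numberWriteAveraged.average",
           "disk.maxTotalLatency.latest"

# -Realtime = 20-second samples, defaulting to the last hour
$stats = Get-Stat -Entity $vm -Stat $metrics -Realtime
```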

This will retrieve a lot of stats! The cmdlet returns stats for the given metrics at 20-second intervals for the last hour!

I only need the last 5 minutes since I will run the script on that interval, so I can use the -MaxSamples parameter with the value 15 (3 samples per minute × 5 minutes), which leaves me with a lot fewer stats to work with.
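In code, that is just -MaxSamples added to the same call:

```powershell
# 20-second samples: 3 per minute x 5 minutes = 15 samples per metric/instance
$stats = Get-Stat -Entity $vm -Stat $metrics -Realtime -MaxSamples 15
```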

One thing to be aware of is that many of the stats are reported per instance. Looking at one of the CPU metrics, you'll find several entries with the same timestamp. These correspond to the VM's vCPUs, plus one aggregated metric identified by an empty value in "Instance":
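A quick way to see this is to filter one metric on a single timestamp; the VM-level aggregate is the row where Instance is empty (a sketch, assuming $stats from the earlier call):

```powershell
# Find the most recent timestamp for cpu.usage.average
$latest = ($stats | Where-Object { $_.MetricId -eq "cpu.usage.average" } |
           Sort-Object Timestamp -Descending | Select-Object -First 1).Timestamp

# One row per vCPU instance, plus one row with an empty Instance (the aggregate)
$stats | Where-Object { $_.MetricId -eq "cpu.usage.average" -and $_.Timestamp -eq $latest } |
    Select-Object Timestamp, Instance, Value
```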

Please note that you need to examine each of the metrics you retrieve to understand whether the entry without an "Instance" value actually gives you the aggregation for the VM. You also need to understand what the metric represents and how to read it. For instance, CPU Ready might need to be converted to a percentage, and the IOPS counters (disk.number…average) might need to be grouped if you want the total.
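As an illustration of both points (a sketch, not authoritative): cpu.ready.summation is reported in milliseconds accumulated over the sample interval, so for 20-second realtime samples the percentage is value / 20000 ms × 100; and total IOPS is the read and write counters summed per timestamp:

```powershell
# CPU Ready as a percentage: summation is in ms, one realtime sample covers 20 s = 20000 ms
$readyPct = $stats |
    Where-Object { $_.MetricId -eq "cpu.ready.summation" -and $_.Instance -eq "" } |
    Select-Object Timestamp, @{N = "ReadyPercent"; E = { ($_.Value / 20000) * 100 }}

# Total IOPS: group the read and write averages per timestamp and sum them
$iops = $stats |
    Where-Object { $_.MetricId -like "disk.number*" } |
    Group-Object Timestamp |
    Select-Object @{N = "Timestamp"; E = { $_.Name }},
                  @{N = "IOPS"; E = { ($_.Group | Measure-Object Value -Sum).Sum }}
```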

After exploring each of the metrics and trying to understand how to read them, I looked at how to build a script for traversing the VMs. A quick pseudocoded version would be:

  1. Define metrics and how much to pull
  2. Connect to vCenter
  3. Get VMs
  4. Traverse and pull stats for VMs
  5. Process and build an output object per timestamp for each VM
  6. Output to file or post to an API
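The steps above can be sketched in PowerShell like this (the vCenter name, metric list, and file output are placeholders; posting to an API is covered later in the series):

```powershell
# 1. Define metrics and how much to pull (placeholder metric list)
$metrics    = "cpu.usage.average", "cpu.ready.summation", "mem.usage.average"
$maxSamples = 15   # 5 minutes of 20-second samples

# 2. Connect to vCenter (placeholder server name)
Connect-VIServer -Server "vcenter.example.local"

# 3. Get VMs
$vms = Get-VM

# 4.-5. Traverse VMs, pull stats, build one output object per timestamp
$output = foreach ($vm in $vms) {
    $stats = Get-Stat -Entity $vm -Stat $metrics -Realtime -MaxSamples $maxSamples
    $stats | Where-Object { $_.Instance -eq "" } |
        Group-Object Timestamp |
        ForEach-Object {
            $obj = [ordered]@{ VM = $vm.Name; Timestamp = $_.Name }
            foreach ($s in $_.Group) { $obj[$s.MetricId] = $s.Value }
            [pscustomobject]$obj
        }
}

# 6. Output to file for now
$output | Export-Csv -Path "vmstats.csv" -NoTypeInformation
```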

Step 5 above will depend heavily on what I need to do in step 6, so I decided not to build the entire script before I had looked more at InfluxDB and how I would push the data to the database. That will be the focus of the next part of this series.

Rudi

Working with Cloud Infrastructure @ Intility, Oslo Norway. Mainly focused on automating Datacenter stuff. All posts are personal
