vSphere performance data - Part 1

There are lots of posts out there on retrieving performance data from a vSphere environment (I'll probably lean on a lot of them in this series), but here's my take on it.

My ultimate goal is to build my own database of performance data and have a nice front-end presenting it. I also want an API that extracts data from the performance DB, which I will use in our in-house portals and dashboards.

This project can be quite big and complex, so I will split it into three parts:

  • Retrieve performance data from vSphere
  • Push data to a database and set up a front-end
  • Build an API on top of the perf db

These will not necessarily correspond to the posts in this blog series, as I suspect the posts would get quite long and the different parts will take some time to complete. They will probably also overlap in time.

The first and biggest issue will be retrieving the data from vSphere. With an environment of 100+ hosts and 4000+ VMs I will need to come up with an effective and scalable way to fetch it. We want stats pulled as close to real-time as possible, so this will be a challenge.
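To give a feel for the starting point, here's a minimal PowerCLI sketch of pulling real-time stats. The vCenter name and metric list are just placeholders, and a real poller for 4000+ VMs would need batching and parallelism on top of this:

```powershell
# Connect to vCenter (placeholder server name)
Connect-VIServer -Server vcenter.example.local

# Grab the latest real-time (20-second interval) CPU and memory samples
# for every powered-on VM. Against thousands of VMs a single Get-Stat
# call like this gets slow, which is exactly the scaling problem.
$vms   = Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' }
$stats = Get-Stat -Entity $vms `
                  -Stat 'cpu.usage.average', 'mem.usage.average' `
                  -Realtime -MaxSamples 3

$stats | Select-Object Entity, MetricId, Timestamp, Value, Unit |
    Format-Table -AutoSize
```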

Next I need to invest some time in the logic around the database. With the number of hosts and VMs in our environment, "real-time" stats will produce massive amounts of records. At a 20-second interval you produce 3 records per minute, 180 per hour and 4320 per day. That's per VM, and depending on the type of database it can also be per metric. Multiply this by our 4000 VMs and you get over 17 million records per day per metric!
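A quick sanity check on those numbers:

```powershell
# Samples per VM at a 20-second interval
$perMinute = 60 / 20               # 3 records per minute
$perDay    = $perMinute * 60 * 24  # 4320 records per day

# Scaled to the whole environment, per metric
$perDay * 4000                     # 17,280,000 records per day
```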

Hence I will need to replicate some of the "roll-up" logic that e.g. the vCenter performance data uses, though we might want to keep real-time stats longer than the default one hour.
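InfluxDB has its own features for this kind of aging (retention policies and continuous queries) that I'll need to dig into, but just to show the idea, here's a rough PowerShell sketch that rolls 20-second samples up to 5-minute averages, assuming $stats holds the output of a Get-Stat call like the one above:

```powershell
# Roll 20-second samples up to 5-minute averages per VM and metric,
# similar to what vCenter does when real-time stats age out.
$bucketTicks = 5 * [TimeSpan]::TicksPerMinute

$rolledUp = $stats |
    Group-Object {
        # Bucket key: VM name + metric + timestamp truncated to 5 minutes
        $bucket = [DateTime]($_.Timestamp.Ticks - ($_.Timestamp.Ticks % $bucketTicks))
        '{0}|{1}|{2:s}' -f $_.Entity.Name, $_.MetricId, $bucket
    } |
    ForEach-Object {
        [PSCustomObject]@{
            Key     = $_.Name   # "vm|metric|bucket start"
            Average = ($_.Group | Measure-Object -Property Value -Average).Average
            Samples = $_.Count  # ~15 per full 5-minute bucket
        }
    }
```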

I'm starting out with a few thoughts on the technology I will use (this can change over the course of the project, so nothing is locked in yet):

  • PowerCLI for retrieving data from vSphere (the new 6.5 API could prove to be an alternative)
  • InfluxDB as the database (I haven't used this before, and the only real "production" database experience I have is with MS SQL; see the write sketch after this list)
  • Grafana as the front-end (same as above, no experience with this either)
  • An ASP.NET Core site as the API connecting to the database
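As a taste of how the first two pieces might meet, here's a rough sketch of writing a single point to InfluxDB's HTTP write API from PowerShell using the line protocol. The host, database name, tags and measurement are all made up for the example:

```powershell
# InfluxDB line protocol: <measurement>,<tags> <fields> <timestamp>
$epoch = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds()
$line  = "cpu_usage,vm=myvm,esxhost=esx01 value=42.5 $epoch"

# Write to a hypothetical 'vsphere' database; precision=s tells InfluxDB
# the timestamp above is in epoch seconds rather than nanoseconds.
Invoke-RestMethod -Method Post `
    -Uri 'http://influxdb.example.local:8086/write?db=vsphere&precision=s' `
    -Body $line
```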

As the list above shows, some of this will be completely new to me; the exceptions are PowerCLI and ASP.NET, which I have used a lot. As I'm not a developer I fear that a lot of this will be fairly complex and several levels above my current experience, but I'm looking forward to the challenge. It's also a great opportunity to gain experience with these technologies.

Depending on how I retrieve the data, I might need to build something to administer it, as I might end up with multiple "pollers". For this I'm interested in building a solution based on Node.js, mostly because I want hands-on experience with a real-life application built on Node.

Time will tell if this project is too big for me to tackle, but either way I hope to learn a lot from it.

Stay tuned!
