vSphere Performance data – Part 4 – InfluxDB

This is Part 4 of my series on vSphere Performance data.

Part 1 discusses the project, Part 2 is about exploring how to retrieve data, Part 3 is about using Get-Stat for the retrieval. This post will be about the database used to store the retrieved data, InfluxDB.

The last post left of with the beginning of a script that had retrieved data from vCenter. Before I can finish that script I need to have somewhere to put that data. As I discussed in Part 1 I had decided to use InfluxDB for this purpose.

InfluxDB is a time-series database built for storing large amounts of timestamped data. The database promises high-performance for time-series data and gives interfaces for interacting with the data in a SQL-like fashion. To learn more about it, and get all the marketing stuff visit their website

InfluxDB is currently in version 1.2 and that is the version we will use. It seems they have made some changes in their latest versions so please be aware of this if you already have Influx installed and want to compare.

So Influx needs to be downloaded and installed. There are lots of ways to do this and I will not go through this as it's pretty well explained on their Getting started page.

Key concepts

I started out with a RedHat 7 VM with 1 CPU and 4 GB RAM (I have also used CentOS which is free compared to RHEL). The download and installation is as I've explained really easy.

So after installing Influx on a VM I started exploring.

The data in Influx is organized by time series, a time serie will be a measured value. In my instance it could be "cpu_ready". This will be referred to as a "measurement". Each measurement will consist of zero to many "points". A point is like a "point-in-time" sample of what you are measuring. Each point will contain the measurement, the timestamp and a value field, as well as zero to many key-value tags which could be some metadata about this value.

We can think of a measurment as kind of similar to an SQL table. The measurements' primary index would be the time (timestamp). The tags and fields would be columns. One thing to be aware of is that tags are also indexed while fields are not. One key difference with an SQL table is that you don't define the "tables" or measurements up front. They will be created and indexed as they are written to the database.

So with this in mind I could start writing data.

Writing and querying data

I first created a database from the influx CLI called "testing"

> create database testing
> show databases
name: databases
name
----
_internal
testing

Continuing with CPU Ready as an example, one point could be written like this to the database:

INSERT cpu_ready,vm=VM2012,host=lab-esx-001,vcenter=vcenter-001,unit=percent value=5 1499957981000000000

We are starting with defining the measurement as "cpu_ready". Then we create som key-value tags, namely "vm", "host", "vcenter" and "unit". After the tags we have the value field and lastly the timestamp. Note that the timestamp by default will be a unix timestamp in nanoseconds

We could have more than 1 value field, then you would need to separate them with commas. You can also omit the timestamp to have Influx write the local timestamp.

With a point written to the database we can do a SELECT query to retrieve the written data:

> select * from cpu_ready
name: cpu_ready
time                host        unit    value vcenter     vm
----                ----        ----    ----- -------     --
1499957981000000000 lab-esx-001 percent 5     vcenter-001 VM2012

With a couple of more points written we'll get some more results

> select * from cpu_ready
name: cpu_ready
time                host        unit    value vcenter     vm
----                ----        ----    ----- -------     --
1499957981000000000 lab-esx-001 percent 5     vcenter-001 VM2012
1499957981000000000 lab-esx-002 percent 5     vcenter-001 VM2008
1499957981000000000 lab-esx-001 percent 5     vcenter-001 VM2016

If you want to use Influx for storing data I really suggest that you spend some time in exploring and trying different ways to store your data. Even though it is similar to SQL there are differences and this could infer other ways to store the data than how you are used to. These kinds of databases, as well as other document db's, will have more difficulties with relations than your traditional Relational Database systems.

You can group data in Influx based on your tags. Let's group on host as this is the differentiator in my small example

> select * from cpu_ready group by "host"
name: cpu_ready
tags: host=lab-esx-001
time                unit    value vcenter     vm
----                ----    ----- -------     --
1499957981000000000 percent 5     vcenter-001 VM2012
1499957981000000000 percent 5     vcenter-001 VM2016

name: cpu_ready
tags: host=lab-esx-002
time                unit    value vcenter     vm
----                ----    ----- -------     --
1499957981000000000 percent 5     vcenter-001 VM2008

Let's insert three new points for a different timestamp

> INSERT cpu_ready,vm=VM2012,host=lab-esx-001,vcenter=vcenter-001,unit=percent value=3.4 1499958849000000000
> INSERT cpu_ready,vm=VM2016,host=lab-esx-001,vcenter=vcenter-001,unit=percent value=8.6 1499958849000000000
> INSERT cpu_ready,vm=VM2008,host=lab-esx-002,vcenter=vcenter-001,unit=percent value=5.1 1499958849000000000
> select * from cpu_ready
name: cpu_ready
time                host        unit    value vcenter     vm
----                ----        ----    ----- -------     --
1499957981000000000 lab-esx-001 percent 5     vcenter-001 VM2012
1499957981000000000 lab-esx-002 percent 5     vcenter-001 VM2008
1499957981000000000 lab-esx-001 percent 5     vcenter-001 VM2016
1499958849000000000 lab-esx-002 percent 5.1   vcenter-001 VM2008
1499958849000000000 lab-esx-001 percent 8.6   vcenter-001 VM2016
1499958849000000000 lab-esx-001 percent 3.4   vcenter-001 VM2012

As you can see they are ordered by time, but you can of course order in the other direction (Note that you can only order by time and not other tags/fields)

> select * from cpu_ready order by time desc
name: cpu_ready
time                host        unit    value vcenter     vm
----                ----        ----    ----- -------     --
1499958849000000000 lab-esx-002 percent 5.1   vcenter-001 VM2008
1499958849000000000 lab-esx-001 percent 3.4   vcenter-001 VM2012
1499958849000000000 lab-esx-001 percent 8.6   vcenter-001 VM2016
1499957981000000000 lab-esx-001 percent 5     vcenter-001 VM2012
1499957981000000000 lab-esx-001 percent 5     vcenter-001 VM2016
1499957981000000000 lab-esx-002 percent 5     vcenter-001 VM2008

> select * from cpu_ready order by vm
ERR: error parsing query: only ORDER BY time supported at this time

The SELECT query can filter the points as you would normally do with a WHERE clause. Note that the query can be quite specific on how you need to use the quotation marks. Also keep in mind that tags are index where as fields (the value field in my example) is not

> select * from cpu_ready where "host" = 'lab-esx-001'
name: cpu_ready
time                host        statinterval unit    value vcenter     vm
----                ----        ------------ ----    ----- -------     --
1499957981000000000 lab-esx-001              percent 5     vcenter-001 VM2012
1499957981000000000 lab-esx-001              percent 5     vcenter-001 VM2016
1499958849000000000 lab-esx-001              percent 3.4   vcenter-001 VM2012
1499958849000000000 lab-esx-001              percent 8.6   vcenter-001 VM2016

If you want to add additional metadata tags to your data it's as easy as just adding it to your insert query:

> INSERT cpu_ready,vm=VM2008,host=lab-esx-002,vcenter=vcenter-001,unit=percent,statinterval=20 value=5.1 1499959085000000000
> select * from cpu_ready
name: cpu_ready
time                host        statinterval unit    value vcenter     vm
----                ----        ------------ ----    ----- -------     --
1499957981000000000 lab-esx-001              percent 5     vcenter-001 VM2012
1499957981000000000 lab-esx-002              percent 5     vcenter-001 VM2008
1499957981000000000 lab-esx-001              percent 5     vcenter-001 VM2016
1499958849000000000 lab-esx-002              percent 5.1   vcenter-001 VM2008
1499958849000000000 lab-esx-001              percent 8.6   vcenter-001 VM2016
1499958849000000000 lab-esx-001              percent 3.4   vcenter-001 VM2012
1499959085000000000 lab-esx-002 20           percent 5.1   vcenter-001 VM2008

Another important thing to consider when exploring the database is thinking about how and what metadata (aka tags) you want to use. Even though it's really easy to add more tags (or stop using them), it's not that easy to update previous records with the same. This might not be an issue, but if you want to use tags for filtering then it could turn out to be a pain point.

API

The InfluxDB also has an API for inserting and retrieving data which will be what I'll use for writing my data from vCenter. With the API you can also create and manipulate databases.

The InfluxDB documentation uses curl in their examples, but I'll of course use Powershell to test the API. The $db variable will be in the format of http://your-db-server:port (8086 is the default)

PS C:\> $uri = $db + "write?db=testing"
PS C:\> $query = "cpu_ready,vm=VM2016,host=lab-esx-001,vcenter=vcenter-001,unit=percent,statinterval=20 value=4 14999597
67000000000"
PS C:\> Invoke-RestMethod -Uri $uri -Method Post -Body $query

If the query is successful you'll get an empty response

So, let's query the data as well. I'll start by building the same $uri and a $query with the same SELECT query we have used already, and do an Invoke-Restmethod. I'll put the response in a $result variable as we need to use that for exploring the output.

PS C:\> $uri = $db + "query?db=testing"
PS C:\> $query = "q=select * from cpu_ready"
PS C:\> $result = Invoke-RestMethod -Uri $uri -Method post -Body $query
PS C:\> $result

results
-------
{@{statement_id=0; series=System.Object[]}}


PS C:\> $result.results

statement_id series
------------ ------
           0 {@{name=cpu_ready; columns=System.Object[]; values=System.Object[]}}


PS C:\> $result.results.series

name      columns                             values
----      -------                             ------
cpu_ready {time, host, statinterval, unit...} {2017-07-13T14:59:41Z lab-esx-001  percent 5 vcenter-001 VM2012, 2017-...

With this exploration I figured out that the API was the way to go when it came to writing to the database. You could also write to a file and somehow import that to the DB, but it's much more easy to write directly without having a need for shared storage etc.

I did some testing on the writing of stats from my script and found that I needed to decide on if I would do one query for each metric and VM or if I could write it in bulk.

Influx' API supports a POST with multiple points, which is no more than multiple points (to multiple series if you like) separated with a new line. This should be much more performant than having to POST for each point. The API also supports writing points from a passed file, which is actually similar to the multiple points query above.

With that I went back to my script to create the logic for writing data to InfluxDB through the API and that will be the focus for the next part of this blog series.

This page was modified on March 29, 2019: Fixing links and markdown syntax