In our environment we have several HPE Blade Chassis systems. The chassis is managed with the Onboard Administrator (OA) which consists of one or two management modules.
Like all other hardware, these modules have components that need firmware to run. And firmware needs to be kept updated to fix bugs, add features, support new hardware and mitigate security risks. It’s also a good idea to keep it reasonably close to the iLO versions on your blades, as I suspect HPE might not test newer iLO releases against a lot of old OA versions. However, I haven’t found that kind of compatibility matrix.
Normally you need to restart the specific hardware after doing a firmware update. This means downtime, but how can you do this on a blade chassis containing up to 16 blade servers running your production load?
While we usually do a firmware update on the blade servers around once a year to stay updated and on a supported version, the Onboard Administrator is easily forgotten or not prioritized. This, combined with the fact that you might only have a couple of chassis compared to many blades, probably means that years can pass between such upgrades.
At this point you will want to consult the HPE Onboard Administrator User Guide to get the official documentation. Maybe also consult with your local HPE representatives, or HPE support to get their feedback and guidelines specific to your environment.
From the last time I did a firmware upgrade on the OA, I was pretty confident that it didn’t affect any production load on the server blades. But that update was some time ago, it’s not something I do often, and we had a crash on two of our blade chassis when doing work on the OA a while back, so we are extra cautious when doing anything on the OA. With this in mind I wanted to double check and read up on the docs.
What I found was that it’s not that clear what actually happens when you do a FW update on the OA. The documentation actually states in a paragraph that “the enclosure will power down and reboot”. What!?!? This is also one of the reasons for doing a blog post on it. I did several searches and there is not a whole lot of info available on this. There are some posts, like this one, but many of them are a few years old.
I did some more reading and contacted one of the local HPE techs to get the confirmation I needed. As I remembered, the OA FW update will not do anything to the OS on the blade servers.
With that we continued with our update.
In our environment we have several FW versions in use, everything from 4.31 to the latest 4.70. Usually there is one release a year. In recent years the first digit of the minor version has tracked the release year: 4.50 was released in 2015, 4.60 in 2016 and 4.70 was released this summer.
From what I’ve found, if you’re on 4.30 or later you can upgrade directly to 4.70. If you’re on a pre-4.30 release you should upgrade to 4.30 first. If you’re on a 3.x or earlier release please check the recommended upgrade steps. Also note that 4.50 brought a few FIPS and certificate-related changes that might be worth checking for compatibility in your environment.
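The upgrade rule above can be written down as a small helper. This is only a sketch of the rule as I understood it, not an official HPE upgrade matrix; the function names `parse_version` and `upgrade_path` are made up for illustration, and the release notes remain the authority.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Turn a version string such as '4.31' into (4, 31)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def upgrade_path(current: str, target: str = "4.70") -> list[str]:
    """Return the OA firmware versions to flash, in order.

    Encodes only the rule described in this post:
    - 4.30 or later: go straight to the target
    - 4.x before 4.30: go to 4.30 first
    - 3.x or earlier: not covered here, check HPE's recommended steps
    """
    cur = parse_version(current)
    tgt = parse_version(target)
    if cur >= tgt:
        return []  # already on the target version or newer
    if cur[0] < 4:
        raise ValueError("3.x or earlier: check the recommended upgrade steps")
    if cur < (4, 30) and tgt > (4, 30):
        return ["4.30", target]  # intermediate stop at 4.30
    return [target]


if __name__ == "__main__":
    for version in ("4.20", "4.31", "4.50", "4.70"):
        print(version, "->", upgrade_path(version))
```

Passing a different target, for example `upgrade_path("4.31", "4.60")`, gives the path to that release instead of the latest one.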
We decided to upgrade our 4.31, 4.40 and 4.50 to 4.60 as this is the version the majority of our chassis have run on.
Please note that we do NOT have Virtual Connect in our environment, so the steps and preparation could differ for such systems.
The steps we took for our update were as follows (please be sure to align this to your specific environment):
- Read docs, release notes and cross-check with HPE for details and compatibility specific to your environment
- Identify which chassis you need to update, find IP information and login details
- Identify what runs on the chassis in case of any issues during the update, and to verify against after a successful upgrade
- Follow any change request procedure your organization might have and notify according to your organization’s standards
- Download OA firmware
- Verify that the environment is healthy
- Export the configuration or the “show all” and save in a suitable location
- Mute or unmanage any service monitoring the OA
- Perform update
- After update, verify status on the OA, chassis and blade servers, if applicable verify OneView status (you might need to refresh the enclosure in OV to see the new FW version)
- Verify that the load running on the blade servers is behaving as it should
- Remanage the monitoring service
- Notify status according to your organization’s standards
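The “show all” step in the list above could be scripted roughly like this. It is a minimal sketch under a few assumptions: that the local `ssh` client is available, that your OA accepts a command passed on the SSH command line (otherwise run it in an interactive session and save the output by hand), and that the host name, user name and target directory shown are placeholders for your own values.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def backup_filename(oa_host: str, when: datetime) -> str:
    """Build a timestamped file name for the saved output."""
    safe_host = oa_host.replace(":", "_").replace("/", "_")
    return f"{safe_host}_showall_{when:%Y%m%d-%H%M}.txt"


def ssh_command(oa_host: str, user: str) -> list[str]:
    """Command line that runs 'show all' on the OA over SSH."""
    return ["ssh", f"{user}@{oa_host}", "show all"]


def save_show_all(oa_host: str, user: str, target_dir: Path) -> Path:
    """Run 'show all' on the OA and write the output to target_dir."""
    result = subprocess.run(
        ssh_command(oa_host, user),
        capture_output=True,
        text=True,
        check=True,    # raise if ssh exits with an error
        timeout=600,   # 'show all' can take a while on a full chassis
    )
    target_dir.mkdir(parents=True, exist_ok=True)
    out_file = target_dir / backup_filename(oa_host, datetime.now())
    out_file.write_text(result.stdout)
    return out_file


# Example call (hypothetical host, user and directory):
# save_show_all("oa-chassis01.example.local", "Administrator", Path("oa-backups"))
```

Keeping the host name and timestamp in the file name makes it easy to compare the output from before and after the update.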
The actual process when the update is running is as follows:
- The Standby OA module is flashed with new firmware and will be reset
- The Active OA module is flashed with new firmware and will be reset, note that the Standby could still be offline. iLO to the blades will go offline. The blades will have normal network connectivity and the OS is not affected
- The Active OA comes online, iLO is back online
- The Standby OA comes online
- The OA web page refreshes (or you’ll have to do it manually) and you will see the new FW version (you might have to clear your browser cache to see the update)
I was a bit surprised that both OA modules go offline, as I expected one of them to come online before the update continued on the next one. But as long as it doesn’t affect production I’m happy.
The whole update process is pretty quick. In about 10-15 minutes you should be finished. Please note that this is for the OA modules only.
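If you want some reassurance while the update runs that the blades keep their normal network connectivity, a simple TCP probe against the operating systems on the blades is one option. The sketch below is not something HPE provides. It assumes each blade OS listens on a TCP port you can reach, and the host names and port in the example are hypothetical.

```python
import socket
import time


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors
        return False


def watch(targets: list[tuple[str, int]], rounds: int,
          pause: float = 10.0) -> list[tuple[str, int]]:
    """Probe every target once per round; return the targets that ever failed."""
    failed = set()
    for i in range(rounds):
        for host, port in targets:
            if not is_reachable(host, port):
                failed.add((host, port))
                stamp = time.strftime("%H:%M:%S")
                print(f"{stamp} {host}:{port} unreachable")
        if i < rounds - 1:
            time.sleep(pause)
    return sorted(failed)


# Example (hypothetical blade OS host names, SSH port):
# watch([("esx01.example.local", 22), ("esx02.example.local", 22)], rounds=90)
```

With the default 10 second pause, 90 rounds covers roughly 15 minutes. Point it at the OS addresses rather than the iLO addresses, since iLO is expected to drop while the Active OA resets.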
Please also note that we do not have any Virtual Connect stuff in our blade chassis and I have never worked with those. I would imagine that there would be more steps to the whole process, and also more risk involved.
As always, I would recommend trying the firmware update process in a test environment if available, or on the chassis with the least risky production load possible. Also, please check the compatibility of the components in your chassis and make use of HPE Support for official guidelines. As previously stated, we do NOT have Virtual Connect in our environment, so the process could be different for those systems.