All posts by Aaron Margeson

Qlogic QConvergeConsole vCenter Plugin

I recently had a customer running into storage performance issues, and we determined that Qlogic CNA I/O card firmware and driver upgrades were in order.

Updating the drivers on multiple hosts is easy if you have VMware Update Manager. Simply import the VIB into the patch repository, set up a new baseline or change out the drivers in an existing one, and remediate your hosts. You can see similar instructions on my other blog post about updating Cisco UCS drivers within vSphere environments.

But what about firmware updates?

Qlogic QConvergeConsole

Qlogic has a pre-boot package you can use to update the firmware, but that’s not automated.  Plus, it can get pretty tedious having to boot off a USB stick or image remotely through an out of band management card.

Thankfully, Qlogic has a set of utilities to help you get more information about your Qlogic cards, and deploy new adapter firmware packages right from vCenter!

How do you take advantage of this?

You need to deploy the vCenter Qlogic plug-in. There are two packages available: one for the thick client, and one for the Web Client. I would recommend the Web Client one. First, VMware has ceased development of the thick client, so web plug-ins are generally the better choice when you have one. Second, I quite honestly had issues even getting the vSphere Client plug-in to work properly, while the Web Client plug-in worked great. Both had installation issues, though. I neglected to write down the exact wording, but the installer (on a Windows-installable vCenter 5.5 server) said something to the effect of an invalid console mode. After some googling, I found that for other software from other manufacturers, this can be overcome by setting the installer’s compatibility mode to Windows 7/2008 R2. After doing that, both installed easily.

You also need to deploy the CIM provider VIB to the ESXi hosts. This is another simple VIB deployment, but it does require a reboot. To save time, you can install the updated drivers and the CIM provider together.
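
If you’d rather install the VIBs from the ESXi shell than through Update Manager, it comes down to one esxcli call per VIB file. Here’s a minimal Python sketch that only builds those command lines so you can review them before running anything. The file paths are hypothetical placeholders, not the real Qlogic package names, and the sketch assumes the VIBs have already been copied to a datastore on the host.

```python
import posixpath


def build_vib_install_commands(vib_paths):
    """Return one esxcli argv list per VIB file path on the ESXi host.

    The paths are whatever you copied to the host; this function does not
    check that the files exist, it only assembles the commands.
    """
    commands = []
    for path in vib_paths:
        # A local VIB is referenced by its full path, so reject relative ones.
        if not posixpath.isabs(path):
            raise ValueError(f"expected an absolute path, got: {path}")
        commands.append(["esxcli", "software", "vib", "install", "-v", path])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example_vibs = [
        "/vmfs/volumes/datastore1/vibs/example-driver.vib",
        "/vmfs/volumes/datastore1/vibs/example-cim-provider.vib",
    ]
    for argv in build_vib_install_commands(example_vibs):
        print(" ".join(argv))
```

Installing both VIBs before rebooting means the host only has to go down once.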

How to update Qlogic firmware

To update the firmware on Qlogic HBAs, navigate to the host you wish to update, and go to Manage > QConvergeConsole. Be patient, as the plug-in gathers and displays the information. The plug-in isn’t exactly a speed demon. Once it displays information for the HBAs, click the adapter in the left pane of the plug-in, and then select the card you wish to update. You’ll get current information about the card, including firmware information and various features. You can even toggle some modes on the card like Personality Type and SR-IOV.

[Screenshot: qlogicfwupdate1]

Click “Update Adapter Flash Image”. In the next window, browse to the downloaded card firmware, select the bin image, and click OK. You’ll then be provided with a confirmation dialogue box that shows you the firmware you’re currently running, and the new firmware you’re about to install. Click OK to proceed.

[Screenshot: qlogicfwupdate2]

Be patient as it deploys the firmware. This can take several minutes. Once completed, you’ll receive a dialogue box that says the firmware will not take effect until after you reboot.

Note also if you have multiple Qlogic cards in your server, you must update each card individually, even if they’re the same model card.

At this point, reboot your ESXi server normally, and verify the firmware updated successfully after the server boots back up.

FYI, for this customer, updating the drivers and firmware increased synthetic benchmarks by about 250%.  It’s definitely worthwhile to check for these updates, especially if your environment isn’t performing up to snuff.

No, seriously, I’m gonna blog again now

You know the saying when it rains, it pours?

After my last blog entry where I promised to blog more again, we had two deaths in the family (one on my side, one on my wife’s side). Plus, our 10-month-old dog became ill, and after some tests we found out she had an undeveloped kidney that eventually had to be removed, so we’ve had to keep a constant eye on her. She’s still recovering, but she’s slowly getting back to her lovable self.

Free time has been pretty sparse, so blogging took a back seat.

Well, it’s time to get back into the habit, so here we go!  No, like, seriously this time!  I mean it!

(Anybody want a peanut?)

VCA6-HC exam review

I’ve been doing some work with vCloud Air, which got me really reading up on everything, and taking advantage of learning resources available.  It occurred to me that maybe there’s a cert for this, so I looked into it, and there is – VMware Certified Associate 6 – Hybrid Cloud (VCA6-HC).

I’ve not taken any VMware Certified Associate exams because I was a VCAP by the time this certification level became available from VMware, and I’ve stayed within the Data Center Virtualization track. In this case, though, I had no plans to go higher in the Cloud Management and Automation track in the foreseeable future, and it would be nice to validate the skills and knowledge I’d acquired over the past week of work and my weekend learning. I also scored an 88% on the practice exam, so why not?

VCA exams, if you’re not familiar with them, are paid exams you take on your computer, and are effectively “open book”. I didn’t prepare for this VCA6-HC exam exhaustively like I normally do, but I did go through the exam topics and made sure I felt comfortable with them. I had already read through most of the documents they specifically suggest reading (the vCloud Air Users Guide and the vCloud Air Networking Guide). The few remaining ones were more marketing type stuff, so I glanced through those. I also did some of the VTSP for Hybrid Cloud material on VMware’s Partner Portal, which I think covered me on the marketing side.

The VCA6-HC exam is 50 questions, and you have 75 minutes to complete it. The exam interface is pretty much exactly like a VCP exam. The questions are multiple choice, including the “select X of the following options” type. You can mark questions for review in case you don’t want to spend time on one right then. I had absolutely no problem finishing the exam within 75 minutes. I completed my first pass in about 30 minutes, then reviewed the ones I had marked and all of my answers one more time, and ended the exam within about 45 minutes. You need to score at least 300, and I received a 400.

How difficult is VCA6-HC?  It’s not a tough exam, especially with it being open book.  A few tips I’d offer would be:

  • Be very familiar with the basic networking concepts.  You’ll likely fail if you aren’t.
  • Know the different services and basic options offered for vCloud Air.
  • Know the basic functionality and architecture of all the management tools related to vCloud Air the video training covers.
  • Do the free video training they have.  Pretty good chance you’ll fail if you don’t.

All in all, it’s a pretty straightforward exam.  I’m not wild about VMware/Pearson charging $120 for this exam.  I just think that’s an awful lot to charge people for a basic exam like this that’s proctored online.  IMO, these exams probably should be free, or dirt cheap.

But, at least there’s a certification out there to vet familiarization with vCloud Air, so it’s back to VCAP6-DCV Deployment, which I REALLY need to get cracking on!

Long Time No Blog

It’s been one of those stretches the last few weeks where I was out of town, stuff came up, many of my posts ended up half-baked, and I didn’t have time to finish them up.  I just realized it’s been three weeks since a blog post.  Yikes!

So…  expect some more posts coming.  vCloud Air stuff, a good RecoverPoint walkthrough post, and I’ve got some home automation stuff I’ve been messing with.

Enjoy!

EMC VSI RecoverPoint/SRM Integration

I’ve recently set a customer up with new VNX storage arrays and RecoverPoint, all to be integrated with VMware Site Recovery Manager.  Previously, the customer used SRM in conjunction with MirrorView/A.  Why RecoverPoint?

The really cool thing about RecoverPoint is you can easily roll back to specific points in time, which they like to call DVR functionality for disaster recovery.  MirrorView/A only allows you to roll back to specific snapshots taken at specific points in time.

EMC also provides their VSI for VMware environments.  This integrates with many of their storage products, including VNX and RecoverPoint, and it provides the DVR selection ability within SRM if you integrate it as well!

Setup is pretty straightforward:

  1. Deploy the OVA for the VSI in each site.
  2. Log in to the VSI’s web portal by hitting https://<ip>:8443/vsi_vum with user name admin and password ChangeMe.  Change the password as prompted.
  3. Register the VSI’s plugin with vCenter by going to VSI Setup and providing the required info.  If you don’t get “The Operation is successful.”, do it again unless you’re provided an error to troubleshoot.  For me, that happened on one of the two vCenter servers I was deploying this on.  Also, be patient, as this can take quite some time.  For me, the plugin took about 10-15 minutes to complete the installation.
  4. Log in to the vCenter Web Client, and go to vCenter Inventory Lists. At the end, you should see an EMC VSI section. [Screenshot: emcvsisection]
  5. Click on Storage Integration Service.  Under Actions, click Register Solutions Integration Service, and enter the VSI’s info for that vCenter.  Click Test to ensure there’s connectivity to the VSI, and click OK.
  6. Under Storage Systems, add the storage array for that site.  Again, click Test to ensure there’s connectivity to the storage array, and click OK.  VSI supports VMAX, VNX, VNXe, ViPR, and XtremIO, so this isn’t just limited to the VNX on this project.
  7. Under Data Protection Systems, add the RecoverPoint cluster info for that site using the RPA cluster IP address, and be sure to select RecoverPoint as the Protection System Type.  [Screenshot: rpprotectionsystemtype]  Click Test to ensure communication will work.  If successful, OK will no longer be grayed out.  Click OK.
  8. Repeat step 7, but select SRM this time for the Data Protection System type.  Here’s where I ran into a gotcha.  The FQDN/IP address and port fields were grayed out.  I went ahead and clicked Test, and got an error: “Could not communicate with the data protection system SRM at <IP of vCenter server>. Details: Cannot reach the target SRM server at <IP of vCenter server>:1”  [Screenshot: vsisrmregerror]  Google didn’t yield any results for a solution, so I began troubleshooting.  Thankfully, I knew my ports, so I clicked the check box for the FQDN or IP/Port line, and entered the FQDN of the SRM server and the port.  Be aware that SRM 6.X uses 9086.  I provided that, clicked Test, got my green “OK to go” text, and clicked OK.

Note that this needs to be done for each vCenter/RPA cluster/storage array/SRM server in the environment.  Note also only one VSI instance can be registered per vCenter server, so you’ll need to deploy one VSI per vCenter.
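
If the Test button fails when registering SRM, it can help to confirm from another machine that the SRM port is reachable at all. Below is a minimal Python sketch of a plain TCP connect check. The host name in the usage comment is a hypothetical placeholder, and a successful connect only shows that something is listening on the port, not that VSI will be able to authenticate to SRM.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (hypothetical SRM server name, SRM 6.x port from the step above):
#   can_connect("srm01.example.local", 9086)
```

A False result points at DNS, firewall, or the wrong port rather than at the VSI itself.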

After setting up each site, go to a VM, click it, go to Manage, view the snapshots for its Consistency Group, click the one you want and apply, and launch your Failover or Test action from SRM.

[Screenshot: vsiselectsnapshot]

And there you have it!

Insufficient permission to run CLI command

'Insufficient permission to run CLI command' when performing upgrade on VNX File OE.
Error message:  Insufficient permission to run CLI command

Ran into this today while attempting to update VNX File OE code for a customer using Unisphere Service Manager.  While there were no major issues reported within Unisphere, I got the error “Insufficient permission to run CLI command” when attempting to start the process by running “Prepare for Installation (Step-1)”.

Google only yielded an article that basically said to ensure you’re running USM on the same subnet, which I was.  I began troubleshooting by running the Health Check within USM, which showed various errors indicating a failover of a Control Station.  I failed back to CS0 and reran the Health Check within USM, which passed.  Then I tried the upgrade again, and everything worked like a champ!

Update on Air Console and get 10% off!

Several weeks ago, I wrote about a remote serial solution called Air Console, which provides an all-in-one solution for wired, LAN, Wi-Fi, and Bluetooth serial connectivity.  I’ve found Air Console extremely useful since then.  I just initialized two Data Domains and four RecoverPoint RPAs in a cramped, crowded server room with no comfortable place to work from my laptop.  No problem!  I simply walked in with my Air Console Mini and iPad, and initialized all six devices wirelessly.  It beats figuring out how to maneuver a serial cable to some place where I would have to sit cross-legged on the floor.  Full disclosure: I have the flexibility of a 2×4.  It worked once again like a champ.

Get-Console noticed my blog article and contacted me to offer my readers 10% off using coupon code JJGH667QS on their orders.  (I wish I got that deal, but it was still worth every penny!)

Also, Get-Console has solutions to connect to multiple serial devices simultaneously.  This could be useful for initializing six devices like I just did.  It could also be used as an out of band management solution for a rack full of routers and switches, too!

So, if you’re looking for a smarter serial solution, check them out!

Troubleshoot VSS errors in whole VM backups

I’ve dealt with many whole VM backup products in my experience with virtualization, including Veeam, VMware Data Protection, Avamar, vRanger Pro, Backup Exec, and more.  With that experience came lots of troubleshooting through various issues.  Originally, this post was going to deal with a recent specific issue I had, but I thought a better post would deal with an entire category of problems with these products, so someone could use it to fix what could be one (or more) of lots of potential root causes, not just a single one.  Many of the steps to troubleshoot this stuff help keep your environment healthy and avoid lots of issues, not just issues with backups.

This post will focus specifically on VSS quiescing problems.  It is not a definitive guide to all VM backup problems.

Revision Level of Your Backup Product

Often times, the issue has to do with the revision level of your backup product itself.   Generally, it’s good to be on the latest patch level, but not always.  Here are a few things to think about:

  • Is your backup product patched to current?  If not, perhaps look into doing so.
  • Is your backup product compatible with your environment?  Check to ensure it supports the current build of your hypervisor, your hypervisor management software such as SCVMM or vCenter, and the guests you’re backing up, and take appropriate action.
  • Did you install an update to the backup product recently?  If so, perhaps there’s a bug in that update.

Revision Level of Guests That Are Backed Up

Backups that quiesce the file systems of guests depend upon OS components within said guests, and this is especially true of Windows guests, which rely on the Volume Shadow Copy Service (VSS).  VSS, just like any other software, can have bugs in it that need to be fixed, so there are patches to VSS.  Other OS components could also be the culprit.  Ensure your guests are patched to current.  Conversely, if you recently applied patches to your guests, perhaps there are problems with those updates, so you may try removing those.

As a side note, I would recommend using multiple methods of checking your guest patch levels.  For example, while it’s not very common, I’ve seen cases of Windows Update saying all patches are installed when a second utility reported missing patches.  Use a second utility to check, such as Microsoft Baseline Security Analyzer (which is free) if the guest is Windows based, to ensure you’re not missing anything.

Also, don’t assume the guests are patched to current.  I recently ran into an issue where the customer somehow hadn’t patched the server… ever.  Somehow it slipped through the cracks.

Hypervisor Revisions

Hypervisors also can cause issues with quiescing.  Some considerations here:

  • Does the build of the hypervisor support the guest having the issue?
  • Are the hypervisors patched to current?  If not, consider updating them.
  • Were the hypervisors recently patched?  If so, perhaps one of the installed patches has a problem, and removing it might resolve the issue.
  • Have the in guest optimization components such as VMTools within the guests been updated?  If not, do so.  If this was done recently, perhaps try to downgrade them to see if that resolves the issue.  These are important, as this is typically the means by which the hypervisor issues the command to quiesce the file system within the guest.

Other Guest Considerations

There are other issues that can cause problems with backups.

  • Other backup agents installed within the guest can also cause problems.  Remove any backup agents that are no longer needed.  I personally just ran into this issue with a customer that had an old Backup Exec agent from before they used their current backup product.
  • Applications have their own VSS agents, such as SQL and Exchange.  Sometimes those need to be updated, too.  It can also be that recent updates to them can also cause problems with quiescing.  Look for updates to those, or remove recent updates.
  • Antivirus software has also been known to cause VSS issues.  Try updating, disabling, configuring proper exclusions for, uninstalling, and/or reinstalling the AV agents.
  • Ensure there is adequate free space within the guests.
  • Windows limits the number of shadow copies, and when that limit is reached, quiescing can fail.  Try removing all shadow copies within the guest using the command:  vssadmin delete shadows /all
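
For the free space check in the list above, here’s a minimal Python sketch you could run inside a guest. The 10% threshold is an assumption picked for illustration, not a documented VSS requirement, so adjust it to whatever your backup vendor recommends.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Report free space for the volume that contains `path`.

    min_free_fraction is an illustrative threshold, not a VSS requirement.
    """
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized volume to avoid dividing by zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "ok": free_fraction >= min_free_fraction,
    }


if __name__ == "__main__":
    # On a Windows guest you would pass drive roots such as "C:\\" instead.
    report = free_space_report(".")
    print(f"{report['path']}: {report['free_fraction']:.1%} free, ok={report['ok']}")
```

Run it once per volume that the backup job quiesces.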

Hopefully, this provides you with some ideas to try to resolve the issue you’re experiencing.

Do you have any other tips for resolving VSS issues with whole VM backups?

Unregister Cisco UCS from Cisco UCS Central if you’re not using it

Just banged my head against a wall for hours trying to get a new Service Profile associated with a new blade.  It kept retrying to “Configure resolve identifiers”.  Finally stumbled on an obscure support forum article just when I was about to give up and call Cisco TAC…

https://supportforums.cisco.com/discussion/12174866/cannot-create-service-profiles-template-anymore

TLDR version…

If the UCS domain (FIs) has been registered with UCS Central, you can’t associate service profiles with UCS Manager.  If it looks like this in UCS Manager with Registration Status showing Lost Visibility…

[Screenshot: unregisterucscentral]

Your UCS Domain is either having problems connecting to UCS Central, or UCS Central is burnt… or dead…  Troubleshoot why it has lost connectivity, or if UCS Central is gone, click the red circled Unregister From UCS Central, and proceed.

If you see a nice green check box, that means UCS Central is present, and you should use it to associate the Service Profile, not UCS Manager.

In this case, the customer deployed UCS Central to play with, decided they didn’t need it, and removed it from their environment, but kept it registered.  So I unregistered it, and the Service Profile associated nice and easy.

VCSA can’t enumerate AD accounts

Ran into an interesting issue after deploying greenfield vCenter 6 Server Appliances (VCSA) using an external PSC for a remote branch site, when I tried to do some permissioning with AD accounts.  Joining the PSC to the domain wasn’t a problem, nor was adding the AD domain as an identity source.  But when I tried to enumerate accounts for permissioning, that would fail with the error: “Cannot load the users for the selected domain”.

I found an excellent VMware KB article that gave lots of things to check when troubleshooting this.

I verified DNS was working.  No surprise there.  However, when I ran the command less /var/lib/likewise/krb5-affinity.conf, I noticed the DCs in use were not the ones the PSC should be using, but rather DCs from a different remote branch office site.  When I checked AD Sites and Services, it was clear that a subnet object that included the IP of the PSC was associated with the wrong branch office.  Therefore, the PSC was attempting to use the DCs in that site.  It’s good to know that vCenter Appliances are apparently AD Site aware.  Furthermore, the first of the two DCs used in that wrong remote branch site didn’t have a PTR record, because the Reverse Lookup Zone for its subnet didn’t exist.  Apparently, if the first domain controller to be used can be contacted but doesn’t have a PTR record, the PSC won’t enumerate users and groups for permissioning.

Creating the Reverse Lookup Zone and forcing the PTR record creation along with some AD replication fixed the issue, and I kindly suggested to the customer it was time for some tender loving care with AD Sites and Services, along with DNS.
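
To check the PTR side of this quickly, a reverse lookup per domain controller IP is enough. Here’s a minimal Python sketch, assuming you already have the list of DC IPs in hand (for example, read by eye from the affinity file mentioned above; the sketch does not parse that file). The resolver is injectable so the function can be exercised without touching real DNS, and the addresses in the usage comment are made up.

```python
import ipaddress
import socket


def ptr_status(ip, resolver=socket.gethostbyaddr):
    """Return (reverse_lookup_name, hostname_or_None) for one IP address.

    hostname is None when the reverse lookup fails, which usually means the
    PTR record or the whole Reverse Lookup Zone is missing.
    """
    reverse_name = ipaddress.ip_address(ip).reverse_pointer
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return reverse_name, hostname


# Example usage with made-up DC addresses:
#   for dc_ip in ["10.1.2.3", "10.1.2.4"]:
#       print(ptr_status(dc_ip))
```

Any DC that comes back with None for the hostname is worth a look in DNS before you join the PSC to the domain.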

So, FYI, it’s not a bad idea to review your Active Directory Sites and Services, and your DNS Forward and Reverse Lookup zones before you deploy the VCSA.