Troubleshoot VSS errors in whole VM backups

I’ve dealt with many whole VM backup products in my time working with virtualization, including Veeam, VMware Data Protection, Avamar, vRanger Pro, Backup Exec, and more.  With that experience came a lot of troubleshooting.  Originally, this post was going to cover a specific issue I ran into recently, but I thought it would be more useful to cover the entire category of problems, so that someone could use this post to fix any one (or more) of the many potential root causes, not just a single one.  Many of these troubleshooting steps also help keep your environment healthy and avoid other issues, not just issues with backups.

This post focuses specifically on VSS quiescing problems; it is not a definitive guide to all VM backup problems.

Revision Level of Your Backup Product

Often, the issue has to do with the revision level of the backup product itself.  Generally, it’s good to be on the latest patch level, but not always.  Here are a few things to think about:

  • Is your backup product patched to current?  If not, perhaps look into doing so.
  • Is your backup product compatible with your environment?  Check that it supports the current build of your hypervisor, your hypervisor management software such as SCVMM or vCenter, and the guests you’re backing up, and take appropriate action.
  • Did you install an update to the backup product recently?  If so, perhaps there’s a bug in that update.

Revision Level of Guests That Are Backed Up

Backups that quiesce the file systems of guests depend upon OS components within those guests.  This is especially true of Windows guests, which rely on the Volume Shadow Copy Service (VSS).  VSS, like any other software, can have bugs, so there are patches for it.  Other OS components could also be the culprit.  Ensure your guests are patched to current.  Conversely, if you recently applied patches to your guests, one of those updates may be the problem, so you may try removing them.

As a side note, I recommend using more than one method to check guest patch levels.  While it’s not very common, I’ve seen numerous cases where Windows Update said all patches were installed, but a second utility reported missing patches.  Use a second utility, such as Microsoft Baseline Security Analyzer (which is free) for Windows guests, to ensure you’re not missing anything.
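
As a rough illustration of that cross-check, here is a minimal Python sketch that compares the KB numbers found in the text output of two tools.  The function names and the sample strings are hypothetical and not taken from any particular utility; the only assumption is that both tools mention patches by their KB identifier somewhere in their output.

```python
import re

# Matches identifiers such as KB5001111, regardless of letter case.
KB_PATTERN = re.compile(r"KB\d+", re.IGNORECASE)


def extract_kb_ids(report_text):
    """Return the set of KB identifiers (upper-cased) found in free-form tool output."""
    return {match.upper() for match in KB_PATTERN.findall(report_text)}


def patches_only_in_second(installed_report, required_report):
    """Return KBs the second report mentions that the first report does not."""
    return sorted(extract_kb_ids(required_report) - extract_kb_ids(installed_report))


# Hypothetical sample data, not real output from any tool.
installed = "KB5001111 installed\nkb5002222 installed"
required = "Required: KB5001111, KB5002222, KB5003333"
print(patches_only_in_second(installed, required))
```

Anything this prints is only a hint that the two tools disagree; confirm it in the guest before acting on it.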

Also, don’t assume the guests are patched to current.  I recently ran into an issue where the customer hadn’t patched the server… ever.  Somehow it slipped through the cracks.

Hypervisor Revisions

Hypervisors also can cause issues with quiescing.  Some considerations here:

  • Does the build of the hypervisor support the guest having the issue?
  • Are the hypervisors patched to current?  If not, consider updating them.
  • Were the hypervisors recently patched?  If so, perhaps one of the installed patches has a problem, and removing it might resolve the issue.
  • Have the in-guest optimization components, such as VMware Tools, been updated?  If not, do so.  If they were updated recently, perhaps try downgrading them to see if that resolves the issue.  These components are important, as they are typically the means by which the hypervisor issues the command to quiesce the file system within the guest.

Other Guest Considerations

There are other issues that can cause problems with backups.

  • Other backup agents installed within the guest can also cause problems.  Remove any backup agents that are no longer needed.  I personally just ran into this issue with a customer that had an old Backup Exec agent from before they used their current backup product.
  • Applications such as SQL Server and Exchange have their own VSS agents.  Sometimes those need to be updated, too.  Recent updates to them can also cause problems with quiescing.  Look for updates to those, or remove recent updates.
  • Antivirus software has also been known to cause VSS issues.  Try updating, disabling, configuring proper exclusions for, uninstalling, and/or reinstalling the AV agents.
  • Ensure there is adequate free space within the guests.
  • There is a finite number of shadow copies, and when that limit is reached, quiescing can fail.  Try removing all shadow copies within the guest using the command:  vssadmin delete shadows /all  (Be aware that this deletes every existing shadow copy on the guest, so make sure none of them are still needed.)
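
Two of the checks above are easy to script.  The following is a minimal Python sketch, not a definitive tool: it reports the free-space fraction of a volume and parses the text of `vssadmin list writers` to flag VSS writers (the per-application VSS components mentioned above) that are not reporting a stable state.  The helper names are my own, the parser assumes English-language output with "Writer name:", "State:", and "Last error:" lines, and `vssadmin` must be run from an elevated prompt inside the Windows guest.

```python
import shutil
import subprocess


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def list_writers():
    """Run `vssadmin list writers` in the guest and return its text output."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def unstable_writers(vssadmin_output):
    """Return (name, state, last_error) for each writer that is not Stable with no error.

    Assumes English output where each writer block contains
    'Writer name:', 'State:' and 'Last error:' lines, in that order.
    """
    problems = []
    name = state = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            state = None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            error = line.split(":", 1)[1].strip()
            if state is None or "Stable" not in state or error != "No error":
                problems.append((name, state, error))
            name = None  # this writer block is finished
    return problems
```

In a guest, something like `unstable_writers(list_writers())` and `free_fraction("C:\\")` gives a quick picture before the next backup attempt.  What counts as "adequate" free space depends on the guest and the backup product, so any threshold you compare against is your own choice.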

Hopefully, this provides you with some ideas to try to resolve the issue you’re experiencing.

Do you have any other tips for resolving VSS issues with whole VM backups?
