
vSphere 6 NFS 4.1 Gotchas

VMware added some additional NFS features in vSphere 6.  I knew it supported NFS 4.1 in addition to NFS 3, but that support comes with some significant ramifications and gotchas.

  1. vSphere does not support parallel NFS (pNFS)! 
  2. It DOES support multipathing if you connect with NFS 4.1.  You simply add multiple NFS target IPs when setting up the NFS mount (see the PowerCLI sketch after this list).
  3. IMPORTANT: This is the biggest gotcha, and something we all need to be aware of.  If your storage system can serve the same export over both NFS 3 and NFS 4.1, you MUST use one or the other for any export used by VMware!  NFS 3 and NFS 4.1 use different locking mechanisms, so if different ESXi hosts mount the same export with different versions, data can be corrupted!!!  Best practice is to enable only one of the two protocols for that export, never both.
  4. You can authenticate using Kerberos, which is more secure.
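
To illustrate item 2, here is a minimal PowerCLI sketch of mounting an NFS 4.1 datastore with two target IPs.  The vCenter name, host name, IP addresses, export path, and datastore name are all hypothetical, and it assumes a PowerCLI release (6.x or later) whose New-Datastore cmdlet supports the -FileSystemVersion parameter.

```powershell
# Sketch only: server names, IPs, export path, and datastore name are placeholders.
Connect-VIServer -Server 'vcenter01.domain.com'

$nfsParams = @{
    Nfs               = $true
    FileSystemVersion = '4.1'
    Name              = 'nfs41-datastore01'
    NfsHost           = '192.168.50.11', '192.168.50.12'   # multiple target IPs enables NFS 4.1 multipathing
    Path              = '/exports/vmware-nfs41'
}

Get-VMHost -Name 'esx01.domain.com' | New-Datastore @nfsParams
```

Remember to mount it the same way (same version, same IPs) on every host in the cluster.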

NFS 4.1 support isn’t all roses, though.  Here’s what you can’t do…

  • No Storage DRS
  • No SIOC
  • No SRM
  • No vVol
  • No VAAI (bigger deal now that Standard licensing includes VAAI)
  • Must run vSphere Replication 6.1 or higher to use VR to replicate VMs on NFS 4.1 mounts.

vSphere 6 – Certificate Management – Part 2

Previously, I posted about certificate management in vSphere 6, which has simplified the process, provides several ways to establish trust for the certificates in use, and gives you flexibility over what issues the certificates for both vSphere’s internal functionality and its client-facing functionality.

One of the simplest means to allow your clients to trust SSL certificates used by vSphere 6 is to utilize the CA functionality built into vCenter 6, and configure your clients to trust it as a Root CA, which is the focus of this blog post.

Basically, to do this, you need to obtain the root certificates of your vCenter servers and then install them into your clients’ Trusted Root CA stores.  Thankfully, this is a relatively straightforward and easy process.

Obtaining Root CA files for vCenter

Obtaining the root CA files for vCenter is very easy.  Simply connect to the FQDN of your vCenter server using HTTPS.  Do not add anything after the hostname, as you might when connecting to the vSphere Web Client.  For example, you would use:

https://vcentersvrname.domain.com

Once connected, you will see a link to obtain the vCenter certificates, as you can see below:

Download vCenter root CA certs

When you download this file, it will be a zip file containing two certificate files for your vCenter server.  If your vCenter server is part of a linked mode installation, you will have two certificate files for every vCenter in the linked mode instance.  The files with a .0 extension are the root CA files you need to import.  Below, you can see the zip file downloaded from a two vCenter server linked mode installation.

Extract the contents of the zip file.  Next, rename the .0 files to have a .cer extension.  This allows the files to be easily installed on a Windows machine.  You can then open them to check out their properties if you like.
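
If you’d rather script the extract-and-rename step than do it by hand, here’s a small PowerShell sketch.  The zip file name and folder are hypothetical, and Expand-Archive requires PowerShell 5.0 or later.

```powershell
# Hypothetical paths; adjust to wherever you saved the zip from your vCenter.
Expand-Archive -Path '.\download.zip' -DestinationPath '.\vcenter-root-certs'

# Rename every .0 file to .cer so Windows treats it as a certificate file.
Get-ChildItem -Path '.\vcenter-root-certs' -Recurse -Filter '*.0' |
    Rename-Item -NewName { $_.Name -replace '\.0$', '.cer' }
```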

Installing Root CA file(s)

If you’re familiar with Windows machines, this is pretty straightforward.  You can either import the file(s) into each machine manually, or import them into a Group Policy Object and refresh your GPOs.
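
For the manual route, here is a hedged sketch of importing the renamed certificates into the local machine’s Trusted Root store; run it from an elevated PowerShell prompt, and note the folder path is the hypothetical one from the earlier sketch.  For more than a handful of machines, the GPO approach is the better option.

```powershell
# Requires an elevated prompt; Import-Certificate is part of the built-in PKI module.
Get-ChildItem -Path '.\vcenter-root-certs' -Recurse -Filter '*.cer' |
    ForEach-Object {
        Import-Certificate -FilePath $_.FullName -CertStoreLocation 'Cert:\LocalMachine\Root'
    }
```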

That’s it!  Pretty easy to do!  At the very least, you should do this instead of blindly accepting the exception for an untrusted certificate every time.  We all know we aren’t checking the thumbprints of those certs to make sure we’re connecting to the right server, and this also removes the warnings if you’re not creating a permanent exception.

VPLEX Failure Scenarios

I recently set up a VMware Storage Metro Cluster using EMC’s VPLEX with synchronous storage replication.  I wanted to put out a description of the failover logic that’s hopefully easy to understand.

In this setup, VPLEX presents synchronously replicated volumes to VMware ESXi hosts in a single cluster, half of which are in SiteA and the other half in SiteB, with a VPLEX Witness in SiteC.

There are a couple of concepts to get out of the way.  First, the VPLEX systems in this configuration have to be able to connect to each other over two different networks: management and storage replication.  The Witness has to be able to reach both VPLEX systems via their management networks.  These network connections are EXTREMELY important, so do everything you reasonably can to keep the VPLEX systems from becoming network isolated from each other; bad things often happen when they do.  Notice that EMC requires quite a bit of network redundancy.

VPLEX synchronously replicates storage within Consistency Groups, which contain one or more LUNs.  For the rest of this explanation, I may just say LUN; assume we’re talking about a consistency group that has a single LUN.

VPLEX consistency groups also contain two settings.  One dictates the preferred site, meaning that under various failure scenarios, that site’s VPLEX will host the sole active copy of the LUN.  That value can be one of the VPLEX systems, or nothing at all.  The other setting states whether or not the Witness should be used to determine which site a LUN should remain active in under those scenarios.

For those of you who aren’t familiar with VPLEX, the witness is either a virtual appliance or a physical server.  It is optional but highly recommended.  If deployed, it must live in a third site, it must have connectivity to the management networks of both VPLEX systems, and those links must be completely independent of each other.

Failure scenarios work very similarly to majority node set clusters, such as MS Clustering.  Most of this works the way you’d probably guess if you’ve ever dealt with high availability solutions that cross site links, such as an Exchange 2010 DAG; it’s pretty much majority node set (plus witness) logic.  I want to focus on the specific scenarios that have very significant design implications for vSphere in a Storage Metro Cluster, and on how VPLEX avoids split brain when network links go down.

The chief concept to remember in all this is that VPLEX must always always ALWAYS make sure a LUN doesn’t stay active in both VPLEX sites if they can’t talk to each other.  If that happened, you’d end up with two divergent copies of the same LUN, each potentially holding data that can’t be lost, and no real way to sync them back up.  Under normal operations, VPLEX allows both sites to actively write to a LUN, but the minute a VPLEX goes down or gets disconnected, it must be ensured that ONLY one site has an active copy of the LUNs.  The absolute worst outcome, even worse than prolonged downtime, is two disparate copies of the same LUN.

Scenario 1:  What happens if the synchronous storage replication link between the two VPLEX sites goes down, but the management network between them stays up?

The problem here is that VPLEX can’t synchronously write data to both copies of the LUN anymore.  LUNs therefore must become active in SiteA, in SiteB, or, worst case, in neither.

How does this play out?  It depends on the site preference set on the consistency group.  It doesn’t matter whether the witness option is set, or even whether a witness is present.  If no preferred site has been identified for the consistency group, the LUNs must go offline in both sites, because there’s no way to determine the right site in this situation.  If a preferred site is defined, the LUNs become active in their preferred sites only.  The witness is irrelevant here because both VPLEX systems can still talk to each other via their management link.

There’s a VMware implication here: note the preferred failover site somewhere obvious, such as in the datastore name, and then create DRS VM-to-host “should” rules so that VMs on datastores backed by LUNs that prefer the SiteA VPLEX run on SiteA ESXi hosts (and likewise for SiteB); a PowerCLI sketch follows below.  This avoids HA events caused by connectivity problems between the VPLEX systems, specifically the loss of the synchronous storage link.
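
Here’s a rough PowerCLI sketch of the SiteA side of that; the cluster, host, datastore, group, and rule names are all hypothetical, and it assumes a PowerCLI release recent enough to include the New-DrsClusterGroup and New-DrsVMHostRule cmdlets.  You would repeat the same pattern for SiteB.

```powershell
# Hypothetical names throughout; assumes PowerCLI with the DRS group/rule cmdlets.
$cluster = Get-Cluster -Name 'MetroCluster01'

# Group the SiteA hosts and the VMs living on SiteA-preferred datastores.
$siteAHosts = Get-VMHost -Location $cluster -Name 'siteA-esx*'
$siteAVMs   = Get-VM -Datastore (Get-Datastore -Name 'SiteA-pref-*')

$hostGroup = New-DrsClusterGroup -Cluster $cluster -Name 'SiteA-Hosts' -VMHost $siteAHosts
$vmGroup   = New-DrsClusterGroup -Cluster $cluster -Name 'SiteA-VMs'   -VM $siteAVMs

# "Should run" keeps the VMs on SiteA hosts normally, but still lets HA restart them in SiteB.
New-DrsVMHostRule -Cluster $cluster -Name 'SiteA-VMs-should-run-in-SiteA' `
    -VMGroup $vmGroup -VMHostGroup $hostGroup -Type 'ShouldRunOn' -Enabled $true
```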

Scenario 2: What happens if the management connectivity between the VPLEX’s goes down?

Everything continues to work, because the VPLEX systems can still communicate via the storage replication link.  Both VPLEX clusters keep calm and carry on writing I/O in both sites, just as before.  The presence of a witness and its related option are irrelevant.

Scenario 3: What happens if there’s a total loss of connectivity between the two VPLEX sites, both management and storage replication, but both sites can communicate to a witness if there is one?

In this scenario, the outcome is one of two things: if the LUN has a preferred site identified, it becomes active in that site only; if it doesn’t, it goes offline in both sites.  The witness, regardless of whether the option to let it drive the decision is enabled, serves as a communication channel that lets both sites know this is the scenario they’re in.  Without it, neither VPLEX could tell the difference between the other site being isolated and the other site having actually failed.

Scenario 4: What happens if a VPLEX failed in one site?

It depends on whether the witness option on the VPLEX consistency group is enabled (and, of course, whether you deployed a witness).  If it is enabled, the LUN fails over to the surviving site.  If the option isn’t enabled, it depends on whether the preferred site is the one that failed: if it is, the LUN goes offline; if the non-preferred site failed, the LUN remains active in the preferred site.  You should see the value of a witness now.  Usually, having a witness and enabling this option is a good thing.  But not always!

Scenario 5: What happens if all sites stay up, but network connectivity fails completely between all of them?

It depends on whether the option to use the witness is turned on.  If it’s off, the LUN becomes active in its preferred site and inaccessible in the other.  If the witness option is turned on in the consistency group, then no site can tell whether the other sites failed or whether it alone became isolated.  Since nobody knows whether the LUN has become active anywhere else, the only way to avoid split brain is to make the LUN unavailable in ALL sites.

There’s a design implication here: if a workload should stay up in its preferred site in any situation, even complete network isolation, at the cost of being down if that site itself fails, place the VM on datastores that prefer that site and DO NOT enable the witness option on the consistency group.

One last design implication with VPLEX: I see limited use for not identifying a preferred site, and even less for a consistency group with no preferred site AND the witness option disabled.  In both cases you’re just asking for more situations where a LUN gets taken offline in every site.  To be honest, I think that in almost all cases a witness should be deployed, consistency groups should be set with a preferred site for failure scenarios, and the witness option should be enabled.

There you have it!

Routing VMotion Traffic in vSphere 6

One of the new features that is officially supported and exposed in the UI in vSphere 6 is routed vMotion traffic.  This blog post will cover the use cases, why this was difficult prior to vSphere 6, and how vSphere 6 overcomes it.

Use Cases

So why would you want to route vMotion traffic, anyway?  Truthfully, in the overwhelming majority of cases, you wouldn’t and shouldn’t, for various reasons.

Why?  Remember a few facts about vMotion traffic:

Additional latency, delays, and reduced throughput dramatically reduce vMotion performance.  When a vMotion operation gets underway, the running memory contents of the VM are copied from one host to another while changes to the VM’s working set are tracked.  Iterative copies keep transferring those changes until the remaining delta is small enough to be copied very quickly.  Therefore, the longer a vMotion takes, the more changes accumulate in the working set, and the more changes accumulate, the longer the operation takes, which invites even more changes during the operation.

Adding an unnecessary hop in the network can only hurt vMotion performance, so if you are within the same datacenter, routing vMotion traffic is ill advised at best.  About the only situation where I could see it being a good idea is a very large datacenter with hundreds of hosts, where too many broadcasts within a single LAN segment are degrading performance, but you only infrequently need to vMotion VMs between hosts in different clusters.  You would need A LOT of ESXi hosts that may need to vMotion between each other before that would make sense.

So when would routed vMotion traffic make sense?  vMotioning VMs between datacenters!  Sure, you could stretch the vMotion Layer 2 network between the datacenters with OTV instead, but at that point you are choosing the lesser of two evils: vMotioning through a router between hosts in different datacenters, or the inherent perils of stretching an L2 network across sites.  The WAN link will take a far bigger toll than an extra hop in the network, so there’s no question the better choice is to route the vMotion traffic rather than stretch the vMotion network between sites.

This matters because cross-vCenter vMotion is now possible too, and VMware has enabled additional network portability via technologies such as NSX, so the need to do this is far greater than in the past, when about the only scenario where routing vMotion traffic made sense was a stretched storage metro cluster or the like.

Why was this a problem in the past?

If you’ve never done stretched metro storage clusters, this may never have occurred to you, because there was pretty much never a need to route any VMkernel port group traffic other than host management traffic.  The fundamental problem was that ESXi had a single TCP/IP stack with one default gateway.  If you followed best practices, you would create multiple VMkernel port groups to segregate iSCSI, NFS, vMotion, Fault Tolerance, and host management traffic, each in its own VLAN.  You would configure the host’s default gateway as an IP in the management subnet, because you probably shouldn’t route any of that other traffic.  Well, now we need to.  Your only option was to create static routes from the command line on every single host.  As workload mobility increases with vSphere 6 cross-vCenter vMotion, NSX, and vCloud Air, that just isn’t a very practical solution.
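
For context, the old per-host workaround looked something like the sketch below.  The cluster name, subnets, and gateway are made up, and it assumes a PowerCLI version that exposes New-VMHostRoute (which wraps the same static-route mechanism you would otherwise hit with esxcfg-route or esxcli).

```powershell
# Hypothetical example: the remote vMotion network is 10.2.50.0/24, reachable via
# 10.1.50.1 on the local vMotion VLAN.  This had to be repeated on every single host.
Get-VMHost -Location (Get-Cluster -Name 'Cluster01') |
    ForEach-Object {
        New-VMHostRoute -VMHost $_ -Destination 10.2.50.0 -PrefixLength 24 -Gateway 10.1.50.1
    }
```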

How does VMware accomplish this in vSphere 6?

Very simple, conceptually at least.  ESXi 6 can run multiple independent TCP/IP stacks.  By default, separate TCP/IP stacks already exist for vMotion and other traffic, and each can be assigned its own default gateway.

Simple to manage and configure!  Just configure the stacks appropriately and ensure your VMkernel port groups use the appropriate stack: vMotion VMkernel adapters should use the vMotion stack, while pretty much everything else should use the default stack.  A sketch of the commands involved is below.
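
As a sketch of what this looks like from the command line (via PowerCLI’s Get-EsxCli wrapper), the example below creates a VMkernel interface on the built-in vMotion stack and gives that stack its own default gateway.  The host name, vmk number, port group, addresses, and gateway are hypothetical, and the exact esxcli namespaces and options can vary by build, so treat this as a sketch rather than gospel.

```powershell
# Hedged sketch; host, vmk number, port group, IPs, and gateway are hypothetical.
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esx01.domain.com') -V2

# Create a VMkernel interface bound to the built-in vMotion TCP/IP stack.
$esxcli.network.ip.interface.add.Invoke(@{
    interfacename = 'vmk2'
    portgroupname = 'vMotion-PG'
    netstack      = 'vmotion'
})

# Give it an address on the local vMotion subnet...
$esxcli.network.ip.interface.ipv4.set.Invoke(@{
    interfacename = 'vmk2'
    type          = 'static'
    ipv4          = '10.1.50.21'
    netmask       = '255.255.255.0'
})

# ...and give the vMotion stack its own default gateway, independent of the management stack.
$esxcli.network.ip.route.ipv4.add.Invoke(@{
    netstack = 'vmotion'
    network  = 'default'
    gateway  = '10.1.50.1'
})
```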

How cool is that?

vSphere 6 – Certificate Management Intro

I like VMware and their core products like vCenter, ESXi, etc.  One thing I really admire is the general quality of these products: how reliable they are, how well they work, and how VMware keeps addressing pain points to make them extremely usable.  They just work.

However, certificate management has been a big pain point of the core vSphere product line.  There’s just no way around it.  And certificates are important: you want to be sure the systems you’re connecting to when you manage them really are those systems.  For many customers I’ve worked with, the pain of certificate management within vSphere, the lack of an on-premises Certificate Authority at smaller shops, and the desire to keep the product working mean they often never replace the default self-signed certificates generated by vSphere.

That’s obviously less than ideal.  The good news is certificate management has been completely revamped in vSphere 6.  It’s far easier to replace certificates if you like, and you have some flexibility as to how you go about this.

Three Models of Certificate Management

Now, you have several choices for managing vSphere certificates. This post will outline them.  Later, I’ll show you how you can implement each model.  Much of this information comes from a VMworld session I attended called “Certificate Management for Mere Mortals.”  If you have access to the session video, I would highly encourage viewing it!

Before we get into the models, be aware that certificates basically fall into one of two categories: certificates that secure client connections from users and admins, and certificates that allow different product components to interact.  Also, vCenter has built-in Certificate Authority functionality.  That’s somewhat obvious, since you already had self-signed certificates, but this functionality has been expanded; for example, you can have vCenter act as a subordinate CA of your enterprise PKI.

Effectively, this means you have some questions up front you want to answer:

  1. Are you cool with vCenter acting as a certificate authority at all?  The biggest reason to use vCenter is that it makes certificate management easier, but your security guidelines may not allow it.
  2. If you are cool with vCenter generating certificates, are you cool with it being a root certificate authority?  If not, you could make it a subordinate CA.
  3. For each certificate, which certificate authority should issue it?  For example, maybe your requirement that the internal PKI be used only applies to certificates presented on client connections.

From these questions, a few models for certificate management emerge.  You effectively have four, each a combination of whether vCenter acts as a certificate authority and which certificates it will generate.

Model 1: Let vCenter do it all!

This model is pretty straightforward.  vCenter will act as the certificate authority for your vSphere environment, and it will generate all the certificates for all the things!  This can be attractive for several reasons.

  1. It’s by far the easiest to implement.  It will generate pretty much all of your certificates for you and install them.
  2. It’ll definitely work.  No worries about generating the wrong certificate.
  3. If you don’t have an internal CA, you’re covered!  vCenter is now your PKI for vSphere.  Sweet!  You can even export vCenter’s root CA certificate, and import it into your clients using Active Directory Group Policy, or other technologies to get client machines to automatically trust these certificates!  Note that it is unsupported for vCenter to generate certificates for anything other than vSphere components.

Model 2: Let vCenter do it all as a subordinate CA to your enterprise PKI

Very similar to the above.  The only difference is that instead of vCenter being the root CA, you make it a subordinate CA of your enterprise PKI.  This allows your vCenter server to generate certificates that client machines automatically trust, while still ensuring that certificates are easily generated and installed properly.

However, it is a bit more involved than the first model, since you must create a certificate request (CSR) in vCenter to submit to your enterprise PKI, and then install the issued certificate within vCenter manually.

Model 3: Make your enterprise PKI issue all the certificates

Arguably the most secure if your enterprise PKI is properly secured, this model is pretty self-explanatory.  You don’t make use of any of the certificate functionality within vCenter.  Instead, you must manually generate certificate requests for all vCenter components, ESXi servers, etc., submit them to your enterprise PKI, and install all the resulting certificates on each component yourself.

While this could be the most secure way to go about certificate management, it is by far the most laborious solution to implement, and it is the solution that is most likely to be problematic.  You have to ensure your PKI is configured to issue the correct certificate type and properties, you have to install the right certificates on the right components, etc.  It’s all pretty much on you to get everything right!

Model 4: Mix and match!  (SAY WHAT?!?!?)

When I first heard this discussed in the session, the immediate reaction from my inner security conscience was, “This sounds like a REALLY bad idea!!!”

But as I listened, it actually made quite a bit of sense when done properly.  You can mix and match which certificates are and are not generated by the CA functionality within vCenter.  The hybrid that makes sense (a hybrid solution doesn’t make sense for everyone!) is to let vCenter generate all the certificates that facilitate vSphere component-to-component communication, and handle the client-facing certificates using the approach from Model 1, 2, or 3.  If this meets your security requirements, it gives you the best of both worlds: certificates issued by your internal PKI that your clients automatically trust (and are thereby potentially more secure), plus ease of management and better reliability for all the internal vSphere certificates that clients never see.

Which should you go with?

I hate using the universal consultant answer, but I have to.  It depends.  If you don’t have an internal PKI, go with Model 1.

If you have an internal PKI just because you needed it for something else, and you want your clients to easily trust vSphere connections, go with Model 1 and import vCenter’s root CA into your client machines, OR go with Model 2.  Which one in this case?  If you don’t consider yourself really good at PKI management, or if only a few machines need to connect to vSphere components, probably Model 1.  The more clients that need to connect, the more it leans you towards Model 2.

Do you have security requirements that prevent you from using vCenter’s PKI capabilities altogether?  You have no choice, go with Model 3.

For people who think they need to go with Model 3, though, I would generally suggest a look at Model 4’s hybrid approach.  Unless you absolutely have to go with Model 3, go with Model 4.

Hope this helps!

HP NC375T NICs are drunk, should go home

This past week, I ran into one of the most bizarre issues I’ve ever encountered in my decade of experience with VMware.

I was conducting a health check of a customer’s vSphere 5.5 environment and found that the servers were deployed with 8 NICs, but only 4 were wired up.  Although the customer was running FC for storage, 4 NICs isn’t enough to redundantly segregate vMotion, VM, and management traffic, and the customer was complaining about VM performance issues when vMotioning VMs around.  The plan was to wire up the extra add-on NIC ports and take a port each from the quad-port onboard NIC and the add-on HP NC375T.

So first, I checked whether the right driver and firmware were installed for this NIC according to VMware’s Compatibility Guide.  The driver was good, but the commands to determine the firmware version wouldn’t return any info.  Also curious was the fact that this NIC was showing up as additional ports of the onboard Broadcom NIC.  FYI, this server is an HP DL380 Gen7, a bit older but still a supported server for VMware vSphere 5.5.
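
For reference, the check in question is normally as simple as the sketch below, which wraps "esxcli network nic get" through PowerCLI’s Get-EsxCli; the host and vmnic names are hypothetical, and the Driver Info section of the output normally reports the driver and firmware versions.  On these hosts, it simply came back with nothing useful.

```powershell
# Hypothetical host/vmnic names; equivalent to running "esxcli network nic get -n vmnic4" on the host.
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esx01.domain.com') -V2
$esxcli.network.nic.get.Invoke(@{ nicname = 'vmnic4' })
```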

At this point, I wanted to see if these NIC ports would function, so I went to add them to a new vSwitch.  Interestingly enough, they did not show up as available NICs to add.  However, if I plugged the NICs in and just looked at the Network Adapters info, they showed up there and even reported their connection state accurately.  I tried rebooting the server, same result.  One other server was identical, so I tried the same on that one and got the same exact behavior: the ports reported as part of the onboard NIC, commands to list the firmware version did not work, you could not add them to any vSwitch, but the connection status info reported accurately under the Network Adapters section of the vSphere console.

At this point, I was partly intrigued and enraged, because accomplishing this network reconfiguration shouldn’t be a big deal.  I put the original host I was working on in maintenance mode, evacuated all the VMs, and powered it off.  I reseated the card, powered it back on, and I got the same exact results.  I powered it off, removed the add-on NIC, and powered it back on, expecting to see the NIC ports gone, and they were, along with the first two onboard NIC ports!

This was, and still is, utterly baffling to me.  I did some more research, thinking this HP NC375T must be a Broadcom NIC since it was messing with the onboard Broadcom adapter in mysterious ways, but nope!  It’s a rebadged QLogic!  I rebooted it, same result.  Cold booted it, same result.  I put the NIC back in, and the add-on NIC ports AND the two onboard NIC ports came back, all listed as part of the onboard Broadcom NIC!

I researched the NC375T for probably over an hour at this point, finding people having other weird problems, some of them fixed by firmware upgrades.  It took 45 minutes to find a spot on HP’s site to download drivers and firmware, and the firmware version that VMware and everyone else who had issues with this card swore you had better be running to have any prayer of stability was not available.  I tried HP’s FTP site and QLogic’s site, no dice.  I recommended to the customer that we replace these cards, since they’re poorly supported, people were having so many problems with them, AND we were seeing the most absolutely bizarre behavior I’ve ever seen from a NIC.  The customer agreed, but we needed to get this host working again with the four NICs until the replacement cards arrived.

At this point, a purely instinctual voice out of nowhere came into my head and said, “You should pull the NIC out and reset the BIOS to defaults.”  To which I replied, “Thanks, weird, oddly technically knowledgeable voice.”

And sure enough, it worked.  All onboard NIC ports were visible again.  Weird!  Just for fun, I stuck the NC375T back in.  What do you know, it was now listed as its own separate NIC, not a part of the onboard Broadcom adapter, AND I could add it to a vSwitch if I wanted, AND I could run commands to get the firmware version, which confirmed it was nowhere near the supported version for vSphere 5.5.

In the end, the customer still wanted these NICs replaced, which I was totally onboard with at this point, too, for many obvious reasons.

So, in conclusion, HP NC375T adapters are drunk, and should go home!