As a virtualization consultant, I know there’s a wide variety of technologies at every level – hypervisor, storage, networking, and even server hardware – and each is getting more complex in terms of what you need to know to manage it effectively. Nobody can be an expert in every single storage technology, and with more and more options that are radically different in their architecture, I wanted to make my own little contribution for consultants and admins alike: the basic things you should and shouldn’t do with one storage solution – Nutanix. As consultants, we often find ourselves in environments with something we’re not totally familiar with, so some concise guidance can go a long way. Admins, too, may have depended on a consultant or former colleagues for implementation and support, and now it’s on them, so I thought this would be helpful.
There are quite a few things everyone should know if they’re ever working on an environment with Nutanix that aren’t necessarily obvious. I can see it being pretty darn easy to blow up a Nutanix environment if you’re not aware of some of these things.
Common stuff
- Contact Nutanix Support before downgrading licensing or destroying a cluster to reclaim licenses (unnecessary if you’re using Starter licensing, though). This point gets repeated many times, so I’m guessing that if it isn’t done, you’ll be hating life getting licensing straightened out.
- Do NOT delete the Nutanix Controller VM on any Nutanix host (CVM names look like: NTNX-<blockid>-<position>-CVM)
- Do NOT modify any settings of a Controller VM, all the way down to even the name of the VM.
- Shutdown/Startup gotchas:
- It’s probably best to never shut down, reboot, or otherwise take down more than one Nutanix node in a cluster at a time. If you do, you may cause all hosts in the Nutanix cluster to lose storage connectivity.
- When shutting down a single host (or fewer hosts than the redundancy factor – the number of host failures the Nutanix cluster is configured to tolerate), migrate or shut down all VMs on the host EXCEPT the Controller VM, THEN shut down the Controller VM (see the shell sketch after this list).
- If you are shutting down a number of hosts that exceeds the redundancy factor, you need to shut down the Nutanix cluster, and there’s a specialized procedure to start the Nutanix cluster back up in that situation. That’s beyond the scope of this post.
- When booting up a host, do the following:
- Start the Controller VM that resides on it first, and verify its services are working by SSHing to the CVM and running:
- ncli cluster status | grep -A 15 <controllerVmIP>
- Then have the host rescan its datastores.
- Then verify the Nutanix cluster state from the same SSH session to ensure all cluster services are up:
- cluster status
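To make the shutdown and boot-up steps above concrete, here is a minimal shell sketch for an ESXi-based node. The IP address, hostname, and the cvm_shutdown/esxcli invocations are illustrative assumptions on my part – verify them against your AOS and hypervisor versions before relying on them.

    # Hedged sketch of a graceful single-host shutdown and restart (placeholder names/IPs).
    # 1. Migrate or shut down every guest VM on the host, leaving only the Controller VM.
    # 2. Gracefully stop the Controller VM from inside the CVM itself.
    ssh nutanix@10.0.0.31 "cvm_shutdown -P now"
    # 3. Shut the host down (ESXi example: enter maintenance mode, then power off).
    ssh root@esxi01 "esxcli system maintenanceMode set --enable true"
    ssh root@esxi01 "esxcli system shutdown poweroff --reason 'planned maintenance'"
    # 4. Once the host is back up, exit maintenance mode, then power on the CVM (e.g. via the vSphere client).
    ssh root@esxi01 "esxcli system maintenanceMode set --enable false"
    # 5. Verify the CVM's services from an SSH session to it.
    ssh nutanix@10.0.0.31 "ncli cluster status | grep -A 15 10.0.0.31"
    # 6. Rescan datastores on the host, then confirm the whole cluster is healthy.
    ssh root@esxi01 "esxcli storage core adapter rescan --all"
    ssh nutanix@10.0.0.31 "cluster status"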
- Hypervisor Patching
- Make sure to patch one hypervisor node and ensure its Controller VM comes back up and its services are good before proceeding to the next one. As with shutdowns, only do one node at a time in a Nutanix cluster (see above; a shell sketch of this check follows this list).
- Follow shutdown host procedure above.
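If you want to script that check between nodes, something like the sketch below works. The IP is a placeholder, and the grep assumes the per-CVM block in the ncli output contains an “Up” state line – confirm the exact wording on your AOS version.

    # Wait for the freshly patched node's CVM to report its services are up
    # before moving on to the next node.
    CVM_IP=10.0.0.31   # placeholder - the CVM on the node you just patched
    until ssh -o ConnectTimeout=5 nutanix@$CVM_IP "ncli cluster status | grep -A 15 $CVM_IP" | grep -q "Up"; do
        echo "Waiting for CVM $CVM_IP services to come back..."
        sleep 30
    done
    echo "CVM $CVM_IP reports Up - safe to patch the next node."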
vSphere
- NEVER use the “Reset System Configuration” command on a Nutanix node.
- If resource pools are created, Controller VM (CVM) must have the highest share.
- Do NOT modify NFS settings.
- VM swapfile location should be the same folder as the VM. Do NOT place it on a dedicated datastore.
- Do NOT modify the Controller VM startup/shutdown order.
- Do NOT modify iSCSI software adapter settings.
- Do NOT modify the vSwitchNutanix standard vSwitch (a read-only esxcli sketch at the end of this vSphere section shows what Nutanix sets up, if you just want to look).
- Do NOT modify the vmk0 interface in the port group “Management Network”.
- Do NOT disable ESXi host SSH.
- HA configuration recommended settings:
- Enable admission control and use the percentage-based policy, with the percentage value based on the number of nodes in the cluster.
- Set VM Restart Priority for CVMs to Disabled.
- Set Host Isolation Response of the cluster to Power Off.
- Set Host Isolation Response of CVMs to Leave Powered ON.
- Disable VM Monitoring for all CVMs.
- Enable Datastore Heartbeating by clicking “Select only from my preferred datastores” and choosing the Nutanix datastores. If the cluster has only one datastore (which is fairly common in Nutanix deployments), add the advanced option das.ignoreInsufficientHbDatastore=true to avoid warnings about not having at least two heartbeat datastores.
- DRS stuff:
- Disable DRS automation for all CVMs.
- Leave Distributed Power Management (DPM) disabled.
- Enable EVC for the lowest processor class in the cluster.
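Since so many of the vSphere items above concern networking and storage plumbing, here is a read-only esxcli sketch for seeing what Nutanix has configured on a host without modifying anything. These are standard esxcli subcommands; the comments describe what you would typically expect to see, so double-check them against your own environment.

    # Read-only checks on an ESXi host in a Nutanix cluster - nothing here changes configuration.
    esxcli network vswitch standard list     # should include vSwitch0 and vSwitchNutanix
    esxcli network ip interface ipv4 get     # vmk0 (and any other vmkernel) IP settings
    esxcli iscsi adapter list                # software iSCSI adapter - leave as-is per the list above
    esxcli storage filesystem list           # the Nutanix container(s) mounted as NFS datastores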
Hyper-V
- Do NOT use Validate Cluster within Failover Clustering or SCVMM, as it is not supported. Not sure what would happen if you did, but I’m guessing it would be pretty awesome, so you should probably make sure you’ve got popcorn ready if you’re going to do that.
- Do NOT modify the Nutanix or Hyper-V cluster name
- Do NOT modify the external network adapter name
- Do NOT modify the Nutanix specific virtual switch settings
KVM (the hypervisor… I’m assuming this also applies if you’re using the Acropolis Hypervisor from Nutanix, since it’s KVM-based)
- Do NOT modify the Hypervisor configuration, including installed packages
- Do NOT modify iSCSI settings
- Do NOT modify the Open vSwitch settings (if you just want to see what’s there, a read-only ovs-vsctl sketch follows).
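For reference, here is a hedged sketch of looking at the Open vSwitch layout on an AHV/KVM host without touching it. These ovs-vsctl subcommands are read-only; the bridge names you will see (br0 and so on) are whatever Nutanix configured, so leave them alone.

    # Read-only look at the Open vSwitch layout on an AHV/KVM host (run as root on the host).
    ovs-vsctl show        # bridges, bonds/uplinks, and attached interfaces
    ovs-vsctl list-br     # just the bridge names (e.g. br0)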
I hope this proves helpful to people who unexpectedly find themselves working on Nutanix and need a quick primer to ensure they don’t break something!
Hi!
Thanks for the nice article… regarding vSphere, there is one topic I don’t agree on:
“Do NOT modify vmk0 interface in port group “Management Network””
Why is that? I want management traffic to be separated from the other traffic types within a vSphere environment, and I only want management traffic to use the 1GbE interfaces. IMHO, reconfiguring vSwitch0 is an elementary post-installation task in a Nutanix/vSphere environment. For example, I would further like to isolate CVM traffic to specific ports (1 active, 1 standby), and vMotion traffic to the same two ports in reversed order. It is even a VMware best practice to keep ports with different link speeds (1GbE vs. 10/40GbE) on separate vSwitches (for whatever reason). I also want to rename port groups, etc.
But maybe I’ve been working just too long with vSphere to be able to properly ignore the complexity that comes with it 😉
Regards,
Chris
I agree that vmk0 is fine to make adjustments on if needed. In our situation we have VMware NSX layered on top of Nutanix nodes, which complicates things a bit. When Nutanix is first installed, vmk0 is on a standard vSwitch. For VMware NSX, and to increase redundancy and throughput, we combined the two 10GbE links into an LACP port-channel trunk configured across a dvSwitch. This configuration requires vnic0 to be migrated from the vSwitch to the dvSwitch port-channel one leg at a time so connectivity isn’t lost.
A little convoluted, but it has been working well in production for over a year.
Hey David,
Sorry for the super long delay responding. I think the idea here is not to make changes like disabling management functionality, enabling vMotion on it, that kind of thing. IP address changes are fine. That bullet point was pulled out of Nutanix training.