The NSX-T Control Cluster can be configured with a maximum of three Controller nodes. When one of the NSX-T Controllers fails or enters an error state, the administrator must replace it to restore cluster majority and avoid a split-brain scenario. In this blog post, we will look at how to replace a failed NSX-T Controller with a new NSX-T Controller node and add it to the existing NSX-T Control cluster.
Steps to replace the failed NSX-T Controller:
Step 1: Deactivate the Control Cluster:
To replace the failed NSX-T Controller, first deactivate the control cluster on the failed node with the deactivate control-cluster command.
nsx-controller-03> deactivate control-cluster
Example:
sa-nsxctrl-03> deactivate control-cluster
Deactivating the control cluster will delete all clustering configuration on the node. Are you sure you want to deactivate the control cluster (yes/no): yes
Control cluster deactivation successful.
Step 2: Check the Control Cluster status:
Let's verify the control cluster status with the get control-cluster status command on any of the active controller nodes, here nsx-controller-01.
nsx-controller-01> get control-cluster status
Example:
nsx-controller-01> get control-cluster status
uuid: a6a220a3-e7b3-49db-9fbe-b27556d9ce76
is master: true
in majority: true
uuid                                    address        status
a6a220a3-e7b3-49db-9fbe-b27556d9ce76    172.20.10.46   active
6b8226c9-c9a2-4236-885d-8ca5c814f581    172.20.10.47   active
8592c367-02ad-4fd0-9281-65f2316672b5    172.20.10.48   not active
Step 3: Detach the failed NSX-T controller from the control cluster:
After deactivation, the failed NSX-T Controller shows a status of "not active" and can be detached from the control cluster with the detach control-cluster command, run on one of the active controller nodes.
detach control-cluster <ip-address-failed-Controller-node>
Example:
nsx-controller-01> detach control-cluster 172.20.10.48
Successfully detached node from the control cluster.
Step 4: Check the control cluster status:
Now verify the NSX-T Controller cluster status on any of the active NSX-T Controller nodes with the get control-cluster status command. The failed NSX-T Controller no longer appears in the command output.
get control-cluster status
Example:
nsx-controller-01> get control-cluster status
uuid: a6a220a3-e7b3-49db-9fbe-b27556d9ce76
is master: true
in majority: true
uuid                                    address        status
a6a220a3-e7b3-49db-9fbe-b27556d9ce76    172.20.10.46   active
6b8226c9-c9a2-4236-885d-8ca5c814f581    172.20.10.47   active
Step 5: Detach the failed NSX-T Controller from NSX Manager:
Let's detach the failed controller from the NSX Manager with the detach controller command, run on the NSX Manager node and passing the UUID of the failed NSX-T Controller node.
nsxmgr-01> detach controller <UUID-of-sa-nsxctrl-03>
Step 6: Check the Management Cluster status:
Verify the management cluster with the get management-cluster status command.
get management-cluster status
Example:
nsxmgr-01> get management-cluster status
Number of nodes in management cluster: 1
- 172.20.10.41 (UUID 2e2f75b1-f3a1-4b42-a8f8-36130d7970bf) Online
Management cluster status: STABLE
Number of nodes in control cluster: 2
- 172.20.10.47 (UUID 6b8226c9-c9a2-4236-885d-8ca5c814f581)
- 172.20.10.46 (UUID a6a220a3-e7b3-49db-9fbe-b27556d9ce76)
Control cluster status: STABLE
Step 7: List the nodes registered with the NSX Manager:
The get nodes command shows the nodes registered with the NSX-T Manager; the failed NSX-T Controller node no longer appears in the list.
get nodes
Example:
nsxmgr-01> get nodes
UUID                                    Type    Display Name
6b8226c9-c9a2-4236-885d-8ca5c814f581    ctl     nsx-controller-02
a6a220a3-e7b3-49db-9fbe-b27556d9ce76    ctl     nsx-controller-01
09ac70dc-b93b-11e8-a93a-0050569da5b7    edg     nsx-edge-02
9f64f271-f592-4a59-8b80-8952692c3698    edg     nsx-edge-01
863638cb-945b-4798-a8c7-9a7a93f291b7    esx     esxi-05.virtualbrigade.com
7571389d-3d0c-4bd0-bb7a-033622121f4c    esx     sa-esxi-04.virtualbrigade.com
0e886d3e-d2c9-4a19-9bab-ea0cf3f5d5f7    kvm     kvm-01.virtualbrigade.com
18b5ee82-b9bd-11e8-b968-00505602d015    kvm     kvm-02.virtualbrigade.com
2e2f75b1-f3a1-4b42-a8f8-36130d7970bf    mgr     nsxmgr-01
Step 8: Delete the failed NSX Controller Virtual Machine from ESXi or KVM Hypervisor
After the failed NSX-T Controller has been removed from the existing control cluster and from the NSX Manager, its virtual machine can be safely deleted from the hypervisor.
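If you prefer the command line over the vSphere Client or KVM management tools, the virtual machine can also be removed from the host shell. This is a sketch only; the VM name nsx-controller-03 is an example for this environment, so substitute your own VM name and ID.

On an ESXi host:

esxi-05> vim-cmd vmsvc/getallvms | grep nsx-controller-03   (note the VM ID)
esxi-05> vim-cmd vmsvc/power.off <vmid>
esxi-05> vim-cmd vmsvc/destroy <vmid>

On a KVM host:

kvm-01# virsh destroy nsx-controller-03
kvm-01# virsh undefine nsx-controller-03 --remove-all-storage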
Step 9: Deploy a new NSX-T Controller node
Please refer to the blog post on how to deploy the NSX-T Controller node: http://virtualbrigade.com/deploying-nsx-t-controllers/
Step 10: Join the new NSX-T Controller with the NSX Management Plane
Please refer to the blog post on how to join the NSX-T Controller with the Management Plane: http://virtualbrigade.com/configuring-nsx-t-control-cluster/
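As a quick reference, the join workflow in the NSX-T 2.x CLI looks roughly like the following. The IP address, node names and admin username follow the examples in this environment; verify the exact syntax against the CLI guide for your NSX-T version.

On the NSX Manager, retrieve the API certificate thumbprint:

nsxmgr-01> get certificate api thumbprint

On the new controller, join it to the management plane using that thumbprint:

nsx-controller-03> join management-plane 172.20.10.41 username admin thumbprint <manager-api-thumbprint>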
Step 11: Join the new NSX-T Controller to the existing NSX-T Control cluster
Please refer to the blog post on how to join the NSX-T Controller Cluster: http://virtualbrigade.com/configuring-nsx-t-control-cluster/
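For reference, joining a node to an existing control cluster in the NSX-T 2.x CLI follows this general sequence. Node names and the IP address are examples from this environment; treat the commands as a sketch to confirm against the CLI guide for your version.

On the new controller, set the cluster shared secret and note its certificate thumbprint:

nsx-controller-03> set control-cluster security-model shared-secret secret <shared-secret>
nsx-controller-03> get control-cluster certificate thumbprint

On the master controller, join the new node to the cluster:

nsx-controller-01> join control-cluster 172.20.10.48 thumbprint <new-controller-thumbprint>

Back on the new controller, activate it:

nsx-controller-03> activate control-cluster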
For more information, please refer to the VMware Documentation on NSX-T Data Center product at https://docs.vmware.com/en/VMware-NSX-T/index.html
I hope this is informative for you and I thank you for reading.
Related Posts:
- NSX-T 2.1 Complete video series:
- Introduction to NSX-T
- NSX-T Architecture
- Deploy NSX-T Manager Virtual Machine on ESXi host
- Configure NSX-T Control cluster
- Prepare ESXi host as fabric node in NSX-T
- Prepare KVM hosts as fabric Node in NSX-T
- How to add vCenter Server as Compute Manager?
- What is N-VDS or hostSwitch in NSX-T?
- How to create Transport Zones in NSX-T?
- What is Uplink Profile and how to Create in NSX-T?
- Create an IP pools for TEP in NSX-T
- Verify hostswitch configuration on ESXi and KVM
- How to create Logical Switches in NSX-T?
- NSX-T Logical Routing