Troubleshooting Tanzu Kubernetes Cluster Nodes in VMware Cloud on AWS
As much as we might try to develop perfect code and deploy perfectly configured infrastructure, occasionally we'll still need to do some troubleshooting. The exact problem you're chasing is hard to guess, but this post will walk you through gaining access to the Tanzu Kubernetes Cluster (TKC) nodes deployed by the Tanzu Kubernetes Grid (TKG) Service in VMware Cloud on AWS. Once you've gained access to your TKC nodes, you should be able to identify and solve whatever problem you're having.
Access to TKC Nodes
The first thing you might be wondering is why you can’t directly access your Tanzu Kubernetes Cluster nodes. The cluster nodes are inaccessible because any clusters deployed in a Supervisor Namespace are segmented from the rest of the environment. To access these clusters, you’ll typically use the Ingress IP Addresses that you entered for the Ingress CIDR during setup. These Ingress addresses are virtual IPs (VIPs) on the client end of the load balancer, and they map traffic to the cluster services inside the namespace segment.
So how do we access these non-routable networks hidden behind an NSX-T Tier-1 router? One way is to use a jump box with a pair of NICs attached to it.
Deploy a Jump Box
Since we'll use a jump box to access our cluster nodes, we first have to deploy one. Deploy a virtual machine in one of your VMware Cloud on AWS network segments, and be sure to open firewall rules from your desktop/laptop to this jump box in the compute gateway firewall. Typically, the rules to this host allow either TCP port 3389 for Windows RDP or TCP port 22 for SSH. How you set up the jump box beyond that is up to you; troubleshooting could take you in several directions, and it's hard to predict in this post which tools you'll need. At a minimum, though, you'll want an SSH client so you can reach the TKC nodes, and usually the Kubernetes client (kubectl).
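As a quick sanity check, a short script like the sketch below can confirm the basics are present on the jump box (the tool list is just an example; adjust it for your own toolkit):

```shell
#!/bin/sh
# Verify the standard troubleshooting tools are available on the jump box.
for tool in ssh kubectl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: installed"
  else
    echo "$tool: MISSING - install before proceeding"
  fi
done
```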
Configure Jump Box Networking
Now that your jump box is set up, it's time to give it access to the Supervisor Namespace where your Tanzu Kubernetes Cluster is running. The first step is to add a second NIC to the jump box, connected to the namespace segment. This is not a network you created by hand; the Tanzu Kubernetes Grid Service created it when you created the namespace, so you may need to look up the segment's name. You can find it by selecting a TKC node in the vCenter UI and checking the Summary tab.
So, once you’ve found your namespace network segment, you can add a second NIC to your jump host and assign the second NIC to this network segment.
Once you've connected the network interface, you need to assign an IP address to it inside your jump host. I won't go into the details of configuring an IP address, but you'll need to find an open IP address in that segment so you can assign it statically. You can work out a free address by looking at the addresses already used by your TKC nodes.
For example, by looking at their IP addresses in vCenter, I found that my four-node cluster (one control plane node plus three worker nodes) was using 10.244.0.18 – 10.244.0.21. So I added the address 10.244.0.22/28 to the second NIC on my jump host.
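On a Linux jump box, assigning that address to the second NIC might look like the following sketch. The interface name ens192 is a hypothetical example; check `ip link` on your own host, and substitute your own addresses:

```shell
# Assign the spare address from the namespace segment to the secondary NIC.
# ens192 is a hypothetical interface name; yours may differ.
sudo ip addr add 10.244.0.22/28 dev ens192
sudo ip link set ens192 up

# Confirm the address took effect.
ip addr show ens192
```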
NOTE: Disconnect the secondary NIC once you've finished your troubleshooting. If your jump box is still attached to a namespace segment when you try to remove the namespace, the removal will fail.
Add a Gateway Firewall Rule
The last step before using your jump box is to set up a gateway firewall rule in the Cloud Console. Go to the Tier-1 Gateways tab of the Gateway Firewall configuration page, and in the gateway dropdown, select the namespace router for the cluster you're trying to access. Since my cluster is in the "hardtop" namespace, I'll choose that namespace's Tier-1 router.
Next, click the “Add Policy” button to add a new distributed firewall policy. You can click on the name box to change the policy name, as I have done.
Once a policy is added, select the kebab menu next to it and choose "Add Rule." Then fill in the source, destination, services, and "applied to" columns.
In my case, I used this info:
| Item | Value | Details |
| --- | --- | --- |
| Source | 10.244.0.22/28 | Jump box secondary interface IP address |
| Destination | vnet-domain-c55:3b33d1d4-d13c-467e-82da-e846e7eaa5d5-hardtop-ht-dev-vnet-0 | Network segment housing my TKC nodes |
| Services | SSH | Port 22 access |
| Applied to | t1-domain-c55:3b33d1d4-d13c-467e-82da-e846e7eaa5d5-hardtop-rtr | The Tier-1 router used by the namespace |
My complete rule for my policy is shown below. Note: the source/destination/ssh fields refer to groups I’ve created with the above information.
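Before moving on, it's worth verifying from the jump box that the new rule actually allows SSH through, for example with netcat (the node address below is the example control plane IP from earlier in this post):

```shell
# Test TCP port 22 reachability to a TKC node through the Tier-1 gateway.
# 10.244.0.18 is the example node address used in this post.
nc -vz 10.244.0.18 22
```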
Get Login Credentials
You've got network access to your TKC nodes now, but you have another problem: what login credentials do you use to log in to a node? This information is stored in the Supervisor Cluster, and the full instructions for obtaining it are in the VMware documentation ("SSH to Tanzu Kubernetes Cluster Nodes as the System User Using a Private Key," linked under Additional Resources below).
The summary of those instructions is this:
- Use kubectl to connect to your Supervisor cluster namespace.
- To pull the secret with the ssh credentials run:
kubectl get secret [my-cluster-name-here]-ssh -o jsonpath='{.data.ssh-privatekey}' | base64 --decode > tkc-key
- Change the permissions on the tkc-key before using it as your ssh key for authentication by running:
chmod 600 tkc-key
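Putting those steps together, the whole sequence might look like this sketch. The namespace and cluster names are the examples from this post (substitute your own), and kubectl is assumed to be already logged in to the Supervisor cluster:

```shell
# Example values from this post; replace with your own namespace and TKC name.
NAMESPACE=hardtop
CLUSTER=ht-dev

# Pull the SSH private key from the Supervisor cluster secret and decode it.
kubectl get secret "${CLUSTER}-ssh" -n "$NAMESPACE" \
  -o jsonpath='{.data.ssh-privatekey}' | base64 --decode > tkc-key

# SSH refuses keys with loose permissions, so lock the file down.
chmod 600 tkc-key
```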
Log in to a Tanzu Kubernetes Cluster Node
You’ve got everything ready to go now. Go ahead and run your ssh command from the jump host and specify the tkc-key as your private key for authentication.
ssh -i tkc-key vmware-system-user@[my-node-ip-address]
Summary and Additional Resources
Hopefully, you’ll deploy your Tanzu Kubernetes Clusters without any issues, and you can focus on your work of building modern applications on top of Kubernetes. But if you should need to do some troubleshooting, you can use the steps detailed in this post to log in to your TKC nodes through SSH to run troubleshooting commands.
Additional Resources
SSH to Tanzu Kubernetes Cluster Nodes as the System User Using a Private Key
Create a Jump Host VM using Photon
Changelog
The following updates were made to this guide.
| Date | Description of Changes |
| --- | --- |
| 2021-11-29 | Initial publication |
About the Author and Contributors
Eric Shanks has spent two decades working with VMware and cloud technologies focusing on hybrid cloud and automation. Eric has obtained some of the industry’s highest distinctions, including two VMware Certified Design Expert (VCDX #195) certifications and many others across various solutions, including Microsoft, Cisco, and Amazon Web Services.
Eric has been a community contributor as a Chicago VMUG leader, a blogger at theITHollow.com, and a Tech Field Day delegate.
- Eric Shanks, Sr. Technical Marketing Architect, Cloud Services Business Unit, VMware