Troubleshooting Tanzu Kubernetes Cluster Nodes in VMware Cloud on AWS

As much as we might try to write perfect code and deploy perfectly configured infrastructure, occasionally we’ll still need to do some troubleshooting. What you’ll be troubleshooting is hard to guess, but this post will walk you through gaining access to the Tanzu Kubernetes Cluster (TKC) nodes deployed by the Tanzu Kubernetes Grid (TKG) Service in VMware Cloud on AWS. Once you’ve gained access to your TKC nodes, you should be able to identify and solve whatever problem you’re having.

Access to TKC Nodes

The first thing you might be wondering is why you can’t directly access your Tanzu Kubernetes Cluster nodes. The nodes are inaccessible because any clusters deployed in a Supervisor Namespace are segmented from the rest of the environment. To reach these clusters, you’ll typically use IP addresses from the Ingress CIDR that you entered during setup. These ingress addresses are virtual IPs (VIPs) on the client-facing side of the load balancer, and they map traffic to the cluster services inside the namespace segment.
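For example, once you’re logged in to the Supervisor Cluster with kubectl, you can see those VIPs on the LoadBalancer services in your namespace. A minimal sketch, assuming the “hardtop” namespace used later in this post:

# List services in the Supervisor Namespace; the EXTERNAL-IP values on
# LoadBalancer services come from the Ingress CIDR configured at setup.
kubectl get svc -n hardtop
# The TKC control plane endpoint typically surfaces as a LoadBalancer
# service (often named something like <cluster-name>-control-plane-service,
# though the exact name varies by release).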

So how do we access these non-routable networks hidden behind an NSX-T Tier-1 router? One way is to use a jump box with a pair of NICs attached to it.


Deploy a Jump Box

Since we’ll use a jump box to access our cluster nodes, the first step is to deploy one. Deploy a virtual machine on one of your VMware Cloud on AWS network segments, and be sure to open access from your desktop/laptop to this jump box in the compute gateway firewall rules. Typically, that means either TCP port 3389 for Windows RDP or TCP port 22 for SSH. How you set up the jump box beyond that is up to you; troubleshooting could take you in several directions, and it’s hard to predict in this blog post what tools you might need. At a minimum, install an SSH client so that you can access the TKC nodes, and usually the Kubernetes client (kubectl), as in the sketch below.
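A minimal sketch of installing those tools, assuming an Ubuntu-based jump box (adjust the package manager for your distribution):

# Install an SSH client (usually already present on Linux).
sudo apt-get update && sudo apt-get install -y openssh-client

# Install the latest stable kubectl, per the upstream Kubernetes install docs.
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client

You’ll also want the kubectl-vsphere plugin for logging in to the Supervisor Cluster, which you can download from the Supervisor Cluster’s CLI tools landing page.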

Configure Jump Box Networking

Now that you’ve got your jump box set up, it’s time to give it access to the Supervisor Namespace where your Tanzu Kubernetes Cluster is running. The first thing we need to do is add a second NIC to our jump box, connected to the namespace segment. This namespace network isn’t one you created by hand; the Tanzu Kubernetes Grid Service created it when you created the namespace, so you may need to look up the segment’s name. You can find it on the Summary tab of a TKC node in the vCenter UI.
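If you prefer the command line to the vCenter UI, you can also query the Supervisor Cluster for the node VMs and their networks. A sketch using the v1alpha1 VirtualMachine API; field names may differ in newer releases:

# List the TKC node VMs in the namespace.
kubectl get virtualmachines -n hardtop

# Show which segment a node's interface is attached to.
kubectl get virtualmachine <node-vm-name> -n hardtop -o jsonpath='{.spec.networkInterfaces[*].networkName}'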

 


So, once you’ve found your namespace network segment, you can add a second NIC to your jump host and assign it to that segment.
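If you’d rather script this step than click through the vCenter UI, the govc CLI can attach the NIC. A sketch assuming govc is installed and GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD are exported, with my VM name (“jumpbox”) and segment name as placeholders for your own:

# Add a NIC on the namespace segment to the jump box VM.
govc vm.network.add -vm jumpbox -net "vnet-domain-c55:3b33d1d4-d13c-467e-82da-e846e7eaa5d5-hardtop-ht-dev-vnet-0"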


Once you’ve configured the network interface, you need to set an IP address on it inside your jump host. I won’t go into the details of IP configuration, but know that you’ll need to find an open IP address in that segment so you can assign it statically. You can identify a free address by looking at the addresses already used by your TKC nodes.

 


For example, by looking at the IP addresses in vCenter, I found that my four-node cluster (one control plane node plus three worker nodes) was using 10.244.0.18 – 10.244.0.21. So, I added 10.244.0.22/28 to the second NIC on my jump host, as sketched below.
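A sketch of both steps: list the addresses in use from the Supervisor Cluster, then assign a free one on a Linux jump host (the vmIp status field is from the v1alpha1 VirtualMachine API, and your interface name will likely differ):

# Show the IP addresses the TKC node VMs are using.
kubectl get virtualmachines -n hardtop -o jsonpath='{.items[*].status.vmIp}'

# Assign a free address from the segment to the second NIC (check `ip link`
# for the interface name). This assignment does not persist across reboots.
sudo ip addr add 10.244.0.22/28 dev ens224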

NOTE: Once you’ve finished troubleshooting, disconnect the secondary NIC. If your jump box is still connected to a namespace segment when that namespace is deleted, the deletion will fail. Be sure to remove the jump box from the namespace segment when you’re done.
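With govc, removing the secondary NIC might look like this (a sketch; the device name ethernet-1 is an assumption, so check the device listing first):

# Find the secondary NIC's device name, then remove it from the jump box.
govc device.ls -vm jumpbox
govc device.remove -vm jumpbox ethernet-1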

Add Gateway Firewall Rule

Before using your jump box, the last step is to set up a gateway firewall rule in the VMware Cloud Console. To start, go to the Tier-1 Gateways tab of the Gateway Firewall configuration page. In the gateway dropdown, select the namespace router for the cluster you’re trying to access. Since my cluster is in the “hardtop” namespace, I’ll choose the router below.


Next, click the “Add Policy” button to add a new firewall policy. You can click on the name box to change the policy name, as I have done.


Once the policy is added, select the kebab menu next to it and choose “Add Rule.” Add the rule, then modify the source, destination, services, and “applied to” columns.

In my case, I used this info:

| Item | Value | Details |
| --- | --- | --- |
| Source | 10.244.0.22/28 | Jump box secondary interface IP address |
| Destination | vnet-domain-c55:3b33d1d4-d13c-467e-82da-e846e7eaa5d5-hardtop-ht-dev-vnet-0 | Network segment housing my TKC nodes |
| Services | SSH | Port 22 access |
| Applied To | t1-domain-c55:3b33d1d4-d13c-467e-82da-e846e7eaa5d5-hardtop-rtr | The Tier-1 router used by the namespace |

My complete rule for my policy is shown below. Note: the source/destination/ssh fields refer to groups I’ve created with the above information.


Get Login Credentials

You’ve got network access to your TKC nodes now, but there’s one more problem: what credentials do you use to log in to a node? This information is stored in the Supervisor Cluster, and the full instructions for obtaining the SSH private key are in the VMware documentation linked under Additional Resources below.

The summary of those instructions is this:

  1. Use kubectl to connect to your Supervisor Cluster namespace (see the sketch after this list).
  2. Pull the secret containing the SSH private key:
    kubectl get secret [my-cluster-name-here]-ssh -o jsonpath='{.data.ssh-privatekey}' | base64 --decode > tkc-key
  3. Restrict the permissions on tkc-key before using it as your SSH key for authentication:
    chmod 600 tkc-key
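Putting it together, a minimal end-to-end sketch, assuming the “hardtop” namespace and a cluster named ht-dev (as the segment name above suggests); substitute your own Supervisor Cluster address and credentials:

# Log in to the Supervisor Cluster with the kubectl-vsphere plugin and
# switch to the namespace context.
kubectl vsphere login --server=<supervisor-ip> --vsphere-username=<user>@vsphere.local
kubectl config use-context hardtop

# Pull and decode the private key, then lock down its permissions.
kubectl get secret ht-dev-ssh -o jsonpath='{.data.ssh-privatekey}' | base64 --decode > tkc-key
chmod 600 tkc-key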

 

Log in to a Tanzu Kubernetes Cluster Node

You’ve got everything ready to go now. Go ahead and run your ssh command from the jump host and specify the tkc-key as your private key for authentication.

ssh -i tkc-key vmware-system-user@[my-node-ip-address]
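Once you’re on the node, a few common first checks might look like this (a sketch assuming a containerd-based TKC node, where crictl and the kubelet systemd unit are standard):

sudo systemctl status kubelet                  # is the kubelet running?
sudo journalctl -u kubelet --no-pager -n 100   # recent kubelet log entries
sudo crictl ps                                 # containers running on this node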


Summary and Additional Resources

Hopefully, you’ll deploy your Tanzu Kubernetes Clusters without any issues, and you can focus on your work of building modern applications on top of Kubernetes. But if you should need to do some troubleshooting, you can use the steps detailed in this post to log in to your TKC nodes through SSH to run troubleshooting commands.

Additional Resources

SSH to Tanzu Kubernetes Cluster Nodes as the System User Using a Private Key

Create a Jump Host VM using Photon

Changelog

The following updates were made to this guide.

| Date | Description of Changes |
| --- | --- |
| 2021-11-29 | Initial publication |

About the Author and Contributors

Eric Shanks has spent two decades working with VMware and cloud technologies focusing on hybrid cloud and automation. Eric has obtained some of the industry’s highest distinctions, including two VMware Certified Design Expert (VCDX #195) certifications and many others across various solutions, including Microsoft, Cisco, and Amazon Web Services.

Eric has been an active community contributor as a Chicago VMUG leader, a blogger at theITHollow.com, and a Tech Field Day delegate.

  • Eric Shanks, Sr. Technical Marketing Architect, Cloud Services Business Unit, VMware

 

 

 

 

 
