Deploy Tanzu Kubernetes Grid on VMC on AWS
Overview
The Tanzu Kubernetes Grid (TKG) solution enables you to create and manage Kubernetes clusters across multiple infrastructure providers such as VMware vSphere, AWS, and Microsoft Azure using the Kubernetes Cluster API.
TKG functions through the creation of a management Kubernetes cluster that houses the Cluster API. The Cluster API then interacts with the infrastructure provider to service workload Kubernetes cluster lifecycle requests.
Scope
This document describes the steps to deploy a TKG Management and Workload cluster and how to host a sample Kubernetes application in the workload cluster. The document also describes the communication that happens between the TKG components and the NSX Advanced Load Balancer to create Kubernetes objects.
Prerequisites and Considerations
The table below lists the deployment requirements and considerations for a TKGm implementation in VMC.
Deployment Requirements
Network requirements: Create two DHCP-enabled logical segments in your SDDC, one for the TKG management cluster and one for the TKG workload cluster. Make sure that the new subnet CIDRs do not overlap with existing segments.
Port and protocol requirements: To understand the firewall requirements for a successful TKG implementation in VMC, refer to the Firewall rules for TKG on VMC on AWS article.
Licensing requirements: An NSX ALB Basic license is the minimum required for a TKG deployment; an Enterprise license is also supported.
Deploy TKG Management Cluster
You can deploy management clusters in two ways:
- Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the process of deploying a management cluster.
- Create a deployment YAML configuration file and use it to deploy the management cluster with the Tanzu CLI commands.
The UI installer is the easier way to deploy the cluster; the following steps describe the process.
To launch the UI installer wizard, run the following command on the bootstrapper machine:
# tanzu management-cluster create --ui --bind <bootstrapper-ip>:8080 --browser none
You can then access the UI wizard by opening a browser and entering http://<bootstrapper-ip>:8080/
Note: If you see a “connection refused” error, make sure that you have allowed port 8080 in the firewall that is running on your bootstrapper machine.
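For example, on a Linux bootstrapper that uses firewalld (adjust for your distribution's firewall tooling), the port can be opened as follows:
# firewall-cmd --permanent --add-port=8080/tcp
# firewall-cmd --reload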
From the TKG Installation user interface, you can see that it is possible to install TKG on vSphere (including VMware Cloud on AWS), AWS EC2, and Microsoft Azure.
Figure 1 - TKG installer user interface
Step 1 - To deploy the TKG Management Cluster in VMC on AWS, click the deploy button under "VMware vSphere".
Step 2 - On the "IaaS Provider" page, enter the IP/FQDN and credentials of the vCenter server where the TKG management cluster will be deployed.
Figure 2 - IaaS Provider
Step 3 - Click Connect and accept the vCenter server SSL thumbprint.
Figure 3 - vCenter SSL Thumbprint
Step 4 - If you are running a vSphere 7.x environment, the TKG installer detects it and gives you the choice to deploy either vSphere with Tanzu (TKGS) or a TKG management cluster.
Select the "Deploy TKG Management Cluster" option.
Figure 4 - vSphere Environment Detection
Step 5 - Select the Virtual Datacenter and enter the SSH public key that you generated earlier.
Figure 5 - vCenter Server Details
Step 6 - On the Management cluster settings page, select the instance types for the control plane and worker nodes and provide the following information:
- Management Cluster Name: Name for your management cluster.
- Control Plane Endpoint: A free IP from the network created for TKG management. Ensure that the IP address that you provide is not part of the DHCP range configured on the network.
Figure 6 - Management Cluster Settings
Step 7 - On the NSX Advanced Load Balancer page, provide the following:
- NSX ALB Controller IP address (ALB Controller cluster IP if the controller cluster is configured)
- Controller credentials.
- Controller certificate.
Step 8 - Click Verify and select the following:
- Cloud Name
- SE Group name
- VIP Network
- VIP Network CIDR
Optionally provide labels for your deployment.
Figure 7 - NSX ALB Details
Step 9 (optional) - On the Metadata page, you can specify location and labels.
Figure 8 - Metadata Details
Step 10 - On the Resources page, specify the vSphere resources (such as the VM folder, datastore, and cluster or resource pool) to use for the TKG management cluster deployment.
Figure 9 - vSphere Resources Detail
Step 11 - On the Kubernetes Network page, select the Network where the control plane and worker nodes will be placed during management cluster deployment.
Optionally, you can change the Service and Pod CIDRs to custom values.
Figure 10 - Kubernetes Network Settings
If you have LDAP configured in your environment, refer to the VMware Documentation for instructions on how to integrate an identity management system with TKG.
In this example, Identity management integration has been disabled.
Figure 11- Identity Management Details
Step 12 - Select the OS image that will be used for the management cluster deployment.
Note: This list will appear empty if you don't have a compatible template present in your environment. After you import the correct template, return to this page and click the refresh button; the installer will detect the image automatically.
Figure 12 - K8 Image Selection
Step 13 - If you have a subscription to Tanzu Mission Control and want to register your management cluster with the TMC, enter the registration URL here.
In this example, this step is skipped.
Figure 13 - TKG-TMC Integration
Step 14 (optional) – Select the "Participate in the Customer Experience Improvement Program" checkbox if you want to participate.
Figure 14 - CEIP Agreement
Step 15 - Click the Review Configuration button to verify your configuration settings.
Figure 15 - Configuration Review page
Step 16 - After you have verified the configuration settings, click Deploy Management Cluster to start the deployment. To change any deployment parameters, click Edit Configuration.
Note: Deployment of the management cluster can also be triggered from the CLI by using the command that the installer has generated for you.
Figure 16 - Deploy Management Cluster
When the deployment is triggered from the UI, the installer wizard displays the deployment logs on the screen.
Figure 17 - Management Cluster Setup Progress
Deployment of the management cluster takes about 20-30 minutes to complete. Close the installer wizard after the deployment is complete.
The installer automatically sets the context to the management cluster so that you can log in to it and perform additional tasks such as verifying the management cluster health and deploying workload clusters.
Figure 18 - Management Cluster Setup Completed
Verify Management Cluster Health
After the management cluster deployment, run the following commands to verify the health status of the cluster:
# tanzu management-cluster get
# kubectl get nodes
# kubectl get pods -A
The following sample screenshots show what a healthy cluster looks like.
Figure 19 - Management Cluster Health Status
Figure 20 - Management Cluster Nodes & Pods
You are now ready to deploy a Tanzu Kubernetes cluster, also known as a workload cluster.
Deploy Tanzu Kubernetes Cluster (Workload Cluster)
The process of creating a Tanzu Kubernetes cluster is similar to creating the management cluster. Follow the steps below to create a new workload cluster for your applications.
Step 1: Set the context to the management cluster
# kubectl config use-context <mgmt_cluster_name>-admin@<mgmt_cluster_name>
Step 2: Create a namespace for the workload cluster.
# kubectl create ns wld01
Step 3: Prepare the YAML file for workload cluster deployment
A sample YAML file is shown below for workload cluster deployment.
workload-cluster.yaml
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: mj-wld01
NAMESPACE: wld01
CLUSTER_PLAN: prod
ENABLE_CEIP_PARTICIPATION: "false"
OS_NAME: photon
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
DEPLOY_TKG_ON_VSPHERE7: true
ENABLE_TKGS_ON_VSPHERE7: false
VSPHERE_CONTROL_PLANE_ENDPOINT: 192.168.18.110
VSPHERE_CONTROL_PLANE_DISK_GIB: "40"
VSPHERE_CONTROL_PLANE_MEM_MIB: "16384"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "4"
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "8192"
VSPHERE_WORKER_NUM_CPUS: "4"
VSPHERE_DATACENTER: /SDDC-Datacenter
VSPHERE_DATASTORE: /SDDC-Datacenter/datastore/WorkloadDatastore
VSPHERE_FOLDER: /SDDC-Datacenter/vm/TKG-Workload-VM's
VSPHERE_NETWORK: K8-Backend
VSPHERE_USERNAME: cloudadmin@vmc.local
VSPHERE_PASSWORD: <Fill-me-in>
VSPHERE_RESOURCE_POOL: /SDDC-Datacenter/host/Cluster-1/Resources/TKG-Workload
VSPHERE_SERVER: <Fill-me-in>
VSPHERE_INSECURE: true
VSPHERE_SSH_AUTHORIZED_KEY: <Fill-me-in>
You can change the deployment parameters in the YAML file to match your infrastructure.
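Optionally, you can generate the cluster manifest without creating any resources as a quick sanity check of the configuration file; the --dry-run flag is available in recent Tanzu CLI versions, and the output file name below is arbitrary:
# tanzu cluster create tkg-wld01 --file=workload-cluster.yaml --dry-run > tkg-wld01-manifest.yaml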
Step 4: Modify the NSX ALB Service Engine VMs
The Service Engines need Layer 2 connectivity to the workload cluster network so that the SEs can communicate with the applications deployed in the workload cluster. This is achieved by editing the SE VMs and attaching the first available free NIC to the logical segment where the TKG workload cluster will be deployed.
Figure 21 - Service Engine NICs
Since DHCP is already enabled on this segment, the SE VMs will be assigned an IP address from the DHCP pool. You can verify this by logging in to the Controller UI, navigating to Infrastructure > Service Engine, editing the Service Engine settings, and locating the MAC address that corresponds to the NIC attached to the workload cluster network.
Figure 22 - Verify Service Engine IP Address
Step 5: Initiate TKG Workload Cluster Deployment
Run the following command to start creating your first workload cluster:
# tanzu cluster create tkg-wld01 --file=workload-cluster.yaml -v 6
After a successful deployment, the following log entries are displayed on your screen.
Deployment Log
Using namespace from config:
Validating configuration...
Waiting for resource pinniped-info of type *v1.ConfigMap to be up and running
configmaps "pinniped-info" not found, retrying
cluster control plane is still being initialized, retrying
Getting secret for cluster
Waiting for resource tkg-wld01-kubeconfig of type *v1.Secret to be up and running
Waiting for cluster nodes to be available...
Waiting for resource tkg-wld01 of type *v1alpha3.Cluster to be up and running
Waiting for resources type *v1alpha3.MachineDeploymentList to be up and running
Waiting for resources type *v1alpha3.MachineList to be up and running
Waiting for addons installation...
Waiting for resources type *v1alpha3.ClusterResourceSetList to be up and running
Waiting for resource antrea-controller of type *v1.Deployment to be up and running
Workload cluster 'tkg-wld01' created
Verify Workload Cluster Health
Once the cluster is deployed, you can run the following commands to verify its health.
# tanzu cluster list
# tanzu cluster get <clustername> -n <namespace>
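For example, for the workload cluster created above:
# tanzu cluster get tkg-wld01 -n wld01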
Figure 23 - Workload Cluster Health Status
Export the Workload Cluster Kubeconfig
The cluster kubeconfig is needed to perform any operations against the workload cluster. Once a workload cluster has been deployed, the corresponding kubeconfig file can be exported using the following command:
# tanzu cluster kubeconfig get <workload-cluster-name> -n <workloadcluster-namespace> --admin --export-file <path-to-file>
Switch to the workload cluster context to start using it.
# kubectl config use-context <workload-cluster-name>-admin@<workload-cluster-name> --kubeconfig=<path-to-kubeconfig-file>
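For example, for the tkg-wld01 cluster in the wld01 namespace (the export path below is an arbitrary choice and is reused in the commands later in this document):
# tanzu cluster kubeconfig get tkg-wld01 -n wld01 --admin --export-file /root/wld01-kubeconfig.yaml
# kubectl config use-context tkg-wld01-admin@tkg-wld01 --kubeconfig=/root/wld01-kubeconfig.yaml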
Install Avi Kubernetes Operator (AKO) in Workload Cluster
The Avi Kubernetes Operator (AKO) is a Kubernetes operator which works as an Ingress controller and performs Avi-specific functions in a Kubernetes environment with the Avi Controller. It runs as a pod in the cluster and translates the required Kubernetes objects to Avi objects and automates the implementation of ingresses/routes/services on the Service Engines (SE) via the Avi Controller. To know more about AKO, please refer to the NSX ALB official documentation.
Every workload cluster that you deploy should have an AKO pod running in order to leverage NSX ALB for creating virtual services and the VIPs for those virtual services. The AKO pod is not present by default on the workload cluster, even if you passed NSX ALB details in the workload cluster deployment file.
AKO deployment is controlled via an AKO configuration (AKODeploymentConfig) that needs to be created manually for every workload cluster that you deploy. Multiple workload clusters can share the same AKO configuration, or each can have a dedicated configuration if you need isolation between the workload clusters.
To deploy AKO for a workload cluster, follow the steps below:
Step 1: Prepare the YAML file for the AKO configuration. A sample is shown below.
deploy-ako.yaml
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  finalizers:
  - ako-operator.networking.tkg.tanzu.vmware.com
  generation: 2
  name: ako-tkc
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: <cloud-name-in-ALB>
  clusterSelector:
    matchLabels:
      <key>: <value>
  controller: <ALB Controller IP>
  dataNetwork:
    cidr: <VIP Network CIDR>
    name: <VIP Network name as configured in ALB>
  extraConfigs:
    image:
      pullPolicy: IfNotPresent
      repository: projects.registry.vmware.com/ako/ako
      version: 1.3.1
    ingress:
      defaultIngressController: true
      disableIngressClass: true
  serviceEngineGroup: Default-Group
Step 2: Apply the AKO configuration on the management cluster by running the command:
# kubectl create -f deploy-ako.yaml
The above configuration is consumed by the AKO Operator pod running in the management cluster. The AKO Operator watches newly created workload clusters and their labels; if a workload cluster's labels match the clusterSelector labels specified in the AKO configuration file, an AKO pod is created in that workload cluster automatically.
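To confirm that the configuration has been accepted, you can list the AKODeploymentConfig objects and check the AKO Operator pod on the management cluster (the tkg-system-networking namespace below is the one referenced in the sample configuration and is where the AKO Operator typically runs):
# kubectl get akodeploymentconfig
# kubectl get pods -n tkg-system-networking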
Step 3: Create the avi-system namespace in the workload cluster and label the workload cluster for automated AKO installation.
Ako Pod List
# kubectl create ns avi-system --kubeconfig=/root/wld01-kubeconfig.yaml
# kubectl label cluster <workload-cluster-name> -n <namespace> <key>=<value>
Example: kubectl label cluster tkg-wld01 -n wld01 location=haas-lab
As soon as a matching label is provided to a workload cluster, you will see the creation of an AKO pod in the avi-system namespace.
# kubectl get all -n avi-system --kubeconfig=/root/wld01-kubeconfig.yaml
NAME READY STATUS RESTARTS AGE
pod/ako-0 1/1 Running 0 29h
NAME READY AGE
statefulset.apps/ako 1/1 29h
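If the AKO pod does not reach the Running state, its logs usually point to the cause (for example, incorrect controller credentials or an unreachable VIP network). A minimal check, assuming the same kubeconfig path as above:
# kubectl logs ako-0 -n avi-system --kubeconfig=/root/wld01-kubeconfig.yaml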
Deploy Sample Application
Now that the AKO pod is deployed for the workload cluster, it's time to deploy a sample application of type LoadBalancer and verify that the corresponding objects (virtual service, VIP, pool, etc.) are created in NSX ALB and that the application is accessible.
A sample YAML manifest for deploying the 'Yelb' application is shown below.
yelb.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-server
  labels:
    app: redis-server
    tier: cache
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 6379
  selector:
    app: redis-server
    tier: cache
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-db
  labels:
    app: yelb-db
    tier: backenddb
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 5432
  selector:
    app: yelb-db
    tier: backenddb
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-appserver
  labels:
    app: yelb-appserver
    tier: middletier
  namespace: yelb
spec:
  type: ClusterIP
  ports:
  - port: 4567
  selector:
    app: yelb-appserver
    tier: middletier
---
apiVersion: v1
kind: Service
metadata:
  name: yelb-ui
  labels:
    app: yelb-ui
    tier: frontend
  namespace: yelb
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: yelb-ui
    tier: frontend
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-ui
  namespace: yelb
spec:
  selector:
    matchLabels:
      app: yelb-ui
  replicas: 1
  template:
    metadata:
      labels:
        app: yelb-ui
        tier: frontend
    spec:
      containers:
      - name: yelb-ui
        image: docker.io/yelb/yelb-ui:v1
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-server
  namespace: yelb
spec:
  selector:
    matchLabels:
      app: redis-server
  replicas: 1
  template:
    metadata:
      labels:
        app: redis-server
        tier: cache
    spec:
      containers:
      - name: redis-server
        image: docker.io/yelb/yelb-redis:v1
        ports:
        - containerPort: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-db
  namespace: yelb
spec:
  selector:
    matchLabels:
      app: yelb-db
  replicas: 1
  template:
    metadata:
      labels:
        app: yelb-db
        tier: backenddb
    spec:
      containers:
      - name: yelb-db
        image: docker.io/yelb/yelb-db:v1
        ports:
        - containerPort: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-appserver
  namespace: yelb
spec:
  selector:
    matchLabels:
      app: yelb-appserver
  replicas: 1
  template:
    metadata:
      labels:
        app: yelb-appserver
        tier: middletier
    spec:
      containers:
      - name: yelb-appserver
        image: docker.io/yelb/yelb-appserver:v1
        ports:
        - containerPort: 4567
To deploy the application, run the command: # kubectl create -f yelb.yaml
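Note that the manifest places all objects in the yelb namespace. If that namespace does not already exist in the workload cluster, create it first (this example reuses the kubeconfig exported earlier):
# kubectl create ns yelb --kubeconfig=/root/wld01-kubeconfig.yaml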
Listing the pods in the yelb namespace returns the status of pods that have been created as part of the yelb application deployment.
Yelb Pod List
# kubectl get pods -n yelb --kubeconfig=/root/wld01-kubeconfig.yaml
NAME READY STATUS RESTARTS AGE
redis-server-576b9667ff-52btx 1/1 Running 0 8h
yelb-appserver-7f784ccd64-vtxlx 1/1 Running 0 8h
yelb-db-7cdddcff5-km67v 1/1 Running 0 8h
yelb-ui-f6b557d47-v772q 1/1 Running 0 8h
Verify NSX ALB Objects
Log in to NSX ALB and verify that a virtual service and VIP have been created for the yelb application.
Virtual Service
VIP
Server Pool
Browse to the VIP created for the yelb application and verify that you can see the Yelb dashboard.
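You can also retrieve the VIP assigned to the yelb-ui service directly from the workload cluster; the EXTERNAL-IP column shows the VIP allocated by NSX ALB, and a simple curl against it (from a machine that can reach the VIP network) should return the Yelb UI:
# kubectl get svc yelb-ui -n yelb --kubeconfig=/root/wld01-kubeconfig.yaml
# curl http://<EXTERNAL-IP>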