Protecting Kubernetes Data with Tanzu Mission Control
Backing up Workloads in TKG with Tanzu Mission Control
Before diving in, it’s worth mentioning that this post is not about protecting an entire Kubernetes cluster from failure. Storing cluster setup configuration as code allows us to redeploy clusters if there is a problem, and doing so is likely much more efficient than trying to back up and restore the etcd store and other components. While a backup of the cluster could be helpful, an automated deployment strategy gives the teams using the cluster confidence that new deployments will always work. This mentality prevents us from using a backup as a crutch instead of ensuring our workloads are always in a production-ready deployment state. However, some things can’t be rebuilt from code, such as Custom Resource Definitions (CRDs) and Persistent Volumes (PVs). The example in this post uses an in-cluster database stored on a persistent volume, which necessitates a workload backup for recovery purposes.
Before we get into the process, let’s review the application. This application consists of a Kubernetes deployment with two replicas serving as a web layer. These web servers store critical data such as login information in another container running MongoDB. The MongoDB container uses a persistent volume where the database is stored. The MongoDB container and its persistent volume must be backed up because they house the information we can’t re-create from code.
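Assuming the application lives in a namespace named “loyalty” (the namespace backed up later in this post), a quick way to review its components is with kubectl. The resource names shown are illustrative:

```shell
# List the web deployment, the MongoDB pod, and the persistent volume claim
# in the application namespace ("loyalty" is the namespace used in this post).
kubectl get deployments,pods,pvc -n loyalty

# Identify which PersistentVolume backs the MongoDB data so you know
# exactly what must be protected.
kubectl get pvc -n loyalty -o wide
```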
Prepare the Backup Location
This example will use an Amazon Simple Storage Service (S3) bucket to store our backups, although any accessible S3-compatible storage will suffice. Amazon S3 provides a low-cost object storage service that we can leverage to house our backup data. Amazon S3 has eleven nines (99.999999999%) of durability, so it’s a very safe place to store our critical backups, and those backups can also be replicated to additional regions if your business requires these precautions. S3 is also convenient for backing up VMware Cloud on AWS resources due to the proximity of the services to each other. In fact, by using an S3 endpoint, the backup traffic can take place without traveling over the Internet.
For this example, I’ve created an Amazon S3 bucket in the connected VPC and applied a bucket policy to allow access to the bucket from my VPC.
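A sketch of that setup with the AWS CLI is shown below. The bucket name, region, and VPC endpoint ID are all assumptions; substitute your own values:

```shell
# Hypothetical bucket name and region -- substitute your own values.
BUCKET=tkg-velero-backups
REGION=us-west-2

# Create the bucket in the region local to the cluster.
aws s3api create-bucket \
  --bucket "$BUCKET" \
  --region "$REGION" \
  --create-bucket-configuration LocationConstraint="$REGION"

# Attach a bucket policy that limits access to a specific VPC endpoint.
# Replace vpce-1234567890abcdef0 with your VPC endpoint ID.
cat > bucket-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowFromVPCEndpointOnly",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::tkg-velero-backups",
        "arn:aws:s3:::tkg-velero-backups/*"
      ],
      "Condition": {
        "StringEquals": { "aws:SourceVpce": "vpce-1234567890abcdef0" }
      }
    }
  ]
}
EOF

aws s3api put-bucket-policy --bucket "$BUCKET" --policy file://bucket-policy.json
```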
NOTE: It’s possible to create lifecycle rules on the S3 bucket, but Velero will automatically clean up stale backups, so this configuration is not necessary.
Configure Backup Access
To run the backups, I’ll be using the data protection functionality in Tanzu Mission Control (TMC). Tanzu Mission Control uses the open-source project Velero for containerized backups. Velero can also be installed through the TKG extensions if you’re interested in running the components directly and not through TMC.
To set up backup access in TMC, we need to add some account credentials. Navigate to the Administration tab and click the “Create Account Credential” button. From there, give the credentials a descriptive name, and enter your Access Key and Secret Key with permissions to the Amazon S3 bucket.
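If you need to create that key pair, the sketch below sets up a dedicated IAM user with a minimal S3 policy, modeled on the permissions the Velero AWS plugin documents. The bucket and user names are assumptions:

```shell
# Minimal S3 permissions for Velero backups (bucket name is hypothetical).
cat > velero-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": ["arn:aws:s3:::tkg-velero-backups/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::tkg-velero-backups"]
    }
  ]
}
EOF

# Create a dedicated IAM user, attach the policy, and generate the
# Access Key / Secret Key pair to paste into the TMC credential form.
aws iam create-user --user-name velero
aws iam put-user-policy --user-name velero \
  --policy-name velero-s3 --policy-document file://velero-s3-policy.json
aws iam create-access-key --user-name velero
```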
Once the credentials have been saved, we can set up a Target Location. The target locations store the configurations for where the backups will be stored, in this case, Amazon S3.
Navigate to the Target locations tab and click the “Create Target Location” button, and then click the “Customer provisioned S3-compatible storage” link to use your own S3 storage.
NOTE: You can also let TMC provision the backup storage for you if you’d rather not supply your own S3-compatible bucket.
Once the wizard opens, you’ll step through the setup. First, begin by selecting the account credentials created in the previous step.
Next, enter the S3 URL that you are attempting to access. These URLs can be region-specific, so use the correct URL and then enter the bucket name and region.
On the “Allow cluster groups” screen, select which cluster groups will have access to this target location. You will likely be managing clusters in different geographic regions, and not all clusters may use the same backup target. This option lets you configure those other locations.
Lastly, name the target location. I’ve called my target after the region for easy identification later.
The last step before you can start using data protection is to enable it on the cluster. The data protection really happens from within the Kubernetes cluster and not TMC. As new backups are requested, the TMC agents notify the backup tools on the cluster to perform the backups. Before this can happen, we must enable the cluster to use data protection.
To do this, click the “Enable Data Protection” link in the cluster menu from TMC.
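If you have kubectl access to the cluster, you can verify the enablement once it completes. The namespace name is an assumption, since TMC typically installs the Velero components into a namespace named velero:

```shell
# Confirm the Velero components that TMC installed are running.
kubectl get pods -n velero

# Check that the backup storage location (the TMC target location)
# was registered and is Available.
kubectl get backupstoragelocations -n velero
```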
Perform a Backup
Once data protection has been enabled on the cluster, you can start creating backups. Navigate to the cluster where the backup should take place, then select the “Data protection” tab.
Click the “Create Backup” button. You can then select what to back up: the entire cluster, a specific namespace, or resources matching a label selector. Here I’ll pick the entire namespace named “loyalty.”
Select the target location from the drop-down.
Select the desired schedule. You can make this a recurring backup or an on-demand backup.
Select a retention time before the backups will be removed from the S3 location.
Give the backup job a name and click “Create.”
You’ll be taken back to the data protection screen, where you can view the status.
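The same status is visible from the cluster with the Velero CLI, which can be handy for troubleshooting. The backup name shown is illustrative:

```shell
# List all backups known to this cluster and their phase (InProgress,
# Completed, Failed, and so on).
velero backup get

# Drill into a specific backup; the name "loyalty-backup" is an example.
velero backup describe loyalty-backup --details
velero backup logs loyalty-backup
```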
Velero backups usually consist of the Kubernetes resource definitions for the pods, which can then be re-applied to a cluster. However, any resources using a persistent volume claim on vSphere will be backed up using restic. You’ll find that backups of resources with persistent volumes will take longer to complete since there is actual data to copy.
NOTE: With the current version of Velero, you must annotate any pods with persistent volumes that you want backed up with restic. If you’re running backups on EBS volumes, the volume snapshotter is used instead and this step is not necessary. The command below can be modified to annotate your pods.
kubectl -n YOUR_POD_NAMESPACE annotate pod/YOUR_POD_NAME backup.velero.io/backup-volumes=YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2…
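As a concrete illustration for the application in this post, the annotation might look like the following. The pod and volume names are assumptions; check yours in the pod spec first:

```shell
# Annotate the MongoDB pod so restic backs up its data volume.
# "mongodb-0" and "mongodb-data" are illustrative names; find the real ones
# under .spec.volumes in the pod definition.
kubectl -n loyalty annotate pod/mongodb-0 \
  backup.velero.io/backup-volumes=mongodb-data

# Confirm the annotation was applied.
kubectl -n loyalty get pod mongodb-0 \
  -o jsonpath='{.metadata.annotations.backup\.velero\.io/backup-volumes}'
```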
Perform a Restore
At this point, you can simulate a failure to your cluster or set up a second cluster to simulate a migration activity. To do the restore, select the cluster that you’d like to restore to in the TMC console. Under the data protection tab, select the backup you’d like to restore and then click the “Restore” link.
As part of the restore, we can select what we want to be restored. I’ve selected the namespaces again, and you’ll notice that there is only one namespace listed. This is because it’s the only namespace that was backed up during the backup section of this post.
Then give the restore job a name before clicking the “Restore” button.
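The equivalent operation from the Velero CLI on the target cluster looks like the sketch below; the backup and restore names are illustrative:

```shell
# Restore only the "loyalty" namespace from an existing backup.
velero restore create loyalty-restore \
  --from-backup loyalty-backup \
  --include-namespaces loyalty

# Watch progress, then verify the pods and persistent volume claims came back.
velero restore describe loyalty-restore
kubectl get pods,pvc -n loyalty
```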
Summary and Additional Resources
This post walked through setting up backups of Tanzu Kubernetes Grid workloads through Tanzu Mission Control and storing the backups in an Amazon S3 bucket. The Tanzu Mission Control portal uses project Velero to back up the Kubernetes resources and store them in Amazon S3 until they are needed or age out.
These backups are useful for stateful data that can’t be redeployed and can also be useful for migration activities when moving between Kubernetes clusters.
About the Author and Contributors
Eric Shanks has spent two decades working with VMware and cloud technologies, focusing on hybrid cloud and automation. Eric has obtained some of the industry’s highest distinctions, including two VMware Certified Design Expert certifications (VCDX #195) and many others across a variety of solutions including Microsoft, Cisco, and Amazon Web Services.
Eric has acted as a community contributor through his work as a Chicago VMUG Users Group leader, blogger at theITHollow.com, and Tech Field Day delegate.