Managing Your DR Recovery SDDC Deployment Method

Introduction

VMware Cloud DR provides an easy-to-use and cost-efficient DRaaS (DR as a Service) solution for your production data center disaster recovery needs by using VMware Cloud on AWS infrastructure as your disaster recovery site. With this solution, you can protect on-premises vCenter environments, as well as other VMC SDDCs, to a Recovery SDDC in another VMC-supported AWS Region / AZ. There are several things to understand and consider when setting up and managing the Recovery SDDC. This article reviews the key topics.

Deploying or Connecting the Recovery SDDC

The VMware Cloud DR Orchestrator UI provides one method to deploy a new Recovery SDDC or attach an existing SDDC. There is a 1:1 mapping between the Scale-out Cloud File System (SCFS) that holds the protected VM data and the Recovery SDDC. To understand the rules and limitations of AWS Region and AZ dependencies, consult the product documentation.

Within the deployment location constraints, creating a new Recovery SDDC or attaching an existing (Bring Your Own) SDDC instance is a simple task. These options are shown in the screenshot below, where there is no existing SDDC in the same Region/AZ as the SCFS:

On Demand vs Pilot Light

For organizations on a limited DR budget, or just getting started with VMware Cloud on AWS, the Recovery SDDC can be treated as On-Demand: the cloud-based Recovery SDDC is deployed only when needed for testing or to support an active DR event. With the On-Demand approach, the Recovery SDDC is created manually by the DR SDDC administrator. This step can take a couple of hours from task initiation to final configuration and should be factored into the overall RTO planning for On-Demand scenarios. Once testing has been completed, or the disaster has been resolved and services returned to the original Protected Site, decommissioning (deleting) the On-Demand Recovery SDDC is also a manual task for the DR SDDC administrator. An On-Demand SDDC must be manually created, configured, and destroyed each time it is needed.
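As a rough planning aid, the effect of the On-Demand deployment time on RTO can be sketched in a few lines of Python. The durations below are placeholder assumptions for illustration, not measured values, and the class and function names are hypothetical rather than part of any VMware tooling.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimings:
    """Rough phase durations in minutes (all values are planning assumptions)."""
    sddc_deploy: float      # time to create the Recovery SDDC (On-Demand only)
    sddc_configure: float   # firewall rules, folders, networks, tags, resource groups
    plan_execution: float   # time for the DR plan itself to bring workloads up


def estimated_rto_minutes(t: RecoveryTimings, pilot_light: bool) -> float:
    """Return a rough RTO estimate.

    With Pilot-Light the SDDC already exists and is configured, so only the
    plan execution time counts. With On-Demand, deployment and configuration
    are added on top.
    """
    if pilot_light:
        return t.plan_execution
    return t.sddc_deploy + t.sddc_configure + t.plan_execution


# Placeholder numbers: roughly two hours to deploy, 45 minutes to configure.
timings = RecoveryTimings(sddc_deploy=120, sddc_configure=45, plan_execution=60)
print(estimated_rto_minutes(timings, pilot_light=False))  # 225
print(estimated_rto_minutes(timings, pilot_light=True))   # 60
```

Replacing the placeholders with durations observed during your own DR tests gives a more realistic comparison of the two approaches.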

The alternative is a Pilot-Light Recovery SDDC. In this case, the Recovery SDDC is created and configured once and then stays online 24/7. In either On-Demand or Pilot-Light mode, the Recovery SDDC can be a minimal configuration of just 2 hosts. Besides faster recovery times, because the Recovery SDDC already exists and does not have to be deployed, a Pilot-Light configuration lets you set up and test the Recovery SDDC for proper DR operations ahead of time.

One other thing to consider when setting up either an On-Demand or a Pilot-Light Recovery SDDC is that once VMware Cloud Disaster Recovery has connected the Scale-out Cloud File System (SCFS) and mounted it to that Recovery SDDC for VM recovery operations, there is no straightforward way to remove that relationship. The SCFS will remain attached to a Pilot-Light Recovery SDDC, or the attachment will go away when an On-Demand Recovery SDDC is deleted.

Configuring the Recovery SDDC for DR

The Recovery SDDC can be created directly from the Orchestrator UI following the rules for deploying the Recovery SDDC into the appropriate AZ and the appropriate region. Once the Recovery SDDC is deployed it will still need to be configured. Some of these configuration updates will have to be done directly in the VMC console or in the Recovery SDDC itself.

To access the Recovery SDDC, you will need to update the firewall rules so that the administrator can log in to vCenter and access any other peripheral networks that are not directly part of VMware Cloud Disaster Recovery.

Once the Recovery SDDC is deployed you'll need to configure access to that Recovery SDDC on the external networks that you want to use for your recovery site. Setting up those application access networks is outside the scope of this discussion.

Preparing the Recovery SDDC for proper testing or DR use requires a few post-deployment steps. The items that need to be configured in the recovery site for proper DR plan mapping include:

Resource groups: if you are building out clusters or want to partition the recovery site in any way, those resource groups need to be created and configured so that they can be included in the DR plan mappings.
Folder structure and virtual networks of the Recovery SDDC that will be part of the mappings.
vCenter tags: any tags used by Protection Groups in your Protected Site inventory also need to be registered in the Recovery SDDC for proper compliance checks.

Each of these configuration settings needs to be in place in the Recovery SDDC before the plans are compliant and fully ready to run a successful failover operation with VMware Cloud Disaster Recovery. These tasks are performed in the vCenter of the Recovery SDDC.
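The four kinds of mapping targets (resource groups, folders, networks, tags) lend themselves to a simple pre-flight comparison. The following is a minimal sketch, assuming you have already collected the names a DR plan maps to and the names that exist in the Recovery SDDC. The data structures, category keys, and object names are hypothetical, and the code does not call any VMware API or replace the Orchestrator's own compliance checks.

```python
from typing import Dict, Set

# Hypothetical inventory snapshots: category name -> set of object names.
Inventory = Dict[str, Set[str]]

CATEGORIES = ("resource_groups", "folders", "networks", "tags")


def missing_targets(plan_targets: Inventory, recovery_inventory: Inventory) -> Inventory:
    """Return, per category, names the DR plan maps to that the Recovery SDDC lacks."""
    gaps: Inventory = {}
    for category in CATEGORIES:
        wanted = plan_targets.get(category, set())
        present = recovery_inventory.get(category, set())
        missing = wanted - present
        if missing:
            gaps[category] = missing
    return gaps


# Illustrative data only; the names below are made up.
plan_targets = {
    "resource_groups": {"DR-Tier1"},
    "folders": {"Recovered/Finance"},
    "networks": {"dr-app-net", "dr-db-net"},
    "tags": {"protect-gold"},
}
recovery_inventory = {
    "resource_groups": {"DR-Tier1"},
    "folders": set(),
    "networks": {"dr-app-net"},
    "tags": {"protect-gold"},
}

for category, names in missing_targets(plan_targets, recovery_inventory).items():
    print(f"{category}: create {sorted(names)}")
```

An empty result means every mapping target named in the plan has a counterpart in the Recovery SDDC snapshot you supplied.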

For situations where the SDDC is deployed and deleted on a regular basis, a VMware Fling may be useful: SDDC Import/Export for VMware Cloud on AWS.

With the On-Demand approach, any time the Recovery SDDC is deleted, those four things (resources, networks, folders, tags) go away and need to be reconfigured manually in the next On-Demand deployment. Avoiding this manual reconfiguration is another advantage of working with a Pilot-Light Recovery SDDC.

In addition to having the DR plan mappings in place and monitored by the Orchestrator's automated compliance checks every 30 minutes, you can have additional services already up and running in that small Pilot-Light Recovery SDDC footprint. These services might include stretched networks from the Protected Site to the Recovery Site, VPN access points, and even DNS / AD services that you use between your Protected Site and your Recovery Site.

These additional services can be configured up front and kept running while consuming limited resources. The Pilot-Light Recovery SDDC approach also lowers the recovery time (RTO), as the Recovery Site is ready to go into service at any time.

Creating / Checking DR Plans

To construct a DR plan, the Recovery SDDC needs to be present so that the Orchestrator can include all the mappings described above. As you can see, a Pilot-Light Recovery SDDC contributes several things to the overall solution and makes DR administration, setup, and recovery processes easier to manage and more expedient.

Even with an On-Demand approach, you could:

create the Recovery SDDC,
configure it as desired,
construct the DR plans,
run through all the necessary testing,
deactivate the DR plan,
then decommission/delete the SDDC.

If a DR plan is completely configured and then deactivated and the Recovery SDDC deleted, then when the Recovery SDDC is redeployed On-Demand, the DR plan can be reactivated, and the compliance checks can be used to guide what changes need to be made to the Recovery SDDC to match the DR plans.

Scaling the SDDC Up or Down

In either case, On-Demand or Pilot-Light, the Recovery SDDC can be scaled up with more hosts and clusters during testing or an actual DR event, as needed to support the workloads being tested or failed over. Scaling can be semi-automated using Elastic DRS capabilities, or it can be configured manually by the DR SDDC administrator. Besides the additional cost of a larger Recovery SDDC, periodic scaling involves some provisioning time and depends on cloud compute resource availability. In some cases, it is advisable to size the Recovery SDDC to the minimum configuration that supports the most critical DR workloads.

Another consideration when scaling up the Recovery SDDC for testing or for actual DR is what size you want the Recovery SDDC to be after testing is complete or the workload has been failed back to the protected site. A Recovery SDDC can be as small as a 2-host cluster. One option is to leave the initial Recovery SDDC cluster alone and add additional clusters and hosts to that Recovery SDDC. The additional clusters can be scaled up with more hosts, and you can have multiple clusters in a single Recovery SDDC configuration. The scale-back task, which again is a manual operation by the DR SDDC administrator, can then remove the additional clusters, leaving just the original Recovery SDDC.
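To get a first-pass feel for how far a Recovery SDDC would need to scale, a back-of-the-envelope host count can help. The sketch below assumes you know the aggregate vCPU and memory demand of the workloads being failed over and the usable capacity of one host. The per-host figures are hypothetical placeholders, not the specifications of any real instance type, and the calculation ignores storage, failure headroom, and overcommit ratios.

```python
import math

MIN_HOSTS = 2  # smallest Recovery SDDC cluster size mentioned in this article


def hosts_needed(workload_vcpus: int, workload_ram_gb: int,
                 host_vcpus: int, host_ram_gb: int) -> int:
    """Return a host count covering both CPU and memory demand, never below MIN_HOSTS."""
    by_cpu = math.ceil(workload_vcpus / host_vcpus)
    by_ram = math.ceil(workload_ram_gb / host_ram_gb)
    return max(MIN_HOSTS, by_cpu, by_ram)


def hosts_to_add(current_hosts: int, target_hosts: int) -> int:
    """Return how many hosts must be added to reach the target (0 if already large enough)."""
    return max(0, target_hosts - current_hosts)


# Placeholder demand and per-host capacity, for illustration only.
target = hosts_needed(workload_vcpus=300, workload_ram_gb=2000,
                      host_vcpus=72, host_ram_gb=512)
print(target)                   # 5
print(hosts_to_add(2, target))  # 3
```

Running the same calculation for only the most critical workloads gives a sense of the minimum footprint to keep after scaling back.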

Removing the SDDC

Removing the Recovery SDDC is a manual task even for On-Demand instances. When you remove the Recovery SDDC, you will also be removing the connection to the SCFS. Note that active DR plans that have this Recovery SDDC in their configuration will begin to have compliance check alerts as the DR site no longer exists. It is suggested to deactivate such plans until a Recovery SDDC is deployed again.
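Before removing a Recovery SDDC, it can be useful to list the active DR plans that still reference it so they can be deactivated first. The following is a minimal sketch of that check, assuming the plan details have been gathered by hand or exported into a simple list. The `DRPlan` record and its fields are hypothetical and do not correspond to a VMware Cloud Disaster Recovery API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DRPlan:
    """Hypothetical summary of a DR plan for this check."""
    name: str
    recovery_sddc: str
    active: bool


def plans_to_deactivate(plans: List[DRPlan], sddc_name: str) -> List[str]:
    """Return names of active plans that still target the SDDC about to be removed."""
    return [p.name for p in plans if p.active and p.recovery_sddc == sddc_name]


# Illustrative data only.
plans = [
    DRPlan("finance-failover", "recovery-sddc-01", active=True),
    DRPlan("web-tier-failover", "recovery-sddc-01", active=False),
    DRPlan("lab-failover", "recovery-sddc-02", active=True),
]

print(plans_to_deactivate(plans, "recovery-sddc-01"))  # ['finance-failover']
```

Any plan named in the result would start raising compliance alerts once the SDDC is gone, so deactivating it beforehand keeps the alert stream clean.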

Summary

In conclusion, with VMware Cloud Disaster Recovery you can create your Recovery SDDC On-Demand if desired or use an always-on Pilot-Light approach. In either case, the Recovery SDDC is easily provisioned and configured. You can deploy the Recovery SDDC from the Orchestrator UI or through the VMware Cloud Services console, or bring your own from an existing configuration. Once connected to the Scale-out Cloud File System, that Recovery SDDC can service the testing or running of your DR plans.

Whether you follow an On-Demand or Pilot-Light approach, scaling the Recovery SDDC up and down, although it's a manual process, is simple and straightforward with VMware Cloud Disaster Recovery. This DRaaS solution provides a very effective way of leveraging VMware Cloud on AWS for disaster recovery with an easier-to-manage and cost-effective architecture.