An Introduction to VMware Cloud Disaster Recovery
VMware Cloud Disaster Recovery is VMware's on-demand disaster recovery service that is delivered as an easy-to-use SaaS solution and offers cloud economics to help keep your disaster recovery costs under control.
VMware Cloud Disaster Recovery protects your virtual machines, on-premises or on VMware Cloud on AWS, by replicating them to the cloud and recovering them to a VMware Cloud on AWS Software Defined Data Center ("SDDC").
VMware Cloud DR is a managed disaster recovery (DR) service, which means you only need to enable protection and configure recovery methods; you do not have to manage the infrastructure that supports them.
The Architecture of VMware Cloud DR consists of many different components.
- The DRaaS Connector is a virtual appliance installed in the VMware vSphere environment where the virtual machines to be protected are running. The DRaaS Connector communicates to the SaaS Orchestrator.
- The SaaS Orchestrator is a cloud component that presents a user interface (UI) to consume the Service Offering and includes several disaster recovery orchestration capabilities to automate the disaster recovery process. The SaaS Orchestrator handles the following tasks:
- Creating or attaching the Recovery SDDC
- Establishing communication from the cloud DRaaS Connector to the SaaS Orchestrator
- Managing and executing backup schedules
- Mounting the NFS volume to the Recovery SDDC
- Initiating the storage vMotion to local Recovery SDDC vSAN storage
- Failing VMs back to the protected site
- Executing compliance checks
- Facilitating exports of activity and reports
- Scale-out Cloud File System (SCFS) is a cloud component that enables the efficient storage of backups of the protected virtual machines in cloud storage and allows virtual machines to be recovered very quickly without a time-consuming data rehydration process.
- VMware Cloud on AWS SDDC is used as a recovery site to perform a DR plan test or an actual failover.
You must keep in mind the following considerations about the different sites used in VMware Cloud DR before you deploy VMware Cloud DR:
- Protected Sites
- Backup Sites
- Recovery Sites
- On-premises vSphere environment
- VMware Cloud on AWS SDDC (one connector per-vCenter)
A backup site is the SCFS cloud backup where the replicated virtual machines are saved. You can choose to deploy additional backup sites to have separate failure domains, add Recovery SDDCs, and increase the total maximum capacity.
Note: When you create a Recovery SDDC, only one backup site can be used to mount the storage to the SDDC. Therefore, if you choose to create a protection group of VMs from different on-premises sites, the VMs must be saved on the same backup site so that they can be recovered.
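The constraint in the note can be pictured as a simple pre-check. This is a minimal sketch for illustration only: `ProtectedVM`, its fields, and `recoverable_together` are hypothetical names, not part of any VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectedVM:
    name: str
    protected_site: str  # hypothetical label for the source site
    backup_site: str     # SCFS cloud backup site holding the VM's snapshots


def recoverable_together(vms: list[ProtectedVM]) -> bool:
    """True if every VM is saved on the same backup site.

    A Recovery SDDC mounts storage from only one backup site, so VMs
    spread across several backup sites cannot be recovered together.
    """
    return len({vm.backup_site for vm in vms}) <= 1
```

With this check, two VMs from different protected sites pass as long as both are saved on the same backup site.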
VMware Cloud DR provides two deployment methods for Recovery SDDC.
- On-demand (also known as "just in time") deployment of a cloud DR site provides an attractive alternative to continuously maintaining a warm standby cloud DR site. It is primarily used when your environment does not need a strict RTO and has a strict budget.
- Pilot light deployment: VMware Cloud Disaster Recovery enables a smaller subset of SDDC hosts to be deployed ahead of time for recovering critical applications with lower RTO requirements than an on-demand approach.
The following table provides a comparison of the various considerations and factors of deployment between On-demand and Pilot Light deployments.
| On-demand | Pilot Light |
| --- | --- |
| Pre-deployed Recovery SDDC not necessary | Minimum 2 node VMC on AWS SDDC is needed |
| VADP (standard) or High-Frequency snapshots, copied to cloud SCFS | VADP (standard) or High-Frequency snapshots, copied to cloud SCFS |
| Possible only by deploying a Recovery SDDC | Yes, supported using the existing Recovery SDDC |
| Approximately 4 hours => Recovery SDDC provisioning time + mounting backup SCFS filesystem + VM customization + VM power on | Nearly instant => VM customization + VM power on |
| <= 30 minutes | <= 30 minutes |
Maximum protected VMDK file size
Subscription and API token
To get started with VMware Cloud Disaster Recovery, you must request the service from VMware and ensure that it is activated. If you are a new customer to VMware Cloud Services, you can request VMware Cloud Disaster Recovery through your VMware sales representative. If you already use VMware Cloud Services, you can request access to VMware Cloud Disaster Recovery from the VMware Cloud Services console by following the procedures listed below.
You can purchase a protected storage capacity subscription for VMware Cloud Disaster Recovery through VMware, with VMware Subscription Purchasing Program (SPP) credits, or from AWS.
To use VMware Cloud Disaster Recovery, you must first create an API token from the VMC console. An API token authorizes service access per organization. The API token is used by VMware Cloud DR to control the customer's VMC on AWS service on behalf of the customer.
Ensure that you create and configure the API token before your users access the VMware Cloud Disaster Recovery UI. The token should have access for the VMware Cloud account that will have the protected and Recovery SDDCs.
You can generate the token under VMware Cloud Services > My Account.
Choose the Organization Owner role and then choose the following:
- Administrator and NSX Cloud Admin VMware Cloud on AWS service roles
- Not Expiring soon
- Scoped within the organization "Customer ORG name"
DRaaS Connector and Compute Requirements
The VMware Cloud Disaster Recovery DRaaS Connector is a stateless software appliance that enables replicating VM snapshot deltas from "protected" vSphere sites (on-premises or VMware Cloud on AWS) to cloud backup sites, and back, driven by policies you set in protection groups.
The DRaaS Connector can be redeployed at any time without losing backup data. Its software upgrades are delivered over the air and applied automatically. Each connector you deploy adds replication bandwidth for the site.
To deploy the DRaaS Connector VM, make sure that the vSphere site where you intend to deploy it has the following available resources for the VM:
- CPU: 8 GHz (reserved)
- RAM: 12 GiB (reserved)
- Disk: 100 GiB vDisk
- Network connectivity
- Between DRaaS Connector and vCenter and ESXi hosts
- Between DRaaS Connector and VMware Cloud Disaster Recovery
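The resource figures above lend themselves to a quick capacity check before deployment. The following is a minimal sketch: the numbers come from the list above, while `SiteCapacity` and `connector_shortfalls` are hypothetical helpers, and how you obtain the free-capacity values is left open.

```python
from dataclasses import dataclass

# Reserved resources listed above for one DRaaS Connector VM.
REQUIRED_CPU_GHZ = 8.0
REQUIRED_RAM_GIB = 12.0
REQUIRED_DISK_GIB = 100.0


@dataclass
class SiteCapacity:
    free_cpu_ghz: float
    free_ram_gib: float
    free_disk_gib: float


def connector_shortfalls(site: SiteCapacity, connectors: int = 1) -> list[str]:
    """Return human-readable shortfalls; an empty list means the site fits."""
    needs = {
        "CPU (GHz)": (site.free_cpu_ghz, REQUIRED_CPU_GHZ * connectors),
        "RAM (GiB)": (site.free_ram_gib, REQUIRED_RAM_GIB * connectors),
        "Disk (GiB)": (site.free_disk_gib, REQUIRED_DISK_GIB * connectors),
    }
    return [
        f"{name}: need {required:g}, have {available:g}"
        for name, (available, required) in needs.items()
        if available < required
    ]
```

The check covers compute and storage only; network connectivity to vCenter, the ESXi hosts, and VMware Cloud Disaster Recovery still has to be verified separately.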
Disaster Recovery Components
A protection group is a set of VMs that are protected together and can then be used during recovery. You can create many protection groups. All VMs in a protection group must exist on the same protected site. To create a protection group, you must have an existing subscription.
A protection group consists of the following components:
- Site selection (on-premises or SDDC vCenter)
- Members (VMs)
- Policies for snapshots (schedule, retention)
- Cloud backup site (SCFS)
Note: The members of a single protection group must share the same vCenter. In other words, you cannot create a protection group that contains VMs from two different vCenters.
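As an illustration of the components and the single-vCenter rule, a protection group could be modeled as below. This is a sketch under stated assumptions: the class and field names (`VM`, `ProtectionGroup`, `snapshot_interval_hours`, `retention_days`) are hypothetical and do not mirror the service's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VM:
    name: str
    vcenter: str  # vCenter the VM is registered with


@dataclass
class ProtectionGroup:
    name: str
    site: str                     # on-premises site or SDDC vCenter
    backup_site: str              # SCFS cloud backup site
    snapshot_interval_hours: int  # snapshot schedule (assumed unit)
    retention_days: int           # snapshot retention (assumed unit)
    members: list[VM] = field(default_factory=list)

    def add(self, vm: VM) -> None:
        """Add a member, rejecting VMs from a second vCenter."""
        if self.members and vm.vcenter != self.members[0].vcenter:
            raise ValueError(
                f"{vm.name} is on {vm.vcenter}, but group {self.name!r} "
                f"already uses {self.members[0].vcenter}"
            )
        self.members.append(vm)
```

Rejecting the VM at insertion time keeps the group valid at every step, which is simpler than validating the whole membership list later.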
Disaster Recovery Plan
A DR plan defines the orchestration configuration for disaster recovery and workload mobility.
You can create, name, edit, duplicate, save, and run DR plans. Environment variables in a plan map the differences between the sites for smooth recovery, ensuring that vSphere configurations and parameters are mapped consistently between sites.
Plans run either for recovery as an actual DR operation, or as a test recovery, which performs all the plan's recovery operations in a test site for validation.
VMware Cloud Disaster Recovery can maintain multiple plans of different types, and the plans can be in various stages of execution at any given time, even concurrently.
Operations allowed under the DR plan section are:
- Configure DR Plans: configuring a plan requires defining where you want your protected data moved to when the plan runs.
- View DR Plans: the view shows the currently defined plans along with plan summary information: the current status, protected and recovery sites, and the last run compliance check results.
- Activate DR Plans: a plan can be in an active or deactivated state.
A DR plan includes a set of recovery steps that capture ordering constraints and action sequencing instructions for DR operations, which occur when you run the plan.
You can run a DR Plan either as a failover or a Test Failover. The running plan creates a ‘workflow instance’ - a runtime representation of the recovery steps in the plan, combined with other information available only when the plan starts running, such as the snapshot selection coupled with the plan's underlying failover operations.
Plan recovery steps apply to the plan itself and control the failover workflow. For example, a planned failover creates a workflow of operations based on the recovery steps defined in the plan. An executing plan’s recovery steps are run on the source site (power off VMs, replicate the last snapshot) and destination site (recover VMs in the predefined order).
An unplanned failover creates a different workflow based on the same recovery steps defined in the plan.
- If the test site is deactivated, the test tab is not displayed.
- If the test site is specified, the Failover mappings and Test mappings tabs can be the same depending on check box selection.
You can run a DR Plan to failback from a VMware Cloud on AWS SDDC to a protected vSphere site. Failback from an SDDC returns only changed data. There is no rehydration, and the data remains in its native compressed and deduplicated form. You can run the failback plan by clicking the Failover from VMC button.
A failback from VMware Cloud on AWS runs several steps, including the following:
- VMs are powered off on the SDDC.
- The last VM snapshot is taken following the power off. The differences between the VM state at the time of recovery and failback are then applied to the snapshot used for recovery to construct a VM backup on the SCFS for subsequent retrieval.
- These VM backups are then retrieved to an on-premises system using a general forever incremental protocol.
- VMs are recovered to a protected vSphere site.
- Upon successful recovery, VMs are automatically deleted from the SDDC.
Once a failback DR plan is created from duplicating the plan and reversing its steps, the new failback plan operates the same way as any other plan. You can edit the plan to change the destination site to point to a new VMware Cloud Disaster Recovery protected site. Or you can change the vCenter mapping if the failback target site has more than one protected site.
You can also use a new protected site and/or vCenter for failback if the proper mappings are configured. In this case, whether the recovery is incremental depends on the instance UUID: if VMware Cloud Disaster Recovery can find a VM with the same instance UUID, an incremental recovery is performed; if it cannot, a full recovery is initiated.
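The instance UUID rule can be sketched as a lookup per VM. The function below is a hypothetical illustration of that decision only; it does not show how the service discovers UUIDs on the target site.

```python
def plan_failback(vm_uuids: dict[str, str], target_uuids: set[str]) -> dict[str, str]:
    """Map each VM name to 'incremental' or 'full'.

    vm_uuids: VM name -> instance UUID of the VM being failed back.
    target_uuids: instance UUIDs found on the failback target site.
    """
    return {
        name: "incremental" if uuid in target_uuids else "full"
        for name, uuid in vm_uuids.items()
    }
```

For example, with `{"web01": "u-1", "db01": "u-2"}` and a target site that only knows `"u-1"`, `web01` is recovered incrementally and `db01` needs a full recovery.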
Once a protection group has taken snapshots of the VMs on your protected site, you can restore individual VMs from a snapshot back to the protected site.
The VM will be restored to the same state it was in when the snapshot was taken, including its vCenter location, configuration, data, etc.
You may need to restore a VM after a failed software upgrade attempt or when something was accidentally deleted or uninstalled from a virtual machine.
Inventory and Resource Mapping
Mapping vCenters in a DR plan consists of selecting source vCenters that are registered to the protected site. Choosing a target vCenter for a Failover SDDC is simple; each SDDC contains a single vCenter instance. For VMware Cloud Disaster Recovery, keep in mind that a protected site can have multiple registered vCenters, but you can only map one vCenter on VMware Cloud on AWS per-DR plan.
Each valid vCenter mapping creates mappings for the following three elements:
- vCenter folders
- Compute resources
- Virtual networks
Resource Pool and Folder Mapping
This page of the plan wizard displays a subset of the vCenter object inventory for both the source and target vCenters. Source vCenter object nodes that are detected to contain protected VMs are required to be mapped and are displayed in the UI with blue text. All other mappings are optional.
- Select the source vCenter node and the corresponding target vCenter node indicating where the source VMs should be recovered.
- Click Add.
- Complete this step for each mapping.
- Click OK when finished.
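The rule that source nodes containing protected VMs must be mapped can be expressed as a small completeness check. This is a sketch with hypothetical names and inventory paths; the UI performs its own detection, and the code only illustrates the logic.

```python
def unmapped_required_nodes(
    source_nodes: dict[str, int], mappings: dict[str, str]
) -> list[str]:
    """List source nodes that still need a mapping.

    source_nodes: source vCenter node path -> number of protected VMs in it.
    mappings: source node path -> target node path chosen so far.
    Nodes without protected VMs are optional and never reported.
    """
    return sorted(
        node
        for node, protected_vms in source_nodes.items()
        if protected_vms > 0 and node not in mappings
    )
```

An empty result means every required node (the ones shown in blue text) has a target, and any remaining unmapped nodes are optional.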
Note: If your VMs on the protected vSphere site have tags associated with them, make sure that the same sets of tags and tag categories also exist on the target site of the plan (the Recovery SDDC).
Tip: Avoid having other VMs in target folders because name conflicts can arise when registering VMs with vCenter.
Mappable vCenter objects:
- Clusters. If the Cluster contains VMs, its icon is highlighted in light blue font to indicate that mapping for this item is required (This color scheme applies to all mappings).
- Resource pool
- Standalone host (not in a cluster). Note that a standalone host can only be mapped to another standalone host.
Note: Regarding vCenter cluster names, "Cluster-1-<clusterIndex>" represents the name of the initial cluster when the SDDC was first created.
If the SDDC that your clusters belong to is deleted, then any plans with mappings to clusters on that SDDC will display the target cluster names with an asterisk. For example: "Cluster-*-<clusterIndex>".
Additionally, plan compliance reports will indicate an error when clusters are mapped to a deleted SDDC, or if there is a mapping to a deleted cluster.
Mappable vCenter objects:
- Virtual network
- Distributed virtual port group
When performing a failover from VMware Cloud on AWS, datastore mappings are established automatically. The VMware Cloud recovery site has a single datastore, making datastore mappings unnecessary.
Note: All VMs that are recovered are located at the root storage folder of the "WorkloadDatastore/" directory after the failover operation.
However, when a DR plan for failback is created (failback from/to on-prem), you must configure datastore mapping.
IP address Mapping
IP mappings determine how a VM’s IP address is assigned when a protected source site is failed over to a target site. When a VM is recovered from one site to another, VMware Cloud Disaster Recovery needs to know which IP addresses will be used for the recovered VMs.
IP address mappings can be configured for VMs running a Linux or Windows guest operating system. VMs configured for IP address mapping are displayed with a target IP, target subnet mask, target gateways, and target DNS servers.
Important: To map IP addresses for Windows VMs, the system drive of the VMs must be mapped to C:\. Additionally, the mapped C:\ drive cannot be a dynamic volume; it must be a basic disk.
Note: VMware Tools must be installed on the guest OS to ensure successful IP address mapping. Only IPv4 is supported for DR plan IP address mapping. This means that any VMs referenced in a DR plan must be using IPv4, or the IP address mapping will not work.
Individual IP Address Mapping
The following options are available on the IP address mapping page:
- Optional rule description
- Source and target IP addresses
- Source and target subnet masks
- Source and target gateways
- Source and target DNS servers
Entries for gateways and DNS servers must be separated by white spaces. If multiple IP addresses are specified, they will be matched in the specified order from source to target.
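The in-order matching of multiple addresses can be sketched as follows. This is an illustration under assumptions: it treats the source and target IP fields as whitespace-separated lists (the text above states this only for gateways and DNS servers), and `build_ip_map` is a hypothetical helper, not a product function.

```python
import ipaddress


def build_ip_map(source_ips: str, target_ips: str) -> dict[str, str]:
    """Pair source and target IPv4 addresses in the order they are listed."""
    sources = source_ips.split()
    targets = target_ips.split()
    if len(sources) != len(targets):
        raise ValueError("source and target lists must have the same length")
    for addr in sources + targets:
        # Raises a ValueError subclass for anything that is not valid IPv4.
        ipaddress.IPv4Address(addr)
    return dict(zip(sources, targets))
```

Because the pairing is purely positional, swapping two entries on one side silently changes which target address each VM receives, so the order of entries deserves a review before the plan runs.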
IP Address Range Mapping
Alternatively, you can configure IP address ranges rather than individual IP addresses. To switch to IP ranges, select Range from Range/IP addresses.
Limitations when mapping IP address ranges:
- You can provide a bits value (CIDR prefix) that defines a smaller IP range than the subnet. For instance, if the subnet is a /20, you can define a longer CIDR prefix (e.g., /21, /22) for the range mapping.
- You cannot, however, do the reverse. If the subnet is a /20, you cannot enter a shorter CIDR prefix that provides a greater IP range (e.g., /19, /18) for the range mapping. If attempted, the UI displays an error.
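The prefix rule can be checked with Python's standard `ipaddress` module. This is a minimal sketch: `validate_range` is a hypothetical helper, and the containment check (the range must lie inside the subnet) is an added assumption that the text above does not state explicitly.

```python
import ipaddress


def validate_range(subnet: str, mapping_range: str) -> ipaddress.IPv4Network:
    """Accept a mapping range only if it is no larger than the subnet."""
    net = ipaddress.IPv4Network(subnet)
    rng = ipaddress.IPv4Network(mapping_range)
    if rng.prefixlen < net.prefixlen:
        # A shorter prefix (e.g. /19 against a /20) covers more addresses.
        raise ValueError(
            f"/{rng.prefixlen} is a larger range than the /{net.prefixlen} subnet"
        )
    if not rng.subnet_of(net):
        # Assumption: the range is expected to fall within the subnet.
        raise ValueError(f"{rng} is not inside {net}")
    return rng
```

For a `10.0.0.0/20` subnet, `10.0.8.0/21` is accepted, while `10.0.0.0/19` is rejected because the shorter prefix describes twice as many addresses as the subnet holds.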
VMware Cloud Disaster Recovery provides report generation in the PDF format for failover and test failover operations, plan configuration changes, and compliance checks.
The generated report contains a summary of the plan configuration, failover mapping details, and the configured failover steps.
Failover and Test Failover Reports
Failover and test failover reports provide information about a completed DR plan operation.
After a failover or test failover plan has completed (and you have committed or acknowledged the plan), you can generate a PDF report of the plan operation. Click the Reports tab on a plan’s details page to create a PDF report.
Continuous compliance checks verify the integrity of a DR plan and ensure that any changes in the failover environment do not invalidate a DR Plan’s directives when running.
Compliance checks also make sure that the specified protection groups are live on the protected site and are replicating successfully to the target Recovery SDDC. Compliance checks run automatically every 30 minutes for activated plans. A plan can fall out of compliance if any of its conditions are violated because of environmental changes (such as a VM being migrated to a different datastore) or plan configuration changes.
You can generate and download these reports as a PDF or have them emailed on an automated schedule.
Author and Contributors
Sharath has been working in the IT industry for over 15 years primarily on SDDC and cloud related technologies. Go to https://vmc.techzone.vmware.com/users/sharath-bn for more information.