Replication and Disaster Recovery Using VMware Cloud Disaster Recovery (VCDR)
VMware Cloud Disaster Recovery (VCDR) protects your vSphere virtual machines by replicating them periodically to a VMware Cloud backup site and recovering them as needed to a VMware Cloud on AWS Recovery Software Defined Data Center ("SDDC").
VCDR supports two deployment modes, On-demand and Pilot Light:
- The on-demand deployment (also known as "just in time" deployment) of a cloud DR site provides an attractive alternative to continuously maintaining a warm standby cloud DR site. With on-demand deployment, the recurring costs of a cloud DR site are eliminated in their entirety until a failover occurs and cloud resources are provisioned.
- With a Pilot Light deployment, VMware Cloud Disaster Recovery enables a smaller subset of SDDC hosts to be deployed ahead of time to recover critical applications with lower RTO requirements than an on-demand approach allows. The Pilot Light deployment mode helps organizations reduce the total cost of cloud infrastructure by keeping a scaled-down but fully functional environment always running in warm standby, while ensuring that core applications are readily available when a disaster occurs.
VCDR supports two types of protected sites:
- On-prem vCenter
- VMware Cloud on AWS SDDC
The following figure demonstrates the communication between VCDR components.
A minimum RTO of 4 hours can be configured. When designing for the On-demand use case, also account for the time needed to create the recovery SDDC.
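To make the On-demand budgeting concrete, the sketch below adds the recovery SDDC creation time to the VM recovery time and compares the sum with a target RTO. It is a minimal illustration, not a VCDR API: the `RtoBudget` class, the `meets_target` function, and all durations are hypothetical inputs you would measure in your own environment.

```python
from dataclasses import dataclass

@dataclass
class RtoBudget:
    """Hypothetical recovery-time inputs, all in minutes (not VCDR values)."""
    sddc_creation_min: float  # time to deploy the recovery SDDC (0 if it already exists)
    vm_recovery_min: float    # time to recover and power on the protected VMs

    def total_min(self) -> float:
        # On-demand: the SDDC must exist before any VM can be recovered,
        # so the two phases run one after the other and their durations add up.
        return self.sddc_creation_min + self.vm_recovery_min

def meets_target(budget: RtoBudget, target_rto_min: float) -> bool:
    """True if the estimated end-to-end recovery time fits within the target RTO."""
    return budget.total_min() <= target_rto_min
```

For example, `meets_target(RtoBudget(sddc_creation_min=120, vm_recovery_min=90), 240)` returns `True`, while doubling the SDDC creation time would push the plan past a 240-minute target. These figures are placeholders, not measured VCDR timings.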
Enabling Replication Using VCDR
The following figure shows the replication and failback workflow in an on-demand SDDC scenario.
The following figure shows the replication and failback workflow in a Pilot Light scenario.
Set Up a Protected Site
VCDR supports two types of protected sites for both On-demand and Pilot Light SDDC deployments: on-prem vSphere and VMware Cloud on AWS SDDC.
Before configuring the on-prem environment, review the network requirements for the DRaaS Connector and the API token configuration described in the Intro to DR document. To set up an on-prem vSphere protected site:
- In the VMware Cloud Disaster Recovery UI, click Sites > Protected sites.
- Click the Set up protected site button in the upper right corner.
- In the Setup protected site dialog box, under Site types select On-prem vSphere.
- Enter a name for the protected site.
- Select a time zone from the drop-down menu, and then click the button on the right to set the time zone for the protected site.
- Click Setup.
Before you set up a protected site for an SDDC, you must deploy an SDDC and have a network segment already configured for it.
When the protected site is a VMware Cloud on AWS SDDC, the time zone used for the snapshot/replication schedule depends on the region where the source SDDC is deployed. The time zone cannot be modified, so adjust the replication schedule accordingly. Note: The SDDC being protected should belong to the same account as the recovery site and can be in the same region as the recovery site or in a different one. However, it must not be in the same Availability Zone (AZ). For example, an SDDC in region A, AZ1 can be registered as a protected site with the recovery site in region A, AZ2 or AZ3, or in a different region.
- In the VMware Cloud Disaster Recovery UI, click Sites > Protected sites.
- Click Set up protected site in the upper right corner.
- In the Setup protected site dialog box, under Site types select VMware Cloud on AWS.
- Under Cloud backup, if more than one cloud backup site is deployed in your environment, select the backup site to use. The backup site you select can be in the same AWS region as your Recovery SDDC, but it must be in a different availability zone (AZ).
- Under Time Zone, you see that the time zone is set to the same time zone as your recovery SDDC. After the protected site is created, you can change this time zone for the site.
- Click Next.
- Select an SDDC to protect. This SDDC cannot be in the same AWS availability zone as your Recovery SDDC, although it can be in the same region.
- Click Next.
- Create firewall rules. You can let the system create the firewall rules for the DRaaS Connector (recommended), or you can create them manually from the VMware Cloud Disaster Recovery UI. If you are not sure which to select, see for more information.
- Click Setup. When the site is set up, it is displayed as a protected site.
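The placement rule from the note at the start of this procedure can be expressed as a small check. This is a hypothetical sketch: `SddcPlacement` and `can_register_as_protected` are illustrative names, not part of VCDR, and the check encodes only the account and Availability Zone conditions stated in this document. Because AZ names are scoped to a region, the sketch compares region and AZ together.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SddcPlacement:
    """Hypothetical description of where an SDDC runs."""
    account: str
    region: str             # e.g. "regionA"
    availability_zone: str  # e.g. "AZ1"; only meaningful within its region

def can_register_as_protected(protected: SddcPlacement, recovery: SddcPlacement) -> bool:
    """Return True if `protected` may be registered against `recovery`.

    Encodes the rule stated in this document: same account, same or different
    region, but never the same Availability Zone as the recovery SDDC.
    """
    if protected.account != recovery.account:
        return False
    same_az = (protected.region == recovery.region
               and protected.availability_zone == recovery.availability_zone)
    return not same_az
```

With the recovery SDDC in region A, AZ2, an SDDC in region A, AZ1 (or in another region) passes the check, while one in region A, AZ2 does not.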
Deploying the DRaaS Connector
Once the sites are configured, the next step is to deploy a connector, which enables the SaaS orchestrator to communicate with the vCenter, whether it is deployed on-prem or as a VMware Cloud SDDC. Refer to the for the VM CPU and network requirements.
The following figure shows the communication between the various components:
Using the VMware Cloud Disaster Recovery UI, you can copy the URL to download the OVA into your environment. The DRaaS Connector has access to the resources of the selected object. For example, a VM has access to the memory and CPU resources of the host on which it resides.
Follow the steps below to begin deploying the connector:
- In the VMware Cloud Disaster Recovery UI, click Sites > Protected sites and then click the protected site on the left side of the application.
- Under Connectors, click Deploy. (If this protected site is an SDDC, the Deploy button is under Clusters.) Click the Copy button to copy the download URL.
- In vSphere, select any inventory object that is a valid parent object of a virtual machine, such as a data center, folder, cluster, resource pool, or host.
- Right-click the object and select Actions → Deploy OVF Template.
- Follow the directions on the wizard for deployment.
- Do not name the DRaaS Connector VM using the same naming conventions you use to name VMs in your vSphere environment. Avoid giving the DRaaS Connector VM a name that might match the VM name pattern you use when you define protection groups.
- If you are deploying the DRaaS Connector to a VMware Cloud SDDC with more than one cluster, you must choose a cluster to deploy the connector VM on. Each cluster in your SDDC must have the connector VM deployed on it in order for the VMs running on the cluster to be added to protection groups and replicated to a cloud backup site.
- Do not use non-ASCII characters for the connector name label.
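The naming and per-cluster rules above lend themselves to a pre-deployment check. The sketch below is hypothetical: `connector_name_problems` and `clusters_missing_connector` are illustrative helpers, and it assumes protection group VM name patterns behave like shell-style wildcards (Python's `fnmatch`), which may not match VCDR's exact pattern syntax.

```python
import fnmatch
from typing import List, Set

def connector_name_problems(name: str, vm_name_patterns: List[str]) -> List[str]:
    """Return the reasons a proposed DRaaS Connector VM name is risky (empty if none)."""
    problems = []
    if not name.isascii():
        problems.append("name contains non-ASCII characters")
    for pattern in vm_name_patterns:
        # Assumption: protection group name patterns behave like shell-style wildcards.
        if fnmatch.fnmatchcase(name, pattern):
            problems.append(f"name matches protection group pattern {pattern!r}")
    return problems

def clusters_missing_connector(clusters: Set[str], clusters_with_connector: Set[str]) -> Set[str]:
    """Clusters whose VMs cannot join protection groups because no connector VM runs there."""
    return clusters - clusters_with_connector
```

For example, `connector_name_problems("web-connector", ["web-*"])` flags a collision with the `web-*` pattern, and `clusters_missing_connector({"Cluster-1", "Cluster-2"}, {"Cluster-1"})` returns `{"Cluster-2"}`.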
Create a Protection Group
A protection group is a set of VMs grouped together so that they can be recovered. You can create multiple protection groups. All VMs in a protection group must reside on the same protected site. Creating a protection group requires an existing subscription.
- Choose the protected site and vCenter from which the VMs will be selected.
- Create vCenter queries that define the protection group's dynamic membership. A vCenter query is defined using vSphere tags, folders, and/or VM name patterns. These vCenter queries are evaluated each time a snapshot is taken and define the contents of the snapshot.
- Define a snapshot schedule that sets how frequently snapshots of the VMs in the group are taken and how long those snapshots are retained on the Scale-out Cloud File System (SCFS).
Important: Research indicates most victims of ransomware don't discover that they have been compromised until an average of 3-6 months after the initial infection, so choose the retention period accordingly.
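The protection group behavior described above, where vCenter queries are re-evaluated at every snapshot and a schedule sets frequency and retention, can be modeled in a few lines. This is a simplified sketch under stated assumptions, not VCDR's implementation: the classes and functions are hypothetical, criteria within one query are assumed to combine with AND and separate queries with OR, and name patterns are treated as shell-style wildcards.

```python
import fnmatch
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional, Set

@dataclass
class Vm:
    """Hypothetical, minimal view of a VM in the protected vCenter inventory."""
    name: str
    folder: str
    tags: Set[str] = field(default_factory=set)

@dataclass
class VcenterQuery:
    """Hypothetical vCenter query; unset criteria are ignored, set ones must all match."""
    name_pattern: Optional[str] = None  # shell-style wildcard, e.g. "web-*" (assumption)
    folder: Optional[str] = None
    tag: Optional[str] = None

    def matches(self, vm: Vm) -> bool:
        if self.name_pattern is not None and not fnmatch.fnmatchcase(vm.name, self.name_pattern):
            return False
        if self.folder is not None and vm.folder != self.folder:
            return False
        if self.tag is not None and self.tag not in vm.tags:
            return False
        return True

def snapshot_members(queries: List[VcenterQuery], inventory: List[Vm]) -> List[str]:
    """Re-evaluate the queries when a snapshot is taken; any matching query includes the VM."""
    return sorted(vm.name for vm in inventory if any(q.matches(vm) for q in queries))

def retained_snapshots(interval: timedelta, retention: timedelta) -> int:
    """Approximate number of snapshots one schedule keeps on the SCFS."""
    return retention // interval

def covers_dwell_time(retention: timedelta, dwell_time: timedelta) -> bool:
    """True if the retention window reaches back at least as far as the dwell time."""
    return retention >= dwell_time
```

With a 4-hour schedule and 30-day retention, `retained_snapshots` gives 180 snapshots. `covers_dwell_time` makes the ransomware guidance explicit by comparing the retention window with an assumed dwell time such as 90 days.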
Create a DR Plan
A DR plan defines the orchestration configuration for disaster recovery and workload mobility.
You can create, name, edit, duplicate, save, and run DR Plans. Environment variables in a plan map differences between the sites for smooth recovery, ensuring that vSphere configurations and parameters are mapped consistently between sites.
Plans run either for recovery as an actual DR operation, or they run as a test recovery, which performs all the plan’s recovery operations in a test site for validation.
VMware Cloud Disaster Recovery can maintain multiple plans of different types, and multiple plans can be in various stages of execution at any given time, even concurrently.
The following operations are allowed under the DR plan section:
- Configuring DR plans - requires defining where you want your protected data moved to when the plan runs.
- Viewing DR plans - shows the currently defined plans along with summary information: the current status, the protected and recovery sites, and the results of the last compliance check.
- Activating DR plans - a plan can be in an activated or deactivated state.
Configuring a DR Plan for On-demand Use Case
Configuring DR plans requires defining where you want to move your protected data when the plan runs.
Specifically, your plan must define where the protected resources such as protection groups, VMs, files, vCenter(s), all vCenter folders, compute resources, virtual networks, and IP addresses (individual or ranges) are moved to on the recovery site.
Follow the steps below in the VCDR UI to configure the plan, including its recovery steps and alerts:
- Create a DR plan and choose the Recovery Site for failover. For this use case, choose Recovery SDDC deployed in case of disaster.
- Choose the Protected site (in case you have more than one, choose the site where the protection group is created).
- Choose the protection group which would be failed over when this DR plan is executed.
- Choose a location to save the custom scripts.
- Add recovery steps. There are different options under the Recovery Steps.
- Choose a step which can be executed for either whole protection groups or an individual VM under the protection group.
- Select the power action for recovered VMs.
- Select pre-recover or post-recover actions from the drop-down menu, such as running the custom scripts saved in the location chosen earlier in this procedure.
- Configure alerts. Ensure you have added recipients for alerts.
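To show how the recovery step options fit together, the sketch below expands a list of steps into an ordered sequence of actions. It is hypothetical: `RecoveryStep` and `expand_steps` are illustrative names, and the strictly sequential ordering (pre-recover scripts, then recovery and power action per VM, then post-recover scripts) is a simplification; the real orchestrator's ordering and parallelism may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class RecoveryStep:
    """Hypothetical model of one recovery step in a DR plan."""
    protection_group: str
    vms: Optional[List[str]] = None  # None means the whole protection group
    power_on: bool = True            # power action for the recovered VMs
    pre_scripts: List[str] = field(default_factory=list)
    post_scripts: List[str] = field(default_factory=list)

def expand_steps(steps: List[RecoveryStep],
                 group_members: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn recovery steps into an ordered list of (action, target) tuples."""
    actions: List[Tuple[str, str]] = []
    for step in steps:
        # Scope: either the listed VMs or every VM in the protection group.
        targets = step.vms if step.vms is not None else group_members[step.protection_group]
        actions += [("run_script", s) for s in step.pre_scripts]
        for vm in targets:
            actions.append(("recover", vm))
            if step.power_on:
                actions.append(("power_on", vm))
        actions += [("run_script", s) for s in step.post_scripts]
    return actions
```

A step with `vms=None` covers the whole protection group, which mirrors the choice between a whole group and individual VMs described above.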
Configuring a DR Plan for Pilot Light
Configuring DR plans requires defining where you want your protected data moved to when the plan runs.
Specifically, your plan must define where protected resources are moved to on the recovery site: protection groups, VMs, files, vCenter(s), all vCenter folders, compute resources, virtual networks, and IP addresses (individual or ranges).
Additionally, you must configure the Test Site operating environment for failover exercises.
- Create a DR plan and choose Recovery Site for Failover. For this configuration, choose Existing Recovery SDDC.
- Choose the Protected site (if there is more than one site, choose the site where the protection group is created).
- Choose the Protection group which would be failed over when this DR plan is executed.
- Map vCenter, folders, compute, networks, and IP addresses between the sites. For each of these mappings, an option flag indicates that the same mapping applies to both test and failover.
- Perform protected and recovery site vCenter mapping.
- Map each vCenter folder in 'Protected Site' containing protected VMs to a folder in 'Recovery SDDC'.
- Map each vCenter compute resource in 'Protected Site' containing protected VMs to a compute resource in 'Recovery SDDC'.
- Map each vCenter virtual network in 'Protected Site' containing protected VMs to a virtual network in 'Recovery SDDC'.
- Map source subnets to subnets in 'Recovery SDDC'.
- The script VM is where the custom scripts specified in the recovery steps are run. Both Windows and Linux are supported; the VM must have VMware Tools installed. When running the plan, you will need to enter the credentials to run the scripts.
- Add recovery steps. There are different configurable options under Recovery Steps.
- Choose the scope of each step, for example a whole protection group or individual VMs in the protection group.
- Select the power action for recovered VMs.
- Select pre-recover or post-recover actions, such as running custom scripts on the script VM.
- Configure alerts. Make sure you add recipients for alerts.
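The subnet mapping step can be illustrated with Python's standard `ipaddress` module. The sketch assumes that a VM keeps its host offset when moved between subnets of the same size (for example, `10.1.0.25` in `10.1.0.0/24` becomes `172.16.5.25` in `172.16.5.0/24`). This behavior is an assumption for illustration, and `map_ip` is a hypothetical helper rather than VCDR's own logic.

```python
import ipaddress
from typing import Dict

def map_ip(ip: str, subnet_map: Dict[str, str]) -> str:
    """Translate an IP from a source subnet to its mapped recovery subnet.

    Assumption: the host part of the address is preserved, so source and
    target subnets must have the same prefix length.
    """
    addr = ipaddress.ip_address(ip)
    for src, dst in subnet_map.items():
        src_net = ipaddress.ip_network(src)
        dst_net = ipaddress.ip_network(dst)
        if addr in src_net:
            if src_net.prefixlen != dst_net.prefixlen:
                raise ValueError(f"prefix lengths differ: {src} -> {dst}")
            # Offset of the address within its source subnet (the host part).
            offset = int(addr) - int(src_net.network_address)
            return str(dst_net.network_address + offset)
    raise KeyError(f"no subnet mapping covers {ip}")
```

Raising an error for unmapped addresses mirrors the requirement that every source subnet containing protected VMs has a mapping before the plan runs.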
Now that you have created a DR plan, take a look at these additional options.
- DR plan status
- Compliance checks status
- Executing a test plan
- Perform a failover
Status of DR plan
Each DR plan protects a single vSphere protected site. The status of a DR plan differs between On-demand and Pilot Light deployments. An On-demand DR plan shows the status 'No recovery site', which is expected. Currently, there is no automated option for running a test recovery in the on-demand use case unless the recovery SDDC is deployed. When the status of a DR plan is Ready, you can choose to keep it in an activated or deactivated state.
When you configure a recovery site for an On-demand DR plan, you need to perform the same mappings as for Pilot Light: vCenter, compute, folders, networks, and IP addresses. You also need to specify the script VM configuration.
The Pilot Light DR plan status is 'Ready' because configuring the recovery site and mapping resources are mandatory when creating this type of plan.
Compliance checks status
Continuous compliance checks verify the integrity of a DR plan and ensure that any changes in the failover environment do not invalidate a DR Plan’s directives when running.
Compliance checks also make sure that the specified protection groups are live on the protected site and are replicating successfully to the target Recovery SDDC. Compliance checks run automatically every 30 minutes for activated plans. A plan can fall out of compliance if any of its conditions are violated because of environmental or plan configuration changes. The following screenshot shows an example.
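The compliance cycle described above can be sketched as a schedule plus a set of conditions that must all hold. Only the 30-minute interval comes from this document; the function names and the example check names are hypothetical, and the sketch simply does not schedule automatic checks for deactivated plans.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple

# Interval taken from the text: activated plans are checked every 30 minutes.
CHECK_INTERVAL = timedelta(minutes=30)

def next_check(last_check: datetime, activated: bool) -> Optional[datetime]:
    """When the next automatic check is due; None for deactivated plans (not scheduled here)."""
    return last_check + CHECK_INTERVAL if activated else None

def run_compliance_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Evaluate every condition; the plan is compliant only if none of them fail."""
    failures = [name for name, check in checks.items() if not check()]
    return (not failures, failures)
```

For example, passing `{"protection groups live on protected site": lambda: True, "replicating to recovery SDDC": lambda: False}` returns `(False, ["replicating to recovery SDDC"])`, which corresponds to an out-of-compliance plan.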
Executing a Test Plan
A test failover runs in the context of its own test failover environment, specified by the DR plan’s test mapping rules. The results of the test failover do not permanently affect a target failover destination.
You can choose which snapshot to use for the test recovery. By default, a test failover stops on the first failure. You can override this and other default behaviors using custom options.
Test failover operations let you either perform a full storage vMotion from the staging datastore to the SDDC datastore, emulating a real failover, or leave the VMs on the staging datastore (preview feature) to shorten the failover time so that you can test and debug your failover faster. This preview feature is currently available only for test failover.
Note that the storage vMotion (svMotion) task for all test-recovered VMs must complete before you can initiate a cleanup. The time this task takes depends on the size of the disks attached to the VMs.
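Because cleanup must wait for the svMotion to finish, a rough duration estimate helps when planning test windows. The sketch below is a back-of-the-envelope model, not a VCDR calculation: it assumes all disks move at a single aggregate throughput, a hypothetical value you would measure in your own environment, and `estimate_svmotion_minutes` and `can_start_cleanup` are illustrative names.

```python
from typing import Iterable

def estimate_svmotion_minutes(disk_sizes_gib: Iterable[float], throughput_mib_per_s: float) -> float:
    """Rough duration of the storage vMotion of all test-recovered VMs.

    Assumption: every disk moves at one aggregate throughput (hypothetical input).
    """
    total_mib = sum(disk_sizes_gib) * 1024
    return total_mib / throughput_mib_per_s / 60

def can_start_cleanup(svmotion_tasks_done: Iterable[bool]) -> bool:
    """Cleanup can start only after every svMotion task has completed."""
    return all(svmotion_tasks_done)
```

For example, 120 GiB of disks at an assumed 512 MiB/s takes about 4 minutes; real svMotion throughput varies with load and environment.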
The following steps are executed when you choose the test storage migration option that leaves the VMs and files in the cloud backup. Note that no relocate virtual machine task is executed.
Perform a failover
The first page of the failover wizard shows a summary of the compliance check, which ensures that compliance information is validated before execution. You also choose which of the snapshots available to the DR plan to use for the failover.
By default, the runtime option is set to Ignore all errors, and the preview shows the steps that will be executed. When the plan begins running, it moves into the failover state. You can observe the running plan's progress on the plan's details page.
You can also observe the storage vMotion from the staging datastore to the SDDC datastore following VM recovery.
While a plan runs, you can perform the following operations:
- Cancel/Cancel and Rollback
- Wait for user input
Note the events on the VM chosen for failover: the VM is relocated from the cloud backup location, mounted over NFS as ds01, to the WorkloadDatastore of the SDDC.
Failback to the Primary Site
After a failover, you can fail back by replicating the data from the VMware Cloud on AWS SDDC to the protected site. A failback from VMware Cloud on AWS includes the following sequence of steps:
- VMs are powered off on the SDDC.
- A final VM snapshot is taken after the power-off. The differences between the VM state at the time of recovery and at failback are then applied to the snapshot used for recovery, constructing a VM backup on the SCFS for subsequent retrieval.
- These VM backups are then retrieved to an on-premises system using a forever-incremental protocol.
- VMs are recovered to a protected vSphere site.
- Upon successful recovery, VMs are automatically deleted from the SDDC.
- When you commit a failover, you are offered the option to create a failback plan, which saves you from creating a DR plan manually.
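The changed-data-only failback can be illustrated at block level: compare the snapshot used for recovery with the final snapshot taken after power-off, ship only the blocks that differ, and rebuild the image from the recovery snapshot plus those blocks. This is a simplified sketch: disks are modeled as lists of blocks, `compute_delta` and `apply_delta` are hypothetical helpers, and compression, deduplication, and VCDR's actual forever-incremental protocol are not modeled.

```python
from typing import Dict, List

Disk = List[bytes]  # hypothetical model: a virtual disk as a list of fixed-size blocks

def compute_delta(recovery_snapshot: Disk, final_snapshot: Disk) -> Dict[int, bytes]:
    """Blocks written in the SDDC between recovery and failback (only these are shipped)."""
    return {i: block for i, block in enumerate(final_snapshot)
            if i >= len(recovery_snapshot) or recovery_snapshot[i] != block}

def apply_delta(recovery_snapshot: Disk, delta: Dict[int, bytes], length: int) -> Disk:
    """Rebuild the failback image from the recovery snapshot plus the changed blocks."""
    image = list(recovery_snapshot[:length])
    image += [b""] * (length - len(image))  # placeholders for blocks beyond the old size
    for i, block in delta.items():
        image[i] = block  # every placeholder is overwritten, since new blocks are in the delta
    return image
```

Only the changed blocks travel back, which is why a failback returns changed data rather than the whole VM.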
You must configure the default datastore for recovered VMs. This is useful when you restore VMs to a different location than their original one. This location can be a datastore in an existing protected site or a new datastore in a new protected site.
Failback from an SDDC returns only changed data. There is no rehydration, and the data remains in its native compressed and deduplicated form. To run the failback, you must activate the plan that was created.
Once the plan is activated, you get the option to fail over from the VMware Cloud on AWS SDDC:
Failover and test failover reports provide information about a completed DR plan operation.
After a failover or test failover plan has completed (and you have committed or acknowledged the plan), you can generate a PDF report of the plan operation by clicking the Reports tab on a plan’s details page.
The generated report contains summaries of the plan configuration at the time of recovery and of the recovery itself. The report also includes details of the recovery mappings, the plan’s recovery steps, and a detailed report on each action taken during the recovery operation, including any errors that occurred.