Replication and Disaster Recovery Using VMware Cloud Disaster Recovery (VCDR)

Introduction

VMware Cloud Disaster Recovery (VCDR) protects your vSphere virtual machines by replicating them periodically to a VMware Cloud backup site and recovering them as needed to a VMware Cloud on AWS Recovery Software Defined Data Center ("SDDC").

VCDR supports two deployment modes, On-demand and Pilot Light:

  • The on-demand deployment (also known as "just in time" deployment) of a cloud DR site provides an attractive alternative to continuously maintaining a warm standby cloud DR site. With on-demand deployment, the recurring costs of a cloud DR site are eliminated in their entirety until a failover occurs and cloud resources are provisioned.
  • With a Pilot Light deployment, VMware Cloud Disaster Recovery enables a smaller subset of SDDC hosts to be deployed ahead of time for recovering critical applications that need a lower RTO than an on-demand approach can provide. The Pilot Light deployment mode helps organizations reduce the total cost of cloud infrastructure by keeping a scaled-down version of a fully functional environment running in warm standby, while ensuring that core applications are readily available when a disaster occurs.

VCDR supports two types of sites that can be configured as protected sites:

  • On-prem vCenter
  • VMware Cloud on AWS SDDC

The following figure demonstrates the communication between VCDR components.

 

 

Use Cases

  • Protecting virtual machines from one or more sites
  • Using the cloud backup space to store replicated VMs from different protected sites
  • Configuring replication for VMs that can tolerate a higher RTO than critical workloads
  • Recovering a single VM from snapshots to a recovery site or to the protected site
  • Ransomware recovery

Pre-requisites

  • User account API token
  • On-prem or VMC on AWS SDDC
  • Subscription for VCDR
  • Recovery SDDC (3 node minimum) deployed in case of Pilot Light.
  • When you deploy your recovery SDDC, it must be connected to an AWS account belonging to you, called the 'customer AWS account'.

General Considerations/Recommendations

  • Because VCDR also uses snapshot technology, make sure that backup and replication tasks do not run at the same time.
  • Deploy at least two connectors for resilience and load balancing.
  • Add one more connector for every 500 VMs configured for replication.
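The connector sizing guidance can be turned into a quick calculation. The following is a minimal sketch in Python, assuming one reading of the recommendation: one connector per 500 replicated VMs, never fewer than two. The function name and parameters are illustrative and are not part of any VCDR tooling.

```python
import math


def recommended_connectors(replicated_vms: int,
                           vms_per_connector: int = 500,
                           minimum: int = 2) -> int:
    """Estimate how many DRaaS Connectors to deploy for a protected site.

    Assumption: one connector per 500 replicated VMs, with a floor of two
    connectors for resilience and load balancing.
    """
    if replicated_vms < 0:
        raise ValueError("replicated_vms must be non-negative")
    return max(minimum, math.ceil(replicated_vms / vms_per_connector))


for vm_count in (120, 500, 1200):
    print(vm_count, "VMs ->", recommended_connectors(vm_count), "connectors")
```

Treat the result as a starting point for planning and confirm the final number against current VCDR sizing guidance.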

Cost Implications

  • The charge combines a storage component and a virtual machine component. For detailed information, see the VCDR pricing information.
  • A per-TiB charge based on the protected storage capacity
  • A per-VM charge based on the number of protected VMs (total price = $/TiB + $/VM)
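The pricing formula above can be sketched as a small helper. This is an illustration only: the function name is hypothetical and the rates in the example are placeholders, not actual VCDR prices.

```python
def estimate_vcdr_charge(protected_tib: float, protected_vms: int,
                         rate_per_tib: float, rate_per_vm: float) -> float:
    """Return the combined charge: storage component plus per-VM component."""
    if protected_tib < 0 or protected_vms < 0:
        raise ValueError("capacity and VM count must be non-negative")
    storage_charge = protected_tib * rate_per_tib
    vm_charge = protected_vms * rate_per_vm
    return storage_charge + vm_charge


# Placeholder rates only; take real rates from the VCDR pricing information.
example = estimate_vcdr_charge(protected_tib=10, protected_vms=100,
                               rate_per_tib=2.0, rate_per_vm=1.0)
print(example)
```

Keeping the two components separate makes it easy to see whether storage capacity or VM count dominates the estimate for a given environment.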

Performance Considerations

Although an RTO as low as 4 hours can be configured, also account for the Recovery SDDC creation time when designing for the On-demand use case.

Documentation Reference

Last Updated

June 2021

 

Enabling Replication Using VCDR

The following figure shows the replication and failback workflow in an on-demand SDDC Scenario.

The following figure shows the replication and failback workflow in a Pilot Light Scenario.

Set Up a Protected Site

VCDR supports two types of protected sites for both On-Demand and Pilot Light SDDC deployment – on-prem vSphere and VMware Cloud on AWS SDDC.

On-Prem vSphere 

Before configuring the on-prem environment, review the network requirements for the DRaaS Connector and the API token configuration described in the Intro to DR document. Procedure:

  1. In the VMware Cloud Disaster Recovery UI, click Sites > Protected sites.
  2. Click the Set up protected site button in the upper right corner.
  3. In the Setup protected site dialog box, under Site types select On-prem vSphere.
  4. Enter a name for the protected site.
  5. Select a time zone from the drop-down menu, and then click the button on the right to set the time zone for the protected site.
  6. Click Setup.

 

VMware Cloud on AWS SDDC 

Before you set up a protected site for an SDDC, you must deploy an SDDC and have a network segment already configured for it.

When the protected site is a VMware Cloud on AWS SDDC, the time zone used for the snapshot and replication schedule depends on the region of the source SDDC deployment. The time zone cannot be modified, so you must adjust the replication schedule accordingly. Note: The SDDC being protected should belong to the same account.

  1. In the VMware Cloud Disaster Recovery UI, click Sites > Protected sites.
  2. Click Set up protected site in the upper right corner.
  3. In the Setup protected site dialog box, under Site types select VMware Cloud on AWS.
  4. Under Cloud backup, if there is more than one Cloud backup site deployed in your environment, you can select the backup site. The backup site you select cannot be in the same AWS region where your Recovery SDDC is running.
  5. Under Time Zone, you see that the time zone is set to the same time zone as your recovery SDDC. After the protected site is created, you can change this time zone for the site.
  6. Click Next.
  7. Select an SDDC to protect. This SDDC cannot be in the same AWS region where your Recovery SDDC is deployed.
  8. Click Next.

  9. Create firewall rules. You have a choice when creating the firewall rules. You can allow the system to create firewall rules for the DRaaS Connector (recommended). Or you can manually create those firewall rules from the VMware Cloud Disaster Recovery UI. If you are not sure which to select, see Network Considerations for a Protected SDDC for more information.
  10. Click Setup. When the site is set up, it is displayed as a protected site.
     

Deploying the DRaaS Connector

Once the sites are configured, the next step is to deploy a connector, which enables the SaaS orchestrator to communicate with the vCenter that is deployed either on-prem or in a VMware Cloud SDDC. Refer to the Introduction doc for the VM CPU and network requirements.

The figure below shows the communication between the various components:


Using the VMware Cloud Disaster Recovery UI, you can copy the URL to download the OVA into your environment. The DRaaS Connector has access to the resources of the selected object. For example, a VM has access to the memory and CPU resources of the host on which it resides.

Follow the steps below to begin deploying the connector:

  1. In the VMware Cloud Disaster Recovery UI, click Sites > Protected sites and then click the protected site on the left side of the application.
  2. Under Connectors, click Deploy. (If this protected site is an SDDC, the Deploy button is under Clusters.) Click the Copy button to copy the download URL.
     
  3. In vSphere, select any inventory object that is a valid parent object of a virtual machine, such as a data center, folder, cluster, resource pool, or host.
  4. Right-click the object and select Actions > Deploy OVF Template.
  5. Follow the directions in the wizard to complete the deployment.

Note:

  • Do not name the DRaaS Connector VM using the same naming conventions you use to name VMs in your vSphere environment. Avoid giving the DRaaS Connector VM a name that might match the VM name pattern you use when you define protection groups.
  • If you are deploying the DRaaS Connector to a VMware Cloud SDDC with more than one cluster, you must choose a cluster to deploy the connector VM on. Each cluster in your SDDC must have the connector VM deployed on it in order for the VMs running on the cluster to be added to protection groups and replicated to a cloud backup site.
  • Do not use non-ASCII characters for the connector name label.

Create a Protection Group

A protection group is a set of VMs that you group together and can later use for recovery. You can create multiple protection groups. All VMs in a protection group must reside on the same protected site. Creating a protection group requires an existing subscription.

  1. Choose the protected site and vCenter from which the VMs will be selected.
  2. Create vCenter queries that define the protection group's dynamic membership. A vCenter query is defined using vSphere tags, folders, and/or VM name patterns. These vCenter queries are evaluated each time a snapshot is taken and define the contents of the snapshot.
  3. Define how frequently you want to take snapshots of the VMs in the group by using a snapshot schedule. You also define how long you want to retain those snapshots on the Scale-out Cloud File System (SCFS).
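To illustrate how dynamic membership works, the following Python sketch evaluates a query against a small VM inventory. It is a conceptual model, not the VCDR implementation: the VM and Query classes are hypothetical, and the assumption that a VM joins the group when it matches any one criterion (name pattern, folder, or tag) is made for the example only.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class VM:
    name: str
    folder: str
    tags: frozenset = frozenset()


@dataclass
class Query:
    name_patterns: list = field(default_factory=list)
    folders: list = field(default_factory=list)
    tags: list = field(default_factory=list)

    def matches(self, vm: VM) -> bool:
        # Assumption: matching any single criterion is enough.
        return (any(fnmatchcase(vm.name, p) for p in self.name_patterns)
                or vm.folder in self.folders
                or any(t in vm.tags for t in self.tags))


def evaluate_membership(inventory, queries):
    """Re-evaluate the queries, as would happen at each snapshot."""
    return [vm.name for vm in inventory if any(q.matches(vm) for q in queries)]


inventory = [
    VM("web-01", "Prod"),
    VM("db-01", "Prod", frozenset({"dr-protected"})),
    VM("drconnector-a", "Infra"),
]
queries = [Query(name_patterns=["web-*"], tags=["dr-protected"])]
print(evaluate_membership(inventory, queries))
```

Because the queries are re-evaluated at every snapshot, a VM that is renamed, moved, or retagged can enter or leave the group without anyone editing the group. This is also why a connector VM whose name matches a protection group's pattern would be picked up unintentionally.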

 

 

Important: Research indicates most victims of ransomware don't discover that they have been compromised until an average of 3-6 months after the initial infection, so choose the retention period accordingly. 
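The effect of the retention period on ransomware recovery can be shown with a short Python sketch. The function names and the monthly snapshot dates are hypothetical; the point is only that a usable recovery point must both predate the infection and still be within retention.

```python
from datetime import datetime, timedelta


def retained_snapshots(snapshot_times, now, retention):
    """Return the snapshots that are still within the retention period."""
    cutoff = now - retention
    return sorted(t for t in snapshot_times if t >= cutoff)


def latest_clean_snapshot(snapshot_times, now, retention, infected_at):
    """Return the newest retained snapshot taken before the infection, if any."""
    clean = [t for t in retained_snapshots(snapshot_times, now, retention)
             if t < infected_at]
    return clean[-1] if clean else None


# Hypothetical data: one snapshot on the first day of each month.
snapshots = [datetime(2021, month, 1) for month in range(1, 7)]
now = datetime(2021, 6, 30)
infected_at = datetime(2021, 3, 15)  # discovered months later

print(latest_clean_snapshot(snapshots, now, timedelta(days=30), infected_at))
print(latest_clean_snapshot(snapshots, now, timedelta(days=180), infected_at))
```

With 30 days of retention, every snapshot that predates the infection has already expired, so no clean recovery point exists. With 180 days of retention, the snapshot from 1 March is still available.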

Create a DR Plan

A DR plan defines the orchestration configuration for disaster recovery and workload mobility.

You can create, name, edit, duplicate, save, and run DR Plans. Environment variables in a plan map differences between the sites for smooth recovery, ensuring that vSphere configurations and parameters are mapped consistently between sites.

Plans run either for recovery as an actual DR operation, or they run as a test recovery, which performs all the plan’s recovery operations in a test site for validation.

VMware Cloud Disaster Recovery can maintain multiple plans of different types, and multiple plans can be in various stages of execution at any given time, even concurrently.

The following operations are allowed under the DR plan section:

  • Configuring DR plans: requires defining where you want your protected data moved to when the plan runs.
  • Viewing DR plans: shows the currently defined plans along with plan summary information, including the current status, the protected and recovery sites, and the results of the last compliance check.
  • Activating DR plans: a plan can be in an active or a deactivated state.

Configuring a DR Plan for On-demand Use Case

Configuring a DR plan requires defining where you want to move your protected data when the plan runs.

Specifically, your plan must define where the protected resources such as protection groups, VMs, files, vCenter(s), all vCenter folders, compute resources, virtual networks, and IP addresses (individual or ranges) are moved to on the recovery site.

Follow the steps below in the VCDR UI to configure the DR plan, including its recovery steps and alerts:

  1. Create a DR plan and choose the recovery site for failover. For this use case, choose Recovery SDDC deployed in case of disaster.
  2. Choose the protected site (if you have more than one, choose the site where the protection group was created).
  3. Choose the protection group to fail over when this DR plan is executed.
  4. Choose a location to save the custom scripts.
  5. Add recovery steps. The following options are available under Recovery Steps:
    1. Choose a step that is executed either for a whole protection group or for an individual VM in the protection group.
    2. Select the power action for recovered VMs.
    3. Select pre-recover or post-recover actions from the drop-down menu, such as running the scripts saved in step 4.
  6. Configure alerts. Ensure that you have added recipients for the alerts.

 

Configuring a DR Plan for Pilot Light

Configuring a DR plan requires defining where you want your protected data moved to when the plan runs.

Specifically, your plan must define where protected resources are moved to on the recovery site: protection groups, VMs, files, vCenter(s), all vCenter folders, compute resources, virtual networks, and IP addresses (individual or ranges).

Additionally, you must configure the Test Site operating environment for failover exercises.

  1. Create a DR plan and choose the recovery site for failover. For this configuration, choose Existing Recovery SDDC.
  2. Choose the protected site (if there is more than one site, choose the site where the protection group was created).
  3. Choose the protection group to fail over when this DR plan is executed.
  4. Map vCenter, folders, compute resources, networks, and IP addresses between the sites. Each of these mappings provides an option to use the same mappings for test and failover.
    1. Map the protected site vCenter to the recovery site vCenter.
    2. Map each vCenter folder in 'Protected Site' containing protected VMs to a folder in 'Recovery SDDC'.
    3. Map each vCenter compute resource in 'Protected Site' containing protected VMs to a compute resource in 'Recovery SDDC'.
    4. Map each vCenter virtual network in 'Protected Site' containing protected VMs to a virtual network in 'Recovery SDDC'.
    5. Map source subnets to subnets in the Recovery SDDC.
  5. Specify the script VM, which is where the custom scripts specified in the recovery steps are run. Both Windows and Linux are supported; the VM must have VMware Tools installed. When running the plan, you will need to enter the credentials to run the scripts.
  6. Add recovery steps. Several configurable options are available under Recovery Steps:
    1. Choose the scope of each step, for example a whole protection group or individual VMs in the protection group.
    2. Select the power action for recovered VMs.
    3. Select pre-recover or post-recover actions, such as running custom scripts on the script VM from step 5.
  7. Configure alerts. Ensure that you add recipients for the alerts.
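The subnet mapping in the steps above can be pictured with a short Python sketch that translates a protected-site address into the corresponding Recovery SDDC subnet. Preserving the host portion of the address is an assumption made for this example, and the map_ip function and the subnets are hypothetical; check the VCDR documentation for the exact behavior of IP address mapping rules.

```python
import ipaddress


def map_ip(address: str, source_subnet: str, recovery_subnet: str) -> str:
    """Translate an address from the source subnet into the recovery subnet.

    Assumption: the host offset within the subnet is preserved, so both
    subnets must have the same prefix length.
    """
    src = ipaddress.ip_network(source_subnet)
    dst = ipaddress.ip_network(recovery_subnet)
    ip = ipaddress.ip_address(address)
    if ip not in src:
        raise ValueError(f"{address} is not in source subnet {source_subnet}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and recovery subnets must be the same size")
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


print(map_ip("192.168.10.25", "192.168.10.0/24", "10.20.30.0/24"))
```

Keeping the host offset stable makes recovered VMs easier to recognize and keeps any host-specific firewall or DNS entries on the recovery side predictable.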

 

DR Execution

Now that you have created a DR plan, take a look at these additional options.

  • DR plan status
  • Compliance checks status
  • Executing a test plan
  • Perform a failover
  • Reports

Status of DR plan

Each DR plan protects a single vSphere protected site. The status of a DR plan differs between On-demand and Pilot Light deployments. A plan for an On-demand deployment shows the status 'No recovery site', which is expected. Currently, a test recovery cannot be run for an on-demand use case until the Recovery SDDC is deployed. When the status of a DR plan is Ready, you can also choose whether the plan is activated or deactivated.

When configuring a recovery site for an On-demand DR plan, you need to perform mappings similar to those for Pilot Light, which include vCenter, compute, folder, network, and IP address mappings. You also need to specify the script VM configuration.

The status of a Pilot Light DR plan is 'Ready' because the recovery site configuration and the resource mappings are mandatory when creating one.

A DR plan can also be in additional states. For more details about these states, see the VCDR documentation.

Compliance checks status

Continuous compliance checks verify the integrity of a DR plan and ensure that any changes in the failover environment do not invalidate a DR Plan’s directives when running.

Compliance checks also make sure that the specified protection groups are live on the protected site and are replicating successfully to the target Recovery SDDC. Compliance checks run automatically every 30 minutes for activated plans. A plan can fall out of compliance if any of its conditions is violated because of environmental or plan configuration changes. The screenshot below shows an example.
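As an illustration of the kind of condition a compliance check looks for, the following Python sketch reports protected resources that have no mapping on the recovery side. It is a simplified model and not the VCDR implementation; the function name and the data layout are assumptions made for the example.

```python
def find_unmapped_resources(protected, mappings):
    """Return protected resources, per kind, that have no recovery-site mapping.

    protected: dict of resource kind -> set of names in use on the protected site
    mappings:  dict of resource kind -> dict of protected name -> recovery name
    """
    gaps = {}
    for kind, names in protected.items():
        mapped = set(mappings.get(kind, {}))
        missing = set(names) - mapped
        if missing:
            gaps[kind] = sorted(missing)
    return gaps


# Hypothetical environment: the "Dev" folder was added after the plan was created.
protected = {"folder": {"Prod", "Dev"}, "network": {"VLAN10"}}
mappings = {
    "folder": {"Prod": "Recovered-Prod"},
    "network": {"VLAN10": "sddc-seg-10"},
}
print(find_unmapped_resources(protected, mappings))
```

A change such as a new folder or network on the protected site leaves the plan with an unmapped resource, which is the kind of drift that causes a plan to fall out of compliance.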

 

Executing a Test Plan

A test failover runs in the context of its own test failover environment, specified by the DR plan’s test mapping rules. The results of the test failover do not permanently affect a target failover destination.

You can choose the snapshot state to be used to run the test recovery. A test failover stops on the first failure by default. You can override all other default behavior using custom options.

Test failover operations give you the option of performing a full Storage vMotion from the staging datastore to the SDDC datastore to emulate a real failover, or of leaving VMs on the staging datastore (preview feature) to cut down on the failover time and let you test and debug your failover faster. This preview feature is currently available in Test Failover only.

Note that the Storage vMotion task for all test-recovered VMs must complete during a test failover before a cleanup can be initiated. The time taken by this Storage vMotion task depends on the size of the disks attached to the VMs.

The following steps are executed when you choose the test storage migration option that leaves the VMs and files in the cloud backup. Note that no relocate virtual machine task is executed.

 

Perform a failover

The first page of the wizard when performing a failover is a summary of the compliance check. This ensures that compliance information is validated before execution. You also choose which of the snapshots available to the DR plan is used during failover.

The runtime setting defaults to Ignore all errors, and the preview helps you understand the steps that will be executed. When the plan begins running, it moves into the failover state. You can observe the running plan’s progress from the plan’s detail page.

You can also observe the Storage vMotion from the staging datastore to the SDDC datastore following VM recovery.

While a plan runs, you can perform the following operations:

  • Cancel/Cancel and Rollback
  • Wait for user input
  • Terminate

Note the events on the VM which is chosen for failover. The VM is relocated to the WorkloadDatastore of the SDDC from the Cloud backup location mounted as ds01 over NFS.

Failback to the Primary Site

When a failback is executed, the data is replicated from the VMware Cloud on AWS SDDC back to the protected site. A failback from VMware Cloud on AWS includes the following sequence of steps:

  1. VMs are powered off on the SDDC.
  2. A final VM snapshot is taken following the power off. The differences between the VM state at the time of recovery and at the time of failback are then applied to the snapshot used for recovery to construct a VM backup on the SCFS for subsequent retrieval.
  3. These VM backups are then retrieved to an on-premises system using a general forever incremental protocol.
  4. VMs are recovered to a protected vSphere site.
  5. Upon successful recovery, VMs are automatically deleted from the SDDC.
  6. When you commit a failover, you are given the option to create a failback plan, which reduces the overhead of creating a DR plan manually.

You must configure the default datastore for the recovered VMs. This is useful when you choose to restore VMs to a different location than the original VM. This location can be a datastore in an existing protected site or a new datastore in a new protected site.

Failback from an SDDC returns only changed data. There is no rehydration, and the data remains in its native compressed and deduplicated form. To run the failback, you must activate the failback plan that was created.

Once the plan is activated, you get an option to fail over from the VMware Cloud on AWS SDDC:

 

Reports

Failover and test failover reports provide information about a completed DR plan operation.

After a failover or test failover plan has completed (and you have committed or acknowledged the plan), you can generate a PDF report of the plan operation by clicking the Reports tab on a plan’s details page.

The generated report contains summaries of the plan configuration at the time of recovery, and a summary of the recovery. The report also includes details for the recovery mappings, the plan’s recovery steps, and a detailed report on each action taken during the recovery operation, and any errors that occurred.

Author and Contributors

Sharath has been working in the IT industry for over 15 years primarily on SDDC and cloud related technologies. For more information, see https://vmc.techzone.vmware.com/users/sharath-bn.

 
