July 18, 2023

Leverage the Cloud File System for DR

Run VMs on Cloud File System

The latest release of VMware Cloud DR makes a new capability generally available that can improve disaster recovery failover and failback: you can now run VMs directly on the Cloud File System, and as the description suggests, it’s just that simple.

What is the Cloud File System?

The Cloud File System performs two functions for VMware Cloud DR. The first is to effectively act as the “Cloud Backup” where data is replicated and stored in an immutable air-gapped file system. The second is to provide a high performance, high capacity, low latency NFS datastore directly to your Recovery SDDC vCenter, from which copies of any VM needed for Disaster Recovery or Ransomware Recovery workloads can boot immediately without waiting for restores, hydrations, or data migration and recovery.

Prior to this release, running a workload on the Cloud File System was a capability only available during Recovery plan testing. Allowing VMs to boot and run on the Cloud File System helped speed up cycle times when performing frequent testing of Recovery plans as the VMs did not need to storage vMotion to the vSAN datastore after powering on and returning to service. The VMs running on the Cloud File System that is mounted directly to the VMC Recovery SDDC start up quickly and run efficiently without the extra overhead of moving them just for testing.

Now, when running a Recovery plan in full disaster failover mode, you can choose to leave the VM workloads running on the Cloud File System datastore and commit the plan without waiting for a storage vMotion to take place, thus allowing the workloads to run in the disaster recovery site utilizing the full potential of the Cloud File System (NFS) datastore that is fundamental to the VMware Cloud DR architecture.

This is a new runtime option available to all Recovery plans and requires no special configuration or setup. When it comes time to run the plan – either for testing or actual failover – the operator is presented with the choice of where to run the failed over workloads. The choice will be remembered for this Recovery plan as the default for future runs. This selection is shown in the figure below:

[Figure: Recovery plan option for choosing where the failed-over workloads run]

There are several advantages and a few considerations that go along with this new functionality.

Advantages

Flexibility of operations for the SDDC storage during a DR event

First, failover plans can now complete more quickly and reach the final Commit phase sooner. The background storage migration previously required after VM restart during a failover can be bypassed by leaving the VMs running on the Cloud File System datastore, where they are initially brought into inventory and powered on. Even though that migration was performed in the background as the last step of the failover automation, it had to complete before the plan could be finalized, which, depending on aggregate VM size, could take some time.

Simpler and faster failover and failback operations

In addition to simpler failover processing, failback from the Recovery site SDDC to the Production site after the disaster has been resolved is also simplified and potentially faster. The delta changes incurred by the VMs while running in DR mode in the Recovery SDDC do not have to be extracted from the vSAN datastore and passed through the Cloud File System on the path back to the Production site. The result is a simpler, more automated failback process with shorter failback times.

Leverage cloud scalability and elasticity

Leaving the VM workloads running on the Cloud File System has another potential advantage for recovery site design. It reduces the total SDDC vSAN storage required for the failover configuration, further leveraging the cloud economics, elasticity, and scalability of this DR solution. This reduction in the vSAN storage needed during disaster operations may also reduce the number of SDDC hosts required in the Recovery SDDC. In addition to financial savings during recovery, this could also make failover less complex, with potentially fewer hosts and fewer clusters required for mapping and recovery purposes.

When running workloads in the Recovery SDDC, it is essential to understand how many datacenter resources are needed in the SDDC to run those VMs optimally. Compute resources include CPU and memory, which are dictated by the SDDC host type used. Storage (disk) capacity now comes from two sources: the vSAN capacity provided by each host in the SDDC cluster, and the NFS datastore capacity available in the Cloud File System.

It’s important to note that the vSAN datastore is more performance oriented using local solid-state disks from each of the configured SDDC hosts and the Cloud File System (NFS) datastore is more capacity oriented and can support up to approximately 400 TiB of VM capacity on a single instance.
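As a rough planning aid, the capacity split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Vm` record, the `needs_vsan` flag, and the sample inventory are hypothetical and not part of VMware Cloud DR; the 400 TiB value is the approximate per-instance figure quoted above.

```python
from dataclasses import dataclass

# Approximate per-instance figure quoted in this article; a planning number, not a hard guarantee.
CFS_CAPACITY_TIB = 400.0


@dataclass
class Vm:
    name: str
    size_tib: float
    needs_vsan: bool  # hypothetical flag: True if the workload must run on vSAN


def plan_storage(vms, cfs_limit_tib=CFS_CAPACITY_TIB):
    """Split total VM capacity between vSAN and the Cloud File System datastore."""
    vsan_tib = sum(vm.size_tib for vm in vms if vm.needs_vsan)
    cfs_tib = sum(vm.size_tib for vm in vms if not vm.needs_vsan)
    return {
        "vsan_tib": vsan_tib,
        "cfs_tib": cfs_tib,
        "cfs_within_limit": cfs_tib <= cfs_limit_tib,
    }


# Illustrative inventory only; sizes are made up for the example.
inventory = [
    Vm("db-01", 4.0, True),
    Vm("web-01", 0.5, False),
    Vm("file-01", 20.0, False),
]
plan = plan_storage(inventory)
print(plan)
```

The sketch deliberately ignores vSAN storage policy overhead and growth during the DR event, so the vSAN figure it reports should be treated as a lower bound when sizing hosts.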

Considerations

It’s important to note that this feature applies to the entire Recovery plan and all Protection Groups (PGs) used in the scope of that plan. When building and refining protection policies (PGs) and Recovery plans, it’s good to keep this requirement in mind as it could drive the granularity built into the configuration. The flexibility of policy and plan membership is further exploited with this new capability.

The main thing to consider is that not every workload is an ideal candidate for running on the Cloud File System. The SDDC vSAN datastore is built for performance and the Cloud File System is built for capacity. To determine where your production workloads fit into this new capability, it’s easy to leverage the existing ability to run the workloads on the Cloud File System during DR testing activities. This will help profile and plan for the best options to use during actual DR scenarios.

Testing your DR workloads while running on the Cloud File System datastore is a simple process and can yield practical insights into the performance thresholds required so that it is easier to make the optimal storage decision.
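One way to turn DR test observations into a placement decision is sketched below. It assumes you have collected your own latency measurements for each VM while it ran on the Cloud File System during a test; the function name, the latency metric, and the sample numbers are all hypothetical and are not produced by VMware Cloud DR.

```python
def choose_datastore(observed_latency_ms, max_latency_ms):
    """Pick a datastore from one VM's DR-test measurement.

    observed_latency_ms: latency measured while the VM ran on the Cloud File System
    max_latency_ms: the highest latency the workload can tolerate (operator-defined)
    """
    if observed_latency_ms <= max_latency_ms:
        return "cloud-file-system"
    return "vsan"


# Illustrative measurements only: (observed, tolerated) latency in milliseconds.
test_results = {
    "web-01": (3.0, 10.0),
    "db-01": (6.0, 2.0),
}

placement = {
    name: choose_datastore(observed, tolerated)
    for name, (observed, tolerated) in test_results.items()
}
print(placement)
```

Because the storage choice applies to an entire Recovery plan, VMs that land in the "vsan" bucket would in practice be grouped into their own plan or moved with storage vMotion after the plan is committed.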

With the launch of this feature, we recommend running from the Cloud File System datastore for almost all use cases. Faster failover time, faster access to VMs, better initial performance, and less network impact from storage vMotion are all great reasons to use this option. The Cloud File System is a proven solution that will meet the storage IO requirements of most workloads, especially during a DR event. If a few workloads require vSAN performance, it’s a simple matter to storage vMotion them to vSAN after the DR plan is complete and committed, ensuring those workloads get the IO they need.

Monitoring

When Recovery plans are active and VMs are running on the Cloud File System – as opposed to in the SDDC vSAN datastore – there is also a useful indication provided in the UI for the Cloud File System details as we see here:

[Figure: Cloud File System details in the UI, indicating VMs running on the Cloud File System]

And the Virtual Machine inventory list, as shown below, also indicates which VMs are currently running live on the file system in the Recovery site.

[Figure: Virtual Machine inventory list showing which VMs are running live on the Cloud File System]

This new setting is also tracked in any associated Recovery plan run or test report, which records the configuration used during testing or failover. In the PDF reports created, you will now see the VM storage option used for that run (as shown below).

[Figure: Recovery plan PDF report showing the VM storage option used for the run]

Take advantage of the new choice

Running workloads live on the Cloud File System during failover provides multiple benefits to help simplify and optimize existing DRaaS operations. No immediate actions are required to enable the feature or reconfigure existing DR configurations. Simply choose to run the workload live on the Cloud File System during plan execution.

With this new capability, you now have more flexibility to address the disaster recovery handling of different workloads and target them to either the SDDC vSAN datastore or the Cloud File System datastore presented with VMware Cloud DR when running them in the Recovery SDDC.

For more details on this feature and what else is covered in this release – check out the latest VMware Cloud DR Release Notes or the VMware Cloud DR Product Documentation.
