VMware Site Recovery Use Cases and Benefits for Google Cloud VMware Engine
Use Cases
Although the primary use case for VMware Site Recovery Manager (SRM) is recovering a site after a disaster, it can also perform additional tasks. For example, SRM supports non-disruptive testing of recovery plans in network- and storage-isolated environments. This makes it possible to test disaster recovery, disaster avoidance, or planned migrations as frequently as desired to ensure confidence in the configuration and operation of recovery plans.
Disaster Recovery
SRM was specifically designed for disaster recovery, or unplanned failover. This is the most critical SRM operation, but also the least frequently used. Unexpected site failures don’t happen often, but when they do, a fast recovery is critical to the business. SRM helps in this situation by automating and orchestrating the recovery of critical business systems for partial or full site failures, which helps keep the recovery time objective (RTO) as low as possible.
Disaster Avoidance
Preventive failover is another common use case for SRM. The trigger can be anything from an oncoming storm to the threat of power issues. SRM allows for the graceful shutdown of virtual machines at the protected site, full replication of data, and ordered startup of virtual machines and applications at the recovery site, ensuring application consistency and zero data loss.
Upgrade and Patch Testing
The VMware Site Recovery sandboxed test environment is well suited to testing operating system and application upgrades and patches. Test environments are complete copies of production environments configured in an isolated network segment, which makes testing as realistic as possible without impacting production workloads or replication.
Topologies
SRM can be used in several different failover scenarios depending on customer requirements, constraints, and objectives. All these arrangements are supported and easily configured.
- Active-Active
- Active-Passive
- Bi-directional
Benefits
VMware Site Recovery Manager uses vSphere Replication to move virtual machine data between sites. vSphere Replication can use any storage supported by vSphere, so there is no requirement for storage arrays, similar or otherwise, at either site. There are many benefits to using vSphere Replication with SRM:
- Build Flexible Configurations (different topologies)
- Customize the recovery point objective (RPO) from 5 minutes to 24 hours
- Use multiple point-in-time (MPIT) recovery to revert to previous known states
- Eliminate Storage Lock-In
- Use Microsoft Volume Shadow Copy Service (VSS) and Linux file system quiescing to ensure guest applications are not affected during replication
- Optionally enable data compression to further reduce network bandwidth consumption (see "Replication data compression")
Features
Protection Groups
A protection group consists of the virtual machines that together support a service or application. For example, an application might consist of a two-server database cluster, three application servers, and four web servers. In most cases, it would not be beneficial to fail over only part of this application (say, two or three of the virtual machines in the example), so all nine virtual machines would be included in a single protection group.
Creating a protection group for each application or service has the benefit of selective testing. It enables low-risk testing of individual applications, so application owners can test disaster recovery plans non-disruptively as needed. Note that a virtual machine can belong to only one protection group. However, a protection group can belong to one or more recovery plans.
Recovery Plans
Recovery plans in VMware Site Recovery Manager are like an automated playbook, controlling all the steps in the recovery process. A recovery plan contains one or more protection groups, and a protection group can be included in more than one recovery plan. This provides the flexibility to test or recover an application by itself, and also to test or recover a group of applications or the entire site.
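The relationship between virtual machines, protection groups, and recovery plans can be illustrated with a small in-memory model. This is only a sketch of the rules stated above (one protection group per virtual machine, any number of recovery plans per protection group). The `RecoveryCatalog` class, its methods, and the VM and plan names are hypothetical and are not part of any VMware API.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionGroup:
    name: str
    vms: list[str] = field(default_factory=list)


class RecoveryCatalog:
    """Hypothetical in-memory model; not an SRM API."""

    def __init__(self) -> None:
        self.groups: dict[str, ProtectionGroup] = {}
        self.plans: dict[str, list[str]] = {}
        self._vm_owner: dict[str, str] = {}

    def add_group(self, name: str, vms: list[str]) -> None:
        # A VM can belong to only one protection group.
        for vm in vms:
            owner = self._vm_owner.get(vm)
            if owner is not None:
                raise ValueError(f"{vm} already belongs to group {owner!r}")
        for vm in vms:
            self._vm_owner[vm] = name
        self.groups[name] = ProtectionGroup(name, list(vms))

    def add_plan(self, name: str, group_names: list[str]) -> None:
        # A protection group may appear in several plans, so there is
        # no ownership check here, only an existence check.
        missing = [g for g in group_names if g not in self.groups]
        if missing:
            raise KeyError(f"unknown protection groups: {missing}")
        self.plans[name] = list(group_names)

    def vms_in_plan(self, plan: str) -> list[str]:
        # Flatten the plan's groups into the list of VMs it would recover.
        return [vm for g in self.plans[plan] for vm in self.groups[g].vms]


catalog = RecoveryCatalog()
catalog.add_group("crm", ["crm-db01", "crm-db02", "crm-app01"])
catalog.add_group("mail", ["mail01"])
catalog.add_plan("crm-only", ["crm"])            # test one application
catalog.add_plan("full-site", ["crm", "mail"])   # recover everything
```

Here the "crm" group appears in both plans, which mirrors the ability to test a single application on its own while still including it in a site-wide recovery.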
A recovery plan provides the following customizable options in its workflow:
Priority Groups
There are five priority groups in VMware Site Recovery Manager. The virtual machines in priority group one are recovered first, then the virtual machines in priority group two are recovered, and so on. All virtual machines in a priority group are started at the same time and the next priority group is started only after all virtual machines are booted up and responding.
This gives administrators one option for prioritizing the recovery of virtual machines. For example, the most important virtual machines with the lowest RTO are typically placed in the first priority group and less important virtual machines in subsequent priority groups. Another approach is to prioritize by application tier: database servers could be placed in priority group two, application and middleware servers in priority group three, and client and web servers in priority group four.
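The ordering rule for priority groups can be sketched as a simple batching function. The example below is illustrative only: `recovery_batches` and the VM names are hypothetical, and the sketch computes the start order without modeling the wait for each batch to boot and respond.

```python
from collections import defaultdict


def recovery_batches(priorities: dict[str, int]) -> list[list[str]]:
    """Group VMs by priority group (1 to 5) and return batches in start order."""
    batches: dict[int, list[str]] = defaultdict(list)
    for vm, priority in priorities.items():
        if not 1 <= priority <= 5:
            raise ValueError(f"{vm}: priority group must be 1-5, got {priority}")
        batches[priority].append(vm)
    # Lower-numbered groups start first; VMs within a batch start together.
    return [sorted(batches[p]) for p in sorted(batches)]


# Tiered example from the text: databases, then application servers, then web.
plan = recovery_batches({"db01": 2, "db02": 2, "app01": 3, "web01": 4})
# plan == [["db01", "db02"], ["app01"], ["web01"]]
```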
Dependencies
When more granularity is needed for startup order, dependencies can be used. A dependency requires that a specific other virtual machine is already running before a given virtual machine can start. For example, a virtual machine named “app01” can be configured to have a dependency on a virtual machine named “DB01”; VMware Site Recovery Manager will then wait until “DB01” starts before powering on “app01”. VMware Tools heartbeats are used to validate that a virtual machine has started successfully.
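Dependency-based startup is essentially a topological ordering problem. The sketch below uses Python's standard `graphlib` module to compute start "waves" from a dependency map. The `start_waves` function and the "web01" virtual machine are hypothetical additions, and the sketch only orders the virtual machines; it does not model the VMware Tools heartbeat check.

```python
from graphlib import TopologicalSorter


def start_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Return VMs grouped into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # VMs whose dependencies are running
        waves.append(ready)
        sorter.done(*ready)  # treat the whole wave as started
    return waves


# "app01" depends on "DB01"; "web01" (hypothetical) depends on "app01".
waves = start_waves({"app01": {"DB01"}, "web01": {"app01"}})
# waves == [["DB01"], ["app01"], ["web01"]]
```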
Shutdown and Startup Actions
Shutdown actions apply to the protected virtual machines at the protected site during the run of a recovery plan. Shutdown actions are not used during the test of a recovery plan. By default, VMware Site Recovery Manager issues a guest OS shutdown, which requires VMware Tools and has a time limit of five minutes. The time limit can be modified. If the guest OS shutdown fails and the time limit is reached, the virtual machine is powered off.
Shutting down and powering off the protected virtual machines at the protected site when running a recovery plan is important for two reasons. First, shutting them down quiesces the guest OS and applications before the final storage synchronization occurs. Second, it avoids the potential conflict of having virtual machines with duplicate network configurations on the same network.
A startup action applies to a virtual machine that is recovered by VMware Site Recovery Manager. There are two options to choose from: power on, or keep powered off. The choice depends on the needs of the VM or application. In some cases, it may be preferable to recover the VM and keep it powered off for later use.
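The default shutdown behavior described above (attempt a guest OS shutdown, then power off once the time limit is reached) follows a common "graceful, then forced" pattern. The sketch below shows that pattern in isolation. The function name, the three callables, and the polling interval are all assumptions made for illustration; they do not correspond to SRM or vSphere API calls.

```python
import time
from typing import Callable


def shut_down_vm(
    request_guest_shutdown: Callable[[], None],
    is_powered_off: Callable[[], bool],
    power_off: Callable[[], None],
    timeout_s: float = 300.0,  # five-minute default from the text
    poll_s: float = 5.0,       # polling interval is an assumption
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Try a guest OS shutdown; fall back to a hard power-off on timeout."""
    request_guest_shutdown()
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_powered_off():
            return "guest_shutdown"
        sleep(poll_s)
    # Time limit reached without a clean shutdown: force the power-off.
    power_off()
    return "forced_power_off"
```

Passing `sleep` and `clock` as parameters keeps the sketch testable without waiting for real time to elapse.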
Pre and Post Power-On Steps
As part of a recovery plan, VMware Site Recovery Manager can run a command on a recovered virtual machine after powering it on. A common use case is calling a script to perform actions such as making changes to DNS and modifying application settings on a physical server. VMware Site Recovery Manager can also display a visual prompt before or after any step in the recovery plan. This prompt might be used to remind an operator to place a call to an application owner, modify the configuration of a router, or verify the status of a physical machine.
IP customization
The most commonly modified virtual machine recovery property is IP customization. Most organizations have different IP address ranges at the protected and recovery sites. When a virtual machine is failed over, VMware Site Recovery Manager can automatically change the network configuration (IP address, default gateway, etc.) of the virtual network interface card(s) in the virtual machine. This functionality is available in both failover and failback operations.
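To make the idea of re-addressing concrete, the sketch below maps an address from a protected-site subnet to a recovery-site subnet while keeping the host portion unchanged. Preserving the host portion is one common convention and is assumed here; it is not a statement of how SRM is configured in any given environment. The `remap_ip` function and the example subnets are hypothetical.

```python
import ipaddress


def remap_ip(address: str, protected_net: str, recovery_net: str) -> str:
    """Map an IP from the protected subnet to the same host offset in the recovery subnet."""
    src = ipaddress.ip_network(protected_net)
    dst = ipaddress.ip_network(recovery_net)
    ip = ipaddress.ip_address(address)
    if src.version != dst.version or src.prefixlen != dst.prefixlen:
        raise ValueError("subnets must have the same IP version and prefix length")
    if ip not in src:
        raise ValueError(f"{address} is not in {protected_net}")
    # Keep the host offset, swap the network portion.
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


# Failover direction; swapping the two subnets gives the failback mapping.
new_ip = remap_ip("10.0.1.25", "10.0.1.0/24", "192.168.50.0/24")
# new_ip == "192.168.50.25"
```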
History Reports
When workflows such as a recovery plan test and clean up are performed in VMware Site Recovery Manager, history reports are automatically generated. These reports document items such as the workflow name, execution times, successful operations, failures, and error messages. History reports are useful for several reasons, including internal auditing, proof of disaster recovery protection for regulatory requirements, and troubleshooting. Reports can be exported to HTML, XML, CSV, or a Microsoft Excel or Word document.