Well-Architected Design: Cloud Migration Planning

Introduction

As a follow-up to the Cloud Migration Assessment, this document focuses on the planning phase of a cloud migration and builds upon the outcomes and results of the “Assessment and Discovery” phase. See the Well-Architected Design – Cloud Migration Assessment for more information. The applications discovered in the earlier phase are further curated, and their logical grouping is fine-tuned based on different organizational and technical criteria. The necessary line-of-business approvals are obtained from application owners, and the target cloud(s) are sized and designed. One of the most important outcomes of this phase is a migration wave plan, which shows a timeline of the overall migration effort grouped into multiple waves.

 

Migration Planning

An outcome of the assessment phase is a comprehensive view of your infrastructure, including details of your source environment’s applications and technologies. With this data, you can begin the planning phase of your cloud migration. In this phase, the migration strategy is developed, the application scopes are finalized, and wave plans are created. At the end of this phase, your organization will have a clear strategy to begin executing the cloud migration. As we discuss these topics, it helps to approach cloud migration as an iterative journey that may not have a concrete start and end scope. Instead, an organization may be in a continuous cycle of migrating applications and workloads to or from its cloud locations, and each application may be in a different phase of the journey.

Develop a Cloud Migration Strategy

What is a Cloud Migration Strategy?

Cloud Migration is the process of transferring IT applications and workloads from a source cloud to a target cloud. For organizations planning a cloud migration for the first time, the source cloud typically consists of their on-premises datacenters. The target cloud can be one or more cloud provider environments of their choice. “Transferring” in this context does not only mean a 1:1 relocation or rehosting of applications. Instead, the term can mean “Rehosting”, “Re-platforming”, “Refactoring”, or “Repurchasing”, corresponding to the commonly known “6Rs of Cloud Migration”, which are described in the “Cloud Migration Strategies” Well-Architected Design. “Retaining” or “Retiring” workloads implies that these workloads are out of scope of a cloud migration and belong to a bucket we call “outliers”. Although these outliers are not part of the actual migration, it is important to clearly define this group as well. Retained workloads that stay in your on-premises datacenter may have dependencies on migrated workloads in the cloud and can therefore influence your operations and your network connectivity design.

A Cloud Migration Strategy is a clear, well-documented plan and vision for transferring IT applications to the cloud.

Elements of a Cloud Migration Strategy Plan

A high-level cloud migration strategy plan covers the following aspects:

  • What is the overall goal for an organization to migrate its IT resources to the cloud?
  • What triggered the cloud migration project (for example, a planned datacenter closure or the need to increase business agility for a certain application)?
  • What are the constraints for the cloud migration project?
    For example, if a datacenter will be shut down at a certain point of time, the evacuation of workloads must be finished by then.
  • Estimate the “R-mix” of your cloud migration.
    Based on the goals, triggers, and constraints above, you can specify the percentage of applications that will be rehosted, re-platformed, refactored, and so on. For example, if your cloud migration project was triggered by a planned datacenter evacuation, speed of migration will be a top priority and you will likely have a high percentage of rehosting activities.
  • What is the overall time scope of the cloud migration project? When is the planned project targeted to be finished?
  • What is the cost model of the cloud migration? Which cost savings are planned?
  • Based on which criteria will the target cloud provider platform(s) be selected?
    Potential criteria include cost or an existing preference for certain cloud-provider-native applications that your business needs to support.
  • Define the success factors of your cloud migration.
    In line with the goals of your cloud migration project, define the success criteria against which you will measure the project. For example, if your goal is to save costs, you need to define metrics for measuring these cost savings and monitor them continuously. If your goal is to increase developer productivity, you need to define how to measure that as well.
  • Determine and document the stakeholders of your cloud migration project.
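To make the R-mix estimate concrete, you can derive the percentages from a per-application strategy assignment. The following Python sketch assumes a hypothetical inventory in which each application name maps to its planned “R”. Both the names and the assignments are illustrative. The sketch weights every application equally. Weighting by VM count or by effort would be an equally valid choice.

```python
from collections import Counter

# Hypothetical inventory: application name -> planned migration strategy ("R").
planned_strategy = {
    "crm-legacy": "repurchase",
    "billing": "rehost",
    "hr-portal": "rehost",
    "data-warehouse": "replatform",
    "order-api": "refactor",
    "fax-gateway": "retire",
    "mainframe-bridge": "retain",
    "intranet": "rehost",
}


def r_mix(strategies: dict[str, str]) -> dict[str, float]:
    """Return the share of applications per strategy as a percentage."""
    counts = Counter(strategies.values())
    total = sum(counts.values())
    # most_common() orders strategies from most to least frequent.
    return {r: round(100 * n / total, 1) for r, n in counts.most_common()}


print(r_mix(planned_strategy))
```

A high rehost share, as in this example, is what you would expect when speed of migration is the top priority.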

The following table provides information about the different factors you must consider during this phase of deployment:

Design Consideration: Create a high-level plan for your cloud migration strategy covering topics such as:

  • Goals and Triggering Events of a Cloud Migration
  • Scope of Migration
  • Migration Timelines
  • Cost Model
  • Measurable Success Factors

Design Justification: A solid cloud migration strategy plan is required for any successful cloud migration.

Design Implication: Elaborating a cloud migration strategy is a complex task that requires the involvement of the appropriate stakeholders and teams.

Detail your Cloud Migration Strategy

Once a high-level plan of your cloud migration strategy has been created, you need to produce a detailed plan for all relevant disciplines. This ultimately enables you to execute the migration of all your workloads end-to-end. The plan includes, but is not limited to, the following aspects:

  • Establish a Cross-Functional Team
    Given the complexity of a cloud migration project, ensure that all relevant skills and resources are grouped into a team that accompanies all phases of the project. Required resources include, for example, application owners, IT infrastructure experts, IT operations experts, and networking specialists.
  • Define Migration Type for Applications
    For each application, you must define whether it will be rehosted, re-platformed, refactored, and so on. In this context, it is also important to document “outliers”, which are out of scope for various reasons. For example, they may be retired, or they may be retained because of severe dependencies on on-premises infrastructure or other constraints.
  • Operational Transition of Applications
    Each application in your on-premises environments has a certain operational profile tied to it. As an example, this could be the operational profile for backup and monitoring. The areas of this profile must be clearly defined and ultimately mapped to corresponding operational processes and technologies in the cloud provider environment.
    Also ensure that your operational processes can handle a hybrid environment. Operational procedures must continue while some workloads or applications have already been migrated to the destination cloud and others still reside in the source on-premises datacenters. Depending on your strategy, this hybrid environment will exist either permanently or for at least a few months. For example, operational processes such as backup and system/application monitoring must be capable of handling on-premises workloads as well as cloud workloads. Other operational areas to consider include Security and Compliance and Identity and Access Management.
  • Cloud Migration Governance
    In a large cloud migration project, it is easy to lose control over the status of your migration effort. This is especially true when you have a high number of applications and workloads that reside in different phases of a migration. Any phase may introduce friction, because different tools or databases may be used in each phase. For example, your assessment tool and your cloud migration tool may discover or inventory the same set of applications separately. For a successful execution of your cloud migration, it is therefore key to establish a consistent dataset of applications and workloads across all phases. At any point, you must be able to determine in which phase of the cloud migration lifecycle each application resides.
  • Design Target Cloud
    To migrate workloads to the cloud, you must have a clear vision of the configuration of your target cloud environment, as well as how this configuration is deployed and scaled on demand. First and foremost, you have to select your cloud provider platforms. More detailed considerations for designing the target cloud are outlined in a later section of this document.
  • Design Hybrid Connectivity
    Network Connectivity between your source datacenter environments and your target cloud is required for the actual migration of workloads and ongoing operations of a permanent hybrid environment. Performance and bandwidth requirements as well as security requirements need to be carefully considered when designing the hybrid connectivity.
  • Create Migration Wave Plan
    Having discovered, curated, and scoped your applications, you can create a detailed migration wave plan. Not all applications and workloads will be migrated at once. Instead, they are assigned to migration waves, which occur sequentially. Considerations for creating a migration wave plan are outlined in the corresponding section later in this document.
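The Cloud Migration Governance item above calls for a consistent application dataset across all phases, even when different tools use different names. One way to achieve this is a single registry keyed by a canonical application ID, with each tool's name recorded as an alias. The following Python sketch illustrates that idea under stated assumptions. The phase names, IDs, and aliases are all hypothetical, and the sketch does not represent the data model of any specific VMware or third-party tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    # Hypothetical lifecycle phases; rename or extend to match your process.
    ASSESSMENT = "assessment"
    PLANNING = "planning"
    MIGRATION = "migration"
    OPERATIONS = "operations"


@dataclass
class AppRecord:
    app_id: str                                      # canonical ID shared by all tools
    phase: Phase
    aliases: set[str] = field(default_factory=set)   # names used by individual tools


class MigrationRegistry:
    """One consistent application dataset across all migration phases."""

    def __init__(self) -> None:
        self._apps: dict[str, AppRecord] = {}
        self._alias_index: dict[str, str] = {}       # any known name -> canonical ID

    def register(self, app_id: str, phase: Phase, aliases: set[str]) -> None:
        names = set(aliases) | {app_id}
        # Refuse a name that already points at a different application, so two
        # tools can never silently disagree about which app a name refers to.
        for name in names:
            owner = self._alias_index.get(name)
            if owner is not None and owner != app_id:
                raise ValueError(f"{name!r} already maps to {owner!r}")
        self._apps[app_id] = AppRecord(app_id, phase, names)
        for name in names:
            self._alias_index[name] = app_id

    def resolve(self, name: str) -> AppRecord:
        """Look up an application by its canonical ID or any tool-specific alias."""
        return self._apps[self._alias_index[name]]

    def set_phase(self, name: str, phase: Phase) -> None:
        self.resolve(name).phase = phase

    def in_phase(self, phase: Phase) -> list[str]:
        """Answer 'which applications are currently in this phase?'"""
        return sorted(a.app_id for a in self._apps.values() if a.phase is phase)


# The assessment tool and the migration tool know the same app by different names.
registry = MigrationRegistry()
registry.register("APP-0042", Phase.ASSESSMENT, {"billing-prod", "BILLING_APP"})
registry.register("APP-0043", Phase.ASSESSMENT, {"hr-portal"})
registry.set_phase("BILLING_APP", Phase.PLANNING)
print(registry.in_phase(Phase.PLANNING))
```

The registry deliberately allows phases to move in either direction, because an application may be migrated to or from a cloud location more than once.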

The following table provides information about the different factors you must consider during this phase of deployment:

Design Consideration: Identify the disciplines for which your cloud migration strategy needs to be detailed. Assign clear responsibilities, ownership, and timelines to each of these disciplines.

Design Justification: A cloud migration is a complex effort that requires the involvement of various skill sets across your IT organization. Involving the right resources is key to success.

Design Implication: Involving cross-functional resources from multiple units of your IT organization requires a high level of commitment from each involved resource over a long period of time. Ensure that these commitments are in place, along with an authoritative sponsor for the whole project.

Finalize Application Scoping

One of the first steps in building your migration plan is defining the scope of your migration. Previously, you grouped your applications based on specific categories. These groupings helped you understand dependencies, such as those on applications, networks, or datacenters. One example is a network dependency between applications that forces an entire network to be migrated in a single wave. You should now take those groupings and decide what is in scope for this cloud migration. To do so, add a tag to each of the previous application groupings that categorizes its workloads as in-scope or out-of-scope.

In-scope

The following tags cover the “transferring” category as outlined in the previous section. Applications and workloads with these tags will be directly impacted by this cloud migration.

  • Rehosting tag - This tag will be applied to applications that are moving to a different hardware environment without making application code changes. This is also known as “Lift and Shift”.
  • Re-platforming tag - This tag is similar to “rehosting” but includes minor changes or optimizations to the application. This way, you can gain some benefits of the cloud without the full cost of refactoring an application.
  • Refactoring tag - Refactoring is the rewriting of one or more components of an application, typically to take advantage of public cloud services. This can also involve refactoring a traditional application from a legacy 3-tier design into a granular application built from microservices.
  • Repurchasing tag - Repurchasing is the discontinuation of an on-premises application in favor of an off-the-shelf product offered by the cloud provider. Moving from a custom-built CRM system to a SaaS CRM service is an example of repurchasing. This is also known as “Drop-and-Shop”.

Out-of-scope

The following tags will be used for applications and workloads that will not be directly impacted by this cloud migration, also referred to as “outliers”. Keep in mind that an application may be out-of-scope for this migration but move to in-scope for future migrations. In this sense, tags can be fluid.

  • Retain tag – This will be assigned to applications and workloads that are out of scope for this stage of migration. For this discussion, we will assume that retained applications stay on their existing infrastructure with no change. An exception is an application that is not in scope for migration but must be adjusted because of dependency changes with migrated workloads. For example, an application that remains in the source cloud may need to change its target database because of a workload that was migrated. We may revisit these applications in future migrations, when they move to in-scope tags.
    • Reasons to retain workloads:
      • Moving application would introduce additional cost with no benefit.
      • Compliance regulations require that data stays in the source cloud.
      • Application platform is not supported in target cloud.
  • Retire tag – This will be assigned to applications and workloads that are out-of-scope for migration and will be removed during this stage.
    • Reasons to retire workloads:
      • Application redundancies.
      • Lowering costs by removing non-critical workloads.
      • The application has served its purpose and is no longer needed.
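The in-scope and out-of-scope tags described above can be modeled directly. The following Python sketch shows one way to do this. It assumes that each application group carries exactly one tag, and the group names are hypothetical. Applying the sketch splits the groups into the set that feeds wave planning and the set of outliers.

```python
from enum import Enum


class MigrationTag(Enum):
    REHOST = "Rehosting"
    REPLATFORM = "Re-platforming"
    REFACTOR = "Refactoring"
    REPURCHASE = "Repurchasing"
    RETAIN = "Retain"
    RETIRE = "Retire"


# Tags in the "transferring" category are in scope; Retain and Retire are outliers.
IN_SCOPE = {
    MigrationTag.REHOST,
    MigrationTag.REPLATFORM,
    MigrationTag.REFACTOR,
    MigrationTag.REPURCHASE,
}


def split_scope(app_tags: dict[str, MigrationTag]) -> tuple[list[str], list[str]]:
    """Split application groups into (in_scope, outliers), each sorted by name."""
    in_scope = sorted(a for a, t in app_tags.items() if t in IN_SCOPE)
    outliers = sorted(a for a, t in app_tags.items() if t not in IN_SCOPE)
    return in_scope, outliers


# Hypothetical application groups carried over from the assessment phase.
groups = {
    "web-frontend": MigrationTag.REHOST,
    "reporting-db": MigrationTag.REPLATFORM,
    "legacy-crm": MigrationTag.REPURCHASE,
    "fax-server": MigrationTag.RETIRE,
    "plant-control": MigrationTag.RETAIN,
}
in_scope, outliers = split_scope(groups)
print("in scope:", in_scope)
print("outliers:", outliers)
```

Because tags can be fluid, moving a retained group into scope for a later migration is simply a matter of reassigning its tag and rerunning the split.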

The following table provides information about the different factors you must consider during this phase of deployment:

Design Consideration: Logically tag your workloads and application groups for migration planning purposes:

  • In-scope tags
    • Rehosting
    • Re-platforming
    • Refactoring
    • Repurchasing
  • Out-of-scope tags
    • Retain
    • Retire

Design Justification: When planning your migration, you should identify what is in scope for each phase. Applying tags allows you to understand which workloads are moving and to plan your migration waves effectively.

Design Implication: These tags will serve you in migration wave planning. Without identifying what is in scope, there is no way to gauge the timing and level of effort involved in each wave.

Design Cloud Target

Choosing the right cloud provider and subsequent cloud offering is a critical part of migration planning. Some design choices are difficult to change at best and irreversible at worst. This design does not go into the details of the different cloud providers and their unique features and functionality; however, it does outline the design considerations that need to be made before deploying to your target cloud.

A common result of a cloud migration strategy is a target cloud comprised of Native Public Cloud (NPC) resources and VMware Cloud resources. As an example, you may have a database running in NPC and application servers running on a VMware Cloud offering. In the next section we will discuss design considerations for NPC and VMware Cloud. For this discussion we will assume that both NPC and VMware Cloud resources are hosted by a single provider such as Amazon Web Services + VMware Cloud on AWS.

Native Public Cloud

Public cloud providers offer a catalog of services and capabilities, with many similarities across providers. Each provider also has key differentiators in those services that may drive an organization’s decision. For example, all public cloud providers offer hosted database services, but they may implement horizontal scaling differently. This detail could determine whether your application is supported on a particular cloud provider.

The following considerations should be evaluated when designing native cloud resources:

Regional availability – Cloud providers have resources and services deployed in many regions around the globe. Some services are only available in limited regions which may influence your design.

Security and compliance – When choosing to host applications in public clouds, compliance becomes a key design consideration. Cloud compliance is grouped by category and industry, with each provider meeting these requirements differently. Ensure your design includes support for your business’s compliance requirements.

SLAs – When looking at SLAs as part of your design, consider 2 areas:

  • SLAs by Service – Each cloud service has a defined service level agreement tied to it. For example, a database may offer 99.999% (five 9s) uptime. This SLA applies to that single service.
  • SLAs by Application – A typical application depends on multiple services and on variables beyond any single service SLA. A business must provide its customers an SLA that covers the entire application, which may consist of four services and span multiple geographic regions. When designing for your application SLA, ensure you consider all the underlying services and the SLA of each.
    • Serial SLA – An application depends on two services, each with its own SLA. If either service fails, the application fails, so the application SLA is lower than either individual service SLA.
    • Parallel SLA – To improve application availability, you deploy multiple replicas of each service. Adding replicas increases a service’s effective availability and therefore your application’s SLA.
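The serial and parallel cases above follow standard composite-availability arithmetic. In the serial case, the availabilities multiply. In the parallel case, a service is down only if every replica is down, so its availability is 1 minus the product of the replica unavailabilities. The following Python sketch applies both rules. It assumes that failures are independent, which real replicas sharing a failure domain may violate. It also treats published SLA percentages as if they were measured availability. The example figures are hypothetical.

```python
from math import prod


def serial_availability(*services: float) -> float:
    """Every service is required: the application is up only if all are up."""
    return prod(services)


def parallel_availability(*replicas: float) -> float:
    """Redundant replicas: the service is down only if every replica is down."""
    return 1 - prod(1 - a for a in replicas)


# Hypothetical application: an app tier (99.95%) that depends on a database
# deployed as two replicas (99.9% each).
database = parallel_availability(0.999, 0.999)
application = serial_availability(0.9995, database)
print(f"database: {database:.4%}, application: {application:.4%}")
```

Even with a highly available database, the serial dependency on the app tier keeps the application below 99.95%. This is why every underlying service has to be part of the application SLA design.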

Licensing – Cloud providers have service licensing constraints that should be considered when designing your target cloud.

  • Deployment Model License – Cloud providers model their licensing support differently. For example, Cloud Provider A allows you to run your preferred database as a virtual machine, while Cloud Provider B only allows that database to run on dedicated bare-metal hardware. As a customer, you will find Cloud Provider B substantially more expensive because of database licensing on bare metal.
  • Cloud Affinity Licensing – Some cloud services are owned and created by a cloud provider, say Cloud Provider A. In addition to offering the service on its own platform, Cloud Provider A may also have an agreement with Cloud Provider B to resell the service. In this example, Cloud Provider A may offer significant licensing discounts if you run the service on its own cloud platform instead of on Cloud Provider B.

VMware Cloud

VMware has partnerships with all the major cloud providers to offer a full-stack VMware SDDC located within their datacenters. This gives customers who select a public cloud the option to also run VMware workloads hosted and serviced by the cloud provider. This section covers considerations for designing your VMware Cloud.

VMware versioning – The VMware Cloud offerings are consistent across providers in that they include the full SDDC stack of vSphere, NSX, and vSAN. They may, however, differ in the versions deployed by each provider. Cloud Provider A may run a vSphere 8 based SDDC, while Cloud Provider B runs a vSphere 7 based deployment. Consider the VMware versions of the target cloud when designing for your cloud migration.

  • vSphere Version – Cloud providers have a level of control with regard to the VMware versions deployed in their offering. You may find that a cloud provider has an older vSphere version than your on-premises datacenter. In some cases, your IT operations may have tools or processes that are optimized for the version of vSphere deployed on-premises. In this case, your cloud provider decision may be based on the vSphere version deployed in each offering.
  • Virtual Machine Hardware Version – Each VMware virtual machine is assigned a hardware version that reflects the hardware features and capabilities supported by the VM. The hardware version is assigned at VM creation and can be upgraded as a day-2 operation. The hardware version also determines which ESXi hosts the VM can run on. Depending on your on-premises virtual machine hardware versions, you may find that some VMs are not compatible with the VMware Cloud offering. For example, suppose you are running vSphere 8.0, which supports VM hardware version 20, and your VMware Cloud provider is running vSphere 7.0, which supports up to version 19. In this case, you will be unable to power on the VM until you downgrade its hardware version. Doing this may have unintended consequences, so when designing your target cloud, ensure you understand how different hardware versions can impact workload migrations. When possible, select a cloud offering with the latest version of vSphere to give the greatest support flexibility.
  • EVC Mode - Enhanced vMotion Compatibility (EVC) allows VMs to live migrate between ESXi hosts with different CPU features. This is a powerful workload migration feature, because on-premises hardware commonly differs from the hardware running in a VMware Cloud. How the cloud provider has implemented EVC in vSphere may impact your migration workflow and your ability to migrate back on-premises. Such cases may require workload downtime to change EVC settings. Consider your CPU features and EVC compatibility when designing your target cloud.
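Before committing to a provider, you can screen the on-premises inventory against the hardware-version limit of the target vSphere version. The following Python sketch does this. Its version table contains only the two example values given in the text above (vSphere 8.0 supports version 20, and vSphere 7.0 supports up to version 19). Exact limits depend on the update level, so confirm them for your provider's build. The inventory is hypothetical.

```python
# Maximum VM hardware version per target vSphere version. These entries are the
# example values from the text above; verify them for your provider's exact build.
MAX_HW_VERSION = {"8.0": 20, "7.0": 19}


def incompatible_vms(vm_hw_versions: dict[str, int], target_vsphere: str) -> list[str]:
    """Return the VMs whose hardware version is newer than the target supports.

    Raises KeyError if the target vSphere version is not in MAX_HW_VERSION.
    """
    limit = MAX_HW_VERSION[target_vsphere]
    return sorted(vm for vm, hw in vm_hw_versions.items() if hw > limit)


# Hypothetical on-premises inventory: VM name -> virtual hardware version.
inventory = {"app01": 19, "app02": 20, "db01": 17}
print(incompatible_vms(inventory, "7.0"))
print(incompatible_vms(inventory, "8.0"))
```

VMs flagged by this check are the ones that would need a hardware-version change, or a provider with a newer vSphere version, before they can power on in the target cloud.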

SDDC Design/Features – The VMware software stack used across all VMware Cloud offerings is similar but has been implemented slightly differently by each provider. This results in some VMware feature differences, which a customer should consider when selecting a VMware Cloud provider:

  • vSphere Cluster Design – Designing your vSphere clusters involves more than ensuring you have enough resources to run your workloads. Two additional considerations are described below.
    • Regions and Availability zones - When designing VMware Cloud deployments, you should consider where clusters are deployed. Depending on your application you may require clusters deployed in multiple regions. Therefore, ensure the cloud provider offers VMware resources in those regions.
    • Cluster Size – A common design decision with VMware clouds is cluster sizing. Depending on the cloud provider you may have limited flexibility when it comes to node choices and cluster sizes. Consider the following design options and ensure your provider supports this model.
      • Scale-out Cluster – consists of more, smaller ESXi hosts. Adding incremental resources tends to be cheaper with this option.
      • Scale-up Cluster – consists of fewer, larger ESXi hosts. The cost per GB / GHz may be lower with this design.
  • Stretched Cluster – This is where a vSphere cluster has been stretched across two sites (typically two availability zones), with an additional node deployed as a witness. Not all VMware Cloud providers support this feature. If your design requires stretched clusters, be sure to consider this when selecting the provider.
  • Resource scaling – Some VMware Cloud providers have implemented a resource scaling feature for their VMware Cloud offering. This allows the VMware Cloud to grow and shrink based on workload utilization.

Hardware - VMware Cloud runs on bare-metal servers chosen by the cloud provider. These servers have different capabilities that should be considered in your design.

  • Node Density – Each server that runs a VMware Cloud offering has a preconfigured quantity of CPU and memory. For example, Cloud Provider A offers VMware nodes with 256 GB of RAM, while Cloud Provider B offers 768 GB. This difference affects the number of workloads you can run per node as well as how many nodes your overall design requires. In addition, the per-GB cost may vary between solutions. Select a provider and node type that meets your workload requirements.
  • Node Features – Depending on the server type offered by the cloud provider, a VMware Cloud offering may include offload features performed by additional chips and add-on cards. One example is a server that includes GPU cards for specialized computing such as AI and machine learning. Consider these provider-specific features when designing your target cloud.
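The node-density example above (256 GB versus 768 GB nodes) translates directly into a node count. The following Python sketch estimates cluster size from memory demand alone. Several parameters are illustrative assumptions that do not come from this design: the target maximum memory utilization per host, the single spare host for failure tolerance (N+1), and the 5,000 GB demand figure. The sketch ignores CPU, vSAN storage capacity, and provider minimum cluster sizes.

```python
def nodes_needed(total_mem_gb: int, node_mem_gb: int,
                 max_util_pct: int = 80, spare_nodes: int = 1) -> int:
    """Estimate the number of hosts needed from memory demand only.

    Assumptions (illustrative): hosts are filled to at most max_util_pct percent
    of their RAM, and spare_nodes extra hosts are added for failure tolerance.
    """
    usable_per_node = node_mem_gb * max_util_pct        # GB scaled by 100
    # Integer ceiling division avoids floating-point rounding surprises.
    base = -(-total_mem_gb * 100 // usable_per_node)
    return base + spare_nodes


# Hypothetical demand of 5,000 GB RAM, compared across the two example node sizes.
for node_gb in (256, 768):
    print(f"{node_gb} GB nodes: {nodes_needed(5000, node_gb)} hosts")
```

Multiplying each result by the provider's per-node price gives a first comparison of the per-GB cost difference mentioned above. Scale-out and scale-up options can be compared in the same way.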

Design Network Connectivity

The network provides the backbone for all communication between applications, workloads, and users. Connecting the infrastructure is essential to every workload and application running in your environment. Modern IT infrastructures add a layer of complexity due to the mesh of geographic locations, including on-premises, hosted datacenters, and public clouds. Planning your network design for a cloud is critical to the success of your migration as well as to the operations of your business after migration. A cloud migration can occur from “any-cloud” to “any-cloud”, but the considerations in this section focus on a local on-premises environment as the source and the cloud as the destination.

When you look at network connectivity for cloud migration, consider two categories (note that some network design elements span both):

  1. Migration Network planning – This will cover network design elements specific to your migration project.
  2. Steady State Network planning – This will cover network design elements required to run your application in a production state, post migration.

In the following table we will look at topics to consider when planning your network connectivity.

Topic

Considerations

Type

When choosing how to connect on-premises to the cloud, there are two types of connections to consider: public and private.

Public – This type of connection uses the public Internet, which is a shared network. Because the network is shared, it can affect your performance, and the connection itself does not encrypt traffic, which weakens your security controls. You can add a level of security by establishing a site-to-site VPN to encrypt traffic. This type of connection is best suited for dev/test applications or as a redundant link for a private connection.

Private – This type of connection is a dedicated circuit that bypasses the public Internet. It is provided by the cloud provider or a third party. This is the best choice, as it offers predictable performance and security controls.

Performance

When planning your network design, you must ensure it meets performance requirements during migration as well as steady state operations. Performance consists of bandwidth and latency.

Migration network – The following outlines traffic types that exist only during the cloud migration:

  • Replication Traffic – This is the movement of data between sites in preparation for and during the migration. This can include workload images and application data.
  • Backup Traffic – As you migrate from one cloud to another, you may need to move existing backup replicas to the destination cloud.
  • Split Application Traffic – As workloads and applications migrate, they create additional traffic between on-premises and the cloud, because each application is temporarily split between sites.

Steady State Network – Traffic types that will exist during migration and continue once complete.

  • Replication Traffic – After the migration, you may still have applications and workloads replicating across the WAN. Examples include DR workloads and distributed databases.
  • User Traffic – With workloads spread across multiple locations, you may have additional user traffic between sites.
  • Distributed Applications – Prior to cloud migration, all workloads and applications lived in one location with localized traffic. After a cloud migration, some workloads may be retained on-premises. This increases traffic between sites as applications become distributed.

Availability

With a cloud migration, workloads become distributed between on-premises and the cloud. Business-critical applications depend on the ability to communicate across sites. To ensure application SLAs, consider designing redundant connections between on-premises and the cloud.

Multiple connections – Plan to have two or more connections between your on-premises and cloud locations, using different providers. This helps ensure there is no interruption of service if one provider experiences an outage. Consider a VPN over the public Internet as a backup option to save costs.

Regional Redundancy – After a cloud migration, applications may be spread across regions. Ensure you have network redundancy between regions (this may be covered by cloud provider SLAs).

Security

In planning your network design, consider the security requirements for both migration and steady state operations. For more detailed design recommendations see the VMware Cloud Well-Architected Design for Network Security.

Migration Network Security

  • Distributed Workloads – Applications and workloads may have security controls in place while running on-premises such as firewall rules and IDPS policies. During migration, you should ensure the security posture is consistent and these workloads do not become vulnerable.

Steady State Network Security

  • Site-to-Site – Each cloud provider has different security features available for its public and private connection options. Ensure the connection offerings meet your security and compliance controls.
  • Distributed Workloads – Once the migration is complete, an application may be distributed across locations that use different security tools, for example, different firewall products on-premises and at the cloud provider. Ensure your application complies with security requirements when spread across clouds.
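The replication traffic described under Performance can be sized with simple arithmetic: the number of bits to move divided by the usable bandwidth. The following Python sketch estimates how long a one-time transfer takes. The efficiency factor, which is the share of nominal bandwidth actually available to replication, is an assumption, as is the 10 TB example. The sketch covers only an initial copy and does not account for data that changes during replication.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move data_tb terabytes over a link_gbps link.

    data_tb uses decimal terabytes (10**12 bytes). efficiency is an assumed
    fraction of nominal bandwidth left for replication after protocol overhead,
    throttling, and competing traffic such as split-application or user traffic.
    """
    bits = data_tb * 10**12 * 8
    usable_bits_per_second = link_gbps * 10**9 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical: 10 TB of workload images and application data.
for gbps in (1, 10):
    print(f"{gbps} Gbps: {transfer_hours(10, gbps):.1f} hours")
```

Comparing such estimates against the migration timeline helps you decide whether a public connection is sufficient or whether a private circuit is needed during the migration.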

Customize Application Profiles for the Target Cloud

Importance of Application Profiles

Typically, you will have already gathered network flow data during the migration assessment phase and gained some insight into which workloads or VMs make up your applications. However, the quality of this automatically gathered data will not yet meet the requirements for fully curated application profiles. There are several reasons why you need defined application profiles for the applications in your source environment:

  • Migrating workloads to the cloud is a large operational change. The infrastructure and application services in the cloud are not the same as those in an on-premises environment.
  • In the cloud, infrastructure design is limited to the formally described infrastructure and application services that the cloud provider offers. This alone makes clear profiles of your applications and their required infrastructure necessary.
  • The service level (SLA) of a cloud provider’s infrastructure and application services will need to match the level of service required by the migrated applications.

Categories of Application Profiles

An application is a complex construct whose characteristics need to be split into different categories. The full detail of each category is beyond the scope of this design; the following is an overview of the most relevant categories:

List of Application Components

Every application consists of 1 or more components or VMs, which should be documented as part of this profile category.

Network Flows and Dependencies

This category covers the details of network flow relationships between application components. Typically, dependencies are analyzed with a network flow discovery tool in the migration assessment phase. The collected properties of a network flow include the source and destination component or VM, the protocol, and the TCP/UDP port.
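The collected flows can also be used to find components that must move together, such as the network dependency that forces an entire network into a single wave. The following Python sketch shows one possible approach, not a method prescribed by this design. It treats every flow as a dependency and groups components into connected components using a depth-first search. The flow records are hypothetical. In practice, you would exclude shared infrastructure services such as DNS or directory servers before grouping. Otherwise, nearly everything ends up in one group.

```python
from collections import defaultdict


def dependency_groups(flows: list[tuple[str, str, str, int]]) -> list[set[str]]:
    """Group components that communicate directly or transitively.

    Each flow is (source, destination, protocol, port). Protocol and port are
    kept in the record but ignored here; any flow counts as a dependency.
    """
    adjacency: dict[str, set[str]] = defaultdict(set)
    for src, dst, _proto, _port in flows:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    groups: list[set[str]] = []
    seen: set[str] = set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical flows from a discovery tool.
flows = [
    ("web01", "app01", "TCP", 8443),
    ("app01", "db01", "TCP", 5432),
    ("batch01", "ftp01", "TCP", 21),
]
for group in dependency_groups(flows):
    print(sorted(group))
```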

Security and Firewall Profile

A Security and Firewall Profile defines the level of protection for an application. It defines which other components or services an application or VM needs to communicate with. From this definition you can derive which security segments or zones the workloads require. In other words, this profile defines the communication and isolation requirements of your application, for both intra-application and external communication. One example is whether your application requires public internet connectivity.
Other aspects of a Security and Firewall Profile may include the integration of an application with intrusion protection systems or antivirus infrastructure.
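A minimal Python sketch, assuming each VM (and a pseudo-endpoint standing for internet clients) has already been assigned to a security zone, shows how the zone-to-zone communication requirements of a security profile, including public internet connectivity, can be derived from flow data. The zone names, endpoint names and tuple layout are hypothetical.

```python
def required_zone_rules(flows, vm_to_zone):
    """Collapse VM-level flows into the zone-to-zone rules they require.

    flows: iterable of (src_vm, dst_vm, protocol, port) tuples.
    vm_to_zone: maps each endpoint to a security zone name.
    """
    rules = {
        (vm_to_zone[src], vm_to_zone[dst], proto, port)
        for src, dst, proto, port in flows
    }
    return sorted(rules)


def needs_internet(flows, vm_to_zone, internet_zone="internet"):
    """True if any flow starts or ends in the internet zone."""
    return any(
        internet_zone in (vm_to_zone[src], vm_to_zone[dst])
        for src, dst, _proto, _port in flows
    )


# Hypothetical endpoints and zones; "any-client" stands for internet users.
vm_to_zone = {"any-client": "internet", "web01": "dmz", "web02": "dmz", "db01": "data"}
flows = [
    ("any-client", "web01", "TCP", 443),
    ("any-client", "web02", "TCP", 443),
    ("web01", "db01", "TCP", 5432),
    ("web02", "db01", "TCP", 5432),
]
for rule in required_zone_rules(flows, vm_to_zone):
    print(rule)
print("needs internet:", needs_internet(flows, vm_to_zone))
```

Four VM-level flows collapse into two zone-level rules, which is the kind of formalized input a firewall design in the target cloud needs.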

Performance Profile

The performance profile defines the required performance behavior of the application. Depending on the application design and the application monitoring concept, this profile ideally consists of clearly measurable performance metrics. For a database application, for example, one such metric is the number of transactions per second that the database must be able to process. From top-level application metrics like this, you can derive lower-level infrastructure metrics: if the database needs to support x transactions per second, the storage subsystem hosting it must be able to support y write I/O operations per second (and z read I/O operations per second). Performance profiles are usually highly specific to each individual application or application category.
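The derivation from an application metric to storage metrics can be written as simple arithmetic. The sketch below assumes a fixed average number of read and write I/O operations per transaction plus a safety margin; these per-transaction values and the 30 % headroom are hypothetical and would need to be measured for the actual database.

```python
def required_iops(tps, reads_per_txn, writes_per_txn, headroom_pct=30):
    """Translate a transactions-per-second target into (read, write) IOPS targets.

    All inputs are integers. reads_per_txn / writes_per_txn are average I/O
    operations per transaction and must be measured for the real database;
    headroom_pct adds a safety margin on top.
    """
    scale = 100 + headroom_pct
    # -(-a // b) is integer ceiling division, so targets are rounded up.
    read_iops = -(-tps * reads_per_txn * scale // 100)
    write_iops = -(-tps * writes_per_txn * scale // 100)
    return read_iops, write_iops


# Hypothetical: 500 TPS, 4 reads and 2 writes per transaction, 30 % headroom.
print(required_iops(500, 4, 2))  # (2600, 1300)
```

Integer arithmetic is used deliberately so that the rounded-up targets are not affected by floating-point error.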

Backup/Restore Profile

The Backup/Restore Profile of an application and its components defines the requirements for backing up your application’s state, which enables recovery after an outage or a data corruption event. Some of the most important characteristics of a backup/restore profile are:

  • Recovery Point Objective (RPO):
    The RPO is the maximum acceptable amount of data loss, measured in time, when recovering from an outage. For example, if an outage occurs at time t1 and you need to restore from backup, the newest data available in the backup may be from t1 - 1 hour. This would mean you have lost 1 hour of data due to your backup concept.
  • Recovery Time Objective (RTO):
    With regard to your backup process, the RTO defines the time it takes to restore from backup. Note that only the backup/restore process is considered here. The overall RTO of an application after a disaster can be much higher, as other environment recovery processes may be involved. The overall RTO plays an important role in an Availability Profile (see the next section).
  • Backup Retention Policy:
    Based on your data or application requirements, your backup retention policy not only needs to meet your RPO; in certain cases you must also be able to recover parts of your data to an older state, from a backup that is older than the latest one available. Consider the following questions: How far back in time might your application need to recover data? Does the granularity of available backups need to stay the same the further back you go? Typically, you will need a staggered backup retention policy such as “Keep all backups for 7 days, keep weekly backups for 4 weeks, keep monthly backups for 3 months”.
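The staggered retention example above can be sketched in Python to show which backups such a policy keeps. This is an illustration under simplifying assumptions (one backup per day, months approximated as 30 days, the newest backup of each week and month is the one retained), not the algorithm of any particular backup product.

```python
from datetime import date, timedelta


def backups_to_keep(backups, today, daily_days=7, weekly_weeks=4, monthly_months=3):
    """Return the subset of backup dates that a staggered policy retains.

    - every backup younger than daily_days days
    - the newest backup of each ISO week within the last weekly_weeks weeks
    - the newest backup of each calendar month within roughly monthly_months
      months (a month is approximated as 30 days)
    """
    keep = set()
    newest_per_week = {}
    newest_per_month = {}
    for backup in sorted(backups):  # oldest first, so later dates overwrite
        age = (today - backup).days
        if age < daily_days:
            keep.add(backup)
        if age < weekly_weeks * 7:
            iso_year, iso_week = backup.isocalendar()[:2]
            newest_per_week[(iso_year, iso_week)] = backup
        if age < monthly_months * 30:
            newest_per_month[(backup.year, backup.month)] = backup
    keep.update(newest_per_week.values())
    keep.update(newest_per_month.values())
    return keep


# Hypothetical example: one backup per day for the last 100 days.
today = date(2024, 6, 30)
backups = [today - timedelta(days=i) for i in range(100)]
for backup_date in sorted(backups_to_keep(backups, today)):
    print(backup_date)
```

With these inputs the sketch keeps 12 of the 100 backups: the 7 most recent daily ones, the newest backup of each earlier ISO week in the 4-week window, and the newest backup of each month in the roughly 3-month window.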

 

Availability Profile

At its core, the availability profile of an application is simple and straightforward: it defines the percentage of time during which an application must be available. It is specified as a “number of nines” of uptime, as listed below:

  • “Five nines” equals 99.999% uptime or less than 6 minutes downtime a year.
  • “Four nines” equals 99.99% uptime or less than 1 hour downtime a year.
  • “Three nines” equals 99.9% uptime or less than 9 hours of downtime a year.
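The downtime figures above follow directly from the availability percentage. The sketch below ignores leap years and yields about 8.76 hours (three nines), 52.6 minutes (four nines) and 5.3 minutes (five nines) of maximum downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def max_downtime_minutes_per_year(availability_pct):
    """Maximum yearly downtime allowed by an availability target in percent."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for label, pct in [("Three nines", 99.9), ("Four nines", 99.99), ("Five nines", 99.999)]:
    minutes = max_downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.1f} min/year = {minutes / 60:.2f} h/year")
```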

To increase the availability of an application beyond “three nines”, or even “four nines”, significant effort is necessary at the application level, the infrastructure level, or both.

Increasing availability at the infrastructure level is typically achieved by implementing redundant hardware on several layers, such as redundant server hardware components (e.g., network cards), redundant servers, redundant storage hardware or redundant networking equipment.
At the application level, availability is increased by deploying application components redundantly, for example, by deploying multiple web servers that serve the same web content.
Depending on the design and complexity of an application, the options for increasing its availability may be limited.
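The effect of redundancy can be approximated with the standard formulas for serial and parallel components: the availabilities of components that are all required multiply, while n redundant instances fail only if all of them fail. This sketch assumes independent failures and instant failover, which real systems only approximate, and the component values are hypothetical.

```python
from math import prod


def serial(*availabilities):
    """Every component is required: availabilities multiply."""
    return prod(availabilities)


def redundant(availability, instances):
    """Identical instances in parallel: the service is down only if all are down."""
    return 1 - (1 - availability) ** instances


# Hypothetical component availabilities (fractions, not percent).
single_web = 0.995
web_tier = redundant(single_web, 2)    # two web servers serving the same content
app = serial(0.9999, web_tier, 0.999)  # load balancer -> web tier -> database
print(f"web tier: {web_tier:.6f}, whole application: {app:.6f}")
```

In this example, two 99.5 % web servers together reach about 99.9975 %, yet the whole application stays at roughly 99.89 % because the single 99.9 % database remains a required component; raising overall availability would require work on that layer as well.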

Compliance Profile

In the context of this discussion, a Compliance Profile defines aspects of regulatory requirements that an organization must adhere to and that may mandate, for example, where an organization must host the data of certain applications. Prominent examples of such requirements are the General Data Protection Regulation (GDPR) of the European Union and the Health Insurance Portability and Accountability Act (HIPAA) in the USA. Regulations like these define rules for several aspects of data storage, such as data security, data privacy, or data residency and transfer. Depending on how strict these regulations are, the options to migrate your applications and their data can be severely limited or, in rare cases, even impossible. For example, an application may be required to be hosted in the European Union due to regulatory requirements, while the corresponding cloud service of a certain provider is only available for hosting in the USA. In such a case you must retain the application in your on-premises datacenter.
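The data residency check in the example above can be expressed as a set intersection between the regions a compliance profile permits and the regions where the required cloud service is offered. The region names below are hypothetical placeholders rather than any provider's actual region identifiers.

```python
def placement(allowed_regions, offered_regions):
    """Return ("migrate", regions) or ("retain", []) for one application.

    allowed_regions: regions permitted by the application's compliance profile.
    offered_regions: regions where the required target cloud service exists.
    """
    candidates = sorted(set(allowed_regions) & set(offered_regions))
    return ("migrate", candidates) if candidates else ("retain", [])


# Hypothetical region names for an application restricted to the EU.
eu_only = {"eu-central", "eu-west"}
print(placement(eu_only, {"us-east", "us-west"}))  # ('retain', [])
print(placement(eu_only, {"eu-west", "us-east"}))  # ('migrate', ['eu-west'])
```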

The following table provides information about the different factors you must consider during this phase of deployment:

Design Consideration: Ensure you create or customize your application profiles to prepare for a migration to the cloud.

Design Justification: Accurate application profiles help you design your cloud environment along with peripheral operational services like backup or system monitoring. Where regulatory requirements apply, they also help you identify applications that must be retained on-premises and cannot be migrated to the cloud.

Design Implication: To prepare for a cloud migration, a higher level of formalization of your application profiles may be required than in your on-premises environment.

Design Consideration: Try to group your applications into larger buckets per application profile category. Do not create application profiles that are individual to each application.

Design Justification: Grouping application profiles into buckets helps you simplify and streamline the cloud migration. For example, you may have “Gold”, “Silver” and “Bronze” standards for your availability categories instead of defining an availability standard for each application in a highly individual way.

Design Implication: Some highly specialized applications may require individual application profiles. These applications may also be candidates for retaining on-premises.
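As an illustration of bucketing, the following Python sketch assigns each application to the least demanding standard availability tier that meets its requirement. The tier names follow the Gold/Silver/Bronze example, but the percentages and applications are hypothetical.

```python
# Hypothetical availability tiers, weakest first: (name, availability in percent).
TIERS = [("Bronze", 99.5), ("Silver", 99.9), ("Gold", 99.99)]


def assign_tier(required_pct):
    """Return the least demanding standard tier that meets the requirement.

    None means no standard tier fits: the application needs an individual
    profile and should be reviewed as a candidate for retaining on-premises.
    """
    for name, offered_pct in TIERS:
        if offered_pct >= required_pct:
            return name
    return None


for app, required in {"wiki": 99.0, "crm": 99.9, "shop": 99.95, "trading": 99.999}.items():
    print(app, "->", assign_tier(required))
```

An application whose requirement exceeds the strongest tier (here the hypothetical `trading` at 99.999 %) returns `None`, flagging it for an individual profile or a review as a retain candidate.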

Plan Infrastructure Technology Mappings

Delivering a business-critical application depends on more than just the application itself; meeting an application SLA also relies on the infrastructure services around it. As an outcome of the previous section, you should have application profiles that map the dependencies of each workload. In your on-premises environment, these dependencies are met through several infrastructure technologies such as tools, features, and third-party products. The challenge when migrating applications to a new target cloud is ensuring that these dependencies are met even though the same tools are not always available in each location. This section looks at planning how the technologies used at the source cloud map to what is available at your target cloud.

This can be broken down into three types of technology mappings to help with planning:

  • Lift and Shift – A lift and shift technology mapping is one where the identical tool or feature is available at both the source and target clouds. This is likely the most straightforward mapping, requiring the least work.
    • Firewall product – To meet the security requirements in your application profiles, you may use the NSX-T distributed firewall on-premises. All VMware Cloud providers include NSX-T firewalling as part of the offering. In this case, the technology mapping is straightforward, using the same product and features. Plan to migrate your firewall rules and ensure the NSX-T licensing at the target is equivalent so that the necessary features are enabled.
  • Modernize – Some technologies used on-premises may be available in your target cloud but are consumed differently. This could result from a constraint in how the provider has implemented the technology or from a strategic decision to shift the operating model. For example, the provider may choose to support only the SaaS-based version in the cloud.
    • VMware Aria Operations for Logs – On-premises customers of Aria Operations for Logs have two deployment models: installed locally or consumed as a SaaS service. When migrating workloads to a VMware Cloud offering, customers must use the SaaS-based service for Aria Operations for Logs. This shift could impact the operational model for logging and require additional effort to transition to the SaaS service.
  • Drop and Shop – In some cases, a technology used at the source cloud is simply not available at the destination. In these scenarios a direct replacement product is needed, or you may choose to meet the application requirement through a different method.
    • Direct replacement – Meet application profile dependencies by finding a replacement product from a different vendor, such as a new backup product that supports your workloads and destination cloud.
    • Alternative solution – The SLAs in an application profile may be achievable through a different technology. Taking stretched clusters as an example, some VMware Cloud providers do not offer this feature. To meet the same SLA without stretched clusters, consider alternative technologies. In this example, workloads deployed in two regions behind a load balancer may be a suitable replacement to achieve the same SLA.
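A technology mapping inventory can be kept as simple data and classified mechanically. The sketch below assumes a hypothetical catalog describing, for one target cloud, which deployment models it supports per technology; the entries mirror the examples above, with stretched clusters absent to represent a provider that does not offer them.

```python
from enum import Enum


class Mapping(Enum):
    LIFT_AND_SHIFT = "Lift and Shift"
    MODERNIZE = "Modernize"
    DROP_AND_SHOP = "Drop and Shop"


def classify_mapping(technology, source_model, target_catalog):
    """Classify one source technology against a target cloud catalog.

    target_catalog maps a technology name to the set of deployment models
    the target cloud supports for it (hypothetical structure).
    """
    offered_models = target_catalog.get(technology, set())
    if not offered_models:
        return Mapping.DROP_AND_SHOP   # needs a replacement or an alternative solution
    if source_model in offered_models:
        return Mapping.LIFT_AND_SHIFT  # same tool, consumed the same way
    return Mapping.MODERNIZE           # same tool, consumed differently


# Hypothetical catalog for one target cloud.
target_catalog = {
    "NSX-T distributed firewall": {"integrated"},
    "Aria Operations for Logs": {"saas"},
}
source_inventory = [
    ("NSX-T distributed firewall", "integrated"),
    ("Aria Operations for Logs", "self-managed"),
    ("Stretched clusters", "integrated"),
]
for tech, model in source_inventory:
    print(f"{tech}: {classify_mapping(tech, model, target_catalog).value}")
```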

The following table provides information about the different factors you must consider during this phase of deployment:

Design Consideration: Consider grouping your technology mappings into three categories:

  • Lift and Shift
  • Modernize
  • Drop and Shop

Design Justification: Infrastructure technologies are not always consistent between on-premises environments and cloud providers. Planning your source and destination technology mappings helps ensure that your application SLAs are met and that cloud migrations are completed on schedule.

Design Implication: Without a technology mapping, workloads may be migrated to the cloud without coverage for critical services. This could cause service disruptions or delay migration waves while replacement technologies are identified, tested and deployed.

Considerations for Creating a Migration Wave Plan

Definition and Importance of Migration Wave Planning

Wave planning is the process of assigning workloads to migration groups, or “waves”, each of which is migrated as a single planned event. Wave planning enables a staggered migration approach, in which you migrate workloads and applications in multiple, sequential waves. The alternative would be to migrate all workloads at once, which has severe drawbacks:

  • Migrating all your workloads at once would expose you to concentrated risk. If anything goes wrong in the migration process, or if your infrastructure fails, all workloads would be impacted at once.
  • Network bandwidth constraints often require spreading the migration of workloads across multiple waves. If all workloads were migrated at once, the network bandwidth available to each workload would be reduced, leading to longer overall migration times and a more error-prone migration process.
  • Organizational constraints often require a staggered migration approach. In large organizations, multiple business units and teams each have their own schedules, business constraints and goals, which makes it nearly impossible to get all stakeholders to agree on a single migration window.
  • For many organizations, a cloud migration may still be a substantial change to their applications and IT processes. These organizations may want to take a “proof-of-concept”-like migration approach, beginning with less critical applications and workloads, typically test or development environments, and then gradually moving to the more business-critical applications. An “all at once” migration approach would contradict these principles.

Ways of Grouping Workloads into Migration Waves

With the importance of migration wave planning established, the question is how to group workloads into migration waves. Below is a list of guiding principles to help find suitable migration wave assignments:

  • Plan migration waves around applications whenever possible. Attempt to migrate all workloads of a given application in as few waves as possible. This helps minimize intra-application traffic across WAN networks or between source environment and the target cloud platform.
  • Various network tools on the market can help analyze application dependencies.
  • Plan to migrate interdependent applications in close sequence. This may help to reduce unnecessary north-south traffic.
  • Plan to isolate larger or complex workloads in dedicated waves. These are typically databases or VMs with a high rate of change (many writes). Such workloads tend to generate large amounts of delta replication data and can negatively impact other migrations.
  • Choose the migration option that best fits the needs of the individual workload. If a workload can afford to be powered off, use a cold migration. Most workloads are well suited to bulk migration, as it is rare that a workload cannot tolerate the short cutover time imposed by a bulk migration. Additionally, bulk migration provides the opportunity to perform certain updates, such as VM hardware version and VMware Tools upgrades.
  • Manage traffic flows for migrations carefully. Be conscious of the impact that a given migration wave will have on overall network utilization. Pay attention to potentially sub-optimal network flows introduced by layer-2 network extensions between your on-premises environment and the target cloud.
  • Take organizational constraints and meta-data into account. Migration waves may not only be dictated by technical aspects. It is a good idea to have organizational meta-data assigned to your workloads such as application ownership or business unit assignment.
  • Identify potential other criteria for creating migration waves. For example, if you plan to deprecate networks in your source environment, it may be useful to group migration waves based on IP subnets. Other criteria may involve storage characteristics. For example, you may choose to migrate workloads with low storage capacity first. Ultimately finding an appropriate migration wave grouping will depend on your concrete business scenario or goals.
  • Scheduling migration waves helps to ensure you get the approvals of application owners and other required stakeholders in a timely fashion. Always provide maximum transparency to these stakeholders throughout the entire migration wave planning process.
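Several of these principles can be combined into a first-pass grouping algorithm. The Python sketch below keeps interdependent applications in the same wave (using union-find over dependency pairs), isolates groups that contain high-change workloads in dedicated waves, and caps the remaining waves by total data size using first-fit-decreasing packing as a stand-in for bandwidth constraints. The size cap, inventory and names are hypothetical, and wave order, criticality and organizational constraints are not modeled; treat the output as a starting point for review with stakeholders.

```python
def group_interdependent(apps, dependencies):
    """Union-find: put applications that depend on each other into one group."""
    parent = {app: app for app in apps}

    def find(app):
        while parent[app] != app:
            parent[app] = parent[parent[app]]  # path halving
            app = parent[app]
        return app

    for a, b in dependencies:
        parent[find(a)] = find(b)
    groups = {}
    for app in apps:
        groups.setdefault(find(app), []).append(app)
    return list(groups.values())


def plan_waves(app_size_gb, dependencies, high_change_apps, max_wave_gb):
    """Return a list of waves (lists of application names).

    - interdependent applications always share a wave
    - a group containing a high-change application gets a dedicated wave
    - other groups are packed first-fit-decreasing up to max_wave_gb;
      a single group larger than the cap still gets a wave of its own
    """
    dedicated, packable = [], []
    for group in group_interdependent(list(app_size_gb), dependencies):
        if any(app in high_change_apps for app in group):
            dedicated.append(sorted(group))
        else:
            packable.append((sum(app_size_gb[app] for app in group), sorted(group)))

    waves = []  # each entry: [used_gb, [apps]]
    for size, group in sorted(packable, key=lambda g: (-g[0], g[1])):
        for wave in waves:
            if wave[0] + size <= max_wave_gb:
                wave[0] += size
                wave[1].extend(group)
                break
        else:
            waves.append([size, list(group)])
    return [sorted(apps) for _, apps in waves] + dedicated


# Hypothetical inventory: application -> total disk size in GB.
sizes = {"shop": 300, "sso": 100, "crm": 400, "wiki": 50, "erp-db": 2000, "erp-app": 200}
deps = [("shop", "sso"), ("erp-app", "erp-db")]
for number, wave in enumerate(plan_waves(sizes, deps, {"erp-db"}, max_wave_gb=500), 1):
    print(f"Wave {number}: {wave}")
```

Because `erp-app` depends on the high-change `erp-db`, both land together in the dedicated wave; keeping dependency groups intact takes precedence over isolating only the database.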

Estimating Migration Times

There are many variables to consider when estimating total time required to perform a migration. Some factors which may affect migration times include:

  • WAN capacity (in terms of speed/throughput) between the source site and the target site.
  • Network throughput in the source site.
  • Storage IOPS at the source and destination sites.
  • Load (CPU/memory/storage) of individual hosts at the source site.
  • Rate of change for workloads which are being migrated (i.e., how much delta data sync must regularly take place in the case of hot migrations).
  • Load at target site during the cutover phase of a migration (large migration waves will cause more resource contention).
  • Time required to perform user validation on migrated workloads.

To make reasonable estimates, it is important to understand which variable represents the largest constraint. For example, if the migration is driven by a data center evacuation where the source site runs on older hardware, storage or host constraints may easily become the limiting factor.
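A back-of-the-envelope estimate can start from the tightest constraint. The sketch below assumes every limiting factor (WAN capacity, source network throughput, storage IOPS multiplied by I/O size, and so on) has already been converted into a sustainable throughput in Gbit/s; the efficiency factor, delta fraction and validation time are hypothetical inputs that a migration proof-of-concept should calibrate.

```python
def estimate_migration_hours(data_tb, limits_gbps, efficiency=0.8,
                             delta_fraction=0.0, validation_hours=0.0):
    """Rough wall-clock estimate for one migration wave.

    limits_gbps: each constraint expressed as sustainable throughput in Gbit/s.
    efficiency: assumed share of the bottleneck throughput actually achieved.
    delta_fraction: extra data re-sent for changes during hot migrations.
    Returns (bottleneck name, estimated hours).
    """
    bottleneck = min(limits_gbps, key=limits_gbps.get)
    effective_bits_per_s = limits_gbps[bottleneck] * 1e9 * efficiency
    bits_to_move = data_tb * 1e12 * 8 * (1 + delta_fraction)  # decimal TB
    transfer_hours = bits_to_move / effective_bits_per_s / 3600
    return bottleneck, transfer_hours + validation_hours


# Hypothetical wave: 10 TB, with all limits converted to Gbit/s beforehand.
limits = {"WAN link": 1.0, "source storage": 2.5, "source network": 10.0}
name, hours = estimate_migration_hours(10, limits, validation_hours=4)
print(f"bottleneck: {name}, estimate: {hours:.1f} h")
```

For 10 TB over an effective 0.8 Gbit/s WAN link, the transfer alone takes about 27.8 hours, so with 4 hours of validation the estimate is about 31.8 hours.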

In addition to assessing the “theoretical” variables that influence migration times, it is strongly recommended to pursue a pragmatic, proof-of-concept-like approach in which you measure migration times with actual test scenarios. A migration proof-of-concept shows how accurately you have assessed the migration variables and their influence on migration times. It also enables you to detect additional variables specific to your environment that you may not have originally considered.

Understanding the complete end-to-end picture of the migration is vital when creating a time estimate.

