VMware Cloud on AWS: vSAN Architecture
Overview
The SDDC uses vSAN to provide distributed storage within each Cluster of the SDDC. While the underlying storage for a vSAN Datastore depends on the instance type of the hosts within the Cluster, the overall behavior of vSAN is consistent throughout the SDDC. The vSAN architecture is described below.
Storage Types
Native NVMe-Based Storage
This type of storage is housed directly within the hosts of the SDDC and has a fixed size that depends on the host instance type. Each host added to a Cluster increases the overall storage pool. NVMe storage is the most performant of the available storage types.
NFS
NFS storage is available today through three options.
VMware Cloud Flex Storage
VMware Cloud Flex Storage offers a scalable, elastic, and natively integrated storage and data management service that is fully managed by VMware.
Amazon FSx for NetApp ONTAP and VMware Cloud on AWS
Amazon FSx for NetApp ONTAP and VMware Cloud on AWS provide a robust and simple way to map production workloads onto storage that meets their availability and performance requirements in a cost-optimized manner.
Third-party managed service providers
The vCenter Base Cluster
Within the base Cluster of the SDDC, vSAN has been modified to present two logical Datastores from the same underlying physical storage: one for management appliances and the other for end-user workloads. It is important to point out that this separation exists purely as a means of enforcing permissions on the storage for management appliances.
Key Takeaways
- The logical Datastores reflect the same underlying pool of capacity. Do not mistake them for independent sets of storage.
- The free space, used space, and total capacity numbers will be identical between the two logical Datastores.
- You must consider management appliance storage footprints within the base Cluster when sizing an SDDC.
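Because both logical Datastores are views of one pool, capacity planning should treat them as a single budget. The following minimal sketch shows the accounting; all numbers and variable names are hypothetical.

```python
# Minimal sizing sketch: both logical Datastores share one vSAN capacity pool,
# so workload sizing must subtract the management appliance footprint.
# All values are hypothetical and in TiB.

raw_pool_tib = 100.0          # total vSAN capacity of the base Cluster
mgmt_footprint_tib = 3.5      # space consumed by management appliances (assumed)
planned_workload_tib = 80.0   # space the workloads are expected to consume

# Both Datastores report the same totals because they are views of one pool.
used_tib = mgmt_footprint_tib + planned_workload_tib
free_tib = raw_pool_tib - used_tib

print(f"used={used_tib} TiB, free={free_tib} TiB on BOTH logical Datastores")
```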
vSAN Slack Space Requirements
vSAN requires that a certain percentage of raw storage within a Datastore be reserved as “slack” space. This reserved space is used for operations such as deduplication, object re-balancing, and for recovering from hardware outages within the underlying pool of storage capacity. The current official recommendation is to maintain a 30% buffer for slack space.
As part of its service level agreement with customers, VMware ensures the health of the SDDC by enforcing this slack space requirement through storage-based Elastic DRS (EDRS) scale-up whenever necessary. A notification is sent whenever free space drops below the 30% threshold, and EDRS automatically scales up the SDDC if available slack space drops to 25% or less.
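These thresholds lend themselves to a simple check. The sketch below encodes the 30% recommended buffer and the 25% EDRS scale-up trigger described above; the function names are illustrative and are not part of any VMware API.

```python
# Sketch of the slack-space thresholds described above (30% recommended
# buffer, EDRS scale-up at 25% or less). Names are illustrative only.

def slack_pct(total_tib: float, used_tib: float) -> float:
    """Percentage of raw vSAN capacity still free (slack)."""
    return (total_tib - used_tib) / total_tib * 100

def evaluate(total_tib: float, used_tib: float) -> str:
    slack = slack_pct(total_tib, used_tib)
    if slack <= 25:
        return f"slack {slack:.0f}%: EDRS scale-up adds a host"
    if slack < 30:
        return f"slack {slack:.0f}%: notification sent, restore the 30% buffer"
    return f"slack {slack:.0f}%: healthy"

print(evaluate(100.0, 72.0))  # slack 28%: notification sent
print(evaluate(100.0, 76.0))  # slack 24%: EDRS scale-up
```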
Deduplication and/or Compression
vSAN Compression is enabled by default on i3en-powered vSAN Datastores. Introduced in VMware Cloud on AWS 1.12, this option provides exceptional storage performance while maintaining capacity efficiency. For more information, see this blog.
Deduplication and Compression are designed to optimize storage by reducing the amount of space required to store data within the Datastore. These features are enabled on i3 vSAN Datastores by default and cannot be disabled.
Storage Capacity Reclaim using TRIM/UNMAP
The TRIM/UNMAP commands for the ATA and SCSI protocols, respectively, provide a means for guest operating systems to reclaim unused storage as free space. You can have this feature enabled on M12 and newer SDDCs by requesting enablement of the TRIM/UNMAP feature flag through chat support.
For more information, see this feature brief.
Encryption at Rest and in Transit
Encryption at rest
vSAN implements encryption at rest using the AWS Key Management Service (KMS). Although much of the functionality behind vSAN encryption is automated, it is worth understanding the key components and which of them may be customized.
The components of encryption are as follows:
- Customer Master Key (CMK) - This is the master key used to encrypt all other keys used by vSAN. It is controlled and managed by AWS and may not be updated. One CMK is required per Cluster.
- Key Encryption Key (KEK) - This key is used by vSAN to encrypt DEKs and may be updated through the vCenter UI (referred to as a shallow rekey). One KEK is required per Cluster.
- Disk Encryption Key (DEK) - This key is used to encrypt disk data and may not be updated. One DEK is required per disk in vSAN.
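To make the relationship between these keys concrete, the following sketch models the envelope-encryption pattern they follow, with Python's `cryptography` package standing in for AWS KMS and vSAN. This is purely conceptual and is not vSAN's actual implementation.

```python
# Conceptual illustration of the three-tier key hierarchy (CMK -> KEK -> DEK)
# using envelope encryption. The `cryptography` package stands in for AWS KMS
# and vSAN here; this is not how vSAN is actually implemented.
from cryptography.fernet import Fernet

cmk = Fernet(Fernet.generate_key())        # master key, held and managed by AWS
kek = Fernet.generate_key()                # key encryption key (one per Cluster)
wrapped_kek = cmk.encrypt(kek)             # KEK is stored only in wrapped form

dek = Fernet.generate_key()                # disk encryption key (one per disk)
wrapped_dek = Fernet(kek).encrypt(dek)     # DEK wrapped by the KEK

ciphertext = Fernet(dek).encrypt(b"disk data")   # data encrypted with the DEK

# A "shallow rekey" replaces the KEK and rewraps the DEKs; the DEKs themselves
# (and therefore the encrypted data on disk) do not change.
new_kek = Fernet.generate_key()
rewrapped_dek = Fernet(new_kek).encrypt(Fernet(kek).decrypt(wrapped_dek))
```

Note how a shallow rekey replaces only the KEK wrapper: the DEKs, and therefore the data already encrypted on disk, never change.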
Encryption in transit
With i3en instances, we added in-transit hardware-level encryption between instances within the SDDC boundaries.
Storage Policy Based Management
Storage Policy Based Management (SPBM) is a declarative system for managing storage within the SDDC. SPBM allows the definition of policies that specify data protection and performance criteria, which may then be applied granularly to objects within vCenter. Multiple policies may coexist, and they may be applied at both the VM and VMDK level. Once defined, these policies may be extended across clusters. The policy configuration and assignment for all management appliances is controlled by VMware and may not be modified. The default policy configuration for workload Datastores is also maintained by the service, but customers may create and assign custom policies to override this default behavior.
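The declarative nature of SPBM can be pictured as policies being plain data attached to objects. Below is a minimal sketch with hypothetical types and names; it is not the vSphere API.

```python
# Sketch of SPBM's declarative model: a policy is a named set of protection
# and performance criteria, assigned per VM or per VMDK. All types and names
# here are illustrative, not the vSphere API.
from dataclasses import dataclass

@dataclass(frozen=True)
class StoragePolicy:
    name: str
    failures_to_tolerate: int     # FTT
    fault_tolerance_method: str   # "RAID1", "RAID5", or "RAID6"

default_policy = StoragePolicy("vSAN Default", failures_to_tolerate=1,
                               fault_tolerance_method="RAID1")
db_policy = StoragePolicy("DB-Gold", failures_to_tolerate=2,
                          fault_tolerance_method="RAID6")

# Policies may be mixed within one VM: the OS disk and the data disk of the
# same VM can carry different policies.
vm_assignment = {"app01.vmdk": default_policy, "app01-data.vmdk": db_policy}
```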
The first consideration when designing a policy is availability, which defines the level of redundancy desired for objects within the Datastore. The second is performance, which defines both the processing overhead required to implement the policy and the overall end-user impact in terms of IOPS for a given workload. Both considerations are discussed below.
Availability
Availability is broken down into two levels: the ability to survive a site-level disaster and the ability to survive failures internal to a given site. Within the context of VMware Cloud on AWS, a site falls within the boundary of an AWS Availability Zone (AZ). The ability of a VM or VMDK to survive an AZ-level outage applies only to stretched cluster designs. Specifically, the “dual site mirror” option for stretched clusters causes data to be replicated across AZs and enables an object to survive a complete failure of either AZ. See the document on stretched cluster SDDCs for more information.
Within an AZ, there are two settings to consider for data resiliency.
- Failures to Tolerate (FTT) - This defines the number of host or disk failures to tolerate; in other words, how many devices can fail without data loss.
- Fault Tolerance Method (FTM) - This defines the type of data replication used: mirroring (RAID1) or erasure coding (RAID5/RAID6).
The FTT/FTM options available depend on the number of hosts within a cluster, and the option chosen will impact the total usable capacity of the Datastore.
There are six FTT/FTM options available to policies. Each option protects against a certain number of host failures, but also requires a minimum number of hosts to implement. Additionally, each option will incur a certain amount of storage overhead. This storage overhead comes in the form of additional data copies and witnesses (RAID1), or parity data (RAID5/RAID6).
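The commonly cited requirements for these options can be captured in a small calculator. Treat the minimum-host counts and capacity multipliers below as illustrative values to be confirmed against current VMware documentation.

```python
# Commonly cited vSAN requirements for the six FTT/FTM combinations (treat
# these values as illustrative; confirm against current VMware documentation).
# Each entry: (FTT, FTM) -> (minimum hosts, raw-capacity multiplier)
OPTIONS = {
    (0, "NONE"):  (1, 1.0),    # no data redundancy (single-host clusters)
    (1, "RAID1"): (3, 2.0),    # two mirror copies plus a witness
    (1, "RAID5"): (4, 1.33),   # 3 data segments + 1 parity segment
    (2, "RAID1"): (5, 3.0),    # three mirror copies
    (2, "RAID6"): (6, 1.5),    # 4 data segments + 2 parity segments
    (3, "RAID1"): (7, 4.0),    # four mirror copies
}

def raw_needed(usable_tib: float, ftt: int, ftm: str, hosts: int) -> float:
    """Raw capacity required to store `usable_tib` under a given policy."""
    min_hosts, multiplier = OPTIONS[(ftt, ftm)]
    if hosts < min_hosts:
        raise ValueError(f"FTT={ftt}/{ftm} needs at least {min_hosts} hosts")
    return usable_tib * multiplier

print(raw_needed(10.0, 2, "RAID6", hosts=6))  # 15.0 TiB raw for 10 TiB usable
```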
Note that illustrations of object placement are conceptual only. In reality, vSAN will attempt to balance objects evenly across the cluster.
Performance
Data stored using an erasure coding FTM consists of the actual data, broken into segments, along with parity data. While erasure coding provides better efficiency in terms of storage overhead, segmenting data and adding parity comes at a cost to performance. This penalty is especially evident during failure scenarios, when data must be recalculated from parity.
In general, when performance is a concern, the order from most to least performant is: RAID1, RAID5, RAID6.
Default Policy
vSAN requires a default policy to exist; this policy is applied to VMs that do not have an explicit policy set. A default policy for the SDDC is created automatically by VMware at the time of provisioning and is based on the number of hosts within the vCenter base Cluster.
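A rough sketch of how such a host-count-based default could be derived is shown below. The exact mapping is maintained by VMware; the values here are assumptions for illustration only.

```python
# Sketch of deriving a default policy from host count, following the general
# rule that larger clusters default to a higher FTT. The exact mapping is
# maintained by VMware and may differ from these assumed values.

def default_policy(hosts: int) -> tuple[int, str]:
    """Return an assumed (FTT, FTM) default for a cluster of `hosts` hosts."""
    if hosts == 1:
        return (0, "NONE")     # single-host SDDC: no redundancy possible
    if hosts <= 5:
        return (1, "RAID1")    # small clusters: mirroring (assumed)
    return (2, "RAID6")        # 6+ hosts: erasure-coded FTT=2 (assumed)

print(default_policy(3))   # (1, 'RAID1')
print(default_policy(8))   # (2, 'RAID6')
```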