Endpoint and Workload Security
VMware Cloud provides numerous ways in which workloads can be made resilient to security and other types of incidents.
Consider the following ideas:
- Ensure all rules allowing inbound access are restricted to the most specific set of source IP addresses and services required and avoid using “ANY” as the source or destination IP or services.
- Provide more general outbound rules at the perimeter but enforce specific outbound rules at the DFW.
- Use groups with dynamic membership and/or tag-based membership to simplify management.
- Include top-line rules to drop any traffic that should NEVER be allowed (for example traffic from a public IP source).
- Logging should be enabled on rules necessary to track access, or attempted access. By default, logging is not enabled.
- Always limit the Applied To field to the specific uplink the traffic is expected on, and avoid using “All Uplinks”.
- Block traffic closest to the source (e.g. outbound traffic with the DFW, on-premises outbound traffic at the on-premises FW)
- When using NAT, ensure that only the ports required for the NAT are included in the NAT rule, and ensure the NAT matching criteria is set correctly for the use case. If the NAT matching criteria is set to private IP, then it will not be possible to differentiate between traffic that has been NATted and traffic that originated internally.
- If a NAT rule is configured for ANY services, then that NAT rule will also be used for outbound (SNAT) Internet traffic from the private IP specified.
- NAT will not match unless the traffic is routed out through the SDDC’s native Internet connection (e.g. NAT cannot be used when a default route is advertised to the SDDC from one of the uplinks).
- Traffic between Management VMs and customer VMs in the same SDDC does not require Compute Gateway rules, but must still be allowed by the Management Gateway firewall.
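Several of the ideas above can be checked mechanically once firewall rules have been exported. The following is a minimal Python sketch, not a definitive implementation. It assumes the rules are available as simple dictionaries. The field names (`name`, `sources`, `destinations`, `services`, `logged`, `applied_to`), the sample values, and the `audit_rule` helper are hypothetical and do not reflect the actual NSX API schema. A missing field is treated as the permissive case so that it is flagged for review.

```python
from typing import Dict, List

ANY = "ANY"
ALL_UPLINKS = "All Uplinks"


def audit_rule(rule: Dict) -> List[str]:
    """Return review findings for one exported rule (hypothetical schema)."""
    findings = []
    # Flag any use of ANY in the match criteria.
    for field_name in ("sources", "destinations", "services"):
        if ANY in rule.get(field_name, [ANY]):
            findings.append(f"{field_name} uses ANY")
    # Logging is off by default, so treat a missing flag as disabled.
    if not rule.get("logged", False):
        findings.append("logging disabled")
    # Rules should be scoped to the uplink the traffic is expected on.
    if ALL_UPLINKS in rule.get("applied_to", [ALL_UPLINKS]):
        findings.append("applied to All Uplinks")
    return findings


# Illustrative sample data only.
rules = [
    {"name": "web-in", "sources": ["203.0.113.0/24"], "destinations": ["10.0.1.10"],
     "services": ["HTTPS"], "logged": True, "applied_to": ["Internet Interface"]},
    {"name": "legacy", "sources": ["ANY"], "destinations": ["10.0.2.0/24"],
     "services": ["ANY"], "logged": False, "applied_to": ["All Uplinks"]},
]

for rule in rules:
    for finding in audit_rule(rule):
        print(f"{rule['name']}: {finding}")
```

A finding is a prompt for review rather than proof of a misconfiguration. For example, a more general outbound rule at the perimeter may be intentional when specific outbound rules are enforced at the DFW.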
The NSX Distributed Firewall is included with every VMware Cloud on AWS Software Defined Datacenter (SDDC). This firewall provides microsegmentation capabilities by inspecting and controlling traffic at the VM network interface. Unlike a traditional firewall, this allows control of network traffic between workloads on the same network segment, as well as from other sources.
The Distributed Firewall can be configured using a variety of rule types, from traditional rules to dynamic groups that allow policies to be applied based on tags, VM names, or other workload properties. Rules can also be applied to specific objects allowing for scoped policies, including default rules that only apply to specific VMs. Limiting traffic between VMs makes it much harder for attackers to move laterally, and the flexibility of rule definitions means that rules can be very specific but also easily updated when environments change.
Ideas to consider:
- The Distributed Firewall is IP-based, so dynamic objects are translated into IP addresses, using the IP addresses detected by VMware Tools or through traffic snooping. Dynamic membership cannot be “Applied To” IP addresses.
- Much as with traditional firewalls, the complexity and scope of rules impact performance as each packet is evaluated against each rule, though the Distributed Firewall can take advantage of additional CPU as clusters scale out. It is recommended that rules be limited in scope, such as to a particular network segment, and global rules be considered carefully before implementing.
- Where possible, use groups to define rules so that changes are easier and updates are less susceptible to human error. Create nested groups to aggregate similar rules. For example, rather than having one group for all cloud administrators, consider creating a group for each cloud administrator, then aggregating those groups into a “supergroup.” If an administrator leaves your organization, it is easier to find and remove their access group.
- Consider defining and documenting a naming strategy for groups so that similar items are grouped together, and data is sortable and easily filtered.
- Consider defining and documenting a firewalling strategy, such as default allow, default deny, or a documented combination using “Applied To.” Do the same for rules, such as per-application or per-service, so that there is consistency.
- Enable and implement Distributed Firewalling, so that workloads must have specific distributed firewall rules at all times. Limit generic rules implemented at the gateways.
- Restrict outbound traffic using DFW to meet the standard approach of blocking traffic closest to the source.
- Apply IDS/IPS to outbound connections to look for known Command & Control access and common attack signatures.
- Consider using a proxy server for filtering & inspecting web traffic if virtual desktops are being hosted in the SDDC. Using this model, Internet access would be blocked for endpoints, and only the proxy would be permitted to access the Internet, providing additional URL filtering based on real-time updated lists and/or other identifying capabilities such as geo-location, site categorization, etc.
- NSX L7 firewall from the Advanced Security add-on can be used to ensure SSL/TLS connections are using encryption methods that meet minimum standards to avoid known attack vectors.
- Restrict DNS traffic destined for Internet-based DNS servers and require all workloads to use internal DNS servers that are managed and patched. Log all queries and block or check for requests of known C&C or malicious domains using lists that are updated frequently.
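The group-related ideas above (tag-based dynamic membership, nested groups, and translation of members into IP addresses) can be illustrated with a small model. This is a conceptual Python sketch only. The `VM` and `Group` classes, the tag strings, and `resolve_ips` are hypothetical names invented for illustration and are not the NSX data model or API. It assumes a VM joins a group when it carries all of the group's tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class VM:
    name: str
    tags: Set[str]
    ips: List[str]  # addresses as detected by VMware Tools or snooping


@dataclass
class Group:
    name: str
    match_tags: Set[str] = field(default_factory=set)  # dynamic, tag-based criteria
    children: List[str] = field(default_factory=list)  # names of nested groups


def resolve_ips(group_name: str, groups: Dict[str, Group], vms: List[VM],
                _seen: Optional[Set[str]] = None) -> Set[str]:
    """Expand a group and its nested groups into the IP set a rule would match."""
    seen = set() if _seen is None else _seen
    if group_name in seen:  # guard against circular nesting
        return set()
    seen.add(group_name)
    group = groups[group_name]
    ips: Set[str] = set()
    for vm in vms:
        # A VM is a member when it carries every tag the group requires.
        if group.match_tags and group.match_tags <= vm.tags:
            ips.update(vm.ips)
    for child in group.children:
        ips |= resolve_ips(child, groups, vms, seen)
    return ips


vms = [
    VM("web-01", {"app:shop", "tier:web"}, ["10.0.1.10"]),
    VM("db-01", {"app:shop", "tier:db"}, ["10.0.2.10"]),
    VM("web-02", {"app:blog", "tier:web"}, ["10.0.1.11"]),
    VM("web-03", {"app:shop", "tier:web"}, []),  # no IP detected yet
]
groups = {
    "shop-web": Group("shop-web", match_tags={"app:shop", "tier:web"}),
    "shop-db": Group("shop-db", match_tags={"app:shop", "tier:db"}),
    "shop-all": Group("shop-all", children=["shop-web", "shop-db"]),
}

print(sorted(resolve_ips("shop-all", groups, vms)))
```

In this model, `web-03` is a member of `shop-web` but contributes no addresses, which mirrors why IP detection matters for an IP-based firewall.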
VMware Tools are an important component for virtual machines, supplying drivers for paravirtual devices like the vmxnet3 network interface and the pvscsi virtual SCSI controller, as well as a communications channel between ESXi and the guest operating system. That communications channel is important, as it can ensure that guest operating systems and workload applications shut down gracefully when needed. It will also help the infrastructure detect when virtual machines have booted correctly, as part of vSphere HA actions should a cloud host fail.
VMware Tools is a software package that, like all other software packages, requires updates and maintenance. Include it in your configuration management systems or enable “Check and upgrade VMware Tools before each power on” in the VM settings to have it updated automatically on Microsoft Windows guests.
The Windows drivers for virtual machine hardware have been added to the Windows Update repositories, so Microsoft Windows operating systems will automatically download and install the latest versions if automatic driver updates are enabled. If your organization manages Microsoft Windows updates using Windows Server Update Services (WSUS), ensure that these driver updates are included in what is presented to client systems.
Linux vmxnet3 and pvscsi drivers are incorporated into the upstream Linux kernel sources. The other components that manage the hypervisor-to-guest communications are part of the open-vm-tools package supplied by VMware to Linux distribution maintainers. The open-vm-tools package is updated when you patch and update your Linux guest operating system.
VMware Tools also allows an SDDC to automatically detect the IP address of a workload, for use in dynamic firewall rules.
Security controls inside workloads are the responsibility of the customer in the Shared Responsibility Model. As discussed earlier in this document, we often suggest that organizations explore using configuration management tools like SaltStack to apply and audit configuration settings on workloads. This saves time, ensures consistent security settings, and simplifies template management.
Ideas to consider:
Monitoring and configuration management systems are two examples of systems that have privileged access to an organization’s workloads. High-profile attacks have demonstrated that these types of systems are targets for attackers to breach, allowing them to move laterally throughout the organization with ease. Are there sufficient controls protecting other guests from monitoring breaches? Does your organization use change control and source code control techniques to manage and test changes to configurations? How will you know if an attacker has gained access to that system?
Workloads deployed in a Kubernetes environment require some additional considerations to prevent containers from gaining too many permissions on the node’s operating system. Without protections in place, containers may be able to access host resources such as processes, volumes, or network access. To account for this, consider using Kubernetes admission controllers to prevent unwanted access to the host operating system.
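As an illustration of the kind of check an admission policy might enforce, the following minimal Python sketch inspects a pod specification (as a dictionary) for settings that expose host resources. The Kubernetes field names used (`hostNetwork`, `hostPID`, `hostIPC`, `hostPath`, `securityContext.privileged`) are standard pod spec fields. The `host_access_violations` function and the sample pod are hypothetical, and a real deployment would wire such logic into a validating admission webhook or use a built-in policy mechanism instead.

```python
from typing import Dict, List


def host_access_violations(pod_spec: Dict) -> List[str]:
    """List pod spec settings that give containers access to host resources."""
    violations = []
    # Sharing host namespaces exposes host networking and processes.
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag, False):
            violations.append(f"{flag} is enabled")
    # hostPath volumes mount directories from the node's filesystem.
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            violations.append(f"volume '{volume.get('name')}' uses hostPath")
    # Privileged containers bypass most container isolation.
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        security = container.get("securityContext") or {}
        if security.get("privileged", False):
            violations.append(f"container '{container.get('name')}' is privileged")
    return violations


# Illustrative sample data only.
sample_pod_spec = {
    "hostPID": True,
    "volumes": [{"name": "node-root", "hostPath": {"path": "/"}}],
    "containers": [
        {"name": "app", "image": "example/app:1.0"},
        {"name": "agent", "image": "example/agent:1.0",
         "securityContext": {"privileged": True}},
    ],
}

for violation in host_access_violations(sample_pod_spec):
    print(violation)
```

An admission decision would then reject the pod whenever the returned list is not empty, or allow documented exceptions for specific namespaces.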
In-Guest Data-at-Rest Protections
VMware Cloud uses vSAN Data-at-Rest encryption to store data in a public cloud provider’s storage. It is possible to use in-guest encryption technologies like Microsoft BitLocker and Linux dm-crypt to protect workloads. This has performance impacts, given the double encryption (BitLocker plus vSAN Encryption), and defeats space efficiency processes like deduplication and compression (your virtual machine will consume its entire allocated disk space). In general, VMware does not suggest using in-guest encryption, but for some very sensitive workloads like Microsoft Active Directory it may be a suitable additional layer of defense. Use it sparingly due to the performance impacts and management overhead.
Third-party integrations, such as Amazon FSx for NetApp ONTAP, may have different encryption and performance considerations depending on the features and capabilities present in those solutions.
Workloads can benefit from the use of virtual Trusted Platform Module (vTPM) available in VMware Cloud on AWS 1.19 and newer. The addition of a vTPM presents a TPM 2.0-compliant device to the guest OS, for use by the guest OS and workloads as they see fit, just as the workload would use a physical TPM when running on physical hardware.
Virtual TPMs use VM Encryption to protect data on disk, encrypting just the VM “home” files, but not the entire VM. VM Encryption is enabled with Native Key Provider, a feature within VMware Cloud that manages encryption keys without requiring an external key management system (KMS).
As discussed in the Infrastructure Design section, virtual machines can be assigned different vSAN storage policies, which have an impact on performance and storage usage. It is important to review these policies and ensure they match your organization’s risk tolerance and use of VMware Cloud on AWS SDDCs. These policies can be customized on a per-VMDK basis, but in general simpler is better. If complex policy setups are needed, it is suggested that they be automated for auditing and reconfiguration purposes.
L3 multicast (e.g. PIM, IGMP snooping) is not supported. However, L2 multicast traffic is treated as a broadcast and sent to all ports on the network segment. This enables applications that use multicast to communicate in the same network segment but does not support the optimization of having the network send traffic only to subscribed devices.
VMware Cloud offers many of the same resilience features found in local cloud versions of vSphere and Cloud Foundation. These include snapshots, clones, and replication, as well as vSphere High Availability, vMotion, and the Distributed Resource Scheduler (DRS).
Ideas to consider:
- Ensure that workload applications start automatically when the virtual machine boots. This helps immensely with regular and automated patching, and also with incident response. For example, if a cloud host fails, vSphere HA will restart the workloads on other cluster hosts. If the workloads start automatically, the need for off-hours administration work is reduced and the remaining work can be pushed to normal working hours.
- Ensure that workloads spanning multiple virtual machines or containers are resilient to restarts on components, either because of patching or from a vSphere HA automated restart. Applications should employ techniques to retry connections periodically. Use of the NSX Advanced Load Balancer can help make internal application subcomponents more reliable, as well as detect application health and present customized outage pages to customers.
- Use DRS affinity and anti-affinity rules to separate clustered components from each other, reducing the impact of a host failure.
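The suggestion above that applications retry connections periodically can be sketched as follows. This is a minimal Python example using exponential backoff, which is one common retry technique and is named here as an assumption rather than a method prescribed by this document. The `retry` helper, its parameters, and the `flaky_connect` stand-in are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
          max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds, doubling the wait after each ConnectionError."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Stand-in for a dependency that is unavailable while it restarts.
calls = {"count": 0}


def flaky_connect() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("backend restarting")
    return "connected"


delays = []  # record the waits instead of sleeping, for demonstration
result = retry(flaky_connect, sleep=delays.append)
print(result, delays)
```

Capping the delay with `max_delay` keeps recovery time bounded after a long outage, such as a vSphere HA restart of a dependent virtual machine. Production code would typically also add random jitter so that many clients do not reconnect at the same instant.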