VMware Cloud Well-Architected Framework- Secure Pillar – Endpoint and Workload Security for Azure VMware Solution

Endpoint & Workload Security

VMware Cloud provides numerous ways in which workloads can be made resilient to security and other types of incidents.

NSX Gateways

Workload network segments that are defined in the SDDC are protected by the NSX Gateways. The NSX Gateways offer additional capabilities. For instance, firewall groups can include Private Cloud objects such as virtual machines, networks, and other object types. Groups can also include dynamic criteria such as virtual machine names and tags, which are automatically translated to the detected IP addresses of the virtual machines associated with those criteria or objects. NSX firewall rules can also be applied to particular uplinks or uplink paths, and are not necessarily global.

Ideas to consider:

Ensure all rules allowing inbound access are restricted to the most specific set of source IP addresses and services required, and avoid using “ANY” as the source or destination IP or services.
Provide more general outbound rules at the perimeter, but enforce specific outbound rules at the DFW.
Use groups with dynamic membership and/or tag-based membership to simplify management.
Include top-line rules to drop any traffic that should NEVER be allowed (for example traffic from a public IP source).
Logging should be enabled on rules necessary to track access, or attempted access. By default logging is not enabled.
Always limit the Applied To field to the specific uplink the traffic is expected on, and avoid using “All Uplinks.”
Block traffic closest to the source (e.g. outbound traffic with the DFW, on-premises outbound traffic at the on-premises FW)
When using NAT, ensure that only the ports required for the NAT are included in the NAT rule, and ensure the NAT matching criteria is set correctly for the use case. If the NAT matching criteria is set to private IP, then it will not be possible to differentiate between traffic that has been NATted and traffic that originated internally.
If a NAT rule is configured for ANY services, then that NAT rule will also be used for outbound (SNAT) Internet traffic from the private IP specified.
NAT will not match unless the traffic is routed out the SDDC’s native internet (e.g. NAT cannot be used when a default route is advertised from one of the uplinks to the SDDC).
Traffic between Management VMs and customer VMs in the same SDDC does not require Compute Gateway rules, but must still be allowed by the Management Gateway firewall.
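
Several of the ideas above lend themselves to an automated review. The following is a minimal sketch, assuming rules have been exported into a simplified structure. The GatewayRule class, its field names, and the sample rules are hypothetical and do not reflect the NSX API schema. It flags allow rules that use “ANY”, rules applied to “All Uplinks”, and rules without logging.

```python
from dataclasses import dataclass

ANY = "ANY"


@dataclass
class GatewayRule:
    """Hypothetical, simplified model of a gateway firewall rule (not the NSX schema)."""
    name: str
    sources: list
    destinations: list
    services: list
    applied_to: list
    action: str = "ALLOW"
    logged: bool = False


def audit_rule(rule):
    """Return a list of findings for one rule, based on the ideas listed above."""
    findings = []
    if rule.action == "ALLOW":
        for label, values in (("source", rule.sources),
                              ("destination", rule.destinations),
                              ("service", rule.services)):
            if ANY in values:
                findings.append(f"{rule.name}: {label} is ANY")
    if "All Uplinks" in rule.applied_to:
        findings.append(f"{rule.name}: applied to All Uplinks")
    if not rule.logged:
        findings.append(f"{rule.name}: logging is disabled")
    return findings


# Illustrative rules only; names and addresses are made up.
rules = [
    GatewayRule("web-in", ["203.0.113.0/24"], ["web-vms"], ["HTTPS"],
                ["Internet Uplink"], logged=True),
    GatewayRule("legacy-in", [ANY], ["app-vms"], [ANY], ["All Uplinks"]),
]

for rule in rules:
    for finding in audit_rule(rule):
        print(finding)
```

A check like this could run whenever the rule set changes, so that overly broad rules are caught before they reach production.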

Microsegmentation

The NSX Distributed Firewall is included with every Azure VMware Solution. This firewall provides microsegmentation capabilities by inspecting and controlling traffic at the VM network interface. Unlike a traditional firewall, this allows control of network traffic between workloads on the same network segment, as well as from other sources.

The Distributed Firewall can be configured using a variety of rule types, from traditional rules to dynamic groups that allow policies to be applied based on tags, VM names, or other workload properties. Rules can also be applied to specific objects, allowing for scoped policies, including default rules that only apply to specific VMs. Limiting traffic between VMs makes it much harder for attackers to move laterally, and the flexibility of rule definitions means that rules can be very specific but also easily updated when environments change.

Ideas to consider:

The Distributed Firewall is IP-based, so dynamic objects are translated into IP addresses, using the IP addresses detected by VMware Tools or through traffic snooping. Dynamic membership cannot be “Applied To” IP addresses.
Much as with traditional firewalls, the complexity and scope of rules impact performance, as each packet is evaluated against each rule, though the Distributed Firewall can take advantage of additional CPU as clusters scale out. It is recommended that rules be limited in scope, such as to a particular network segment, and that global rules be considered carefully before implementing.
Where possible use groups to define rules, so that changes are easier and updates less susceptible to human error. Create nested groups to aggregate similar rules. For example, rather than having one group for all cloud administrators, consider creating a group for each cloud administrator, then aggregating that into a “supergroup.” If an administrator leaves your organization it is easier to find and remove their access group.
Consider defining and documenting a naming strategy for groups so that similar items are grouped together, and data is sortable and easily filtered.
Consider defining and documenting a firewalling strategy, such as default allow, default deny, or a documented combination using “Applied To.” Do the same for rules, such as per-application or per-service, so that there is consistency.
Enable and implement Distributed Firewalling, so that workloads must have specific distributed firewall rules at all times. Limit generic rules implemented at the gateways.
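
To illustrate how dynamic and nested groups end up as IP addresses, here is a minimal sketch. The VM and Group classes, the tag and name-prefix criteria, and the sample inventory are hypothetical simplifications, not the NSX object model. It shows that a group only contributes the addresses that have actually been detected for its members, and that a “supergroup” is the union of its nested groups.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class VM:
    name: str
    tags: Set[str]
    ips: List[str]  # addresses as detected, e.g. by VMware Tools or snooping


@dataclass
class Group:
    """Hypothetical group with optional dynamic criteria and nested groups."""
    name: str
    tag: Optional[str] = None          # dynamic criterion: VM carries this tag
    name_prefix: Optional[str] = None  # dynamic criterion: VM name starts with this
    members: List["Group"] = field(default_factory=list)  # nested groups


def resolve_ips(group, inventory):
    """Translate a group into the set of IP addresses a firewall rule would match."""
    ips = set()
    for vm in inventory:
        by_tag = group.tag is not None and group.tag in vm.tags
        by_name = (group.name_prefix is not None
                   and vm.name.startswith(group.name_prefix))
        if by_tag or by_name:
            ips.update(vm.ips)
    for child in group.members:
        ips |= resolve_ips(child, inventory)
    return ips


inventory = [
    VM("web-01", {"web", "prod"}, ["10.0.1.10"]),
    VM("web-02", {"web"}, []),  # no address detected yet, so nothing to match
    VM("db-01", {"db"}, ["10.0.2.10"]),
]

web = Group("web-servers", tag="web")
db = Group("db-servers", name_prefix="db-")
app = Group("app-all", members=[web, db])  # "supergroup" aggregating both

print(sorted(resolve_ips(app, inventory)))  # ['10.0.1.10', '10.0.2.10']
```

Note that web-02 matches the tag but contributes no address, which is why reliable IP detection matters for rules built on dynamic membership.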

Network Egress Controls

The public IP address network service allows you to connect from the internet to a workload virtual machine (VM), a management appliance, or a load balancer running in your private cloud. For example, if you run a web server on your workload VM, you can serve web traffic using a public IP address through the internet. By default, the public IP network service is disabled.

Allocating a public IP address to a resource also provides the following benefits:

Distributed denial of service (DDoS) attack prevention. This protection is automatically enabled for the public IP address.
Always-on traffic monitoring and real-time mitigation of common network-level attacks.
Protection and mitigation of attacks across the entire scale of the global network. The network can be used to distribute and mitigate attack traffic across regions.

Ideas to consider:

Allow only required access to the Internet in the Compute Gateway Firewall by limiting the source IPs/VMs, services, and destination IPs as much as possible.
Restrict outbound traffic using DFW to meet the standard approach of blocking traffic closest to the source.
Apply IDS/IPS to outbound connections to look for known Command & Control access and common attack signatures.
Consider using a proxy server for filtering & inspecting web traffic if virtual desktops are being hosted in the SDDC. Using this model, Internet access would be blocked for endpoints, and only the proxy would be permitted to access the Internet, providing additional URL filtering based on real-time updated lists and/or other identifying capabilities such as geo-location, site categorization, etc.
NSX L7 firewall from the Advanced Security add-on can be used to ensure SSL/TLS connections are using encryption methods that meet minimum standards to avoid known attack vectors.
Restrict DNS traffic destined for Internet-based DNS servers and require all workloads to use internal DNS servers that are managed and patched. Log all queries and block or check for requests of known C&C or malicious domains using lists that are updated frequently.
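
The DNS guidance above can be sketched as a simple decision function. This is an illustration only: the resolver address, the block list contents, and the function names are assumptions, and a real deployment would enforce these checks in the firewall and on the DNS servers rather than in a script.

```python
import ipaddress

# Hypothetical values for illustration.
INTERNAL_RESOLVERS = {ipaddress.ip_address("10.0.0.53")}
BLOCKLIST = {"bad.example"}


def is_blocked(query_name, blocklist):
    """Return True if the name or any of its parent domains is on the block list."""
    labels = query_name.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False


def check_dns_query(resolver_ip, query_name):
    """Decide what to do with a DNS query sent by a workload."""
    if ipaddress.ip_address(resolver_ip) not in INTERNAL_RESOLVERS:
        return "block: external resolver"
    if is_blocked(query_name, BLOCKLIST):
        return "block: listed domain"
    return "allow"


print(check_dns_query("8.8.8.8", "example.com"))
print(check_dns_query("10.0.0.53", "c2.bad.example."))
print(check_dns_query("10.0.0.53", "example.com"))
```

Matching on parent domains means a listed domain also covers its subdomains, while unrelated names that merely end in similar text (such as notbad.example) are not affected.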

Console Access

Many organizations adopt a mindset for vCenter Server & Azure portal access that is taken directly from traditional data center practices. Most organizations do not allow everyone in the organization to stroll into the data center whenever they desire. Only staff that have a business requirement to be in the data center can enter. The same approach, adapted to vCenter Server and the Azure portal, works well for deciding who needs access. Practically speaking, workloads do not need to be managed from the console of the virtual machine on a day-to-day basis, and those that do can connect to the console of Microsoft Windows with the “/admin” switch of the Remote Desktop client (“/console” in client versions before 6.1).

Administrator access to the workloads should be through the workload virtual machine’s own network interface, via SSH or RDP. That makes their management traffic and access subject to network intrusion detection and other monitoring systems. It also simplifies the access control for both the workload and vSphere by avoiding the need to co-mingle access requirements. This makes auditing access, monitoring access & network traffic, and ongoing management much easier.

Virtual machine remote console access is proxied through vCenter Server for local and public VMware Cloud Infrastructure. Direct access to ESXi is not required.

Intrusion Detection & Prevention

The NSX Advanced Firewall Add On provides a distributed IDS/IPS, L7 FW and DNS filtering that enhance the capabilities of the existing distributed firewall, providing a distributed, scalable security solution that is fully integrated with the Private Cloud and vRealize Log Insight Cloud for monitoring, and can help address many of the considerations in this document.

Ideas to consider:

Ensure that “Auto Update new versions (recommended)” is enabled in the Distributed IDS/IPS settings.
Automatic signature updates require outbound/egress network access for the NSX Manager to the update servers. If you have created egress rules for your SDDC, ensure that NSX Manager can continue to retrieve signature updates.
By default, the Distributed IDS/IPS does not have a defined rule for detection. You will need to create and enable a rule. Be mindful of the tradeoffs between detection scope and system performance. For example, you may find it helpful to create an IP address group called “Internet” that is defined as 0.0.0.0/0 (IPv4) and/or ::/0 (IPv6), representing all possible IP addresses. You can use this group as both the source and destination for the IDS/IPS rules, to apply detection logic to all traffic. However, inspecting more traffic consumes more host resources. Monitor performance and reduce the scope of your detection rules as needed.
Remember that if an attacker has breached a workload the attacks might come from inside your SDDC as the attacker tries to move laterally. Microsegmentation allows for very granular rules and easily updated rule sets. Use very specific rules where possible and leverage groups to allow easy updates when needed.
The default Intrusion Detection Service Profile, “DefaultIDSProfile,” does not include all rules, because enabling more rules comes at a cost to performance. Consider adding a new profile that is customized to the needs of your organization.
In order for traffic to be blocked/dropped by the IPS engine, the signature in the profile has to be set with an action of Drop or Reject AND the rule mode has to be set to Detect & Prevent. This enables precise controls about how different signatures are applied on different traffic types.
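
The interaction between the signature action and the rule mode can be summarized in a few lines. This is a minimal sketch of the logic described above, not product code. The function name is hypothetical, and the “Alert” action and “Detect Only” mode names are assumptions, since the text above names only Drop, Reject, and Detect & Prevent.

```python
def effective_action(signature_action, rule_mode):
    """Return what happens to traffic that matches a signature.

    signature_action: "Alert", "Drop", or "Reject" (set in the profile)
    rule_mode: "Detect Only" or "Detect & Prevent" (set on the rule)
    """
    # Traffic is only blocked when BOTH conditions hold.
    if rule_mode == "Detect & Prevent" and signature_action in ("Drop", "Reject"):
        return signature_action
    # In every other combination the match is reported but traffic passes.
    return "Alert"


for action in ("Alert", "Drop", "Reject"):
    for mode in ("Detect Only", "Detect & Prevent"):
        print(f"{action:6} + {mode:16} -> {effective_action(action, mode)}")
```

Because both settings must agree before traffic is dropped, a new signature set can be rolled out in detection mode first and switched to prevention once false positives have been reviewed.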

VMware Tools

VMware Tools are an important component for virtual machines, supplying drivers for paravirtual devices like the vmxnet3 network interface and the pvscsi virtual SCSI controller, as well as a communications channel between ESXi and the guest operating system. That communications channel is important, as it can ensure that guest operating systems and workload applications shut down gracefully when needed. It will also help the infrastructure detect when virtual machines have booted correctly, as part of vSphere HA actions should a cloud host fail.

VMware Tools is a software package that, like all other software packages, requires updates and maintenance. Include it in your configuration management systems or enable “Check and upgrade VMware Tools before each power on” in the VM settings to have them automatically updated on Microsoft Windows guests.

The Windows drivers for virtual machine hardware have been added to the Windows Update repositories, so that Microsoft Windows operating systems will automatically download and install the latest versions if automatic driver updates are enabled. If your organization manages Microsoft Windows updates using Windows Server Update Services (WSUS), ensure that these driver updates are configured as part of what is presented to client systems.

Linux vmxnet3 and pvscsi drivers are incorporated into the upstream Linux kernel sources. The other components that manage the hypervisor-to-guest communications are part of the open-vm-tools package supplied by VMware to Linux distribution maintainers. The open-vm-tools package is updated when you patch and update your Linux guest operating system.

VMware Tools also allows an SDDC to automatically detect the IP address of a workload, for use in dynamic firewall rules.

In-Guest Controls

Security controls inside workloads are the responsibility of the customer in the Shared Responsibility Model. As discussed earlier in this document, we often suggest that organizations explore using configuration management tools like SaltStack to apply and audit configuration settings on workloads. This saves time and keeps security settings consistent, and it also simplifies template management.

Ideas to consider:

Monitoring and configuration management systems are two examples of systems that have privileged access to an organization’s workloads. High-profile attacks have demonstrated that these types of systems are targets for attackers to breach, allowing them to move laterally throughout the organization with ease. Are there sufficient controls protecting other guests from monitoring breaches? Does your organization use change control and source code control techniques to manage and test changes to configurations? How will you know if an attacker has gained access to that system?
Workloads deployed in a Kubernetes environment require some additional considerations to prevent containers from gaining too many permissions on the node’s operating system. Without protections in place, containers may be able to access host resources such as processes, volumes, or network access. To account for this, consider using Kubernetes admission controllers to prevent unwanted access to the host operating system.
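
As an illustration of the kind of check an admission controller might perform, the sketch below inspects a pod specification for settings that expose host resources. The function name and the sample pods are hypothetical; the field names (hostNetwork, hostPID, hostIPC, hostPath volumes, and securityContext.privileged) are standard Kubernetes pod spec fields. A real deployment would rely on a built-in or policy-engine admission controller rather than a standalone script.

```python
def host_access_violations(pod):
    """Return reasons a pod spec would gain access to the node's resources."""
    spec = pod.get("spec", {})
    violations = []

    # Sharing the node's network, process, or IPC namespaces.
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag):
            violations.append(f"{flag} is enabled")

    # Mounting directories from the node's filesystem.
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            violations.append(f"volume {volume.get('name')} uses hostPath")

    # Running containers in privileged mode.
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        if container.get("securityContext", {}).get("privileged"):
            violations.append(f"container {container.get('name')} is privileged")

    return violations


risky_pod = {
    "spec": {
        "hostNetwork": True,
        "volumes": [{"name": "host-root", "hostPath": {"path": "/"}}],
        "containers": [{"name": "agent",
                        "securityContext": {"privileged": True}}],
    }
}
safe_pod = {"spec": {"containers": [{"name": "web"}]}}

print(host_access_violations(risky_pod))
print(host_access_violations(safe_pod))  # []
```

An admission controller applying rules like these would reject the first pod and admit the second.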

In-Guest Data-at-Rest Protections

VMware Cloud uses vSAN Data-at-Rest encryption to store data in a public cloud provider’s storage. It is possible to use in-guest encryption technologies like Microsoft BitLocker and Linux dm-crypt to protect workloads. This has performance impacts, given the double encryption (BitLocker plus vSAN Encryption), and defeats space efficiency processes like deduplication and compression (your virtual machine will consume its entire allocated disk space). In general, VMware does not suggest using in-guest encryption, but for some very sensitive workloads like Microsoft Active Directory it may be a suitable additional layer of defense. Use it sparingly due to the performance impacts and management overhead.
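
The effect of in-guest encryption on space efficiency can be demonstrated with a small experiment. This is a rough sketch rather than a vSAN measurement: random bytes stand in for ciphertext (an assumption, based on well-encrypted data being indistinguishable from random data), and zlib stands in for the storage layer's compression.

```python
import os
import zlib


def compressed_ratio(data):
    """Compressed size divided by original size (lower means better savings)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive data, as often found in unencrypted guest disks.
plain_like = b"customer-record;" * 4096
# Random bytes of the same length, standing in for in-guest ciphertext.
cipher_like = os.urandom(len(plain_like))

print(f"unencrypted-like data: {compressed_ratio(plain_like):.3f}")
print(f"encrypted-like data:   {compressed_ratio(cipher_like):.3f}")
```

The repetitive data shrinks to a small fraction of its size, while the random data does not shrink at all, which is why a guest-encrypted disk tends to consume its full allocated space.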

Third-party integrations, such as Amazon FSx for NetApp ONTAP, may have different encryption and performance considerations depending on the features and capabilities present in those solutions.

Storage Policies

As discussed in the Infrastructure Design section, virtual machines can be assigned different vSAN storage policies which have an impact on performance and storage usage. Reviewing these policies and ensuring they match your organization’s risk tolerance and use of Azure VMware Solution Private Cloud is important. These policies can be customized on a per-VMDK basis, but in general simpler is better. If complex policy setups are needed it is suggested that they be automated, for auditing and reconfiguration purposes.

Multicast

L3 multicast is not supported (e.g. PIM, IGMP snooping). However, L2 multicast traffic is treated as a broadcast and sent to all ports on the network segment. This enables applications that use multicast to communicate within the same network segment, but does not provide the optimization of having the network send traffic only to subscribed devices.

Workload Resilience

VMware Cloud offers many of the same resilience features found in local cloud versions of vSphere and Cloud Foundation. This includes snapshots, clones, replication, as well as vSphere High Availability, vMotion, and the Distributed Resource Scheduler (DRS).

Ideas to consider:

Ensure that workload applications start automatically when the virtual machine boots. This helps immensely for regular and automated patching, but also as part of incident response. For example, if a cloud host fails vSphere HA will restart the workloads on other cluster hosts. If the workloads start automatically the need for off-hours administration work is reduced, pushing it to normal working hours.
Ensure that workloads spanning multiple virtual machines or containers are resilient to restarts of components, whether caused by patching or by a vSphere HA automated restart. Applications should employ techniques to retry connections periodically. Use of the NSX Advanced Load Balancer can help make internal application subcomponents more reliable, as well as detect application health and present customized outage pages to customers.
Use DRS affinity and anti-affinity rules to separate clustered components from each other, reducing the impact of a host failure.
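
One common way for an application to retry connections periodically is exponential backoff. The sketch below is a minimal illustration under stated assumptions: the function name, its parameters, and the simulated flaky dependency are all hypothetical, and the caller supplies its own connect function that raises ConnectionError on failure.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                # Delay doubles each time: 0.5s, 1s, 2s, ... capped at max_delay.
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise last_error


# Simulated dependency that is unavailable for the first two attempts,
# as might happen while vSphere HA restarts a component on another host.
calls = {"count": 0}


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("backend not ready")
    return "connected"


delays = []
result = connect_with_retry(flaky_connect, sleep=delays.append)
print(result, delays)  # connected [0.5, 1.0]
```

Passing the sleep function as a parameter keeps the example fast to run and easy to test; in production the default time.sleep would be used, often with some random jitter added to the delay.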
