Managing Administrative Access
Protecting the management interfaces of infrastructure is critical, as virtual and cloud administrators have enormous power over workloads and data. Core information security practices such as least privilege, separation of duties, and defense-in-depth are important to deny attackers access to environments.
Cloud Console Account Management
The VMware Cloud Console is the central management portal for VMware Cloud Services, and provides the ability to deploy, manage, and deprovision SDDCs, subscriptions, network connectivity, and other services like NSX Advanced Firewalling, vRealize products, and Tanzu Mission Control. By default, the Customer Connect account of the organization’s owner is granted access as part of the onboarding process.
Customer Connect accounts are managed by VMware and support multi-factor authentication through the use of a time-based one-time password (TOTP) application, such as Google Authenticator. An organization can also configure Enterprise Federation, allowing a SAML 2.0 Identity Provider (IdP) or a connection method supported by VMware Workspace ONE Access to handle authentication and authorization in the Cloud Console. This allows an organization to control access through existing account management processes. Additionally, any multi-factor authentication solution supported by the IdP can be used seamlessly.
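To illustrate what a TOTP authenticator application computes, the following Python sketch implements the time-based one-time password algorithm from RFC 6238, assuming the common defaults of HMAC-SHA1, 30-second time steps, and 6 digits. It is not VMware's implementation, and the function names are hypothetical; a vetted library is the better choice for real use.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    if unix_time is None:
        unix_time = time.time()
    return hotp(key, int(unix_time // step), digits)


def key_from_base32(secret: str) -> bytes:
    """Decode a Base32 shared secret, as typically shown during TOTP enrollment."""
    cleaned = secret.replace(" ", "").upper()
    return base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))
```

Because both the app and the server derive the code from a shared secret and the current time, the code changes every step and a stolen code is only briefly useful; the shared secret itself is what must be protected.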
API tokens can be generated by Cloud Console users and carry the same level of access as the generating user’s own account. Organization owners can also define organization-level applications that are not connected to any user account.
Ideas to consider:
- Use Enterprise Federation to support Single Sign-on through an enterprise IdP.
- Require multi-factor authentication for all accounts with access to the VMware Cloud organizations. Carefully consider the use of source IP address restrictions in the context of incident response, where they may also block legitimate access. Consider using a “break glass” native VMware Customer Connect account with multi-factor authentication enabled in case connectivity to the configured IdP is lost, or access to the network that logins are restricted to is lost.
- Consider using dedicated administrative accounts that are different from what the cloud infrastructure administrators use on their desktops. This helps prevent immediate lateral movement by attackers when an administrator’s workstation has been compromised.
- Configure the allowed domains for Cloud Console accounts to prevent the addition of external users, either accidentally or maliciously, to the Cloud organization.
- Define policies for API token management that include token lifetime and key storage requirements. Enable the OAuth Apps violations report, review it regularly, and revoke tokens that do not meet the defined policies.
- Use organization-level application IDs for services connecting via API, to avoid sharing accounts and help enforce least privilege.
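The Cloud Console reports policy violations itself; as a complement, the Python sketch below shows one way an organization might check an exported inventory of its API tokens against its own policy. The record fields, the 90-day lifetime, and the key-storage flag are hypothetical and do not reflect a VMware Cloud Services API schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TokenRecord:
    """Hypothetical inventory entry for one API token (not a VMware API schema)."""
    name: str
    created: datetime
    expires: Optional[datetime]  # None means the token never expires
    stored_in_vault: bool        # kept in the approved key storage?


@dataclass
class TokenPolicy:
    max_lifetime: timedelta
    require_vault: bool = True


def policy_violations(tokens: List[TokenRecord], policy: TokenPolicy) -> List[str]:
    """Return one finding per policy breach, in inventory order."""
    findings: List[str] = []
    for token in tokens:
        if token.expires is None:
            findings.append(f"{token.name}: never expires")
        elif token.expires - token.created > policy.max_lifetime:
            findings.append(
                f"{token.name}: lifetime exceeds {policy.max_lifetime.days} days")
        if policy.require_vault and not token.stored_in_vault:
            findings.append(f"{token.name}: not in approved key storage")
    return findings
```

Each finding names the token, so the output can feed directly into the review-and-revoke step of the policy.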
Role-Based Access Control (RBAC)
VMware Cloud Infrastructure products, from VMware Cloud down to the core vSphere, include a robust set of permissions that can be grouped into roles and assigned to users. These permissions allow granular access to capabilities inside the VMware Cloud SDDC. The VMware Cloud Console also allows users to be assigned roles and permissions to manage their organization’s assets.
Ideas to consider:
- Define groups for each role and grant access based on those groups.
- Follow a least-privilege model when assigning permissions to roles. Only assign the minimum permissions necessary for that user or system to do its job.
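The group-based, least-privilege model above can be reviewed mechanically: compute the privileges a user actually receives through group membership and compare them with what the job requires. In this Python sketch the group names, role names, and mappings are hypothetical, and the privilege strings only resemble vSphere privilege identifiers; a real review would read the mappings from vCenter.

```python
from typing import Dict, Set

# Hypothetical group-to-role and role-to-privilege mappings for illustration.
GROUP_ROLES: Dict[str, Set[str]] = {
    "app-operators": {"VM Operator"},
    "infra-admins": {"VM Administrator"},
}
ROLE_PRIVILEGES: Dict[str, Set[str]] = {
    "VM Operator": {"VirtualMachine.Interact.PowerOn",
                    "VirtualMachine.Interact.PowerOff"},
    "VM Administrator": {"VirtualMachine.Interact.PowerOn",
                         "VirtualMachine.Interact.PowerOff",
                         "VirtualMachine.Inventory.Delete"},
}


def effective_privileges(user_groups: Set[str]) -> Set[str]:
    """Union of privileges from every role granted to any of the user's groups."""
    privileges: Set[str] = set()
    for group in user_groups:
        for role in GROUP_ROLES.get(group, set()):
            privileges |= ROLE_PRIVILEGES.get(role, set())
    return privileges


def excess_privileges(user_groups: Set[str], required: Set[str]) -> Set[str]:
    """Privileges granted beyond what the job requires (ideally empty)."""
    return effective_privileges(user_groups) - required
```

A non-empty result from `excess_privileges` is a candidate for a narrower role or a different group assignment.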
Virtual Private Network (VPN)
VMware Cloud VPN functions use IPsec to provide an encrypted end-to-end path over untrusted networks. They can be used for connections across the open Internet, and also across a Direct Connect. Security is always a tradeoff: IPsec VPNs trade performance for security, with throughput limited by the available CPU and network capacity inside the SDDC.
IPsec VPNs rely on Path MTU Discovery, which in turn may require the relevant ICMP messages (IPv4 type 3, Destination Unreachable; IPv6 type 2, Packet Too Big) to be permitted. Permitting them is a general best practice for networks, as blocking all ICMP messages to disable ICMP echo (“ping”) causes the collateral loss of other important network messages like Fragmentation Needed, Time Exceeded, and more. Most modern operating systems rely on Path MTU Discovery to optimize packet sizes automatically. Workarounds such as MSS Clamping add complexity and rigidity to an environment and may not be the best solution.
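The reason Path MTU Discovery matters is that every tunneled packet carries extra headers, so the largest packet a workload can send is smaller than the underlying link MTU. The Python sketch below works out that size, assuming ESP tunnel mode with AES-GCM (8-byte IV and 16-byte ICV, as in RFC 4106), an IPv4 outer header without options, and optional UDP encapsulation for NAT traversal. The exact overhead in a given SDDC may differ.

```python
# Assumed header sizes in bytes.
IPV4_HEADER = 20          # outer/inner IPv4 header without options
TCP_HEADER = 20           # TCP header without options
UDP_NAT_T_HEADER = 8      # UDP encapsulation used for NAT traversal
ESP_HEADER = 8            # SPI (4) + sequence number (4)
GCM_IV = 8                # explicit IV for AES-GCM in ESP
GCM_ICV = 16              # integrity check value
ESP_TRAILER_FIXED = 2     # pad length (1) + next header (1)


def max_inner_packet(outer_mtu: int, nat_t: bool = False) -> int:
    """Largest inner IPv4 packet that fits in one ESP tunnel-mode packet."""
    fixed = IPV4_HEADER + ESP_HEADER + GCM_IV + GCM_ICV
    if nat_t:
        fixed += UDP_NAT_T_HEADER
    # Inner packet + padding + 2-byte trailer must be a multiple of 4 bytes.
    budget = outer_mtu - fixed
    budget -= budget % 4
    return budget - ESP_TRAILER_FIXED


def clamped_mss(outer_mtu: int, nat_t: bool = False) -> int:
    """TCP MSS that keeps segments from being fragmented inside the tunnel."""
    return max_inner_packet(outer_mtu, nat_t) - IPV4_HEADER - TCP_HEADER
```

Under these assumptions a 1500-byte path leaves 1446 bytes for the inner packet and a 1406-byte TCP MSS (1438 and 1398 with NAT traversal). An MSS clamp hardcodes such a number; if a link along the path later has a smaller MTU, the clamp is silently wrong, whereas Path MTU Discovery adapts.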
Deploying a VPN to connect to an SDDC involves other decisions about network topology and will depend on the network capabilities and topologies of the SDDC and other sites. Route-based VPNs use the BGP routing protocol to exchange information about networks between sites. This adds both complexity and flexibility, and the design of these networks is beyond the scope of this document. With simpler IP addressing schemes and network deployments, Policy-Based VPNs are an option. Layer 2 VPN connectivity extends an on-premises network into the SDDC, allowing workloads to migrate into the cloud without re-addressing, but requires the NSX Autonomous Edge appliance to be deployed on-premises.
VPNs between sites with dynamic addresses may require additional design considerations or operational process work. If the dynamic address changes, the VPN connection will not function until the SDDC is updated with the remote site’s new public IP address.
Ideas to consider:
- Use IKEv2 with an AES-GCM cipher, choosing the longest key length (up to AES-256-GCM) that can still support the required performance levels.
- Use Diffie-Hellman Elliptic Curve groups (19, 20, or 21), choosing the highest group number that can support the required performance (which generally depends on the total number of tunnels).
- Enable Perfect Forward Secrecy where supported on both sides of the VPN connection. Enabling it on only one side may work initially, but the tunnel will disconnect after a preset amount of time.
- Use a long, randomly generated pre-shared key, or if available, certificate-based authentication.
- If the BGP endpoint is on a different device from the IPsec VPN endpoint, or if others could gain access to the network carrying the BGP session, configure a BGP secret on both endpoints to prevent route hijacking.
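The VPN recommendations above lend themselves to a simple review of both ends of a tunnel. This Python sketch encodes them as checks over a hypothetical configuration record; the string labels for IKE versions and ciphers and the 32-character minimum PSK length are assumptions, not values read from NSX or any other product. It also shows one way to generate a long, random pre-shared key with the standard `secrets` module.

```python
import secrets
from dataclasses import dataclass
from typing import List

# Acceptable values taken from the guidance above; the string labels are hypothetical.
ALLOWED_IKE_VERSIONS = {"IKEv2"}
ALLOWED_CIPHERS = {"AES-GCM-128", "AES-GCM-192", "AES-GCM-256"}
ALLOWED_DH_GROUPS = {19, 20, 21}
MIN_PSK_LENGTH = 32  # hypothetical organizational minimum


@dataclass
class VpnEndpointConfig:
    ike_version: str
    cipher: str
    dh_group: int
    pfs_enabled: bool
    pre_shared_key: str


def review_tunnel(local: VpnEndpointConfig, remote: VpnEndpointConfig) -> List[str]:
    """List weaknesses on each side and mismatches between the two sides."""
    findings: List[str] = []
    for side, cfg in (("local", local), ("remote", remote)):
        if cfg.ike_version not in ALLOWED_IKE_VERSIONS:
            findings.append(f"{side}: IKE version {cfg.ike_version} not allowed")
        if cfg.cipher not in ALLOWED_CIPHERS:
            findings.append(f"{side}: cipher {cfg.cipher} is not AES-GCM")
        if cfg.dh_group not in ALLOWED_DH_GROUPS:
            findings.append(f"{side}: DH group {cfg.dh_group} is not 19, 20 or 21")
        if len(cfg.pre_shared_key) < MIN_PSK_LENGTH:
            findings.append(
                f"{side}: pre-shared key shorter than {MIN_PSK_LENGTH} characters")
    # Settings that must agree on both ends of the tunnel.
    if local.pfs_enabled != remote.pfs_enabled:
        findings.append("PFS enabled on only one side")
    elif not local.pfs_enabled:
        findings.append("PFS disabled on both sides")
    if local.dh_group != remote.dh_group:
        findings.append("DH group mismatch")
    if local.pre_shared_key != remote.pre_shared_key:
        findings.append("pre-shared key mismatch")
    return findings


def generate_psk(num_bytes: int = 32) -> str:
    """Random URL-safe pre-shared key; 32 random bytes give 43 characters."""
    return secrets.token_urlsafe(num_bytes)
```

The PFS check reports a one-sided setting separately because, as noted above, such a tunnel may come up and only fail later.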
Private Network Links
Direct Connect is an AWS solution in which a network port on AWS’s network is made available for customers to connect to. In most cases, the port will be in a Point of Presence (PoP) datacenter facility, where the customer orders an MPLS WAN connection from their preferred carrier, who will assist with cross-connecting it to the port provided by AWS. Other configurations are possible, such as a Hosted Connection (a VLAN on a shared port) or a Hosted VIF (a single virtual interface on a shared connection). In some cases, customers may also colocate equipment in the PoP and run the cross-connect directly from it. All of these options provide different features, bandwidth, and cost models. Dedicated ports provide the most capability and highest bandwidth, including the possibility of using MACsec to provide Layer 2 encryption between the AWS router and the customer router. Note that MACsec protects only that portion of the path; end-to-end protection requires additional MACsec or other encryption methods.
Ideas to consider:
- To minimize latency, select an AWS point-of-presence that your WAN provider can support and that is as close as possible to the sites that will be communicating with the SDDCs.
- For redundancy, deploy multiple Direct Connect circuits to different points-of-presence, terminating them in the same AWS account so that AWS knows they are redundant and provisions them on independent paths. Ensure that they also have fully independent paths to the enterprise network.
- If SDDCs are deployed in multiple regions and the added latency is tolerable, consider deploying Direct Connects to different regions and mapping them to a DX Gateway attached to an SDDC Group. This provides redundancy against wider-area events while also providing connectivity to multiple regions.
- If possible, use MACsec encryption on the Direct Connect link to prevent packet interception on the wire.
- Use BGP secrets on all BGP sessions to avoid route hijacking.
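The Direct Connect recommendations above can be expressed as a few checks over an inventory of circuits. In this Python sketch the record fields are hypothetical (not the AWS Direct Connect API schema), and the "enterprise path" identifier stands in for however an organization tracks carrier routes into its own network.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirectConnectLink:
    """Hypothetical inventory record for one circuit (not an AWS API schema)."""
    name: str
    location: str          # point-of-presence facility
    aws_account: str       # account the connection terminates in
    enterprise_path: str   # carrier/route identifier on the enterprise side
    macsec_enabled: bool
    bgp_auth_enabled: bool


def review_direct_connects(links: List[DirectConnectLink]) -> List[str]:
    """Flag gaps in redundancy, link encryption, and BGP authentication."""
    findings: List[str] = []
    if len({link.location for link in links}) < 2:
        findings.append("fewer than two distinct points of presence")
    if len({link.aws_account for link in links}) > 1:
        findings.append("circuits terminate in different AWS accounts")
    if len({link.enterprise_path for link in links}) < len(links):
        findings.append("circuits share an enterprise-side path")
    for link in links:
        if not link.bgp_auth_enabled:
            findings.append(f"{link.name}: BGP session has no authentication secret")
        if not link.macsec_enabled:
            findings.append(f"{link.name}: MACsec not enabled (check if supported)")
    return findings
```

The MACsec finding is worded as a prompt rather than a failure, since MACsec is only available on some connection types.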
Connected Accounts and Virtual Private Clouds (VPCs)
VMware Cloud configures native connections to the public cloud provider’s networks and accounts to enable fast and secure access to public cloud services. Every SDDC in VMware Cloud on AWS is connected to a VPC in a native AWS account owned by the customer. This connection is made by running a CloudFormation template provided by VMware that creates the necessary IAM roles in the customer account. Once those roles are in place, VMware will create and update the VPC ENIs and route tables to establish and maintain connectivity. These IAM roles are necessary for proper SDDC operation, but there are other security controls that can also help manage the connectivity between the SDDC and the connected AWS account.
Ideas to consider:
- Ensure only one CloudFormation Template (CFT) is used for each linked AWS account. Only the last successfully run CFT will be tracked by the VMware Cloud organization, and that will be used for any SDDCs deployed within that Organization and linked to that AWS account. However, once deployed, the SDDC will reference the AWS IAM roles, VPC, subnet, and main route table from that point in time. It will not automatically update them if a new CFT is run in that AWS account and Organization, which can result in different IAM roles being used by different SDDCs.
- The Lambda function created by the CFT is only used for the initial template deployment. It can be deleted once the linking is successful. Do not delete the entire CFT as it will remove the IAM roles as well, which are required for the operation of SDDCs.
- SDDCs will create Elastic Network Interfaces (ENIs) in the selected VPC and subnet when they are deployed. In some cases additional ENIs will be created afterwards, such as if the SDDC’s Cluster-1 ever grows beyond 16 hosts. These ENIs will have the VPC’s default security group (SG) attached to them. This security group operates as though the entire SDDC were an EC2 instance with that security group attached: outbound rules apply to traffic originating within the SDDC and going to native AWS services, and inbound rules apply to traffic originating within the native AWS account and going to the SDDC. By default, this security group allows all traffic from the SDDC, but rules allowing traffic to the SDDC must be added manually. Since the Compute Gateway firewall in the SDDC provides the same protection (using the Services Interface in its Applied To field), it is a viable option to allow all traffic through the security group and enforce protection through the Compute Gateway firewall alone. Both firewalls can be configured for the reduced traffic set, but this can make it operationally challenging to keep them in sync and does not provide meaningful security improvements in most cases.
- It is also possible to replace the default security group with a custom security group for all 17 ENIs created by the SDDC. This may cause operational challenges in case a new ENI is ever added, as the customer will need to monitor for that scenario and apply the desired security group immediately to avoid disruptions to network traffic.
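If a custom security group is used, new ENIs need to be caught and corrected. The Python sketch below inspects data in the shape returned by the EC2 DescribeNetworkInterfaces API and lists ENIs that lack the required group. It assumes the caller fetches the data (for example with boto3) and knows which subnet the SDDC was linked to; the IDs in the example are hypothetical, and deciding which ENIs belong to the SDDC is left to the operator.

```python
from typing import Any, Dict, List, Optional


def enis_missing_group(response: Dict[str, Any], required_group_id: str,
                       subnet_id: Optional[str] = None) -> List[str]:
    """Return IDs of ENIs that do not have the required security group attached.

    `response` has the shape returned by EC2 DescribeNetworkInterfaces, e.g.
    boto3: ec2.describe_network_interfaces(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])
    """
    missing: List[str] = []
    for eni in response.get("NetworkInterfaces", []):
        # Optionally narrow the check to the subnet selected for the SDDC.
        if subnet_id is not None and eni.get("SubnetId") != subnet_id:
            continue
        attached = {group["GroupId"] for group in eni.get("Groups", [])}
        if required_group_id not in attached:
            missing.append(eni["NetworkInterfaceId"])
    return missing
```

A flagged ENI could then be corrected with boto3's `modify_network_interface_attribute(NetworkInterfaceId=..., Groups=[...])`; note that the `Groups` list replaces the interface's entire set of security groups. Large VPCs may need the `describe_network_interfaces` paginator.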
Network Perimeter Controls
VMware Cloud has multiple network boundaries and perimeters that should be secured. The primary boundary is at the SDDC itself, consisting of dedicated sets of network segments for management and workloads. These network segments are separated from the network uplinks by an NSX Edge Gateway firewall. This firewall implements two different network gateways, one for SDDC management components, another for workloads and compute.
VMware Cloud also uses the concept of an SDDC Group, which can extend the security perimeter to include multiple SDDCs, native VPCs, and Direct Connect Gateways across multiple regions. The SDDC Group itself does not implement firewalling directly; it relies on the individual SDDC gateway firewalls, VPC network ACLs, cloud provider security groups, and on-premises devices terminating connections from Direct Connect circuits. However, it can be considered an isolated zone and should be treated as a managed network service, like an MPLS WAN.
The gateway firewall is divided into two policies: one for the Compute Gateway, which protects workloads, and one for the Management Gateway, which protects the management appliances in the SDDC (vCenter, NSX Manager, add-on service managers, etc.). The Management Gateway policy does not affect customer workload VMs and has a limited set of rules that only allow specific services through to each management appliance. It also allows creation of outbound rules from the management appliances, which always allow any service. The source or destination of every Management Gateway rule must be one of the management appliances. Arbitrary rule definitions are not permitted, nor are inbound rules with “any” as the source.
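Teams that manage Management Gateway rules as code may want to catch invalid definitions before submitting them. This Python sketch models the two constraints just described: every rule must involve a management appliance, and inbound rules may not use "any" as the source. The appliance group names and the rule record are assumptions for illustration, not the NSX API.

```python
from dataclasses import dataclass
from typing import List, Set

# Assumed names for the predefined management appliance groups.
MANAGEMENT_APPLIANCE_GROUPS = {"vCenter", "ESXi", "NSX Manager", "HCX"}
ANY = "ANY"


@dataclass
class MgwRule:
    name: str
    sources: Set[str]       # group names, or {"ANY"}
    destinations: Set[str]  # group names, or {"ANY"}


def mgw_rule_problems(rule: MgwRule) -> List[str]:
    """Check a proposed rule against the Management Gateway constraints."""
    problems: List[str] = []
    inbound = bool(rule.destinations & MANAGEMENT_APPLIANCE_GROUPS)
    outbound = bool(rule.sources & MANAGEMENT_APPLIANCE_GROUPS)
    if not (inbound or outbound):
        problems.append("neither source nor destination is a management appliance")
    if inbound and ANY in rule.sources:
        problems.append("inbound rule uses ANY as the source")
    return problems
```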
Management appliances can be accessed through their private IP addresses, which are allocated from the management network supplied during the SDDC provisioning process. Some appliances, such as the vCenter Server and HCX Manager, also have a public IP address automatically configured with destination NAT. These appliances register public DNS Fully-Qualified Domain Names that can be configured to resolve to either the private or the public IP address. Additionally, the NSX Manager can be accessed through a reverse proxy linked from the Cloud Console, allowing firewall rules to be managed “out-of-band.” This helps if an errant firewall rule denies access to the SDDC.
Ideas to consider:
- Ensure all rules allowing inbound access are restricted to the most specific set of source IP addresses and services required.
- For appliances that offer both public and private DNS resolution, use private resolution (and therefore access only over private connections).
- Consider that the DNS setting only changes the IP address returned by DNS. It does not affect IP connectivity: the public IP and NAT remain in place regardless of the DNS setting for vCenter Server, HCX Manager, and NSX Manager. Therefore, the source IPs in the firewall rules should still be restricted to the minimal set of private IPs required, even when DNS is set to private resolution.
- Outbound traffic from the management appliances will follow the SDDC’s routing table, so if a default route is advertised, then outbound traffic will go through the connection advertising that route rather than using the SDDC’s native Internet connection, and therefore public IP. You will not be able to use the SDDC-assigned public IP when a default route is being advertised to the SDDC.
- Only groups can be referenced for the source and destination fields in firewall rules. Groups can only consist of IP addresses/CIDRs in the Management Gateway, and groups created here are separate from groups used by the compute gateway or DFW. Defining groups so that they are named clearly to represent the purpose and members is important to ensuring that the desired access is being defined by the rules.
- There is one exception to the above: traffic to/from ESXi hosts will NOT pass through the gateway firewall when a Direct Connect (DX) Private VIF (PVIF) is connected to the SDDC, or if the SDDC is a member of an SDDC group. In these specific scenarios, this traffic will always follow the DX PVIF or SDDC Group/vTGW path, regardless of the SDDC’s route table and management gateway firewall rules.
- Enable logging on rules where access, or attempted access, needs to be tracked. Logging is not enabled by default.
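Several of the points above (specific private sources, even with private DNS, and logging) can be checked together on inbound rule definitions. This Python sketch uses the standard `ipaddress` module and treats only the RFC 1918 ranges as acceptable private sources; the /24 breadth threshold and the rule record are hypothetical, and organizations using other private ranges would adjust the list.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
MAX_PREFIX_LENGTH = 24  # hypothetical: flag sources broader than a /24


@dataclass
class InboundRule:
    name: str
    source_cidrs: List[str]  # IPv4 CIDRs; "0.0.0.0/0" stands for "any"
    logging: bool


def review_inbound_rule(rule: InboundRule) -> List[str]:
    """Flag non-private or overly broad sources and disabled logging."""
    findings: List[str] = []
    for cidr in rule.source_cidrs:
        net = ipaddress.IPv4Network(cidr, strict=False)
        if not any(net.subnet_of(private) for private in RFC1918):
            findings.append(f"{rule.name}: {cidr} is not within RFC 1918 space")
        if net.prefixlen < MAX_PREFIX_LENGTH:
            findings.append(f"{rule.name}: {cidr} is broader than /{MAX_PREFIX_LENGTH}")
    if not rule.logging:
        findings.append(f"{rule.name}: logging is disabled")
    return findings
```

An explicit RFC 1918 list is used instead of `is_private` because the latter also treats documentation and other special-purpose ranges as private.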
Management Appliance Access and Authentication
A deployed SDDC will have a number of appliances that manage different aspects of the infrastructure. These appliances are managed by VMware as part of the Shared Responsibility Model, and include vCenter Server, NSX Manager, and NSX Edge appliances by default. If enabled there may also be HCX Manager, Site Recovery Manager, Tanzu Kubernetes Grid Supervisor cluster, and vSphere Replication appliances.
All appliances are joined to the SDDC’s Single Sign-on (SSO) domain, vmc.local. This SSO domain is local to the deployed SDDC and customers cannot create additional users in it. Instead, customers are provided with a single administrative account, cloudadmin. The cloudadmin account has restricted management permissions as part of the Shared Responsibility Model and is allowed to perform operations in support of workloads. Full administrative control of the SDDC is reserved for VMware itself.
The initial credentials for the cloudadmin account are displayed in the VMware Cloud Console. The password for this account can be changed through vCenter Server or its APIs and automation tools. Once the password is changed, the Cloud Console will no longer display the correct password (it does not update from vCenter Server), so take care to save the new password. The password is not recoverable, though VMware Cloud support can reset it through a support request.
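Because the password cannot be recovered, the safe ordering is to generate and store the new password first and apply it second. This Python sketch shows that ordering using the standard `secrets` module. The length, special-character set, secret name, and the dictionary standing in for a credential store are all hypothetical; the vCenter Server password policy in effect should be checked, and the change itself is made through vCenter Server or its automation tools as described above.

```python
import secrets
import string
from typing import MutableMapping

SPECIALS = "!#%*+-=?@^_"  # hypothetical conservative set of special characters
ALPHABET = string.ascii_letters + string.digits + SPECIALS


def generate_password(length: int = 16) -> str:
    """Random password containing lower, upper, digit and special characters."""
    if length < 4:
        raise ValueError("length must allow one character from each class")
    while True:  # rejection sampling keeps the choice uniform over valid passwords
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SPECIALS for c in candidate)):
            return candidate


def stage_new_password(vault: MutableMapping[str, str], secret_name: str,
                       length: int = 16) -> str:
    """Save the new password *before* it is applied, since it cannot be recovered."""
    password = generate_password(length)
    vault[secret_name] = password  # stand-in for a real credential-store write
    return password
```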
vCenter Server supports adding an LDAP-based identity source, which lets customers use their existing directories and authentication sources. Additionally, the vSphere Cloud Gateway Appliance can be deployed to link the vmc.local domain to an on-premises vCenter Server’s SSO domain. This permits the use of the on-premises SSO domain for access to the VMware Cloud infrastructure.
Ideas to consider:
- Use private DNS resolution for vCenter and HCX Manager so that these appliances are accessed over private connectivity from the on-premises network. SRM, vSphere Replication, and NSX Manager only support private DNS and private IP connectivity, although NSX Manager can also be accessed through the VMware Cloud Console.
- Link an on-premises identity source to vCenter using either the Cloud Gateway appliance or an LDAP connection, to use existing accounts for access to vCenter.
- Adding individual user accounts to the Administrators group, rather than importing an Active Directory group, helps separate authorization from authentication, reducing attack vectors in case of Active Directory compromise.
- Use tiered access models where everyday tasks can be handled by regular accounts/group access, but any privileged access should use a separate account, individually added to the vCenter group.
- Reset the cloudadmin account password using a PowerCLI script that automatically stores the new password in your credential store. Use the account only as a break-glass account when required to configure new services that do not support service accounts (e.g., HCX), or to make changes that other accounts lack the permissions to make. Rotate this password according to your password policy.
- If HCX has been enabled on the SDDC, remove any unused Public IPs (for example if HCX is being connected over a Direct Connect).
- Access to management components should not depend solely on IP address restrictions, as the compromise of an administrator's desktop often also includes the compromise of the administrator’s credentials. A bastion host or “jump box” solution may be implemented with multi-factor authentication. The Management Gateway firewall should then have appropriate restrictions on management services, allowing only the bastion host access.
Appropriate hardening and monitoring should be applied to bastion hosts, including considerations for the compromise of an organization's central Active Directory or authentication source. Using separate administrator accounts is also recommended to help identify the presence of attackers. The compromise of an administrator’s regular desktop account would not automatically lead to the compromise of infrastructure and may force the attacker to generate login failures which can be monitored.
- Restrict connectivity for the SDDC’s ESXi hosts to the destinations required by each service:
- vMotion can be proxied through HCX for a controlled, secure channel.
- VM Remote Console access is proxied through vCenter Server. Direct access to ESXi hosts by VM administrators is neither required nor desired. Workload administrators should access guest OSes using Remote Desktop console functionality, or through direct SSH to the guest OS. This helps simplify firewall rulesets and access control for both the workload and the infrastructure.
- IPFIX data will originate from SDDC ESXi hosts, and traffic should be restricted through the on-premises firewall to only the IPFIX collectors.
- Port Mirroring traffic also originates from the SDDC ESXi hosts in a GRE tunnel, and traffic should be restricted through the on-premises firewall to only the necessary ERSPAN destinations.
- vSphere Replication traffic will originate from the SDDC ESXi hosts and traffic should be restricted through the on-premises (or destination SDDC Management gateway) firewall to only the necessary vSphere Replication appliances where VMs are being protected.
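The per-service restrictions above amount to a default-deny allow-list on the on-premises firewall. This Python sketch models that list with the standard `ipaddress` module; the service names and addresses (taken from documentation ranges) are hypothetical. vMotion is deliberately absent because, as noted above, it can be proxied through HCX rather than allowed directly.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical allow-list: service -> destinations the SDDC ESXi hosts may reach.
ESXI_EGRESS_ALLOW: Dict[str, List[str]] = {
    "ipfix": ["192.0.2.10/32"],                  # IPFIX collector
    "erspan": ["192.0.2.20/32"],                 # port-mirroring (GRE) destination
    "vsphere-replication": ["198.51.100.0/28"],  # vSphere Replication appliances
}


def esxi_flow_permitted(service: str, destination: str,
                        allow: Optional[Dict[str, List[str]]] = None) -> bool:
    """Default deny: permit only listed destinations for a known service."""
    rules = ESXI_EGRESS_ALLOW if allow is None else allow
    address = ipaddress.ip_address(destination)
    return any(address in ipaddress.ip_network(net)
               for net in rules.get(service, []))
```

Unknown services and unlisted destinations both fall through to `False`, mirroring a firewall policy whose final rule denies everything else.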