Optimizing performance for virtualized systems is always a trade-off between consolidation ratios and single-VM performance. These instructions are geared towards maximum performance and network connectivity reliability (reducing packet drops to a minimum) for a single Virtual Machine, at the cost of decreased consolidation ratios at the ESXi host level.
These instructions are valid for deployments using VMXNET3 virtual network adapters. SR-IOV (PCI passthrough) would be optimal for performance and latency, but unfortunately Palo Alto Networks VM-Series does not support Ethernet Link Aggregation at the PAN-OS level, rendering SR-IOV mode unusable with typical dual-homed Layer 3 deployments.
ESXi host recommendations
Use at least ESXi version 6.5; it contains many improvements to Virtual Machine networking performance over older ESXi versions
- vSwitch packet switching overhead is lower than on previous ESXi releases
- Each Virtual Network adapter of a VM can be configured to have multiple CPU threads
- Virtual Network adapters can be configured in Split TX mode, allowing asynchronous packet switching between the vNIC-to-vSwitch and vSwitch-to-physical-NIC paths
Use Distributed Virtual Switches instead of Standard Switches
- Distributed Virtual Switch adds
- Proper VLAN trunking support: you can assign a range of allowed VLANs for a trunk
- MAC address learning with the help of an unofficial patch
- Distributed Virtual Switch requires an Enterprise Plus license for the ESXi host
Use native network device drivers for physical NICs
- Legacy ESXi drivers are modified Linux drivers running through a compatibility layer on the ESXi host. ESXi 5.5 introduced support for native drivers, which have lower overhead than legacy drivers
- See https://kb.vmware.com/s/article/2147565 for more information
- Check for updated device drivers at
Do not overbook ESXi host resources for mission critical NFV workloads
- Firewall and router VMs should run at minimal latency to avoid packet buffer exhaustion
- Allocate dedicated CPU cores for firewall Data Plane vCPUs
Deploy VM-Series with PAN-OS 8.0 or newer
PAN-OS 8.0 adds support for DPDK (Data Plane Development Kit), which greatly increases packet switching performance in a Virtual Machine. Once deployed, upgrade to at least PAN-OS 8.0.8; this version includes critical bug fixes for VM-Series and, as of March 2018, it is a PANW TAC recommended release.
Upgrade the Virtual Machine compatibility level to the highest available (e.g. ESXi 6.5)
The ESXi 6.5 (version 13) compatibility level improves a VM's virtual networking performance and allows new CPU instructions to be exposed to the VM, reducing the performance penalty caused by the patches for the Meltdown and Spectre vulnerabilities.
Please note that this change is not officially supported or recommended by PANW TAC; however, I have had no issues with TAC when working with firewalls upgraded to this level.
Allocate CPU and RAM resources based on VM-Series license that will be deployed
- VM-50: 2 vCPU and 4.5 GB RAM minimum, 5 GB recommended
- VM-100: 2 vCPU and 6.5 GB RAM minimum, 8 GB recommended
- VM-300: 4 vCPU and 9 GB RAM minimum, 10 GB recommended
- VM-500: 8 vCPU and 16 GB RAM minimum
The increased memory sizes are my personal recommendations based on experience working with VM-Series; they are not official PANW recommendations.
The VM-Series license enforces the number of vCPUs assigned to the firewall Data Plane, so any vCPUs beyond your VM-Series license will not improve firewall packet switching performance; additional vCPUs are assigned to Management Plane use only.
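The sizing table above can be sketched as a small lookup helper. The recommended RAM figures are the personal recommendations stated above, not official PANW numbers:

```python
# Minimal sizing helper based on the table above. The "recommended" RAM
# headroom reflects the author's personal figures, not official PANW ones.
SIZING = {
    # model: (vCPU, minimum RAM GB, recommended RAM GB)
    "VM-50": (2, 4.5, 5),
    "VM-100": (2, 6.5, 8),
    "VM-300": (4, 9, 10),
    "VM-500": (8, 16, 16),
}

def vm_series_resources(model: str) -> str:
    vcpu, ram_min, ram_rec = SIZING[model]
    return f"{model}: {vcpu} vCPU, {ram_min} GB RAM minimum, {ram_rec} GB recommended"

print(vm_series_resources("VM-300"))
# VM-300: 4 vCPU, 9 GB RAM minimum, 10 GB recommended
```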
Set VM resource reservations
Set the CPU reservation to vCPU count × base CPU clock speed
- If the CPU base clock speed is 2400 MHz and the VM has 2 vCPUs, set the CPU reservation to 4800 MHz
- If you cannot power on the VM with the set CPU reservation, try lowering it slightly, by 10–100 MHz, and try again
You can allocate CPU reservation only within available capacity
- An ESXi host may not always measure the CPU clock speed at exactly the specified value; in this case you may need to adjust the CPU reservation down slightly to a value that allows the VM to power on. For example, with a 2400 MHz CPU you may sometimes need to use a value such as 2380 MHz as the basis of your calculations.
- You can allocate resource reservations only within available ESXi host resources. Resource reservations prevent ESXi host resource overbooking, so existing reservations may prevent you from assigning required reservation for a new VM.
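The reservation rule above is a simple multiplication; a sketch, including the case where the host measures the clock slightly below the nominal value:

```python
# CPU reservation rule from the text: reservation = vCPU count x base clock (MHz).
# If the host measures the clock slightly lower than nominal, recalculate from
# the measured value so the VM can power on.
def cpu_reservation_mhz(vcpus: int, base_clock_mhz: int) -> int:
    return vcpus * base_clock_mhz

print(cpu_reservation_mhz(2, 2400))  # nominal 2400 MHz clock -> 4800
print(cpu_reservation_mhz(2, 2380))  # host measures ~2380 MHz -> 4760
```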
Set the RAM reservation to Reserve all guest memory (All locked); this improves VM packet switching performance
Add second hard disk for log storage
This is not related to performance optimization, but it is highly recommended for all deployments
- Size Hard Disk 2 according to log retention requirements, maximum size is 2 TB
- Pay attention to the provisioning mode, Thin or Thick. Log disk usage grows over time, so Thin provisioning may be risky and cause the datastore to run out of space.
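A rough sizing sketch for the log disk; the log rate and average log size below are illustrative assumptions, so measure your own log rate before sizing:

```python
# Rough log disk sizing sketch. The 2 TB cap matches the maximum log disk
# size stated above; the example inputs are assumptions, not measurements.
def log_disk_gb(logs_per_second: float, bytes_per_log: int, retention_days: int) -> float:
    total_bytes = logs_per_second * bytes_per_log * 86400 * retention_days
    return min(total_bytes / 1024**3, 2048)  # capped at 2 TB

# e.g. 50 logs/s at ~500 bytes each, kept for 90 days
print(f"{log_disk_gb(50, 500, 90):.0f} GB")  # roughly 181 GB
```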
Add Serial Port device to VM configuration
A Serial Port device is required when the VM compatibility level is changed; this is a workaround for Bug ID PAN-91472. If the VM-Series Virtual Machine compatibility level is upgraded, the PAN-OS bootloader adds an approximately 5 minute delay to PAN-OS boot. Adding a Serial Port device removes this boot-up delay.
- Configure the Serial Port device to use file output, pointing the output file to a serial.out file in the VM home directory.
- Keep Connect at Power On unchecked
Allocate necessary amount of virtual network adapters
PAN-OS on VM-Series does not support NIC hot-plug; any network adapters added at run time will be detected by PAN-OS only after a reboot
- Network Adapter 1 is used for PAN-OS Management interface
- Network Adapters 2–10 are used for the PAN-OS Data Plane
- PAN-OS HA1, HA2 and HA3 each require dedicated virtual network adapter
- HA interfaces cannot be shared, so using backup HA links requires twice the number of virtual network adapters for HA
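The adapter budget implied by the rules above can be sketched as follows: with Adapter 1 fixed as management, every dedicated HA link (doubled when backup links are used) is subtracted from the nine remaining Data Plane adapters:

```python
# vNIC budget sketch for the constraints above: 10 adapters total,
# adapter 1 is always management, HA links consume Data Plane adapters.
TOTAL_ADAPTERS = 10
MGMT_ADAPTERS = 1

def dataplane_adapters_left(ha_links: int) -> int:
    # ha_links: dedicated HA1/HA2/HA3 adapters, doubled if backup links are used
    return TOTAL_ADAPTERS - MGMT_ADAPTERS - ha_links

# HA1 + HA2, each with a backup link = 4 adapters, leaving 5 for traffic
print(dataplane_adapters_left(4))  # 5
```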
Edit Virtual Machine advanced settings
Edit the Virtual Machine settings, go to VM Options, and set Virtual Machine Latency Sensitivity to High
This setting optimizes VM execution on ESXi hypervisor
- Each vCPU will be assigned a dedicated CPU core on the ESXi host. This removes the ESXi scheduler from the CPU instruction pipeline, which reduces CPU overhead; it also reduces the consolidation ratio of the ESXi host.
- Interrupt coalescing is disabled for VMXNET3 virtual network adapters. This increases CPU utilization slightly, but it reduces packet switching latency, which in turn reduces the probability of packet drops between the ESXi vSwitch and PAN-OS
Open Edit Configuration …
Add following parameters
sched.mem.prealloc = "TRUE"
sched.mem.prealloc.pinnedMainMem = "TRUE"
sched.swap.vmxSwapEnabled = "FALSE"
When editing advanced settings, be careful not to include any white space in setting names or values. Do not include quotation marks (") around values when editing settings through the vSphere Web UI.
The sched.mem.prealloc settings enable full memory preallocation, which improves the Virtual Machine's packet switching performance.
The sched.swap.vmxSwapEnabled setting disables the Virtual Machine overhead memory swap file for this VM; this setting is required with memory preallocation. The overhead memory consumption of this Virtual Machine will increase as a result.
Using the same method as for the previous settings, add the following parameters for each Data Plane vNIC (ethernet1 to ethernet9 if you have all vNICs assigned). Ethernet0 is the Management vNIC, which has no use for these settings.
ethernet1.pnicFeatures = "4"
ethernet2.pnicFeatures = "4"
ethernet3.pnicFeatures = "4"
<add for each data plane virtual network adapter>
ethernet1.ctxPerDev = "1"
ethernet2.ctxPerDev = "1"
ethernet3.ctxPerDev = "1"
<add for each data plane virtual network adapter>
ethernetX.pnicFeatures = "4" enables Receive Side Scaling (RSS) support in the VMXNET3 adapter within the VM. This improves packet switching performance for VM-300 and higher models.
ethernetX.ctxPerDev = "1" enables a dedicated CPU thread for the specified virtual network adapter at the ESXi host level. This improves vNIC packet switching performance, but ESXi host CPU usage may increase as a result.
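Since the same two lines must be repeated for every Data Plane vNIC, a small helper can generate them; a sketch that emits the .vmx lines shown above for ethernet1..ethernetN (ethernet0, the Management vNIC, is skipped on purpose):

```python
# Generate the per-vNIC .vmx parameter lines described above for the
# Data Plane adapters ethernet1..ethernetN (ethernet0 is management).
def dataplane_vnic_settings(num_dataplane_vnics: int) -> list[str]:
    lines = []
    for i in range(1, num_dataplane_vnics + 1):
        lines.append(f'ethernet{i}.pnicFeatures = "4"')  # RSS inside the VM
        lines.append(f'ethernet{i}.ctxPerDev = "1"')     # dedicated host CPU thread
    return lines

print("\n".join(dataplane_vnic_settings(3)))
```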
These settings maximize packet switching performance of a Virtual Machine based firewall.
The following documents are highly recommended for anyone working with NFV or other latency-sensitive solutions on VMware vSphere
- Tuning vCloud NFV for Data Plane Intensive Workloads
- Best Practices for Performance Tuning of Telco and NFV Workloads in vSphere
- NSX-T Networking Best Practices
- Performance Best Practices for VMware vSphere 6.5
- VMXNET3 RX Ring Buffer Exhaustion and Packet Loss