Sunday, April 06, 2025

Network throughput and CPU efficiency of FreeBSD 14.2 and Debian 12.10 in VMware

I'm a long-time FreeBSD user (since FreeBSD 2.2.8, back in 1998), and for all these 27 years I have lived with the impression that FreeBSD has the best TCP/IP network stack in the industry.

Recently, I was blogging about testing the network throughput of a 10 Gb line, where I used a default installation of FreeBSD 14.2 with iperf and realized that I needed at least 4, and preferably 8, vCPUs in a VMware virtual machine to achieve more than 10 Gb/s of network throughput. A colleague of mine told me that he does not see such huge CPU requirements on Debian, and that information definitely caught my attention. That's the reason I decided to test it.

TCP throughput tests were performed between two VMs on a single VMware ESXi host, so the network traffic does not have to traverse the physical network.

The physical server I use for these tests has an Intel Xeon E5-2680 v4 CPU @ 2.40 GHz. This CPU was introduced by Intel in 2016, so it is not the latest CPU technology, but both operating systems have the same conditions.

VMs were provisioned on the VMware ESXi 8.0.3 hypervisor, which is the latest version at the time of writing this article.

The VM hardware used for the iperf tests is:

  • 1 vCPU (artificially limited by hypervisor to 2000 MHz)
  • 2 GB RAM
  • vNIC type is vmxnet3 
I run iperf -s on VM01 and iperf -c [IP-OF-VM01] -t600 -i5 on VM02 and watch the results.
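One simple way to watch CPU usage during the test is top; on FreeBSD, for example:

top -SHP

shows system threads and per-CPU load, and plain top does the same job on Debian. Any similar tool works.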

Test results of FreeBSD 14.2

I can achieve 1.34 Gb/s without Jumbo Frames enabled.
This is ~1.5 Hz per bit/s (2 GHz / 1.34 Gb/s).
During the network test without Jumbo Frames enabled, the iperf client consumes ~40% CPU and the server also ~40% CPU.

Test results of Debian 12.10

I can achieve 9.5 Gb/s.
This is ~0.21 Hz per bit/s (2 GHz / 9.5 Gb/s).
During the network test, the iperf client consumes ~50% CPU and the server ~60% CPU. There is no difference when Jumbo Frames are enabled.

Comparison of default installations

The network throughput of a default installation of Debian 12.10 is roughly 7x higher (9.5 / 1.34 ≈ 7.1) than that of a default installation of FreeBSD 14.2. We can also say that Debian needs roughly 7x fewer CPU cycles per bit/s (0.21 Hz vs 1.5 Hz per bit/s).

FreeBSD Network tuning

On Debian, open-vm-tools 12.2.0 is installed automatically as part of the default installation.

FreeBSD does not install open-vm-tools automatically, but the vmxnet driver is included in the kernel, so open-vm-tools should not be necessary. Nevertheless, I installed open-vm-tools and explicitly enabled vmxnet in rc.conf, but there was no improvement in network throughput, which confirms that open-vm-tools is not necessary for optimal vmxnet networking.
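For reference, the installation and the rc.conf change look roughly like this (I show the nox11 package variant here; the rc.conf variable names come from the port and may differ slightly between versions):

pkg install open-vm-tools-nox11

and in /etc/rc.conf:

vmware_guestd_enable="YES"
vmware_guest_vmxnet_enable="YES"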

So this is not it. What else can we do to improve network throughput?

Network Buffers

We can try increasing the network buffers.

What is default setting of kern.ipc.maxsockbuf?

root@VM-CUST-0001-192-168-1-11:~ # sysctl -a | grep kern.ipc.maxsockbuf
kern.ipc.maxsockbuf: 2097152

What is default setting of net.inet.tcp.sendspace?

root@VM-CUST-0001-192-168-1-11:~ # sysctl -a | grep net.inet.tcp.sendspace
net.inet.tcp.sendspace: 32768

What is default setting of net.inet.tcp.recvspace?

root@VM-CUST-0001-192-168-1-11:~ # sysctl -a | grep net.inet.tcp.recvspace
net.inet.tcp.recvspace: 65536
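All three values can also be read in one go, instead of grepping through the whole sysctl tree:

sysctl kern.ipc.maxsockbuf net.inet.tcp.sendspace net.inet.tcp.recvspace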

Let's increase these values in /etc/sysctl.conf

# Increase maximum buffer size
kern.ipc.maxsockbuf=8388608

# Increase send/receive buffer sizes
net.inet.tcp.sendspace=4194304
net.inet.tcp.recvspace=4194304

and reboot the system.
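If you just want to test the values first, they can also be applied at runtime with sysctl, without a reboot:

sysctl kern.ipc.maxsockbuf=8388608
sysctl net.inet.tcp.sendspace=4194304
sysctl net.inet.tcp.recvspace=4194304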

When I test iperf with these larger network buffers, I can achieve 1.2 Gb/s, which is even slightly worse than the throughput with default settings (1.34 Gb/s) and still far below the Debian throughput (9.5 Gb/s). Tuning the network buffers does not help, so I reverted the settings to the defaults.

Jumbo Frames

We can try enabling Jumbo Frames.

I have Jumbo Frames enabled on the physical network, so I can enable Jumbo Frames in FreeBSD and test the impact on network throughput.

Jumbo Frames are enabled in FreeBSD with the following command:

ifconfig vmx0 mtu 9000
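The new MTU can be verified directly on the interface; keep in mind that the vSwitch must also be configured with an MTU of 9000, otherwise jumbo frames are dropped.

ifconfig vmx0 | grep mtu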

We can test whether Jumbo Frames actually pass between VM01 and VM02 by pinging with a non-fragmentable ICMP payload of 8,972 bytes (the 9,000-byte MTU minus 20 bytes of IP header and 8 bytes of ICMP header).

ping -s 8972 -D [IP-OF-VM02]

iperf test result: 
I can achieve 5 Gb/s with Jumbo Frames enabled.
This is 0.4 Hz per bit/s (2 GHz / 5 Gb/s).
The iperf client consumes ~20% CPU and the server also ~20% CPU.

When I test iperf with Jumbo Frames enabled, I can achieve 5 Gb/s, which is significantly (3.7x) higher than the throughput with default settings (1.34 Gb/s), but it is still less than the Debian throughput (9.5 Gb/s) with default settings (MTU 1,500). It is worth mentioning that Jumbo Frames helped not only with higher throughput but also with lower CPU usage.

I have also tested iperf throughput on Debian with Jumbo Frames enabled and, interestingly enough, I got the same throughput (9.5 Gb/s) as I was able to achieve without Jumbo Frames, so increasing the MTU on Debian did not have any positive impact on network throughput or CPU usage.
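On Debian, the MTU can be changed with the usual iproute2 command; the interface name ens192 is just an example, check yours with ip link:

ip link set dev ens192 mtu 9000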

I have reverted the MTU to the default (1,500) and tried another performance tuning option.

Enable TCP Offloading

We can enable TCP Offloading capabilities. TXCSUM, RXCSUM, TSO4, and TSO6 are enabled by default, but LRO (Large Receive Offload) is not enabled.

Let's enable LRO and test the impact on iperf throughput.

ifconfig vmx0 txcsum rxcsum tso4 tso6 lro
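Whether the flags took effect can be verified in the options field of the interface output, where LRO should now be listed:

ifconfig vmx0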

iperf test result:
I can achieve 7.29 Gb/s with LRO enabled and the standard MTU of 1,500.
This is ~0.27 Hz per bit/s (2 GHz / 7.29 Gb/s).
The iperf client consumes ~20% CPU and the server ~25% CPU.

When I test iperf with LRO enabled, I can achieve 7.29 Gb/s, which is significantly better than the throughput with default settings (1.34 Gb/s) and even better than the result with Jumbo Frames (5 Gb/s). But it is still less than the Debian throughput (9.5 Gb/s) with default settings.

Combination of TCP Offloading (LRO) and Jumbo Frames

What if the effects of LRO and Jumbo Frames are combined?

ifconfig vmx0 mtu 9000 txcsum rxcsum tso4 tso6 lro
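To make this combination persistent across reboots, the MTU and offload flags can be appended to the interface configuration in /etc/rc.conf, roughly like this (assuming the interface is configured via DHCP; adjust accordingly for a static address):

ifconfig_vmx0="DHCP mtu 9000 txcsum rxcsum tso4 tso6 lro"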

iperf test result:
I can achieve 8.9 Gb/s with Jumbo Frames and LRO enabled.
This is ~0.22 Hz per bit/s (2 GHz / 8.9 Gb/s).
During the network test with Jumbo Frames and LRO enabled, the iperf client consumes ~25% CPU and the server ~30% CPU.

Conclusion

Network throughput

Network throughput within a single VLAN between two VMs with default installations of Debian 12.10 is almost 10 Gb/s (9.5 Gb/s) with ~50% usage of a single CPU @ 2 GHz.

Network throughput within a single VLAN between two VMs with default installations of FreeBSD 14.2 is 1.34 Gb/s with ~40% usage of a single CPU @ 2 GHz.

The default installation of Debian 12.10 has 7x higher throughput than the default installation of FreeBSD 14.2.

Enabling LRO without Jumbo Frames increases the FreeBSD network throughput to 7.29 Gb/s.

Enabling Jumbo Frames on FreeBSD increases throughput to 5 Gb/s. Enabling Jumbo Frames in the Debian configuration does not improve throughput.

The combination of Jumbo Frames and LRO increases FreeBSD network throughput to 8.9 Gb/s, which is close to the 9.5 Gb/s of the default Debian system, but still a lower result than the network throughput on Debian.

CPU usage

In terms of CPU, Debian uses ~50% CPU on the iperf client and ~60% on the iperf server.

FreeBSD with LRO and without Jumbo Frames uses ~20% CPU on the iperf client and ~25% on the iperf server. When LRO is used in combination with Jumbo Frames, it uses ~25% CPU on the iperf client and ~30% on the iperf server, but it can achieve 20% higher throughput.

Which system has the better networking stack?

Debian can achieve higher throughput even without Jumbo Frames (9.5 Gb/s vs 7.29 Gb/s), but at the cost of higher CPU usage (50/60% vs 20/25%). When Jumbo Frames can be enabled, the throughput is similar (9.5 Gb/s vs 8.9 Gb/s), but with significantly higher CPU usage on Debian (50/60% vs 25/30%).

Key findings

Debian has all TCP offloading capabilities (LRO, TXCSUM, RXCSUM, TSO) enabled in the default installation. The fact that LRO is disabled in the default FreeBSD installation is the main reason why FreeBSD has poor VMXNET3 network throughput out of the box. When LRO is enabled, the FreeBSD network throughput is pretty decent but still lower than Debian's. Jumbo Frames are another help for FreeBSD and do not help Debian at all, which is interesting. The combination of LRO and Jumbo Frames boosts FreeBSD network performance to 8.9 Gb/s, but Debian can achieve 9.5 Gb/s without Jumbo Frames. I will try to open a discussion about this behavior in the FreeBSD and Linux forums to understand further details. I do not understand why enabling Jumbo Frames on Debian does not have a positive impact on network throughput or CPU usage.
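For anyone who wants to double-check the Debian side, the offload settings can be inspected with ethtool; the interface name ens192 is just an example:

ethtool -k ens192 | grep -E 'segmentation-offload|receive-offload|checksumming'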

 
