When it comes to managing virtual machines in Hyper-V, optimizing NUMA (Non-Uniform Memory Access) is something a lot of people overlook, but it’s actually a pretty big deal, especially as workloads become more complex and your hardware scales up. NUMA is a hardware design feature that helps manage memory access in multi-processor systems, and in virtual environments like Hyper-V, understanding how NUMA works can be a game changer for VM performance. Let's talk about some best practices for configuring and optimizing NUMA settings in Hyper-V so you can get the best performance possible from your hardware.
Understand NUMA and How It Affects Performance
NUMA, at its core, is about how memory is handled across multiple processors. In a NUMA-enabled system, each CPU has its own local memory, but it can also access memory attached to other processors, albeit with higher latency. The key to optimizing NUMA is making sure that VMs are aware of this architecture and can take advantage of it, rather than dealing with the performance penalty of accessing memory across distant NUMA nodes.
By default, Hyper-V is pretty smart about how it handles NUMA, but it’s not always perfect, especially when you start running workloads that are highly resource-intensive or need fine-tuning. If a VM’s virtual topology doesn’t line up with the physical one, its vCPUs can end up reaching for memory on remote NUMA nodes, adding latency to every access. This is particularly problematic for high-performance applications or virtual machines that span many cores.
One of the first things to know about NUMA in Hyper-V is how it ties into the CPU and memory configuration of your physical hosts. For a VM to take advantage of NUMA, it needs to be assigned to the right CPU cores, with memory that's local to those cores. If you’re not paying attention to how you assign resources, the VM might end up running in a suboptimal configuration, leading to degraded performance.
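Before tuning anything, it helps to see what the host's NUMA layout actually looks like. Here's a quick sketch using the Hyper-V PowerShell module (it only runs on a Hyper-V host with that module installed):

```powershell
# List each physical NUMA node with its processors and memory, so you can
# see how many vCPUs and how much local memory a VM can get from one node.
Get-VMHostNumaNode |
    Select-Object NodeId, ProcessorsAvailability, MemoryAvailable, MemoryTotal |
    Format-Table -AutoSize

# The host-wide NUMA spanning setting (enabled by default):
Get-VMHost | Select-Object NumaSpanningEnabled
```

The per-node memory figures are the number to keep in mind when sizing VMs later on.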
Configure VMs to Match NUMA Topology
When you create a VM in Hyper-V, it’s important to configure it in a way that respects the underlying NUMA topology of your host machine. Hyper-V will automatically configure VMs in a NUMA-aware way, but if you’re manually tuning your VMs or using non-standard configurations, you should take care to align the VM’s virtual CPU (vCPU) count and memory with the size of the host’s NUMA nodes, so each virtual node fits inside a physical one.
For instance, if your host has two NUMA nodes with eight cores each, and you assign a VM with 16 vCPUs, Hyper-V will try to distribute the vCPUs evenly across the NUMA nodes. But if the VM requires more vCPUs than there are available on a single node, Hyper-V may assign the VM in a way that leads to memory latency issues. To avoid this, you can explicitly set the number of vCPUs to match the NUMA node configuration, which helps ensure that the memory allocated to each vCPU is local, minimizing cross-node memory access.
In most cases, though, Hyper-V will do this for you automatically. But it's still worth keeping in mind if you’re running specialized workloads like SQL Server, high-performance computing (HPC), or other applications that require tight memory and CPU coordination. These types of applications can be really sensitive to NUMA misconfiguration, so manually aligning vCPUs with NUMA nodes can give you a tangible boost in performance.
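You can check what NUMA-related processor settings Hyper-V has generated for a given VM before deciding whether to override them. A minimal sketch, assuming the Hyper-V PowerShell module; "TestVM" is a placeholder name:

```powershell
# Show the vCPU count and the per-virtual-NUMA-node limits Hyper-V
# derived from the host topology for this VM.
Get-VMProcessor -VMName "TestVM" |
    Select-Object Count, MaximumCountPerNumaNode, MaximumCountPerNumaSocket
```

If `MaximumCountPerNumaNode` already matches the cores per physical node, there's usually nothing to change.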
Use VM NUMA Configuration for High-Performance Workloads
Certain applications, like databases, require lots of CPU and memory to work efficiently. NUMA is particularly important for these types of workloads because it helps ensure that memory is local to the CPU cores that are executing the application. Hyper-V has the ability to configure a VM’s NUMA settings, and for high-performance workloads, it’s crucial that these settings are optimized.
For example, let’s say you’re running a VM with a resource-heavy application, and that VM is configured to use eight vCPUs. By default, Hyper-V might assign those eight vCPUs across multiple NUMA nodes on the host, which could cause some of the memory accesses to cross NUMA boundaries, introducing latency. This can be particularly problematic for workloads like SQL Server, which depend on quick access to memory.
You can optimize this by configuring the VM’s NUMA settings to ensure that the vCPUs are aligned with the NUMA nodes of the host. For example, for a VM with 16 vCPUs, you might configure the VM to use two NUMA nodes, each with 8 vCPUs. This ensures that each vCPU has access to its own local memory, reducing the latency from cross-node memory access. This alignment can significantly improve the performance of resource-heavy applications that are NUMA-aware.
Additionally, NUMA spanning, which Hyper-V enables by default, lets a VM start even when it needs more vCPUs or memory than a single NUMA node can supply. However, NUMA spanning is generally not recommended for performance-sensitive workloads, because it allows memory to be allocated on distant NUMA nodes and therefore increases access latency.
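If you decide spanning isn't right for your hosts, it can be turned off host-wide. Hedged sketch: this requires restarting the Virtual Machine Management Service, so plan for VMs to be stopped first:

```powershell
# Disable NUMA spanning so every VM must fit within a single physical node.
Set-VMHost -NumaSpanningEnabled $false

# The change takes effect after the management service restarts.
Restart-Service vmms
```

With spanning off, a VM that's sized larger than one node simply won't start, which makes misconfigurations visible immediately instead of showing up as latency.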
Balance vCPU and Memory Allocation
The beauty of NUMA-aware virtual machines is that they can benefit from tight coordination between the CPU and memory. But that means you also need to balance how much CPU and memory you assign to a VM. A mistake many people make is over-allocating memory or vCPUs to a VM without considering the NUMA topology. This can lead to inefficient resource allocation, where the virtual machine ends up requesting memory from remote NUMA nodes, which causes unnecessary latency.
To make sure your VM is NUMA-optimized, you should avoid over-committing vCPUs and memory. For example, if you have a 16-core host and decide to assign a VM with 16 vCPUs and 64GB of RAM, that VM could be straddling multiple NUMA nodes. Depending on the workload, this might work fine, but in some cases, you might be better off assigning it fewer vCPUs, especially if the VM doesn’t need to use all 16 cores.
On the memory side, assigning a VM more memory than a single NUMA node can supply forces Hyper-V to back some of that memory with remote nodes, even if your CPU-to-NUMA-node allocation is correct, and that increases latency. It’s about finding the sweet spot for memory allocation that matches the application’s requirements without exceeding what a node can hold locally. Monitoring memory utilization and adjusting based on usage patterns is key.
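A rough sanity check along these lines can be scripted. This is an illustrative sketch ("TestVM" is a placeholder; it assumes `Get-VMHostNumaNode` reports memory in MB, as it does on recent Windows Server builds):

```powershell
# Does the VM's startup memory fit inside the smallest NUMA node?
$vmMemMB   = (Get-VMMemory -VMName "TestVM").Startup / 1MB
$nodeMemMB = (Get-VMHostNumaNode | Measure-Object MemoryTotal -Minimum).Minimum

if ($vmMemMB -gt $nodeMemMB) {
    "VM memory ($vmMemMB MB) exceeds the smallest node ($nodeMemMB MB): expect remote access or spanning."
}
```

Running a check like this before deployment is cheaper than diagnosing remote-memory latency afterwards.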
Monitor NUMA Performance and Adjust
Configuring NUMA is one thing, but actually seeing it in action is another. Once your VM is running, you’ll need to monitor NUMA performance to make sure everything is working as expected. Hyper-V provides a number of tools for monitoring performance, and it’s essential to keep an eye on the performance counters that relate to NUMA, such as memory latency, CPU usage, and NUMA node distribution.
One tool you can use is Performance Monitor in Windows Server, which can give you a deeper view of how memory is being used across NUMA nodes. General counters like "\Processor(*)\% Processor Time" and "\Memory\Available MBytes" show overall load, while the Hyper-V-specific "\Hyper-V VM Vid Partition(*)" counters (Preferred NUMA Node Index and Remote Physical Pages) show how much of a VM’s memory is being backed by a remote NUMA node. You can also use Resource Monitor and Task Manager to get real-time performance data, which will help you spot issues with memory latency or CPU contention.
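The same NUMA counters can be sampled from PowerShell, which is handy for spot checks or scheduled logging. A sketch, assuming a Hyper-V host where these counters are registered:

```powershell
# Remote Physical Pages counts guest pages backed by memory on a node
# other than the VM's preferred one - ideally this stays near zero.
Get-Counter -Counter `
    "\Hyper-V VM Vid Partition(*)\Preferred NUMA Node Index",
    "\Hyper-V VM Vid Partition(*)\Remote Physical Pages" |
    Select-Object -ExpandProperty CounterSamples |
    Format-Table InstanceName, CookedValue -AutoSize
```

A steadily climbing Remote Physical Pages value for a VM is a strong hint that its topology or memory sizing needs the adjustments described above.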
If you notice that certain VMs are consistently experiencing high NUMA node contention or memory access delays, it’s time to adjust your configuration. You may need to change the number of vCPUs, reduce the memory allocation, or ensure that the VM’s memory is allocated to the same NUMA node as its vCPUs. Regular monitoring and tuning can go a long way in ensuring that your VMs are running as efficiently as possible.
Be Mindful of Host NUMA Configuration
Another aspect of NUMA optimization is making sure that the host’s NUMA configuration is set up correctly. This involves configuring the BIOS and ensuring that your physical server is using NUMA mode. Most modern servers with multiple CPUs support NUMA, but it’s still worth checking that the NUMA setting is enabled in the BIOS and that all CPUs and memory banks are properly recognized.
Some servers allow you to fine-tune the NUMA configuration, such as enabling or disabling specific NUMA nodes or adjusting the memory distribution between CPUs. This is especially useful in highly virtualized environments, where you want to allocate resources based on workload requirements. Just be careful not to disable too many NUMA nodes or CPU cores, as this could limit the flexibility of your host to handle resource-intensive virtual machines.
Additionally, when you scale up your environment by adding more physical hosts or upgrading hardware, it’s important to verify that NUMA is still configured correctly across all nodes. Any inconsistency in NUMA configuration across multiple hosts could lead to performance issues and make it harder for Hyper-V to optimize VM placement across the cluster.
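A quick consistency sweep across hosts can catch this kind of drift. Illustrative sketch only: the host names are placeholders, and it assumes the Hyper-V cmdlets' -ComputerName remoting works in your environment:

```powershell
# Compare NUMA node count and spanning setting across cluster hosts.
$hvHosts = "HV01", "HV02", "HV03"   # placeholder host names

foreach ($h in $hvHosts) {
    $nodes = Get-VMHostNumaNode -ComputerName $h
    "{0}: {1} NUMA node(s), spanning={2}" -f $h,
        @($nodes).Count,
        (Get-VMHost -ComputerName $h).NumaSpanningEnabled
}
```

If one host reports a different node count or spanning setting than its peers, VM placement and live migration behavior will differ on that host, which is exactly the inconsistency you want to catch early.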
In a nutshell
Optimizing NUMA settings in Hyper-V is essential for ensuring that your virtualized workloads run efficiently, particularly in environments that require high performance. Understanding how NUMA affects memory and CPU access patterns, configuring your VMs to match the NUMA topology, and regularly monitoring performance are all key factors in getting the best results. NUMA-aware VMs can drastically reduce latency, improve resource allocation, and enhance the overall performance of your virtual machines. With the right configuration, you’ll make sure your host hardware is being used to its full potential and your VMs are running as fast and smoothly as possible.
I hope my post was useful. Are you new to Hyper-V, and do you have good Hyper-V backup software? See my other post