# Introduction to NUMA in BIOS: A Comprehensive Guide

## Introduction

Non-Uniform Memory Access (NUMA) is a memory architecture for multi-processor systems in which each processor has its own local memory and can also reach memory attached to other processors. It is designed to improve performance by keeping most memory accesses local, which reduces average latency. NUMA is a property of the hardware, and the BIOS controls how that topology is configured and reported to the operating system.

In this article, we will explore the concept of NUMA in BIOS, its significance, and how it affects your system's performance. We will also provide code examples to demonstrate how to make use of NUMA in your applications.

## Understanding NUMA

Before delving into BIOS settings and code examples, let's grasp the basic idea of NUMA. In a traditional symmetric multiprocessing (SMP) system, all processors share a single pool of memory and access it with the same latency. However, as the number of processors increases, contention for the shared memory bus grows, so effective memory access latency rises and performance degrades.

NUMA aims to mitigate this issue by dividing the memory into multiple nodes, each associated with a subset of processors. Each node contains local memory that the processors assigned to that node access directly. Accessing remote memory, which belongs to a different node, incurs higher latency because the request has to cross the interconnect between nodes.
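To make the local versus remote distinction concrete, the sketch below estimates average access latency as a weighted mix of local and remote accesses. The function name `averageLatencyNs` and the latency figures mentioned afterwards are illustrative assumptions, not measurements from any particular machine.

```cpp
#include <cassert>

// Hypothetical helper: expected memory latency (in nanoseconds) when a
// fraction `localFraction` of accesses hit the local node and the rest
// go to a remote node over the interconnect.
double averageLatencyNs(double localFraction, double localNs, double remoteNs) {
    assert(localFraction >= 0.0 && localFraction <= 1.0);
    return localFraction * localNs + (1.0 - localFraction) * remoteNs;
}
```

With assumed latencies of 80 ns for local and 140 ns for remote memory, a workload that keeps 90% of its accesses local averages about 86 ns, while one that splits its accesses evenly averages 110 ns.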

## NUMA in BIOS

The BIOS (Basic Input/Output System), or UEFI firmware on modern machines, initializes the hardware at boot and then hands control to the operating system. Its setup menu also exposes hardware settings, including the NUMA configuration.

NUMA settings in BIOS allow you to control how memory is presented to the operating system on NUMA hardware. These settings typically include enabling or disabling NUMA, setting the number of nodes, and configuring memory interleave options. On many systems, enabling node interleaving spreads memory evenly across all nodes and effectively hides the NUMA topology from the OS. Exact option names vary by vendor.
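The difference between NUMA mode and node interleaving can be sketched as two ways of mapping a physical address to a node. This is a simplified model: the function names, the equal-sized node ranges, and the 4 KiB interleave granularity are assumptions for illustration, and real firmware and memory controllers use platform-specific layouts.

```cpp
#include <cassert>
#include <cstdint>

// NUMA mode (simplified): each node owns one contiguous address range.
// Assumes every node has exactly `bytesPerNode` bytes of memory.
std::uint32_t nodeForAddressNuma(std::uint64_t addr, std::uint64_t bytesPerNode) {
    assert(bytesPerNode > 0);
    return static_cast<std::uint32_t>(addr / bytesPerNode);
}

// Node interleaving (simplified): consecutive chunks rotate across nodes,
// so every processor sees roughly the same average latency.
// The 4096-byte default chunk size is an assumed granularity.
std::uint32_t nodeForAddressInterleaved(std::uint64_t addr, std::uint32_t nodeCount,
                                        std::uint64_t chunkBytes = 4096) {
    assert(nodeCount > 0 && chunkBytes > 0);
    return static_cast<std::uint32_t>((addr / chunkBytes) % nodeCount);
}
```

In this model, with two nodes of 1 GiB each, address 4096 belongs to node 0 in NUMA mode but to node 1 when interleaved. That is why interleaving evens out latency at the cost of losing locality.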

The following diagrams illustrate how the NUMA configuration reaches the operating system during boot and the typical steps for enabling NUMA in the BIOS setup:

```mermaid
sequenceDiagram
    participant User
    participant BIOS
    participant OS

    User->>BIOS: Booting up the system
    BIOS->>OS: Handover control to the OS
    OS->>BIOS: Query NUMA settings
    BIOS->>OS: Provide NUMA configuration
```

```mermaid
flowchart TD
    Start --> EnableNuma[Enable NUMA]
    EnableNuma --> SetNodeCount[Set Node Count]
    SetNodeCount --> ConfigureInterleave[Configure Memory Interleave]
```

In the diagrams above, the sequence diagram represents the interaction between the user, BIOS, and the operating system (OS) during the boot process. In practice, the OS reads the NUMA configuration from ACPI tables that the firmware populates. The flowchart illustrates the steps involved in enabling NUMA in BIOS.

## Code Examples

Now, let's explore how to leverage NUMA in your applications using code examples. NUMA-aware programming involves:

1. Querying the system for NUMA information.
2. Allocating memory on a specific NUMA node.
3. Binding threads to specific NUMA nodes.

Here's an example using C++ and the Windows API to allocate memory on a specific NUMA node:

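```cpp
#include <iostream>
#include <windows.h>

int main() {
    // GetNumaHighestNodeNumber reports the highest node number (nodes start at 0).
    ULONG highestNode = 0;
    if (!GetNumaHighestNodeNumber(&highestNode)) {
        std::cerr << "GetNumaHighestNodeNumber failed: " << GetLastError() << std::endl;
        return 1;
    }
    std::cout << "Highest NUMA node number: " << highestNode << std::endl;

    // Map the processor this thread is currently running on to its NUMA node.
    PROCESSOR_NUMBER processor;
    GetCurrentProcessorNumberEx(&processor);
    USHORT currentNode = 0;
    if (GetNumaProcessorNodeEx(&processor, &currentNode)) {
        std::cout << "Current processor node: " << currentNode << std::endl;
    }

    // Prefer node 1 when the system reports more than one node; otherwise use node 0.
    DWORD targetNode = (highestNode >= 1) ? 1 : 0;
    const SIZE_T size = 1024 * 1024; // 1 MiB
    void* memory = VirtualAllocExNuma(GetCurrentProcess(), NULL, size,
                                      MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE, targetNode);
    if (memory != NULL) {
        std::cout << "Memory allocated with preferred NUMA node " << targetNode << std::endl;
        VirtualFree(memory, 0, MEM_RELEASE); // Release the region when done
    }
    else {
        std::cerr << "Failed to allocate memory: " << GetLastError() << std::endl;
        return 1;
    }

    return 0;
}
```

In the above code, we use the `GetNumaHighestNodeNumber` function to retrieve the highest NUMA node number in the system. Node numbers start at 0, but this value is not guaranteed to match the total number of nodes. We then use `GetCurrentProcessorNumberEx` to identify the processor the thread is running on and `GetNumaProcessorNodeEx` to map that processor to its node; `GetCurrentProcessorNumber` on its own returns a processor number, not a node. Finally, we allocate memory with a preferred NUMA node using `VirtualAllocExNuma` and release it with `VirtualFree`. The node is a preference: physical pages are assigned when the memory is first accessed.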
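Once memory can be placed per node, the workload is usually split so that each node's threads work on the slice of data stored in that node's memory. The sketch below covers only the bookkeeping, dividing `totalItems` into one contiguous range per node. The `NodeSlice` type and the `partitionByNode` function are hypothetical names, and the actual allocation and thread binding would still go through the platform API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of the items assigned to one NUMA node.
struct NodeSlice {
    std::size_t node;   // NUMA node index
    std::size_t begin;  // first item (inclusive)
    std::size_t end;    // one past the last item (exclusive)
};

// Split `totalItems` into `nodeCount` contiguous slices whose sizes
// differ by at most one item.
std::vector<NodeSlice> partitionByNode(std::size_t totalItems, std::size_t nodeCount) {
    assert(nodeCount > 0);
    std::vector<NodeSlice> slices;
    slices.reserve(nodeCount);
    const std::size_t base = totalItems / nodeCount;
    const std::size_t extra = totalItems % nodeCount;  // first `extra` nodes get one more item
    std::size_t begin = 0;
    for (std::size_t node = 0; node < nodeCount; ++node) {
        const std::size_t count = base + (node < extra ? 1 : 0);
        slices.push_back({node, begin, begin + count});
        begin += count;
    }
    return slices;
}
```

For example, 10 items over 4 nodes yields the ranges [0, 3), [3, 6), [6, 8), and [8, 10). Each range could then be allocated on its node and processed by threads bound to that node.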

## Conclusion

NUMA is an essential architectural concept in modern multi-processor systems that aims to enhance performance by reducing memory access latency. Understanding and configuring NUMA settings in BIOS can help optimize your system's performance.

In this article, we provided an overview of NUMA in BIOS, demonstrated the interaction using sequence diagrams, and presented code examples to illustrate NUMA-aware programming. We hope this guide helps you make the most of NUMA in your applications.

Remember, efficient utilization of NUMA requires careful consideration of workload distribution, memory allocation, and thread binding. Always benchmark and analyze the performance impact before making any significant changes.

Stay tuned for more informative articles on system architecture and optimization!
