Linux Memory Management


mb611a2e88042f6 2021-08-16 22:48:15 Category: Linux networking


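© Copyright belongs to the author: an original work by 51CTO blog author mb611a2e88042f6. Contact the author for permission to reproduce it; unauthorized reproduction may result in legal action.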

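Unlike user space, the kernel has no simple, convenient way to allocate memory, and handling allocation failures inside the kernel is far from easy. Before going further, it is therefore worth understanding how the kernel manages memory.

1.1.1 Pages

The kernel treats the physical page as the basic unit of memory management. The supported page sizes depend on the architecture. Most 32-bit architectures use 4 KB pages. Many 64-bit architectures, including x86-64, also default to 4 KB, while some use or support larger pages (for example, 8 KB on Alpha).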

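The kernel represents every physical page in the system with a struct page. System memory is made up of many fixed-size blocks called pages; on x86 a page is 4096 bytes. Most kernel operations use only the ZONE_NORMAL region.

struct page is associated with physical pages, not virtual pages, so what it describes is transient: because of swapping and similar activity, the data held in a given physical page changes over time. The structure records what the physical page holds at the current moment. Its purpose is to describe physical memory itself, not the data stored in it.

All struct page objects are kept in the global mem_map array. The array is usually placed at the start of ZONE_NORMAL or, on systems with little memory, directly after the area reserved for loading the kernel image. In other words, it sits just above the low memory into which the kernel is loaded, and it holds one descriptor for every page frame.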
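The relationship between a physical address, its page frame number (PFN), and its entry in mem_map can be sketched in user space as follows. This is only a toy model, assuming 4 KB pages and a flat array; the two-field struct page and the NR_PAGES value are illustrative assumptions, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* assumes 4 KB pages, as on x86 */
#define NR_PAGES   1024               /* hypothetical: 4 MB of "RAM" */

/* Toy page descriptor; the real struct page has many more fields. */
struct page {
    unsigned long flags;
    int refcount;
};

static struct page mem_map[NR_PAGES]; /* one descriptor per page frame */

/* Physical address -> page frame number (PFN). */
static unsigned long phys_to_pfn(uint64_t phys)
{
    return (unsigned long)(phys >> PAGE_SHIFT);
}

/* PFN -> descriptor: plain array indexing, NULL if out of range. */
static struct page *pfn_to_page(unsigned long pfn)
{
    return pfn < NR_PAGES ? &mem_map[pfn] : NULL;
}

/* Descriptor -> PFN: pointer subtraction against the array base. */
static unsigned long page_to_pfn(const struct page *page)
{
    return (unsigned long)(page - mem_map);
}
```

Because the descriptors live in one array, converting between a descriptor and its frame number is simple arithmetic, which is why the descriptor can describe the physical page without storing its address.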

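1.1.1.1 Allocating pages

The kernel provides a low-level mechanism for requesting memory, along with several interfaces for using it. These interfaces allocate memory in units of pages and are declared in include/linux/gfp.h. The core function is alloc_pages, which allocates 2^order physically contiguous pages and returns a pointer to the struct page of the first one: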

static inline struct page *
alloc_pages(gfp_t gfp_mask, unsigned int order)
{
        return alloc_pages_current(gfp_mask, order);
}

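The page_address function converts a page into its logical address.

__get_free_pages works like alloc_pages, but returns the logical address of the first page instead of a struct page pointer.

To get a page whose contents are all zero, use get_zeroed_page. It behaves like __get_free_pages for a single page, except that the page is filled with zeros.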

The low-level page allocation interfaces are shown below:

[Figure: low-level page allocation]

The corresponding release functions are __free_pages, free_pages, and free_page.
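Since these interfaces take an order rather than a byte count, a caller has to work out which order covers the amount of memory it needs. The user-space sketch below illustrates that arithmetic, assuming 4 KB pages; the helper names pages_in_order and order_for_size are made up for this example (the kernel has its own get_order helper for the same purpose).

```c
#include <stddef.h>

#define PAGE_SHIFT 12                         /* assumes 4 KB pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Number of pages in an allocation of the given order. */
static size_t pages_in_order(unsigned int order)
{
    return (size_t)1 << order;
}

/* Smallest order whose block holds at least `size` bytes (size > 0). */
static unsigned int order_for_size(size_t size)
{
    size_t pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; /* round up */
    unsigned int order = 0;

    while (pages_in_order(order) < pages)
        order++;
    return order;
}
```

Note that a request for three pages ends up as order 2, that is, four pages: anything that is not a power of two is rounded up, which is one reason page-level allocation is wasteful for odd sizes.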

1.1.1.2 Allocating memory in bytes

To obtain a physically contiguous block of kernel memory sized in bytes, the kernel provides the kmalloc function, defined in include/linux/slab.h.

/**
 * kmalloc - allocate memory
 * @size: how many bytes of memory are required.
 * @flags: the type of memory to allocate.
 *
 * kmalloc is the normal method of allocating memory
 * for objects smaller than page size in the kernel.
 *
 * The @flags argument may be one of:
 *
 * %GFP_USER - Allocate memory on behalf of user. May sleep.
 *
 * %GFP_KERNEL - Allocate normal kernel ram. May sleep.
 *
 * %GFP_ATOMIC - Allocation will not sleep. May use emergency pools.
 *   For example, use this inside interrupt handlers.
 *
 * %GFP_HIGHUSER - Allocate pages from high memory.
 *
 * %GFP_NOIO - Do not do any I/O at all while trying to get memory.
 *
 * %GFP_NOFS - Do not make any fs calls while trying to get memory.
 *
 * %GFP_NOWAIT - Allocation will not sleep.
 *
 * %__GFP_THISNODE - Allocate node-local memory only.
 *
 * %GFP_DMA - Allocation suitable for DMA.
 *   Should only be used for kmalloc() caches. Otherwise, use a
 *   slab created with SLAB_DMA.
 *
 * Also it is possible to set different flags by OR'ing
 * in one or more of the following additional @flags:
 *
 * %__GFP_HIGH - This allocation has high priority and may use emergency pools.
 *
 * %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail
 *   (think twice before using).
 *
 * %__GFP_NORETRY - If memory is not immediately available,
 *   then give up at once.
 *
 * %__GFP_NOWARN - If allocation fails, don't issue any warnings.
 *
 * %__GFP_RETRY_MAYFAIL - Try really hard to succeed the allocation but fail
 *   eventually.
 *
 * There are other flags available as well, but these are not intended
 * for general use, and so are not documented here. For a full list of
 * potential flags, always refer to linux/gfp.h.
 */
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
        if (__builtin_constant_p(size)) {
                if (size > KMALLOC_MAX_CACHE_SIZE)
                        return kmalloc_large(size, flags);
#ifndef CONFIG_SLOB
                if (!(flags & GFP_DMA)) {
                        int index = kmalloc_index(size);

                        if (!index)
                                return ZERO_SIZE_PTR;

                        return kmem_cache_alloc_trace(kmalloc_caches[index],
                                        flags, size);
                }
#endif
        }
        return __kmalloc(size, flags);
}

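On success, kmalloc returns a pointer to a block of memory at least size bytes long; on failure it returns NULL.

The release function that pairs with kmalloc is kfree.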
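In the kmalloc code above, kmalloc_index(size) picks one of a fixed set of caches, so a request is served from the smallest size class that fits it. The user-space sketch below models that idea only. It assumes power-of-two classes starting at 8 bytes plus extra 96-byte and 192-byte classes, which is a common layout but depends on kernel version and configuration; size_class is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

#define KMALLOC_MIN_SIZE 8    /* assumed smallest size class */

/*
 * Simplified size-class lookup: returns the object size of the
 * class that would serve a request of `size` bytes, or 0 for a
 * zero-byte request.
 */
static size_t size_class(size_t size)
{
    size_t cls = KMALLOC_MIN_SIZE;

    if (size == 0)
        return 0;
    if (size > 64 && size <= 96)      /* assumed extra 96-byte class */
        return 96;
    if (size > 128 && size <= 192)    /* assumed extra 192-byte class */
        return 192;
    while (cls < size)                /* otherwise round up to 2^n */
        cls <<= 1;
    return cls;
}
```

Under these assumptions a 100-byte request occupies a 128-byte object, so the memory actually consumed can be noticeably larger than the size passed to kmalloc.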

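vmalloc is similar to kmalloc. The difference is that the memory vmalloc returns is contiguous only in virtual address space; physical contiguity is not guaranteed. Physically contiguous memory is generally required only by hardware devices.

Even so, physically contiguous blocks bring a performance benefit. To make physically non-contiguous pages contiguous in virtual address space, vmalloc has to set up dedicated page table entries, and because the pages are mapped one by one, scattered physical pages tend to cause TLB thrashing. For this reason most kernel code uses kmalloc, and vmalloc is called only when necessary, typically to obtain a large block of memory.

The release function that pairs with vmalloc is vfree.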
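What "virtually contiguous but physically scattered" means can be illustrated with a toy page table. The sketch below is a user-space model under stated assumptions: 4 KB pages, a four-page area, and arbitrary frame numbers chosen for illustration. It is not how the kernel stores its page tables.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                  /* assumes 4 KB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define NR_VPAGES  4                   /* hypothetical 4-page virtual area */

/* Toy page table: virtual page i is backed by physical frame frames[i]. */
static const unsigned long frames[NR_VPAGES] = { 17, 3, 42, 8 };

/*
 * Translate an offset inside the virtual area to a physical address.
 * Returns UINT64_MAX if the offset lies outside the area.
 */
static uint64_t virt_to_phys_toy(unsigned long offset)
{
    unsigned long vpage = offset >> PAGE_SHIFT;
    unsigned long in_page = offset & (PAGE_SIZE - 1);

    if (vpage >= NR_VPAGES)
        return UINT64_MAX;
    return ((uint64_t)frames[vpage] << PAGE_SHIFT) | in_page;
}
```

Adjacent offsets 4095 and 4096 land in frames 17 and 3 respectively. Each page needs its own mapping entry, which is the extra page table and TLB cost described above.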

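1.1.2 Zones

Real computer architectures come with many hardware limitations that restrict how page frames can be used, so the kernel cannot treat all pages alike.

The 80x86 architecture, for example, has two such hardware constraints:

- The DMA controller on the ISA bus can address only the first 16 MB of RAM.

- On modern 32-bit machines with large amounts of RAM, the CPU cannot directly access all physical memory: the linear address space is too small for the kernel to map all of physical memory into it.

The kernel therefore divides pages into different zones.

The Linux kernel manages and maps the memory of each zone in a different way.

The zone types are: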

enum zone_type {
#ifdef CONFIG_ZONE_DMA
        /*
         * ZONE_DMA is used when there are devices that are not able
         * to do DMA to all of addressable memory (ZONE_NORMAL). Then we
         * carve out the portion of memory that is needed for these devices.
         * The range is arch specific.
         *
         * Some examples
         *
         * Architecture            Limit
         * ---------------------------
         * parisc, ia64, sparc     <4G
         * s390                    <2G
         * arm                     Various
         * alpha                   Unlimited or 0-16MB.
         *
         * i386, x86_64 and multiple other arches
         *                         <16M.
         */
        ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
        /*
         * x86_64 needs two ZONE_DMAs because it supports devices that are
         * only able to do DMA to the lower 16M but also 32 bit devices that
         * can only do DMA areas below 4G.
         */
        ZONE_DMA32,
#endif
        /*
         * Normal addressable memory is in ZONE_NORMAL. DMA operations can be
         * performed on pages in ZONE_NORMAL if the DMA devices support
         * transfers to all addressable memory.
         */
        ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
        /*
         * A memory area that is only addressable by the kernel through
         * mapping portions into its own address space. This is for example
         * used by i386 to allow the kernel to address the memory beyond
         * 900MB. The kernel will set up special mappings (page
         * table entries on i386) for each page that the kernel needs to
         * access.
         */
        ZONE_HIGHMEM,
#endif
        ZONE_MOVABLE,
#ifdef CONFIG_ZONE_DEVICE
        ZONE_DEVICE,
#endif
        __MAX_NR_ZONES
};

For example, the zones on x86-32 are laid out as shown in the figure below:

[Figure: zone layout on x86-32]
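As a rough illustration of the x86-32 layout, the sketch below maps a physical address to a zone. It assumes the conventional boundaries for the default configuration: ZONE_DMA below 16 MB, ZONE_NORMAL from 16 MB up to 896 MB, and ZONE_HIGHMEM above that (the kernel comment rounds this to 900MB). The enum and function names are hypothetical and exist only in this example.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's zone_type values. */
enum toy_zone { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_ZONE_HIGHMEM };

#define MB(x) ((uint64_t)(x) << 20)

/* Classify a physical address using the assumed x86-32 boundaries. */
static enum toy_zone zone_of(uint64_t phys)
{
    if (phys < MB(16))        /* reachable by ISA DMA */
        return TOY_ZONE_DMA;
    if (phys < MB(896))       /* directly mapped by the kernel */
        return TOY_ZONE_NORMAL;
    return TOY_ZONE_HIGHMEM;  /* needs a temporary mapping to access */
}
```

The two boundaries correspond to the two hardware constraints listed earlier: the ISA DMA limit and the limited size of the kernel's linear address space.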

A zone is described by the struct zone structure. (Older kernels in the 2.4 series, up to linux-2.4.37, described it with typedef struct zone_struct zone_t instead.)

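1.1.3 Nodes

On a NUMA system, CPUs are grouped into nodes and memory is divided into banks. Each CPU node has its own local physical memory, so the system's physical memory is split into several nodes, with each memory bank treated as one node.

Each node is described by pg_data_t, which is a typedef for struct pglist_data. When allocating pages, Linux uses a node-local allocation policy: it allocates from the node closest to the CPU that is running.

The structure is defined in include/linux/mmzone.h:

- On a NUMA system, all of the system's memory is managed through node_data, an array of pg_data_t pointers.

- On a UMA system such as a PC, a single struct pglist_data contig_page_data serves as the only node and manages all memory regions. (A UMA system has exactly one node.)

The relationship between nodes, zones, and pages:

[Figure: relationship between nodes, zones, and pages]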
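The node-local allocation policy described above can be sketched as "try the local node first, then fall back to the others". The user-space model below makes several simplifying assumptions: two nodes, a bare free-page counter per node, and a round-robin fallback order. The real kernel falls back along distance-ordered zone lists, so toy_node and alloc_page_on are purely illustrative names.

```c
#include <stddef.h>

#define NR_NODES 2                     /* hypothetical two-node system */

/* Toy node descriptor: only tracks how many pages are free. */
struct toy_node {
    unsigned long free_pages;
};

static struct toy_node nodes[NR_NODES] = { { 2 }, { 4 } };

/*
 * Take one page, preferring node `local` (0 <= local < NR_NODES).
 * Returns the id of the node that supplied the page, or -1 if
 * every node is exhausted.
 */
static int alloc_page_on(int local)
{
    for (int i = 0; i < NR_NODES; i++) {
        int nid = (local + i) % NR_NODES;   /* local first, then others */

        if (nodes[nid].free_pages > 0) {
            nodes[nid].free_pages--;
            return nid;
        }
    }
    return -1;
}
```

A CPU on node 0 gets its first two pages locally and is served from node 1 once node 0 runs dry, trading access latency for a successful allocation.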
