Problem description:

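On SUSE Linux (SLES 11), once overcommit memory is enabled, the system invokes the oom-killer, which kills seemingly random system processes. There is also a very large kcore file under /proc. (/proc/kcore is a virtual file that exposes the kernel's view of memory; its apparent size reflects the memory address space, not disk usage.)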


Analysis:

See the Novell knowledge base article: http://www.novell.com/support/kb/doc.php?id=7002775

I have excerpted the most important passages below and added my own comments after them:

Situation

Overcommit memory under SLES is a subject which is often misunderstood. The purpose of this TID is to address some common misunderstandings and provide resources for obtaining more information on this subject.

Note - In low memory conditions, overcommitting memory can lead to the oom-killer killing apparently random tasks. Managing the oom-killer will be discussed in detail in an upcoming TID.
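My note: in low-memory conditions, enabling memory overcommit can cause the oom-killer to kill processes (for example, processes that need a lot of extra virtual memory, such as a Java virtual machine), and which tasks get killed appears random.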

Resolution

The definitive source of documentation for the behavior of overcommit memory is the Linux kernel source code. In particular, /usr/src/linux/mm/mmap.c (available when the kernel-source package is installed) is a good place to start.

As the source code can be difficult to follow, there is also documentation provided with the kernel-source package that explains overcommit memory in detail. This documentation can be found in the following file:
  • /usr/src/linux/Documentation/vm/overcommit-account

This file details the following 3 modes available for overcommit memory in the Linux kernel:
  • 0 - Heuristic overcommit handling.

  • 1 - Always overcommit.

  • 2 - Don't overcommit.

Mode 0 is the default mode for SLES servers. This allows for processes to overcommit "reasonable" amounts of memory. If a process attempts to allocate an "unreasonable" amount of memory (as determined by internal heuristics), the memory allocation attempt is denied. In this mode, if many applications perform small overcommit allocations, it is possible for the server to run out of memory. In this situation, the Out of Memory killer (oom-kill) will be used to kill processes until enough memory is available for the server to continue operating.

Mode 1 allows processes to commit as much memory as requested. These allocations will never result in an "out of memory" error. This mode is usually appropriate only in specific scientific applications.

Mode 2 prevents memory overcommit and limits the amount of memory that is available for a process to allocate. This mode ensures that processes will not be randomly killed by the oom-killer, and that there will always be enough memory for the kernel to operate properly. The total amount of memory available for use by the system is determined through the following calculation:
  • Total Commit Memory = swap size + (RAM size * overcommit_ratio / 100)

By default, overcommit_ratio is set to 50. With this setting, the total commit memory size will be equal to the total amount of swap space in the server, plus 50% of the RAM. In other words, if a server has 1 GB of RAM, and 1GB of swap space, the system would have a total commit limit of 1.5GB.
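As a quick check of the arithmetic above, here is a minimal shell sketch of the mode 2 limit. The function name commit_limit_kb is hypothetical, sizes are in kB as in /proc/meminfo, and it ignores one detail of the real kernel calculation (huge pages reserved for hugetlbfs are subtracted from RAM first), so treat it as an approximation.

```shell
#!/bin/sh
# Approximate mode 2 commit limit in kB:
#   swap + RAM * overcommit_ratio / 100
# Simplification: the kernel also subtracts hugetlb pages from RAM.
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=${3:-50}    # default overcommit_ratio
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# 1 GB RAM + 1 GB swap with the default ratio of 50:
commit_limit_kb 1048576 1048576 50    # prints 1572864 (kB), i.e. 1.5 GB
```

The result should roughly match the CommitLimit line in /proc/meminfo; the kernel reports that value in every mode but only enforces it in mode 2.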
  • Note - The RedHat documentation, Understanding Virtual Memory, is a good source of information on overcommit memory. (Other topics in that documentation have evolved since 2004.) However, there is an error in the "overcommit_ratio" section of this document. In this section, the calculation used to determine the allocatable memory is correct. However, in the text accompanying the calculation, the total amount of allocatable memory is incorrectly calculated as 2.5GB (on a server with 1GB of RAM and 1GB of swap space). 1.5GB is the correct value.

To determine or change which overcommit mode a server is operating in, the following proc files are used:
  • /proc/sys/vm/overcommit_memory

  • /proc/sys/vm/overcommit_ratio

Echoing the number of the desired mode into overcommit_memory will immediately change the overcommit mode being used. If mode 2 is in use, the ratio is determined using the value in the overcommit_ratio file.
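A small read-only sketch for inspecting the current settings. describe_overcommit_mode is a hypothetical helper that maps the numeric value to the mode names listed above; the commands that actually change the mode are left as comments because they need root and take effect immediately.

```shell
#!/bin/sh
# Map an overcommit_memory value to the mode names from the kernel docs.
describe_overcommit_mode() {
    case "$1" in
        0) echo "heuristic overcommit" ;;
        1) echo "always overcommit" ;;
        2) echo "strict (don't overcommit)" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# Show the current settings (read-only, safe for any user).
if [ -r /proc/sys/vm/overcommit_memory ]; then
    mode=$(cat /proc/sys/vm/overcommit_memory)
    ratio=$(cat /proc/sys/vm/overcommit_ratio)
    echo "overcommit_memory = $mode ($(describe_overcommit_mode "$mode"))"
    echo "overcommit_ratio  = $ratio"
fi

# To switch to strict mode (as root), write the proc file directly:
#   echo 2 > /proc/sys/vm/overcommit_memory
# or use sysctl, and add the same keys to /etc/sysctl.conf to persist them:
#   sysctl -w vm.overcommit_memory=2
#   sysctl -w vm.overcommit_ratio=50
```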

To view the current memory statistics, check the following fields in /proc/meminfo:
  • CommitLimit - Overcommit limit

  • Committed_AS - Current memory amount committed
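To read the two fields together, a sketch like the following (hypothetical function commit_status) prints both values and their difference. Keep in mind that Committed_AS may legitimately exceed CommitLimit in modes 0 and 1, where the limit is not enforced, so a negative headroom is only significant in mode 2.

```shell
#!/bin/sh
# Print CommitLimit, Committed_AS and the remaining headroom (all in kB).
# Reads /proc/meminfo unless another file in the same format is given.
commit_status() {
    awk '/^CommitLimit:/  {limit = $2}
         /^Committed_AS:/ {used = $2}
         END {printf "CommitLimit=%d Committed_AS=%d headroom=%d\n", limit, used, limit - used}' \
        "${1:-/proc/meminfo}"
}

# Usage: commit_status
```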

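My note: these are the overcommit memory modes; overcommit can be enabled or disabled. The idea behind overcommit is to let the system hand out more memory than it physically has, so that more programs can run, because not all programs use all of their memory at the same time. This is somewhat similar to thin provisioning in storage. However, when physical memory is low and too much memory has been overcommitted, the oom-killer will be invoked.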



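The following is Red Hat's description of overcommit memory: http://www.redhat.com/magazine/001nov04/features/vm/ (Note that its descriptions of values 1 and 2 are swapped compared with the kernel documentation summarized above: in the kernel, 1 means always overcommit and 2 means strict accounting with no overcommit.)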

overcommit_memory is a value which sets the general kernel policy toward granting memory allocations. If the value is 0, then the kernel checks to determine if there is enough memory free to grant a memory request to a malloc call from an application. If there is enough memory, then the request is granted. Otherwise, it is denied and an error code is returned to the application. If the value is set to 1, then the kernel grants allocations above the amount of physical RAM and swap in the system as defined by the overcommit_ratio value. Enabling this feature can be somewhat helpful in environments which allocate large amounts of memory expecting worst case scenarios but do not use it all. If the setting in this file is 2, the kernel allows all memory allocations, regardless of the current memory allocation state.



Solution:

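Checking per-process memory with ps showed that the processes used only about 4 GB in total; most of the memory was held by the page cache. The Linux kernel's policy is to use as much memory as possible to cache file system data and speed up I/O. Although the kernel is designed to release page cache automatically when a process needs more memory, the release may not happen quickly enough, or the freed memory may be too fragmented to satisfy the process's allocation request.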
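To check the same split on another machine, this sketch (hypothetical function mem_breakdown) adds up the Buffers and Cached lines of /proc/meminfo and reports what is left after free memory and cache. The "other" figure is an approximation: it lumps together process memory, kernel slab, and anything else not broken out here.

```shell
#!/bin/sh
# Rough breakdown of /proc/meminfo in kB: free memory, page cache plus
# buffers, and everything else (mostly process memory and kernel use).
mem_breakdown() {
    awk '/^MemTotal:/ {total = $2}
         /^MemFree:/  {free = $2}
         /^Buffers:/  {buffers = $2}
         /^Cached:/   {cached = $2}
         END {printf "total=%d free=%d cache=%d other=%d\n",
                     total, free, buffers + cached, total - free - buffers - cached}' \
        "${1:-/proc/meminfo}"
}

# Usage: mem_breakdown
```

The anchored patterns matter: /^Cached:/ does not match the separate SwapCached: line.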

So we need a way to limit how much memory the page cache can take.

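Linux provides the min_free_kbytes parameter, which sets how much memory (in kB) the kernel tries to keep free. It determines the threshold at which the kernel starts reclaiming memory, including page cache. The higher the value, the earlier the kernel starts reclaiming and the more free memory it keeps.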

[root@zyite-app01 root]# cat /proc/sys/vm/min_free_kbytes
163840
[root@zyite-app01 root]# echo 963840 > /proc/sys/vm/min_free_kbytes
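Before raising min_free_kbytes, it can help to see what share of RAM the new value would keep permanently free; the kernel documentation warns that setting it too high can push the system into OOM on its own. The helper min_free_pct below is hypothetical, and the sysctl lines are shown as comments (run as root) rather than executed.

```shell
#!/bin/sh
# Integer percentage of MemTotal that a proposed min_free_kbytes value
# would reserve. Args: proposed value in kB, optional meminfo file.
min_free_pct() {
    awk -v want="$1" '/^MemTotal:/ {printf "%d\n", want * 100 / $2}' "${2:-/proc/meminfo}"
}

# Example (value from the text above; run as root to apply):
#   min_free_pct 963840
#   sysctl -w vm.min_free_kbytes=963840
#   echo "vm.min_free_kbytes = 963840" >> /etc/sysctl.conf   # persist across reboots
```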

Other optional temporary workarounds:

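1. Change how the oom-killer picks its victim (this does not disable the oom-killer)

vm.oom_kill_allocating_task controls which process the oom-killer kills. With 0 (the default), it scans all tasks and kills the one with the highest badness score; with a nonzero value, it kills the task whose allocation triggered the out-of-memory condition. Setting it to 0 as below therefore keeps the default behavior.

cat /proc/sys/vm/oom_kill_allocating_task

echo "0" > /proc/sys/vm/oom_kill_allocating_task

To keep the setting across reboots, add this line to /etc/sysctl.conf (e.g. vi /etc/sysctl.conf):

vm.oom_kill_allocating_task = 0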
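With oom_kill_allocating_task at 0, the oom-killer chooses its victim by a badness score, which the kernel exposes per process in /proc/<pid>/oom_score. This sketch (hypothetical function top_oom_candidates) lists the highest-scoring processes, giving a rough idea of which process would be killed next; the scores change as memory usage changes, so the output is a snapshot, not a prediction.

```shell
#!/bin/sh
# List processes with the highest current OOM badness score:
# "score pid command", highest first.
# Args: optional proc root (default /proc; a parameter only so the logic
# can be tried on a scratch directory) and how many lines to show.
top_oom_candidates() {
    proc_root=${1:-/proc}
    count=${2:-5}
    for score_file in "$proc_root"/[0-9]*/oom_score; do
        score=$(cat "$score_file" 2>/dev/null) || continue   # process may have exited
        pid_dir=${score_file%/oom_score}
        comm=$(cat "$pid_dir/comm" 2>/dev/null || echo "?")  # comm may be missing on old kernels
        printf '%s %s %s\n' "$score" "${pid_dir##*/}" "$comm"
    done | sort -rn | head -n "$count"
}

# Usage: top_oom_candidates
```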

2. Drop the page cache (optional; run sync first so dirty pages are written back and can be freed)
sync
echo 1 > /proc/sys/vm/drop_caches
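A sketch that wraps these commands and reports how much page cache was released. Both function names are hypothetical; drop_page_cache must run as root and is only defined, not called, so pasting the block has no side effects.

```shell
#!/bin/sh
# Page cache currently held, in kB (the "Cached:" line of /proc/meminfo).
cached_kb() {
    awk '/^Cached:/ {print $2}' "${1:-/proc/meminfo}"
}

# Flush dirty pages, drop the clean page cache, and report the change.
# Must run as root. Writing 2 drops dentries and inodes; 3 drops both.
drop_page_cache() {
    before=$(cached_kb)
    sync                                 # write back dirty pages first
    echo 1 > /proc/sys/vm/drop_caches    # 1 = page cache only
    after=$(cached_kb)
    echo "Cached: ${before} kB -> ${after} kB"
}

# Usage (as root): drop_page_cache
```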


Also:

You can exempt specific processes from the OOM killer. Example script:

for pid in $(pidof sshd); do
    echo "disabling oom on pid $pid"
    # -17 (OOM_DISABLE) tells the kernel never to choose this process as an OOM victim
    echo -17 | sudo tee /proc/$pid/oom_adj > /dev/null
done
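The script above uses oom_adj, which is what older kernels provide. Kernels from 2.6.36 onward add /proc/<pid>/oom_score_adj (range -1000 to 1000, where -1000 disables OOM killing for that process) and deprecate oom_adj. Below is a sketch that prefers the newer file when it exists; the function name oom_protect_pid and its optional proc-root argument (there only so it can be tried on a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Exempt one process from the OOM killer, preferring the newer interface.
# Args: pid, optional proc root (default /proc). Must run as root on a live system.
oom_protect_pid() {
    pid=$1
    proc_root=${2:-/proc}
    if [ -e "$proc_root/$pid/oom_score_adj" ]; then
        echo -1000 > "$proc_root/$pid/oom_score_adj"   # 2.6.36+: -1000 = never kill
    elif [ -e "$proc_root/$pid/oom_adj" ]; then
        echo -17 > "$proc_root/$pid/oom_adj"           # older kernels: -17 = OOM_DISABLE
    else
        echo "no OOM adjustment file for pid $pid" >&2
        return 1
    fi
}

# Usage (as root):
#   for pid in $(pidof sshd); do oom_protect_pid "$pid"; done
```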


Reference: http://linux-mm.org/OOM_Killer