How to reproduce a condition that invokes the OOM killer?

Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6

Issue

  • How to reproduce a condition that invokes the OOM killer?

Resolution

Output of free before the test:

# free -m
             total       used       free     shared    buffers     cached
Mem:          1999       1819        180          0         94        910
-/+ buffers/cache:        813       1186
Swap:         4095          0       4095

  • The system has almost 2 GB of RAM, of which 910 MB (about 45%) is page cache. Including buffers and cache, about 91% of RAM (1819 MB of 1999 MB) is in use.
  • The overcommit parameters are as follows:

$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio
50

The following program allocates memory in 1 MiB chunks in an endless loop but never writes to it, so the memory is reserved but not actually used.

memtest.c

#include <stdio.h>
#include <stdlib.h>

int main (void) {
    int n = 0;

    /* Allocate 1 MiB per iteration and never write to it. */
    while (1) {
        if (malloc(1<<20) == NULL) {
            printf("malloc failure after %d MiB\n", n);
            return 0;
        }
        printf ("got %d MiB\n", ++n);
    }
}



$ gcc memtest.c
$ ./a.out

got 570528 MiB
got 570529 MiB
got 570530 MiB
got 570531 MiB
Killed

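  • The kernel let the process allocate 570531 MiB (about 557 GiB) of virtual memory, far more than the roughly 2 GB of RAM and 4 GB of swap available, because with vm.overcommit_memory = 0 the kernel overcommits memory.
    The following is a snipped excerpt of the log messages:

# less /var/log/messages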

6792kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? no
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827694] lowmem_reserve[]: 0 0 0 0
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827699] Node 0 DMA: 3*4kB 4*8kB 8*16kB 9*32kB 10*64kB 10*128kB 2*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8012kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827711] Node 0 DMA32: 377*4kB 21*8kB 2*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 5740kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827723] 19644 total pagecache pages
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827725] 1378 pages in swap cache
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827728] Swap cache stats: add 1114112, delete 1112734, find 9660/15265
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827730] Free swap = 0kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.827732] Total swap = 4194300kB
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836840] 521855 pages RAM
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836843] 9983 pages reserved
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836845] 17279 pages shared
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836847] 494732 pages non-shared
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836852] Out of memory: kill process 6299 (a.out) score 154833937 or a child
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836857] Killed process 6299 (a.out) vsz:619335748kB, anon-rss:535344kB, file-rss:92kB

  • The system ran low on memory, and the OOM killer killed the a.out process.

Output of free after the test:

# free -m
             total       used       free     shared    buffers     cached
Mem:          1999        455       1543          0          9        126
-/+ buffers/cache:        319       1680
Swap:         4095        354       3741

Next, switch to strict overcommit accounting with a ratio of 100:

# echo "2" > /proc/sys/vm/overcommit_memory
# echo "100" > /proc/sys/vm/overcommit_ratio    <<< Here your system has failed.

  • The following program also writes to the memory it allocates, so the pages are actually used:

memtest2.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main (void) {
    int n = 0;
    char *p;

    while (1) {
        if ((p = malloc(1<<20)) == NULL) {
            printf("malloc failure after %d MiB\n", n);
            return 0;
        }
        /* Write to every byte so the pages are really used. */
        memset (p, 0, (1<<20));
        printf ("got %d MiB\n", ++n);
    }
}


# gcc memtest2.c
# ./a.out
got 4511 MiB
got 4512 MiB
malloc failure after 4512 MiB

  • The system allowed the program to use up to about 4.5 GB (4512 MiB) of memory before malloc failed. This is because of overcommit_memory=2 and overcommit_ratio=100: the commit limit is swap plus 100% of RAM.
  • After running this program the system became very slow and sluggish, but it did not crash. Then the OOM killer came and killed the correct process.

# less /var/log/messages
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836847] 494732 pages non-shared
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836852] Out of memory: kill process 6299 (a.out) score 154833937 or a child
Dec 6 00:19:23 dhcp1-109 kernel: [15358.836857] Killed process 6299 (a.out) vsz:619335748kB, anon-rss:535344kB, file-rss:92kB
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: The canary thread is apparently starving. Taking action.
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Demoting known real-time threads.
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2336 of process 2333 (/usr/bin/pulseaudio).
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2335 of process 2333 (/usr/bin/pulseaudio).
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Successfully demoted thread 2333 of process 2333 (/usr/bin/pulseaudio).
Dec 6 00:29:19 dhcp1-109 rtkit-daemon[2166]: Demoted 3 threads

  • The system is still up and running.

Conclusion:

Reference

  • http://www.win.tue.nl/~aeb/linux/lk/lk-9.html