
memcg: handle panic_on_oom=always case

Presently, if panic_on_oom=2, the whole system panics even if the OOM
happened in some special situation (such as cpuset or mempolicy).  In
other words, panic_on_oom=2 means panic_on_oom_always.

Currently, memcg does not check the panic_on_oom flag.  This patch adds
that check.
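
The policy described above can be sketched as a small user-space model.
This is only an illustration, not kernel code: the names `oom_scope` and
`should_panic` are hypothetical, and the handling of value 1 is taken from
the description in Documentation/sysctl/vm.txt (panic only when the OOM is
not confined to a cpuset or mempolicy). The memcg case for value 1 assumes
the behaviour of this patch, which checks only for the value 2.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not kernel identifiers. */
enum oom_scope {
    OOM_SCOPE_SYSTEM,    /* system-wide memory exhaustion */
    OOM_SCOPE_CPUSET,    /* OOM confined to a cpuset */
    OOM_SCOPE_MEMPOLICY, /* OOM confined to a mempolicy node set */
    OOM_SCOPE_MEMCG      /* OOM inside a memory cgroup */
};

/* Model of whether an OOM event leads to a panic after this patch. */
static bool should_panic(int panic_on_oom, enum oom_scope scope)
{
    if (panic_on_oom == 2)
        return true;                      /* always panic, memcg included */
    if (panic_on_oom == 1)
        return scope == OOM_SCOPE_SYSTEM; /* constrained OOMs kill a task */
    return false;                         /* 0: let the OOM killer handle it */
}
```

With this model, `should_panic(2, OOM_SCOPE_MEMCG)` is true while
`should_panic(1, OOM_SCOPE_MEMCG)` is false, which is the distinction the
patch introduces for memcg.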

BTW, how is this useful?

kdump+panic_on_oom=2 is the last-resort tool for investigating what
happened in an OOM-ed system.  When a task is killed, the system recovers
and few hints remain about what happened.  In a mission-critical system,
OOM should never happen, so panic_on_oom=2+kdump helps avoid the next OOM
by providing precise information via a snapshot.
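
As a minimal sketch of how a management tool might confirm that the
setting is in place, the helpers below read the sysctl through procfs.
The path /proc/sys/vm/panic_on_oom is the standard procfs view of
vm.panic_on_oom; the function names are hypothetical, and rejecting
values outside 0..2 is an assumption made for this example only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of the sysctl file (for example "2\n").
 * Returns the value, or -1 if the text is not one of 0, 1 or 2. */
static int parse_panic_on_oom(const char *text)
{
    char *end;
    long value = strtol(text, &end, 10);

    if (end == text || value < 0 || value > 2)
        return -1;
    return (int)value;
}

/* Read the current setting from procfs; returns -1 on any error. */
static int read_panic_on_oom(void)
{
    char buf[16];
    FILE *f = fopen("/proc/sys/vm/panic_on_oom", "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) != NULL)
        value = parse_panic_on_oom(buf);
    fclose(f);
    return value;
}
```

A caller could compare the result of `read_panic_on_oom()` with 2 before
relying on kdump to capture a snapshot at OOM time.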

TODO:
 - Since memcg is for isolating the system's memory usage, an oom-notifier
   and freeze_at_oom (or rest_at_oom) should be implemented. Then, a
   management daemon can do similar jobs (as kdump) or take a snapshot
   per cgroup.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki, 15 years ago
commit daaf1e6887

3 changed files with 10 additions and 2 deletions
  1. Documentation/cgroups/memory.txt (+4 -1)
  2. Documentation/sysctl/vm.txt (+4 -1)
  3. mm/oom_kill.c (+2 -0)

+ 4 - 1
Documentation/cgroups/memory.txt

@@ -182,6 +182,8 @@ list.
 NOTE: Reclaim does not work for the root cgroup, since we cannot set any
 limits on the root cgroup.
 
+Note2: When panic_on_oom is set to "2", the whole system will panic.
+
 2. Locking
 
 The memory controller uses the following hierarchy
@@ -379,7 +381,8 @@ The feature can be disabled by
 NOTE1: Enabling/disabling will fail if the cgroup already has other
 cgroups created below it.
 
-NOTE2: This feature can be enabled/disabled per subtree.
+NOTE2: When panic_on_oom is set to "2", the whole system will panic in
+case of an oom event in any cgroup.
 
 7. Soft limits
 

+ 4 - 1
Documentation/sysctl/vm.txt

@@ -573,11 +573,14 @@ Because other nodes' memory may be free. This means system total status
 may be not fatal yet.
 
 If this is set to 2, the kernel panics compulsorily even on the
-above-mentioned.
+above-mentioned. Even oom happens under memory cgroup, the whole
+system panics.
 
 The default value is 0.
 1 and 2 are for failover of clustering. Please select either
 according to your policy of failover.
+panic_on_oom=2+kdump gives you very strong tool to investigate
+why oom happens. You can get snapshot.
 
 =============================================================
 

+ 2 - 0
mm/oom_kill.c

@@ -473,6 +473,8 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
 	unsigned long points = 0;
 	struct task_struct *p;
 
+	if (sysctl_panic_on_oom == 2)
+		panic("out of memory(memcg). panic_on_oom is selected.\n");
 	read_lock(&tasklist_lock);
 retry:
 	p = select_bad_process(&points, mem);