mm, oom_reaper: skip mm structs with mmu notifiers

commit 4d4bbd8526a8fbeb2c090ea360211fceff952383 upstream. Andrea has noticed that the oom_reaper doesn't invalidate the range via mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can corrupt the memory of the kvm guest for example. tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not sufficient as per Andrea: "mmu_notifier_invalidate_range cannot be used in replacement of mmu_notifier_invalidate_range_start/end. For KVM mmu_notifier_invalidate_range is a noop and rightfully so. A MMU notifier implementation has to implement either ->invalidate_range method or the invalidate_range_start/end methods, not both. And if you implement invalidate_range_start/end like KVM is forced to do, calling mmu_notifier_invalidate_range in common code is a noop for KVM. For those MMU notifiers that can get away only implementing ->invalidate_range, the ->invalidate_range is implicitly called by mmu_notifier_invalidate_range_end(). And only those secondary MMUs that share the same pagetable with the primary MMU (like AMD iommuv2) can get away only implementing ->invalidate_range" As the callback is allowed to sleep and the implementation is out of hand of the MM it is safer to simply bail out if there is an mmu notifier registered. In order to not fail too early make the mm_has_notifiers check under the oom_lock and have a little nap before failing to give the current oom victim some more time to exit. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170913113427.2291-1-mhocko@kernel.org Fixes: aac453635549 ("mm, oom: introduce oom reaper") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: Michal Hocko <mhocko@suse.com> 2017-10-03 16:14:50 -0700
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2017-10-12 11:51:19 +0200
commit: 2b8197073a0f2c5d14c4e3ed5934b8f6e51eeeb7 (patch)
tree: f8f9df6a286192765de13013afd2be2eeefe5959 /mm
parent: 8a056a1152707567ecfe6a6f26b8414606150936 (diff)
1 files changed, 16 insertions, 0 deletions
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ec9f11d4f094..d631d251c150 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -37,6 +37,7 @@
 #include <linux/ratelimit.h>
 #include <linux/kthread.h>
 #include <linux/init.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -491,6 +492,21 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	}
 
 	/*
+	 * If the mm has notifiers then we would need to invalidate them around
+	 * unmap_page_range and that is risky because notifiers can sleep and
+	 * what they do is basically undeterministic.  So let's have a short
+	 * sleep to give the oom victim some more time.
+	 * TODO: we really want to get rid of this ugly hack and make sure that
+	 * notifiers cannot block for unbounded amount of time and add
+	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
+	 */
+	if (mm_has_notifiers(mm)) {
+		up_read(&mm->mmap_sem);
+		schedule_timeout_idle(HZ);
+		goto unlock_oom;
+	}
+
+	/*
 	 * increase mm_users only after we know we will reap something so
 	 * that the mmput_async is called only when we have reaped something
 	 * and delayed __mmput doesn't matter that much
author	Michal Hocko <mhocko@suse.com>	2017-10-03 16:14:50 -0700
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-10-12 11:51:19 +0200
commit	2b8197073a0f2c5d14c4e3ed5934b8f6e51eeeb7 (patch)
tree	f8f9df6a286192765de13013afd2be2eeefe5959 /mm
parent	8a056a1152707567ecfe6a6f26b8414606150936 (diff)