mm, vmscan: add cond_resched() into shrink_node_memcg()

Boris Zhmurov has reported RCU stalls during the kswapd reclaim:

  INFO: rcu_sched detected stalls on CPUs/tasks:
   23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23
   (detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115)
  Task dump for CPU 23:
  kswapd1 R running task 0 148 2 0x00000008
  Call Trace:
    shrink_node+0xd2/0x2f0
    kswapd+0x2cb/0x6a0
    mem_cgroup_shrink_node+0x160/0x160
    kthread+0xbd/0xe0
    __switch_to+0x1fa/0x5c0
    ret_from_fork+0x1f/0x40
    kthread_create_on_node+0x180/0x180

a closer code inspection has shown that we might indeed miss all the
scheduling points in the reclaim path if no pages can be isolated from
the LRU list. This is a pathological case but other reports from Donald
Buczek have shown that we might indeed hit such a path:

  clusterd-989 [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193
  kswapd1-86 [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1
  kswapd1-86 [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1
  kswapd1-86 [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1
  kswapd1-86 [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1
  kswapd1-86 [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1
  kswapd1-86 [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1
  kswapd1-86 [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1
  [...]
  kswapd1-86 [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1

this is minute long snapshot which didn't take a single page from the
LRU. It is not entirely clear why only 1303 pages have been scanned
during that time (maybe there was a heavy IRQ activity interfering).

In any case it looks like we can really hit long periods without
scheduling on non preemptive kernels so an explicit cond_resched() in
shrink_node_memcg which is independent on the reclaim operation is due.

Link: http://lkml.kernel.org/r/20161202095841.16648-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Boris Zhmurov <bb@kernelpanic.ru>
Tested-by: Boris Zhmurov <bb@kernelpanic.ru>
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Reported-by: "Christopher S. Aker" <caker@theshore.net>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Michal Hocko <mhocko@suse.com> 2016-12-02 17:26:48 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2016-12-02 18:48:03 -0800
commit: bd041733c9f612b66c519e5a8b1a98b05b94ed24 (patch)
tree: 38be7a4d2b223f3b7091fc989191bee25ed29bf9
parent: 20ab67a563f5299c09a234164c372aba5a59add8 (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76fda2268148..d75cdf360730 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2354,6 +2354,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
 			}
 		}
 
+		cond_resched();
+
 		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
 			continue;
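
The effect of the hunk above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `model_reclaim_loop()` and its parameters are hypothetical names, `sched_yield()` stands in for `cond_resched()`, and the model assumes that the only pre-existing scheduling point sits on the path that handles isolated pages, which is how the commit message describes the pathological case.

```c
#include <sched.h>

/*
 * Hypothetical user-space model of a reclaim loop (not the kernel code).
 *
 * rounds                - number of loop iterations to simulate
 * nr_taken_per_round    - pages "isolated" in each iteration; 0 models the
 *                         nr_taken=0 case from the trace in the commit message
 * yield_unconditionally - non-zero models the patched loop, which yields on
 *                         every iteration regardless of reclaim progress
 *
 * Returns how many times the loop gave the scheduler a chance to run.
 */
static unsigned long model_reclaim_loop(unsigned long rounds,
					unsigned long nr_taken_per_round,
					int yield_unconditionally)
{
	unsigned long yields = 0;
	unsigned long i;

	for (i = 0; i < rounds; i++) {
		if (nr_taken_per_round > 0) {
			/*
			 * Assumption of this model: the existing scheduling
			 * point is only reached when pages were isolated.
			 */
			sched_yield();	/* stand-in for cond_resched() */
			yields++;
		}

		if (yield_unconditionally) {
			/* Models the cond_resched() added by this patch. */
			sched_yield();
			yields++;
		}
	}

	return yields;
}
```

In this model, `model_reclaim_loop(100, 0, 0)` never yields, which corresponds to the stall scenario, while `model_reclaim_loop(100, 0, 1)` yields once per iteration even though no page is ever taken.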