sched/core: Introduce set_special_state()

[ Upstream commit b5bf9a90bbebffba888c9144c5a8a10317b04064 ] Gaurav reported a perceived problem with TASK_PARKED, which turned out to be a broken wait-loop pattern in __kthread_parkme(), but the reported issue can (and does) in fact happen for states that do not do condition based sleeps. When the 'current->state = TASK_RUNNING' store of a previous (concurrent) try_to_wake_up() collides with the setting of a 'special' sleep state, we can loose the sleep state. Normal condition based wait-loops are immune to this problem, but for sleep states that are not condition based are subject to this problem. There already is a fix for TASK_DEAD. Abstract that and also apply it to TASK_STOPPED and TASK_TRACED, both of which are also without condition based wait-loop. Reported-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: Peter Zijlstra <peterz@infradead.org> 2018-04-30 14:51:01 +0200
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2018-06-21 04:02:54 +0900
commit: 0742396317a0a0e41ab47e9de10cd6f16317f049 (patch)
tree: 71ee100f1003e721b08dbe8cc9bc15f113648230 /kernel/sched
parent: a614eaa465f78767cfe6c5580c9741d73a228964 (diff)
1 files changed, 1 insertions, 16 deletions
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8cf36b30a006..f287dcbe8cb2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3374,23 +3374,8 @@ static void __sched notrace __schedule(bool preempt)
 
 void __noreturn do_task_dead(void)
 {
-	/*
-	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
-	 * when the following two conditions become true.
-	 *   - There is race condition of mmap_sem (It is acquired by
-	 *     exit_mm()), and
-	 *   - SMI occurs before setting TASK_RUNINNG.
-	 *     (or hypervisor of virtual machine switches to other guest)
-	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
-	 *
-	 * To avoid it, we have to wait for releasing tsk->pi_lock which
-	 * is held by try_to_wake_up()
-	 */
-	raw_spin_lock_irq(&current->pi_lock);
-	raw_spin_unlock_irq(&current->pi_lock);
-
 	/* Causes final put_task_struct in finish_task_switch(): */
-	__set_current_state(TASK_DEAD);
+	set_special_state(TASK_DEAD);
 
 	/* Tell freezer to ignore us: */
 	current->flags |= PF_NOFREEZE;
author	Peter Zijlstra <peterz@infradead.org>	2018-04-30 14:51:01 +0200
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-06-21 04:02:54 +0900
commit	0742396317a0a0e41ab47e9de10cd6f16317f049 (patch)
tree	71ee100f1003e721b08dbe8cc9bc15f113648230 /kernel/sched
parent	a614eaa465f78767cfe6c5580c9741d73a228964 (diff)