summaryrefslogtreecommitdiff
path: root/kernel/rcu
AgeCommit message (Collapse)Author
2019-05-31rcuperf: Fix cleanup path for invalid perf_type stringsPaul E. McKenney
[ Upstream commit ad092c027713a68a34168942a5ef422e42e039f4 ] If the specified rcuperf.perf_type is not in the rcu_perf_init() function's perf_ops[] array, rcuperf prints some console messages and then invokes rcu_perf_cleanup() to set state so that a future torture test can run. However, rcu_perf_cleanup() also attempts to end the test that didn't actually start, and in doing so relies on the value of cur_ops, a value that is not particularly relevant in this case. This can result in confusing output or even follow-on failures due to attempts to use facilities that have not been properly initialized. This commit therefore sets the value of cur_ops to NULL in this case and inserts a check near the beginning of rcu_perf_cleanup(), thus avoiding relying on an irrelevant cur_ops value. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31rcutorture: Fix cleanup path for invalid torture_type stringsPaul E. McKenney
[ Upstream commit b813afae7ab6a5e91b4e16cc567331d9c2ae1f04 ] If the specified rcutorture.torture_type is not in the rcu_torture_init() function's torture_ops[] array, rcutorture prints some console messages and then invokes rcu_torture_cleanup() to set state so that a future torture test can run. However, rcu_torture_cleanup() also attempts to end the test that didn't actually start, and in doing so relies on the value of cur_ops, a value that is not particularly relevant in this case. This can result in confusing output or even follow-on failures due to attempts to use facilities that have not been properly initialized. This commit therefore sets the value of cur_ops to NULL in this case and inserts a check near the beginning of rcu_torture_cleanup(), thus avoiding relying on an irrelevant cur_ops value. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23rcu: Do RCU GP kthread self-wakeup from softirq and interruptZhang, Jun
commit 1d1f898df6586c5ea9aeaf349f13089c6fa37903 upstream. The rcu_gp_kthread_wake() function is invoked when it might be necessary to wake the RCU grace-period kthread. Because self-wakeups are normally a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from this kthread, it naturally refuses to do the wakeup. Unfortunately, natural though it might be, this heuristic fails when rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler that interrupted the grace-period kthread just after the final check of the wait-event condition but just before the schedule() call. In this case, a wakeup is required, even though the call to rcu_gp_kthread_wake() is within the RCU grace-period kthread's context. Failing to provide this wakeup can result in grace periods failing to start, which in turn results in out-of-memory conditions. This race window is quite narrow, but it actually did happen during real testing. It would of course need to be fixed even if it was strictly theoretical in nature. This patch does not Cc stable because it does not apply cleanly to earlier kernel versions. Fixes: 48a7639ce80c ("rcu: Make callers awaken grace-period kthread") Reported-by: "He, Bo" <bo.he@intel.com> Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com> Co-developed-by: "He, Bo" <bo.he@intel.com> Co-developed-by: "xiao, jin" <jin.xiao@intel.com> Co-developed-by: Bai, Jie A <jie.a.bai@intel.com> Signed-off: "Zhang, Jun" <jun.zhang@intel.com> Signed-off: "He, Bo" <bo.he@intel.com> Signed-off: "xiao, jin" <jin.xiao@intel.com> Signed-off: Bai, Jie A <jie.a.bai@intel.com> Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com> [ paulmck: Switch from !in_softirq() to "!in_interrupt() && !in_serving_softirq() to avoid redundant wakeups and to also handle the interrupt-handler scenario as well as the softirq-handler scenario that actually occurred in testing. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01rcu: Make need_resched() respond to urgent RCU-QS needsPaul E. McKenney
commit 92aa39e9dc77481b90cbef25e547d66cab901496 upstream. The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent need for an RCU quiescent state from the force-quiescent-state processing within the grace-period kthread to context switches and to cond_resched(). Unfortunately, such urgent needs are not communicated to need_resched(), which is sometimes used to decide when to invoke cond_resched(), for but one example, within the KVM vcpu_run() function. As of v4.15, this can result in synchronize_sched() being delayed by up to ten seconds, which can be problematic, to say nothing of annoying. This commit therefore checks rcu_dynticks.rcu_urgent_qs from within rcu_check_callbacks(), which is invoked from the scheduling-clock interrupt handler. If the current task is not an idle task and is not executing in usermode, a context switch is forced, and either way, the rcu_dynticks.rcu_urgent_qs variable is set to false. If the current task is an idle task, then RCU's dyntick-idle code will detect the quiescent state, so no further action is required. Similarly, if the task is executing in usermode, other code in rcu_check_callbacks() and its called functions will report the corresponding quiescent state. Reported-by: Marius Hillenbrand <mhillenb@amazon.de> Reported-by: David Woodhouse <dwmw2@infradead.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Backported to make patch apply cleanly on older versions. ] Tested-by: Marius Hillenbrand <mhillenb@amazon.de> Cc: <stable@vger.kernel.org> # 4.12.x - 4.19.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30rcu: Call touch_nmi_watchdog() while printing stall warningsTejun Heo
[ Upstream commit 3caa973b7a260e7a2a69edc94c300ab9c65148c3 ] When RCU stall warning triggers, it can print out a lot of messages while holding spinlocks. If the console device is slow (e.g. an actual or IPMI serial console), it may end up triggering NMI hard lockup watchdog like the following.
2018-02-16rcu: Export init_rcu_head() and destroy_rcu_head() to GPL modulesPaul E. McKenney
commit 156baec39732f025dc778e00da95fc10d6e45885 upstream. Use of init_rcu_head() and destroy_rcu_head() from modules results in the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y: ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to be used by GPL-licensed kernel modules. Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24rcu: Fix up pending cbs check in rcu_prepare_for_idleNeeraj Upadhyay
commit 135bd1a230bb69a68c9808a7d25467318900b80a upstream. The pending-callbacks check in rcu_prepare_for_idle() is backwards. It should accelerate if there are pending callbacks, but the check rather uselessly accelerates only if there are no callbacks. This commit therefore inverts this check. Fixes: 15fecf89e46a ("srcu: Abstract multi-tail callback list handling") Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-19doc: Fix various RCU docbook comment-header problemsPaul E. McKenney
Because many of RCU's files have not been included into docbook, a number of errors have accumulated. This commit fixes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-04Merge tag 'trace-v4.14-rc1-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixlets from Steven Rostedt: "Two updates: - A memory fix with left over code from spliting out ftrace_ops and function graph tracer, where the function graph tracer could reset the trampoline pointer, leaving the old trampoline not to be freed (memory leak). - The update to Paul's patch that added the unnecessary READ_ONCE(). This removes the unnecessary READ_ONCE() instead of having to rebase the branch to update the patch that added it" * tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}() ftrace: Fix kmemleak in unregister_ftrace_graph
2017-10-03rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()Paul E. McKenney
The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit() is currently protected with READ_ONCE(). However, this protection is unnecessary because (1) ->dynticks_nmi_nesting is updated only by the current CPU, (2) Although NMI handlers can update this field, they reset it back to its old value before return, and (3) Interrupts are disabled, so nothing else can modify it. The value of ->dynticks_nmi_nesting is thus effectively constant, and so no protection is required. This commit therefore removes the READ_ONCE() protection from these two accesses. Link: http://lkml.kernel.org/r/20170926031902.GA2074@linux.vnet.ibm.com Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-25Merge tag 'trace-v4.14-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Stack tracing and RCU has been having issues with each other and lockdep has been pointing out constant problems. The changes have been going into the stack tracer, but it has been discovered that the problem isn't with the stack tracer itself, but it is with calling save_stack_trace() from within the internals of RCU. The stack tracer is the one that can trigger the issue the easiest, but examining the problem further, it could also happen from a WARN() in the wrong place, or even if an NMI happened in this area and it did an rcu_read_lock(). The critical area is where RCU is not watching. Which can happen while going to and from idle, or bringing up or taking down a CPU. The final fix was to put the protection in kernel_text_address() as it is the one that requires RCU to be watching while doing the stack trace. To make this work properly, Paul had to allow rcu_irq_enter() happen after rcu_nmi_enter(). This should have been done anyway, since an NMI can page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter(). One patch is just a consolidation of code so that the fix only needed to be done in one location" * tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove RCU work arounds from stack tracer extable: Enable RCU if it is not watching in kernel_text_address() extable: Consolidate *kernel_text_address() functions rcu: Allow for page faults in NMI handlers
2017-09-23rcu: Allow for page faults in NMI handlersPaul E. McKenney
A number of architecture invoke rcu_irq_enter() on exception entry in order to allow RCU read-side critical sections in the exception handler when the exception is from an idle or nohz_full CPU. This works, at least unless the exception happens in an NMI handler. In that case, rcu_nmi_enter() would already have exited the extended quiescent state, which would mean that rcu_irq_enter() would (incorrectly) cause RCU to think that it is again in an extended quiescent state. This will in turn result in lockdep splats in response to later RCU read-side critical sections. This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to take no action if there is an rcu_nmi_enter() in effect, thus avoiding the unscheduled return to RCU quiescent state. This in turn should make the kernel safe for on-demand RCU voyeurism. Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: 0be964be0 ("module: Sanitize RCU usage and locking") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-08treewide: make "nr_cpu_ids" unsignedAlexey Dobriyan
First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', ↵Paul E. McKenney
'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD doc.2017.08.17a: Documentation updates. fixes.2017.08.17a: RCU fixes. hotplug.2017.07.25b: CPU-hotplug updates. misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts). spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait(). srcu.2017.07.27c: SRCU updates. torture.2017.07.24c: Torture-test updates.
2017-08-17rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()Paul E. McKenney
The rcu_idle_exit() and rcu_idle_enter() functions are exported because they were originally used by RCU_NONIDLE(), which was intended to be usable from modules. However, RCU_NONIDLE() now instead uses rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not exported, and there have been no complaints. This commit therefore removes the exports from rcu_idle_exit() and rcu_idle_enter(). Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu: Add warning to rcu_idle_enter() for irqs enabledPaul E. McKenney
All current callers of rcu_idle_enter() have irqs disabled, and rcu_idle_enter() relies on this, but doesn't check. This commit therefore adds a RCU_LOCKDEP_WARN() to add some verification to the trust. While we are there, pass "true" rather than "1" to rcu_eqs_enter(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu: Make rcu_idle_enter() rely on callers disabling irqsPeter Zijlstra (Intel)
All callers to rcu_idle_enter() have irqs disabled, so there is no point in rcu_idle_enter disabling them again. This commit therefore replaces the irq disabling with a RCU_LOCKDEP_WARN(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu: Add assertions verifying blocked-tasks listPaul E. McKenney
This commit adds assertions verifying the consistency of the rcu_node structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and ->boost_tasks pointers. In particular, the ->blkd_tasks lists must be empty except for leaf rcu_node structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()Masami Hiramatsu
Set disable_rcu_irq_enter on not only rcu_eqs_enter_common() but also rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was fixed for rcu_eqs_enter_common() by commit 03ecd3f48e57 ("rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work"). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu: Add TPS() protection for _rcu_barrier_trace stringsPaul E. McKenney
The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(), which needs TPS() protection for strings passed through the second argument. However, it has escaped prior TPS()-ification efforts because it _rcu_barrier_trace() does not start with "trace_". This commit therefore adds the needed TPS() protection Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17rcu: Use idle versions of swait to make idle-hack clearLuis R. Rodriguez
These RCU waits were set to use interruptible waits to avoid the kthreads contributing to system load average, even though they are not interruptible as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes our goal clear, and removes confusion about these paths possibly being interruptible -- they are not. When the system is idle the RCU grace-period kthread will spend all its time blocked inside the swait_event_interruptible(). If the interruptible() was not used, then this kthread would contribute to the load average. This means that an idle system would have a load average of 2 (or 3 if PREEMPT=y), rather than the load average of 0 that almost fifty years of UNIX has conditioned sysadmins to expect. The same argument applies to swait_event_interruptible_timeout() use. The RCU grace-period kthread spends its time blocked inside this call while waiting for grace periods to complete. In particular, if there was only one busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU grace-period kthread would spend almost all its time blocked inside the swait_event_interruptible_timeout(). This would mean that the load average would be 2 rather than the expected 1 for the single busy CPU. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu: Add event tracing to ->gp_tasks update at GP startPaul E. McKenney
There is currently event tracing to track when a task is preempted within a preemptible RCU read-side critical section, and also when that task subsequently reaches its outermost rcu_read_unlock(), but none indicating when a new grace period starts when that grace period must wait on pre-existing readers that have been been preempted at least once since the beginning of their current RCU read-side critical sections. This commit therefore adds an event trace at grace-period start in the case where there are such readers. Note that only the first reader in the list is traced. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17rcu: Move rcu.h to new trivial-function stylePaul E. McKenney
This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line definitions for trivial functions, instead of the old style where the two curly braces each get their own line. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu: Add TPS() to event-traced stringsPaul E. McKenney
Strings used in event tracing need to be specially handled, for example, using the TPS() macro. Without the TPS() macro, although output looks fine from within a running kernel, extracting traces from a crash dump produces garbage instead of strings. This commit therefore adds the TPS() macro to some unadorned strings that were passed to event-tracing macros. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17rcu: Create reasonable API for do_exit() TASKS_RCU processingPaul E. McKenney
Currently, the exit-time support for TASKS_RCU is open-coded in do_exit(). This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish() APIs for do_exit() use. This has the benefit of confining the use of the tasks_rcu_exit_srcu variable to one file, allowing it to become static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17rcu: Drive TASKS_RCU directly off of PREEMPTPaul E. McKenney
The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched is used instead. This commit therefore makes synchronize_rcu_tasks() and call_rcu_tasks() available always, but mapped to synchronize_sched() and call_rcu_sched(), respectively, when !PREEMPT. This approach also allows some #ifdefs to be removed from rcutorture. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org>
2017-07-27srcu: Provide ordering for CPU not involved in grace periodPaul E. McKenney
Tree RCU guarantees that every online CPU has a memory barrier between any given grace period and any of that CPU's RCU read-side sections that must be ordered against that grace period. Since RCU doesn't always know where read-side critical sections are, the actual implementation guarantees order against prior and subsequent non-idle non-offline code, whether in an RCU read-side critical section or not. As a result, there does not need to be a memory barrier at the end of synchronize_rcu() and friends because the ordering internal to the grace period has ordered every CPU's post-grace-period execution against each CPU's pre-grace-period execution, again for all non-idle online CPUs. In contrast, SRCU can have non-idle online CPUs that are completely uninvolved in a given SRCU grace period, for example, a CPU that never runs any SRCU read-side critical sections and took no part in the grace-period processing. It is in theory possible for a given synchronize_srcu()'s wakeup to be delivered to a CPU that was completely uninvolved in the prior SRCU grace period, which could mean that the code following that synchronize_srcu() would end up being unordered with respect to both the grace period and any pre-existing SRCU read-side critical sections. This commit therefore adds an smp_mb() to the end of __synchronize_srcu(), which prevents this scenario from occurring. Reported-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Lance Roy <ldr709@gmail.com> Cc: <stable@vger.kernel.org> # 4.12.x
2017-07-25rcu: Move callback-list warning to irq-disable regionPaul E. McKenney
After adopting callbacks from a newly offlined CPU, the adopting CPU checks to make sure that its callback list's count is zero only if the list has no callbacks and vice versa. Unfortunately, it does so after enabling interrupts, which means that false positives are possible due to interrupt handlers invoking call_rcu(). Although these false positives are improbable, rcutorture did make it happen once. This commit therefore moves this check to an irq-disabled region of code, thus suppressing the false positive. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Remove unused RCU list functionsPaul E. McKenney
Given changes to callback migration, rcu_cblist_head(), rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(), rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are no longer used. This commit therefore removes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Localize rcu_state ->orphan_pend and ->orphan_donePaul E. McKenney
Given that the rcu_state structure's >orphan_pend and ->orphan_done fields are used only during migration of callbacks from the recently offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() are combined, these fields can become local variables in the combined function. This commit therefore combines rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new rcu_segcblist_merge() function and removes the ->orphan_pend and ->orphan_done fields. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Advance callbacks after migrationPaul E. McKenney
When migrating callbacks from a newly offlined CPU, we are already holding the root rcu_node structure's lock, so it costs almost nothing to advance and accelerate the newly migrated callbacks. This patch therefore makes this advancing and acceleration happen. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Eliminate rcu_state ->orphan_lockPaul E. McKenney
The ->orphan_lock is acquired and released only within the rcu_migrate_callbacks() function, which now acquires the root rcu_node structure's ->lock. This commit therefore eliminates the ->orphan_lock in favor of the root rcu_node structure's ->lock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Advance outgoing CPU's callbacks before migrating themPaul E. McKenney
It is possible that the outgoing CPU is unaware of recent grace periods, and so it is also possible that some of its pending callbacks are actually ready to be invoked. The current callback-migration code would needlessly force these callbacks to pass through another grace period. This commit therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in order to give them full credit for having passed through any recent grace periods. This also fixes an odd theoretical bug where there are no callbacks in the system except for those on the outgoing CPU, none of those callbacks have yet been associated with a grace-period number, there is never again another callback registered, and the surviving CPU never again takes a scheduling-clock interrupt, never goes idle, and never enters nohz_full userspace execution. Yes, this is (just barely) possible. It requires that the surviving CPU be a nohz_full CPU, that its scheduler-clock interrupt be shut off, and that it loop forever in the kernel. You get bonus points if you can make this one happen! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Make NOCB CPUs migrate CBs directly from outgoing CPUPaul E. McKenney
RCU's CPU-hotplug callback-migration code first moves the outgoing CPU's callbacks to ->orphan_done and ->orphan_pend, and only then moves them to the NOCB callback list. This commit avoids the extra step (and simplifies the code) by moving the callbacks directly from the outgoing CPU's callback list to the NOCB callback list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Check for NOCB CPUs and empty lists earlier in CB migrationPaul E. McKenney
The current CPU-hotplug RCU-callback-migration code checks for the source (newly offlined) CPU being a NOCBs CPU down in rcu_send_cbs_to_orphanage(). This commit simplifies callback migration a bit by moving this check up to rcu_migrate_callbacks(). This commit also adds a check for the source CPU having no callbacks, which eases analysis of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Remove orphan/adopt event-tracing fieldsPaul E. McKenney
The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields are updated, but never read. This commit therefore removes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Make expedited GPs correctly handle hardware CPU insertionPaul E. McKenney
The update of the ->expmaskinitnext and of ->ncpus are unsynchronized, with the value of ->ncpus being incremented long before the corresponding ->expmaskinitnext mask is updated. If an RCU expedited grace period sees ->ncpus change, it will update the ->expmaskinit masks from the new ->expmaskinitnext masks. But it is possible that ->ncpus has already been updated, but the ->expmaskinitnext masks still have their old values. For the current expedited grace period, no harm done. The CPU could not have been online before the grace period started, so there is no need to wait for its non-existent pre-existing readers. But the next RCU expedited grace period is in a world of hurt. The value of ->ncpus has already been updated, so this grace period will assume that the ->expmaskinitnext masks have not changed. But they have, and they won't be taken into account until the next never-been-online CPU comes online. This means that RCU will be ignoring some CPUs that it should be paying attention to. The solution is to update ->ncpus and ->expmaskinitnext while holding the ->lock for the rcu_node structure containing the ->expmaskinitnext mask. Because smp_store_release() is now used to update ->ncpus and smp_load_acquire() is now used to locklessly read it, if the expedited grace period sees ->ncpus change, then the updating CPU has to already be holding the corresponding ->lock. Therefore, when the expedited grace period later acquires that ->lock, it is guaranteed to see the new value of ->expmaskinitnext. On the other hand, if the expedited grace period loads ->ncpus just before an update, earlier full memory barriers guarantee that the incoming CPU isn't far enough along to be running any RCU readers. This commit therefore makes the required change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25rcu: Migrate callbacks earlier in the CPU-offline timelinePaul E. McKenney
RCU callbacks must be migrated away from an outgoing CPU, and this is done near the end of the CPU-hotplug operation, after the outgoing CPU is long gone. Unfortunately, this means that other CPU-hotplug callbacks can execute while the outgoing CPU's callbacks are still immobilized on the long-gone CPU's callback lists. If any of these CPU-hotplug callbacks must wait, either directly or indirectly, for the invocation of any of the immobilized RCU callbacks, the system will hang. This commit avoids such hangs by migrating the callbacks away from the outgoing CPU immediately upon its departure, shortly after the return from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these callbacks and invoke them, which allows all the after-the-fact CPU-hotplug callbacks to wait on these RCU callbacks without risk of a hang. While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including dead code on the one hand and to avoid define-without-use warnings on the other hand. Reported-by: Jeffrey Hugo <jhugo@codeaurora.org> Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Richard Weinberger <richard@nod.at>
2017-07-25rcu: Use timer as backstop for NOCB deferred wakeupsPaul E. McKenney
The handling of RCU's no-CBs CPUs has a maintenance headache, namely that if call_rcu() is invoked with interrupts disabled, the rcuo kthread wakeup must be defered to a point where we can be sure that scheduler locks are not held. Of course, there are a lot of code paths leading from an interrupts-disabled invocation of call_rcu(), and missing any one of these can result in excessive callback-invocation latency, and potentially even system hangs. This commit therefore uses a timer to guarantee that the wakeup will eventually occur. If one of the deferred-wakeup points kicks in, then the timer is simply cancelled. This commit also fixes up an incomplete removal of commits that were intended to plug remaining exit paths, which should have the added benefit of reducing the overhead of RCU's context-switch hooks. In addition, it simplifies leader-to-follower callback-list handoff by introducing locking. The call_rcu()-to-leader handoff continues to use atomic operations in order to maintain good real-time latency for common-case use of call_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]
2017-07-24rcutorture: Invoke call_rcu() from timer handlerPaul E. McKenney
The Linux kernel invokes call_rcu() from various interrupt/softirq handlers, but rcutorture does not. This commit therefore adds this behavior to rcutorture's repertoire. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24rcu: Add last-CPU to GP-kthread starvation messagesPaul E. McKenney
This commit augments the grace-period-kthread starvation debugging messages by adding the last CPU that ran the kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24rcutorture: Eliminate unused ts_rem local from rcu_trace_clock_local()Paul E. McKenney
This commit removes an unused local variable named ts_rem that is marked __maybe_unused. Yes, the variable was assigned to, but it was never used beyond that point, hence not needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24rcutorture: Add task's CPU for rcutorture writer stallsPaul E. McKenney
It appears that at least some of the rcutorture writer stall messages coincide with unusually long CPU-online operations, for example, no fewer than 205 seconds in a recent test. It is of course possible that the writer stall is not unrelated to this unusually long CPU-hotplug operation, and so this commit adds the rcutorture writer task's CPU to the stall message to gain more information about this possible connection. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24rcutorture: Place event-traced strings into trace bufferPaul E. McKenney
Strings used in event tracing need to be specially handled, for example, being copied to the trace buffer instead of being pointed to by the trace buffer. Although the TPS() macro can be used to "launder" pointed-to strings, this might not be all that effective within a loadable module. This commit therefore copies rcutorture's strings to the trace buffer. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2017-07-24rcutorture: Enable SRCU readers from timer handlerPaul E. McKenney
Now that it is legal to invoke srcu_read_lock() and srcu_read_unlock() for a given srcu_struct from both process context and {soft,}irq handlers, it is time to test it. This commit therefore enables testing of SRCU readers from rcutorture's timer handler, using in_task() to determine whether or not it is safe to sleep in the SRCU read-side critical sections. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24rcu: Remove CONFIG_TASKS_RCU ifdef from rcuperf.cPaul E. McKenney
The synchronize_rcu_tasks() and call_rcu_tasks() APIs are now available regardless of kernel configuration, so this commit removes the CONFIG_TASKS_RCU ifdef from rcuperf.c. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24rcutorture: Print SRCU lock/unlock totalsPaul E. McKenney
This commit adds printing of SRCU lock/unlock totals, which are just the sums of the per-CPU counts. Saves a bit of mental arithmetic. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24rcutorture: Move SRCU status printing to SRCU implementationsPaul E. McKenney
This commit gets rid of some ugly #ifdefs in rcutorture.c by moving the SRCU status printing to the SRCU implementations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24srcu: Make process_srcu() be staticPaul E. McKenney
The function process_srcu() is not invoked outside of srcutree.c, so this commit makes it static and drops the EXPORT_SYMBOL_GPL(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>