summaryrefslogtreecommitdiff
path: root/kernel/printk
AgeCommit message (Collapse)Author
2019-11-28printk: fix integer overflow in setup_log_buf()Sergey Senozhatsky
[ Upstream commit d2130e82e9454304e9b91ba9da551b5989af8c27 ] The way we calculate logbuf free space percentage overflows signed integer: int free; free = __LOG_BUF_LEN - log_next_idx; pr_info("early log buf free: %u(%u%%)\n", free, (free * 100) / __LOG_BUF_LEN); We support LOG_BUF_LEN of up to 1<<25 bytes. Since setup_log_buf() is called during early init, logbuf is mostly empty, so __LOG_BUF_LEN - log_next_idx is close to 1<<25. Thus when we multiply it by 100, we overflow signed integer value range: 100 is 2^6 + 2^5 + 2^2. Example, booting with LOG_BUF_LEN 1<<25 and log_buf_len=2G boot param: [ 0.075317] log_buf_len: -2147483648 bytes [ 0.075319] early log buf free: 33549896(-28%) Make "free" unsigned integer and use appropriate printk() specifier. Link: http://lkml.kernel.org/r/20181010113308.9337-1-sergey.senozhatsky@gmail.com To: Steven Rostedt <rostedt@goodmis.org> Cc: linux-kernel@vger.kernel.org Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-25printk: Give error on attempt to set log buffer length to over 2GHe Zhe
[ Upstream commit e6fe3e5b7d16e8f146a4ae7fe481bc6e97acde1e ] The current printk() is ready to handle log buffer size up to 2G. Give an explicit error for users who want to use larger log buffer. Also fix printk formatting to show the 2G as a positive number. Link: http://lkml.kernel.org/r/20181008135916.gg4kkmoki5bgtco5@pathway.suse.cz Cc: rostedt@goodmis.org Cc: linux-kernel@vger.kernel.org Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [pmladek: Fixed to the really safe limit 2GB.] Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05printk: Do not lose last line in kmsg buffer dumpVincent Whitchurch
[ Upstream commit b46eff55ad5bd98e746c0a7022fe7ee071de5fee ] kmsg_dump_get_buffer() is supposed to select all the youngest log messages which fit into the provided buffer. It determines the correct start index by using msg_print_text() with a NULL buffer to calculate the size of each entry. However, when performing the actual writes, msg_print_text() only writes the entry to the buffer if the written len is lesser than the size of the buffer. So if the lengths of the selected youngest log messages happen to precisely fill up the provided buffer, the last log message is not included. We don't want to modify msg_print_text() to fill up the buffer and start returning a length which is equal to the size of the buffer, since callers of its other users, such as kmsg_dump_get_line(), depend upon the current behaviour. Instead, fix kmsg_dump_get_buffer() to compensate for this. For example, with the following two final prints: [ 6.427502] AAAAAAAAAAAAA [ 6.427769] BBBBBBBB12345 A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this patch: 00000000: 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37 <0>[ 6.522197 00000010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a ] AAAAAAAAAAAAA. 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ After this patch: 00000000: 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38 <0>[ 6.456678 00000010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a ] BBBBBBBB12345. 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Link: http://lkml.kernel.org/r/20190711142937.4083-1-vincent.whitchurch@axis.com Fixes: e2ae715d66bf4bec ("kmsg - kmsg_dump() use iterator to receive log buffer content") To: rostedt@goodmis.org Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> # v3.5+ Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05printk: remove games with previous record flagsLinus Torvalds
[ Upstream commit 5aa068ea4082b39eafc356c27c9ecd155b0895f6 ] The record logging code looks at the previous record flags in various ways, and they are all wrong. You can't use the previous record flags to determine anything about the next record, because they may simply not be related. In particular, the reason the previous record was a continuation record may well be exactly _because_ the new record was printed by a different process, which is why the previous record was flushed. So all those games are simply wrong, and make the code hard to understand (because the code fundamentally cdoes not make sense). So remove it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-13printk: Fix panic caused by passing log_buf_len to command lineHe Zhe
commit 277fcdb2cfee38ccdbe07e705dbd4896ba0c9930 upstream. log_buf_len_setup does not check input argument before passing it to simple_strtoull. The argument would be a NULL pointer if "log_buf_len", without its value, is set in command line and thus causes the following panic. PANIC: early exception 0xe3 IP 10:ffffffffaaeacd0d error 0 cr2 0x0 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc4-yocto-standard+ #1 [ 0.000000] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70 ... [ 0.000000] Call Trace: [ 0.000000] simple_strtoull+0x29/0x70 [ 0.000000] memparse+0x26/0x90 [ 0.000000] log_buf_len_setup+0x17/0x22 [ 0.000000] do_early_param+0x57/0x8e [ 0.000000] parse_args+0x208/0x320 [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_options+0x29/0x2d [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_param+0x36/0x4d [ 0.000000] setup_arch+0x336/0x99e [ 0.000000] start_kernel+0x6f/0x4ee [ 0.000000] x86_64_start_reservations+0x24/0x26 [ 0.000000] x86_64_start_kernel+0x6f/0x72 [ 0.000000] secondary_startup_64+0xa4/0xb0 This patch adds a check to prevent the panic. Link: http://lkml.kernel.org/r/1538239553-81805-1-git-send-email-zhe.he@windriver.com Cc: stable@vger.kernel.org Cc: rostedt@goodmis.org Cc: linux-kernel@vger.kernel.org Signed-off-by: He Zhe <zhe.he@windriver.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09printk/tracing: Do not trace printk_nmi_enter()Steven Rostedt (VMware)
commit d1c392c9e2a301f38998a353f467f76414e38725 upstream. I hit the following splat in my tests: ------------[ cut here ]------------ IRQs not enabled as expected WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tick_nohz_idle_enter+0x44/0x8c Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00 75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0 e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007 ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296 CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0 Call Trace: do_idle+0x33/0x202 cpu_startup_entry+0x61/0x63 start_secondary+0x18e/0x1ed startup_32_smp+0x164/0x168 irq event stamp: 18773830 hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10 hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b ---[ end trace b7c64aa79e17954a ]--- After a bit of debugging, I found what was happening. This would trigger when performing "perf" with a high NMI interrupt rate, while enabling and disabling function tracer. Ftrace uses breakpoints to convert the nops at the start of functions to calls to the function trampolines. The breakpoint traps disable interrupts and this makes calls into lockdep via the trace_hardirqs_off_thunk in the entry.S code. What happens is the following: do_idle { [interrupts enabled] <interrupt> [interrupts disabled] TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET test if pt_regs say return to interrupts enabled [yes] TRACE_IRQS_ON [lockdep says irqs are on] <nmi> nmi_enter() { printk_nmi_enter() [traced by ftrace] [ hit ftrace breakpoint ] <breakpoint exception> TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET [return from breakpoint] test if pt_regs say interrupts enabled [no] [iret back to interrupt] [iret back to code] tick_nohz_idle_enter() { lockdep_assert_irqs_enabled() [lockdep say no!] Although interrupts are indeed enabled, lockdep thinks it is not, and since we now do asserts via lockdep, it gives a false warning. The issue here is that printk_nmi_enter() is called before lockdep_off(), which disables lockdep (for this reason) in NMIs. By simply not allowing ftrace to see printk_nmi_enter() (via notrace annotation) we keep lockdep from getting confused. Cc: stable@vger.kernel.org Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03printk: fix possible reuse of va_list variableTetsuo Handa
commit 988a35f8da1dec5a8cd2788054d1e717be61bf25 upstream. I noticed that there is a possibility that printk_safe_log_store() causes kernel oops because "args" parameter is passed to vsnprintf() again when atomic_cmpxchg() detected that we raced. Fix this by using va_copy(). Link: http://lkml.kernel.org/r/201805112002.GIF21216.OFVHFOMLJtQFSO@I-love.SAKURA.ne.jp Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: dvyukov@google.com Cc: syzkaller@googlegroups.com Cc: fengguang.wu@intel.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 42a0bb3f71383b45 ("printk/nmi: generic solution for safe printk in NMI") Cc: 4.7+ <stable@vger.kernel.org> # v4.7+ Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22printk: Correctly handle preemption in console_unlock()Petr Mladek
[ Upstream commit 257ab443118bffc7fdcef38f49cf59be68a3e362 ] Some console drivers code calls console_conditional_schedule() that looks at @console_may_schedule. The value must be cleared when the drivers are called from console_unlock() with interrupts disabled. But rescheduling is fine when the same code is called, for example, from tty operations where the console semaphore is taken via console_lock(). This is why @console_may_schedule is cleared before calling console drivers. The original value is stored to decide if we could sleep between lines. Now, @console_may_schedule is not cleared when we call console_trylock() and jump back to the "again" goto label. This has become a problem, since the commit 6b97a20d3a7909daa066 ("printk: set may_schedule for some of console_trylock() callers"). @console_may_schedule might get enabled now. There is also the opposite problem. console_lock() can be called only from preemptive context. It can always enable scheduling in the console code. But console_trylock() is not able to detect it when CONFIG_PREEMPT_COUNT is disabled. Therefore we should use the original @console_may_schedule value after re-acquiring the console semaphore in console_unlock(). This patch solves both problems by moving the "again" goto label. Alternative solution was to clear and restore the value around call_console_drivers(). Then console_conditional_schedule() could be used also inside console_unlock(). But there was a potential race with console_flush_on_panic() as reported by Sergey Senozhatsky. That function should be called only where there is only one CPU and with interrupts disabled. But better be on the safe side because stopping CPUs might fail. Fixes: 6b97a20d3a7909 ("printk: set may_schedule for some of console_trylock() callers") Link: http://lkml.kernel.org/r/1490372045-22288-1-git-send-email-pmladek@suse.com Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: linux-fbdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22braille-console: Fix value returned by _braille_console_setupSamuel Thibault
[ Upstream commit 2ed2b8621be2708c0f6d61fe9841e9ad8b9753f0 ] commit bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files") introduced _braille_console_setup() to outline the braille initialization code. There was however some confusion over the value it was supposed to return. commit 2cfe6c4ac7ee ("printk: Fix return of braille_register_console()") tried to fix it but failed to. This fixes and documents the returned value according to the use in printk.c: non-zero return means a parsing error, and thus this console configuration should be ignored. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Aleksey Makarov <aleksey.makarov@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Ming Lei <ming.lei@canonical.com> Cc: Steven Rostedt <rostedt@goodmis.org> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-23printk: use rcuidle console tracepointSergey Senozhatsky
commit fc98c3c8c9dcafd67adcce69e6ce3191d5306c9c upstream. Use rcuidle console tracepoint because, apparently, it may be issued from an idle CPU: hw-breakpoint: Failed to enable monitor mode on CPU 0. hw-breakpoint: CPU 0 failed to disable vector catch =============================== [ ERR: suspicious RCU usage. ] 4.10.0-rc8-next-20170215+ #119 Not tainted ------------------------------- ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 2, debug_locks = 0 RCU used illegally from extended quiescent state! 2 locks held by swapper/0/0: #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54 #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119 Hardware name: Generic OMAP4 (Flattened Device Tree) console_unlock vprintk_emit vprintk_default printk reset_ctrl_regs dbg_cpu_pm_notify notifier_call_chain cpu_pm_exit omap_enter_idle_coupled cpuidle_enter_state cpuidle_enter_state_coupled do_idle cpu_startup_entry start_kernel This RCU warning, however, is suppressed by lockdep_off() in printk(). lockdep_off() increments the ->lockdep_recursion counter and thus disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want lockdep to be enabled "current->lockdep_recursion == 0". Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Tony Lindgren <tony@atomide.com> Tested-by: Tony Lindgren <tony@atomide.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Russell King <rmk@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-14Revert "printk: make reading the kernel log flush pending lines"Linus Torvalds
This reverts commit bfd8d3f23b51018388be0411ccbc2d56277fe294. It turns out that this flushes things much too aggressiverly, and causes lines to break up when the system logger races with new continuation lines being printed. There's a pending patch to make printk() flushing much more straightforward, but it's too invasive for 4.9, so in the meantime let's just not make the system message logging flush continuation lines. They'll be flushed by the final newline anyway. Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11Revert "console: don't prefer first registered if DT specifies stdout-path"Hans de Goede
This reverts commit 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path"). The reverted commit changes existing behavior on which many ARM boards rely. Many ARM small-board-computers, like e.g. the Raspberry Pi have both a video output and a serial console. Depending on whether the user is using the device as a more regular computer; or as a headless device we need to have the console on either one or the other. Many users rely on the kernel behavior of the console being present on both outputs, before the reverted commit the console setup with no console= kernel arguments on an ARM board which sets stdout-path in dt would look like this: [root@localhost ~]# cat /proc/consoles ttyS0 -W- (EC p a) 4:64 tty0 -WU (E p ) 4:1 Where as after the reverted commit, it looks like this: [root@localhost ~]# cat /proc/consoles ttyS0 -W- (EC p a) 4:64 This commit reverts commit 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path") restoring the original behavior. Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path") Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19printk: suppress empty continuation linesLinus Torvalds
We have a fairly common pattern where you print several things as continuations on one single line in a loop, and then at the end you do printk(KERN_CONT "\n"); to flush the buffered output. But if the output was flushed by something else (concurrent printk activity, or just system logging), we don't want that final flushing to just print an empty line. So just suppress empty continuation lines when they couldn't be merged into the line they are a continuation of. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-10Merge branch 'printk-cleanups'Linus Torvalds
Merge my system logging cleanups, triggered by the broken '\n' patches. The line continuation handling has been broken basically forever, and the code to handle the system log records was both confusing and dubious. And it would do entirely the wrong thing unless you always had a terminating newline, partly because it couldn't actually see whether a message was marked KERN_CONT or not (but partly because the LOG_CONT handling in the recording code was rather confusing too). This re-introduces a real semantically meaningful KERN_CONT, and fixes the few places I noticed where it was missing. There are probably more missing cases, since KERN_CONT hasn't actually had any semantic meaning for at least four years (other than the checkpatch meaning of "no log level necessary, this is a continuation line"). This also allows the combination of KERN_CONT and a log level. In that case the log level will be ignored if the merging with a previous line is successful, but if a new record is needed, that new record will now get the right log level. That also means that you can at least in theory combine KERN_CONT with the "pr_info()" style helpers, although any use of pr_fmt() prefixing would make that just result in a mess, of course (the prefix would end up in the middle of a continuing line). * printk-cleanups: printk: make reading the kernel log flush pending lines printk: re-organize log_output() to be more legible printk: split out core logging code into helper function printk: reinstate KERN_CONT for printing continuation lines
2016-10-09printk: make reading the kernel log flush pending linesLinus Torvalds
That will mean that any possible subsequent continuation will now be broken up onto a line of its own (since reading the log has finalized the beginning og the line), but if user space has activated system logging (or if there's a kernel message dump going on) that is the right thing to do. And now that we actually get the continuation flags _right_ for this all, the user space logger that is reading the kernel messages can actually see the continuation marker. Not that anybody seems to really bother with it (or care), but in theory user space can do its own message stitching. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09printk: re-organize log_output() to be more legibleLinus Torvalds
Avoid some duplicate logic now that we can return early, and update the comments for the new LOG_CONT world order. This also stops the continuation flushing from just using random record flags for the flushing action, instead taking the flags from the proper original line and updating them as we add continuations to it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09printk: split out core logging code into helper functionLinus Torvalds
The code that actually decides how to log the message (whether to put it directly into the record log, whether to append it to an existing buffered log, or whether to start a new buffered log) is fairly non-obvious code in the middle of the vprintk_emit() function. Splitting that code up into a helper function makes it easier to understand, but perhaps more importantly also allows for the code to just return early out of the helper function once it has made the decision about where the new log content goes. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09printk: reinstate KERN_CONT for printing continuation linesLinus Torvalds
Long long ago the kernel log buffer was a buffered stream of bytes, very much like stdio in user space. It supported log levels by scanning the stream and noticing the log level markers at the beginning of each line, but if you wanted to print a partial line in multiple chunks, you just did multiple printk() calls, and it just automatically worked. Except when it didn't, and you had very confusing output when different lines got all mixed up with each other. Then you got fragment lines mixing with each other, or with non-fragment lines, because it was traditionally impossible to tell whether a printk() call was a continuation or not. To at least help clarify the issue of continuation lines, we added a KERN_CONT marker back in 2007 to mark continuation lines: 474925277671 ("printk: add KERN_CONT annotation"). That continuation marker was initially an empty string, and didn't actuall make any semantic difference. But it at least made it possible to annotate the source code, and have check-patch notice that a printk() didn't need or want a log level marker, because it was a continuation of a previous line. To avoid the ambiguity between a continuation line that had that KERN_CONT marker, and a printk with no level information at all, we then in 2009 made KERN_CONT be a real log level marker which meant that we could now reliably tell the difference between the two cases. 5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines") and we could take advantage of that to make sure we didn't mix up continuation lines with lines that just didn't have any loglevel at all. Then, in 2012, the kernel log buffer was changed to be a "record" based log, where each line was a record that has a loglevel and a timestamp. You can see the beginning of that conversion in commits e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface") 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer") with a number of follow-up commits to fix some painful fallout from that conversion. Over all, it took a couple of months to sort out most of it. But the upside was that you could have concurrent readers (and writers) of the kernel log and not have lines with mixed output in them. And one particular pain-point for the record-based kernel logging was exactly the fragmentary lines that are generated in smaller chunks. In order to still log them as one recrod, the continuation lines need to be attached to the previous record properly. However the explicit continuation record marker that is actually useful for this exact case was actually removed in aroundm the same time by commit 61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT") due to the incorrect belief that KERN_CONT wasn't meaningful. The ambiguity between "is this a continuation line" or "is this a plain printk with no log level information" was reintroduced, and in fact became an even bigger pain point because there was now the whole record-level merging of kernel messages going on. This patch reinstates the KERN_CONT as a real non-empty string marker, so that the ambiguity is fixed once again. But it's not a plain revert of that original removal: in the four years since we made KERN_CONT an empty string again, not only has the format of the log level markers changed, we've also had some usage changes in this area. For example, some ACPI code seems to use KERN_CONT _together_ with a log level, and now uses both the KERN_CONT marker and (for example) a KERN_INFO marker to show that it's an informational continuation of a line. Which is actually not a bad idea - if the continuation line cannot be attached to its predecessor, without the log level information we don't know what log level to assign to it (and we traditionally just assigned it the default loglevel). So having both a log level and the KERN_CONT marker is not necessarily a bad idea, but it does mean that we need to actually iterate over potentially multiple markers, rather than just a single one. Also, since KERN_CONT was still conceptually needed, and encouraged, but didn't actually _do_ anything, we've also had the reverse problem: rather than having too many annotations it has too few, and there is bit rot with code that no longer marks the continuation lines with the KERN_CONT marker. So this patch not only re-instates the non-empty KERN_CONT marker, it also fixes up the cases of bit-rot I noticed in my own logs. There are probably other cases where KERN_CONT will be needed to be added, either because it is new code that never dealt with the need for KERN_CONT, or old code that has bitrotted without anybody noticing. That said, we should strive to avoid the need for KERN_CONT. It does result in real problems for logging, and should generally not be seen as a good feature. If we some day can get rid of the feature entirely, because nobody does any fragmented printk calls, that would be lovely. But until that point, let's at mark the code that relies on the hacky multi-fragment kernel printk's. Not only does it avoid the ambiguity, it also annotates code as "maybe this would be good to fix some day". (That said, particularly during single-threaded bootup, the downsides of KERN_CONT are very limited. Things get much hairier when you have multiple threads going on and user level reading and writing logs too). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07console: don't prefer first registered if DT specifies stdout-pathPaul Burton
If a device tree specifies a preferred device for kernel console output via the stdout-path or linux,stdout-path chosen node properties or the stdout alias then the kernel ought to honor it & output the kernel console to that device. As it stands, this isn't the case. Whilst we parse the stdout-path properties & set an of_stdout variable from of_alias_scan(), and use that from of_console_check() to determine whether to add a console device as a preferred console whilst registering it, we also prefer the first registered console if no other has been selected at the time of its registration. This means that if a console other than the one the device tree selects via stdout-path is registered first, we will switch to using it & when the stdout-path console is later registered the call to add_preferred_console() via of_console_check() is too late to do anything useful. In practice this seems to mean that we switch to the dummy console device fairly early & see no further console output: Console: colour dummy device 80x25 console [tty0] enabled bootconsole [ns16550a0] disabled Fix this by not automatically preferring the first registered console if one is specified by the device tree. This allows consoles to be registered but not enabled, and once the driver for the console selected by stdout-path calls of_console_check() the driver will be added to the list of preferred consoles before any other console has been enabled. When that console is then registered via register_console() it will be enabled as expected. Link: http://lkml.kernel.org/r/20160809151937.26118-1-paul.burton@imgtec.com Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Tejun Heo <tj@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ivan Delalande <colona@arista.com> Cc: Thierry Reding <treding@nvidia.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Kara <jack@suse.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01printk/nmi: avoid direct printk()-s from __printk_nmi_flush()Sergey Senozhatsky
__printk_nmi_flush() can be called from nmi_panic(), therefore it has to test whether it's executed in NMI context and thus must route the messages through deferred printk() or via direct printk(). This is to avoid potential deadlocks, as described in commit cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic"). However there remain two places where __printk_nmi_flush() does unconditional direct printk() calls: - pr_err("printk_nmi_flush: internal error ...") - pr_cont("\n") Factor out print_nmi_seq_line() parts into a new printk_nmi_flush_line() function, which takes care of in_nmi(), and use it in __printk_nmi_flush() for printing and error-reporting. Link: http://lkml.kernel.org/r/20160830161354.581-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-26printk: fix parsing of "brl=" optionNicolas Iooss
Commit bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files") moved the parsing of braille-related options into _braille_console_setup(), changing the type of variable str from char* to char**. In this commit, memcmp(str, "brl,", 4) was correctly updated to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4). Update the code to make "brl=" option work again and replace memcmp() with strncmp() to make the compiler able to detect such an issue. Fixes: bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files") Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-09Revert "printk: create pr_<level> functions"Linus Torvalds
This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142. Geert Uytterhoeven reports: "This change seems to have an (unintendent?) side-effect. Before, pr_*() calls without a trailing newline characters would be printed with a newline character appended, both on the console and in the output of the dmesg command. After this commit, no new line character is appended, and the output of the next pr_*() call of the same type may be appended, like in: - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000 - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM) + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)" Joe Perches says: "No, that is not intentional. The newline handling code inside vprintk_emit is a bit involved and for now I suggest a revert until this has all the same behavior as earlier" Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Requested-by: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-08printk: Remove unnecessary #ifdef CONFIG_PRINTKAndreas Ziegler
In commit 874f9c7da9a4 ("printk: create pr_<level> functions"), new pr_level defines were added to printk.c. These new defines are guarded by an #ifdef CONFIG_PRINTK - however, there is already a surrounding #ifdef CONFIG_PRINTK starting a lot earlier in line 249 which means the newly introduced #ifdef is unnecessary. Let's remove it to avoid confusion. Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Cc: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: add kernel parameter to control writes to /dev/kmsgBorislav Petkov
Add a "printk.devkmsg" kernel command line parameter which controls how userspace writes into /dev/kmsg. It has three options: * ratelimit - ratelimit logging from userspace. * on - unlimited logging from userspace * off - logging from userspace gets ignored The default setting is to ratelimit the messages written to it. This changes the kernel default setting of "on" to "ratelimit" and we do that because we want to keep userspace spamming /dev/kmsg to sane levels. This is especially moot when a small kernel log buffer wraps around and messages get lost. So the ratelimiting setting should be a sane setting where kernel messages should have a bit higher chance of survival from all the spamming. It additionally does not limit logging to /dev/kmsg while the system is booting if we haven't disabled it on the command line. Furthermore, we can control the logging from a lower priority sysctl interface - kernel.printk_devkmsg. That interface will succeed only if printk.devkmsg *hasn't* been supplied on the command line. If it has, then printk.devkmsg is a one-time setting which remains for the duration of the system lifetime. This "locking" of the setting is to prevent userspace from changing the logging on us through sysctl(2). This patch is based on previous patches from Linus and Steven. [bp@suse.de: fixes] Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dave Young <dyoung@redhat.com> Cc: Franck Bui <fbui@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: include <asm/sections.h> instead of <asm-generic/sections.h>Christoph Hellwig
asm-generic headers are generic implementations for architecture specific code and should not be included by common code. Thus use the asm/ version of sections.h to get at the linker sections. Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: introduce suppress_message_printing()Sergey Senozhatsky
Messages' levels and console log level are inspected when the actual printing occurs, which may provoke console_unlock() and console_cont_flush() to waste CPU cycles on every message that has loglevel above the current console_loglevel. Schematically, console_unlock() does the following: console_unlock() { ... for (;;) { ... raw_spin_lock_irqsave(&logbuf_lock, flags); skip: msg = log_from_idx(console_idx); if (msg->flags & LOG_NOCONS) { ... goto skip; } level = msg->level; len += msg_print_text(); >> sprintfs memcpy, etc. if (nr_ext_console_drivers) { ext_len = msg_print_ext_header(); >> scnprintf ext_len += msg_print_ext_body(); >> scnprintfs etc. } ... raw_spin_unlock(&logbuf_lock); call_console_drivers(level, ext_text, ext_len, text, len) { if (level >= console_loglevel && >> drop the message !ignore_loglevel) return; console->write(...); } local_irq_restore(flags); } ... } The thing here is this deferred `level >= console_loglevel' check. We are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages that we will eventually drop. This can be huge when we register a new CON_PRINTBUFFER console, for instance. For every such a console register_console() resets the console_seq, console_idx, console_prev and sets a `exclusive console' pointer to replay the log buffer to that just-registered console. And there can be a lot of messages to replay, in the worst case most of which can be dropped after console_loglevel test. We know messages' levels long before we call msg_print_text() and friends, so we can just move console_loglevel check out of call_console_drivers() and format a new message only if we are sure that it won't be dropped. The patch factors out loglevel check into suppress_message_printing() function and tests message->level and console_loglevel before formatting functions in console_unlock() and console_cont_flush() are getting executed. This improves things not only for exclusive CON_PRINTBUFFER consoles, but for every console_unlock() that attempts to print a message of level above the console_loglevel. Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Calvin Owens <calvinowens@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: create pr_<level> functionsJoe Perches
Using functions instead of macros can reduce overall code size by eliminating unnecessary "KERN_SOH<digit>" prefixes from format strings. defconfig x86-64: $ size vmlinux* text data bss dec hex filename 10193570 4331464 1105920 15630954 ee826a vmlinux.new 10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old As the return value are unimportant and unused in the kernel tree, these new functions return void. Miscellanea: - change pr_<level> macros to call new __pr_<level> functions - change vprintk_nmi and vprintk_default to add LOGLEVEL_<level> argument [akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe] Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: do not include interrupt.hSergey Senozhatsky
A trivial cosmetic change: interrupt.h header is redundant since commit 6b898c07cb1d ("console: use might_sleep in console_lock"). Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28printk: when dumping regs, show the stack, not thread_infoAndy Lutomirski
We currently show: task: <current> ti: <current_thread_info()> task.ti: <task_thread_info(current)>" "ti" and "task.ti" are redundant, and neither is actually what we want to show, which the the base of the thread stack. Change the display to show the stack pointer explicitly. Link: http://lkml.kernel.org/r/543ac5bd66ff94000a57a02e11af7239571a3055.1468523549.git.luto@kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20printk/nmi: flush NMI messages on the system panicPetr Mladek
In NMI context, printk() messages are stored into per-CPU buffers to avoid a possible deadlock. They are normally flushed to the main ring buffer via an IRQ work. But the work is never called when the system calls panic() in the very same NMI handler. This patch tries to flush NMI buffers before the crash dump is generated. In this case it does not risk a double release and bails out when the logbuf_lock is already taken. The aim is to get the messages into the main ring buffer when possible. It makes them better accessible in the vmcore. Then the patch tries to flush the buffers second time when other CPUs are down. It might be more aggressive and reset logbuf_lock. The aim is to get the messages available for the consequent kmsg_dump() and console_flush_on_panic() calls. The patch causes vprintk_emit() to be called even in NMI context again. But it is done via printk_deferred() so that the console handling is skipped. Consoles use internal locks and we could not prevent a deadlock easily. They are explicitly called later when the crash dump is not generated, see console_flush_on_panic(). Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Kosina <jkosina@suse.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20printk/nmi: increase the size of NMI buffer and make it configurablePetr Mladek
Testing has shown that the backtrace sometimes does not fit into the 4kB temporary buffer that is used in NMI context. The warnings are gone when I double the temporary buffer size. This patch doubles the buffer size and makes it configurable. Note that this problem existed even in the x86-specific implementation that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). Nobody noticed it because it did not print any warnings. Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20printk/nmi: warn when some message has been lost in NMI contextPetr Mladek
We could not resize the temporary buffer in NMI context. Let's warn if a message is lost. This is rather theoretical. printk() should not be used in NMI. The only sensible use is when we want to print backtrace from all CPUs. The current buffer should be enough for this purpose. [akpm@linux-foundation.org: whitespace fixlet] Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20printk/nmi: generic solution for safe printk in NMIPetr Mladek
printk() takes some locks and could not be used a safe way in NMI context. The chance of a deadlock is real especially when printing stacks from all CPUs. This particular problem has been addressed on x86 by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). The patchset brings two big advantages. First, it makes the NMI backtraces safe on all architectures for free. Second, it makes all NMI messages almost safe on all architectures (the temporary buffer is limited. We still should keep the number of messages in NMI context at minimum). Note that there already are several messages printed in NMI context: WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE handlers. These are not easy to avoid. This patch reuses most of the code and makes it generic. It is useful for all messages and architectures that support NMI. The alternative printk_func is set when entering and is reseted when leaving NMI context. It queues IRQ work to copy the messages into the main ring buffer in a safe context. __printk_nmi_flush() copies all available messages and reset the buffer. Then we could use a simple cmpxchg operations to get synchronized with writers. There is also used a spinlock to get synchronized with other flushers. We do not longer use seq_buf because it depends on external lock. It would be hard to make all supported operations safe for a lockless use. It would be confusing and error prone to make only some operations safe. The code is put into separate printk/nmi.c as suggested by Steven Rostedt. It needs a per-CPU buffer and is compiled only on architectures that call nmi_enter(). This is achieved by the new HAVE_NMI Kconfig flag. The are MN10300 and Xtensa architectures. We need to clean up NMI handling there first. Let's do it separately. The patch is heavily based on the draft from Peter Zijlstra, see https://lkml.org/lkml/2015/6/10/327 [arnd@arndb.de: printk-nmi: use %zu format string for size_t] [akpm@linux-foundation.org: min_t->min - all types are size_t here] Signed-off-by: Petr Mladek <pmladek@suse.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> [arm part] Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17printk: add clear_idx symbol to vmcoreinfoIvan Delalande
This allows us to extract from the vmcore only the messages emitted since the last time the ring buffer was cleared. We just have to make sure its value is always up-to-date, when old messages are discarded to free space in log_make_free_space() for example. Signed-off-by: Zeyu Zhao <zzy8200@gmail.com> Signed-off-by: Ivan Delalande <colona@arista.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17printk: check CON_ENABLED in have_callable_console()Sergey Senozhatsky
have_callable_console() must also test CON_ENABLED bit, not just CON_ANYTIME. We may have disabled CON_ANYTIME console so printk can wrongly assume that it's safe to call_console_drivers(). Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Kyle McMartin <kyle@kernel.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Calvin Owens <calvinowens@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17printk: set may_schedule for some of console_trylock() callersSergey Senozhatsky
console_unlock() allows to cond_resched() if its caller has set `console_may_schedule' to 1, since 8d91f8b15361 ("printk: do cond_resched() between lines while outputting to consoles"). The rules are: -- console_lock() always sets `console_may_schedule' to 1 -- console_trylock() always sets `console_may_schedule' to 0 However, console_trylock() callers (among them is printk()) do not always call printk() from atomic contexts, and some of them can cond_resched() in console_unlock(), so console_trylock() can set `console_may_schedule' to 1 for such processes. For !CONFIG_PREEMPT_COUNT kernels, however, console_trylock() always sets `console_may_schedule' to 0. It's possible to drop explicit preempt_disable()/preempt_enable() in vprintk_emit(), because console_unlock() and console_trylock() are now smart enough: a) console_unlock() does not cond_resched() when it's unsafe (console_trylock() takes care of that) b) console_unlock() does can_use_console() check. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Kyle McMartin <kyle@kernel.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Calvin Owens <calvinowens@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17printk: move can_use_console() out of console_trylock_for_printk()Sergey Senozhatsky
console_unlock() allows to cond_resched() if its caller has set `console_may_schedule' to 1 (this functionality is present since 8d91f8b15361 ("printk: do cond_resched() between lines while outputting to consoles"). The rules are: -- console_lock() always sets `console_may_schedule' to 1 -- console_trylock() always sets `console_may_schedule' to 0 printk() calls console_unlock() with preemption desabled, which basically can lead to RCU stalls, watchdog soft lockups, etc. if something is simultaneously calling printk() frequent enough (IOW, console_sem owner always has new data to send to console divers and can't leave console_unlock() for a long time). printk()->console_trylock() callers do not necessarily execute in atomic contexts, and some of them can cond_resched() in console_unlock(). console_trylock() can set `console_may_schedule' to 1 (allow cond_resched() later in consoe_unlock()) when it's safe. This patch (of 3): vprintk_emit() disables preemption around console_trylock_for_printk() and console_unlock() calls for a strong reason -- can_use_console() check. The thing is that vprintl_emit() can be called on a CPU that is not fully brought up yet (!cpu_online()), which potentially can cause problems if console driver wants to access per-cpu data. A console driver can explicitly state that it's safe to call it from !online cpu by setting CON_ANYTIME bit in console ->flags. That's why for !cpu_online() can_use_console() iterates all the console to find out if there is a CON_ANYTIME console, otherwise console_unlock() must be avoided. can_use_console() ensures that console_unlock() call is safe in vprintk_emit() only; console_lock() and console_trylock() are not covered by this check. Even though call_console_drivers(), invoked from console_cont_flush() and console_unlock(), tests `!cpu_online() && CON_ANYTIME' for_each_console(), it may be too late, which can result in messages loss. Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no CON_ANYTIME consoles available. CPU0 online CPU1 !online console_trylock() ... console_unlock() console_cont_flush spin_lock logbuf_lock if (!cont.len) { spin_unlock logbuf_lock return } for (;;) { vprintk_emit spin_lock logbuf_lock log_store spin_unlock logbuf_lock spin_lock logbuf_lock !console_trylock_for_printk msg_print_text return console_idx = log_next() console_seq++ console_prev = msg->flags spin_unlock logbuf_lock call_console_drivers() for_each_console(con) { if (!cpu_online() && !(con->flags & CON_ANYTIME)) continue; } /* * no message printed, we lost it */ vprintk_emit spin_lock logbuf_lock log_store spin_unlock logbuf_lock !console_trylock_for_printk return /* * go to the beginning of the loop, * find out there are new messages, * lose it */ } console_trylock()/console_lock() call on CPU1 may come from cpu notifiers registered on that CPU. Since notifiers are not getting unregistered when CPU is going DOWN, all of the notifiers receive notifications during CPU UP. For example, on my x86_64, I see around 50 notification sent from offline CPU to itself [swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify [swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify while doing echo 0 > /sys/devices/system/cpu/cpu2/online echo 1 > /sys/devices/system/cpu/cpu2/online So grabbing the console_sem lock while CPU is !online is possible, in theory. This patch moves can_use_console() check out of console_trylock_for_printk(). Instead it calls it in console_unlock(), so now console_lock()/console_unlock() are also 'protected' by can_use_console(). This also means that console_trylock_for_printk() is not really needed anymore and can be removed. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Kyle McMartin <kyle@kernel.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Calvin Owens <calvinowens@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20kernel: printk: specify alignment for struct printk_logAndrey Ryabinin
On architectures that have support for efficient unaligned access struct printk_log has 4-byte alignment. Specify alignment attribute in type declaration. The whole point of this patch is to fix deadlock which happening when UBSAN detects unaligned access in printk() thus UBSAN recursively calls printk() with logbuf_lock held by top printk() call. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Marek <mmarek@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yury Gribov <y.gribov@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Kostya Serebryany <kcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16printk: change recursion_bug type to boolSergey Senozhatsky
`recursion_bug' is used as recursion_bug toggle, so make it `bool'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16printk: do cond_resched() between lines while outputting to consolesTejun Heo
@console_may_schedule tracks whether console_sem was acquired through lock or trylock. If the former, we're inside a sleepable context and console_conditional_schedule() performs cond_resched(). This allows console drivers which use console_lock for synchronization to yield while performing time-consuming operations such as scrolling. However, the actual console outputting is performed while holding irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule before starting outputting lines. Also, only a few drivers call console_conditional_schedule() to begin with. This means that when a lot of lines need to be output by console_unlock(), for example on a console registration, the task doing console_unlock() may not yield for a long time on a non-preemptible kernel. If this happens with a slow console devices, for example a serial console, the outputting task may occupy the cpu for a very long time. Long enough to trigger softlockup and/or RCU stall warnings, which in turn pile more messages, sometimes enough to trigger the next cycle of warnings incapacitating the system. Fix it by making console_unlock() insert cond_resched() between lines if @console_may_schedule. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Calvin Owens <calvinowens@fb.com> Acked-by: Jan Kara <jack@suse.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Kyle McMartin <kyle@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16printk: only unregister boot consoles when necessaryThierry Reding
Boot consoles are typically replaced by proper consoles during the boot process. This can be problematic if the boot console data is part of the init section that is reclaimed late during boot. If the proper console does not register before this point in time, the boot console will need to be removed (so that the freed memory is not accessed), leaving the system without output for some time. There are various reasons why the proper console may not register early enough, such as deferred probe or the driver being a loadable module. If that happens, there is some amount of time where no console messages are visible to the user, which in turn can mean that they won't see crashes or other potentially useful information. To avoid this situation, only remove the boot console when it resides in the init section. Code exists to replace the boot console by the proper console when it is registered, keeping a seamless transition between the boot and proper consoles. Signed-off-by: Thierry Reding <treding@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06printk: prevent userland from spoofing kernel messagesMathias Krause
The following statement of ABI/testing/dev-kmsg is not quite right: It is not possible to inject messages from userspace with the facility number LOG_KERN (0), to make sure that the origin of the messages can always be reliably determined. Userland actually can inject messages with a facility of 0 by abusing the fact that the facility is stored in a u8 data type. By using a facility which is a multiple of 256 the assignment of msg->facility in log_store() implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg to spoof kernel messages as shown below: The following call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg ...leads to the following log entry (dmesg -x | tail -n 1): user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty However, this call... # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg ...leads to the slightly different log entry (note the kernel facility): kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty Fix that by limiting the user provided facility to 8 bit right from the beginning and catch the truncation early. Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Alex Elder <elder@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Kay Sievers <kay@vrfy.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-22pstore: add pstore unregisterGeliang Tang
pstore doesn't support unregistering yet. It was marked as TODO. This patch adds some code to fix it: 1) Add functions to unregister kmsg/console/ftrace/pmsg. 2) Add a function to free compression buffer. 3) Unmap the memory and free it. 4) Add a function to unregister pstore filesystem. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Kees Cook <keescook@chromium.org> [Removed __exit annotation from ramoops_remove(). Reported by Arnd Bergmann] Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-09-10kexec: split kexec_load syscall from kexec core codeDave Young
There are two kexec load syscalls, kexec_load another and kexec_file_load. kexec_file_load has been splited as kernel/kexec_file.c. In this patch I split kexec_load syscall code to kernel/kexec.c. And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and use kexec_file_load only, or vice verse. The original requirement is from Ted Ts'o, he want kexec kernel signature being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use kexec_load syscall can bypass the checking. Vivek Goyal proposed to create a common kconfig option so user can compile in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects KEXEC_CORE so that old config files still work. Because there's general code need CONFIG_KEXEC_CORE, so I updated all the architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects KEXEC_CORE in arch Kconfig. Also updated general kernel code with to kexec_load syscall. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30printk: improve the description of /dev/kmsg line formatAntonio Ospite
The comment about /dev/kmsg does not mention the additional values which may actually be exported, fix that. Also move up the part of the comment instructing the users to ignore these additional values, this way the reading is more fluent and logically compact. Signed-off-by: Antonio Ospite <ao2@ao2.it> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25check_syslog_permissions() cleanupVasily Averin
Patch fixes drawbacks in heck_syslog_permissions() noticed by AKPM: "from_file handling makes me cry. That's not a boolean - it's an enumerated value with two values currently defined. But the code in check_syslog_permissions() treats it as a boolean and also hardwires the knowledge that SYSLOG_FROM_PROC == 1 (or == `true`). And the name is wrong: it should be called from_proc to match SYSLOG_FROM_PROC." Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25security_syslog() should be called once onlyVasily Averin
The final version of commit 637241a900cb ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are processed incorrectly: - open of /dev/kmsg checks syslog access permissions by using check_syslog_permissions() where security_syslog() is not called if dmesg_restrict is set. - syslog syscall and /proc/kmsg calls do_syslog() where security_syslog can be executed twice (inside check_syslog_permissions() and then directly in do_syslog()) With this patch security_syslog() is called once only in all syslog-related operations regardless of dmesg_restrict value. Fixes: 637241a900cb ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25printk: implement support for extended console driversTejun Heo
printk log_buf keeps various metadata for each message including its sequence number and timestamp. The metadata is currently available only through /dev/kmsg and stripped out before passed onto console drivers. We want this metadata to be available to console drivers too so that console consumers can get full information including the metadata and dictionary, which among other things can be used to detect whether messages got lost in transit. This patch implements support for extended console drivers. Consoles can indicate that they want extended messages by setting the new CON_EXTENDED flag and they'll be fed messages formatted the same way as /dev/kmsg. "<level>,<sequnum>,<timestamp>,<contflag>;<message text>\n" If extended consoles exist, in-kernel fragment assembly is disabled. This ensures that all messages emitted to consoles have full metadata including sequence number. The contflag carries enough information to reassemble the fragments from the reader side trivially. Note that this only affects /dev/kmsg. Regular console and /proc/kmsg outputs are not affected by this change. * Extended message formatting for console drivers is enabled iff there are registered extended consoles. * Comment describing /dev/kmsg message format updated to add missing contflag field and help distinguishing variable from verbatim terms. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Miller <davem@davemloft.net> Cc: Kay Sievers <kay@vrfy.org> Reviewed-by: Petr Mladek <pmladek@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25printk: factor out message formatting from devkmsg_read()Tejun Heo
The extended message formatting used for /dev/kmsg will be used implement extended consoles. Factor out msg_print_ext_header() and msg_print_ext_body() from devkmsg_read(). This is pure restructuring. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Miller <davem@davemloft.net> Cc: Kay Sievers <kay@vrfy.org> Reviewed-by: Petr Mladek <pmladek@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25printk: guard the amount written per line by devkmsg_read()Tejun Heo
This patchset updates netconsole so that it can emit messages with the same header as used in /dev/kmsg which gives neconsole receiver full log information which enables things like structured logging and detection of lost messages. This patch (of 7): devkmsg_read() uses 8k buffer and assumes that the formatted output message won't overrun which seems safe given LOG_LINE_MAX, the current use of dict and the escaping method being used; however, we're planning to use devkmsg formatting wider and accounting for the buffer size properly isn't that complicated. This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates devkmsg_read() so that it limits output accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Miller <davem@davemloft.net> Cc: Kay Sievers <kay@vrfy.org> Reviewed-by: Petr Mladek <pmladek@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>