summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJack Morgenstein <jackm@dev.mellanox.co.il>2017-03-21 12:57:06 +0200
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2017-05-20 14:28:38 +0200
commit43c54927f6f4e0dba5714dbcd53fea44114cd7cf (patch)
tree6b42475a8fd05f1fd7594eb8d938bc896ab74c24
parent9ae6b33dcbb48f82857d5765fcad429961f8f46c (diff)
IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level
commit fb7a91746af18b2ebf596778b38a709cdbc488d3 upstream. A warning message during SRIOV multicast cleanup should have actually been a debug level message. The condition generating the warning does no harm and can fill the message log. In some cases, during testing, some tests were so intense as to swamp the message log with these warning messages, causing a stall in the console message log output task. This stall caused an NMI to be sent to all CPUs (so that they all dumped their stacks into the message log). Aside from the message flood causing an NMI, the tests all passed. Once the message flood which caused the NMI is removed (by reducing the warning message to debug level), the NMI no longer occurs. Sample message log (console log) output illustrating the flood and resultant NMI (snippets with comments and modified with ... instead of hex digits, to satisfy checkpatch.pl): <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... *** About 4000 almost identical lines in less than one second *** <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...) *** { 17} above indicates that CPU 17 was the one that stalled *** sending NMI to all CPUs: ... NMI backtrace for cpu 17 CPU: 17 PID: 45909 Comm: kworker/17:2 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013 Workqueue: events fb_flashcursor task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e...... RIP: 0010:[ffffffff81......] [ffffffff81......] io_serial_in+0x15/0x20 RSP: 0018:ffff88064e257cb0 EFLAGS: 00000002 RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000...... RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81...... RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000...... R10: 0000000000...... R11: ffff88064e...... R12: 0000000000...... R13: 0000000000...... R14: ffffffff81...... R15: 0000000000...... FS: 0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080...... CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000...... DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000...... DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000...... Stack: ffff88064e...... ffffffff81...... ffffffff81...... 0000000000...... ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81...... ffffffff81...... ffff88064e...... ffffffff81...... 0000000000...... Call Trace: [<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0 [<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30 [<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140 [<ffffffff813cb5fa>] uart_console_write+0x3a/0x80 [<ffffffff813d0aae>] serial8250_console_write+0xae/0x140 [<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0 [<ffffffff8107d6cf>] console_unlock+0x3bf/0x400 [<ffffffff813503cd>] fb_flashcursor+0x5d/0x140 [<ffffffff81355c30>] ? bit_clear+0x120/0x120 [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5aef>] kthread+0xcf/0xe0 [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81645858>] ret_from_fork+0x58/0x90 [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6 As indicated in the stack trace above, the console output task got swamped. Fixes: b9c5d6a64358 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r--drivers/infiniband/hw/mlx4/mcg.c3
1 files changed, 2 insertions, 1 deletions
diff --git a/drivers/infiniband/hw/mlx4/mcg.c b/drivers/infiniband/hw/mlx4/mcg.c
index a21d37f02f35..e6ea81c9644a 100644
--- a/drivers/infiniband/hw/mlx4/mcg.c
+++ b/drivers/infiniband/hw/mlx4/mcg.c
@@ -1102,7 +1102,8 @@ static void _mlx4_ib_mcg_port_cleanup(struct mlx4_ib_demux_ctx *ctx, int destroy
while ((p = rb_first(&ctx->mcg_table)) != NULL) {
group = rb_entry(p, struct mcast_group, node);
if (atomic_read(&group->refcount))
- mcg_warn_group(group, "group refcount %d!!! (pointer %p)\n", atomic_read(&group->refcount), group);
+ mcg_debug_group(group, "group refcount %d!!! (pointer %p)\n",
+ atomic_read(&group->refcount), group);
force_clean_group(group);
}