summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2017-10-21IB/hfi1: Allocate context data on memory nodeSebastian Sanchez
[ Upstream commit b448bf9a0df6093dbadac36979a55ce4e012a677 ] There are some memory allocation calls in hfi1_create_ctxtdata() that do not use the numa function parameter. This can cause cache lines to be filled over QPI. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21IB/hfi1: Use static CTLE with Preset 6 for integrated HFIsEaswar Hariharan
[ Upstream commit 39e2afa8d042a53d855137d4c5a689a6f5492b39 ] After extended testing, it was found that the previous PCIe Gen 3 recipe, which used adaptive CTLE with Preset 4, could cause an NMI/Surprise Link Down in about 1 in 100 to 1 in 1000 power cycles on some platforms. New EV data combined with extensive empirical data indicates that the new recipe should use static CTLE with Preset 6 for all integrated silicon SKUs. Fixes: c3f8de0b334c ("IB/hfi1: Add static PCIe Gen3 CTLE tuning") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/qib: fix false-postive maybe-uninitialized warningArnd Bergmann
commit f6aafac184a3e46e919769dd4faa8bf0dc436534 upstream. aarch64-linux-gcc-7 complains about code it doesn't fully understand: drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change': include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized] The code is right, and despite trying hard, I could not come up with a version that I liked better than just adding a fake initialization here to shut up the warning. Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/ipoib: Replace list_del of the neigh->list with list_del_initFeras Daoud
[ Upstream commit c586071d1dc8227a7182179b8e50ee92cc43f6d2 ] In order to resolve a situation where a few process delete the same list element in sequence and cause panic, list_del is replaced with list_del_init. In this case if the first process that calls list_del releases the lock before acquiring it again, other processes who can acquire the lock will call list_del_init. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/ipoib: rtnl_unlock can not come after free_netdevFeras Daoud
[ Upstream commit 89a3987ab7a923c047c6dec008e60ad6f41fac22 ] The ipoib_vlan_add function calls rtnl_unlock after free_netdev, rtnl_unlock not only releases the lock, but also calls netdev_run_todo. The latter function browses the net_todo_list array and completes the unregistration of all its net_device instances. If we call free_netdev before rtnl_unlock, then netdev_run_todo call over the freed device causes panic. To fix, move rtnl_unlock call before free_netdev call. Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/ipoib: Fix deadlock over vlan_mutexFeras Daoud
[ Upstream commit 1c3098cdb05207e740715857df7b0998e372f527 ] This patch fixes Deadlock while executing ipoib_vlan_delete. The function takes the vlan_rwsem semaphore and calls unregister_netdevice. The later function calls ipoib_mcast_stop_thread that cause workqueue flush. When the queue has one of the ipoib_ib_dev_flush_xxx events, a deadlock occur because these events also tries to catch the same vlan_rwsem semaphore. To fix, unregister_netdevice should be called after releasing the semaphore. Fixes: cbbe1efa4972 ("IPoIB: Fix deadlock between ipoib_open() and child interface create") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/rxe: Fix a MR reference leak in check_rkey()Bart Van Assche
[ Upstream commit b3a459961014b14c267544c327db033669493295 ] Avoid that calling check_rkey() for mem->state == RXE_MEM_STATE_FREE triggers an MR reference leak. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08IB/rxe: Add a runtime check in alloc_index()Bart Van Assche
[ Upstream commit 642c7cbcaf2ffc1e27f67eda3dc47347ac5aff37 ] Since index values equal to or above 'range' can trigger memory corruption, complain if index >= range. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05iw_cxgb4: put ep reference in pass_accept_req()Steve Wise
commit 3d318605f5e32ff44fb290d9b67573b34213c4c8 upstream. The listening endpoint should always be dereferenced at the end of pass_accept_req(). Fixes: f86fac79afec ("RDMA/iw_cxgb4: atomic find and reference for listening endpoints") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05iw_cxgb4: remove the stid on listen create failureSteve Wise
commit 8b1bbf36b7452c4acb20e91948eaa5e225ea6978 upstream. If a listen create fails, then the server tid (stid) is incorrectly left in the stid idr table, which can cause a touch-after-free if the stid is looked up and the already freed endpoint is touched. So make sure and remove it in the error path. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27IB/addr: Fix setting source address in addr6_resolve()Roland Dreier
commit 79e25959403e6a79552db28a87abed34de32a1df upstream. Commit eea40b8f624f ("infiniband: call ipv6 route lookup via the stub interface") introduced a regression in address resolution when connecting to IPv6 destination addresses. The old code called ip6_route_output(), while the new code calls ipv6_stub->ipv6_dst_lookup(). The two are almost the same, except that ipv6_dst_lookup() also calls ip6_route_get_saddr() if the source address is in6addr_any. This means that the test of ipv6_addr_any(&fl6.saddr) now never succeeds, and so we never copy the source address out. This ends up causing rdma_resolve_addr() to fail, because without a resolved source address, cma_acquire_dev() will fail to find an RDMA device to use. For me, this causes connecting to an NVMe over Fabrics target via RoCE / IPv6 to fail. Fix this by copying out fl6.saddr if ipv6_addr_any() is true for the original source address passed into addr6_resolve(). We can drop our call to ipv6_dev_get_saddr() because ipv6_dst_lookup() already does that work. Fixes: eea40b8f624 ("infiniband: call ipv6 route lookup via the stub interface") Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27IB/{qib, hfi1}: Avoid flow control testing for RDMA write operationMike Marciniszyn
commit 5b0ef650bd0f820e922fcc42f1985d4621ae19cf upstream. Section 9.7.7.2.5 of the 1.3 IBTA spec clearly says that receive credits should never apply to RDMA write. qib and hfi1 were doing that. The following situation will result in a QP hang: - A prior SEND or RDMA_WRITE with immmediate consumed the last credit for a QP using RC receive buffer credits - The prior op is acked so there are no more acks - The peer ULP fails to post receive for some reason - An RDMA write sees that the credits are exhausted and waits - The peer ULP posts receive buffers - The ULP posts a send or RDMA write that will be hung The fix is to avoid the credit test for the RDMA write operation. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abortSteve Wise
[ Upstream commit 3bcf96e0183f5c863657cb6ae9adad307a0f6071 ] Function rx_data(), which handles ingress CPL_RX_DATA messages, was always sending an RX_DATA_ACK with the goal of updating the credits. However, if the RDMA connection is moved out of FPDU mode abruptly, then it is possible for iw_cxgb4 to process queued RX_DATA CPLs after HW has aborted the connection. These CPLs should not trigger RX_DATA_ACKS. If they do, HW can see a READ after DELETE of the DB_LE hash entry for the tid and post a LE_DB HashTblMemCrcError. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-06net/mlx4_core: Fix raw qp flow steering rules under SRIOVJack Morgenstein
[ Upstream commit 10b1c04e92229ebeb38ccd0dcf2b6d3ec73c0575 ] Demoting simple flow steering rule priority (for DPDK) was achieved by wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper function for the PF and all VFs. In function mlx4_ib_create_flow(), this change caused the main rule creation for the PF to be wrapped, while it left the associated tunnel steering rule creation unwrapped for the PF. This mismatch caused rule deletion failures in mlx4_ib_destroy_flow() for the PF when the detach wrapper function did not find the associated tunnel-steering rule (since creation of that rule for the PF did not go through the wrapper function). Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native" (so that the PF invocation does not go through the wrapper), and perform the required priority demotion for the PF in the mlx4_ib_create_flow() code path. Fixes: 48564135cba8 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-06RDMA/uverbs: Fix the check for port numberIsmail, Mustafa
commit 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3 upstream. The port number is only valid if IB_QP_PORT is set in the mask. So only check port number if it is valid to prevent modify_qp from failing due to an invalid port number. Fixes: 5ecce4c9b17b("Check port number supplied by user verbs cmds") Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] arrayBart Van Assche
commit 99975cd4fda52974a767aa44fe0b1a8f74950d9d upstream. ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger than what fits into a single MR. .map_mr_sg() must not attempt to map more SG-list elements than what fits into a single MR. Hence make sure that mlx5_ib_sg_to_klms() does not write outside the MR klms[] array. Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Israel Rukshin <israelr@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27RDMA/core: Initialize port_num in qp_attrIsmail, Mustafa
commit a62ab66b13a0f9bcb17b7b761f6670941ed5cd62 upstream. Initialize the port_num for iWARP in rdma_init_qp_attr. Fixes: 5ecce4c9b17b("Check port number supplied by user verbs cmds") Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_doneNicholas Bellinger
commit fce50a2fa4e9c6e103915c351b6d4a98661341d6 upstream. This patch fixes a NULL pointer dereference in isert_login_recv_done() of isert_conn->cm_id due to isert_cma_handler() -> isert_connect_error() resetting isert_conn->cm_id = NULL during a failed login attempt. As per Sagi, we will always see the completion of all recv wrs posted on the qp (given that we assigned a ->done handler), this is a FLUSH error completion, we just don't get to verify that because we deref NULL before. The issue here, was the assumption that dereferencing the connection cm_id is always safe, which is not true since: commit 4a579da2586bd3b79b025947ea24ede2bbfede62 Author: Sagi Grimberg <sagig@mellanox.com> Date: Sun Mar 29 15:52:04 2015 +0300 iser-target: Fix possible deadlock in RDMA_CM connection error As I see it, we have a direct reference to the isert_device from isert_conn which is the one-liner fix that we actually need like we do in isert_rdma_read_done() and isert_rdma_write_done(). Reported-by: Andrea Righi <righi.andrea@gmail.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27IB/core: Namespace is mandatory input for address resolutionMoni Shoua
commit bebb2a473a43c8f84a8210687d1cbdde503046d7 upstream. In function addr_resolve() the namespace is a required input parameter and not an output. It is passed later for searching the routing table and device addresses. Also, it shouldn't be copied back to the caller. Fixes: 565edd1d5555 ('IB/addr: Pass network namespace as a parameter') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27IB/iser: Fix connection teardown race conditionVladimir Neyelov
commit c8c16d3bae967f1c7af541e8d016e5c51e4f010a upstream. Under heavy iser target(scst) start/stop stress during login/logout on iser intitiator side happened trace call provided below. The function iscsi_iser_slave_alloc iser_conn pointer could be NULL, due to the fact that function iscsi_iser_conn_stop can be called before and free iser connection. Let's protect that flow by introducing global mutex. BUG: unable to handle kernel paging request at 0000000000001018 IP: [<ffffffffc0426f7e>] iscsi_iser_slave_alloc+0x1e/0x50 [ib_iser] Call Trace: ? scsi_alloc_sdev+0x242/0x300 scsi_probe_and_add_lun+0x9e1/0xea0 ? kfree_const+0x21/0x30 ? kobject_set_name_vargs+0x76/0x90 ? __pm_runtime_resume+0x5b/0x70 __scsi_scan_target+0xf6/0x250 scsi_scan_target+0xea/0x100 iscsi_user_scan_session.part.13+0x101/0x130 [scsi_transport_iscsi] ? iscsi_user_scan_session.part.13+0x130/0x130 [scsi_transport_iscsi] iscsi_user_scan_session+0x1e/0x30 [scsi_transport_iscsi] device_for_each_child+0x50/0x90 iscsi_user_scan+0x44/0x60 [scsi_transport_iscsi] store_scan+0xa8/0x100 ? common_file_perm+0x5d/0x1c0 dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 kernfs_fop_write+0x12c/0x1c0 __vfs_write+0x18/0x40 vfs_write+0xb5/0x1a0 SyS_write+0x55/0xc0 Fixes: 318d311e8f01 ("iser: Accept arbitrary sg lists mapping if the device supports it") Signed-off-by: Vladimir Neyelov <vladimirn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimbeg.me> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12RDMA/uverbs: Check port number supplied by user verbs cmdsBoris Pismenny
commit 5ecce4c9b17bed4dc9cb58bfb10447307569b77b upstream. The ib_uverbs_create_ah() ind ib_uverbs_modify_qp() calls receive the port number from user input as part of its attributes and assumes it is valid. Down on the stack, that parameter is used to access kernel data structures. If the value is invalid, the kernel accesses memory it should not. To prevent this, verify the port number before using it. BUG: KASAN: use-after-free in ib_uverbs_create_ah+0x6d5/0x7b0 Read of size 4 at addr ffff880018d67ab8 by task syz-executor/313 BUG: KASAN: slab-out-of-bounds in modify_qp.isra.4+0x19d0/0x1ef0 Read of size 4 at addr ffff88006c40ec58 by task syz-executor/819 Fixes: 67cdb40ca444 ("[IB] uverbs: Implement more commands") Cc: Yevgeny Kliteynik <kliteyn@mellanox.com> Cc: Tziporet Koren <tziporet@mellanox.com> Cc: Alex Polak <alexpo@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05infiniband: hns: avoid gcc-7.0.1 warning for uninitialized dataArnd Bergmann
commit 5b0ff9a00755d4d9c209033a77f1ed8f3186fe5c upstream. hns_roce_v1_cq_set_ci() calls roce_set_bit() on an uninitialized field, which will then change only a few of its bits, causing a warning with the latest gcc: infiniband/hw/hns/hns_roce_hw_v1.c: In function 'hns_roce_v1_cq_set_ci': infiniband/hw/hns/hns_roce_hw_v1.c:1854:23: error: 'doorbell[1]' is used uninitialized in this function [-Werror=uninitialized] roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1); The code is actually correct since we always set all bits of the port_vlan field, but gcc correctly points out that the first access does contain uninitialized data. This initializes the field to zero first before setting the individual bits. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24IB/mlx5: Fix kernel to user leak prevention logicEli Cohen
commit de8d6e02efbdb259c67832ccf027d7ace9b91d5d upstream. The logic was broken as it failed to update the response length for architectures with PAGE_SIZE larger than 4kB. As a result further extension of the ucontext response struct would fail. Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17RDMA/qedr: Return max inline data in QP query resultRam Amrani
[ Upstream commit 59e8970b3798e4cbe575ed9cf4d53098760a2a86 ] Return the maximum supported amount of inline data, not the qp's current configured inline data size, when filling out the results of a query qp call. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17RDMA/qedr: Don't spam dmesg if QP is in error stateRam Amrani
[ Upstream commit c78c31496111f497b4a03f955c100091185da8b6 ] It is normal to flush CQEs if the QP is in error state. Hence there's no use in printing a message per CQE to dmesg. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17RDMA/qedr: Don't reset QP when queues aren't flushedRam Amrani
[ Upstream commit 933e6dcaa0f65eb2f624ad760274020874a1f35e ] Fail QP state transition from error to reset if SQ/RQ are not empty and still in the process of flushing out the queued work entries. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17RDMA/qedr: Fix and simplify memory leak in PD allocRam Amrani
[ Upstream commit 9c1e0228ab35e52d30abf4b5629c28350833fbcb ] Free the PD if no internal resources were available. Move userspace code under the relevant 'if'. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17RDMA/qedr: Dispatch port active event from qedr_addRam Amrani
[ Upstream commit f449c7a2d822c2d81b5bcb2c50eec80796766726 ] Relying on qede to trigger qedr on startup is problematic. When probing both if qedr loads slowly then qede can assume qedr is missing and not trigger it. This patch adds a triggering from qedr and protects against a race via an atomic bit. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07RDMA/qib,hfi1: Fix MR reference count leak on write with immediateMike Marciniszyn
commit 1feb40067cf04ae48d65f728d62ca255c9449178 upstream. The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory reference when a buffer cannot be allocated for returning the immediate data. The issue is that the rkey validation has already occurred and the RNR nak fails to release the reference that was fruitlessly gotten. The the peer will send the identical single packet request when its RNR timer pops. The fix is to release the held reference prior to the rnr nak exit. This is the only sequence the requires both rkey validation and the buffer allocation on the same packet. Tested-by: Tadeusz Struk <tadeusz.struk@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25IB/hfi1: Fix a subcontext memory leakMichael J. Ruhl
commit 224d71f910102c966cdcd782c97e096d5e26e4da upstream. The only context that frees user_exp_rcv data structures is the last context closed (from a sub-context set). This leaks the allocations from the other sub-contexts. Separate the common frees from the specific frees and call them at the appropriate time. Using KEDR to check for memory leaks we get: Before test: [leak_check] Possible leaks: 25 After test: [leak_check] Possible leaks: 31 (6 leaked data structures) After patch applied (before and after test have the same value) [leak_check] Possible leaks: 25 Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25IB/hfi1: Return an error on memory allocation failureMichael J. Ruhl
commit 94679061dcdddbafcf24e3bfb526e54dedcc2f2f upstream. If the eager buffer allocation fails, it is necessary to return an error code. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25infiniband: call ipv6 route lookup via the stub interfacePaolo Abeni
commit eea40b8f624f25cbc02d55f2d93203f60cee9341 upstream. The infiniband address handle can be triggered to resolve an ipv6 address in response to MAD packets, regardless of the ipv6 module being disabled via the kernel command line argument. That will cause a call into the ipv6 routing code, which is not initialized, and a conseguent oops. This commit addresses the above issue replacing the direct lookup call with an indirect one via the ipv6 stub, which is properly initialized according to the ipv6 status (e.g. if ipv6 is disabled, the routing lookup fails gracefully) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25mlx5: Fix mlx5_ib_map_mr_sg mr lengthSagi Grimberg
commit 0a49f2c31c3efbeb0de3e4b5598764887f629be2 upstream. In case we got an initial sg_offset, we need to account for it in the mr length. Fixes: ff2ba9936591 ("IB/core: Add passing an offset into the SG to ib_map_mr_sg") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/hfi1: Prevent kernel QP post send hard lockupsMike Marciniszyn
commit b6eac931b9bb2bce4db7032c35b41e5e34ec22a5 upstream. The driver progress routines can call cond_resched() when a timeslice is exhausted and irqs are enabled. If the ULP had been holding a spin lock without disabling irqs and the post send directly called the progress routine, the cond_resched() could yield allowing another thread from the same ULP to deadlock on that same lock. Correct by replacing the current hfi1_do_send() calldown with a unique one for post send and adding an argument to hfi1_do_send() to indicate that the send engine is running in a thread. If the routine is not running in a thread, avoid calling cond_resched(). Fixes: Commit 831464ce4b74 ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug levelJack Morgenstein
commit fb7a91746af18b2ebf596778b38a709cdbc488d3 upstream. A warning message during SRIOV multicast cleanup should have actually been a debug level message. The condition generating the warning does no harm and can fill the message log. In some cases, during testing, some tests were so intense as to swamp the message log with these warning messages, causing a stall in the console message log output task. This stall caused an NMI to be sent to all CPUs (so that they all dumped their stacks into the message log). Aside from the message flood causing an NMI, the tests all passed. Once the message flood which caused the NMI is removed (by reducing the warning message to debug level), the NMI no longer occurs. Sample message log (console log) output illustrating the flood and resultant NMI (snippets with comments and modified with ... instead of hex digits, to satisfy checkpatch.pl): <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... *** About 4000 almost identical lines in less than one second *** <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...) *** { 17} above indicates that CPU 17 was the one that stalled *** sending NMI to all CPUs: ... NMI backtrace for cpu 17 CPU: 17 PID: 45909 Comm: kworker/17:2 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013 Workqueue: events fb_flashcursor task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e...... RIP: 0010:[ffffffff81......] [ffffffff81......] io_serial_in+0x15/0x20 RSP: 0018:ffff88064e257cb0 EFLAGS: 00000002 RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000...... RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81...... RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000...... R10: 0000000000...... R11: ffff88064e...... R12: 0000000000...... R13: 0000000000...... R14: ffffffff81...... R15: 0000000000...... FS: 0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080...... CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000...... DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000...... DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000...... Stack: ffff88064e...... ffffffff81...... ffffffff81...... 0000000000...... ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81...... ffffffff81...... ffff88064e...... ffffffff81...... 0000000000...... Call Trace: [<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0 [<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30 [<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140 [<ffffffff813cb5fa>] uart_console_write+0x3a/0x80 [<ffffffff813d0aae>] serial8250_console_write+0xae/0x140 [<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0 [<ffffffff8107d6cf>] console_unlock+0x3bf/0x400 [<ffffffff813503cd>] fb_flashcursor+0x5d/0x140 [<ffffffff81355c30>] ? bit_clear+0x120/0x120 [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5aef>] kthread+0xcf/0xe0 [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81645858>] ret_from_fork+0x58/0x90 [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6 As indicated in the stack trace above, the console output task got swamped. Fixes: b9c5d6a64358 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/mlx4: Fix ib device initialization error flowJack Morgenstein
commit 99e68909d5aba1861897fe7afc3306c3c81b6de0 upstream. In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs. However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not called to free the allocated EQs. Fixes: e605b743f33d ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/IPoIB: ibX: failed to create mcg debug fileShamir Rabinovitch
commit 771a52584096c45e4565e8aabb596eece9d73d61 upstream. When udev renames the netdev devices, ipoib debugfs entries does not get renamed. As a result, if subsequent probe of ipoib device reuse the name then creating a debugfs entry for the new device would fail. Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part of ipoib event handling in order to avoid any race condition between these. Fixes: 1732b0ef3b3a ([IPoIB] add path record information in debugfs) Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/core: For multicast functions, verify that LIDs are multicast LIDsMichael J. Ruhl
commit 8561eae60ff9417a50fa1fb2b83ae950dc5c1e21 upstream. The Infiniband spec defines "A multicast address is defined by a MGID and a MLID" (section 10.5). Currently the MLID value is not validated. Add check to verify that the MLID value is in the correct address range. Fixes: 0c33aeedb2cf ("[IB] Add checks to multicast attach and detach") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20IB/core: Fix sysfs registration error flowJack Morgenstein
commit b312be3d87e4c80872cbea869e569175c5eb0f9a upstream. The kernel commit cited below restructured ib device management so that the device kobject is initialized in ib_alloc_device. As part of the restructuring, the kobject is now initialized in procedure ib_alloc_device, and is later added to the device hierarchy in the ib_register_device call stack, in procedure ib_device_register_sysfs (which calls device_add). However, in the ib_device_register_sysfs error flow, if an error occurs following the call to device_add, the cleanup procedure device_unregister is called. This call results in the device object being deleted -- which results in various use-after-free crashes. The correct cleanup call is device_del -- which undoes device_add without deleting the device object. The device object will then (correctly) be deleted in the ib_register_device caller's error cleanup flow, when the caller invokes ib_dealloc_device. Fixes: 55aeed06544f6 ("IB/core: Make ib_alloc_device init the kobject") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-18IB/mlx5: Verify that Q counters are supportedKamal Heib
commit 45bded2c216da6010184ac5ebe88c27f73439009 upstream. Make sure that the Q counters are supported by the FW before trying to allocate/deallocte them, this will avoid driver load failure when they aren't supported by the FW. Fixes: 0837e86a7a34 ('IB/mlx5: Add per port counters') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/srp: Fix race conditions related to task managementBart Van Assche
commit 0a6fdbdeb1c25e31763c1fb333fa2723a7d2aba6 upstream. Avoid that srp_process_rsp() overwrites the status information in ch if the SRP target response timed out and processing of another task management function has already started. Avoid that issuing multiple task management functions concurrently triggers list corruption. This patch prevents that the following stack trace appears in the system log: WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0 list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68 CPU: 8 PID: 9269 Comm: sg_reset Tainted: G W 4.10.0-rc7-dbg+ #3 Call Trace: dump_stack+0x68/0x93 __warn+0xc6/0xe0 warn_slowpath_fmt+0x4a/0x50 __list_del_entry_valid+0xbc/0xc0 wait_for_completion_timeout+0x12e/0x170 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp] srp_reset_device+0x5b/0x110 [ib_srp] scsi_ioctl_reset+0x1c7/0x290 scsi_ioctl+0x12a/0x420 sd_ioctl+0x9d/0x100 blkdev_ioctl+0x51e/0x9f0 block_ioctl+0x38/0x40 do_vfs_ioctl+0x8f/0x700 SyS_ioctl+0x3c/0x70 entry_SYSCALL_64_fastpath+0x18/0xad Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Steve Feeley <Steve.Feeley@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/srp: Avoid that duplicate responses trigger a kernel bugBart Van Assche
commit 6cb72bc1b40bb2c1750ee7a5ebade93bed49a5fb upstream. After srp_process_rsp() returns there is a short time during which the scsi_host_find_tag() call will return a pointer to the SCSI command that is being completed. If during that time a duplicate response is received, avoid that the following call stack appears: BUG: unable to handle kernel NULL pointer dereference at (null) IP: srp_recv_done+0x450/0x6b0 [ib_srp] Oops: 0000 [#1] SMP CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1 Call Trace: <IRQ> __ib_process_cq+0x4b/0xd0 [ib_core] ib_poll_handler+0x1d/0x70 [ib_core] irq_poll_softirq+0xba/0x120 __do_softirq+0xba/0x4c0 irq_exit+0xbe/0xd0 smp_apic_timer_interrupt+0x38/0x50 apic_timer_interrupt+0x90/0xa0 </IRQ> RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Steve Feeley <Steve.Feeley@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/SRP: Avoid using IB_MR_TYPE_SG_GAPSBart Van Assche
commit d6c58dc40fec35ff6cdb350b53bce0fcf9143709 upstream. Tests have shown that the following error message is reported when using SG-GAPS registration with an mlx5 adapter: scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0f007806 2500002a ad9fafd1 scsi host1: ib_srp: reconnect succeeded mlx5_0:dump_cqe:262:(pid 7369): dump error cqe 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0f007806 25000032 00105dd0 scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138 Hence avoid using SG-GAPS memory registrations. Additionally, always configure the blk_queue_virt_boundary() to avoid to trigger a mapping failure when using adapters that support SG-GAPS (e.g. mlx5). Fixes: commit ad8e66b4a801 ("IB/srp: fix mr allocation when the device supports sg gaps") Fixes: commit 509c5f33f4f6 ("IB/srp: Prevent mapping failures") Reported-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mark Bloch <markb@mellanox.com> Cc: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/mlx5: Fix out-of-bound accessLeon Romanovsky
commit 0fd27a88c2e4f548937fd7d93fc6e65c4ad7c278 upstream. When we initialize buffer to create SRQ in kernel, the number of pages was less than actually used in following mlx5_fill_page_array(). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/IPoIB: Add destination address when re-queue packetErez Shitrit
commit 2b0841766a898aba84630fb723989a77a9d3b4e6 upstream. When sending packet to destination that was not resolved yet via path query, the driver keeps the skb and tries to re-send it again when the path is resolved. But when re-sending via dev_queue_xmit the kernel doesn't call to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb and to put the destination address inside them. In that way the dev_start_xmit will have the correct destination, and the driver won't take the destination from the skb->data, while nothing exists there, which causes to packet be be dropped. The test flow is: 1. Run the SM on remote node, 2. Restart the driver. 4. Ping some destination, 3. Observe that first ICMP request will be dropped. Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15IB/ipoib: Fix deadlock between rmmod and set_modeFeras Daoud
commit 0a0007f28304cb9fc87809c86abb80ec71317f20 upstream. When calling set_mode from sys/fs, the call flow locks the sys/fs lock first and then tries to lock rtnl_lock (when calling ipoib_set_mod). On the other hand, the rmmod call flow takes the rtnl_lock first (when calling unregister_netdev) and then tries to take the sys/fs lock. Deadlock a->b, b->a. The problem starts when ipoib_set_mod frees it's rtnl_lck and tries to get it after that. set_mod: [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90 [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20 [<ffffffff814fed2b>] mutex_lock+0x2b/0x50 [<ffffffff81448675>] rtnl_lock+0x15/0x20 [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib] [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib] [<ffffffff8134b840>] dev_attr_store+0x20/0x30 [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170 [<ffffffff8117b068>] vfs_write+0xb8/0x1a0 [<ffffffff8117ba81>] sys_write+0x51/0x90 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b rmmod: [<ffffffff81279ffc>] ? put_dec+0x10c/0x110 [<ffffffff8127a2ee>] ? number+0x2ee/0x320 [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0 [<ffffffff8127b550>] ? string+0x40/0x100 [<ffffffff814fe323>] wait_for_common+0x123/0x180 [<ffffffff81060250>] ? default_wake_function+0x0/0x20 [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0 [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270 [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0 [<ffffffff81273f66>] kobject_del+0x16/0x40 [<ffffffff8134cd14>] device_del+0x184/0x1e0 [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0 [<ffffffff8143c05e>] rollback_registered+0xae/0x130 [<ffffffff8143c102>] unregister_netdevice+0x22/0x70 [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30 [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib] [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core] [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib] [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core] Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12rdma_cm: fail iwarp accepts w/o connection paramsSteve Wise
commit f2625f7db4dd0bbd16a9c7d2950e7621f9aa57ad upstream. cma_accept_iw() needs to return an error if conn_params is NULL. Since this is coming from user space, we can crash. Reported-by: Shaobo He <shaobo@cs.utah.edu> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14IB/rxe: Fix mem_check_range integer overflowEyal Itkin
commit 647bf3d8a8e5777319da92af672289b2a6c4dc66 upstream. Update the range check to avoid integer-overflow in edge case. Resolves CVE 2016-8636. Signed-off-by: Eyal Itkin <eyal.itkin@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14IB/rxe: Fix resid updateEyal Itkin
commit 628f07d33c1f2e7bf31e0a4a988bb07914bd5e73 upstream. Update the response's resid field when larger than MTU, instead of only updating the local resid variable. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Eyal Itkin <eyal.itkin@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-09iw_cxgb4: set correct FetchBurstMax for QPsSteve Wise
commit b414fa01c31318383ae29d9d23cb9ca4184bbd86 upstream. The current QP FetchBurstMax value is 256B, which is incorrect since a WR can exceed that value. The result being a partial WR fetched by hardware, and a fatal "bad WR" error posted by the SGE. So bump the FetchBurstMax to 512B. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>