summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2014-06-08Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-06-06 Please accept this batch of fixes intended for the 3.16 stream. For the bluetooth bits, Gustavo says: "Here some more patches for 3.16. We know that Linus already opened the merge window, but this is fix only pull request, and most of the patches here are also tagged for stable." Along with that, Andrea Merello provides a fix for the broken scanning in the venerable at76c50x driver... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-08net: force a list_del() in unregister_netdevice_many()Eric Dumazet
unregister_netdevice_many() API is error prone and we had too many bugs because of dangling LIST_HEAD on stacks. See commit f87e6f47933e3e ("net: dont leave active on stack LIST_HEAD") In fact, instead of making sure no caller leaves an active list_head, just force a list_del() in the callee. No one seems to need to access the list after unregister_netdevice_many() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-08Merge tag 'llvmlinux-for-v3.16' of ↵Linus Torvalds
git://git.linuxfoundation.org/llvmlinux/kernel Pull LLVM patches from Behan Webster: "Next set of patches to support compiling the kernel with clang. They've been soaking in linux-next since the last merge window. More still in the works for the next merge window..." * tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel: arm, unwind, LLVMLinux: Enable clang to be used for unwinding the stack ARM: LLVMLinux: Change "extern inline" to "static inline" in glue-cache.h all: LLVMLinux: Change DWARF flag to support gcc and clang net: netfilter: LLVMLinux: vlais-netfilter crypto: LLVMLinux: aligned-attribute.patch
2014-06-08Merge branch 'next' (accumulated 3.16 merge window patches) into masterLinus Torvalds
Now that 3.15 is released, this merges the 'next' branch into 'master', bringing us to the normal situation where my 'master' branch is the merge window. * accumulated work in next: (6809 commits) ufs: sb mutex merge + mutex_destroy powerpc: update comments for generic idle conversion cris: update comments for generic idle conversion idle: remove cpu_idle() forward declarations nbd: zero from and len fields in NBD_CMD_DISCONNECT. mm: convert some level-less printks to pr_* MAINTAINERS: adi-buildroot-devel is moderated MAINTAINERS: add linux-api for review of API/ABI changes mm/kmemleak-test.c: use pr_fmt for logging fs/dlm/debug_fs.c: replace seq_printf by seq_puts fs/dlm/lockspace.c: convert simple_str to kstr fs/dlm/config.c: convert simple_str to kstr mm: mark remap_file_pages() syscall as deprecated mm: memcontrol: remove unnecessary memcg argument from soft limit functions mm: memcontrol: clean up memcg zoneinfo lookup mm/memblock.c: call kmemleak directly from memblock_(alloc|free) mm/mempool.c: update the kmemleak stack trace for mempool allocations lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations mm: introduce kmemleak_update_trace() mm/kmemleak.c: use %u to print ->checksum ...
2014-06-07net: netfilter: LLVMLinux: vlais-netfilterMark Charlebois
Replaced non-standard C use of Variable Length Arrays In Structs (VLAIS) in xt_repldata.h with a C99 compliant flexible array member and then calculated offsets to the other struct members. These other members aren't referenced by name in this code, however this patch maintains the same memory layout and padding as was previously accomplished using VLAIS. Had the original structure been ordered differently, with the entries VLA at the end, then it could have been a flexible member, and this patch would have been a lot simpler. However since the data stored in this structure is ultimately exported to userspace, the order of this structure can't be changed. This patch makes no attempt to change the existing behavior, merely the way in which the current layout is accomplished using standard C99 constructs. As such the code can now be compiled with either gcc or clang. This version of the patch removes the trailing alignment that the VLAIS structure would allocate in order to simplify the patch. Author: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com>
2014-06-06mac802154: llsec: add forgotten list_del_rcu in key removalPhoebe Buckheister
During key removal, the key object is freed, but not taken out of the llsec key list properly. Fix that. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-06svcrdma: Fence LOCAL_INV work requestsSteve Wise
Fencing forces the invalidate to only happen after all prior send work requests have been completed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reported by : Devesh Sharma <Devesh.Sharma@Emulex.Com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06svcrdma: refactor marshalling logicSteve Wise
This patch refactors the NFSRDMA server marshalling logic to remove the intermediary map structures. It also fixes an existing bug where the NFSRDMA server was not minding the device fast register page list length limitations. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2014-06-06nfsd4: simplify server xdr->next_page useJ. Bruce Fields
The rpc code makes available to the NFS server an array of pages to encod into. The server represents its reply as an xdr buf, with the head pointing into the first page in that array, the pages ** array starting just after that, and the tail (if any) sharing any leftover space in the page used by the head. While encoding, we use xdr_stream->page_ptr to keep track of which page we're currently using. Currently we set xdr_stream->page_ptr to buf->pages, which makes the head a weird exception to the rule that page_ptr always points to the page we're currently encoding into. So, instead set it to buf->pages - 1 (the page actually containing the head), and remove the need for a little unintuitive logic in xdr_get_next_encode_buffer() and xdr_truncate_encode. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2014-06-05ip_tunnel: fix possible rtable leakDmitry Popov
ip_rt_put(rt) is always called in "error" branches above, but was missed in skb_cow_head branch. As rt is not yet bound to skb here we have to release it by hand. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-06libceph: add ceph_monc_wait_osdmap()Ilya Dryomov
Add ceph_monc_wait_osdmap(), which will block until the osdmap with the specified epoch is received or timeout occurs. Export both of these as they are going to be needed by rbd. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-06libceph: mon_get_version request infrastructureIlya Dryomov
Add support for mon_get_version requests to libceph. This reuses much of the ceph_mon_generic_request infrastructure, with one exception. Older OSDs don't set mon_get_version reply hdr->tid even if the original request had a non-zero tid, which makes it impossible to lookup ceph_mon_generic_request contexts by tid in get_generic_reply() for such replies. As a workaround, we allocate a reply message on the reply path. This can probably interfere with revoke, but I don't see a better way. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-06libceph: recognize poolop requests in debugfsIlya Dryomov
Recognize poolop requests in debugfs monc dump, fix prink format specifiers - tid is unsigned. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-05ipv6: Shrink udp_v6_mcast_next() to one socket variableSven Wegener
To avoid the confusion of having two variables, shrink the function to only use the parameter variable for looping. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/xen-netback/netback.c net/core/filter.c A filter bug fix overlapped some cleanups and a conversion over to some new insn generation macros. A xen-netback bug fix overlapped the addition of multi-queue support. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05net: filter: fix SKF_AD_PKTTYPE extension on big-endianAlexei Starovoitov
BPF classic->internal converter broke SKF_AD_PKTTYPE extension, since pkt_type_offset() was failing to find skb->pkt_type field which is defined as: __u8 pkt_type:3, fclone:2, ipvs_property:1, peeked:1, nf_trace:1; Fix it by searching for 3 most significant bits and shift them by 5 at run-time Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/nf_tables fixes for net-next This patchset contains fixes for recent updates available in your net-next, they are: 1) Fix double memory allocation for accounting objects that results in a leak, this slipped through with the new quota extension, patch from Mathieu Poirier. 2) Fix broken ordering when adding set element transactions. 3) Make sure that objects are released in reverse order in the abort path, to avoid possible use-after-free when accessing dependencies. 4) Allow to delete several objects (as long as dependencies are fulfilled) by using one batch. This includes changes in the use counter semantics of the nf_tables objects. 5) Fix illegal sleeping allocation from rcu callback. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05bridge: Fix incorrect judgment of promiscToshiaki Makita
br_manage_promisc() incorrectly expects br_auto_port() to return only 0 or 1, while it actually returns flags, i.e., a subset of BR_AUTO_MASK. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05MPLS: Use mpls_features to activate software MPLS GSO segmentationSimon Horman
If an MPLS packet requires segmentation then use mpls_features to determine if the software implementation should be used. As no driver advertises MPLS GSO segmentation this will always be the case. I had not noticed that this was necessary before as software MPLS GSO segmentation was already being used in my test environment. I believe that the reason for that is the skbs in question always had fragments and the driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the case for most drivers). Thus software segmentation was activated by skb_gso_ok(). This introduces the overhead of an extra call to skb_network_protocol() in the case where where CONFIG_NET_MPLS_GSO is set and skb->ip_summed == CHECKSUM_NONE. Thanks to Jesse Gross for prompting me to investigate this. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05Merge branch 'for-upstream' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
2014-06-05ipv4: use skb frags api in udp4_hwcsum()WANG Cong
Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05net: use the new API kvfree()WANG Cong
It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05dns_resolver: Do not accept domain names longer than 255 charsManuel Schölling
According to RFC1035 "[...] the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less." Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04vxlan: Add support for UDP checksums (v4 sending, v6 zero csums)Tom Herbert
Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04gre: Call gso_make_checksumTom Herbert
Call gso_make_checksum. This should have the benefit of using a checksum that may have been previously computed for the packet. This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that offload GRE GSO with and without the GRE checksum offloaed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04net: Add GSO support for UDP tunnels with checksumTom Herbert
Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04tcp: Call gso_make_checksumTom Herbert
Call common gso_make_checksum when calculating checksum for a TCP GSO segment. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04net: Support for multiple checksums with gsoTom Herbert
When creating a GSO packet segment we may need to set more than one checksum in the packet (for instance a TCP checksum and UDP checksum for VXLAN encapsulation). To be efficient, we want to do checksum calculation for any part of the packet at most once. This patch adds csum_start offset to skb_gso_cb. This tracks the starting offset for skb->csum which is initially set in skb_segment. When a protocol needs to compute a transport checksum it calls gso_make_checksum which computes the checksum value from the start of transport header to csum_start and then adds in skb->csum to get the full checksum. skb->csum and csum_start are then updated to reflect the checksum of the resultant packet starting from the transport header. This patch also adds a flag to skbuff, encap_hdr_csum, which is set in *gso_segment fucntions to indicate that a tunnel protocol needs checksum calculation Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04l2tp: call udp{6}_set_csumTom Herbert
Call common functions to set checksum for UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04udp: Generic functions to set checksumTom Herbert
Added udp_set_csum and udp6_set_csum functions to set UDP checksums in packets. These are for simple UDP packets such as those that might be created in UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04ipv6: Fix regression caused by efe4208 in udp_v6_mcast_next()Sven Wegener
Commit efe4208 ("ipv6: make lookups simpler and faster") introduced a regression in udp_v6_mcast_next(), resulting in multicast packets not reaching the destination sockets under certain conditions. The packet's IPv6 addresses are wrongly compared to the IPv6 addresses from the function's socket argument, which indicates the starting point for looping, instead of the loop variable. If the addresses from the first socket do not match the packet's addresses, no socket in the list will match. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04net: Revert "fib_trie: use seq_file_net rather than seq->private"Sasha Levin
This reverts commit 30f38d2fdd79f13fc929489f7e6e517b4a4bfe63. fib_triestat is surrounded by a big lie: while it claims that it's a seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't: static const struct file_operations fib_triestat_fops = { .owner = THIS_MODULE, .open = fib_triestat_seq_open, .read = seq_read, .llseek = seq_lseek, .release = single_release_net, }; Yes, fib_triestat is just a regular file. A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files you could do seq_file_net() to get the net ptr, doing so for a regular file would be wrong and would dereference an invalid pointer. The fib_triestat lie claimed a victim, and trying to show the file would be bad for the kernel. This patch just reverts the issue and fixes fib_triestat, which still needs a rewrite to either be a seq_file or stop claiming it is. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04xprtrdma: Disconnect on registration failureChuck Lever
If rpcrdma_register_external() fails during request marshaling, the current RPC request is killed. Instead, this RPC should be retried after reconnecting the transport instance. The most likely reason for registration failure with FRMR is a failed post_send, which would be due to a remote transport disconnect or memory exhaustion. These issues can be recovered by a retry. Problems encountered in the marshaling logic itself will not be corrected by trying again, so these should still kill a request. Now that we've added a clean exit for marshaling errors, take the opportunity to defang some BUG_ON's. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Remove BUG_ON() call sitesChuck Lever
If an error occurs in the marshaling logic, fail the RPC request being processed, but leave the client running. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Avoid deadlock when credit window is resetChuck Lever
Update the cwnd while processing the server's reply. Otherwise the next task on the xprt_sending queue is still subject to the old credit window. Currently, no task is awoken if the old congestion window is still exceeded, even if the new window is larger, and a deadlock results. This is an issue during a transport reconnect. Servers don't normally shrink the credit window, but the client does reset it to 1 when reconnecting so the server can safely grow it again. As a minor optimization, remove the hack of grabbing the initial cwnd size (which happens to be RPC_CWNDSCALE) and using that value as the congestion scaling factor. The scaling value is invariant, and we are better off without the multiplication operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04SUNRPC: Move congestion window constants to header fileChuck Lever
I would like to use one of the RPC client's congestion algorithm constants in transport-specific code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Reset connection timeout after successful reconnectChuck Lever
If the new connection is able to make forward progress, reset the re-establish timeout. Otherwise it keeps growing even if disconnect events are rare. The same behavior as TCP is adopted: reconnect immediately if the transport instance has been able to make some forward progress. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Use macros for reconnection timeout constantsChuck Lever
Clean up: Ensure the same max and min constant values are used everywhere when setting reconnect timeouts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Allocate missing pagelistShirley Ma
GETACL relies on transport layer to alloc memory for reply buffer. However xprtrdma assumes that the reply buffer (pagelist) has been pre-allocated in upper layer. This problem was reported by IOL OFA lab test on PPC. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Edward Mossman <emossman@iol.unh.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Remove Tavor MTU settingChuck Lever
Clean up. Remove HCA-specific clutter in xprtrdma, which is supposed to be device-independent. Hal Rosenstock <hal@dev.mellanox.co.il> observes: > Note that there is OpenSM option (enable_quirks) to return 1K MTU > in SA PathRecord responses for Tavor so that can be used for this. > The default setting for enable_quirks is FALSE so that would need > changing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnectingChuck Lever
Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops. rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect. Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp with the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops. On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Reduce the number of hardway buffer allocationsChuck Lever
While marshaling an RPC/RDMA request, the inline_{rsize,wsize} settings determine whether an inline request is used, or whether read or write chunks lists are built. The current default value of these settings is 1024. Any RPC request smaller than 1024 bytes is sent to the NFS server completely inline. rpcrdma_buffer_create() allocates and pre-registers a set of RPC buffers for each transport instance, also based on the inline rsize and wsize settings. RPC/RDMA requests and replies are built in these buffers. However, if an RPC/RDMA request is expected to be larger than 1024, a buffer has to be allocated and registered for that RPC, and deregistered and released when the RPC is complete. This is known has a "hardway allocation." Since the introduction of NFSv4, the size of RPC requests has become larger, and hardway allocations are thus more frequent. Hardway allocations are significant overhead, and they waste the existing RPC buffers pre-allocated by rpcrdma_buffer_create(). We'd like fewer hardway allocations. Increasing the size of the pre-registered buffers is the most direct way to do this. However, a blanket increase of the inline thresholds has interoperability consequences. On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000 bytes for each RPC request buffer, using kmalloc(). Due to internal fragmentation, this wastes nearly 1200 bytes because kmalloc() already returns an 8192-byte piece of memory for a 7000-byte allocation request, though the extra space remains unused. So let's round up the size of the pre-allocated buffers, and make use of the unused space in the kmalloc'd memory. This change reduces the amount of hardway allocated memory for an NFSv4 general connectathon run from 1322092 to 9472 bytes (99%). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Limit work done by completion handlerChuck Lever
Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady stream of CQ events could starve other work because of the boundless loop pooling in rpcrdma_{send,recv}_poll(). Instead of a (potentially infinite) while loop, return after collecting a budgeted number of completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrmda: Reduce calls to ib_poll_cq() in completion handlersChuck Lever
Change the completion handlers to grab up to 16 items per ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16 items are returned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrmda: Reduce lock contention in completion handlersChuck Lever
Skip the ib_poll_cq() after re-arming, if the provider knows there are no additional items waiting. (Have a look at commit ed23a727 for more details). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Split the completion queueChuck Lever
The current CQ handler uses the ib_wc.opcode field to distinguish between event types. However, the contents of that field are not reliable if the completion status is not IB_WC_SUCCESS. When an error completion occurs on a send event, the CQ handler schedules a tasklet with something that is not a struct rpcrdma_rep. This is never correct behavior, and sometimes it results in a panic. To resolve this issue, split the completion queue into a send CQ and a receive CQ. The send CQ handler now handles only struct rpcrdma_mw wr_id's, and the receive CQ handler now handles only struct rpcrdma_rep wr_id's. Fix suggested by Shirley Ma <shirley.ma@oracle.com> Reported-by: Rafael Reiter <rafael.reiter@ims.co.at> Fixes: 5c635e09cec0feeeb310968e51dad01040244851 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Klemens Senn <klemens.senn@ims.co.at> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Make rpcrdma_ep_destroy() return voidChuck Lever
Clean up: rpcrdma_ep_destroy() returns a value that is used only to print a debugging message. rpcrdma_ep_destroy() already prints debugging messages in all error cases. Make rpcrdma_ep_destroy() return void instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: Simplify rpcrdma_deregister_external() synopsisChuck Lever
Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04xprtrdma: mount reports "Invalid mount option" if memreg mode not supportedChuck Lever
If the selected memory registration mode is not supported by the underlying provider/HCA, the NFS mount command reports that there was an invalid mount option, and fails. This is misleading. Reporting a problem allocating memory is a lot closer to the truth. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>