summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-10-22batman-adv: Fix potentially broken skb network header accessLinus Lüssing
commit 53cf037bf846417fd92dc92ddf97267f69b110f4 upstream. The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They need a correctly set skb network header. Unfortunately we cannot rely on the device drivers to set it for us. Therefore setting it in the beginning of the according ndo_start_xmit handler. Fixes: 1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets") Fixes: ab49886e3da7 ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22batman-adv: Fix potential synchronization issues in mcast tvlv handlerLinus Lüssing
commit 8a4023c5b5e30b11f1f383186f4a7222b3b823cf upstream. So far the mcast tvlv handler did not anticipate the processing of multiple incoming OGMs from the same originator at the same time. This can lead to various issues: * Broken refcounting: For instance two mcast handlers might both assume that an originator just got multicast capabilities and will together wrongly decrease mcast.num_disabled by two, potentially leading to an integer underflow. * Potential kernel panic on hlist_del_rcu(): Two mcast handlers might one after another try to do an hlist_del_rcu(&orig->mcast_want_all_*_node). The second one will cause memory corruption / crashes. (Reported by: Sven Eckelmann <sven@narfation.org>) Right in the beginning the code path makes assumptions about the current multicast related state of an originator and bases all updates on that. The easiest and least error prune way to fix the issues in this case is to serialize multiple mcast handler invocations with a spinlock. Fixes: 60432d756cf0 ("batman-adv: Announce new capability via multicast TVLV") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22batman-adv: Make MCAST capability changes atomicLinus Lüssing
commit 9c936e3f4c4fad07abb6c082a89508b8f724c88f upstream. Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: 60432d756cf0 ("batman-adv: Announce new capability via multicast TVLV") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22batman-adv: Make TT capability changes atomicLinus Lüssing
commit ac4eebd48461ec993e7cb614d5afe7df8c72e6b7 upstream. Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: e17931d1a61d ("batman-adv: introduce capability initialization bitfield") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22batman-adv: Make NC capability changes atomicLinus Lüssing
commit 4635469f5c617282f18c69643af36cd8c0acf707 upstream. Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: 3f4841ffb336 ("batman-adv: tvlv - add network coding container") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22batman-adv: Make DAT capability changes atomicLinus Lüssing
commit 65d7d46050704bcdb8121ddbf4110bfbf2b38baa upstream. Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: 17cf0ea455f1 ("batman-adv: tvlv - add distributed arp table container") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22Bluetooth: Delay check for conn->smp in smp_conn_security()Johan Hedberg
commit d8949aad3eab5d396f4fefcd581773bf07b9a79e upstream. There are several actions that smp_conn_security() might make that do not require a valid SMP context (conn->smp pointer). One of these actions is to encrypt the link with an existing LTK. If the SMP context wasn't initialized properly we should still allow the independent actions to be done, i.e. the check for the context should only be done at the last possible moment. Reported-by: Chuck Ebbert <cebbert.lkml@gmail.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: nf_log: don't zap all loggers on unregisterFlorian Westphal
commit 205ee117d4dc4a11ac3bd9638bb9b2e839f4de9a upstream. like nf_log_unset, nf_log_unregister must not reset the list of loggers. Otherwise, a call to nf_log_unregister() will render loggers of other nf protocols unusable: iptables -A INPUT -j LOG modprobe nf_log_arp ; rmmod nf_log_arp iptables -A INPUT -j LOG iptables: No chain/target/match by that name Fixes: 30e0c6a6be ("netfilter: nf_log: prepare net namespace support for loggers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: nft_compat: skip family comparison in case of NFPROTO_UNSPECPablo Neira Ayuso
commit ba378ca9c04a5fc1b2cf0f0274a9d02eb3d1bad9 upstream. Fix lookup of existing match/target structures in the corresponding list by skipping the family check if NFPROTO_UNSPEC is used. This is resulting in the allocation and insertion of one match/target structure for each use of them. So this not only bloats memory consumption but also severely affects the time to reload the ruleset from the iptables-compat utility. After this patch, iptables-compat-restore and iptables-compat take almost the same time to reload large rulesets. Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: nf_log: wait for rcu grace after logger unregistrationPablo Neira Ayuso
commit ad5001cc7cdf9aaee5eb213fdee657e4a3c94776 upstream. The nf_log_unregister() function needs to call synchronize_rcu() to make sure that the objects are not dereferenced anymore on module removal. Fixes: 5962815a6a56 ("netfilter: nf_log: use an array of loggers instead of list") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error pathsDaniel Borkmann
commit 9cf94eab8b309e8bcc78b41dd1561c75b537dd0b upstream. Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") migrated templates to the new allocator api, but forgot to update error paths for them in CT and synproxy to use nf_ct_tmpl_free() instead of nf_conntrack_free(). Due to that, memory is being freed into the wrong kmemcache, but also we drop the per net reference count of ct objects causing an imbalance. In Brad's case, this leads to a wrap-around of net->ct.count and thus lets __nf_conntrack_alloc() refuse to create a new ct object: [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching [ 10.810168] nf_conntrack: table full, dropping packet [ 11.917416] r8169 0000:07:00.0 eth0: link up [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 12.815902] nf_conntrack: table full, dropping packet [ 15.688561] nf_conntrack: table full, dropping packet [ 15.689365] nf_conntrack: table full, dropping packet [ 15.690169] nf_conntrack: table full, dropping packet [ 15.690967] nf_conntrack: table full, dropping packet [...] With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs. nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus, to fix the problem, export and use nf_ct_tmpl_free() instead. Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") Reported-by: Brad Jackson <bjackson0971@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: ipset: Fixing unnamed union initElad Raz
commit 96be5f2806cd65a2ebced3bfcdf7df0116e6c4a6 upstream. In continue to proposed Vinson Lee's post [1], this patch fixes compilation issues founded at gcc 4.4.7. The initialization of .cidr field of unnamed unions causes compilation error in gcc 4.4.x. References Visible links [1] https://lkml.org/lkml/2015/7/5/74 Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: ipset: Out of bound access in hash:net* types fixedJozsef Kadlecsik
commit 6fe7ccfd77415a6ba250c10c580eb3f9acf79753 upstream. Dave Jones reported that KASan detected out of bounds access in hash:net* types: [ 23.139532] ================================================================== [ 23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58 [ 23.152937] Write of size 4 by task ipset/457 [ 23.159742] ============================================================================= [ 23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected [ 23.173641] ----------------------------------------------------------------------------- [ 23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456 [ 23.201836] __slab_alloc.constprop.66+0x554/0x620 [ 23.208994] __kmalloc+0x2f2/0x360 [ 23.216105] hash_net_create+0x16a/0x470 [ 23.223238] ip_set_create+0x3e6/0x740 [ 23.230343] nfnetlink_rcv_msg+0x599/0x640 [ 23.237454] netlink_rcv_skb+0x14f/0x190 [ 23.244533] nfnetlink_rcv+0x3f6/0x790 [ 23.251579] netlink_unicast+0x272/0x390 [ 23.258573] netlink_sendmsg+0x5a1/0xa50 [ 23.265485] SYSC_sendto+0x1da/0x2c0 [ 23.272364] SyS_sendto+0xe/0x10 [ 23.279168] entry_SYSCALL_64_fastpath+0x12/0x6f The bug is fixed in the patch and the testsuite is extended in ipset to check cidr handling more thoroughly. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: nfnetlink: work around wrong endianess in res_id fieldPablo Neira Ayuso
commit a9de9777d613500b089a7416f936bf3ae5f070d2 upstream. The convention in nfnetlink is to use network byte order in every header field as well as in the attribute payload. The initial version of the batching infrastructure assumes that res_id comes in host byte order though. The only client of the batching infrastructure is nf_tables, so let's add a workaround to address this inconsistency. We currently have 11 nfnetlink subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem 2560, ie. htons(10), will not be allocated anytime soon, so it can be an alias of nf_tables from the nfnetlink batching path when interpreting the res_id field. Based on original patch from Florian Westphal. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22svcrdma: Fix send_reply() scatter/gather set-upChuck Lever
commit 9d11b51ce7c150a69e761e30518f294fc73d55ff upstream. The Linux NFS server returns garbage in the data payload of inline NFS/RDMA READ replies. These are READs of under 1000 bytes or so where the client has not provided either a reply chunk or a write list. The NFS server delivers the data payload for an NFS READ reply to the transport in an xdr_buf page list. If the NFS client did not provide a reply chunk or a write list, send_reply() is supposed to set up a separate sge for the page containing the READ data, and another sge for XDR padding if needed, then post all of the sges via a single SEND Work Request. The problem is send_reply() does not advance through the xdr_buf when setting up scatter/gather entries for SEND WR. It always calls dma_map_xdr with xdr_off set to zero. When there's more than one sge, dma_map_xdr() sets up the SEND sge's so they all point to the xdr_buf's head. The current Linux NFS/RDMA client always provides a reply chunk or a write list when performing an NFS READ over RDMA. Therefore, it does not exercise this particular case. The Linux server has never had to use more than one extra sge for building RPC/RDMA replies with a Linux client. However, an NFS/RDMA client _is_ allowed to send small NFS READs without setting up a write list or reply chunk. The NFS READ reply fits entirely within the inline reply buffer in this case. This is perhaps a more efficient way of performing NFS READs that the Linux NFS/RDMA client may some day adopt. Fixes: b432e6b3d9c1 ('svcrdma: Change DMA mapping logic to . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03fib_rules: fix fib rule dumps across multiple skbsWilson Kok
[ Upstream commit 41fc014332d91ee90c32840bf161f9685b7fbf2b ] dump_rules returns skb length and not error. But when family == AF_UNSPEC, the caller of dump_rules assumes that it returns an error. Hence, when family == AF_UNSPEC, we continue trying to dump on -EMSGSIZE errors resulting in incorrect dump idx carried between skbs belonging to the same dump. This results in fib rule dump always only dumping rules that fit into the first skb. This patch fixes dump_rules to return error so that we exit correctly and idx is correctly maintained between skbs that are part of the same dump. Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03net: revert "net_sched: move tp->root allocation into fw_init()"WANG Cong
[ Upstream commit d8aecb10115497f6cdf841df8c88ebb3ba25fa28 ] fw filter uses tp->root==NULL to check if it is the old method, so it doesn't need allocation at all in this case. This patch reverts the offending commit and adds some comments for old method to make it obvious. Fixes: 33f8b9ecdb15 ("net_sched: move tp->root allocation into fw_init()") Reported-by: Akshat Kakkar <akshat.1984@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03Fix AF_PACKET ABI breakage in 4.2David Woodhouse
[ Upstream commit d3869efe7a8a2298516d9af4f91487cf486ca945 ] Commit 7d82410950aa ("virtio: add explicit big-endian support to memory accessors") accidentally changed the virtio_net header used by AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian. Since virtio_legacy_is_little_endian() is a very long identifier, define a vio_le macro and use that throughout the code instead of the hard-coded 'false' for little-endian. This restores the ABI to match 4.1 and earlier kernels, and makes my test program work again. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03tcp: add proper TS val into RST packetsEric Dumazet
[ Upstream commit 675ee231d960af2af3606b4480324e26797eb010 ] RST packets sent on behalf of TCP connections with TS option (RFC 7323 TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr. A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100 ecr 0], length 0 B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss 1460,nop,nop,TS val 7264344 ecr 100], length 0 A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr 7264344], length 0 B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0 ecr 110], length 0 We need to call skb_mstamp_get() to get proper TS val, derived from skb->skb_mstamp Note that RFC 1323 was advocating to not send TS option in RST segment, but RFC 7323 recommends the opposite : Once TSopt has been successfully negotiated, that is both <SYN> and <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST> segment for the duration of the connection, and SHOULD be sent in an <RST> segment (see Section 5.2 for details) Note this RFC recommends to send TS val = 0, but we believe it is premature : We do not know if all TCP stacks are properly handling the receive side : When an <RST> segment is received, it MUST NOT be subjected to the PAWS check by verifying an acceptable value in SEG.TSval, and information from the Timestamps option MUST NOT be used to update connection state information. SEG.TSecr MAY be used to provide stricter <RST> acceptance checks. In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider to decide to send TS val = 0, if it buys something. Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03openvswitch: Zero flows on allocation.Jesse Gross
[ Upstream commit ae5f2fb1d51fa128a460bcfbe3c56d7ab8bf6a43 ] When support for megaflows was introduced, OVS needed to start installing flows with a mask applied to them. Since masking is an expensive operation, OVS also had an optimization that would only take the parts of the flow keys that were covered by a non-zero mask. The values stored in the remaining pieces should not matter because they are masked out. While this works fine for the purposes of matching (which must always look at the mask), serialization to netlink can be problematic. Since the flow and the mask are serialized separately, the uninitialized portions of the flow can be encoded with whatever values happen to be present. In terms of functionality, this has little effect since these fields will be masked out by definition. However, it leaks kernel memory to userspace, which is a potential security vulnerability. It is also possible that other code paths could look at the masked key and get uninitialized data, although this does not currently appear to be an issue in practice. This removes the mask optimization for flows that are being installed. This was always intended to be the case as the mask optimizations were really targetting per-packet flow operations. Fixes: 03f0d916 ("openvswitch: Mega flow implementation") Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03netlink: Replace rhash_portid with boundHerbert Xu
[ Upstream commit da314c9923fed553a007785a901fd395b7eb6c19 ] On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote: > > store_release and load_acquire are different from the usual memory > barriers and can't be paired this way. You have to pair store_release > and load_acquire. Besides, it isn't a particularly good idea to OK I've decided to drop the acquire/release helpers as they don't help us at all and simply pessimises the code by using full memory barriers (on some architectures) where only a write or read barrier is needed. > depend on memory barriers embedded in other data structures like the > above. Here, especially, rhashtable_insert() would have write barrier > *before* the entry is hashed not necessarily *after*, which means that > in the above case, a socket which appears to have set bound to a > reader might not visible when the reader tries to look up the socket > on the hashtable. But you are right we do need an explicit write barrier here to ensure that the hashing is visible. > There's no reason to be overly smart here. This isn't a crazy hot > path, write barriers tend to be very cheap, store_release more so. > Please just do smp_store_release() and note what it's paired with. It's not about being overly smart. It's about actually understanding what's going on with the code. I've seen too many instances of people simply sprinkling synchronisation primitives around without any knowledge of what is happening underneath, which is just a recipe for creating hard-to-debug races. > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, > > } > > } > > > > - if (!nlk->portid) { > > + if (!nlk->bound) { > > I don't think you can skip load_acquire here just because this is the > second deref of the variable. That doesn't change anything. Race > condition could still happen between the first and second tests and > skipping the second would lead to the same kind of bug. The reason this one is OK is because we do not use nlk->portid or try to get nlk from the hash table before we return to user-space. However, there is a real bug here that none of these acquire/release helpers discovered. The two bound tests here used to be a single one. Now that they are separate it is entirely possible for another thread to come in the middle and bind the socket. So we need to repeat the portid check in order to maintain consistency. > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr, > > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND)) > > return -EPERM; > > > > - if (!nlk->portid) > > + if (!nlk->bound) > > Don't we need load_acquire here too? Is this path holding a lock > which makes that unnecessary? Ditto. ---8<--- The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink: Fix autobind race condition that leads to zero port ID") created some new races that can occur due to inconcsistencies between the two port IDs. Tejun is right that a barrier is unavoidable. Therefore I am reverting to the original patch that used a boolean to indicate that a user netlink socket has been bound. Barriers have been added where necessary to ensure that a valid portid and the hashed socket is visible. I have also changed netlink_insert to only return EBUSY if the socket is bound to a portid different to the requested one. This combined with only reading nlk->bound once in netlink_bind fixes a race where two threads that bind the socket at the same time with different port IDs may both succeed. Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID") Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Nacked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03netlink: Fix autobind race condition that leads to zero port IDHerbert Xu
[ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ] The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink: Reset portid after netlink_insert failure") introduced a race condition where if two threads try to autobind the same socket one of them may end up with a zero port ID. This led to kernel deadlocks that were observed by multiple people. This patch reverts that commit and instead fixes it by introducing a separte rhash_portid variable so that the real portid is only set after the socket has been successfully hashed. Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure") Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03rtnetlink: catch -EOPNOTSUPP errors from ndo_bridge_getlinkRoopa Prabhu
[ Upstream commit d64f69b0373a7d0bcec8b5da7712977518a8f42b ] problem reported: kernel 4.1.3 ------------ # bridge vlan port vlan ids eth0 1 PVID Egress Untagged 90 91 92 93 94 95 96 97 98 99 100 vmbr0 1 PVID Egress Untagged 94 kernel 4.2 ----------- # bridge vlan port vlan ids ndo_bridge_getlink can return -EOPNOTSUPP when an interfaces ndo_bridge_getlink op is set to switchdev_port_bridge_getlink and CONFIG_SWITCHDEV is not defined. This today can happen to bond, rocker and team devices. This patch adds -EOPNOTSUPP checks after calls to ndo_bridge_getlink. Fixes: 85fdb956726ff2a ("switchdev: cut over to new switchdev_port_bridge_getlink") Reported-by: Alexandre DERUMIER <aderumier@odiso.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03bridge: fix igmpv3 / mldv2 report parsingLinus Lüssing
[ Upstream commit c2d4fbd2163e607915cc05798ce7fb7f31117cc1 ] With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMPv3 and MLDv2 report parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header, breaking the message parsing and creating packet loss. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Tobias Powalowski <tobias.powalowski@googlemail.com> Tested-by: Tobias Powalowski <tobias.powalowski@googlemail.com> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03sctp: fix race on protocol/netns initializationMarcelo Ricardo Leitner
[ Upstream commit 8e2d61e0aed2b7c4ecb35844fe07e0b2b762dee4 ] Consider sctp module is unloaded and is being requested because an user is creating a sctp socket. During initialization, sctp will add the new protocol type and then initialize pernet subsys: status = sctp_v4_protosw_init(); if (status) goto err_protosw_init; status = sctp_v6_protosw_init(); if (status) goto err_v6_protosw_init; status = register_pernet_subsys(&sctp_net_ops); The problem is that after those calls to sctp_v{4,6}_protosw_init(), it is possible for userspace to create SCTP sockets like if the module is already fully loaded. If that happens, one of the possible effects is that we will have readers for net->sctp.local_addr_list list earlier than expected and sctp_net_init() does not take precautions while dealing with that list, leading to a potential panic but not limited to that, as sctp_sock_init() will copy a bunch of blank/partially initialized values from net->sctp. The race happens like this: CPU 0 | CPU 1 socket() | __sock_create | socket() inet_create | __sock_create list_for_each_entry_rcu( | answer, &inetsw[sock->type], | list) { | inet_create /* no hits */ | if (unlikely(err)) { | ... | request_module() | /* socket creation is blocked | * the module is fully loaded | */ | sctp_init | sctp_v4_protosw_init | inet_register_protosw | list_add_rcu(&p->list, | last_perm); | | list_for_each_entry_rcu( | answer, &inetsw[sock->type], sctp_v6_protosw_init | list) { | /* hit, so assumes protocol | * is already loaded | */ | /* socket creation continues | * before netns is initialized | */ register_pernet_subsys | Simply inverting the initialization order between register_pernet_subsys() and sctp_v4_protosw_init() is not possible because register_pernet_subsys() will create a control sctp socket, so the protocol must be already visible by then. Deferring the socket creation to a work-queue is not good specially because we loose the ability to handle its errors. So, as suggested by Vlad, the fix is to split netns initialization in two moments: defaults and control socket, so that the defaults are already loaded by when we register the protocol, while control socket initialization is kept at the same moment it is today. Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace") Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03netlink, mmap: transform mmap skb into full skb on tapsDaniel Borkmann
[ Upstream commit 1853c949646005b5959c483becde86608f548f24 ] Ken-ichirou reported that running netlink in mmap mode for receive in combination with nlmon will throw a NULL pointer dereference in __kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable to handle kernel paging request". The problem is the skb_clone() in __netlink_deliver_tap_skb() for skbs that are mmaped. I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink skb has it pointed to netlink_skb_destructor(), set in the handler netlink_ring_setup_skb(). There, skb->head is being set to NULL, so that in such cases, __kfree_skb() doesn't perform a skb_release_data() via skb_release_all(), where skb->head is possibly being freed through kfree(head) into slab allocator, although netlink mmap skb->head points to the mmap buffer. Similarly, the same has to be done also for large netlink skbs where the data area is vmalloced. Therefore, as discussed, make a copy for these rather rare cases for now. This fixes the issue on my and Ken-ichirou's test-cases. Reference: http://thread.gmane.org/gmane.linux.network/371129 Fixes: bcbde0d449ed ("net: netlink: virtual tap device management") Reported-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03ipv6: fix multipath route replace error recoveryRoopa Prabhu
[ Upstream commit 6b9ea5a64ed5eeb3f68f2e6fcce0ed1179801d1e ] Problem: The ecmp route replace support for ipv6 in the kernel, deletes the existing ecmp route too early, ie when it installs the first nexthop. If there is an error in installing the subsequent nexthops, its too late to recover the already deleted existing route leaving the fib in an inconsistent state. This patch reduces the possibility of this by doing the following: a) Changes the existing multipath route add code to a two stage process: build rt6_infos + insert them ip6_route_add rt6_info creation code is moved into ip6_route_info_create. b) This ensures that most errors are caught during building rt6_infos and we fail early c) Separates multipath add and del code. Because add needs the special two stage mode in a) and delete essentially does not care. d) In any event if the code fails during inserting a route again, a warning is printed (This should be unlikely) Before the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show /* previously added ecmp route 3000:1000:1000:1000::2 dissappears from * kernel */ After the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03net/ipv6: Correct PIM6 mrt_lock handlingRichard Laing
[ Upstream commit 25b4a44c19c83d98e8c0807a7ede07c1f28eab8b ] In the IPv6 multicast routing code the mrt_lock was not being released correctly in the MFC iterator, as a result adding or deleting a MIF would cause a hang because the mrt_lock could not be acquired. This fix is a copy of the code for the IPv4 case and ensures that the lock is released correctly. Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03ipv6: fix exthdrs offload registration in out_rt pathDaniel Borkmann
[ Upstream commit e41b0bedba0293b9e1e8d1e8ed553104b9693656 ] We previously register IPPROTO_ROUTING offload under inet6_add_offload(), but in error path, we try to unregister it with inet_del_offload(). This doesn't seem correct, it should actually be inet6_del_offload(), also ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it also uses rthdr_offload twice), but it got removed entirely later on. Fixes: 3336288a9fea ("ipv6: Switch to using new offload infrastructure.") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03sock, diag: fix panic in sock_diag_put_filterinfoDaniel Borkmann
[ Upstream commit b382c08656000c12a146723a153b85b13a855b49 ] diag socket's sock_diag_put_filterinfo() dumps classic BPF programs upon request to user space (ss -0 -b). However, native eBPF programs attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method: Their orig_prog is always NULL. However, sock_diag_put_filterinfo() unconditionally tries to access its filter length resp. wants to copy the filter insns from there. Internal cBPF to eBPF transformations attached to sockets don't have this issue, as orig_prog state is kept. It's currently only used by packet sockets. If we would want to add native eBPF support in the future, this needs to be done through a different attribute than PACKET_DIAG_FILTER to not confuse possible user space disassemblers that work on diag data. Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29SUNRPC: Lock the transport layer on shutdownTrond Myklebust
commit 79234c3db6842a3de03817211d891e0c2878f756 upstream. Avoid all races with the connect/disconnect handlers by taking the transport lock. Reported-by:"Suzuki K. Poulose" <suzuki.poulose@arm.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29SUNRPC: Ensure that we wait for connections to complete before retryingTrond Myklebust
commit 0fdea1e8a2853f79d39b8555cc9de16a7e0ab26f upstream. Commit 718ba5b87343, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King <linux@arm.linux.org.uk> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29SUNRPC: xs_reset_transport must mark the connection as disconnectedTrond Myklebust
commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream. In case the reconnection attempt fails. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29SUNRPC: Fix a thinko in xs_connect()Trond Myklebust
commit 99b1a4c32ad22024ac6198a4337aaec5ea23168f upstream. It is rather pointless to test the value of transport->inet after calling xs_reset_transport(), since it will always be zero, and so we will never see any exponential back off behaviour. Also don't force early connections for SOFTCONN tasks. If the server disconnects us, we should respect the exponential backoff. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOADChuck Lever
commit cc9a903d915c21626b6b2fbf8ed0ff16a7f82210 upstream. Both commit 0380a3f375 ("svcrdma: Add a separate "max data segs" macro for svcrdma") and commit 7e5be28827bf ("svcrdma: advertise the correct max payload") are incorrect. This commit reverts both changes, restoring the server's maximum payload size to 1MB. Commit 7e5be28827bf based the server's maximum payload on the _client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong. Commit 0380a3f375 tried to fix this so that the client maximum payload size could be raised without affecting the server, but managed to confuse matters more on the server side. More importantly, limiting the advertised maximum payload size was meant to be a workaround, not the actual fix. We need to revisit https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 A Linux client on a platform with 64KB pages can overrun and crash an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux client seems to work fine using 1MB reads and writes when the Linux server's maximum payload size is restored to 1MB. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Fixes: 0380a3f375 ("svcrdma: Add a separate "max data segs" macro") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29mac80211: enable assoc check for mesh interfacesBob Copeland
commit 3633ebebab2bbe88124388b7620442315c968e8f upstream. We already set a station to be associated when peering completes, both in user space and in the kernel. Thus we should always have an associated sta before sending data frames to that station. Failure to check assoc state can cause crashes in the lower-level driver due to transmitting unicast data frames before driver sta structures (e.g. ampdu state in ath9k) are initialized. This occurred when forwarding in the presence of fixed mesh paths: frames were transmitted to stations with whom we hadn't yet completed peering. Reported-by: Alexis Green <agreen@cococorp.com> Tested-by: Jesse Jones <jjones@cococorp.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29nfc: nci: hci: Add check on skb nci_hci_send_cmd parameterChristophe Ricard
commit 5a9e0ffc0f128ecdf7c770f76c268e4f9f3c9118 upstream. skb can be NULL and may lead to a NULL pointer error. Add a check condition before setting HCI rx buffer. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29nfc: netlink: Warning fixChristophe Ricard
commit adca3c38d807b341a965d0aba8721d0784d8471b upstream. When NFC_ATTR_VENDOR_DATA is not set, data_len is 0 and data is NULL. Fixes the following warning: net/nfc/netlink.c:1536:3: warning: 'data' may be used uninitialized +in this function [-Wmaybe-uninitialized] return cmd->doit(dev, data, data_len); Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29nfc: netlink: Add check on NFC_ATTR_VENDOR_DATAChristophe Ricard
commit fe202fe95564023223ce1910c9e352f391abb1d5 upstream. NFC_ATTR_VENDOR_DATA is an optional vendor_cmd argument. The current code was potentially using a non existing argument leading to potential catastrophic results. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21fs: create and use seq_show_option for escapingKees Cook
commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream. Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Some straggler bug fixes here: 1) Netlink_sendmsg() doesn't check iterator type properly in mmap case, from Ken-ichirou MATSUZAWA. 2) Don't sleep in atomic context in bcmgenet driver, from Florian Fainelli. 3) The pfkey_broadcast() code patch can't actually ever use anything other than GFP_ATOMIC. And the cases that right now pass GFP_KERNEL or similar will currently trigger an RCU splat. Just use GFP_ATOMIC unconditionally. From David Ahern. 4) Fix FD bit timings handling in pcan_usb driver, from Marc Kleine-Budde. 5) Cache dst leaked in ip6_gre tunnel removal, fix from Huaibin Wang. 6) Traversal into drivers/net/ethernet/renesas should be triggered by CONFIG_NET_VENDOR_RENESAS, not a particular driver's config option. From Kazuya Mizuguchi. 7) Fix regression in handling of igmp_join errors in vxlan, from Marcelo Ricardo Leitner. 8) Make phy_{read,write}_mmd_indirect() properly take the mdio_lock mutex when programming the registers. From Russell King. 9) Fix non-forced handling in u32_destroy(), from WANG Cong. 10) Test the EVENT_NO_RUNTIME_PM flag before it is cleared in usbnet_stop(), from Eugene Shatokhin. 11) In sfc driver, don't fetch statistics firmware isn't capable of, from Bert Kenward. 12) Verify ASCONF address parameter location in SCTP, from Xin Long" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state sctp: asconf's process should verify address parameter is in the beginning sfc: only use vadaptor stats if firmware is capable net: phy: fixed: propagate fixed link values to struct usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared drivers: net: xgene: fix: Oops in linkwatch_fire_event cls_u32: complete the check for non-forced case in u32_destroy() net: fec: use reinit_completion() in mdio accessor functions net: phy: add locking to phy_read_mmd_indirect()/phy_write_mmd_indirect() vxlan: re-ignore EADDRINUSE from igmp_join net: compile renesas directory if NET_VENDOR_RENESAS is configured ip6_gre: release cached dst on tunnel removal phylib: Make PHYs children of their MDIO bus, not the bus' parent. can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters net: Fix RCU splat in af_key net: bcmgenet: fix uncleaned dma flags net: bcmgenet: Avoid sleeping in bcmgenet_timeout netlink: mmap: fix tx type check
2015-08-27sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE statelucien
Commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown") fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not resetting the association overall_error_count. This allowed the association to better enforce assoc.max_retrans limit. However, the same issue still exists when the association is in SHUTDOWN_RECEIVED state. In this state, HB-ACKs will continue to reset the overall_error_count for the association would extend the lifetime of association unnecessarily. This patch solves this by resetting the overall_error_count whenever the current state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT states, but they are not really impacted because we disable Heartbeats in those states. Fixes: Commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27sctp: asconf's process should verify address parameter is in the beginninglucien
in sctp_process_asconf(), we get address parameter from the beginning of the addip params. but we never check if it's really there. if the addr param is not there, it still can pass sctp_verify_asconf(), then to be handled by sctp_process_asconf(), it will not be safe. so add a code in sctp_verify_asconf() to check the address parameter is in the beginning, or return false to send abort. note that this can also detect multiple address parameters, and reject it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25cls_u32: complete the check for non-forced case in u32_destroy()WANG Cong
In commit 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") I added a check in u32_destroy() to see if all real filters are gone for each tp, however, that is only done for root_ht, same is needed for others. This can be reproduced by the following tc commands: tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256 tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10 tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10 Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Reported-by: Akshat Kakkar <akshat.1984@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25ip6_gre: release cached dst on tunnel removalhuaibin Wang
When a tunnel is deleted, the cached dst entry should be released. This problem may prevent the removal of a netns (seen with a x-netns IPv6 gre tunnel): unregister_netdevice: waiting for lo to become free. Usage count = 3 CC: Dmitry Kozlov <xeb@mail.ru> Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-24net: Fix RCU splat in af_keyDavid Ahern
Hit the following splat testing VRF change for ipsec: [ 113.475692] =============================== [ 113.476194] [ INFO: suspicious RCU usage. ] [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted [ 113.477545] ------------------------------- [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section! [ 113.479288] [ 113.479288] other info that might help us debug this: [ 113.479288] [ 113.480207] [ 113.480207] rcu_scheduler_active = 1, debug_locks = 1 [ 113.480931] 2 locks held by setkey/6829: [ 113.481371] #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff814e9887>] pfkey_sendmsg+0xfb/0x213 [ 113.482509] #1: (rcu_read_lock){......}, at: [<ffffffff814e767f>] rcu_read_lock+0x0/0x6e [ 113.483509] [ 113.483509] stack backtrace: [ 113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED [ 113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 113.486845] 0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962 [ 113.487732] ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154 [ 113.488628] 0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8 [ 113.489525] Call Trace: [ 113.489813] [<ffffffff81518af2>] dump_stack+0x4c/0x65 [ 113.490389] [<ffffffff81086962>] ? console_unlock+0x3d6/0x405 [ 113.491039] [<ffffffff8107ae75>] lockdep_rcu_suspicious+0xfa/0x103 [ 113.491735] [<ffffffff81064032>] rcu_preempt_sleep_check+0x45/0x47 [ 113.492442] [<ffffffff8106404d>] ___might_sleep+0x19/0x1c8 [ 113.493077] [<ffffffff81064268>] __might_sleep+0x6c/0x82 [ 113.493681] [<ffffffff81133190>] cache_alloc_debugcheck_before.isra.50+0x1d/0x24 [ 113.494508] [<ffffffff81134876>] kmem_cache_alloc+0x31/0x18f [ 113.495149] [<ffffffff814012b5>] skb_clone+0x64/0x80 [ 113.495712] [<ffffffff814e6f71>] pfkey_broadcast_one+0x3d/0xff [ 113.496380] [<ffffffff814e7b84>] pfkey_broadcast+0xb5/0x11e [ 113.497024] [<ffffffff814e82d1>] pfkey_register+0x191/0x1b1 [ 113.497653] [<ffffffff814e9770>] pfkey_process+0x162/0x17e [ 113.498274] [<ffffffff814e9895>] pfkey_sendmsg+0x109/0x213 In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes the RCU lock. Since pfkey_broadcast takes the RCU lock the allocation argument is pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock. The one call outside of rcu can be done with GFP_KERNEL. Fixes: 7f6b9dbd5afbd ("af_key: locking change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-23netlink: mmap: fix tx type checkKen-ichirou MATSUZAWA
I can't send netlink message via mmaped netlink socket since commit: a8866ff6a5bce7d0ec465a63bc482a85c09b0d39 netlink: make the check for "send from tx_ring" deterministic msg->msg_iter.type is set to WRITE (1) at SYSCALL_DEFINE6(sendto, ... import_single_range(WRITE, ... iov_iter_init(1, WRITE, ... call path, so that we need to check the type by iter_is_iovec() to accept the WRITE. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 9p regression fix from Al Viro: "Fix for breakage introduced when switching p9_client_{read,write}() to struct iov_iter * (went into 4.1)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: ensure err is initialized to 0 in p9_client_read/write
2015-08-229p: ensure err is initialized to 0 in p9_client_read/writeVincent Bernat
Some use of those functions were providing unitialized values to those functions. Notably, when reading 0 bytes from an empty file on a 9P filesystem, the return code of read() was not 0. Tested with this simple program: #include <assert.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> int main(int argc, const char **argv) { assert(argc == 2); char buffer[256]; int fd = open(argv[1], O_RDONLY|O_NOCTTY); assert(fd >= 0); assert(read(fd, buffer, 0) == 0); return 0; } Cc: stable@vger.kernel.org # v4.1 Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-21mm: make page pfmemalloc check more robustMichal Hocko
Commit c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added checks for page->pfmemalloc to __skb_fill_page_desc(): if (page->pfmemalloc && !page->mapping) skb->pfmemalloc = true; It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set set page->mapping to NULL and leave page->index value alone. Due to being in union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with a NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case so the page confuses __skb_fill_page_desc which interprets the index as pfmemalloc flag and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrive. The struct page is already heavily packed so rather than finding another hole to put it in, let's do a trick instead. We can reuse the index again but define it to an impossible value (-1UL). This is the page index so it should never see the value that large. Replace all direct users of page->pfmemalloc by page_is_pfmemalloc which will hide this nastiness from unspoiled eyes. The information will get lost if somebody wants to use page->index obviously but that was the case before and the original code expected that the information should be persisted somewhere else if that is really needed (e.g. what SLAB and SLUB do). [akpm@linux-foundation.org: fix blooper in slub] Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") Signed-off-by: Michal Hocko <mhocko@suse.com> Debugged-by: Vlastimil Babka <vbabka@suse.com> Debugged-by: Jiri Bohac <jbohac@suse.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>