summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2014-12-16netlink: use jhash as hashfn for rhashtableDaniel Borkmann
[ Upstream commit 7f19fc5e0b617593dcda0d9956adc78b559ef1f5 ] For netlink, we shouldn't be using arch_fast_hash() as a hashing discipline, but rather jhash() instead. Since netlink sockets can be opened by any user, a local attacker would be able to easily create collisions with the DPDK-derived arch_fast_hash(), which trades off performance for security by using crc32 CPU instructions on x86_64. While it might have a legimite use case in other places, it should be avoided in netlink context, though. As rhashtable's API is very flexible, we could later on still decide on other hashing disciplines, if legitimate. Reference: http://thread.gmane.org/gmane.linux.kernel/1844123 Fixes: e341694e3eb5 ("netlink: Convert netlink_lookup() to use RCU protected hash table") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16net: sctp: use MAX_HEADER for headroom reserve in output pathDaniel Borkmann
[ Upstream commit 9772b54c55266ce80c639a80aa68eeb908f8ecf5 ] To accomodate for enough headroom for tunnels, use MAX_HEADER instead of LL_MAX_HEADER. Robert reported that he has hit after roughly 40hrs of trinity an skb_under_panic() via SCTP output path (see reference). I couldn't reproduce it from here, but not using MAX_HEADER as elsewhere in other protocols might be one possible cause for this. In any case, it looks like accounting on chunks themself seems to look good as the skb already passed the SCTP output path and did not hit any skb_over_panic(). Given tunneling was enabled in his .config, the headroom would have been expanded by MAX_HEADER in this case. Reported-by: Robert Święcki <robert@swiecki.net> Reference: https://lkml.org/lkml/2014/12/1/507 Fixes: 594ccc14dfe4d ("[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in sctp_packet_transmit().") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16gre: Set inner mac header in gro completeTom Herbert
[ Upstream commit 6fb2a756739aa507c1fd5b8126f0bfc2f070dc46 ] Set the inner mac header to point to the GRE payload when doing GRO. This is needed if we proceed to send the packet through GRE GSO which now uses the inner mac header instead of inner network header to determine the length of encapsulation headers. Fixes: 14051f0452a2 ("gre: Use inner mac length when computing tunnel length") Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16rtnetlink: release net refcnt on error in do_setlink()Nicolas Dichtel
[ Upstream commit e0ebde0e131b529fd721b24f62872def5ec3718c ] rtnl_link_get_net() holds a reference on the 'struct net', we need to release it in case of error. CC: Eric W. Biederman <ebiederm@xmission.com> Fixes: b51642f6d77b ("net: Enable a userns root rtnl calls that are safe for unprivilged users") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socksWillem de Bruijn
[ Upstream commit f4713a3dfad045d46afcb9c2a7d0bba288920ed4 ] TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets. If the socket is of family AF_INET6, call ipv6_recv_error instead of ip_recv_error. This change is more complex than a single branch due to the loadable ipv6 module. It reuses a pre-existing indirect function call from ping. The ping code is safe to call, because it is part of the core ipv6 module and always present when AF_INET6 sockets are active. Fixes: 4ed2d765 (net-timestamp: TCP timestamping) Signed-off-by: Willem de Bruijn <willemb@google.com> ---- It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6) to ip_recv_error. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16ipv6: gre: fix wrong skb->protocol in WCCPYuri Chislov
[ Upstream commit be6572fdb1bfbe23b2624d477de50af50b02f5d6 ] When using GRE redirection in WCCP, it sets the wrong skb->protocol, that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com> Tested-by: Yuri Chislov <yuri.chislov@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16ip_tunnel: the lack of vti_link_ops' dellink() cause kernel paniclucien
[ Upstream commit 20ea60ca9952bd19d4b0d74719daba305aef5178 ] Now the vti_link_ops do not point the .dellink, for fb tunnel device (ip_vti0), the net_device will be removed as the default .dellink is unregister_netdevice_queue,but the tunnel still in the tunnel list, then if we add a new vti tunnel, in ip_tunnel_find(): hlist_for_each_entry_rcu(t, head, hash_node) { if (local == t->parms.iph.saddr && remote == t->parms.iph.daddr && link == t->parms.link && ==> type == t->dev->type && ip_tunnel_key_match(&t->parms, flags, key)) break; } the panic will happen, cause dev of ip_tunnel *t is null: [ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel] [ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0 [ 3835.073008] Oops: 0000 [#1] SMP ..... [ 3835.073008] Stack: [ 3835.073008] ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0 [ 3835.073008] ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858 [ 3835.073008] ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000 [ 3835.073008] Call Trace: [ 3835.073008] [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel] [ 3835.073008] [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti] [ 3835.073008] [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0 [ 3835.073008] [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70 [ 3835.073008] [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60 [ 3835.073008] [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0 [ 3835.073008] [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270 [ 3835.073008] [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90 [ 3835.073008] [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30 [ 3835.073008] [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0 [ 3835.073008] [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30 .... modprobe ip_vti ip link del ip_vti0 type vti ip link add ip_vti0 type vti rmmod ip_vti do that one or more times, kernel will panic. fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in which we skip the unregister of fb tunnel device. do the same on ip6_vti. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-07Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against ↵Greg Kroah-Hartman
get_next_corpse" This reverts commit 88fd8709858b9cd94e2d28b4b3cac4b752f0683a which was commit 5195c14c8b27cc0b18220ddbf0e5ad3328a04187 upstream, which was reverted in commit 43612d7c04f1a4f5e60104143918fcdf018b66ee. Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpsebill bonaparte
commit 5195c14c8b27cc0b18220ddbf0e5ad3328a04187 upstream. After removal of the central spinlock nf_conntrack_lock, in commit 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock"), it is possible to race against get_next_corpse(). The race is against the get_next_corpse() cleanup on the "unconfirmed" list (a per-cpu list with seperate locking), which set the DYING bit. Fix this race, in __nf_conntrack_confirm(), by removing the CT from unconfirmed list before checking the DYING bit. In case race occured, re-add the CT to the dying list. While at this, fix coding style of the comment that has been updated. Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: bill bonaparte <programme110@gmail.com> Signed-off-by: bill bonaparte <programme110@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06net/ping: handle protocol mismatching scenarioJane Zhou
commit 91a0b603469069cdcce4d572b7525ffc9fd352a6 upstream. ping_lookup() may return a wrong sock if sk_buff's and sock's protocols dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong sock will be returned. the fix is to "continue" the searching, if no matching, return NULL. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Signed-off-by: Jane Zhou <a17711@motorola.com> Signed-off-by: Yiwei Zhao <gbjc64@motorola.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06mac80211: Fix regression that triggers a kernel BUG with CCMPRonald Wahl
commit 4f031fa9f188b2b0641ac20087d9e16bcfb4e49d upstream. Commit 7ec7c4a9a686c608315739ab6a2b0527a240883c (mac80211: port CCMP to cryptoapi's CCM driver) introduced a regression when decrypting empty packets (data_len == 0). This will lead to backtraces like: (scatterwalk_start) from [<c01312f4>] (scatterwalk_map_and_copy+0x2c/0xa8) (scatterwalk_map_and_copy) from [<c013a5a0>] (crypto_ccm_decrypt+0x7c/0x25c) (crypto_ccm_decrypt) from [<c032886c>] (ieee80211_aes_ccm_decrypt+0x160/0x170) (ieee80211_aes_ccm_decrypt) from [<c031c628>] (ieee80211_crypto_ccmp_decrypt+0x1ac/0x238) (ieee80211_crypto_ccmp_decrypt) from [<c032ef28>] (ieee80211_rx_handlers+0x870/0x1d24) (ieee80211_rx_handlers) from [<c0330c7c>] (ieee80211_prepare_and_rx_handle+0x8a0/0x91c) (ieee80211_prepare_and_rx_handle) from [<c0331260>] (ieee80211_rx+0x568/0x730) (ieee80211_rx) from [<c01d3054>] (__carl9170_rx+0x94c/0xa20) (__carl9170_rx) from [<c01d3324>] (carl9170_rx_stream+0x1fc/0x320) (carl9170_rx_stream) from [<c01cbccc>] (carl9170_usb_tasklet+0x80/0xc8) (carl9170_usb_tasklet) from [<c00199dc>] (tasklet_hi_action+0x88/0xcc) (tasklet_hi_action) from [<c00193c8>] (__do_softirq+0xcc/0x200) (__do_softirq) from [<c0019734>] (irq_exit+0x80/0xe0) (irq_exit) from [<c0009c10>] (handle_IRQ+0x64/0x80) (handle_IRQ) from [<c000c3a0>] (__irq_svc+0x40/0x4c) (__irq_svc) from [<c0009d44>] (arch_cpu_idle+0x2c/0x34) Such packets can appear for example when using the carl9170 wireless driver because hardware sometimes generates garbage when the internal FIFO overruns. This patch adds an additional length check. Fixes: 7ec7c4a9a686 ("mac80211: port CCMP to cryptoapi's CCM driver") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ipx: fix locking regression in ipx_sendmsg and ipx_recvmsgJiri Bohac
[ Upstream commit 01462405f0c093b2f8dfddafcadcda6c9e4c5cdf ] This fixes an old regression introduced by commit b0d0d915 (ipx: remove the BKL). When a recvmsg syscall blocks waiting for new data, no data can be sent on the same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked. This breaks mars-nwe (NetWare emulator): - the ncpserv process reads the request using recvmsg - ncpserv forks and spawns nwconn - ncpserv calls a (blocking) recvmsg and waits for new requests - nwconn deadlocks in sendmsg on the same socket Commit b0d0d915 has simply replaced BKL locking with lock_sock/release_sock. Unlike now, BKL got unlocked while sleeping, so a blocking recvmsg did not block a concurrent sendmsg. Only keep the socket locked while actually working with the socket data and release it prior to calling skb_recv_datagram(). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06ipv4: Fix incorrect error code when adding an unreachable routePanu Matilainen
[ Upstream commit 49dd18ba4615eaa72f15c9087dea1c2ab4744cf5 ] Trying to add an unreachable route incorrectly returns -ESRCH if if custom FIB rules are present: [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4 RTNETLINK answers: Network is unreachable [root@localhost ~]# ip rule add to 55.66.77.88 table 200 [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4 RTNETLINK answers: No such process [root@localhost ~]# Commit 83886b6b636173b206f475929e58fac75c6f2446 ("[NET]: Change "not found" return value for rule lookup") changed fib_rules_lookup() to use -ESRCH as a "not found" code internally, but for user space it should be translated into -ENETUNREACH. Handle the translation centrally in ipv4-specific fib_lookup(), leaving the DECnet case alone. On a related note, commit b7a71b51ee37d919e4098cd961d59a883fd272d8 ("ipv4: removed redundant conditional") removed a similar translation from ip_route_input_slow() prematurely AIUI. Fixes: b7a71b51ee37 ("ipv4: removed redundant conditional") Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21net: sctp: fix skb_over_panic when receiving malformed ASCONF chunksDaniel Borkmann
commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream. Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a special crafted ASCONF chunk, even up to pre 2.6.12 kernels: skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8144fb1c>] skb_put+0x5c/0x70 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp] [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp] [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp] [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440 [<ffffffff81496ac5>] ip_rcv+0x275/0x350 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750 [<ffffffff81460588>] netif_receive_skb+0x58/0x60 This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ------------------ ASCONF; UNKNOWN ------------------> ... where ASCONF chunk of length 280 contains 2 parameters ... 1) Add IP address parameter (param length: 16) 2) Add/del IP address parameter (param length: 255) ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well. The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account. In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb. When walking to the next parameter this time, we proceed with ... length = ntohs(asconf_param->param_hdr.length); asconf_param = (void *)asconf_param + length; ... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length. Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification *and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses. Joint work with Vlad Yasevich. Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21net: sctp: fix panic on duplicate ASCONF chunksDaniel Borkmann
commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream. When receiving a e.g. semi-good formed connection scan in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---------------- ASCONF_a; ASCONF_b -----------------> ... where ASCONF_a equals ASCONF_b chunk (at least both serials need to be equal), we panic an SCTP server! The problem is that good-formed ASCONF chunks that we reply with ASCONF_ACK chunks are cached per serial. Thus, when we receive a same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do not need to process them again on the server side (that was the idea, also proposed in the RFC). Instead, we know it was cached and we just resend the cached chunk instead. So far, so good. Where things get nasty is in SCTP's side effect interpreter, that is, sctp_cmd_interpreter(): While incoming ASCONF_a (chunk = event_arg) is being marked !end_of_packet and !singleton, and we have an association context, we do not flush the outqueue the first time after processing the ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it queued up, although we set local_cork to 1. Commit 2e3216cd54b1 changed the precedence, so that as long as we get bundled, incoming chunks we try possible bundling on outgoing queue as well. Before this commit, we would just flush the output queue. Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we continue to process the same ASCONF_b chunk from the packet. As we have cached the previous ASCONF_ACK, we find it, grab it and do another SCTP_CMD_REPLY command on it. So, effectively, we rip the chunk->list pointers and requeue the same ASCONF_ACK chunk another time. Since we process ASCONF_b, it's correctly marked with end_of_packet and we enforce an uncork, and thus flush, thus crashing the kernel. Fix it by testing if the ASCONF_ACK is currently pending and if that is the case, do not requeue it. When flushing the output queue we may relink the chunk for preparing an outgoing packet, but eventually unlink it when it's copied into the skb right before transmission. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21net: sctp: fix remote memory pressure from excessive queueingDaniel Borkmann
commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream. This scenario is not limited to ASCONF, just taken as one example triggering the issue. When receiving ASCONF probes in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------> [...] ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------> ... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed ASCONFs and have increasing serial numbers, we process such ASCONF chunk(s) marked with !end_of_packet and !singleton, since we have not yet reached the SCTP packet end. SCTP does only do verification on a chunk by chunk basis, as an SCTP packet is nothing more than just a container of a stream of chunks which it eats up one by one. We could run into the case that we receive a packet with a malformed tail, above marked as trailing JUNK. All previous chunks are here goodformed, so the stack will eat up all previous chunks up to this point. In case JUNK does not fit into a chunk header and there are no more other chunks in the input queue, or in case JUNK contains a garbage chunk header, but the encoded chunk length would exceed the skb tail, or we came here from an entirely different scenario and the chunk has pdiscard=1 mark (without having had a flush point), it will happen, that we will excessively queue up the association's output queue (a correct final chunk may then turn it into a response flood when flushing the queue ;)): I ran a simple script with incremental ASCONF serial numbers and could see the server side consuming excessive amount of RAM [before/after: up to 2GB and more]. The issue at heart is that the chunk train basically ends with !end_of_packet and !singleton markers and since commit 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") therefore preventing an output queue flush point in sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk (chunk = event_arg) even though local_cork is set, but its precedence has changed since then. In the normal case, the last chunk with end_of_packet=1 would trigger the queue flush to accommodate possible outgoing bundling. In the input queue, sctp_inq_pop() seems to do the right thing in terms of discarding invalid chunks. So, above JUNK will not enter the state machine and instead be released and exit the sctp_assoc_bh_rcv() chunk processing loop. It's simply the flush point being missing at loop exit. Adding a try-flush approach on the output queue might not work as the underlying infrastructure might be long gone at this point due to the side-effect interpreter run. One possibility, albeit a bit of a kludge, would be to defer invalid chunk freeing into the state machine in order to possibly trigger packet discards and thus indirectly a queue flush on error. It would surely be better to discard chunks as in the current, perhaps better controlled environment, but going back and forth, it's simply architecturally not possible. I tried various trailing JUNK attack cases and it seems to look good now. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()Arturo Borrero
commit 7965ee93719921ea5978f331da653dfa2d7b99f5 upstream. The code looks for an already loaded target, and the correct list to search is nft_target_list, not nft_match_list. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21netfilter: nf_log: release skbuff on nlmsg put failureHoucheng Lin
commit b51d3fa364885a2c1e1668f88776c67c95291820 upstream. The kernel should reserve enough room in the skb so that the DONE message can always be appended. However, in case of e.g. new attribute erronously not being size-accounted for, __nfulnl_send() will still try to put next nlmsg into this full skbuf, causing the skb to be stuck forever and blocking delivery of further messages. Fix issue by releasing skb immediately after nlmsg_put error and WARN() so we can track down the cause of such size mismatch. [ fw@strlen.de: add tailroom/len info to WARN ] Signed-off-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21netfilter: nfnetlink_log: fix maximum packet length logged to userspaceFlorian Westphal
commit c1e7dc91eed0ed1a51c9b814d648db18bf8fc6e9 upstream. don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work. The nla length includes the size of the nla struct, so anything larger results in u16 integer overflow. This patch is similar to 9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21netfilter: nf_log: account for size of NLMSG_DONE attributeFlorian Westphal
commit 9dfa1dfe4d5e5e66a991321ab08afe69759d797a upstream. We currently neither account for the nlattr size, nor do we consider the size of the trailing NLMSG_DONE when allocating nlmsg skb. This can result in nflog to stop working, as __nfulnl_send() re-tries sending forever if it failed to append NLMSG_DONE (which will never work if buffer is not large enough). Reported-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocationSabrina Dubroca
commit c123bb7163043bb8f33858cf8e45b01c17dbd171 upstream. alloc_percpu returns NULL on failure, not a negative error code. Fixes: ff3cd7b3c922 ("netfilter: nf_tables: refactor chain statistic routines") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21netfilter: ipset: off by one in ip_set_nfnl_get_byindex()Dan Carpenter
commit 0f9f5e1b83abd2b37c67658e02a6fc9001831fa5 upstream. The ->ip_set_list[] array is initialized in ip_set_net_init() and it has ->ip_set_max elements so this check should be >= instead of > otherwise we are off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptorJeff Layton
commit b3ecba096729f521312d1863ad22530695527aed upstream. Bruce reported that he was seeing the following BUG pop: BUG: sleeping function called from invalid context at mm/slab.c:2846 in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs 2 locks held by mount.nfs/4539: #0: (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4] #1: (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss] Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001 0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0 0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00 Call Trace: [<ffffffff81a534cf>] dump_stack+0x4f/0x7c [<ffffffff81097854>] __might_sleep+0x114/0x180 [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280 [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss] [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss] [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc] [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4] [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4] [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4] [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4] [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4] [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4] [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs] [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs] [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4] [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc] [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4] [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4] [<ffffffff81196489>] mount_fs+0x39/0x1b0 [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20 [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150 [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4] [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4] [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs] [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs] [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10 [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs] [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs] [<ffffffff81196489>] mount_fs+0x39/0x1b0 [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20 [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150 [<ffffffff811b5830>] do_mount+0x210/0xbe0 [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160 [<ffffffff811b651f>] SyS_mount+0x6f/0xb0 [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17 Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping the rcu_read_lock before doing the allocation and then reacquiring it and redoing the dereference before doing the copy. If we find that the string has somehow grown in the meantime, we'll reallocate and try again. Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21mac80211: fix use-after-free in defragmentationJohannes Berg
commit b8fff407a180286aa683d543d878d98d9fc57b13 upstream. Upon receiving the last fragment, all but the first fragment are freed, but the multicast check for statistics at the end of the function refers to the current skb (the last fragment) causing a use-after-free bug. Since multicast frames cannot be fragmented and we check for this early in the function, just modify that check to also do the accounting to fix the issue. Reported-by: Yosef Khyal <yosefx.khyal@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21mac80211: schedule the actual switch of the station before CSA count 0Luciano Coelho
commit ff1e417c7c239b7abfe70aa90460a77eaafc7f83 upstream. Due to the time it takes to process the beacon that started the CSA process, we may be late for the switch if we try to reach exactly beacon 0. To avoid that, use count - 1 when calculating the switch time. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21mac80211: use secondary channel offset IE also beacons during CSALuciano Coelho
commit 84469a45a1bedec9918e94ab2f78c5dc0739e4a7 upstream. If we are switching from an HT40+ to an HT40- channel (or vice-versa), we need the secondary channel offset IE to specify what is the post-CSA offset to be used. This applies both to beacons and to probe responses. In ieee80211_parse_ch_switch_ie() we were ignoring this IE from beacons and using the *current* HT information IE instead. This was causing us to use the same offset as before the switch. Fix that by using the secondary channel offset IE also for beacons and don't ever use the pre-switch offset. Additionally, remove the "beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not needed anymore. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21mac80211: properly flush delayed scan work on interface removalJohannes Berg
commit 46238845bd609a5c0fbe076e1b82b4c5b33360b2 upstream. When an interface is deleted, an ongoing hardware scan is canceled and the driver must abort the scan, at the very least reporting completion while the interface is removed. However, if it scheduled the work that might only run after everything is said and done, which leads to cfg80211 warning that the scan isn't reported as finished yet; this is no fault of the driver, it already did, but mac80211 hasn't processed it. To fix this situation, flush the delayed work when the interface being removed is the one that was executing the scan. Reported-by: Sujith Manoharan <sujith@msujith.org> Tested-by: Sujith Manoharan <sujith@msujith.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21libceph: do not crash on large auth ticketsIlya Dryomov
commit aaef31703a0cf6a733e651885bfb49edc3ac6774 upstream. Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth tickets will have their buffers vmalloc'ed, which leads to the following crash in crypto: [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0 [ 28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.686032] PGD 0 [ 28.688088] Oops: 0000 [#1] PREEMPT SMP [ 28.688088] Modules linked in: [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305 [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 28.688088] Workqueue: ceph-msgr con_work [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000 [ 28.688088] RIP: 0010:[<ffffffff81392b42>] [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286 [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0 [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750 [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880 [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010 [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000 [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0 [ 28.688088] Stack: [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32 [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020 [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010 [ 28.688088] Call Trace: [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220 [ 28.688088] [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180 [ 28.688088] [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30 [ 28.688088] [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0 [ 28.688088] [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0 [ 28.688088] [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60 [ 28.688088] [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0 [ 28.688088] [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360 [ 28.688088] [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0 [ 28.688088] [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80 [ 28.688088] [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0 [ 28.688088] [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80 [ 28.688088] [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0 [ 28.688088] [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0 [ 28.688088] [<ffffffff81559289>] try_read+0x1e59/0x1f10 This is because we set up crypto scatterlists as if all buffers were kmalloc'ed. Fix it. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21netlink: Properly unbind in error conditions.Hiroaki SHIMODA
[ Upstream commit 6251edd932ce3faadbfe27b0a0fe79780e0972e9 ] Even if netlink_kernel_cfg::unbind is implemented the unbind() method is not called, because cfg->unbind is omitted in __netlink_kernel_create(). And fix wrong argument of test_bit() and off by one problem. At this point, no unbind() method is implemented, so there is no real issue. Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.") Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Richard Guy Briggs <rgb@redhat.com> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21ipv6: fix IPV6_PKTINFO with v4 mappedEric Dumazet
[ Upstream commit 5337b5b75cd9bd3624a6820e3c2a084d2480061c ] Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: c8e6ad0829a7 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21net: sctp: fix memory leak in auth key managementDaniel Borkmann
[ Upstream commit 4184b2a79a7612a9272ce20d639934584a1f3786 ] A very minimal and simple user space application allocating an SCTP socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing the socket again will leak the memory containing the authentication key from user space: unreferenced object 0xffff8800837047c0 (size 16): comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s) hex dump (first 16 bytes): 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811c88d8>] __kmalloc+0xe8/0x270 [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp] [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp] [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp] [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20 [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0 [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17 [<ffffffffffffffff>] 0xffffffffffffffff This is bad because of two things, we can bring down a machine from user space when auth_enable=1, but also we would leave security sensitive keying material in memory without clearing it after use. The issue is that sctp_auth_create_key() already sets the refcount to 1, but after allocation sctp_auth_set_key() does an additional refcount on it, and thus leaving it around when we free the socket. Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed ↵Daniel Borkmann
packet [ Upstream commit e40607cbe270a9e8360907cb1e62ddf0736e4864 ] An SCTP server doing ASCONF will panic on malformed INIT ping-of-death in the form of: ------------ INIT[PARAM: SET_PRIMARY_IP] ------------> While the INIT chunk parameter verification dissects through many things in order to detect malformed input, it misses to actually check parameters inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary IP address' parameter in ASCONF, which has as a subparameter an address parameter. So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0 and thus sctp_get_af_specific() returns NULL, too, which we then happily dereference unconditionally through af->from_addr_param(). The trace for the log: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] PGD 0 Oops: 0000 [#1] SMP [...] Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs RIP: 0010:[<ffffffffa01e9c62>] [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] [...] Call Trace: <IRQ> [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp] [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp] [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [...] A minimal way to address this is to check for NULL as we do on all other such occasions where we know sctp_get_af_specific() could possibly return with NULL. Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21gre6: Move the setting of dev->iflink into the ndo_init functions.Steffen Klassert
[ Upstream commit f03eb128e3f4276f46442d14f3b8f864f3775821 ] Otherwise it gets overwritten by register_netdev(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21sit: Use ipip6_tunnel_init as the ndo_init function.Steffen Klassert
[ Upstream commit ebe084aafb7e93adf210e80043c9f69adf56820d ] ipip6_tunnel_init() sets the dev->iflink via a call to ipip6_tunnel_bind_dev(). After that, register_netdevice() sets dev->iflink = -1. So we loose the iflink configuration for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the ndo_init function. Then ipip6_tunnel_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21vti6: Use vti6_dev_init as the ndo_init function.Steffen Klassert
[ Upstream commit 16a0231bf7dc3fb37e9b1f1cb1a277dc220b5c5e ] vti6_dev_init() sets the dev->iflink via a call to vti6_link_config(). After that, register_netdevice() sets dev->iflink = -1. So we loose the iflink configuration for vti6 tunnels. Fix this by using vti6_dev_init() as the ndo_init function. Then vti6_dev_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.Steffen Klassert
[ Upstream commit 6c6151daaf2d8dc2046d9926539feed5f66bf74e ] ip6_tnl_dev_init() sets the dev->iflink via a call to ip6_tnl_link_config(). After that, register_netdevice() sets dev->iflink = -1. So we loose the iflink configuration for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the ndo_init function. Then ip6_tnl_dev_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21inet: frags: remove the WARN_ON from inet_evict_bucketNikolay Aleksandrov
[ Upstream commit d70127e8a942364de8dd140fe73893efda363293 ] The WARN_ON in inet_evict_bucket can be triggered by a valid case: inet_frag_kill and inet_evict_bucket can be running in parallel on the same queue which means that there has been at least one more ref added by a previous inet_frag_find call, but inet_frag_kill can delete the timer before inet_evict_bucket which will cause the WARN_ON() there to trigger since we'll have refcnt!=1. Now, this case is valid because the queue is being "killed" for some reason (removed from the chain list and its timer deleted) so it will get destroyed in the end by one of the inet_frag_put() calls which reaches 0 i.e. refcnt is still valid. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Reported-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21inet: frags: fix a race between inet_evict_bucket and inet_frag_killNikolay Aleksandrov
[ Upstream commit 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 ] When the evictor is running it adds some chosen frags to a local list to be evicted once the chain lock has been released but at the same time the *frag_queue can be running for some of the same queues and it may call inet_frag_kill which will wait on the chain lock and will then delete the queue from the wrong list since it was added in the eviction one. The fix is simple - check if the queue has the evict flag set under the chain lock before deleting it, this is safe because the evict flag is set only under that lock and having the flag set also means that the queue has been detached from the chain list, so no need to delete it again. An important note to make is that we're safe w.r.t refcnt because inet_frag_kill and inet_evict_bucket will sync on the del_timer operation where only one of the two can succeed (or if the timer is executing - none of them), the cases are: 1. inet_frag_kill succeeds in del_timer - then the timer ref is removed, but inet_evict_bucket will not add this queue to its expire list but will restart eviction in that chain 2. inet_evict_bucket succeeds in del_timer - then the timer ref is kept until the evictor "expires" the queue, but inet_frag_kill will remove the initial ref and will set INET_FRAG_COMPLETE which will make the frag_expire fn just to remove its ref. In the end all of the queue users will do an inet_frag_put and the one that reaches 0 will free it. The refcount balance should be okay. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Patrick McLean <chutzpah@gentoo.org> Tested-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14mac80211: fix typo in starting baserate for rts_cts_rate_idxKarl Beldan
commit c7abf25af0f41be4b50d44c5b185d52eea360cb8 upstream. It affects non-(V)HT rates and can lead to selecting an rts_cts rate that is not a basic rate or way superior to the reference rate (ATM rates[0] used for the 1st attempt of the protected frame data). E.g, assuming drivers register growing (bitrate) sorted tables of ieee80211_rate-s, having : - rates[0].idx == d'2 and basic_rates == b'10100 will select rts_cts idx b'10011 & ~d'(BIT(2)-1), i.e. 1, likewise - rates[0].idx == d'2 and basic_rates == b'10001 will select rts_cts idx b'10000 The first is not a basic rate and the second is > rates[0]. Also, wrt severity of the addressed misbehavior, ATM we only have one rts_cts_rate_idx rather than one per rate table entry, so this idx might still point to bitrates > rates[1..MAX_RATES]. Fixes: 5253ffb8c9e1 ("mac80211: always pick a basic rate to tx RTS/CTS for pre-HT rates") Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14libceph: ceph-msgr workqueue needs a resque workerIlya Dryomov
commit f9865f06f7f18c6661c88d0511f05c48612319cc upstream. Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant") effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is wrong - libceph is very much a memory reclaim path, so restore it. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Tested-by: Micha Krause <micha@krausam.de> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14fix misuses of f_count() in ppp and netlinkAl Viro
commit 24dff96a37a2ca319e75a74d3929b2de22447ca6 upstream. we used to check for "nobody else could start doing anything with that opened file" by checking that refcount was 2 or less - one for descriptor table and one we'd acquired in fget() on the way to wherever we are. That was race-prone (somebody else might have had a reference to descriptor table and do fget() just as we'd been checking) and it had become flat-out incorrect back when we switched to fget_light() on those codepaths - unlike fget(), it doesn't grab an extra reference unless the descriptor table is shared. The same change allowed a race-free check, though - we are safe exactly when refcount is less than 2. It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH, netlink hadn't grown that check until 3.9 and ppp used to live in drivers/net, not drivers/net/ppp until 3.1. The bug existed well before that, though, and the same fix used to apply in old location of file. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUTTrond Myklebust
commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e upstream. The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was intended introduced in order to allow NFSv4 clients to disable resend timeouts. Since those cause the RPC layer to break the connection, they mess up the duplicate reply caches that remain indexed on the port number in NFSv4.. This patch includes the code that was missing in the original to set the appropriate flag in struct rpc_clnt, when the caller of rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT. Fixes: 8a19a0b6cb2e (SUNRPC: Add RPC task and client level options to...) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14SUNRPC: Don't wake tasks during connection abortBenjamin Coddington
commit a743419f420a64d442280845c0377a915b76644f upstream. When aborting a connection to preserve source ports, don't wake the task in xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the connection needs to be re-established since it preserves the task's status instead of setting it to the status of the aborting kernel_connect(). This may also avoid a potential conflict on the socket's lock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packetsBen Hutchings
[ Upstream commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7 ] UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14gre: Use inner mac length when computing tunnel lengthTom Herbert
[ Upstream commit 14051f0452a2c26a3f4791e6ad6a435e8f1945ff ] Currently, skb_inner_network_header is used but this does not account for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which handles TEB and also should work with IP encapsulation in which case inner mac and inner network headers are the same. Tested: Ran TCP_STREAM over GRE, worked as expected. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14ipv4: Do not cache routing failures due to disabled forwarding.Nicolas Cavallari
[ Upstream commit fa19c2b050ab5254326f5fc07096dd3c6a8d5d58 ] If we cache them, the kernel will reuse them, independently of whether forwarding is enabled or not. Which means that if forwarding is disabled on the input interface where the first routing request comes from, then that unreachable result will be cached and reused for other interfaces, even if forwarding is enabled on them. The opposite is also true. This can be verified with two interfaces A and B and an output interface C, where B has forwarding enabled, but not A and trying ip route get $dst iif A from $src && ip route get $dst iif B from $src Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14tcp: md5: do not use alloc_percpu()Eric Dumazet
[ Upstream commit 349ce993ac706869d553a1816426d3a4bfda02b1 ] percpu tcp_md5sig_pool contains memory blobs that ultimately go through sg_set_buf(). -> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); This requires that whole area is in a physically contiguous portion of memory. And that @buf is not backed by vmalloc(). Given that alloc_percpu() can use vmalloc() areas, this does not fit the requirements. Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool is small anyway, there is no gain to dynamically allocate it. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 765cf9976e93 ("tcp: md5: remove one indirection level in tcp_md5sig_pool") Reported-by: Crestez Dan Leonard <cdleonard@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14net: fix saving TX flow hash in sock for outgoing connectionsSathya Perla
[ Upstream commit 9e7ceb060754f134231f68cb29d5db31419fe1ed ] The commit "net: Save TX flow hash in sock and set in skbuf on xmit" introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate and record flow hash(sk_txhash) in the socket structure. sk_txhash is used to set skb->hash which is used to spread flows across multiple TXQs. But, the above routines are invoked before the source port of the connection is created. Because of this all outgoing connections that just differ in the source port get hashed into the same TXQ. This patch fixes this problem for IPv4/6 by invoking the the above routines after the source port is available for the socket. Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit") Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14net: tso: fix unaligned access to crafted TCP header in helper APIKarl Beldan
[ Upstream commit a63ba13eec092b70d4e5522d692eaeb2f9747387 ] The crafted header start address is from a driver supplied buffer, which one can reasonably expect to be aligned on a 4-bytes boundary. However ATM the TSO helper API is only used by ethernet drivers and the tcp header will then be aligned to a 2-bytes only boundary from the header start address. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14netlink: Re-add locking to netlink_lookup() and seq walkerThomas Graf
[ Upstream commit 78fd1d0ab072d4d9b5f0b7c14a1516665170b565 ] The synchronize_rcu() in netlink_release() introduces unacceptable latency. Reintroduce minimal lookup so we can drop the synchronize_rcu() until socket destruction has been RCUfied. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>