summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2014-07-06netfilter: ipt_ULOG: fix info leaksMathias Krause
commit 278f2b3e2af5f32ea1afe34fa12a2518153e6e49 upstream. The ulog messages leak heap bytes by the means of padding bytes and incompletely filled string arrays. Fix those by memset(0)'ing the whole struct before filling it. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jan Tore Morken <jantore@morken.priv.no> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06ipvs: Fix panic due to non-linear skbPeter Christensen
commit f44a5f45f544561302e855e7bd104e5f506ec01b upstream. Receiving a ICMP response to an IPIP packet in a non-linear skb could cause a kernel panic in __skb_pull. The problem was introduced in commit f2edb9f7706dcb2c0d9a362b2ba849efe3a97f5e ("ipvs: implement passive PMTUD for IPIP packets"). Signed-off-by: Peter Christensen <pch@ordbogen.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06SUNRPC: Fix a module reference leak in svc_handle_xprtTrond Myklebust
commit c789102c20bbbdda6831a273e046715be9d6af79 upstream. If the accept() call fails, we need to put the module reference. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30skbuff: skb_segment: orphan frags before copyingMichael S. Tsirkin
commit 1fd819ecb90cc9b822cd84d3056ddba315d3340f upstream. skb_segment copies frags around, so we need to copy them carefully to avoid accessing user memory after reporting completion to userspace through a callback. skb_segment doesn't normally happen on datapath: TSO needs to be disabled - so disabling zero copy in this case does not look like a big deal. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2. As skb_segment() only supports page-frags *or* a frag list, there is no need for the additional frag_skb pointer or the preparatory renaming.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Eddie Chapman <eddie@ehuk.net> # backported to 3.10 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Bluetooth: Fix L2CAP deadlockJukka Taimisto
commit 8a96f3cd22878fc0bb564a8478a6e17c0b8dca73 upstream. -[0x01 Introduction We have found a programming error causing a deadlock in Bluetooth subsystem of Linux kernel. The problem is caused by missing release_sock() call when L2CAP connection creation fails due full accept queue. The issue can be reproduced with 3.15-rc5 kernel and is also present in earlier kernels. -[0x02 Details The problem occurs when multiple L2CAP connections are created to a PSM which contains listening socket (like SDP) and left pending, for example, configuration (the underlying ACL link is not disconnected between connections). When L2CAP connection request is received and listening socket is found the l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called. This function locks the 'parent' socket and then checks if the accept queue is full. 1178 lock_sock(parent); 1179 1180 /* Check for backlog size */ 1181 if (sk_acceptq_is_full(parent)) { 1182 BT_DBG("backlog full %d", parent->sk_ack_backlog); 1183 return NULL; 1184 } If case the accept queue is full NULL is returned, but the 'parent' socket is not released. Thus when next L2CAP connection request is received the code blocks on lock_sock() since the parent is still locked. Also note that for connections already established and waiting for configuration to complete a timeout will occur and l2cap_chan_timeout() (net/bluetooth/l2cap_core.c) will be called. All threads calling this function will also be blocked waiting for the channel mutex since the thread which is waiting on lock_sock() alread holds the channel mutex. We were able to reproduce this by sending continuously L2CAP connection request followed by disconnection request containing invalid CID. This left the created connections pending configuration. After the deadlock occurs it is impossible to kill bluetoothd, btmon will not get any more data etc. requiring reboot to recover. -[0x03 Fix Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL seems to fix the issue. Signed-off-by: Jukka Taimisto <jtt@codenomicon.com> Reported-by: Tommi Mäkilä <tmakila@codenomicon.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30af_iucv: wrong mapping of sent and confirmed skbsUrsula Braun
commit f5738e2ef88070ef1372e6e718124d88e9abe4ac upstream. When sending data through IUCV a MESSAGE COMPLETE interrupt signals that sent data memory can be freed or reused again. With commit f9c41a62bba3f3f7ef3541b2a025e3371bcbba97 "af_iucv: fix recvmsg by replacing skb_pull() function" the MESSAGE COMPLETE callback iucv_callback_txdone() identifies the wrong skb as being confirmed, which leads to data corruption. This patch fixes the skb mapping logic in iucv_callback_txdone(). Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26rtnetlink: fix userspace API breakage for iproute2 < v3.9.0Michal Schmidt
[ Upstream commit e5eca6d41f53db48edd8cf88a3f59d2c30227f8e ] When running RHEL6 userspace on a current upstream kernel, "ip link" fails to show VF information. The reason is a kernel<->userspace API change introduced by commit 88c5b5ce5cb57 ("rtnetlink: Call nlmsg_parse() with correct header length"), after which the kernel does not see iproute2's IFLA_EXT_MASK attribute in the netlink request. iproute2 adjusted for the API change in its commit 63338dca4513 ("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter"). The problem has been noticed before: http://marc.info/?l=linux-netdev&m=136692296022182&w=2 (Subject: Re: getting VF link info seems to be broken in 3.9-rc8) We can do better than tell those with old userspace to upgrade. We can recognize the old iproute2 in the kernel by checking the netlink message length. Even when including the IFLA_EXT_MASK attribute, its netlink message is shorter than struct ifinfomsg. With this patch "ip link" shows VF information in both old and new iproute2 versions. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26sctp: Fix sk_ack_backlog wrap-around problemXufeng Zhang
[ Upstream commit d3217b15a19a4779c39b212358a5c71d725822ee ] Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check, a new association would be created in sctp_unpack_cookie(), but afterwards, some processing maybe failed, and sctp_association_free() will be called to free the previously allocated association, in sctp_association_free(), sk_ack_backlog value is decremented for this socket, since the initial value for sk_ack_backlog is 0, after the decrement, it will be 65535, a wrap-around problem happens, and if we want to establish new associations afterward in the same socket, ABORT would be triggered since sctp deem the accept queue as full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26ipv4: fix a race in ip4_datagram_release_cb()Eric Dumazet
[ Upstream commit 9709674e68646cee5a24e3000b3558d25412203a ] Alexey gave a AddressSanitizer[1] report that finally gave a good hint at where was the origin of various problems already reported by Dormando in the past [2] Problem comes from the fact that UDP can have a lockless TX path, and concurrent threads can manipulate sk_dst_cache, while another thread, is holding socket lock and calls __sk_dst_set() in ip4_datagram_release_cb() (this was added in linux-3.8) It seems that all we need to do is to use sk_dst_check() and sk_dst_set() so that all the writers hold same spinlock (sk->sk_dst_lock) to prevent corruptions. TCP stack do not need this protection, as all sk_dst_cache writers hold the socket lock. [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel AddressSanitizer: heap-use-after-free in ipv4_dst_check Read of size 2 by thread T15453: [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Freed by thread T15455: [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Allocated by thread T15453: [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406 [< inlined >] __ip_route_output_key+0x3e8/0xf70 __mkroute_output ./net/ipv4/route.c:1939 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 [2] <4>[196727.311203] general protection fault: 0000 [#1] SMP <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000 <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>] [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282 <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040 <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200 <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800 <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce <4>[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000 <4>[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0 <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>[196727.311713] Stack: <4>[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42 <4>[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0 <4>[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0 <4>[196727.311885] Call Trace: <4>[196727.311907] <IRQ> <4>[196727.311912] [<ffffffff815b7f42>] dst_destroy+0x32/0xe0 <4>[196727.311959] [<ffffffff815b86c6>] dst_release+0x56/0x80 <4>[196727.311986] [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0 <4>[196727.312013] [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820 <4>[196727.312041] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312070] [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150 <4>[196727.312097] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312125] [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230 <4>[196727.312154] [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90 <4>[196727.312183] [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360 <4>[196727.312212] [<ffffffff815fe00b>] ip_rcv+0x22b/0x340 <4>[196727.312242] [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan] <4>[196727.312275] [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640 <4>[196727.312308] [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150 <4>[196727.312338] [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70 <4>[196727.312368] [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0 <4>[196727.312397] [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140 <4>[196727.312433] [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe] <4>[196727.312463] [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340 <4>[196727.312491] [<ffffffff815b1691>] net_rx_action+0x111/0x210 <4>[196727.312521] [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70 <4>[196727.312552] [<ffffffff810519d0>] __do_softirq+0xd0/0x270 <4>[196727.312583] [<ffffffff816cef3c>] call_softirq+0x1c/0x30 <4>[196727.312613] [<ffffffff81004205>] do_softirq+0x55/0x90 <4>[196727.312640] [<ffffffff81051c85>] irq_exit+0x55/0x60 <4>[196727.312668] [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0 <4>[196727.312696] [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a <4>[196727.312722] <EOI> <1>[196727.313071] RIP [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.313100] RSP <ffff885effd23a70> <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt Reported-by: Alexey Preobrazhensky <preobr@google.com> Reported-by: dormando <dormando@rydia.ne> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26ipip, sit: fix ipv4_{update_pmtu,redirect} callsDmitry Popov
[ Upstream commit 2346829e641b804ece9ac9298136b56d9567c278 ] ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a tunnel netdevice). It caused wrong route lookup and failure of pmtu update or redirect. We should use the same ifindex that we use in ip_route_output_* in *tunnel_xmit code. It is t->parms.link . Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: force a list_del() in unregister_netdevice_many()Eric Dumazet
[ Upstream commit 87757a917b0b3c0787e0563c679762152be81312 ] unregister_netdevice_many() API is error prone and we had too many bugs because of dangling LIST_HEAD on stacks. See commit f87e6f47933e3e ("net: dont leave active on stack LIST_HEAD") In fact, instead of making sure no caller leaves an active list_head, just force a list_del() in the callee. No one seems to need to access the list after unregister_netdevice_many() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26tcp: fix cwnd undo on DSACK in F-RTOYuchung Cheng
[ Upstream commit 0cfa5c07d6d1d7f8e710fc671c5ba1ce85e09fa4 ] This bug is discovered by an recent F-RTO issue on tcpm list https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html The bug is that currently F-RTO does not use DSACK to undo cwnd in certain cases: upon receiving an ACK after the RTO retransmission in F-RTO, and the ACK has DSACK indicating the retransmission is spurious, the sender only calls tcp_try_undo_loss() if some never retransmisted data is sacked (FLAG_ORIG_DATA_SACKED). The correct behavior is to unconditionally call tcp_try_undo_loss so the DSACK information is used properly to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: fix inet_getid() and ipv6_select_ident() bugsEric Dumazet
[ Upstream commit 39c36094d78c39e038c1e499b2364e13bce36f54 ] I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery is disabled. Note how GSO/TSO packets do not have monotonically incrementing ID. 06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396) 06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212) 06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972) 06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292) 06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764) It appears I introduced this bug in linux-3.1. inet_getid() must return the old value of peer->ip_id_count, not the new one. Lets revert this part, and remove the prevention of a null identification field in IPv6 Fragment Extension Header, which is dubious and not even done properly. Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: tunnels - enable module autoloadingTom Gundersen
[ Upstream commit f98f89a0104454f35a62d681683c844f6dbf4043 ] Enable the module alias hookup to allow tunnel modules to be autoloaded on demand. This is in line with how most other netdev kinds work, and will allow userspace to create tunnels without having CAP_SYS_MODULE. Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26netlink: Only check file credentials for implicit destinationsEric W. Biederman
[ Upstream commit 2d7a85f4b06e9c27ff629f07a524c48074f07f81 ] It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Use netlink_ns_capable to verify the permisions of netlink messagesEric W. Biederman
[ Upstream commit 90f62cf30a78721641e08737bda787552428061e ] It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Add variants of capable for use on netlink messagesEric W. Biederman
[ Upstream commit aa4cf9452f469f16cea8c96283b641b4576d4a7b ] netlink_net_capable - The common case use, for operations that are safe on a network namespace netlink_capable - For operations that are only known to be safe for the global root netlink_ns_capable - The general case of capable used to handle special cases __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of the skbuff of a netlink message. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Add variants of capable for use on on socketsEric W. Biederman
[ Upstream commit a3b299da869d6e78cf42ae0b1b41797bcb8c5e4b ] sk_net_capable - The common case, operations that are safe in a network namespace. sk_capable - Operations that are not known to be safe in a network namespace sk_ns_capable - The general case for special cases. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dumpEric W. Biederman
[ Upstream commit a53b72c83a4216f2eb883ed45a0cbce014b8e62d ] The permission check in sock_diag_put_filterinfo is wrong, and it is so removed from it's sources it is not clear why it is wrong. Move the computation into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo. This does not yet correct the capability check but instead simply moves it to make it clear what is going on. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26netlink: Rename netlink_capable netlink_allowedEric W. Biederman
[ Upstream commit 5187cd055b6e81fc6526109456f8b20623148d5f ] netlink_capable is a static internal function in af_netlink.c and we have better uses for the name netlink_capable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16netfilter: ipv4: defrag: set local_df flag on defragmented skbFlorian Westphal
commit 895162b1101b3ea5db08ca6822ae9672717efec0 upstream. else we may fail to forward skb even if original fragments do fit outgoing link mtu: 1. remote sends 2k packets in two 1000 byte frags, DF set 2. we want to forward but only see '2k > mtu and DF set' 3. we then send icmp error saying that outgoing link is 1500 But original sender never sent a packet that would not fit the outgoing link. Setting local_df makes outgoing path test size vs. IPCB(skb)->frag_max_size, so we will still send the correct error in case the largest original size did not fit outgoing link mtu. Reported-by: Maxime Bizon <mbizon@freebox.fr> Suggested-by: Maxime Bizon <mbizon@freebox.fr> Fixes: 5f2d04f1f9 (ipv4: fix path MTU discovery with connection tracking) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11netfilter: Fix potential use after free in ip6_route_me_harder()Sergey Popovich
commit a8951d5814e1373807a94f79f7ccec7041325470 upstream. Dst is released one line before we access it again with dst->error. Fixes: 58e35d147128 netfilter: ipv6: propagate routing errors from ip6_route_me_harder() Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07libceph: fix corruption when using page_count 0 page in rbdChunwei Chen
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream. It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here: https://github.com/zfsonlinux/spl/issues/241 http://tracker.ceph.com/issues/7790 The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page. This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected. Cc: Sage Weil <sage@inktank.com> Cc: Yehuda Sadeh <yehuda@inktank.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07Bluetooth: Fix redundant encryption request for reauthenticationJohan Hedberg
commit 09da1f3463eb81d59685df723b1c5950b7570340 upstream. When we're performing reauthentication (in order to elevate the security level from an unauthenticated key to an authenticated one) we do not need to issue any encryption command once authentication completes. Since the trigger for the encryption HCI command is the ENCRYPT_PEND flag this flag should not be set in this scenario. Instead, the REAUTH_PEND flag takes care of all necessary steps for reauthentication. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07Bluetooth: Fix triggering BR/EDR L2CAP Connect too earlyJohan Hedberg
commit 9eb1fbfa0a737fd4d3a6d12d71c5ea9af622b887 upstream. Commit 1c2e004183178 introduced an event handler for the encryption key refresh complete event with the intent of fixing some LE/SMP cases. However, this event is shared with BR/EDR and there we actually want to act only on the auth_complete event (which comes after the key refresh). If we do not do this we may trigger an L2CAP Connect Request too early and cause the remote side to return a security block error. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07mac80211: fix on-channel remain-on-channelJohannes Berg
commit b4b177a5556a686909e643f1e9b6434c10de079f upstream. Jouni reported that if a remain-on-channel was active on the same channel as the current operating channel, then the ROC would start, but any frames transmitted using mgmt-tx on the same channel would get delayed until after the ROC. The reason for this is that the ROC starts, but doesn't have any handling for "remain on the same channel", so it stops the interface queues. The later mgmt-tx then puts the frame on the interface queues (since it's on the current operating channel) and thus they get delayed until after the ROC. To fix this, add some logic to handle remaining on the same channel specially and not stop the queues etc. in this case. This not only fixes the bug but also improves behaviour in this case as data frames etc. can continue to flow. Reported-by: Jouni Malinen <j@w1.fi> Tested-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07mac80211: fix suspend vs. authentication raceJohannes Berg
commit 1a1cb744de160ee70086a77afff605bbc275d291 upstream. Since Stanislaw's patch removing the quiescing code, mac80211 had a race regarding suspend vs. authentication: as cfg80211 doesn't track authentication attempts, it can't abort them. Therefore the attempts may be kept running while suspending, which can lead to all kinds of issues, in at least some cases causing an error in iwlmvm firmware. Fix this by aborting the authentication attempt when suspending. Cc: stable@vger.kernel.org Fixes: 12e7f517029d ("mac80211: cleanup generic suspend/resume procedures") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net-gro: reset skb->truesize in napi_reuse_skb()Eric Dumazet
[ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ] Recycling skb always had been very tough... This time it appears GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to skb. skb_gro_receive() can only subtract from skb->truesize the used part of a fragment. I spotted this problem seeing TcpExtPruneCalled and TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where TCP receive window should be sized properly to accept traffic coming from a driver not overshooting skb->truesize. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ipv4: initialise the itag variable in __mkroute_inputLi RongQing
[ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ] the value of itag is a random value from stack, and may not be initiated by fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID is not set This will make the cached dst uncertainty Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ip6_tunnel: fix potential NULL pointer dereferenceSusant Sahani
[ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ] The function ip6_tnl_validate assumes that the rtnl attribute IFLA_IPTUN_PROTO always be filled . If this attribute is not filled by the userspace application kernel get crashed with NULL pointer dereference. This patch fixes the potential kernel crash when IFLA_IPTUN_PROTO is missing . Signed-off-by: Susant Sahani <susant@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ipv4: fib_semantics: increment fib_info_cnt after fib_info allocationSergey Popovich
[ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ] Increment fib_info_cnt in fib_create_info() right after successfuly alllocating fib_info structure, overwise fib_metrics allocation failure leads to fib_info_cnt incorrectly decremented in free_fib_info(), called on error path from fib_create_info(). Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net: ipv6: send pkttoobig immediately if orig frag size > mtuFlorian Westphal
[ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ] If conntrack defragments incoming ipv6 frags it stores largest original frag size in ip6cb and sets ->local_df. We must thus first test the largest original frag size vs. mtu, and not vice versa. Without this patch PKTTOOBIG is still generated in ip6_fragment() later in the stack, but 1) IPSTATS_MIB_INTOOBIGERRORS won't increment 2) packet did (needlessly) traverse netfilter postrouting hook. Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net: ipv4: ip_forward: fix inverted local_df testFlorian Westphal
[ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ] local_df means 'ignore DF bit if set', so if its set we're allowed to perform ip fragmentation. This wasn't noticed earlier because the output path also drops such skbs (and emits needed icmp error) and because netfilter ip defrag did not set local_df until couple of days ago. Only difference is that DF-packets-larger-than MTU now discarded earlier (f.e. we avoid pointless netfilter postrouting trip). While at it, drop the repeated test ip_exceeds_mtu, checking it once is enough... Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30tcp_cubic: fix the range of delayed_ackLiu Yu
[ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ] commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Liu Yu <allanyuliu@tencent.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30sctp: reset flowi4_oif parameter on route lookupXufeng Zhang
[ Upstream commit 85350871317a5adb35519d9dc6fc9e80809d42ad ] commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) introduces another regression which is very similar to the problem of commit e6b45241c (ipv4: reset flowi parameters on route connect) wants to fix: Before we call ip_route_output_key() in sctp_v4_get_dst() to get a dst that matches a bind address as the source address, we have already called this function previously and the flowi parameters have been initialized including flowi4_oif, so when we call this function again, the process in __ip_route_output_key() will be different because of the setting of flowi4_oif, and we'll get a networking device which corresponds to the inputted flowi4_oif as the output device, this is wrong because we'll never hit this place if the previously returned source address of dst match one of the bound addresses. To reproduce this problem, a vlan setting is enough: # ifconfig eth0 up # route del default # vconfig add eth0 2 # vconfig add eth0 3 # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0 # route add default gw 10.0.1.254 dev eth0.2 # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0 # ip rule add from 10.0.0.14 table 4 # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3 # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I You'll detect that all the flow are routed to eth0.2(10.0.1.254). Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30bridge: Handle IFLA_ADDRESS correctly when creating bridge deviceToshiaki Makita
[ Upstream commit 30313a3d5794472c3548d7288e306a5492030370 ] When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen <teg@jklm.no> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ipv6: fib: fix fib dump restartKumar Sundararajan
[ Upstream commit 1c2658545816088477e91860c3a645053719cb54 ] When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan <kumar@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is setDavid Gibson
[ Upstream commit c53864fd60227de025cb79e05493b13f69843971 ] Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30rtnetlink: Warn when interface's information won't fit in our packetDavid Gibson
[ Upstream commit 973462bbde79bb827824c73b59027a0aed5c9ca6 ] Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net: Fix ns_capable check in sock_diag_put_filterinfoAndrew Lutomirski
[ Upstream commit 78541c1dc60b65ecfce5a6a096fc260219d6784e ] The caller needs capabilities on the namespace being queried, not on their own namespace. This is a security bug, although it likely has only a minor impact. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net: sctp: cache auth_enable per endpointVlad Yasevich
[ Upstream commit b14878ccb7fac0242db82720b784ab62c467c0dc ] Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30vlan: Fix lockdep warning when vlan dev handle notificationdingtianhong
[ Upstream commit dc8eaaa006350d24030502a4521542e74b5cb39f ] When I open the LOCKDEP config and run these steps: modprobe 8021q vconfig add eth2 20 vconfig add eth2.20 30 ifconfig eth2 xx.xx.xx.xx then the Call Trace happened: [32524.386288] ============================================= [32524.386293] [ INFO: possible recursive locking detected ] [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O [32524.386302] --------------------------------------------- [32524.386306] ifconfig/3103 is trying to acquire lock: [32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386326] [32524.386326] but task is already holding lock: [32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386341] [32524.386341] other info that might help us debug this: [32524.386345] Possible unsafe locking scenario: [32524.386345] [32524.386350] CPU0 [32524.386352] ---- [32524.386354] lock(&vlan_netdev_addr_lock_key/1); [32524.386359] lock(&vlan_netdev_addr_lock_key/1); [32524.386364] [32524.386364] *** DEADLOCK *** [32524.386364] [32524.386368] May be due to missing lock nesting notation [32524.386368] [32524.386373] 2 locks held by ifconfig/3103: [32524.386376] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20 [32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386398] [32524.386398] stack backtrace: [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35 [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8 [32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0 [32524.386428] 000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000 [32524.386435] Call Trace: [32524.386441] [<ffffffff814f68a2>] dump_stack+0x6a/0x78 [32524.386448] [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940 [32524.386454] [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940 [32524.386459] [<ffffffff810a4874>] lock_acquire+0xe4/0x110 [32524.386464] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386471] [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40 [32524.386476] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386481] [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386489] [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q] [32524.386495] [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0 [32524.386500] [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40 [32524.386506] [<ffffffff8141b3cf>] __dev_open+0xef/0x150 [32524.386511] [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190 [32524.386516] [<ffffffff8141b292>] dev_change_flags+0x32/0x80 [32524.386524] [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830 [32524.386532] [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660 [32524.386540] [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0 [32524.386550] [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60 [32524.386558] [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0 [32524.386568] [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590 [32524.386578] [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50 [32524.386586] [<ffffffff811b39e5>] ? __fget_light+0x105/0x110 [32524.386594] [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0 [32524.386604] [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b ======================================================================== The reason is that all of the addr_lock_key for vlan dev have the same class, so if we change the status for vlan dev, the vlan dev and its real dev will hold the same class of addr_lock_key together, so the warning happened. we should distinguish the lock depth for vlan dev and its real dev. v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which could support to add 8 vlan id on a same vlan dev, I think it is enough for current scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id, and the vlan dev would not meet the same class key with its real dev. The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan dev could get a suitable class key. v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev and its real dev, but it make no sense, because the difference for subclass in the lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth to distinguish the different depth for every vlan dev, the same depth of the vlan dev could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h, I think it is enough here, the lockdep should never exceed that value. v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method, we could use _nested() variants to fix the problem, calculate the depth for every vlan dev, and use the depth as the subclass for addr_lock_key. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ip6_gre: don't allow to remove the fb_tunnel_devNicolas Dichtel
[ Upstream commit 54d63f787b652755e66eb4dd8892ee6d3f5197fc ] It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30filter: prevent nla extensions to peek beyond the end of the messageMathias Krause
[ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ] The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- | ld #0x87654321 | ldx #42 | ld #nla | ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- | ld #0x87654321 | ldx #42 | ld #nlan | ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- | ; (needs a fake netlink header at offset 0) | ld #0 | ldx #42 | ld #nlan | ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ipv4: return valid RTA_IIF on ip route getJulian Anastasov
[ Upstream commit 91146153da2feab18efab2e13b0945b6bb704ded ] Extend commit 13378cad02afc2adc6c0e07fca03903c7ada0b37 ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0 which is displayed as 'iif *'. inet_iif is not appropriate to use because skb_iif is not set. Use the skb->dev->ifindex instead. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net: ipv4: current group_info should be put after using.Wang, Xiaoming
[ Upstream commit b04c46190219a4f845e46a459e3102137b7f6cac ] Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com> Signed-off-by: xiaoming wang <xiaoming.wang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30vti: don't allow to add the same tunnel twiceNicolas Dichtel
[ Upstream commit 8d89dcdf80d88007647945a753821a06eb6cc5a5 ] Before the patch, it was possible to add two times the same tunnel: ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41 ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code"). CC: Cong Wang <amwang@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30gre: don't allow to add the same tunnel twiceNicolas Dichtel
[ Upstream commit 5a4552752d8f7f4cef1d98775ece7adb7616fde2 ] Before the patch, it was possible to add two times the same tunnel: ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249 ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code."). CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ipv6: Limit mtu to 65575 bytesEric Dumazet
[ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ] Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30bridge: Fix double free and memory leak around br_allowed_ingressToshiaki Makita
[ Upstream commit eb7076182d1ae4bc4641534134ed707100d76acc ] br_allowed_ingress() has two problems. 1. If br_allowed_ingress() is called by br_handle_frame_finish() and vlan_untag() in br_allowed_ingress() fails, skb will be freed by both vlan_untag() and br_handle_frame_finish(). 2. If br_allowed_ingress() is called by br_dev_xmit() and br_allowed_ingress() fails, the skb will not be freed. Fix these two problems by freeing the skb in br_allowed_ingress() if it fails. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>