summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2014-07-28ipv4: fix buffer overflow in ip_options_compile()Eric Dumazet
[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ] There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0 [28504.915394] [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120 [28504.916843] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.919490] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.920835] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.922208] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.923459] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120 [28504.926884] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.929175] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.930400] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.931677] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.932851] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28dns_resolver: Null-terminate the right stringBen Hutchings
[ Upstream commit 640d7efe4c08f06c4ae5d31b79bd8740e7f6790a ] *_result[len] is parsed as *(_result[len]) which is not at all what we want to touch here. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28dns_resolver: assure that dns_query() result is null-terminatedManuel Schölling
[ Upstream commit 84a7c0b1db1c17d5ded8d3800228a608e1070b40 ] dns_query() credulously assumes that keys are null-terminated and returns a copy of a memory block that is off by one. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net: sctp: fix information leaks in ulpevent layerDaniel Borkmann
[ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ] While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28tipc: clear 'next'-pointer of message fragments before reassemblyJon Paul Maloy
[ Upstream commit 999417549c16dd0e3a382aa9f6ae61688db03181 ] If the 'next' pointer of the last fragment buffer in a message is not zeroed before reassembly, we risk ending up with a corrupt message, since the reassembly function itself isn't doing this. Currently, when a buffer is retrieved from the deferred queue of the broadcast link, the next pointer is not cleared, with the result as described above. This commit corrects this, and thereby fixes a bug that may occur when long broadcast messages are transmitted across dual interfaces. The bug has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc: message reassembly using fragment chain") This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28netlink: Fix handling of error from netlink_dump().Ben Pfaff
[ Upstream commit ac30ef832e6af0505b6f0251a6659adcfa74975e ] netlink_dump() returns a negative errno value on error. Until now, netlink_recvmsg() directly recorded that negative value in sk->sk_err, but that's wrong since sk_err takes positive errno values. (This manifests as userspace receiving a positive return value from the recv() system call, falsely indicating success.) This bug was introduced in the commit that started checking the netlink_dump() return value, commit b44d211 (netlink: handle errors from netlink_dump()). Multithreaded Netlink dumps are one way to trigger this behavior in practice, as described in the commit message for the userspace workaround posted here: http://openvswitch.org/pipermail/dev/2014-June/042339.html This commit also fixes the same bug in netlink_poll(), introduced in commit cd1df525d (netlink: add flow control for memory mapped I/O). Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28appletalk: Fix socket referencing in skbAndrey Utkin
[ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ] Setting just skb->sk without taking its reference and setting a destructor is invalid. However, in the places where this was done, skb is used in a way not requiring skb->sk setting. So dropping the setting of skb->sk. Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441 Reported-by: Ed Martin <edman007@edman007.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28tcp: fix false undo corner casesYuchung Cheng
[ Upstream commit 6e08d5e3c8236e7484229e46fdf92006e1dd4c49 ] The undo code assumes that, upon entering loss recovery, TCP 1) always retransmit something 2) the retransmission never fails locally (e.g., qdisc drop) so undo_marker is set in tcp_enter_recovery() and undo_retrans is incremented only when tcp_retransmit_skb() is successful. When the assumption is broken because TCP's cwnd is too small to retransmit or the retransmit fails locally. The next (DUP)ACK would incorrectly revert the cwnd and the congestion state in tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs may enter the recovery state. The sender repeatedly enter and (incorrectly) exit recovery states if the retransmits continue to fail locally while receiving (DUP)ACKs. The fix is to initialize undo_retrans to -1 and start counting on the first retransmission. Always increment undo_retrans even if the retransmissions fail locally because they couldn't cause DSACKs to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28igmp: fix the problem when mc leave groupdingtianhong
[ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ] The problem was triggered by these steps: 1) create socket, bind and then setsockopt for add mc group. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("192.168.1.2"); setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); 2) drop the mc group for this socket. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("0.0.0.0"); setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)); 3) and then drop the socket, I found the mc group was still used by the dev: netstat -g Interface RefCnt Group --------------- ------ --------------------- eth2 1 255.0.0.37 Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need to be released for the netdev when drop the socket, but this process was broken when route default is NULL, the reason is that: The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr is NULL, the default route dev will be chosen, then the ifindex is got from the dev, then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL, the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be released from the mc_list, but the dev didn't dec the refcnt for this mc group, so when dropping the socket, the mc_list is NULL and the dev still keep this group. v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs, so I add the checking for the in_dev before polling the mc_list, make sure when we remove the mc group, dec the refcnt to the real dev which was using the mc address. The problem would never happened again. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ipv4: icmp: Fix pMTU handling for rare caseEdward Allcutt
[ Upstream commit 68b7107b62983f2cff0948292429d5f5999df096 ] Some older router implementations still send Fragmentation Needed errors with the Next-Hop MTU field set to zero. This is explicitly described as an eventuality that hosts must deal with by the standard (RFC 1191) since older standards specified that those bits must be zero. Linux had a generic (for all of IPv4) implementation of the algorithm described in the RFC for searching a list of MTU plateaus for a good value. Commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") removed this as part of the changes to remove the routing cache. Subsequently any Fragmentation Needed packet with a zero Next-Hop MTU has been discarded without being passed to the per-protocol handlers or notifying userspace for raw sockets. When there is a router which does not implement RFC 1191 on an MTU limited path then this results in stalled connections since large packets are discarded and the local protocols are not notified so they never attempt to lower the pMTU. One example I have seen is an OpenBSD router terminating IPSec tunnels. It's worth pointing out that this case is distinct from the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU since the commit in question dismissed that as a valid concern. All of the per-protocols handlers implement the simple approach from RFC 1191 of immediately falling back to the minimum value. Although this is sub-optimal it is vastly preferable to connections hanging indefinitely. Remove the Next-Hop MTU != 0 check and allow such packets to follow the normal path. Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") Signed-off-by: Edward Allcutt <edward.allcutt@openmarket.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28tcp: Fix divide by zero when pushing during tcp-repairChristoph Paasch
[ Upstream commit 5924f17a8a30c2ae18d034a86ee7581b34accef6 ] When in repair-mode and TCP_RECV_QUEUE is set, we end up calling tcp_push with mss_now being 0. If data is in the send-queue and tcp_set_skb_tso_segs gets called, we crash because it will divide by mss_now: [ 347.151939] divide error: 0000 [#1] SMP [ 347.152907] Modules linked in: [ 347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4 [ 347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000 [ 347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1 [ 347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0 [ 347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000 [ 347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00 [ 347.152907] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0 [ 347.152907] Stack: [ 347.152907] c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080 [ 347.152907] f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85 [ 347.152907] c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8 [ 347.152907] Call Trace: [ 347.152907] [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34 [ 347.152907] [<c16013e6>] tcp_init_tso_segs+0x36/0x50 [ 347.152907] [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0 [ 347.152907] [<c10a0188>] ? up+0x28/0x40 [ 347.152907] [<c10acc85>] ? console_unlock+0x295/0x480 [ 347.152907] [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0 [ 347.152907] [<c1605716>] __tcp_push_pending_frames+0x36/0xd0 [ 347.152907] [<c15f4860>] tcp_push+0xf0/0x120 [ 347.152907] [<c15f7641>] tcp_sendmsg+0xf1/0xbf0 [ 347.152907] [<c116d920>] ? kmem_cache_free+0xf0/0x120 [ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40 [ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40 [ 347.152907] [<c114f0f0>] ? do_wp_page+0x3e0/0x850 [ 347.152907] [<c161c36a>] inet_sendmsg+0x4a/0xb0 [ 347.152907] [<c1150269>] ? handle_mm_fault+0x709/0xfb0 [ 347.152907] [<c15a006b>] sock_aio_write+0xbb/0xd0 [ 347.152907] [<c1180b79>] do_sync_write+0x69/0xa0 [ 347.152907] [<c1181023>] vfs_write+0x123/0x160 [ 347.152907] [<c1181d55>] SyS_write+0x55/0xb0 [ 347.152907] [<c167f0d8>] sysenter_do_call+0x12/0x28 This can easily be reproduced with the following packetdrill-script (the "magic" with netem, sk_pacing and limit_output_bytes is done to prevent the kernel from pushing all segments, because hitting the limit without doing this is not so easy with packetdrill): 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460> +0 > S. 0:0(0) ack 1 <mss 1460> +0.1 < . 1:1(0) ack 1 win 65000 +0 accept(3, ..., ...) = 4 // This forces that not all segments of the snd-queue will be pushed +0 `tc qdisc add dev tun0 root netem delay 10ms` +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2` +0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0 +0 write(4,...,10000) = 10000 +0 write(4,...,10000) = 10000 // Set tcp-repair stuff, particularly TCP_RECV_QUEUE +0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0 +0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0 // This now will make the write push the remaining segments +0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0 +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000` // Now we will crash +0 write(4,...,1000) = 1000 This happens since ec3423257508 (tcp: fix retransmission in repair mode). Prior to that, the call to tcp_push was prevented by a check for tp->repair. The patch fixes it, by adding the new goto-label out_nopush. When exiting tcp_sendmsg and a push is not required, which is the case for tp->repair, we go to this label. When repairing and calling send() with TCP_RECV_QUEUE, the data is actually put in the receive-queue. So, no push is required because no data has been added to the send-queue. Cc: Andrew Vagin <avagin@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Fixes: ec3423257508 (tcp: fix retransmission in repair mode) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Andrew Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fixEric Dumazet
[ Upstream commit 7f502361531e9eecb396cf99bdc9e9a59f7ebd7f ] We have two different ways to handle changes to sk->sk_dst First way (used by TCP) assumes socket lock is owned by caller, and use no extra lock : __sk_dst_set() & __sk_dst_reset() Another way (used by UDP) uses sk_dst_lock because socket lock is not always taken. Note that sk_dst_lock is not softirq safe. These ways are not inter changeable for a given socket type. ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used the socket lock as synchronization, but users might be UDP sockets. Instead of converting sk_dst_lock to a softirq safe version, use xchg() as we did for sk_rx_dst in commit e47eb5dfb296b ("udp: ipv4: do not use sk_dst_lock from softirq context") In a follow up patch, we probably can remove sk_dst_lock, as it is only used in IPv6. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Fixes: 9cb3a50c5f63e ("ipv4: Invalidate the socket cached route on pmtu events if possible") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ipv4: fix dst race in sk_dst_get()Eric Dumazet
[ Upstream commit f88649721268999bdff09777847080a52004f691 ] When IP route cache had been removed in linux-3.6, we broke assumption that dst entries were all freed after rcu grace period. DST_NOCACHE dst were supposed to be freed from dst_release(). But it appears we want to keep such dst around, either in UDP sockets or tunnels. In sk_dst_get() we need to make sure dst refcount is not 0 before incrementing it, or else we might end up freeing a dst twice. DST_NOCACHE set on a dst does not mean this dst can not be attached to a socket or a tunnel. Then, before actual freeing, we need to observe a rcu grace period to make sure all other cpus can catch the fact the dst is no longer usable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dormando <dormando@rydia.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-288021q: fix a potential memory leakLi RongQing
[ Upstream commit 916c1689a09bc1ca81f2d7a34876f8d35aadd11b ] skb_cow called in vlan_reorder_header does not free the skb when it failed, and vlan_reorder_header returns NULL to reset original skb when it is called in vlan_untag, lead to a memory leak. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net: sctp: check proc_dointvec result in proc_sctp_do_authDaniel Borkmann
[ Upstream commit 24599e61b7552673dd85971cf5a35369cd8c119e ] When writing to the sysctl field net.sctp.auth_enable, it can well be that the user buffer we handed over to proc_dointvec() via proc_sctp_do_auth() handler contains something other than integers. In that case, we would set an uninitialized 4-byte value from the stack to net->sctp.auth_enable that can be leaked back when reading the sysctl variable, and it can unintentionally turn auth_enable on/off based on the stack content since auth_enable is interpreted as a boolean. Fix it up by making sure proc_dointvec() returned sucessfully. Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint") Reported-by: Florian Westphal <fwestpha@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skbNeal Cardwell
[ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ] If there is an MSS change (or misbehaving receiver) that causes a SACK to arrive that covers the end of an skb but is less than one MSS, then tcp_match_skb_to_sack() was rounding up pkt_len to the full length of the skb ("Round if necessary..."), then chopping all bytes off the skb and creating a zero-byte skb in the write queue. This was visible now because the recently simplified TLP logic in bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte skb at the end of the write queue, and now that we do not check that skb's length we could send it as a TLP probe. Consider the following example scenario: mss: 1000 skb: seq: 0 end_seq: 4000 len: 4000 SACK: start_seq: 3999 end_seq: 4000 The tcp_match_skb_to_sack() code will compute: in_sack = false pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000 new_len += mss = 4000 Previously we would find the new_len > skb->len check failing, so we would fall through and set pkt_len = new_len = 4000 and chop off pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment afterward in the write queue. With this new commit, we notice that the new new_len >= skb->len check succeeds, so that we return without trying to fragment. Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ip_tunnel: fix ip_tunnel_lookupDmitry Popov
[ Upstream commit e0056593b61253f1a8a9941dacda22e73b963cdc ] This patch fixes 3 similar bugs where incoming packets might be routed into wrong non-wildcard tunnels: 1) Consider the following setup: ip address add 1.1.1.1/24 dev eth0 ip address add 1.1.1.2/24 dev eth0 ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if it has dst = 1.1.1.2. Moreover even if there was wildcard tunnel like ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0 but it was created before explicit one (with local 1.1.1.1), incoming ipip packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1. Same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti) 2) ip address add 1.1.1.1/24 dev eth0 ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what src address is. Any remote ip address which has ip_tunnel_hash = 0 raised this issue, 2.2.146.85 is just an example, there are more than 4 million of them. And again, wildcard tunnel like ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0 wouldn't be ever matched if it was created before explicit tunnel like above. Gre & vti tunnels had the same issue. 3) ip address add 1.1.1.1/24 dev eth0 ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0 ip link set gre1 up Any incoming gre packet with key = 1 were routed into gre1, no matter what src/dst addresses are. Any remote ip address which has ip_tunnel_hash = 0 raised the issue, 2.2.146.84 is just an example, there are more than 4 million of them. Wildcard tunnel like ip tunnel add gre2 remote any local any key 1 mode gre dev eth0 wouldn't be ever matched if it was created before explicit tunnel like above. All this stuff happened because while looking for a wildcard tunnel we didn't check that matched tunnel is a wildcard one. Fixed. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09mac80211: fix a memory leak on sta rate selection tableFelix Fietkau
commit 53d045258ee2e38b1e882617cb0799a04d05f5fa upstream. If the rate control algorithm uses a selection table, it is leaked when the station is destroyed - fix that. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Reported-by: Christophe Prévotaux <cprevotaux@nltinc.com> Fixes: 0d528d85c519 ("mac80211: improve the rate control API") [add commit log entry, remove pointless NULL check] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09mac80211: don't check netdev state for debugfs read/writeArik Nemtsov
commit 923eaf367206e01f22c97aee22300e332d071916 upstream. Doing so will lead to an oops for a p2p-dev interface, since it has no netdev. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09Bluetooth: Fix locking of hdev when calling into SMP codeJohan Hedberg
commit c73f94b8c093a615ce80eabbde0ac6eb9abfe31a upstream. The SMP code expects hdev to be unlocked since e.g. crypto functions will try to (re)lock it. Therefore, we need to release the lock before calling into smp.c from mgmt.c. Without this we risk a deadlock whenever the smp_user_confirm_reply() function is called. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Tested-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09Bluetooth: Fix check for connection encryptionJohan Hedberg
commit e694788d73efe139b24f78b036deb97fe57fa8cb upstream. The conn->link_key variable tracks the type of link key in use. It is set whenever we respond to a link key request as well as when we get a link key notification event. These two events do not however always guarantee that encryption is enabled: getting a link key request and responding to it may only mean that the remote side has requested authentication but not encryption. On the other hand, the encrypt change event is a certain guarantee that encryption is enabled. The real encryption state is already tracked in the conn->link_mode variable through the HCI_LM_ENCRYPT bit. This patch fixes a check for encryption in the hci_conn_auth function to use the proper conn->link_mode value and thereby eliminates the chance of a false positive result. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09Bluetooth: Fix SSP acceptor just-works confirmation without MITMJohan Hedberg
commit ba15a58b179ed76a7e887177f2b06de12c58ec8f upstream. From the Bluetooth Core Specification 4.1 page 1958: "if both devices have set the Authentication_Requirements parameter to one of the MITM Protection Not Required options, authentication stage 1 shall function as if both devices set their IO capabilities to DisplayOnly (e.g., Numeric comparison with automatic confirmation on both devices)" So far our implementation has done user confirmation for all just-works cases regardless of the MITM requirements, however following the specification to the word means that we should not be doing confirmation when neither side has the MITM flag set. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Tested-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06netfilter: ipt_ULOG: fix info leaksMathias Krause
commit 278f2b3e2af5f32ea1afe34fa12a2518153e6e49 upstream. The ulog messages leak heap bytes by the means of padding bytes and incompletely filled string arrays. Fix those by memset(0)'ing the whole struct before filling it. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jan Tore Morken <jantore@morken.priv.no> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06ipvs: Fix panic due to non-linear skbPeter Christensen
commit f44a5f45f544561302e855e7bd104e5f506ec01b upstream. Receiving a ICMP response to an IPIP packet in a non-linear skb could cause a kernel panic in __skb_pull. The problem was introduced in commit f2edb9f7706dcb2c0d9a362b2ba849efe3a97f5e ("ipvs: implement passive PMTUD for IPIP packets"). Signed-off-by: Peter Christensen <pch@ordbogen.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06SUNRPC: Fix a module reference leak in svc_handle_xprtTrond Myklebust
commit c789102c20bbbdda6831a273e046715be9d6af79 upstream. If the accept() call fails, we need to put the module reference. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30skbuff: skb_segment: orphan frags before copyingMichael S. Tsirkin
commit 1fd819ecb90cc9b822cd84d3056ddba315d3340f upstream. skb_segment copies frags around, so we need to copy them carefully to avoid accessing user memory after reporting completion to userspace through a callback. skb_segment doesn't normally happen on datapath: TSO needs to be disabled - so disabling zero copy in this case does not look like a big deal. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2. As skb_segment() only supports page-frags *or* a frag list, there is no need for the additional frag_skb pointer or the preparatory renaming.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Eddie Chapman <eddie@ehuk.net> # backported to 3.10 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Bluetooth: Fix L2CAP deadlockJukka Taimisto
commit 8a96f3cd22878fc0bb564a8478a6e17c0b8dca73 upstream. -[0x01 Introduction We have found a programming error causing a deadlock in Bluetooth subsystem of Linux kernel. The problem is caused by missing release_sock() call when L2CAP connection creation fails due full accept queue. The issue can be reproduced with 3.15-rc5 kernel and is also present in earlier kernels. -[0x02 Details The problem occurs when multiple L2CAP connections are created to a PSM which contains listening socket (like SDP) and left pending, for example, configuration (the underlying ACL link is not disconnected between connections). When L2CAP connection request is received and listening socket is found the l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called. This function locks the 'parent' socket and then checks if the accept queue is full. 1178 lock_sock(parent); 1179 1180 /* Check for backlog size */ 1181 if (sk_acceptq_is_full(parent)) { 1182 BT_DBG("backlog full %d", parent->sk_ack_backlog); 1183 return NULL; 1184 } If case the accept queue is full NULL is returned, but the 'parent' socket is not released. Thus when next L2CAP connection request is received the code blocks on lock_sock() since the parent is still locked. Also note that for connections already established and waiting for configuration to complete a timeout will occur and l2cap_chan_timeout() (net/bluetooth/l2cap_core.c) will be called. All threads calling this function will also be blocked waiting for the channel mutex since the thread which is waiting on lock_sock() alread holds the channel mutex. We were able to reproduce this by sending continuously L2CAP connection request followed by disconnection request containing invalid CID. This left the created connections pending configuration. After the deadlock occurs it is impossible to kill bluetoothd, btmon will not get any more data etc. requiring reboot to recover. -[0x03 Fix Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL seems to fix the issue. Signed-off-by: Jukka Taimisto <jtt@codenomicon.com> Reported-by: Tommi Mäkilä <tmakila@codenomicon.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30af_iucv: wrong mapping of sent and confirmed skbsUrsula Braun
commit f5738e2ef88070ef1372e6e718124d88e9abe4ac upstream. When sending data through IUCV a MESSAGE COMPLETE interrupt signals that sent data memory can be freed or reused again. With commit f9c41a62bba3f3f7ef3541b2a025e3371bcbba97 "af_iucv: fix recvmsg by replacing skb_pull() function" the MESSAGE COMPLETE callback iucv_callback_txdone() identifies the wrong skb as being confirmed, which leads to data corruption. This patch fixes the skb mapping logic in iucv_callback_txdone(). Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26rtnetlink: fix userspace API breakage for iproute2 < v3.9.0Michal Schmidt
[ Upstream commit e5eca6d41f53db48edd8cf88a3f59d2c30227f8e ] When running RHEL6 userspace on a current upstream kernel, "ip link" fails to show VF information. The reason is a kernel<->userspace API change introduced by commit 88c5b5ce5cb57 ("rtnetlink: Call nlmsg_parse() with correct header length"), after which the kernel does not see iproute2's IFLA_EXT_MASK attribute in the netlink request. iproute2 adjusted for the API change in its commit 63338dca4513 ("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter"). The problem has been noticed before: http://marc.info/?l=linux-netdev&m=136692296022182&w=2 (Subject: Re: getting VF link info seems to be broken in 3.9-rc8) We can do better than tell those with old userspace to upgrade. We can recognize the old iproute2 in the kernel by checking the netlink message length. Even when including the IFLA_EXT_MASK attribute, its netlink message is shorter than struct ifinfomsg. With this patch "ip link" shows VF information in both old and new iproute2 versions. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26sctp: Fix sk_ack_backlog wrap-around problemXufeng Zhang
[ Upstream commit d3217b15a19a4779c39b212358a5c71d725822ee ] Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check, a new association would be created in sctp_unpack_cookie(), but afterwards, some processing maybe failed, and sctp_association_free() will be called to free the previously allocated association, in sctp_association_free(), sk_ack_backlog value is decremented for this socket, since the initial value for sk_ack_backlog is 0, after the decrement, it will be 65535, a wrap-around problem happens, and if we want to establish new associations afterward in the same socket, ABORT would be triggered since sctp deem the accept queue as full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26ipv4: fix a race in ip4_datagram_release_cb()Eric Dumazet
[ Upstream commit 9709674e68646cee5a24e3000b3558d25412203a ] Alexey gave a AddressSanitizer[1] report that finally gave a good hint at where was the origin of various problems already reported by Dormando in the past [2] Problem comes from the fact that UDP can have a lockless TX path, and concurrent threads can manipulate sk_dst_cache, while another thread, is holding socket lock and calls __sk_dst_set() in ip4_datagram_release_cb() (this was added in linux-3.8) It seems that all we need to do is to use sk_dst_check() and sk_dst_set() so that all the writers hold same spinlock (sk->sk_dst_lock) to prevent corruptions. TCP stack do not need this protection, as all sk_dst_cache writers hold the socket lock. [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel AddressSanitizer: heap-use-after-free in ipv4_dst_check Read of size 2 by thread T15453: [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Freed by thread T15455: [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Allocated by thread T15453: [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406 [< inlined >] __ip_route_output_key+0x3e8/0xf70 __mkroute_output ./net/ipv4/route.c:1939 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 [2] <4>[196727.311203] general protection fault: 0000 [#1] SMP <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000 <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>] [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282 <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040 <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200 <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800 <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce <4>[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000 <4>[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0 <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>[196727.311713] Stack: <4>[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42 <4>[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0 <4>[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0 <4>[196727.311885] Call Trace: <4>[196727.311907] <IRQ> <4>[196727.311912] [<ffffffff815b7f42>] dst_destroy+0x32/0xe0 <4>[196727.311959] [<ffffffff815b86c6>] dst_release+0x56/0x80 <4>[196727.311986] [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0 <4>[196727.312013] [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820 <4>[196727.312041] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312070] [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150 <4>[196727.312097] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312125] [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230 <4>[196727.312154] [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90 <4>[196727.312183] [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360 <4>[196727.312212] [<ffffffff815fe00b>] ip_rcv+0x22b/0x340 <4>[196727.312242] [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan] <4>[196727.312275] [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640 <4>[196727.312308] [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150 <4>[196727.312338] [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70 <4>[196727.312368] [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0 <4>[196727.312397] [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140 <4>[196727.312433] [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe] <4>[196727.312463] [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340 <4>[196727.312491] [<ffffffff815b1691>] net_rx_action+0x111/0x210 <4>[196727.312521] [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70 <4>[196727.312552] [<ffffffff810519d0>] __do_softirq+0xd0/0x270 <4>[196727.312583] [<ffffffff816cef3c>] call_softirq+0x1c/0x30 <4>[196727.312613] [<ffffffff81004205>] do_softirq+0x55/0x90 <4>[196727.312640] [<ffffffff81051c85>] irq_exit+0x55/0x60 <4>[196727.312668] [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0 <4>[196727.312696] [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a <4>[196727.312722] <EOI> <1>[196727.313071] RIP [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.313100] RSP <ffff885effd23a70> <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt Reported-by: Alexey Preobrazhensky <preobr@google.com> Reported-by: dormando <dormando@rydia.ne> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26ipip, sit: fix ipv4_{update_pmtu,redirect} callsDmitry Popov
[ Upstream commit 2346829e641b804ece9ac9298136b56d9567c278 ] ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a tunnel netdevice). It caused wrong route lookup and failure of pmtu update or redirect. We should use the same ifindex that we use in ip_route_output_* in *tunnel_xmit code. It is t->parms.link . Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: force a list_del() in unregister_netdevice_many()Eric Dumazet
[ Upstream commit 87757a917b0b3c0787e0563c679762152be81312 ] unregister_netdevice_many() API is error prone and we had too many bugs because of dangling LIST_HEAD on stacks. See commit f87e6f47933e3e ("net: dont leave active on stack LIST_HEAD") In fact, instead of making sure no caller leaves an active list_head, just force a list_del() in the callee. No one seems to need to access the list after unregister_netdevice_many() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26tcp: fix cwnd undo on DSACK in F-RTOYuchung Cheng
[ Upstream commit 0cfa5c07d6d1d7f8e710fc671c5ba1ce85e09fa4 ] This bug is discovered by an recent F-RTO issue on tcpm list https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html The bug is that currently F-RTO does not use DSACK to undo cwnd in certain cases: upon receiving an ACK after the RTO retransmission in F-RTO, and the ACK has DSACK indicating the retransmission is spurious, the sender only calls tcp_try_undo_loss() if some never retransmisted data is sacked (FLAG_ORIG_DATA_SACKED). The correct behavior is to unconditionally call tcp_try_undo_loss so the DSACK information is used properly to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: fix inet_getid() and ipv6_select_ident() bugsEric Dumazet
[ Upstream commit 39c36094d78c39e038c1e499b2364e13bce36f54 ] I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery is disabled. Note how GSO/TSO packets do not have monotonically incrementing ID. 06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396) 06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212) 06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972) 06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292) 06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764) It appears I introduced this bug in linux-3.1. inet_getid() must return the old value of peer->ip_id_count, not the new one. Lets revert this part, and remove the prevention of a null identification field in IPv6 Fragment Extension Header, which is dubious and not even done properly. Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: tunnels - enable module autoloadingTom Gundersen
[ Upstream commit f98f89a0104454f35a62d681683c844f6dbf4043 ] Enable the module alias hookup to allow tunnel modules to be autoloaded on demand. This is in line with how most other netdev kinds work, and will allow userspace to create tunnels without having CAP_SYS_MODULE. Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26netlink: Only check file credentials for implicit destinationsEric W. Biederman
[ Upstream commit 2d7a85f4b06e9c27ff629f07a524c48074f07f81 ] It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Use netlink_ns_capable to verify the permisions of netlink messagesEric W. Biederman
[ Upstream commit 90f62cf30a78721641e08737bda787552428061e ] It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Add variants of capable for use on netlink messagesEric W. Biederman
[ Upstream commit aa4cf9452f469f16cea8c96283b641b4576d4a7b ] netlink_net_capable - The common case use, for operations that are safe on a network namespace netlink_capable - For operations that are only known to be safe for the global root netlink_ns_capable - The general case of capable used to handle special cases __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of the skbuff of a netlink message. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Add variants of capable for use on on socketsEric W. Biederman
[ Upstream commit a3b299da869d6e78cf42ae0b1b41797bcb8c5e4b ] sk_net_capable - The common case, operations that are safe in a network namespace. sk_capable - Operations that are not known to be safe in a network namespace sk_ns_capable - The general case for special cases. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dumpEric W. Biederman
[ Upstream commit a53b72c83a4216f2eb883ed45a0cbce014b8e62d ] The permission check in sock_diag_put_filterinfo is wrong, and it is so removed from it's sources it is not clear why it is wrong. Move the computation into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo. This does not yet correct the capability check but instead simply moves it to make it clear what is going on. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26netlink: Rename netlink_capable netlink_allowedEric W. Biederman
[ Upstream commit 5187cd055b6e81fc6526109456f8b20623148d5f ] netlink_capable is a static internal function in af_netlink.c and we have better uses for the name netlink_capable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-16netfilter: ipv4: defrag: set local_df flag on defragmented skbFlorian Westphal
commit 895162b1101b3ea5db08ca6822ae9672717efec0 upstream. else we may fail to forward skb even if original fragments do fit outgoing link mtu: 1. remote sends 2k packets in two 1000 byte frags, DF set 2. we want to forward but only see '2k > mtu and DF set' 3. we then send icmp error saying that outgoing link is 1500 But original sender never sent a packet that would not fit the outgoing link. Setting local_df makes outgoing path test size vs. IPCB(skb)->frag_max_size, so we will still send the correct error in case the largest original size did not fit outgoing link mtu. Reported-by: Maxime Bizon <mbizon@freebox.fr> Suggested-by: Maxime Bizon <mbizon@freebox.fr> Fixes: 5f2d04f1f9 (ipv4: fix path MTU discovery with connection tracking) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11netfilter: Fix potential use after free in ip6_route_me_harder()Sergey Popovich
commit a8951d5814e1373807a94f79f7ccec7041325470 upstream. Dst is released one line before we access it again with dst->error. Fixes: 58e35d147128 netfilter: ipv6: propagate routing errors from ip6_route_me_harder() Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07libceph: fix corruption when using page_count 0 page in rbdChunwei Chen
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream. It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here: https://github.com/zfsonlinux/spl/issues/241 http://tracker.ceph.com/issues/7790 The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page. This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected. Cc: Sage Weil <sage@inktank.com> Cc: Yehuda Sadeh <yehuda@inktank.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07Bluetooth: Fix redundant encryption request for reauthenticationJohan Hedberg
commit 09da1f3463eb81d59685df723b1c5950b7570340 upstream. When we're performing reauthentication (in order to elevate the security level from an unauthenticated key to an authenticated one) we do not need to issue any encryption command once authentication completes. Since the trigger for the encryption HCI command is the ENCRYPT_PEND flag this flag should not be set in this scenario. Instead, the REAUTH_PEND flag takes care of all necessary steps for reauthentication. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07Bluetooth: Fix triggering BR/EDR L2CAP Connect too earlyJohan Hedberg
commit 9eb1fbfa0a737fd4d3a6d12d71c5ea9af622b887 upstream. Commit 1c2e004183178 introduced an event handler for the encryption key refresh complete event with the intent of fixing some LE/SMP cases. However, this event is shared with BR/EDR and there we actually want to act only on the auth_complete event (which comes after the key refresh). If we do not do this we may trigger an L2CAP Connect Request too early and cause the remote side to return a security block error. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07mac80211: fix on-channel remain-on-channelJohannes Berg
commit b4b177a5556a686909e643f1e9b6434c10de079f upstream. Jouni reported that if a remain-on-channel was active on the same channel as the current operating channel, then the ROC would start, but any frames transmitted using mgmt-tx on the same channel would get delayed until after the ROC. The reason for this is that the ROC starts, but doesn't have any handling for "remain on the same channel", so it stops the interface queues. The later mgmt-tx then puts the frame on the interface queues (since it's on the current operating channel) and thus they get delayed until after the ROC. To fix this, add some logic to handle remaining on the same channel specially and not stop the queues etc. in this case. This not only fixes the bug but also improves behaviour in this case as data frames etc. can continue to flow. Reported-by: Jouni Malinen <j@w1.fi> Tested-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07mac80211: fix suspend vs. authentication raceJohannes Berg
commit 1a1cb744de160ee70086a77afff605bbc275d291 upstream. Since Stanislaw's patch removing the quiescing code, mac80211 had a race regarding suspend vs. authentication: as cfg80211 doesn't track authentication attempts, it can't abort them. Therefore the attempts may be kept running while suspending, which can lead to all kinds of issues, in at least some cases causing an error in iwlmvm firmware. Fix this by aborting the authentication attempt when suspending. Cc: stable@vger.kernel.org Fixes: 12e7f517029d ("mac80211: cleanup generic suspend/resume procedures") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net-gro: reset skb->truesize in napi_reuse_skb()Eric Dumazet
[ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ] Recycling skb always had been very tough... This time it appears GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to skb. skb_gro_receive() can only subtract from skb->truesize the used part of a fragment. I spotted this problem seeing TcpExtPruneCalled and TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where TCP receive window should be sized properly to accept traffic coming from a driver not overshooting skb->truesize. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>