summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-04-08Revert "ip6_vti: adjust vti mtu according to mtu of lower device"Greg Kroah-Hartman
This reverts commit 1139d77d8a7f9aa6b6ae0a1c902f94775dad2f52 which is commit 53c81e95df1793933f87748d36070a721f6cb287 upstream. Ben writes that there are a number of follow-on patches needed to fix this up, but they get complex to backport, and some custom fixes are needed, so let's just revert this and wait for a "real" set of patches to resolve this to be submitted if it is really needed. Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Petr Vorel <pvorel@suse.cz> Cc: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08Bluetooth: Fix missing encryption refresh on Security RequestSzymon Janc
commit 64e759f58f128730b97a3c3a26d283c075ad7c86 upstream. If Security Request is received on connection that is already encrypted with sufficient security master should perform encryption key refresh procedure instead of just ignoring Slave Security Request (Core Spec 5.0 Vol 3 Part H 2.4.6). > ACL Data RX: Handle 3585 flags 0x02 dlen 6 SMP: Security Request (0x0b) len 1 Authentication requirement: Bonding, No MITM, SC, No Keypresses (0x09) < HCI Command: LE Start Encryption (0x08|0x0019) plen 28 Handle: 3585 Random number: 0x0000000000000000 Encrypted diversifier: 0x0000 Long term key: 44264272a5c426a9e868f034cf0e69f3 > HCI Event: Command Status (0x0f) plen 4 LE Start Encryption (0x08|0x0019) ncmd 1 Status: Success (0x00) > HCI Event: Encryption Key Refresh Complete (0x30) plen 3 Status: Success (0x00) Handle: 3585 Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08netfilter: x_tables: add and use xt_check_proc_nameFlorian Westphal
commit b1d0a5d0cba4597c0394997b2d5fced3e3841b4e upstream. recent and hashlimit both create /proc files, but only check that name is 0 terminated. This can trigger WARN() from procfs when name is "" or "/". Add helper for this and then use it for both. Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: <syzbot+0502b00edac2a0680b61@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08netfilter: bridge: ebt_among: add more missing match size checksFlorian Westphal
commit c8d70a700a5b486bfa8e5a7d33d805389f6e59f9 upstream. ebt_among is special, it has a dynamic match size and is exempt from the central size checks. commit c4585a2823edf ("bridge: ebt_among: add missing match size checks") added validation for pool size, but missed fact that the macros ebt_among_wh_src/dst can already return out-of-bound result because they do not check value of wh_src/dst_ofs (an offset) vs. the size of the match that userspace gave to us. v2: check that offset has correct alignment. Paolo Abeni points out that we should also check that src/dst wormhash arrays do not overlap, and src + length lines up with start of dst (or vice versa). v3: compact wormhash_sizes_valid() part NB: Fixes tag is intentionally wrong, this bug exists from day one when match was added for 2.6 kernel. Tag is there so stable maintainers will notice this one too. Tested with same rules from the earlier patch. Fixes: c4585a2823edf ("bridge: ebt_among: add missing match size checks") Reported-by: <syzbot+bdabab6f1983a03fc009@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systemsSteffen Klassert
commit 19d7df69fdb2636856dc8919de72fc1bf8f79598 upstream. We don't have a compat layer for xfrm, so userspace and kernel structures have different sizes in this case. This results in a broken configuration, so refuse to configure socket policies when trying to insert from 32 bit userspace as we do it already with policies inserted via netlink. Reported-and-tested-by: syzbot+e1a1577ca8bcb47b769a@syzkaller.appspotmail.com Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()Greg Hackmann
commit 0dcd7876029b58770f769cbb7b484e88e4a305e5 upstream. f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a __this_cpu_read() call inside ipcomp_alloc_tfms(). At the time, __this_cpu_read() required the caller to either not care about races or to handle preemption/interrupt issues. 3.15 tightened the rules around some per-cpu operations, and now __this_cpu_read() should never be used in a preemptible context. On 3.15 and later, we need to use this_cpu_read() instead. syzkaller reported this leading to the following kernel BUG while fuzzing sendmsg: BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101 caller is ipcomp_init_state+0x185/0x990 CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0xb9/0x115 check_preemption_disabled+0x1cb/0x1f0 ipcomp_init_state+0x185/0x990 ? __xfrm_init_state+0x876/0xc20 ? lock_downgrade+0x5e0/0x5e0 ipcomp4_init_state+0xaa/0x7c0 __xfrm_init_state+0x3eb/0xc20 xfrm_init_state+0x19/0x60 pfkey_add+0x20df/0x36f0 ? pfkey_broadcast+0x3dd/0x600 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_seq_stop+0x80/0x80 ? __skb_clone+0x236/0x750 ? kmem_cache_alloc+0x1f6/0x260 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_process+0x62a/0x6f0 pfkey_process+0x62a/0x6f0 ? pfkey_send_new_mapping+0x11c0/0x11c0 ? mutex_lock_io_nested+0x1390/0x1390 pfkey_sendmsg+0x383/0x750 ? dump_sp+0x430/0x430 sock_sendmsg+0xc0/0x100 ___sys_sendmsg+0x6c8/0x8b0 ? copy_msghdr_from_user+0x3b0/0x3b0 ? pagevec_lru_move_fn+0x144/0x1f0 ? find_held_lock+0x32/0x1c0 ? do_huge_pmd_anonymous_page+0xc43/0x11e0 ? lock_downgrade+0x5e0/0x5e0 ? get_kernel_page+0xb0/0xb0 ? _raw_spin_unlock+0x29/0x40 ? do_huge_pmd_anonymous_page+0x400/0x11e0 ? __handle_mm_fault+0x553/0x2460 ? __fget_light+0x163/0x1f0 ? __sys_sendmsg+0xc7/0x170 __sys_sendmsg+0xc7/0x170 ? SyS_shutdown+0x1a0/0x1a0 ? __do_page_fault+0x5a0/0xca0 ? lock_downgrade+0x5e0/0x5e0 SyS_sendmsg+0x27/0x40 ? __sys_sendmsg+0x170/0x170 do_syscall_64+0x19f/0x640 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x7f0ee73dfb79 RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79 RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004 RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0 R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440 R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08xfrm_user: uncoditionally validate esn replay attribute structFlorian Westphal
commit d97ca5d714a5334aecadadf696875da40f1fbf3e upstream. The sanity test added in ecd7918745234 can be bypassed, validation only occurs if XFRM_STATE_ESN flag is set, but rest of code doesn't care and just checks if the attribute itself is present. So always validate. Alternative is to reject if we have the attribute without the flag but that would change abi. Reported-by: syzbot+0ab777c27d2bb7588f73@syzkaller.appspotmail.com Cc: Mathias Krause <minipli@googlemail.com> Fixes: ecd7918745234 ("xfrm_user: ensure user supplied esn replay window is valid") Fixes: d8647b79c3b7e ("xfrm: Add user interface for esn and big anti-replay windows") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08mac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss()Matthias Kaehlcke
commit a4ac6f2e53e568a77a2eb3710efd99ca08634c0a upstream. cfg80211_chandef_create() expects an 'enum nl80211_channel_type' as channel type however in ieee80211_sta_join_ibss() NL80211_CHAN_WIDTH_20_NOHT is passed in two occasions, which is of the enum type 'nl80211_chan_width'. Change the value to NL80211_CHAN_NO_HT (20 MHz, non-HT channel) of the channel type enum. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08mac80211: Fix clang warning about constant operand in logical operationMatthias Kaehlcke
commit 93f56de259376d7e4fff2b2d104082e1fa66e237 upstream. When clang detects a non-boolean constant in a logical operation it generates a 'constant-logical-operand' warning. In ieee80211_try_rate_control_ops_get() the result of strlen(<const str>) is used in a logical operation, clang resolves the expression to an (integer) constant at compile time when clang's builtin strlen function is used. Change the condition to check for strlen() > 0 to make the constant operand boolean and thus avoid the warning. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08netfilter: ctnetlink: Make some parameters integer to avoid enum mismatchMatthias Kaehlcke
commit a2b7cbdd2559aff06cebc28a7150f81c307a90d3 upstream. Not all parameters passed to ctnetlink_parse_tuple() and ctnetlink_exp_dump_tuple() match the enum type in the signatures of these functions. Since this is intended change the argument type of to be an unsigned integer value. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08cfg80211: Fix array-bounds warning in fragment copyMatthias Kaehlcke
commit aa1702dd162f420bf85ecef0c77686ef0dbc1496 upstream. __ieee80211_amsdu_copy_frag intentionally initializes a pointer to array[-1] to increment it later to valid values. clang rightfully generates an array-bounds warning on the initialization statement. Initialize the pointer to array[0] and change the algorithm from increment before to increment after consume. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08nl80211: Fix enum type of variable in nl80211_put_sta_rate()Matthias Kaehlcke
commit bbf67e450a5dc2a595e1e7a67b4869f1a7f5a338 upstream. rate_flg is of type 'enum nl80211_attrs', however it is assigned with 'enum nl80211_rate_info' values. Change the type of rate_flg accordingly. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08netfilter: nf_nat_h323: fix logical-not-parentheses warningNick Desaulniers
commit eee6ebbac18a189ef33d25ea9b8bcae176515e49 upstream. Clang produces the following warning: net/ipv4/netfilter/nf_nat_h323.c:553:6: error: logical not is only applied to the left hand side of this comparison [-Werror,-Wlogical-not-parentheses] if (!set_h225_addr(skb, protoff, data, dataoff, taddr, ^ add parentheses after the '!' to evaluate the comparison first add parentheses around left hand side expression to silence this warning There's not necessarily a bug here, but it's cleaner to return early, ex: if (x) return ... rather than: if (x == 0) ... else return Also added a return code check that seemed to be missing in one instance. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31kcm: lock lower socket in kcm_attachTom Herbert
[ Upstream commit 2cc683e88c0c993ac3721d9b702cb0630abe2879 ] Need to lock lower socket in order to provide mutual exclusion with kcm_unattach. v2: Add Reported-by for syzbot Fixes: ab7ac4eb9832e32a09f4e804 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+ea75c0ffcd353d32515f064aaebefc5279e6161e@syzkaller.appspotmail.com Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31skbuff: Fix not waking applications when errors are enqueuedVinicius Costa Gomes
[ Upstream commit 6e5d58fdc9bedd0255a8781b258f10bbdc63e975 ] When errors are enqueued to the error queue via sock_queue_err_skb() function, it is possible that the waiting application is not notified. Calling 'sk->sk_data_ready()' would not notify applications that selected only POLLERR events in poll() (for example). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Randy E. Witt <randy.e.witt@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31net: Only honor ifindex in IP_PKTINFO if non-0David Ahern
[ Upstream commit 2cbb4ea7de167b02ffa63e9cdfdb07a7e7094615 ] Only allow ifindex from IP_PKTINFO to override SO_BINDTODEVICE settings if the index is actually set in the message. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31netlink: avoid a double skb free in genlmsg_mcast()Nicolas Dichtel
[ Upstream commit 02a2385f37a7c6594c9d89b64c4a1451276f08eb ] nlmsg_multicast() consumes always the skb, thus the original skb must be freed only when this function is called with a clone. Fixes: cb9f7a9a5c96 ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()") Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31net/iucv: Free memory obtained by kzallocArvind Yadav
[ Upstream commit fa6a91e9b907231d2e38ea5ed89c537b3525df3d ] Free memory by calling put_device(), if afiucv_iucv_init is not successful. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31l2tp: do not accept arbitrary socketsEric Dumazet
[ Upstream commit 17cfe79a65f98abe535261856c5aef14f306dff7 ] syzkaller found an issue caused by lack of sufficient checks in l2tp_tunnel_create() RAW sockets can not be considered as UDP ones for instance. In another patch, we shall replace all pr_err() by less intrusive pr_debug() so that syzkaller can find other bugs faster. Acked-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> ================================================================== BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69 dst_release: dst:00000000d53d0d0f refcnt:-1 Write of size 1 at addr ffff8801d013b798 by task syz-executor3/6242 CPU: 1 PID: 6242 Comm: syz-executor3 Not tainted 4.16.0-rc2+ #253 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23b/0x360 mm/kasan/report.c:412 __asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435 setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69 l2tp_tunnel_create+0x1354/0x17f0 net/l2tp/l2tp_core.c:1596 pppol2tp_connect+0x14b1/0x1dd0 net/l2tp/l2tp_ppp.c:707 SYSC_connect+0x213/0x4a0 net/socket.c:1640 SyS_connect+0x24/0x30 net/socket.c:1621 do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option()Lorenzo Bianconi
[ Upstream commit 9f62c15f28b0d1d746734666d88a79f08ba1e43e ] Fix the following slab-out-of-bounds kasan report in ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not linear and the accessed data are not in the linear data region of orig_skb. [ 1503.122508] ================================================================== [ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990 [ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932 [ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124 [ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 [ 1503.123527] Call Trace: [ 1503.123579] <IRQ> [ 1503.123638] print_address_description+0x6e/0x280 [ 1503.123849] kasan_report+0x233/0x350 [ 1503.123946] memcpy+0x1f/0x50 [ 1503.124037] ndisc_send_redirect+0x94e/0x990 [ 1503.125150] ip6_forward+0x1242/0x13b0 [...] [ 1503.153890] Allocated by task 1932: [ 1503.153982] kasan_kmalloc+0x9f/0xd0 [ 1503.154074] __kmalloc_track_caller+0xb5/0x160 [ 1503.154198] __kmalloc_reserve.isra.41+0x24/0x70 [ 1503.154324] __alloc_skb+0x130/0x3e0 [ 1503.154415] sctp_packet_transmit+0x21a/0x1810 [ 1503.154533] sctp_outq_flush+0xc14/0x1db0 [ 1503.154624] sctp_do_sm+0x34e/0x2740 [ 1503.154715] sctp_primitive_SEND+0x57/0x70 [ 1503.154807] sctp_sendmsg+0xaa6/0x1b10 [ 1503.154897] sock_sendmsg+0x68/0x80 [ 1503.154987] ___sys_sendmsg+0x431/0x4b0 [ 1503.155078] __sys_sendmsg+0xa4/0x130 [ 1503.155168] do_syscall_64+0x171/0x3f0 [ 1503.155259] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.155436] Freed by task 1932: [ 1503.155527] __kasan_slab_free+0x134/0x180 [ 1503.155618] kfree+0xbc/0x180 [ 1503.155709] skb_release_data+0x27f/0x2c0 [ 1503.155800] consume_skb+0x94/0xe0 [ 1503.155889] sctp_chunk_put+0x1aa/0x1f0 [ 1503.155979] sctp_inq_pop+0x2f8/0x6e0 [ 1503.156070] sctp_assoc_bh_rcv+0x6a/0x230 [ 1503.156164] sctp_inq_push+0x117/0x150 [ 1503.156255] sctp_backlog_rcv+0xdf/0x4a0 [ 1503.156346] __release_sock+0x142/0x250 [ 1503.156436] release_sock+0x80/0x180 [ 1503.156526] sctp_sendmsg+0xbb0/0x1b10 [ 1503.156617] sock_sendmsg+0x68/0x80 [ 1503.156708] ___sys_sendmsg+0x431/0x4b0 [ 1503.156799] __sys_sendmsg+0xa4/0x130 [ 1503.156889] do_syscall_64+0x171/0x3f0 [ 1503.156980] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.157158] The buggy address belongs to the object at ffff8800298ab600 which belongs to the cache kmalloc-1024 of size 1024 [ 1503.157444] The buggy address is located 176 bytes inside of 1024-byte region [ffff8800298ab600, ffff8800298aba00) [ 1503.157702] The buggy address belongs to the page: [ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 1503.158053] flags: 0x4000000000008100(slab|head) [ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e [ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000 [ 1503.158523] page dumped because: kasan: bad access detected [ 1503.158698] Memory state around the buggy address: [ 1503.158816] ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.158988] ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1503.159338] ^ [ 1503.159436] ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159610] ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159785] ================================================================== [ 1503.159964] Disabling lock debugging due to kernel taint The test scenario to trigger the issue consists of 4 devices: - H0: data sender, connected to LAN0 - H1: data receiver, connected to LAN1 - GW0 and GW1: routers between LAN0 and LAN1. Both of them have an ethernet connection on LAN0 and LAN1 On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for data from LAN0 to LAN1. Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send buffer size is set to 16K). While data streams are active flush the route cache on HA multiple times. I have not been able to identify a given commit that introduced the issue since, using the reproducer described above, the kasan report has been triggered from 4.14 and I have not gone back further. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31dccp: check sk for closed state in dccp_sendmsg()Alexey Kodanev
[ Upstream commit 67f93df79aeefc3add4e4b31a752600f834236e2 ] dccp_disconnect() sets 'dp->dccps_hc_tx_ccid' tx handler to NULL, therefore if DCCP socket is disconnected and dccp_sendmsg() is called after it, it will cause a NULL pointer dereference in dccp_write_xmit(). This crash and the reproducer was reported by syzbot. Looks like it is reproduced if commit 69c64866ce07 ("dccp: CVE-2017-8824: use-after-free in DCCP code") is applied. Reported-by: syzbot+f99ab3887ab65d70f816@syzkaller.appspotmail.com Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31net: Fix hlist corruptions in inet_evict_bucket()Kirill Tkhai
[ Upstream commit a560002437d3646dafccecb1bf32d1685112ddda ] inet_evict_bucket() iterates global list, and several tasks may call it in parallel. All of them hash the same fq->list_evictor to different lists, which leads to list corruption. This patch makes fq be hashed to expired list only if this has not been made yet by another task. Since inet_frag_alloc() allocates fq using kmem_cache_zalloc(), we may rely on list_evictor is initially unhashed. The problem seems to exist before async pernet_operations, as there was possible to have exit method to be executed in parallel with inet_frags::frags_work, so I add two Fixes tags. This also may go to stable. Fixes: d1fe19444d82 "inet: frag: don't re-use chainlist for evictor" Fixes: f84c6821aa54 "net: Convert pernet_subsys, registered from inet_init()" Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31net: use skb_to_full_sk() in skb_update_prio()Eric Dumazet
[ Upstream commit 4dcb31d4649df36297296b819437709f5407059c ] Andrei Vagin reported a KASAN: slab-out-of-bounds error in skb_update_prio() Since SYNACK might be attached to a request socket, we need to get back to the listener socket. Since this listener is manipulated without locks, add const qualifiers to sock_cgroup_prioidx() so that the const can also be used in skb_update_prio() Also add the const qualifier to sock_cgroup_classid() for consistency. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event()Eric Dumazet
[ Upstream commit ca0edb131bdf1e6beaeb2b8289fd6b374b74147d ] A tun device type can trivially be set to arbitrary value using TUNSETLINK ioctl(). Therefore, lowpan_device_event() must really check that ieee802154_ptr is not NULL. Fixes: 2c88b5283f60d ("ieee802154: 6lowpan: remove check on null") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31sch_netem: fix skb leak in netem_enqueue()Alexey Kodanev
[ Upstream commit 35d889d10b649fda66121891ec05eca88150059d ] When we exceed current packets limit and we have more than one segment in the list returned by skb_gso_segment(), netem drops only the first one, skipping the rest, hence kmemleak reports: unreferenced object 0xffff880b5d23b600 (size 1024): comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s) hex dump (first 32 bytes): 00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00 ..#]............ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520 [<000000001709b32f>] skb_segment+0x8c8/0x3710 [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830 [<00000000c921cba1>] inet_gso_segment+0x476/0x1370 [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510 [<000000002182660a>] __skb_gso_segment+0x1dd/0x620 [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem] [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120 [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00 [<00000000d309e9d3>] ip_output+0x1aa/0x2c0 [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670 [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0 [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540 [<0000000013d06d02>] tasklet_action+0x1ca/0x250 [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3 [<00000000e7ed027c>] irq_exit+0x1e2/0x210 Fix it by adding the rest of the segments, if any, to skb 'to_free' list. Add new __qdisc_drop_all() and qdisc_drop_all() functions because they can be useful in the future if we need to drop segmented GSO packets in other places. Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-31net sched actions: return explicit error when tunnel_key mode is not specifiedRoman Mashak
[ Upstream commit 51d4740f88affd85d49c04e3c9cd129c0e33bcb9 ] If set/unset mode of the tunnel_key action is not provided, ->init() still returns 0, and the caller proceeds with bogus 'struct tc_action *' object, this results in crash: % tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1 [ 35.805515] general protection fault: 0000 [#1] SMP PTI [ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd serio_raw [ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286 [ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190 [ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206 [ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000 [ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910 [ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808 [ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38 [ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910 [ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000) knlGS:0000000000000000 [ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0 [ 35.816457] Call Trace: [ 35.817158] tc_ctl_action+0x11a/0x220 [ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0 [ 35.818457] ? __slab_alloc+0x1c/0x30 [ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0 [ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0 [ 35.820231] netlink_rcv_skb+0xce/0x100 [ 35.820744] netlink_unicast+0x164/0x220 [ 35.821500] netlink_sendmsg+0x293/0x370 [ 35.822040] sock_sendmsg+0x30/0x40 [ 35.822508] ___sys_sendmsg+0x2c5/0x2e0 [ 35.823149] ? pagecache_get_page+0x27/0x220 [ 35.823714] ? filemap_fault+0xa2/0x640 [ 35.824423] ? page_add_file_rmap+0x108/0x200 [ 35.825065] ? alloc_set_pte+0x2aa/0x530 [ 35.825585] ? finish_fault+0x4e/0x70 [ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0 [ 35.826723] ? __sys_sendmsg+0x41/0x70 [ 35.827230] __sys_sendmsg+0x41/0x70 [ 35.827710] do_syscall_64+0x68/0x120 [ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 35.828859] RIP: 0033:0x7f3d0ca4da67 [ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67 [ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003 [ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000 [ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000 [ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640 [ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2 fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48 8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c [ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0 [ 35.838291] ---[ end trace a095c06ee4b97a26 ]--- Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24ip6_vti: adjust vti mtu according to mtu of lower deviceAlexey Kodanev
[ Upstream commit 53c81e95df1793933f87748d36070a721f6cb287 ] LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams over ip6_vti that require fragmentation and the underlying device has an MTU smaller than 1500 plus some extra space for headers. This happens because ip6_vti, by default, sets MTU to ETH_DATA_LEN and not updating it depending on a destination address or link parameter. Further attempts to send UDP packets may succeed because pmtu gets updated on ICMPV6_PKT_TOOBIG in vti6_err(). In case the lower device has larger MTU size, e.g. 9000, ip6_vti works but not using the possible maximum size, output packets have 1500 limit. The above cases require manual MTU setup after ip6_vti creation. However ip_vti already updates MTU based on lower device with ip_tunnel_bind_dev(). Here is the example when the lower device MTU is set to 9000: # ip a sh ltp_ns_veth2 ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ... inet 10.0.0.2/24 scope global ltp_ns_veth2 inet6 fd00::2/64 scope global # ip li add vti6 type vti6 local fd00::2 remote fd00::1 # ip li show vti6 vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ... link/tunnel6 fd00::2 peer fd00::1 After the patch: # ip li add vti6 type vti6 local fd00::2 remote fd00::1 # ip li show vti6 vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ... link/tunnel6 fd00::2 peer fd00::1 Reported-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24netfilter: x_tables: unlock on error in xt_find_table_lock()Dan Carpenter
[ Upstream commit 7dde07e9c53617549d67dd3e1d791496d0d3868e ] According to my static checker we should unlock here before the return. That seems reasonable to me as well. Fixes" b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24mac80211: Fix possible sband related NULL pointer de-referenceMohammed Shafi Shajakhan
[ Upstream commit 21a8e9dd52b64f0170bad208293ef8c30c3c1403 ] Existing API 'ieee80211_get_sdata_band' returns default 2 GHz band even if the channel context configuration is NULL. This crashes for chipsets which support 5 Ghz alone when it tries to access members of 'sband'. Channel context configuration can be NULL in multivif case and when channel switch is in progress (or) when it fails. Fix this by replacing the API 'ieee80211_get_sdata_band' with 'ieee80211_get_sband' which returns a NULL pointer for sband when the channel configuration is NULL. An example scenario is as below: In multivif mode (AP + STA) with drivers like ath10k, when we do a channel switch in the AP vif (which has a number of clients connected) and a STA vif which is connected to some other AP, when the channel switch in AP vif fails, while the STA vifs tries to connect to the other AP, there is a window where the channel context is NULL/invalid and this results in a crash while the clients connected to the AP vif tries to reconnect and this race is very similar to the one investigated by Michal in https://patchwork.kernel.org/patch/3788161/ and this does happens with hardware that supports 5Ghz alone after long hours of testing with continuous channel switch on the AP vif ieee80211 phy0: channel context reservation cannot be finalized because some interfaces aren't switching wlan0: failed to finalize CSA, disconnecting wlan0-1: deauthenticating from 8c:fd:f0:01:54:9c by local choice (Reason: 3=DEAUTH_LEAVING) WARNING: CPU: 1 PID: 19032 at net/mac80211/ieee80211_i.h:1013 sta_info_alloc+0x374/0x3fc [mac80211] [<bf77272c>] (sta_info_alloc [mac80211]) [<bf78776c>] (ieee80211_add_station [mac80211])) [<bf73cc50>] (nl80211_new_station [cfg80211]) Unable to handle kernel NULL pointer dereference at virtual address 00000014 pgd = d5f4c000 Internal error: Oops: 17 [#1] PREEMPT SMP ARM PC is at sta_info_alloc+0x380/0x3fc [mac80211] LR is at sta_info_alloc+0x37c/0x3fc [mac80211] [<bf772738>] (sta_info_alloc [mac80211]) [<bf78776c>] (ieee80211_add_station [mac80211]) [<bf73cc50>] (nl80211_new_station [cfg80211])) Cc: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabledPaolo Abeni
[ Upstream commit 1442f6f7c1b77de1c508318164a527e240c24a4d ] When creating a new ipvs service, ipv6 addresses are always accepted if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family is not explicitly checked. This allows the user-space to configure ipvs services even if the system is booted with ipv6.disable=1. On specific configuration, ipvs can try to call ipv6 routing code at setup time, causing the kernel to oops due to fib6_rules_ops being NULL. This change addresses the issue adding a check for the ipv6 module being enabled while validating ipv6 service operations and adding the same validation for dest operations. According to git history, this issue is apparently present since the introduction of ipv6 support, and the oops can be triggered since commit 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24mac80211: don't parse encrypted management frames in ieee80211_frame_ackedEmmanuel Grumbach
[ Upstream commit cf147085fdda044622973a12e4e06f1c753ab677 ] ieee80211_frame_acked is called when a frame is acked by the peer. In case this is a management frame, we check if this an SMPS frame, in which case we can update our antenna configuration. When we parse the management frame we look at the category in case it is an action frame. That byte sits after the IV in case the frame was encrypted. This means that if the frame was encrypted, we basically look at the IV instead of looking at the category. It is then theorically possible that we think that an SMPS action frame was acked where really we had another frame that was encrypted. Since the only management frame whose ack needs to be tracked is the SMPS action frame, and that frame is not a robust management frame, it will never be encrypted. The easiest way to fix this problem is then to not look at frames that were encrypted. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24xprtrdma: Cancel refresh worker during buffer shutdownChuck Lever
[ Upstream commit 9378b274e1eb6925db315e345f48850d2d5d9789 ] Trying to create MRs while the transport is being torn down can cause a crash. Fixes: e2ac236c0b65 ("xprtrdma: Allocate MRs on demand") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24netfilter: nft_dynset: continue to next expr if _OP_ADD succeededLiping Zhang
[ Upstream commit 277a292835c196894ef895d5e1fd6170bb916f55 ] Currently, after adding the following nft rules: # nft add set x target1 { type ipv4_addr \; flags timeout \;} # nft add rule x y set add ip daddr timeout 1d @target1 counter the counters will always be zero despite of the elements are added to the dynamic set "target1" or not, as we will break the nft expr traversal unconditionally: # nft list ruleset ... set target1 { ... elements = { 8.8.8.8 expires 23h59m53s} } chain output { ... set add ip daddr timeout 1d @target1 counter packets 0 bytes 0 ^ ^ ... } Since we add the elements to the set successfully, we should continue to the next expression. Additionally, if elements are added to "flow table" successfully, we will _always_ continue to the next expr, even if the operation is _OP_ADD. So it's better to keep them to be consistent. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Reported-by: Robert White <rwhite@pobox.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24tipc: check return value of nlmsg_newPan Bian
[ Upstream commit 78302fd405769c9a9379e9adda119d533dce2eed ] Function nlmsg_new() will return a NULL pointer if there is no enough memory, and its return value should be checked before it is used. However, in function tipc_nl_node_get_monitor(), the validation of the return value of function nlmsg_new() is missed. This patch fixes the bug. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlinkLiping Zhang
[ Upstream commit 66e5a6b18bd09d0431e97cd3c162e76c5c2aebba ] cthelpers added via nfnetlink may have the same tuple, i.e. except for the l3proto and l4proto, other fields are all zero. So even with the different names, we will also fail to add them: # nfct helper add ssdp inet udp # nfct helper add tftp inet udp nfct v1.4.3: netlink error: File exists So in order to avoid unpredictable behaviour, we should: 1. cthelpers can be selected by nft ct helper obj or xt_CT target, so report error if duplicated { name, l3proto, l4proto } tuple exist. 2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when nf_ct_auto_assign_helper is enabled, so also report error if duplicated { l3proto, l4proto, src-port } tuple exist. Also note, if the cthelper is added from userspace, then the src-port will always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no need to check the second point listed above. Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24openvswitch: Delete conntrack entry clashing with an expectation.Jarno Rajahalme
[ Upstream commit cf5d70918877c6a6655dc1e92e2ebb661ce904fd ] Conntrack helpers do not check for a potentially clashing conntrack entry when creating a new expectation. Also, nf_conntrack_in() will check expectations (via init_conntrack()) only if a conntrack entry can not be found. The expectation for a packet which also matches an existing conntrack entry will not be removed by conntrack, and is currently handled inconsistently by OVS, as OVS expects the expectation to be removed when the connection tracking entry matching that expectation is confirmed. It should be noted that normally an IP stack would not allow reuse of a 5-tuple of an old (possibly lingering) connection for a new data connection, so this is somewhat unlikely corner case. However, it is possible that a misbehaving source could cause conntrack entries be created that could then interfere with new related connections. Fix this in the OVS module by deleting the clashing conntrack entry after an expectation has been matched. This causes the following nf_conntrack_in() call also find the expectation and remove it when creating the new conntrack entry, as well as the forthcoming reply direction packets to match the new related connection instead of the old clashing conntrack entry. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24netfilter: xt_CT: fix refcnt leak on error pathGao Feng
[ Upstream commit 470acf55a021713869b9bcc967268ac90c8a0fac ] There are two cases which causes refcnt leak. 1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should free the timeout refcnt. Now goto the err_put_timeout error handler instead of going ahead. 2. When the time policy is not found, we should call module_put. Otherwise, the related cthelper module cannot be removed anymore. It is easy to reproduce by typing the following command: # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24tcp: remove poll() flakes with FastOpenEric Dumazet
[ Upstream commit 0f9fa831aecfc297b7b45d4f046759bcefcf87f0 ] When using TCP FastOpen for an active session, we send one wakeup event from tcp_finish_connect(), right before the data eventually contained in the received SYNACK is queued to sk->sk_receive_queue. This means that depending on machine load or luck, poll() users might receive POLLOUT events instead of POLLIN|POLLOUT To fix this, we need to move the call to sk->sk_state_change() after the (optional) call to tcp_rcv_fastopen_synack() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24net: ipv6: send unsolicited NA on admin upDavid Ahern
[ Upstream commit 4a6e3c5def13c91adf2acc613837001f09af3baa ] ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is set to 1, gratuitous arp requests are sent when the device is brought up. The same is expected when ndisc_notify is set to 1 (per ndisc_notify in Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP event; add it. Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22mac80211: remove BUG() when interface type is invalidLuca Coelho
[ Upstream commit c7976f5272486e4ff406014c4b43e2fa3b70b052 ] In the ieee80211_setup_sdata() we check if the interface type is valid and, if not, call BUG(). This should never happen, but if there is something wrong with the code, it will not be caught until the bug happens when an interface is being set up. Calling BUG() is too extreme for this and a WARN_ON() would be better used instead. Change that. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22net: xfrm: allow clearing socket xfrm policies.Lorenzo Colitti
[ Upstream commit be8f8284cd897af2482d4e54fbc2bdfc15557259 ] Currently it is possible to add or update socket policies, but not clear them. Therefore, once a socket policy has been applied, the socket cannot be used for unencrypted traffic. This patch allows (privileged) users to clear socket policies by passing in a NULL pointer and zero length argument to the {IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both the incoming and outgoing policies being cleared. The simple approach taken in this patch cannot clear socket policies in only one direction. If desired this could be added in the future, for example by continuing to pass in a length of zero (which currently is guaranteed to return EMSGSIZE) and making the policy be a pointer to an integer that contains one of the XFRM_POLICY_{IN,OUT} enum values. An alternative would have been to interpret the length as a signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output policy. Tested: https://android-review.googlesource.com/539816 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22Bluetooth: 6lowpan: fix delay work init in add_peer_chan()Michael Scott
[ Upstream commit d2891c4d071d807f01cc911dc42a68f4568d65cf ] When adding 6lowpan devices very rapidly we sometimes see a crash: [23122.306615] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0-43-arm64 #1 Debian 4.9.9.linaro.43-1 [23122.315400] Hardware name: HiKey Development Board (DT) [23122.320623] task: ffff800075443080 task.stack: ffff800075484000 [23122.326551] PC is at expire_timers+0x70/0x150 [23122.330907] LR is at run_timer_softirq+0xa0/0x1a0 [23122.335616] pc : [<ffff000008142dd8>] lr : [<ffff000008142f58>] pstate: 600001c5 This was due to add_peer_chan() unconditionally initializing the lowpan_btle_dev->notify_peers delayed work structure, even if the lowpan_btle_dev passed into add_peer_chan() had previously been initialized. Normally, this would go unnoticed as the delayed work timer is set for 100 msec, however when calling add_peer_chan() faster than 100 msec it clears out a previously queued delay work causing the crash above. To fix this, let add_peer_chan() know when a new lowpan_btle_dev is passed in so that it only performs the delay work initialization when needed. Signed-off-by: Michael Scott <michael.scott@linaro.org> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22Bluetooth: Avoid bt_accept_unlink() double unlinkingDean Jenkins
[ Upstream commit 27bfbc21a0c0f711fa5382de026c7c0700c9ea28 ] There is a race condition between a thread calling bt_accept_dequeue() and a different thread calling bt_accept_unlink(). Protection against concurrency is implemented using sk locking. However, sk locking causes serialisation of the bt_accept_dequeue() and bt_accept_unlink() threads. This serialisation can cause bt_accept_dequeue() to obtain the sk from the parent list but becomes blocked waiting for the sk lock held by the bt_accept_unlink() thread. bt_accept_unlink() unlinks sk and this thread releases the sk lock unblocking bt_accept_dequeue() which potentially runs bt_accept_unlink() again on the same sk causing a crash. The attempt to double unlink the same sk from the parent list can cause a NULL pointer dereference crash due to bt_sk(sk)->parent becoming NULL on the first unlink, followed by the second unlink trying to execute bt_sk(sk)->parent->sk_ack_backlog-- in bt_accept_unlink() which crashes. When sk is in the parent list, bt_sk(sk)->parent will be not be NULL. When sk is removed from the parent list, bt_sk(sk)->parent is set to NULL. Therefore, add a defensive check for bt_sk(sk)->parent not being NULL to ensure that sk is still in the parent list after the sk lock has been taken in bt_accept_dequeue(). If bt_sk(sk)->parent is detected as being NULL then restart the loop so that the loop variables are refreshed to use the latest values. This is necessary as list_for_each_entry_safe() is not thread safe so causing a risk of an infinite loop occurring as sk could point to itself. In addition, in bt_accept_dequeue() increase the sk reference count to protect against early freeing of sk. Early freeing can be possible if the bt_accept_unlink() thread calls l2cap_sock_kill() or rfcomm_sock_kill() functions before bt_accept_dequeue() gets the sk lock. For test purposes, the probability of failure can be increased by putting a msleep of 1 second in bt_accept_dequeue() between getting the sk and waiting for the sk lock. This exposes the fact that the loop list_for_each_entry_safe(p, n, &bt_sk(parent)->accept_q) is not safe from threads that unlink sk from the list in parallel with the loop which can cause sk to become stale within the loop. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22sched: act_csum: don't mangle TCP and UDP GSO packetsDavide Caratti
[ Upstream commit add641e7dee31b36aee83412c29e39dd1f5e0c9c ] after act_csum computes the checksum on skbs carrying GSO TCP/UDP packets, subsequent segmentation fails because skb_needs_check(skb, true) returns true. Because of that, skb_warn_bad_offload() is invoked and the following message is displayed: WARNING: CPU: 3 PID: 28 at net/core/dev.c:2553 skb_warn_bad_offload+0xf0/0xfd <...> [<ffffffff8171f486>] skb_warn_bad_offload+0xf0/0xfd [<ffffffff8161304c>] __skb_gso_segment+0xec/0x110 [<ffffffff8161340d>] validate_xmit_skb+0x12d/0x2b0 [<ffffffff816135d2>] validate_xmit_skb_list+0x42/0x70 [<ffffffff8163c560>] sch_direct_xmit+0xd0/0x1b0 [<ffffffff8163c760>] __qdisc_run+0x120/0x270 [<ffffffff81613b3d>] __dev_queue_xmit+0x23d/0x690 [<ffffffff81613fa0>] dev_queue_xmit+0x10/0x20 Since GSO is able to compute checksum on individual segments of such skbs, we can simply skip mangling the packet. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22batman-adv: handle race condition for claims between gatewaysAndreas Pape
[ Upstream commit a3a5129e122709306cfa6409781716c2933df99b ] Consider the following situation which has been found in a test setup: Gateway B has claimed client C and gateway A has the same backbone network as B. C sends a broad- or multicast to B and directly after this packet decides to send another packet to A due to a better TQ value. B will forward the broad-/multicast into the backbone as it is the responsible gw and after that A will claim C as it has been chosen by C as the best gateway. If it now happens that A claims C before it has received the broad-/multicast forwarded by B (due to backbone topology or due to some delay in B when forwarding the packet) we get a critical situation: in the current code A will immediately unclaim C when receiving the multicast due to the roaming client scenario although the position of C has not changed in the mesh. If this happens the multi-/broadcast forwarded by B will be sent back into the mesh by A and we have looping packets until one of the gateways claims C again. In order to prevent this, unclaiming of a client due to the roaming client scenario is only done after a certain time is expired after the last claim of the client. 100 ms are used here, which should be slow enough for big backbones and slow gateways but fast enough not to break the roaming client use case. Acked-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Andreas Pape <apape@phoenixcontact.com> [sven@narfation.org: fix conflicts with current version] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22net/8021q: create device with all possible features in wanted_featuresAndrey Vagin
[ Upstream commit 88997e4208aea117627898e5f6f9801cf3cd42d2 ] wanted_features is a set of features which have to be enabled if a hardware allows that. Currently when a vlan device is created, its wanted_features is set to current features of its base device. The problem is that the base device can get new features and they are not propagated to vlan-s of this device. If we look at bonding devices, they doesn't have this problem and this patch suggests to fix this issue by the same way how it works for bonding devices. We meet this problem, when we try to create a vlan device over a bonding device. When a system are booting, real devices require time to be initialized, so bonding devices created without slaves, then vlan devices are created and only then ethernet devices are added to the bonding device. As a result we have vlan devices with disabled scatter-gather. * create a bonding device $ ip link add bond0 type bond $ ethtool -k bond0 | grep scatter scatter-gather: off tx-scatter-gather: off [requested on] tx-scatter-gather-fraglist: off [requested on] * create a vlan device $ ip link add link bond0 name bond0.10 type vlan id 10 $ ethtool -k bond0.10 | grep scatter scatter-gather: off tx-scatter-gather: off tx-scatter-gather-fraglist: off * Add a slave device to bond0 $ ip link set dev eth0 master bond0 And now we can see that the bond0 device has got the scatter-gather feature, but the bond0.10 hasn't got it. [root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: on [root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter scatter-gather: off tx-scatter-gather: off tx-scatter-gather-fraglist: off With this patch the vlan device will get all new features from the bonding device. Here is a call trace how features which are set in this patch reach dev->wanted_features. register_netdevice vlan_dev_init ... dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE | NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC | NETIF_F_ALL_FCOE; dev->features |= dev->hw_features; ... dev->wanted_features = dev->features & dev->hw_features; __netdev_update_features(dev); vlan_dev_fix_features ... Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22netem: apply correct delay when rate throttlingNik Unger
[ Upstream commit 5080f39e8c72e01cf37e8359023e7018e2a4901e ] I recently reported on the netem list that iperf network benchmarks show unexpected results when a bandwidth throttling rate has been configured for netem. Specifically: 1) The measured link bandwidth *increases* when a higher delay is added 2) The measured link bandwidth appears higher than the specified limit 3) The measured link bandwidth for the same very slow settings varies significantly across machines The issue can be reproduced by using tc to configure netem with a 512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a veth pair between network namespaces, and then using iperf (or any other network benchmarking tool) to test throughput. Complete detailed instructions are in the original email chain here: https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html There appear to be two underlying bugs causing these effects: - The first issue causes long delays when the rate is slow and no delay is configured (e.g., "rate 512kbit"). This is because SKBs are not orphaned when no delay is configured, so orphaning does not occur until *after* the rate-induced delay has been applied. For this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us") dramatically increases the measured bandwidth. - The second issue is that rate-induced delays are not correctly applied, allowing SKB delays to occur in parallel. The indended approach is to compute the delay for an SKB and to add this delay to the end of the current queue. However, the code does not detect existing SKBs in the queue due to improperly testing sch->q.qlen, which is nonzero even when packets exist only in the rbtree. Consequently, new SKBs do not wait for the current queue to empty. When packet delays vary significantly (e.g., if packet sizes are different), then this also causes unintended reordering. I modified the code to expect a delay (and orphan the SKB) when a rate is configured. I also added some defensive tests that correctly find the latest scheduled delivery time, even if it is (unexpectedly) for a packet in sch->q. I have tested these changes on the latest kernel (4.11.0-rc1+) and the iperf / ping test results are as expected. Signed-off-by: Nik Unger <njunger@uwaterloo.ca> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: x_tables: pack percpu counter allocationsFlorian Westphal
commit ae0ac0ed6fcf5af3be0f63eb935f483f44a402d2 upstream. instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: x_tables: pass xt_counters struct to counter allocatorFlorian Westphal
commit f28e15bacedd444608e25421c72eb2cf4527c9ca upstream. Keeps some noise away from a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: x_tables: pass xt_counters struct instead of packet counterFlorian Westphal
commit 4d31eef5176df06f218201bc9c0ce40babb41660 upstream. On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>