summaryrefslogtreecommitdiff
path: root/drivers/net/bonding
AgeCommit message (Collapse)Author
2020-03-20bonding/alb: make sure arp header is pulled before accessing itEric Dumazet
commit b7469e83d2add567e4e0b063963db185f3167cea upstream. Similar to commit 38f88c454042 ("bonding/alb: properly access headers in bond_alb_xmit()"), we need to make sure arp header was pulled in skb->head before blindly accessing it in rlb_arp_xmit(). Remove arp_pkt() private helper, since it is more readable/obvious to have the following construct back to back : if (!pskb_network_may_pull(skb, sizeof(*arp))) return NULL; arp = (struct arp_pkt *)skb_network_header(skb); syzbot reported : BUG: KMSAN: uninit-value in bond_slave_has_mac_rx include/net/bonding.h:704 [inline] BUG: KMSAN: uninit-value in rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline] BUG: KMSAN: uninit-value in bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477 CPU: 0 PID: 12743 Comm: syz-executor.4 Not tainted 5.6.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 bond_slave_has_mac_rx include/net/bonding.h:704 [inline] rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline] bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477 __bond_start_xmit drivers/net/bonding/bond_main.c:4257 [inline] bond_start_xmit+0x85d/0x2f70 drivers/net/bonding/bond_main.c:4282 __netdev_start_xmit include/linux/netdevice.h:4524 [inline] netdev_start_xmit include/linux/netdevice.h:4538 [inline] xmit_one net/core/dev.c:3470 [inline] dev_hard_start_xmit+0x531/0xab0 net/core/dev.c:3486 __dev_queue_xmit+0x37de/0x4220 net/core/dev.c:4063 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4096 packet_snd net/packet/af_packet.c:2967 [inline] packet_sendmsg+0x8347/0x93b0 net/packet/af_packet.c:2992 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] __sys_sendto+0xc1b/0xc50 net/socket.c:1998 __do_sys_sendto net/socket.c:2010 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:2006 __x64_sys_sendto+0x6e/0x90 net/socket.c:2006 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c479 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fc77ffbbc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fc77ffbc6d4 RCX: 000000000045c479 RDX: 000000000000000e RSI: 00000000200004c0 RDI: 0000000000000003 RBP: 000000000076bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000a04 R14: 00000000004cc7b0 R15: 000000000076bf2c Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2793 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1051 [inline] alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766 sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242 packet_alloc_skb net/packet/af_packet.c:2815 [inline] packet_snd net/packet/af_packet.c:2910 [inline] packet_sendmsg+0x66a0/0x93b0 net/packet/af_packet.c:2992 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] __sys_sendto+0xc1b/0xc50 net/socket.c:1998 __do_sys_sendto net/socket.c:2010 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:2006 __x64_sys_sendto+0x6e/0x90 net/socket.c:2006 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14bonding/alb: properly access headers in bond_alb_xmit()Eric Dumazet
[ Upstream commit 38f88c45404293bbc027b956def6c10cbd45c616 ] syzbot managed to send an IPX packet through bond_alb_xmit() and af_packet and triggered a use-after-free. First, bond_alb_xmit() was using ipx_hdr() helper to reach the IPX header, but ipx_hdr() was using the transport offset instead of the network offset. In the particular syzbot report transport offset was 0xFFFF This patch removes ipx_hdr() since it was only (mis)used from bonding. Then we need to make sure IPv4/IPv6/IPX headers are pulled in skb->head before dereferencing anything. BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108 (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline] [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53 [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282 [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline] [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline] [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422 [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469 [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline] [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224 [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline] [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline] [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline] [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627 [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238 [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278 [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline] [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252 [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline] [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684 [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996 [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline] [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12bonding: fix unexpected IFF_BONDING bit unsetTaehee Yoo
[ Upstream commit 65de65d9033750d2cf1b336c9d6e9da3a8b5cc6e ] The IFF_BONDING means bonding master or bonding slave device. ->ndo_add_slave() sets IFF_BONDING flag and ->ndo_del_slave() unsets IFF_BONDING flag. bond0<--bond1 Both bond0 and bond1 are bonding device and these should keep having IFF_BONDING flag until they are removed. But bond1 would lose IFF_BONDING at ->ndo_del_slave() because that routine do not check whether the slave device is the bonding type or not. This patch adds the interface type check routine before removing IFF_BONDING flag. Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond1 master bond0 ip link set bond1 nomaster ip link del bond1 type bond ip link add bond1 type bond Splat looks like: [ 226.665555] proc_dir_entry 'bonding/bond1' already registered [ 226.666440] WARNING: CPU: 0 PID: 737 at fs/proc/generic.c:361 proc_register+0x2a9/0x3e0 [ 226.667571] Modules linked in: bonding af_packet sch_fq_codel ip_tables x_tables unix [ 226.668662] CPU: 0 PID: 737 Comm: ip Not tainted 5.4.0-rc3+ #96 [ 226.669508] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 226.670652] RIP: 0010:proc_register+0x2a9/0x3e0 [ 226.671612] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 39 01 00 00 48 8b 04 24 48 89 ea 48 c7 c7 a0 0b 14 9f 48 8b b0 e 0 00 00 00 e8 07 e7 88 ff <0f> 0b 48 c7 c7 40 2d a5 9f e8 59 d6 23 01 48 8b 4c 24 10 48 b8 00 [ 226.675007] RSP: 0018:ffff888050e17078 EFLAGS: 00010282 [ 226.675761] RAX: dffffc0000000008 RBX: ffff88805fdd0f10 RCX: ffffffff9dd344e2 [ 226.676757] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f6b8c [ 226.677751] RBP: ffff8880507160f3 R08: ffffed100d940019 R09: ffffed100d940019 [ 226.678761] R10: 0000000000000001 R11: ffffed100d940018 R12: ffff888050716008 [ 226.679757] R13: ffff8880507160f2 R14: dffffc0000000000 R15: ffffed100a0e2c1e [ 226.680758] FS: 00007fdc217cc0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 226.681886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 226.682719] CR2: 00007f49313424d0 CR3: 0000000050e46001 CR4: 00000000000606f0 [ 226.683727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 226.684725] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 226.685681] Call Trace: [ 226.687089] proc_create_seq_private+0xb3/0xf0 [ 226.687778] bond_create_proc_entry+0x1b3/0x3f0 [bonding] [ 226.691458] bond_netdev_event+0x433/0x970 [bonding] [ 226.692139] ? __module_text_address+0x13/0x140 [ 226.692779] notifier_call_chain+0x90/0x160 [ 226.693401] register_netdevice+0x9b3/0xd80 [ 226.694010] ? alloc_netdev_mqs+0x854/0xc10 [ 226.694629] ? netdev_change_features+0xa0/0xa0 [ 226.695278] ? rtnl_create_link+0x2ed/0xad0 [ 226.695849] bond_newlink+0x2a/0x60 [bonding] [ 226.696422] __rtnl_newlink+0xb9f/0x11b0 [ 226.696968] ? rtnl_link_unregister+0x220/0x220 [ ... ] Fixes: 0b680e753724 ("[PATCH] bonding: Add priv_flag to avoid event mishandling") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06bonding: fix potential NULL deref in bond_update_slave_arrEric Dumazet
commit a7137534b597b7c303203e6bc3ed87e87a273bb8 upstream. syzbot got a NULL dereference in bond_update_slave_arr() [1], happening after a failure to allocate bond->slave_arr A workqueue (bond_slave_arr_handler) is supposed to retry the allocation later, but if the slave is removed before the workqueue had a chance to complete, bond->slave_arr can still be NULL. [1] Failed to build slave-array. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI Modules linked in: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bond_update_slave_arr.cold+0xc6/0x198 drivers/net/bonding/bond_main.c:4039 RSP: 0018:ffff88018fe33678 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000290b000 RDX: 0000000000000000 RSI: ffffffff82b63037 RDI: ffff88019745ea20 RBP: ffff88018fe33760 R08: ffff880170754280 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88019745ea00 R14: 0000000000000000 R15: ffff88018fe338b0 FS: 00007febd837d700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004540a0 CR3: 00000001c242e005 CR4: 00000000001626f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff82b5b45e>] __bond_release_one+0x43e/0x500 drivers/net/bonding/bond_main.c:1923 [<ffffffff82b5b966>] bond_release drivers/net/bonding/bond_main.c:2039 [inline] [<ffffffff82b5b966>] bond_do_ioctl+0x416/0x870 drivers/net/bonding/bond_main.c:3562 [<ffffffff83ae25f4>] dev_ifsioc+0x6f4/0x940 net/core/dev_ioctl.c:328 [<ffffffff83ae2e58>] dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:495 [<ffffffff83995ffd>] sock_do_ioctl+0x1bd/0x300 net/socket.c:1088 [<ffffffff83996a80>] sock_ioctl+0x300/0x5d0 net/socket.c:1196 [<ffffffff81b124db>] vfs_ioctl fs/ioctl.c:47 [inline] [<ffffffff81b124db>] file_ioctl fs/ioctl.c:501 [inline] [<ffffffff81b124db>] do_vfs_ioctl+0xacb/0x1300 fs/ioctl.c:688 [<ffffffff81b12dc6>] SYSC_ioctl fs/ioctl.c:705 [inline] [<ffffffff81b12dc6>] SyS_ioctl+0xb6/0xe0 fs/ioctl.c:696 [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305 [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: ee6377147409 ("bonding: Simplify the xmit function for modes that use xmit_hash") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-06bonding: Force slave speed check after link state recovery for 802.3adThomas Falcon
[ Upstream commit 12185dfe44360f814ac4ead9d22ad2af7511b2e9 ] The following scenario was encountered during testing of logical partition mobility on pseries partitions with bonded ibmvnic adapters in LACP mode. 1. Driver receives a signal that the device has been swapped, and it needs to reset to initialize the new device. 2. Driver reports loss of carrier and begins initialization. 3. Bonding driver receives NETDEV_CHANGE notifier and checks the slave's current speed and duplex settings. Because these are unknown at the time, the bond sets its link state to BOND_LINK_FAIL and handles the speed update, clearing AD_PORT_LACP_ENABLE. 4. Driver finishes recovery and reports that the carrier is on. 5. Bond receives a new notification and checks the speed again. The speeds are valid but miimon has not altered the link state yet. AD_PORT_LACP_ENABLE remains off. Because the slave's link state is still BOND_LINK_FAIL, no further port checks are made when it recovers. Though the slave devices are operational and have valid speed and duplex settings, the bond will not send LACPDU's. The simplest fix I can see is to force another speed check in bond_miimon_commit. This way the bond will update AD_PORT_LACP_ENABLE if needed when transitioning from BOND_LINK_FAIL to BOND_LINK_UP. CC: Jarod Wilson <jarod@redhat.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-25bonding: Add vlan tx offload to hw_enc_featuresYueHaibing
[ Upstream commit d595b03de2cb0bdf9bcdf35ff27840cc3a37158f ] As commit 30d8177e8ac7 ("bonding: Always enable vlan tx offload") said, we should always enable bonding's vlan tx offload, pass the vlan packets to the slave devices with vlan tci, let them to handle vlan implementation. Now if encapsulation protocols like VXLAN is used, skb->encapsulation may be set, then the packet is passed to vlan device which based on bonding device. However in netif_skb_features(), the check of hw_enc_features: if (skb->encapsulation) features &= dev->hw_enc_features; clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results in same issue in commit 30d8177e8ac7 like this: vlan_dev_hard_start_xmit -->dev_queue_xmit -->validate_xmit_skb -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared -->validate_xmit_vlan -->__vlan_hwaccel_push_inside //skb->tci is cleared ... --> bond_start_xmit --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34 --> __skb_flow_dissect // nhoff point to IP header --> case htons(ETH_P_8021Q) // skb_vlan_tag_present is false, so vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), //vlan point to ip header wrongly Fixes: b2a103e6d0af ("bonding: convert to ndo_fix_features") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04bonding: validate ip header before check IPPROTO_IGMPCong Wang
[ Upstream commit 9d1bc24b52fb8c5d859f9a47084bf1179470e04c ] bond_xmit_roundrobin() checks for IGMP packets but it parses the IP header even before checking skb->protocol. We should validate the IP header with pskb_may_pull() before using iph->protocol. Reported-and-tested-by: syzbot+e5be16aa39ad6e755391@syzkaller.appspotmail.com Fixes: a2fd940f4cff ("bonding: fix broken multicast with round-robin mode") Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10bonding: Always enable vlan tx offloadYueHaibing
[ Upstream commit 30d8177e8ac776d89d387fad547af6a0f599210e ] We build vlan on top of bonding interface, which vlan offload is off, bond mode is 802.3ad (LACP) and xmit_hash_policy is BOND_XMIT_POLICY_ENCAP34. Because vlan tx offload is off, vlan tci is cleared and skb push the vlan header in validate_xmit_vlan() while sending from vlan devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to get information from protocol headers encapsulated within vlan, because 'nhoff' is points to IP header, so bond hashing is based on layer 2 info, which fails to distribute packets across slaves. This patch always enable bonding's vlan tx offload, pass the vlan packets to the slave devices with vlan tci, let them to handle vlan implementation. Fixes: 278339a42a1b ("bonding: propogate vlan_features to bonding master") Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16bonding: fix arp_validate toggling in active-backup modeJarod Wilson
[ Upstream commit a9b8a2b39ce65df45687cf9ef648885c2a99fe75 ] There's currently a problem with toggling arp_validate on and off with an active-backup bond. At the moment, you can start up a bond, like so: modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1 ip link set bond0 down echo "ens4f0" > /sys/class/net/bond0/bonding/slaves echo "ens4f1" > /sys/class/net/bond0/bonding/slaves ip link set bond0 up ip addr add 192.168.1.2/24 dev bond0 Pings to 192.168.1.1 work just fine. Now turn on arp_validate: echo 1 > /sys/class/net/bond0/bonding/arp_validate Pings to 192.168.1.1 continue to work just fine. Now when you go to turn arp_validate off again, the link falls flat on it's face: echo 0 > /sys/class/net/bond0/bonding/arp_validate dmesg ... [133191.911987] bond0: Setting arp_validate to none (0) [133194.257793] bond0: bond_should_notify_peers: slave ens4f0 [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it [133194.259000] bond0: making interface ens4f1 the new active one [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it [133197.331191] bond0: now running without any active interface! The problem lies in bond_options.c, where passing in arp_validate=0 results in bond->recv_probe getting set to NULL. This flies directly in the face of commit 3fe68df97c7f, which says we need to set recv_probe = bond_arp_recv, even if we're not using arp_validate. Said commit fixed this in bond_option_arp_interval_set, but missed that we can get to that same state in bond_option_arp_validate_set as well. One solution would be to universally set recv_probe = bond_arp_recv here as well, but I don't think bond_option_arp_validate_set has any business touching recv_probe at all, and that should be left to the arp_interval code, so we can just make things much tidier here. Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor") CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-08bonding: show full hw address in sysfs for slave entriesKonstantin Khorenko
[ Upstream commit 18bebc6dd3281955240062655a4df35eef2c46b3 ] Bond expects ethernet hwaddr for its slave, but it can be longer than 6 bytes - infiniband interface for example. # cat /sys/devices/<skipped>/net/ib0/address 80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1 # cat /sys/devices/<skipped>/net/ib0/bonding_slave/perm_hwaddr 80:00:02:08:fe:80 So print full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-27bonding: fix event handling for stacked bondsSabrina Dubroca
[ Upstream commit 92480b3977fd3884649d404cbbaf839b70035699 ] When a bond is enslaved to another bond, bond_netdev_event() only handles the event as if the bond is a master, and skips treating the bond as a slave. This leads to a refcount leak on the slave, since we don't remove the adjacency to its master and the master holds a reference on the slave. Reproducer: ip link add bondL type bond ip link add bondU type bond ip link set bondL master bondU ip link del bondL No "Fixes:" tag, this code is older than git history. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-23bonding: update nest level on unlinkWillem de Bruijn
[ Upstream commit 001e465f09a18857443489a57e74314a3368c805 ] A network device stack with multiple layers of bonding devices can trigger a false positive lockdep warning. Adding lockdep nest levels fixes this. Update the level on both enslave and unlink, to avoid the following series of events .. ip netns add test ip netns exec test bash ip link set dev lo addr 00:11:22:33:44:55 ip link set dev lo down ip link add dev bond1 type bond ip link add dev bond2 type bond ip link set dev lo master bond1 ip link set dev bond1 master bond2 ip link set dev bond1 nomaster ip link set dev bond2 master bond1 .. from still generating a splat: [ 193.652127] ====================================================== [ 193.658231] WARNING: possible circular locking dependency detected [ 193.664350] 4.20.0 #8 Not tainted [ 193.668310] ------------------------------------------------------ [ 193.674417] ip/15577 is trying to acquire lock: [ 193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290 [ 193.687851] but task is already holding lock: [ 193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290 [..] [ 193.851092] lock_acquire+0xa7/0x190 [ 193.855138] _raw_spin_lock_nested+0x2d/0x40 [ 193.859878] bond_get_stats+0x58/0x290 [ 193.864093] dev_get_stats+0x5a/0xc0 [ 193.868140] bond_get_stats+0x105/0x290 [ 193.872444] dev_get_stats+0x5a/0xc0 [ 193.876493] rtnl_fill_stats+0x40/0x130 [ 193.880797] rtnl_fill_ifinfo+0x6c5/0xdc0 [ 193.885271] rtmsg_ifinfo_build_skb+0x86/0xe0 [ 193.890091] rtnetlink_event+0x5b/0xa0 [ 193.894320] raw_notifier_call_chain+0x43/0x60 [ 193.899225] netdev_change_features+0x50/0xa0 [ 193.904044] bond_compute_features.isra.46+0x1ab/0x270 [ 193.909640] bond_enslave+0x141d/0x15b0 [ 193.913946] do_set_master+0x89/0xa0 [ 193.918016] do_setlink+0x37c/0xda0 [ 193.921980] __rtnl_newlink+0x499/0x890 [ 193.926281] rtnl_newlink+0x48/0x70 [ 193.930238] rtnetlink_rcv_msg+0x171/0x4b0 [ 193.934801] netlink_rcv_skb+0xd1/0x110 [ 193.939103] rtnetlink_rcv+0x15/0x20 [ 193.943151] netlink_unicast+0x3b5/0x520 [ 193.947544] netlink_sendmsg+0x2fd/0x3f0 [ 193.951942] sock_sendmsg+0x38/0x50 [ 193.955899] ___sys_sendmsg+0x2ba/0x2d0 [ 193.960205] __x64_sys_sendmsg+0xad/0x100 [ 193.964687] do_syscall_64+0x5a/0x460 [ 193.968823] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 7e2556e40026 ("bonding: avoid lockdep confusion in bond_get_stats()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-21bonding: fix 802.3ad state sent to partner when unbinding slaveToni Peltonen
[ Upstream commit 3b5b3a3331d141e8f2a7aaae3a94dfa1e61ecbe4 ] Previously when unbinding a slave the 802.3ad implementation only told partner that the port is not suitable for aggregation by setting the port aggregation state from aggregatable to individual. This is not enough. If the physical layer still stays up and we only unbinded this port from the bond there is nothing in the aggregation status alone to prevent the partner from sending traffic towards us. To ensure that the partner doesn't consider this port at all anymore we should also disable collecting and distributing to signal that this actor is going away. Also clear AD_STATE_SYNCHRONIZATION to ensure partner exits collecting + distributing state. I have tested this behaviour againts Arista EOS switches with mlx5 cards (physical link stays up even when interface is down) and simulated the same situation virtually Linux <-> Linux with two network namespaces running two veth device pairs. In both cases setting aggregation to individual doesn't alone prevent traffic from being to sent towards this port given that the link stays up in partners end. Partner still keeps it's end in collecting + distributing state and continues until timeout is reached. In most cases this means we are losing the traffic partner sends towards our port while we wait for timeout. This is most visible with slow periodic time (LACP rate slow). Other open source implementations like Open VSwitch and libreswitch, and vendor implementations like Arista EOS, seem to disable collecting + distributing to when doing similar port disabling/detaching/removing change. With this patch kernel implementation would behave the same way and ensure partner doesn't consider our actor viable anymore. Signed-off-by: Toni Peltonen <peltzi@peltzi.fi> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10bonding: fix length of actor systemTobias Jungel
[ Upstream commit 414dd6fb9a1a1b59983aea7bf0f79f0085ecc5b8 ] The attribute IFLA_BOND_AD_ACTOR_SYSTEM is sent to user space having the length of sizeof(bond->params.ad_actor_system) which is 8 byte. This patch aligns the length to ETH_ALEN to have the same MAC address exposed as using sysfs. Fixes: f87fda00b6ed2 ("bonding: prevent out of bound accesses") Signed-off-by: Tobias Jungel <tobias.jungel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-10bonding: ratelimit failed speed/duplex update warningAndreas Born
[ Upstream commit 11e9d7829dd08dbafb24517fe922f11c3a8a9dc2 ] bond_miimon_commit() handles the UP transition for each slave of a bond in the case of MII. It is triggered 10 times per second for the default MII Polling interval of 100ms. For device drivers that do not implement __ethtool_get_link_ksettings() the call to bond_update_speed_duplex() fails persistently while the MII status could remain UP. That is, in this and other cases where the speed/duplex update keeps failing over a longer period of time while the MII state is UP, a warning is printed every MII polling interval. To address these excessive warnings net_ratelimit() should be used. Printing a warning once would not be sufficient since the call to bond_update_speed_duplex() could recover to succeed and fail again later. In that case there would be no new indication what went wrong. Fixes: b5bf0f5b16b9c (bonding: correctly update link status during mii-commit phase) Signed-off-by: Andreas Born <futur.andy@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-18bonding: avoid possible dead-lockMahesh Bandewar
[ Upstream commit d4859d749aa7090ffb743d15648adb962a1baeae ] Syzkaller reported this on a slightly older kernel but it's still applicable to the current kernel - ====================================================== WARNING: possible circular locking dependency detected 4.18.0-next-20180823+ #46 Not tainted ------------------------------------------------------ syz-executor4/26841 is trying to acquire lock: 00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652 but task is already holding lock: 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline] 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline] bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320 process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}: process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #0 ((wq_completion)bond_dev->name){+.+.}: lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock((work_completion)(&(&nnw->work)->work)); lock(rtnl_mutex); lock((wq_completion)bond_dev->name); *** DEADLOCK *** 1 lock held by syz-executor4/26841: stack backtrace: CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222 check_prev_add kernel/locking/lockdep.c:1862 [inline] check_prevs_add kernel/locking/lockdep.c:1975 [inline] validate_chain kernel/locking/lockdep.c:2416 [inline] __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412 lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457089 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-06bonding: avoid lockdep confusion in bond_get_stats()Eric Dumazet
[ Upstream commit 7e2556e40026a1b0c16f37446ab398d5a5a892e4 ] syzbot found that the following sequence produces a LOCKDEP splat [1] ip link add bond10 type bond ip link add bond11 type bond ip link set bond11 master bond10 To fix this, we can use the already provided nest_level. This patch also provides correct nesting for dev->addr_list_lock [1] WARNING: possible recursive locking detected 4.18.0-rc6+ #167 Not tainted -------------------------------------------- syz-executor751/4439 is trying to acquire lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 but task is already holding lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&bond->stats_lock)->rlock); lock(&(&bond->stats_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor751/4439: #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215 stack backtrace: CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline] check_deadlock kernel/locking/lockdep.c:1809 [inline] validate_chain kernel/locking/lockdep.c:2405 [inline] __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144 spin_lock include/linux/spinlock.h:310 [inline] bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169 rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611 rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268 rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300 rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline] rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716 notifier_call_chain+0x180/0x390 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735 call_netdevice_notifiers net/core/dev.c:1753 [inline] netdev_features_change net/core/dev.c:1321 [inline] netdev_change_features+0xb3/0x110 net/core/dev.c:7759 bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120 bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755 bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528 dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327 dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493 sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992 sock_ioctl+0x30d/0x680 net/socket.c:1093 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440859 Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859 RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003 RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390 R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26bonding: re-evaluate force_primary when the primary slave name changesXiangning Yu
[ Upstream commit eb55bbf865d9979098c6a7a17cbdb41237ece951 ] There is a timing issue under active-standy mode, when bond_enslave() is called, bond->params.primary might not be initialized yet. Any time the primary slave string changes, bond->force_primary should be set to true to make sure the primary becomes the active slave. Signed-off-by: Xiangning Yu <yuxiangning@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16bonding: require speed/duplex only for 802.3ad, alb and tlbAndreas Born
commit ad729bc9acfb7c47112964b4877ef5404578ed13 upstream. The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") puts the link state to down if bond_update_speed_duplex() cannot retrieve speed and duplex settings. Assumably the patch was written with 802.3ad mode in mind which relies on link speed/duplex settings. For other modes like active-backup these settings are not required. Thus, only for these other modes, this patch reintroduces support for slaves that do not support reporting speed or duplex such as wireless devices. This fixes the regression reported in bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547). Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Andreas Born <futur.andy@googlemail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Nate Clark <nate@neworld.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16bonding: fix active-backup transitionMahesh Bandewar
commit 3f3c278c94dd994fe0d9f21679ae19b9c0a55292 upstream. Earlier patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") made an attempt to keep slave state consistent with speed and duplex settings. Unfortunately link-state transition is used to change the active link especially when used in conjunction with mii-mon. The above mentioned patch broke that logic. Also when speed and duplex settings for a link are updated during a link-event, the link-status should not be changed to invoke correct transition logic. This patch fixes this issue by moving the link-state update outside of the bond_update_speed_duplex() fn and to the places where this fn is called and update link-state selectively. Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Nate Clark <nate@neworld.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16bonding: correctly update link status during mii-commit phaseMahesh Bandewar
commit b5bf0f5b16b9c316c34df9f31d4be8729eb86845 upstream. bond_miimon_commit() marks the link UP after attempting to get the speed and duplex settings for the link. There is a possibility that bond_update_speed_duplex() could fail. This is another place where it could result into an inconsistent bonding link state. With this patch the link will be marked UP only if the speed and duplex values retrieved have sane values and processed further. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Nate Clark <nate@neworld.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-19bonding: send learning packets for vlans on slaveDebabrata Banerjee
[ Upstream commit 21706ee8a47d3ede7fdae0be6d7c0a0e31a83229 ] There was a regression at some point from the intended functionality of commit f60c3704e87d ("bonding: Fix alb mode to only use first level vlans.") Given the return value vlan_get_encap_level() we need to store the nest level of the bond device, and then compare the vlan's encap level to this. Without this, this check always fails and learning packets are never sent. In addition, this same commit caused a regression in the behavior of balance_alb, which requires learning packets be sent for all interfaces using the slave's mac in order to load balance properly. For vlan's that have not set a user mac, we can send after checking one bit. Otherwise we need send the set mac, albeit defeating rx load balancing for that vlan. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-19bonding: do not allow rlb updates to invalid macDebabrata Banerjee
[ Upstream commit 4fa8667ca3989ce14cf66301fa251544fbddbdd0 ] Make sure multicast, broadcast, and zero mac's cannot be the output of rlb updates, which should all be directed arps. Receive load balancing will be collapsed if any of these happen, as the switch will broadcast. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-29bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslaveXin Long
[ Upstream commit ddea788c63094f7c483783265563dd5b50052e28 ] After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set. However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo. One way to reproduce it: # modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole The netpoll won't really work. This patch is to remove that slave_dev npinfo setting in bond_enslave(). Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13bonding: process the err returned by dev_set_allmulti properly in bond_enslaveXin Long
[ Upstream commit 9f5a90c107741b864398f4ac0014711a8c1d8474 ] When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails, dev_set_promiscuity(-1) should be done before going to the err path. Otherwise, dev->promiscuity will leak. Fixes: 7e1a1ac1fbaa ("bonding: Check return of dev_set_promiscuity/allmulti") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13bonding: move dev_mc_sync after master_upper_dev_link in bond_enslaveXin Long
[ Upstream commit ae42cc62a9f07f1f6979054ed92606b9c30f4a2e ] Beniamino found a crash when adding vlan as slave of bond which is also the parent link: ip link add bond1 type bond ip link set bond1 up ip link add link bond1 vlan1 type vlan id 80 ip link set vlan1 master bond1 The call trace is as below: [<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf [<ffffffffa8515680>] _raw_spin_lock+0x20/0x30 [<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80 [<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q] [<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0 [<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80 [<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding] [<ffffffffa8401909>] do_setlink+0x9c9/0xe50 [<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880 [<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260 [<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0 [<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30 [<ffffffffa8424850>] netlink_unicast+0x170/0x210 [<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420 [<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0 This is actually a dead lock caused by sync slave hwaddr from master when the master is the slave's 'slave'. This dead loop check is actually done by netdev_master_upper_dev_link. However, Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") moved it after dev_mc_sync. This patch is to fix it by moving dev_mc_sync after master_upper_dev_link, so that this loop check would be earlier than dev_mc_sync. It also moves if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an improvement. Note team driver also has this issue, I will fix it in another patch. Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") Reported-by: Beniamino Galvani <bgalvani@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13bonding: fix the err path for dev hwaddr sync in bond_enslaveXin Long
[ Upstream commit 5c78f6bfae2b10ff70e21d343e64584ea6280c26 ] vlan_vids_add_by_dev is called right after dev hwaddr sync, so on the err path it should unsync dev hwaddr. Otherwise, the slave dev's hwaddr will never be unsync when this err happens. Fixes: 1ff412ad7714 ("bonding: change the bond's vlan syncing functions with the standard ones") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13bonding: Don't update slave->link until ready to commitNithin Sujir
[ Upstream commit 797a93647a48d6cb8a20641a86a71713a947f786 ] In the loadbalance arp monitoring scheme, when a slave link change is detected, the slave->link is immediately updated and slave_state_changed is set. Later down the function, the rtnl_lock is acquired and the changes are committed, updating the bond link state. However, the acquisition of the rtnl_lock can fail. The next time the monitor runs, since slave->link is already updated, it determines that link is unchanged. This results in the bond link state permanently out of sync with the slave link. This patch modifies bond_loadbalance_arp_mon() to handle link changes identical to bond_ab_arp_{inspect/commit}(). The new link state is maintained in slave->new_link until we're ready to commit at which point it's copied into slave->link. NOTE: miimon_{inspect/commit}() has a more complex state machine requiring the use of the bond_{propose,commit}_link_state() functions which maintains the intermediate state in slave->link_new_state. The arp monitors don't require that. Testing: This bug is very easy to reproduce with the following steps. 1. In a loop, toggle a slave link of a bond slave interface. 2. In a separate loop, do ifconfig up/down of an unrelated interface to create contention for rtnl_lock. Within a few iterations, the bond link goes out of sync with the slave link. Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24bonding: handle link transition from FAIL to UP correctlyMahesh Bandewar
[ Upstream commit fb9eb899a6dc663e4a2deed9af2ac28f507d0ffb ] When link transitions from LINK_FAIL to LINK_UP, the commit phase is not called. This leads to an erroneous state causing slave-link state to get stuck in "going down" state while its speed and duplex are perfectly fine. This issue is a side-effect of splitting link-set into propose and commit phases introduced by de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") This patch fixes these issues by calling commit phase whenever link state change is proposed. Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22bonding: refine bond_fold_stats() wrap detectionEric Dumazet
[ Upstream commit 142c6594acbcc32391af9c15f8cd65c6c177698f ] Some device drivers reset their stats at down/up events, possibly fooling bonding stats, since they operate with relative deltas. It is nearly not possible to fix drivers, since some of them compute the tx/rx counters based on per rx/tx queue stats, and the queues can be reconfigured (ethtool -L) between the down/up sequence. Lets avoid accumulating 'negative' values that render bonding stats useless. It is better to lose small deltas, assuming the bonding stats are fetched at a reasonable frequency. Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22bonding: make speed, duplex setting consistent with link stateMahesh Bandewar
[ Upstream commit c4adfc822bf5d8e97660b6114b5a8892530ce8cb ] bond_update_speed_duplex() retrieves speed and duplex settings. There is a possibility of failure in retrieving these values but caller has to assume it's always successful. This leads to having inconsistent slave link settings. If these (speed, duplex) values cannot be retrieved, then keeping the link UP causes problems. The updated bond_update_speed_duplex() returns 0 on success if it retrieves sane values for speed and duplex. On failure it returns 1 and marks the link down. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24bonding: discard lowest hash bit for 802.3ad layer3+4Hangbin Liu
[ Upstream commit b5f862180d7011d9575d0499fa37f0f25b423b12 ] After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()"), we will try to use even ports for connect(). Then if an application (seen clearly with iperf) opens multiple streams to the same destination IP and port, each stream will be given an even source port. So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing will always hash all these streams to the same interface. And the total throughput will limited to a single slave. Change the tcp code will impact the whole tcp behavior, only for bonding usage. Paolo Abeni suggested fix this by changing the bonding code only, which should be more reasonable, and less impact. Fix this by discarding the lowest hash bit because it contains little entropy. After the fix we can re-balance between slaves. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07bonding: fix accounting of active ports in 3adJarod Wilson
[ Upstream commit 751da2a69b7cc82d83dc310ed7606225f2d6e014 ] As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not removed from the aggregator when they are down, and the active slave count is NOT equal to number of ports in the aggregator, but rather the number of ports in the aggregator that are still enabled. The sysfs spew for bonding_show_ad_num_ports() has a comment that says "Show number of active 802.3ad ports.", but it's currently showing total number of ports, both active and inactive. Remedy it by using the same logic introduced in 0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and netlink all report the number of active ports. Note that this means that IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of NUM_PORTS, and thus perhaps should be renamed for clarity. Lightly tested on a dual i40e lacp bond, simulating link downs with an ip link set dev <slave2> down, was able to produce the state where I could see both in the same aggregator, but a number of ports count of 1. MII Status: up Active Aggregator Info: Aggregator ID: 1 Number of ports: 2 <--- Slave Interface: ens10 MII Status: up <--- Aggregator ID: 1 Slave Interface: ens11 MII Status: up Aggregator ID: 1 MII Status: up Active Aggregator Info: Aggregator ID: 1 Number of ports: 1 <--- Slave Interface: ens10 MII Status: down <--- Aggregator ID: 1 Slave Interface: ens11 MII Status: up Aggregator ID: 1 CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-11Merge branch 'work.uaccess2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess.h prepwork from Al Viro: "Preparations to tree-wide switch to use of linux/uaccess.h (which, obviously, will allow to start unifying stuff for real). The last step there, ie PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ `git grep -l "$PATT"|grep -v ^include/linux/uaccess.h` is not taken here - I would prefer to do it once just before or just after -rc1. However, everything should be ready for it" * 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove a stray reference to asm/uaccess.h in docs sparc64: separate extable_64.h, switch elf_64.h to it score: separate extable.h, switch module.h to it mips: separate extable.h, switch module.h to it x86: separate extable.h, switch sections.h to it remove stray include of asm/uaccess.h from cacheflush.h mn10300: remove a bogus processor.h->uaccess.h include xtensa: split uaccess.h into C and asm sides bonding: quit messing with IOCTL kill __kernel_ds_p off mn10300: finish verify_area() off frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h exceptions: detritus removal
2016-09-27bonding: quit messing with IOCTLAl Viro
The only remaining users are issuing SIOCGMIIPHY and SIOCGMIIREG, neither of which deals with userland pointers. Simply calling ->ndo_do_ioctl() is fine; no messing with set_fs() is needed. It used to mess with SIOCETHTOOL, which would've needed set_fs(), but that has been killed in "[NET] ethtool ops are the only way" 9 years ago... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04bonding: Fix bonding crashMahesh Bandewar
Following few steps will crash kernel - (a) Create bonding master > modprobe bonding miimon=50 (b) Create macvlan bridge on eth2 > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ type macvlan (c) Now try adding eth2 into the bond > echo +eth2 > /sys/class/net/bond0/bonding/slaves <crash> Bonding does lots of things before checking if the device enslaved is busy or not. In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave. This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01bonding: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues multiple work items viz &bond->mcast_work, &nnw->work, &bond->mii_work, &bond->arp_work, &bond->alb_work, &bond->mii_work, &bond->ad_work, &bond->slave_arr_work which require strict execution ordering. Hence, an ordered dedicated workqueue has been used. Since, it is a network driver, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09bonding: fix the typoZhu Yanjun
The message "803.ad" should be "802.3ad". Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25net/bonding: Enforce active-backup policy for IPoIB bondsMark Bloch
When using an IPoIB bond currently only active-backup mode is a valid use case and this commit strengthens it. Since commit 2ab82852a270 ("net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()") was introduced till 4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the fail over mac policy always applied to IPoIB bonds. With the introduction of commit 492a7e67ff83 ("IB/IPoIB: Allow setting the device address"), that doesn't hold and practically IPoIB bonds are broken as of that. To fix it, lets go to fail over mac if the device doesn't support the ndo OR this is IPoIB device. As a by-product, this commit also prevents a stack corruption which occurred when trying to copy 20 bytes (IPoIB) device address to a sockaddr struct that has only 16 bytes of storage. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14bonding: set carrier off for devices created through netlinkBeniamino Galvani
Commit e826eafa65c6 ("bonding: Call netif_carrier_off after register_netdevice") moved netif_carrier_off() from bond_init() to bond_create(), but the latter is called only for initial default devices and ones created through sysfs: $ modprobe bonding $ echo +bond1 > /sys/class/net/bonding_masters $ ip link add bond2 type bond $ grep "MII Status" /proc/net/bonding/* /proc/net/bonding/bond0:MII Status: down /proc/net/bonding/bond1:MII Status: down /proc/net/bonding/bond2:MII Status: up Ensure that carrier is initially off also for devices created through netlink. Signed-off-by: Beniamino Galvani <bgalvani@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05bonding: fix enslavement slave link notificationsAviv Heller
Currently, link notifications are not sent by bond_set_slave_link_state() upon enslavement if the slave is enslaved when up. This happens because slave->link default init value is 0, which is the same as BOND_LINK_UP, resulting in bond_set_slave_link_state() ignoring this transition. This patch sets the default value of slave->link to BOND_LINK_NOCHANGE, assuring it will count as a state transition and thus trigger notification logic. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05net: introduce default neigh_construct/destroy ndo calls for L2 upper devicesJiri Pirko
L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01bonding: prevent out of bound accessesEric Dumazet
ether_addr_equal_64bits() requires some care about its arguments, namely that 8 bytes might be read, even if last 2 byte values are not used. KASan detected a violation with null_mac_addr and lacpdu_mcast_addr in bond_3ad.c Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c : Although the 8-byte alignment was there, KASan would detect out of bound accesses. Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare") Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr") Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28bonding: fix 802.3ad aggregator reselectionJay Vosburgh
Since commit 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated."), the logic in bonding to handle selection between multiple aggregators has not functioned. This affects only configurations wherein the bonding slaves connect to two discrete aggregators (e.g., two independent switches, each with LACP enabled), thus creating two separate aggregation groups within a single bond. The cause is a change in 7bb11dc9f59d to no longer set AD_PORT_BEGIN on a port after a link state change, which would cause the port to be reselected for attachment to an aggregator as if were newly added to the bond. We cannot restore the prior behavior, as it contradicts IEEE 802.1AX 5.4.12, which requires ports that "become inoperable" (lose carrier, setting port_enabled=false as per 802.1AX 5.4.7) to remain selected (i.e., assigned to the aggregator). As the port now remains selected, the aggregator selection logic is not invoked. A side effect of this change is that aggregators in bonding will now contain ports that are link down. The aggregator selection logic does not currently handle this situation correctly, causing incorrect aggregator selection. This patch makes two changes to repair the aggregator selection logic in bonding to function as documented and within the confines of the standard: First, the aggregator selection and related logic now utilizes the number of active ports per aggregator, not the number of selected ports (as some selected ports may be down). The ad_select "bandwidth" and "count" options only consider ports that are link up. Second, on any carrier state change of any slave, the aggregator selection logic is explicitly called to insure the correct aggregator is active. Reported-by: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net: add netdev_lockdep_set_classes() helperEric Dumazet
It is time to add netdev_lockdep_set_classes() helper so that lockdep annotations per device type are easier to manage. This removes a lot of copies and missing annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07net_sched: transform qdisc running bit into a seqcountEric Dumazet
Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>