summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2014-11-15Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - stable patches to fix NFSv4.x delegation reclaim error paths - fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2 features - fix a use-after-free problem with pNFS block layouts - fix a memory leak in the pNFS files O_DIRECT code - replace an intrusive and Oops-prone performance fix in the NFSv4 atomic open code with a safer one-line version and revert the two original patches" * tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor NFS: Don't try to reclaim delegation open state if recovery failed NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired NFS: SEEK is an NFS v4.2 feature nfs: Fix use of uninitialized variable in nfs_getattr() nfs: Remove bogus assignment nfs: remove spurious WARN_ON_ONCE in write path pnfs/blocklayout: serialize GETDEVICEINFO calls nfs: fix pnfs direct write memory leak Revert "NFS: nfs4_do_open should add negative results to the dcache." Revert "NFS: remove BUG possibility in nfs4_open_and_get_state" NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
2014-11-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) sunhme driver lacks DMA mapping error checks, based upon a report by Meelis Roos. 2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee. 3) DMA memory allocation sizes are wrong in systemport ethernet driver, fix from Florian Fainelli. 4) Fix use after free in mac80211 defragmentation code, from Johannes Berg. 5) Some networking uapi headers missing from Kbuild file, from Stephen Hemminger. 6) TUN driver gets csum_start offset wrong when VLAN accel is enabled, and macvtap has a similar bug, from Herbert Xu. 7) Adjust several tunneling drivers to set dev->iflink after registry, because registry sets that to -1 overwriting whatever we did. From Steffen Klassert. 8) Geneve forgets to set inner tunneling type, causing GSO segmentation to fail on some NICs. From Jesse Gross. 9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and Giuseppe CAVALLARO. 10) Fix spurious timeouts with NewReno on low traffic connections, from Marcelo Leitner. 11) Fix descriptor updates in enic driver, from Govindarajulu Varadarajan. 12) PPP calls bpf_prog_create() with locks held, which isn't kosher. Fix from Takashi Iwai. 13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel Borkmann. 14) psock_fanout selftest accesses past the end of the mmap ring, fix from Shuah Khan. 15) Fix PTP timestamping for VLAN packets, from Richard Cochran. 16) netlink_unbind() calls in netlink pass wrong initial argument, from Hiroaki SHIMODA. 17) vxlan socket reuse accidently reuses a socket when the address family is different, so we have to explicitly check this, from Marcelo Lietner. 18) Fix missing include in nft_reject_bridge.c breaking the build on ppc and other architectures, from Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits) vxlan: Do not reuse sockets for a different address family smsc911x: power-up phydev before doing a software reset. lib: rhashtable - Remove weird non-ASCII characters from comments net/smsc911x: Fix delays in the PHY enable/disable routines net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode netlink: Properly unbind in error conditions. net: ptp: fix time stamp matching logic for VLAN packets. cxgb4 : dcb open-lldp interop fixes selftests/net: psock_fanout seg faults in sock_fanout_read_ring() net: bcmgenet: apply MII configuration in bcmgenet_open() net: bcmgenet: connect and disconnect from the PHY state machine net: qualcomm: Fix dependency ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx net: phy: Correctly handle MII ioctl which changes autonegotiation. ipv6: fix IPV6_PKTINFO with v4 mapped net: sctp: fix memory leak in auth key management net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet net: ppp: Don't call bpf_prog_create() in ppp_lock net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set cxgb4 : Fix bug in DCB app deletion ...
2014-11-13libceph: change from BUG to WARN for __remove_osd() assertsIlya Dryomov
No reason to use BUG_ON for osd request list assertions. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13libceph: clear r_req_lru_item in __unregister_linger_request()Ilya Dryomov
kick_requests() can put linger requests on the notarget list. This means we need to clear the much-overloaded req->r_req_lru_item in __unregister_linger_request() as well, or we get an assertion failure in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item). AFAICT the assumption was that registered linger requests cannot be on any of req->r_req_lru_item lists, but that's clearly not the case. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13libceph: unlink from o_linger_requests when clearing r_osdIlya Dryomov
Requests have to be unlinked from both osd->o_requests (normal requests) and osd->o_linger_requests (linger requests) lists when clearing req->r_osd. Otherwise __unregister_linger_request() gets confused and we trip over a !list_empty(&osd->o_linger_requests) assert in __remove_osd(). MON=1 OSD=1: # cat remove-osd.sh #!/bin/bash rbd create --size 1 test DEV=$(rbd map test) ceph osd out 0 sleep 3 rbd map dne/dne # obtain a new osdmap as a side effect rbd unmap $DEV & # will block sleep 3 ceph osd in 0 Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-11-13libceph: do not crash on large auth ticketsIlya Dryomov
Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth tickets will have their buffers vmalloc'ed, which leads to the following crash in crypto: [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0 [ 28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.686032] PGD 0 [ 28.688088] Oops: 0000 [#1] PREEMPT SMP [ 28.688088] Modules linked in: [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305 [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 28.688088] Workqueue: ceph-msgr con_work [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000 [ 28.688088] RIP: 0010:[<ffffffff81392b42>] [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286 [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0 [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750 [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880 [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010 [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000 [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0 [ 28.688088] Stack: [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32 [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020 [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010 [ 28.688088] Call Trace: [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220 [ 28.688088] [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180 [ 28.688088] [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30 [ 28.688088] [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0 [ 28.688088] [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0 [ 28.688088] [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60 [ 28.688088] [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0 [ 28.688088] [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360 [ 28.688088] [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0 [ 28.688088] [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80 [ 28.688088] [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0 [ 28.688088] [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80 [ 28.688088] [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0 [ 28.688088] [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0 [ 28.688088] [<ffffffff81559289>] try_read+0x1e59/0x1f10 This is because we set up crypto scatterlists as if all buffers were kmalloc'ed. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-11-13sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptorJeff Layton
Bruce reported that he was seeing the following BUG pop: BUG: sleeping function called from invalid context at mm/slab.c:2846 in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs 2 locks held by mount.nfs/4539: #0: (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4] #1: (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss] Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001 0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0 0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00 Call Trace: [<ffffffff81a534cf>] dump_stack+0x4f/0x7c [<ffffffff81097854>] __might_sleep+0x114/0x180 [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280 [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss] [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss] [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc] [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4] [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4] [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4] [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4] [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4] [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4] [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs] [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs] [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4] [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc] [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4] [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4] [<ffffffff81196489>] mount_fs+0x39/0x1b0 [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20 [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150 [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4] [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4] [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs] [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs] [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10 [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs] [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs] [<ffffffff81196489>] mount_fs+0x39/0x1b0 [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20 [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150 [<ffffffff811b5830>] do_mount+0x210/0xbe0 [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160 [<ffffffff811b651f>] SyS_mount+0x6f/0xb0 [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17 Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping the rcu_read_lock before doing the allocation and then reacquiring it and redoing the dereference before doing the copy. If we find that the string has somehow grown in the meantime, we'll reallocate and try again. Cc: <stable@vger.kernel.org> # v3.17+ Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12netlink: Properly unbind in error conditions.Hiroaki SHIMODA
Even if netlink_kernel_cfg::unbind is implemented the unbind() method is not called, because cfg->unbind is omitted in __netlink_kernel_create(). And fix wrong argument of test_bit() and off by one problem. At this point, no unbind() method is implemented, so there is no real issue. Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.") Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Richard Guy Briggs <rgb@redhat.com> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11ipv6: fix IPV6_PKTINFO with v4 mappedEric Dumazet
Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: c8e6ad0829a7 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: sctp: fix memory leak in auth key managementDaniel Borkmann
A very minimal and simple user space application allocating an SCTP socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing the socket again will leak the memory containing the authentication key from user space: unreferenced object 0xffff8800837047c0 (size 16): comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s) hex dump (first 16 bytes): 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811c88d8>] __kmalloc+0xe8/0x270 [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp] [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp] [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp] [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20 [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0 [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17 [<ffffffffffffffff>] 0xffffffffffffffff This is bad because of two things, we can bring down a machine from user space when auth_enable=1, but also we would leave security sensitive keying material in memory without clearing it after use. The issue is that sctp_auth_create_key() already sets the refcount to 1, but after allocation sctp_auth_set_key() does an additional refcount on it, and thus leaving it around when we free the socket. Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed ↵Daniel Borkmann
packet An SCTP server doing ASCONF will panic on malformed INIT ping-of-death in the form of: ------------ INIT[PARAM: SET_PRIMARY_IP] ------------> While the INIT chunk parameter verification dissects through many things in order to detect malformed input, it misses to actually check parameters inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary IP address' parameter in ASCONF, which has as a subparameter an address parameter. So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0 and thus sctp_get_af_specific() returns NULL, too, which we then happily dereference unconditionally through af->from_addr_param(). The trace for the log: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] PGD 0 Oops: 0000 [#1] SMP [...] Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs RIP: 0010:[<ffffffffa01e9c62>] [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] [...] Call Trace: <IRQ> [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp] [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp] [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [...] A minimal way to address this is to check for NULL as we do on all other such occasions where we know sctp_get_af_specific() could possibly return with NULL. Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10udptunnel: Add SKB_GSO_UDP_TUNNEL during gro_complete.Jesse Gross
When doing GRO processing for UDP tunnels, we never add SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol is added (such as SKB_GSO_TCPV4). The result is that if the packet is later resegmented we will do GSO but not treat it as a tunnel. This results in UDP fragmentation of the outer header instead of (i.e.) TCP segmentation of the inner header as was originally on the wire. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06Merge tag 'master-2014-11-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-11-06 Please pull this batch of fixes intended for the 3.18 stream... For the mac80211 bits, Johannes says: "This contains another small set of fixes for 3.18, these are all over the place and most of the bugs are old, one even dates back to the original mac80211 we merged into the kernel." For the iwlwifi bits, Emmanuel says: "I fix here two issues that are related to the firmware loading flow. A user reported that he couldn't load the driver because the rfkill line was pulled up while we were running the calibrations. This was happening while booting the system: systemd was restoring the "disable wifi settings" and that raised an RFKILL interrupt during the calibration. Our driver didn't handle that properly and this is now fixed." Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06net: dsa: slave: Fix autoneg for phys on switch MDIO busAndrew Lunn
When the ports phys are connected to the switches internal MDIO bus, we need to connect the phy to the slave netdev, otherwise auto-negotiation etc, does not work. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05tcp: zero retrans_stamp if all retrans were ackedMarcelo Leitner
Ueki Kohei reported that when we are using NewReno with connections that have a very low traffic, we may timeout the connection too early if a second loss occurs after the first one was successfully acked but no data was transfered later. Below is his description of it: When SACK is disabled, and a socket suffers multiple separate TCP retransmissions, that socket's ETIMEDOUT value is calculated from the time of the *first* retransmission instead of the *latest* retransmission. This happens because the tcp_sock's retrans_stamp is set once then never cleared. Take the following connection: Linux remote-machine | | send#1---->(*1)|--------> data#1 --------->| | | | RTO : : | | | ---(*2)|----> data#1(retrans) ---->| | (*3)|<---------- ACK <----------| | | | | : : | : : | : : 16 minutes (or more) : | : : | : : | : : | | | send#2---->(*4)|--------> data#2 --------->| | | | RTO : : | | | ---(*5)|----> data#2(retrans) ---->| | | | | | | RTO*2 : : | | | | | | ETIMEDOUT<----(*6)| | (*1) One data packet sent. (*2) Because no ACK packet is received, the packet is retransmitted. (*3) The ACK packet is received. The transmitted packet is acknowledged. At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event". (*4) After 16 minutes (to correspond with retries2=15), a new data packet is sent. Note: No data is transmitted between (*3) and (*4). The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago. (*5) Because no ACK packet is received, the packet is retransmitted. (*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT. Therefore, now we clear retrans_stamp as soon as all data during the loss window is fully acked. Reported-by: Ueki Kohei Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05geneve: Unregister pernet subsys on module unload.Jesse Gross
The pernet ops aren't ever unregistered, which causes a memory leak and an OOPs if the module is ever reinserted. Fixes: 0b5e8b8eeae4 ("net: Add Geneve tunneling protocol driver") CC: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05geneve: Set GSO type on transmit.Jesse Gross
Geneve does not currently set the inner protocol type when transmitting packets. This causes GSO segmentation to fail on NICs that do not support Geneve offloading. CC: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04Merge tag 'mac80211-for-john-2014-11-04' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "This contains another small set of fixes for 3.18, these are all over the place and most of the bugs are old, one even dates back to the original mac80211 we merged into the kernel." Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "There is a GFP flag fix from Mike Christie, an error code fix from Jan, and fixes for two unnecessary allocations (kmalloc and workqueue) from Ilya. All are well tested. Ilya has one other fix on the way but it didn't get tested in time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: eliminate unnecessary allocation in process_one_ticket() rbd: Fix error recovery in rbd_obj_read_sync() libceph: use memalloc flags for net IO rbd: use a single workqueue for all devices
2014-11-03gre6: Move the setting of dev->iflink into the ndo_init functions.Steffen Klassert
Otherwise it gets overwritten by register_netdev(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03sit: Use ipip6_tunnel_init as the ndo_init function.Steffen Klassert
ipip6_tunnel_init() sets the dev->iflink via a call to ipip6_tunnel_bind_dev(). After that, register_netdevice() sets dev->iflink = -1. So we loose the iflink configuration for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the ndo_init function. Then ipip6_tunnel_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03vti6: Use vti6_dev_init as the ndo_init function.Steffen Klassert
vti6_dev_init() sets the dev->iflink via a call to vti6_link_config(). After that, register_netdevice() sets dev->iflink = -1. So we loose the iflink configuration for vti6 tunnels. Fix this by using vti6_dev_init() as the ndo_init function. Then vti6_dev_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.Steffen Klassert
ip6_tnl_dev_init() sets the dev->iflink via a call to ip6_tnl_link_config(). After that, register_netdevice() sets dev->iflink = -1. So we loose the iflink configuration for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the ndo_init function. Then ip6_tnl_dev_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03netfilter: nft_reject_bridge: Fix powerpc build errorGuenter Roeck
Fix: net/bridge/netfilter/nft_reject_bridge.c: In function 'nft_reject_br_send_v6_unreach': net/bridge/netfilter/nft_reject_bridge.c:240:3: error: implicit declaration of function 'csum_ipv6_magic' csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr, ^ make[3]: *** [net/bridge/netfilter/nft_reject_bridge.o] Error 1 Seen with powerpc:allmodconfig. Fixes: 523b929d5446 ("netfilter: nft_reject_bridge: don't use IP stack to reject traffic") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03mac80211: fix use-after-free in defragmentationJohannes Berg
Upon receiving the last fragment, all but the first fragment are freed, but the multicast check for statistics at the end of the function refers to the current skb (the last fragment) causing a use-after-free bug. Since multicast frames cannot be fragmented and we check for this early in the function, just modify that check to also do the accounting to fix the issue. Cc: stable@vger.kernel.org Reported-by: Yosef Khyal <yosefx.khyal@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-02irda: stop calling sk_prot->disconnect() on connection failureLinus Torvalds
The sk_prot is irda's own set of protocol handlers, so irda should statically know what that function is anyway, without using an indirect pointer. And as it happens, we know *exactly* what that pointer is statically: it's NULL, because irda doesn't define a disconnect operation. So calling that function is doubly wrong, and will just cause an oops. Reported-by: Martin Lang <mlg.hessigheim@gmail.com> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-31libceph: eliminate unnecessary allocation in process_one_ticket()Ilya Dryomov
Commit c27a3e4d667f ("libceph: do not hard code max auth ticket len") while fixing a buffer overlow tried to keep the same as much of the surrounding code as possible and introduced an unnecessary kmalloc() in the unencrypted ticket path. It is likely to fail on huge tickets, so get rid of it. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-31net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0Guenter Roeck
If a driver supports reading EEPROM but no EEPROM is installed in the system, the driver's get_eeprom_len function returns 0. ethtool will subsequently try to read that zero-length EEPROM anyway. If the driver does not support EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver does support EEPROM access but no EEPROM is installed, the operation will return -EINVAL. Return -EOPNOTSUPP in both cases for consistency. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31mpls: Allow mpls_gso to be built as modulePravin B Shelar
Kconfig already allows mpls to be built as module. Following patch fixes Makefile to do same. CC: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31mpls: Fix mpls_gso handler.Pravin B Shelar
mpls gso handler needs to pull skb after segmenting skb. CC: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== netfilter/ipvs fixes for net The following patchset contains fixes for netfilter/ipvs. This round of fixes is larger than usual at this stage, specifically because of the nf_tables bridge reject fixes that I would like to see in 3.18. The patches are: 1) Fix a null-pointer dereference that may occur when logging errors. This problem was introduced by 4a4739d56b0 ("ipvs: Pull out crosses_local_route_boundary logic") in v3.17-rc5. 2) Update hook mask in nft_reject_bridge so we can also filter out packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting"), which needs this chunk to work. 3) Two patches to refactor common code to forge the IPv4 and IPv6 reject packets from the bridge. These are required by the nf_tables reject bridge fix. 4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject packets from the bridge. The idea is to forge the reject packets and inject them to the original port via br_deliver() which is now exported for that purpose. 5) Restrict nft_reject_bridge to bridge prerouting and input hooks. the original skbuff may cloned after prerouting when the bridge stack needs to flood it to several bridge ports, it is too late to reject the traffic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31netfilter: nft_reject_bridge: restrict reject to prerouting and inputPablo Neira Ayuso
Restrict the reject expression to the prerouting and input bridge hooks. If we allow this to be used from forward or any other later bridge hook, if the frame is flooded to several ports, we'll end up sending several reject packets, one per cloned packet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31netfilter: nft_reject_bridge: don't use IP stack to reject trafficPablo Neira Ayuso
If the packet is received via the bridge stack, this cannot reject packets from the IP stack. This adds functions to build the reject packet and send it from the bridge stack. Comments and assumptions on this patch: 1) Validate the IPv4 and IPv6 headers before further processing, given that the packet comes from the bridge stack, we cannot assume they are clean. Truncated packets are dropped, we follow similar approach in the existing iptables match/target extensions that need to inspect layer 4 headers that is not available. This also includes packets that are directed to multicast and broadcast ethernet addresses. 2) br_deliver() is exported to inject the reject packet via bridge localout -> postrouting. So the approach is similar to what we already do in the iptables reject target. The reject packet is sent to the bridge port from which we have received the original packet. 3) The reject packet is forged based on the original packet. The TTL is set based on sysctl_ip_default_ttl for IPv4 and per-net ipv6.devconf_all hoplimit for IPv6. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functionsPablo Neira Ayuso
That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_ip6hdr_put(): to build the IPv6 header. * nf_reject_ip6_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functionsPablo Neira Ayuso
That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_iphdr_put(): to build the IPv4 header. * nf_reject_ip_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routingPablo Neira Ayuso
Fixes: 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packetsBen Hutchings
UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30net: skb_fclone_busy() needs to detect orphaned skbEric Dumazet
Some drivers are unable to perform TX completions in a bound time. They instead call skb_orphan() Problem is skb_fclone_busy() has to detect this case, otherwise we block TCP retransmits and can freeze unlucky tcp sessions on mostly idle hosts. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30gre: Use inner mac length when computing tunnel lengthTom Herbert
Currently, skb_inner_network_header is used but this does not account for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which handles TEB and also should work with IP encapsulation in which case inner mac and inner network headers are the same. Tested: Ran TCP_STREAM over GRE, worked as expected. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30ipv4: Do not cache routing failures due to disabled forwarding.Nicolas Cavallari
If we cache them, the kernel will reuse them, independently of whether forwarding is enabled or not. Which means that if forwarding is disabled on the input interface where the first routing request comes from, then that unreachable result will be cached and reused for other interfaces, even if forwarding is enabled on them. The opposite is also true. This can be verified with two interfaces A and B and an output interface C, where B has forwarding enabled, but not A and trying ip route get $dst iif A from $src && ip route get $dst iif B from $src Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30mac80211: properly flush delayed scan work on interface removalJohannes Berg
When an interface is deleted, an ongoing hardware scan is canceled and the driver must abort the scan, at the very least reporting completion while the interface is removed. However, if it scheduled the work that might only run after everything is said and done, which leads to cfg80211 warning that the scan isn't reported as finished yet; this is no fault of the driver, it already did, but mac80211 hasn't processed it. To fix this situation, flush the delayed work when the interface being removed is the one that was executing the scan. Cc: stable@vger.kernel.org Reported-by: Sujith Manoharan <sujith@msujith.org> Tested-by: Sujith Manoharan <sujith@msujith.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-30libceph: use memalloc flags for net IOMike Christie
This patch has ceph's lib code use the memalloc flags. If the VM layer needs to write data out to free up memory to handle new allocation requests, the block layer must be able to make forward progress. To handle that requirement we use structs like mempools to reserve memory for objects like bios and requests. The problem is when we send/receive block layer requests over the network layer, net skb allocations can fail and the system can lock up. To solve this, the memalloc related flags were added. NBD, iSCSI and NFS uses these flags to tell the network/vm layer that it should use memory reserves to fullfill allcation requests for structs like skbs. I am running ceph in a bunch of VMs in my laptop, so this patch was not tested very harshly. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-10-29inet: frags: remove the WARN_ON from inet_evict_bucketNikolay Aleksandrov
The WARN_ON in inet_evict_bucket can be triggered by a valid case: inet_frag_kill and inet_evict_bucket can be running in parallel on the same queue which means that there has been at least one more ref added by a previous inet_frag_find call, but inet_frag_kill can delete the timer before inet_evict_bucket which will cause the WARN_ON() there to trigger since we'll have refcnt!=1. Now, this case is valid because the queue is being "killed" for some reason (removed from the chain list and its timer deleted) so it will get destroyed in the end by one of the inet_frag_put() calls which reaches 0 i.e. refcnt is still valid. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Reported-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29inet: frags: fix a race between inet_evict_bucket and inet_frag_killNikolay Aleksandrov
When the evictor is running it adds some chosen frags to a local list to be evicted once the chain lock has been released but at the same time the *frag_queue can be running for some of the same queues and it may call inet_frag_kill which will wait on the chain lock and will then delete the queue from the wrong list since it was added in the eviction one. The fix is simple - check if the queue has the evict flag set under the chain lock before deleting it, this is safe because the evict flag is set only under that lock and having the flag set also means that the queue has been detached from the chain list, so no need to delete it again. An important note to make is that we're safe w.r.t refcnt because inet_frag_kill and inet_evict_bucket will sync on the del_timer operation where only one of the two can succeed (or if the timer is executing - none of them), the cases are: 1. inet_frag_kill succeeds in del_timer - then the timer ref is removed, but inet_evict_bucket will not add this queue to its expire list but will restart eviction in that chain 2. inet_evict_bucket succeeds in del_timer - then the timer ref is kept until the evictor "expires" the queue, but inet_frag_kill will remove the initial ref and will set INET_FRAG_COMPLETE which will make the frag_expire fn just to remove its ref. In the end all of the queue users will do an inet_frag_put and the one that reaches 0 will free it. The refcount balance should be okay. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Patrick McLean <chutzpah@gentoo.org> Tested-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29ipv6: notify userspace when we added or changed an ipv6 tokenLubomir Rintel
NetworkManager might want to know that it changed when the router advertisement arrives. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29sch_pie: schedule the timer after all init succeedWANG Cong
Cc: Vijay Subramanian <vijaynsu@cisco.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com>
2014-10-29mac80211: schedule the actual switch of the station before CSA count 0Luciano Coelho
Due to the time it takes to process the beacon that started the CSA process, we may be late for the switch if we try to reach exactly beacon 0. To avoid that, use count - 1 when calculating the switch time. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29mac80211: use secondary channel offset IE also beacons during CSALuciano Coelho
If we are switching from an HT40+ to an HT40- channel (or vice-versa), we need the secondary channel offset IE to specify what is the post-CSA offset to be used. This applies both to beacons and to probe responses. In ieee80211_parse_ch_switch_ie() we were ignoring this IE from beacons and using the *current* HT information IE instead. This was causing us to use the same offset as before the switch. Fix that by using the secondary channel offset IE also for beacons and don't ever use the pre-switch offset. Additionally, remove the "beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not needed anymore. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-29mac80211: flush keys for AP mode on ieee80211_do_stopFelix Fietkau
Userspace can add keys to an AP mode interface before start_ap has been called. If there have been no calls to start_ap/stop_ap in the mean time, the keys will still be around when the interface is brought down. Signed-off-by: Felix Fietkau <nbd@openwrt.org> [adjust comments, fix AP_VLAN case] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-28Merge tag 'master-2014-10-27' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-10-28 Please pull this batch of fixes intended for the 3.18 stream! For the mac80211 bits, Johannes says: "Here are a few fixes for the wireless stack: one fixes the RTS rate, one for a debugfs file, one to return the correct channel to userspace, a sanity check for a userspace value and the remaining two are just documentation fixes." For the iwlwifi bits, Emmanuel says: "I revert here a patch that caused interoperability issues. dvm gets a fix for a bug that was reported by many users. Two minor fixes for BT Coex and platform power fix that helps reducing latency when the PCIe link goes to low power states." In addition... Felix Fietkau adds a couple of ath code fixes related to regulatory rule enforcement. Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS is not set. Karsten Wiese provides a trio of minor fixes for rtl8192cu. Kees Cook prevents a potential information leak in rtlwifi. Larry Finger also brings a trio of minor fixes for rtlwifi. Rafał Miłecki adds a device ID to the bcma bus driver. Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac to eliminate non-terminated string issues. Sujith Manoharan avoids some ath9k stalls by enabling HW queue control only for MCC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>