summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-04-25net: fix incorrect credentials passingLinus Torvalds
commit 83f1b4ba917db5dc5a061a44b3403ddb6e783494 upstream. Commit 257b5358b32f ("scm: Capture the full credentials of the scm sender") changed the credentials passing code to pass in the effective uid/gid instead of the real uid/gid. Obviously this doesn't matter most of the time (since normally they are the same), but it results in differences for suid binaries when the wrong uid/gid ends up being used. This just undoes that (presumably unintentional) part of the commit. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: scm_set_cred() does user namespace conversion of euid/egid using cred_to_ucred(). Add and use cred_real_to_ucred() to do the same thing for real uid/gid.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-25can: gw: use kmem_cache_free() instead of kfree()Wei Yongjun
commit 3480a2125923e4b7a56d79efc76743089bf273fc upstream. Memory allocated by kmem_cache_alloc() should be freed using kmem_cache_free(), not kfree(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10net: add a synchronize_net() in netdev_rx_handler_unregister()Eric Dumazet
[ Upstream commit 00cfec37484761a44a3b6f4675a54caa618210ae ] commit 35d48903e97819 (bonding: fix rx_handler locking) added a race in bonding driver, reported by Steven Rostedt who did a very good diagnosis : <quoting Steven> I'm currently debugging a crash in an old 3.0-rt kernel that one of our customers is seeing. The bug happens with a stress test that loads and unloads the bonding module in a loop (I don't know all the details as I'm not the one that is directly interacting with the customer). But the bug looks to be something that may still be present and possibly present in mainline too. It will just be much harder to trigger it in mainline. In -rt, interrupts are threads, and can schedule in and out just like any other thread. Note, mainline now supports interrupt threads so this may be easily reproducible in mainline as well. I don't have the ability to tell the customer to try mainline or other kernels, so my hands are somewhat tied to what I can do. But according to a core dump, I tracked down that the eth irq thread crashed in bond_handle_frame() here: slave = bond_slave_get_rcu(skb->dev); bond = slave->bond; <--- BUG the slave returned was NULL and accessing slave->bond caused a NULL pointer dereference. Looking at the code that unregisters the handler: void netdev_rx_handler_unregister(struct net_device *dev) { ASSERT_RTNL(); RCU_INIT_POINTER(dev->rx_handler, NULL); RCU_INIT_POINTER(dev->rx_handler_data, NULL); } Which is basically: dev->rx_handler = NULL; dev->rx_handler_data = NULL; And looking at __netif_receive_skb() we have: rx_handler = rcu_dereference(skb->dev->rx_handler); if (rx_handler) { if (pt_prev) { ret = deliver_skb(skb, pt_prev, orig_dev); pt_prev = NULL; } switch (rx_handler(&skb)) { My question to all of you is, what stops this interrupt from happening while the bonding module is unloading? What happens if the interrupt triggers and we have this: CPU0 CPU1 ---- ---- rx_handler = skb->dev->rx_handler netdev_rx_handler_unregister() { dev->rx_handler = NULL; dev->rx_handler_data = NULL; rx_handler() bond_handle_frame() { slave = skb->dev->rx_handler; bond = slave->bond; <-- NULL pointer dereference!!! What protection am I missing in the bond release handler that would prevent the above from happening? </quoting Steven> We can fix bug this in two ways. First is adding a test in bond_handle_frame() and others to check if rx_handler_data is NULL. A second way is adding a synchronize_net() in netdev_rx_handler_unregister() to make sure that a rcu protected reader has the guarantee to see a non NULL rx_handler_data. The second way is better as it avoids an extra test in fast path. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jpirko@redhat.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10ipv6: don't accept node local multicast traffic from the wireHannes Frederic Sowa
[ Upstream commit 1c4a154e5253687c51123956dfcee9e9dfa8542d ] Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been verified: http://www.rfc-editor.org/errata_search.php?eid=3480 We have to check for pkt_type and loopback flag because either the packets are allowed to travel over the loopback interface (in which case pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel over a non-loopback interface back to us (in which case PACKET_TYPE is PACKET_LOOPBACK and IFF_LOOPBACK flag is not set). Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10ipv6: fix bad free of addrconf_init_netHong Zhiguo
[ Upstream commit a79ca223e029aa4f09abb337accf1812c900a800 ] Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10ipv6: don't accept multicast traffic with scope 0Hannes Frederic Sowa
[ Upstream commit 20314092c1b41894d8c181bf9aa6f022be2416aa ] v2: a) moved before multicast source address check b) changed comment to netdev style Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10unix: fix a race condition in unix_release()Paul Moore
[ Upstream commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 ] As reported by Jan, and others over the past few years, there is a race condition caused by unix_release setting the sock->sk pointer to NULL before properly marking the socket as dead/orphaned. This can cause a problem with the LSM hook security_unix_may_send() if there is another socket attempting to write to this partially released socket in between when sock->sk is set to NULL and it is marked as dead/orphaned. This patch fixes this by only setting sock->sk to NULL after the socket has been marked as dead; I also take the opportunity to make unix_release_sock() a void function as it only ever returned 0/success. Dave, I think this one should go on the -stable pile. Special thanks to Jan for coming up with a reproducer for this problem. Reported-by: Jan Stancek <jan.stancek@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10thermal: shorten too long mcast group nameMasatake YAMATO
[ Upstream commits 73214f5d9f33b79918b1f7babddd5c8af28dd23d and f1e79e208076ffe7bad97158275f1c572c04f5c7, the latter adds an assertion to genetlink to prevent this from happening again in the future. ] The original name is too long. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-108021q: fix a potential use-after-freeCong Wang
[ Upstream commit 4a7df340ed1bac190c124c1601bfc10cde9fb4fb ] vlan_vid_del() could possibly free ->vlan_info after a RCU grace period, however, we may still refer to the freed memory area by 'grp' pointer. Found by code inspection. This patch moves vlan_vid_del() as behind as possible. Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10tcp: undo spurious timeout after SACK renegingYuchung Cheng
[ Upstream commit 7ebe183c6d444ef5587d803b64a1f4734b18c564 ] On SACK reneging the sender immediately retransmits and forces a timeout but disables Eifel (undo). If the (buggy) receiver does not drop any packet this can trigger a false slow-start retransmit storm driven by the ACKs of the original packets. This can be detected with undo and TCP timestamps. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10tcp: preserve ACK clocking in TSOEric Dumazet
[ Upstream commit f4541d60a449afd40448b06496dcd510f505928e ] A long standing problem with TSO is the fact that tcp_tso_should_defer() rearms the deferred timer, while it should not. Current code leads to following bad bursty behavior : 20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119 20:11:24.484337 IP B > A: . ack 263721 win 1117 20:11:24.485086 IP B > A: . ack 265241 win 1117 20:11:24.485925 IP B > A: . ack 266761 win 1117 20:11:24.486759 IP B > A: . ack 268281 win 1117 20:11:24.487594 IP B > A: . ack 269801 win 1117 20:11:24.488430 IP B > A: . ack 271321 win 1117 20:11:24.489267 IP B > A: . ack 272841 win 1117 20:11:24.490104 IP B > A: . ack 274361 win 1117 20:11:24.490939 IP B > A: . ack 275881 win 1117 20:11:24.491775 IP B > A: . ack 277401 win 1117 20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119 20:11:24.492620 IP B > A: . ack 278921 win 1117 20:11:24.493448 IP B > A: . ack 280441 win 1117 20:11:24.494286 IP B > A: . ack 281961 win 1117 20:11:24.495122 IP B > A: . ack 283481 win 1117 20:11:24.495958 IP B > A: . ack 285001 win 1117 20:11:24.496791 IP B > A: . ack 286521 win 1117 20:11:24.497628 IP B > A: . ack 288041 win 1117 20:11:24.498459 IP B > A: . ack 289561 win 1117 20:11:24.499296 IP B > A: . ack 291081 win 1117 20:11:24.500133 IP B > A: . ack 292601 win 1117 20:11:24.500970 IP B > A: . ack 294121 win 1117 20:11:24.501388 IP B > A: . ack 295641 win 1117 20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119 While the expected behavior is more like : 20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119 20:19:49.260446 IP B > A: . ack 154281 win 1212 20:19:49.261282 IP B > A: . ack 155801 win 1212 20:19:49.262125 IP B > A: . ack 157321 win 1212 20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119 20:19:49.262958 IP B > A: . ack 158841 win 1212 20:19:49.263795 IP B > A: . ack 160361 win 1212 20:19:49.264628 IP B > A: . ack 161881 win 1212 20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119 20:19:49.265465 IP B > A: . ack 163401 win 1212 20:19:49.265886 IP B > A: . ack 164921 win 1212 20:19:49.266722 IP B > A: . ack 166441 win 1212 20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119 20:19:49.267559 IP B > A: . ack 167961 win 1212 20:19:49.268394 IP B > A: . ack 169481 win 1212 20:19:49.269232 IP B > A: . ack 171001 win 1212 20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_lockedTrond Myklebust
commit 1166fde6a923c30f4351515b6a9a1efc513e7d00 upstream. We need to be careful when testing task->tk_waitqueue in rpc_wake_up_task_queue_locked, because it can be changed while we are holding the queue->lock. By adding appropriate memory barriers, we can ensure that it is safe to test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-10net/irda: add missing error path release_sock callKees Cook
commit 896ee0eee6261e30c3623be931c3f621428947df upstream. This makes sure that release_sock is called for all error conditions in irda_getsockopt. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27inet: limit length of fragment queue hash table bucket listsHannes Frederic Sowa
[ Upstream commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055 ] This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat arbitrary and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu. If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning by the slab allocator. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27rtnetlink: Mask the rta_type when range checkingVlad Yasevich
[ Upstream commit a5b8db91442fce9c9713fcd656c3698f1adde1d6 ] Range/validity checks on rta_type in rtnetlink_rcv_msg() do not account for flags that may be set. This causes the function to return -EINVAL when flags are set on the type (for example NLA_F_NESTED). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27tcp: fix skb_availroom()Eric Dumazet
[ Upstream commit 16fad69cfe4adbbfa813de516757b87bcae36d93 ] Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack : https://code.google.com/p/chromium/issues/detail?id=182056 commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx path) did a poor choice adding an 'avail_size' field to skb, while what we really needed was a 'reserved_tailroom' one. It would have avoided commit 22b4a4f22da (tcp: fix retransmit of partially acked frames) and this commit. Crash occurs because skb_split() is not aware of the 'avail_size' management (and should not be aware) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mukesh Agrawal <quiche@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27sctp: don't break the loop while meeting the active_path so as to find the ↵Xufeng Zhang
matched transport [ Upstream commit 2317f449af30073cfa6ec8352e4a65a89e357bdd ] sctp_assoc_lookup_tsn() function searchs which transport a certain TSN was sent on, if not found in the active_path transport, then go search all the other transports in the peer's transport_addr_list, however, we should continue to the next entry rather than break the loop when meet the active_path transport. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27sctp: Use correct sideffect command in duplicate cookie handlingVlad Yasevich
[ Upstream commit f2815633504b442ca0b0605c16bf3d88a3a0fcea ] When SCTP is done processing a duplicate cookie chunk, it tries to delete a newly created association. For that, it has to set the right association for the side-effect processing to work. However, when it uses the SCTP_CMD_NEW_ASOC command, that performs more work then really needed (like hashing the associationa and assigning it an id) and there is no point to do that only to delete the association as a next step. In fact, it also creates an impossible condition where an association may be found by the getsockopt() call, and that association is empty. This causes a crash in some sctp getsockopts. The solution is rather simple. We simply use SCTP_CMD_SET_ASOC command that doesn't have all the overhead and does exactly what we need. Reported-by: Karl Heiss <kheiss@gmail.com> Tested-by: Karl Heiss <kheiss@gmail.com> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27net/ipv4: Ensure that location of timestamp option is storedDavid Ward
[ Upstream commit 4660c7f498c07c43173142ea95145e9dac5a6d14 ] This is needed in order to detect if the timestamp option appears more than once in a packet, to remove the option if the packet is fragmented, etc. My previous change neglected to store the option location when the router addresses were prespecified and Pointer > Length. But now the option location is also stored when Flag is an unrecognized value, to ensure these option handling behaviors are still performed. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-276lowpan: Fix endianness issue in is_addr_link_local().YOSHIFUJI Hideaki / 吉藤英明
[ Upstream commit 9026c4927254f5bea695cc3ef2e255280e6a3011 ] Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27dcbnl: fix various netlink info leaksMathias Krause
[ Upstream commit 29cd8ae0e1a39e239a3a7b67da1986add1199fc0 ] The dcb netlink interface leaks stack memory in various places: * perm_addr[] buffer is only filled at max with 12 of the 32 bytes but copied completely, * no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand, so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes for ieee_pfc structs, etc., * the same is true for CEE -- no in-kernel driver fills the whole struct, Prevent all of the above stack info leaks by properly initializing the buffers/structures involved. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27rtnl: fix info leak on RTM_GETLINK request for VF devicesMathias Krause
[ Upstream commit 84d73cd3fb142bf1298a8c13fd4ca50fd2432372 ] Initialize the mac address buffer with 0 as the driver specific function will probably not fill the whole buffer. In fact, all in-kernel drivers fill only ETH_ALEN of the MAX_ADDR_LEN bytes, i.e. 6 of the 32 possible bytes. Therefore we currently leak 26 bytes of stack memory to userland via the netlink interface. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27ipv6: stop multicast forwarding to process interface scoped addressesHannes Frederic Sowa
[ Upstream commit ddf64354af4a702ee0b85d0a285ba74c7278a460 ] v2: a) used struct ipv6_addr_props v3: a) reverted changes for ipv6_addr_props v4: a) do not use __ipv6_addr_needs_scope_id Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27bridging: fix rx_handlers return codeCristian Bercaru
[ Upstream commit 3bc1b1add7a8484cc4a261c3e128dbe1528ce01f ] The frames for which rx_handlers return RX_HANDLER_CONSUMED are no longer counted as dropped. They are counted as successfully received by 'netif_receive_skb'. This allows network interface drivers to correctly update their RX-OK and RX-DRP counters based on the result of 'netif_receive_skb'. Signed-off-by: Cristian Bercaru <B43982@freescale.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27netlabel: correctly list all the static label mappingsPaul Moore
[ Upstream commits 0c1233aba1e948c37f6dc7620cb7c253fcd71ce9 and a6a8fe950e1b8596bb06f2c89c3a1a4bf2011ba9 ] When we have a large number of static label mappings that spill across the netlink message boundary we fail to properly save our state in the netlink_callback struct which causes us to repeat the same listings. This patch fixes this problem by saving the state correctly between calls to the NetLabel static label netlink "dumpit" routines. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27tcp: fix double-counted receiver RTT when leaving receiver fast pathNeal Cardwell
[ Upstream commit aab2b4bf224ef8358d262f95b568b8ad0cecf0a0 ] We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both before and after going to step5. That wastes CPU and double-counts the receiver-side RTT sample. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27net: ipv6: Don't purge default router if accept_ra=2Lorenzo Colitti
[ Upstream commit 3e8b0ac3e41e3c882222a5522d5df7212438ab51 ] Setting net.ipv6.conf.<interface>.accept_ra=2 causes the kernel to accept RAs even when forwarding is enabled. However, enabling forwarding purges all default routes on the system, breaking connectivity until the next RA is received. Fix this by not purging default routes on interfaces that have accept_ra=2. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27rds: limit the size allocated by rds_message_alloc()Cong Wang
[ Upstream commit ece6b0a2b25652d684a7ced4ae680a863af041e0 ] Dave Jones reported the following bug: "When fed mangled socket data, rds will trust what userspace gives it, and tries to allocate enormous amounts of memory larger than what kmalloc can satisfy." WARNING: at mm/page_alloc.c:2393 __alloc_pages_nodemask+0xa0d/0xbe0() Hardware name: GA-MA78GM-S2H Modules linked in: vmw_vsock_vmci_transport vmw_vmci vsock fuse bnep dlci bridge 8021q garp stp mrp binfmt_misc l2tp_ppp l2tp_core rfcomm s Pid: 24652, comm: trinity-child2 Not tainted 3.8.0+ #65 Call Trace: [<ffffffff81044155>] warn_slowpath_common+0x75/0xa0 [<ffffffff8104419a>] warn_slowpath_null+0x1a/0x20 [<ffffffff811444ad>] __alloc_pages_nodemask+0xa0d/0xbe0 [<ffffffff8100a196>] ? native_sched_clock+0x26/0x90 [<ffffffff810b2128>] ? trace_hardirqs_off_caller+0x28/0xc0 [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff811861f8>] alloc_pages_current+0xb8/0x180 [<ffffffff8113eaaa>] __get_free_pages+0x2a/0x80 [<ffffffff811934fe>] kmalloc_order_trace+0x3e/0x1a0 [<ffffffff81193955>] __kmalloc+0x2f5/0x3a0 [<ffffffff8104df0c>] ? local_bh_enable_ip+0x7c/0xf0 [<ffffffffa0401ab3>] rds_message_alloc+0x23/0xb0 [rds] [<ffffffffa04043a1>] rds_sendmsg+0x2b1/0x990 [rds] [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff81564620>] sock_sendmsg+0xb0/0xe0 [<ffffffff810b2052>] ? get_lock_stats+0x22/0x70 [<ffffffff810b24be>] ? put_lock_stats.isra.23+0xe/0x40 [<ffffffff81567f30>] sys_sendto+0x130/0x180 [<ffffffff810b872d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff816c547b>] ? _raw_spin_unlock_irq+0x3b/0x60 [<ffffffff816cd767>] ? sysret_check+0x1b/0x56 [<ffffffff810b8695>] ? trace_hardirqs_on_caller+0x115/0x1a0 [<ffffffff81341d8e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff816cd742>] system_call_fastpath+0x16/0x1b ---[ end trace eed6ae990d018c8b ]--- Reported-by: Dave Jones <davej@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27l2tp: Restore socket refcount when sendmsg succeedsGuillaume Nault
[ Upstream commit 8b82547e33e85fc24d4d172a93c796de1fefa81a ] The sendmsg() syscall handler for PPPoL2TP doesn't decrease the socket reference counter after successful transmissions. Any successful sendmsg() call from userspace will then increase the reference counter forever, thus preventing the kernel's session and tunnel data from being freed later on. The problem only happens when writing directly on L2TP sockets. PPP sockets attached to L2TP are unaffected as the PPP subsystem uses pppol2tp_xmit() which symmetrically increase/decrease reference counters. This patch adds the missing call to sock_put() before returning from pppol2tp_sendmsg(). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-20batman-adv: Only write requested number of byte to user bufferSven Eckelmann
commit b5a1eeef04cc7859f34dec9b72ea1b28e4aba07c upstream. Don't write more than the requested number of bytes of an batman-adv icmp packet to the userspace buffer. Otherwise unrelated userspace memory might get overridden by the kernel. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-20batman-adv: bat_socket_read missing checksPaul Kot
commit c00b6856fc642b234895cfabd15b289e76726430 upstream. Writing a icmp_packet_rr and then reading icmp_packet can lead to kernel memory corruption, if __user *buf is just below TASK_SIZE. Signed-off-by: Paul Kot <pawlkt@gmail.com> [sven@narfation.org: made it checkpatch clean] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-20decnet: Fix disappearing sysctl entriesEric W. Biederman
When decnet is built as a module a simple: echo 0.0 >/proc/sys/net/decnet/node_address results in most of the sysctl entries under /proc/sys/net/decnet and /proc/sys/net/decnet/conf disappearing. For more details see http://www.spinics.net/lists/netdev/msg226123.html. This change applies the same workaround used in net/core/sysctl_net_core.c and net/ipv6/sysctl_net_ipv6.c of creating a skeleton of decnet sysctl entries before doing anything else. The problem first appeared in kernel 2.6.27. The later rewrite of sysctl in kernel 3.4 restored the previous behavior and eliminated the need for this workaround. This patch was heavily inspired by a similar but more complex patch by Larry Baker. Reported-by: Larry Baker <baker@usgs.gov> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-20SUNRPC: Don't start the retransmission timer when out of socket spaceTrond Myklebust
commit a9a6b52ee1baa865283a91eb8d443ee91adfca56 upstream. If the socket is full, we're better off just waiting until it empties, or until the connection is broken. The reason why we generally don't want to time out is that the call to xprt->ops->release_xprt() will trigger a connection reset, which isn't helpful... Let's make an exception for soft RPC calls, since they have to provide timeout guarantees. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06ipv6: use a stronger hash for tcpEric Dumazet
[ Upstream commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 ] It looks like its possible to open thousands of TCP IPv6 sessions on a server, all landing in a single slot of TCP hash table. Incoming packets have to lookup sockets in a very long list. We should hash all bits from foreign IPv6 addresses, using a salt and hash mix, not a simple XOR. inet6_ehashfn() can also separately use the ports, instead of xoring them. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06ipv4: fix a bug in ping_err().Li Wei
[ Upstream commit b531ed61a2a2a77eeb2f7c88b49aa5ec7d9880d8 ] We should get 'type' and 'code' from the outer ICMP header. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06bridge: set priority of STP packetsStephen Hemminger
[ Upstream commit 547b4e718115eea74087e28d7fa70aec619200db ] Spanning Tree Protocol packets should have always been marked as control packets, this causes them to get queued in the high prirority FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge gets overloaded and can't communicate. This is a long-standing bug back to the first versions of Linux bridge. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06svcrpc: make svc_age_temp_xprts enqueue under sv_lockJ. Bruce Fields
commit e75bafbff2270993926abcc31358361db74a9bc2 upstream. svc_age_temp_xprts expires xprts in a two-step process: first it takes the sv_lock and moves the xprts to expire off their server-wide list (sv_tempsocks or sv_permsocks) to a local list. Then it drops the sv_lock and enqueues and puts each one. I see no reason for this: svc_xprt_enqueue() will take sp_lock, but the sv_lock and sp_lock are not otherwise nested anywhere (and documentation at the top of this file claims it's correct to nest these with sp_lock inside.) Tested-by: Jason Tibbitts <tibbs@math.uh.edu> Tested-by: Paweł Sikora <pawel.sikora@agmk.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20bridge: Pull ip header into skb->data before looking into ip header.Sarveshwar Bandi
[ Upstream commit 6caab7b0544e83e6c160b5e80f5a4a7dd69545c7 ] If lower layer driver leaves the ip header in the skb fragment, it needs to be first pulled into skb->data before inspecting ip header length or ip version number. Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20tcp: fix for zero packets_in_flight was too broadIlpo Järvinen
[ Upstream commit 6731d2095bd4aef18027c72ef845ab1087c3ba63 ] There are transients during normal FRTO procedure during which the packets_in_flight can go to zero between write_queue state updates and firing the resulting segments out. As FRTO processing occurs during that window the check must be more precise to not match "spuriously" :-). More specificly, e.g., when packets_in_flight is zero but FLAG_DATA_ACKED is true the problematic branch that set cwnd into zero would not be taken and new segments might be sent out later. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20tcp: frto should not set snd_cwnd to 0Eric Dumazet
[ Upstream commit 2e5f421211ff76c17130b4597bc06df4eeead24f ] Commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start()) uncovered a bug in FRTO code : tcp_process_frto() is setting snd_cwnd to 0 if the number of in flight packets is 0. As Neal pointed out, if no packet is in flight we lost our chance to disambiguate whether a loss timeout was spurious. We should assume it was a proper loss. Reported-by: Pasi Kärkkäinen <pasik@iki.fi> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20net: sctp: sctp_endpoint_free: zero out secret key dataDaniel Borkmann
[ Upstream commit b5c37fe6e24eec194bb29d22fdd55d73bcc709bf ] On sctp_endpoint_destroy, previously used sensitive keying material should be zeroed out before the memory is returned, as we already do with e.g. auth keys when released. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfreeDaniel Borkmann
[ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ] In sctp_setsockopt_auth_key, we create a temporary copy of the user passed shared auth key for the endpoint or association and after internal setup, we free it right away. Since it's sensitive data, we should zero out the key before returning the memory back to the allocator. Thus, use kzfree instead of kfree, just as we do in sctp_auth_key_put(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20sctp: refactor sctp_outq_teardown to insure proper re-initalizationNeil Horman
[ Upstream commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86 ] Jamie Parsons reported a problem recently, in which the re-initalization of an association (The duplicate init case), resulted in a loss of receive window space. He tracked down the root cause to sctp_outq_teardown, which discarded all the data on an outq during a re-initalization of the corresponding association, but never reset the outq->outstanding_data field to zero. I wrote, and he tested this fix, which does a proper full re-initalization of the outq, fixing this problem, and hopefully future proofing us from simmilar issues down the road. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Jamie Parsons <Jamie.Parsons@metaswitch.com> Tested-by: Jamie Parsons <Jamie.Parsons@metaswitch.com> CC: Jamie Parsons <Jamie.Parsons@metaswitch.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20packet: fix leakage of tx_ring memoryPhil Sutter
[ Upstream commit 9665d5d62487e8e7b1f546c00e11107155384b9a ] When releasing a packet socket, the routine packet_set_ring() is reused to free rings instead of allocating them. But when calling it for the first time, it fills req->tp_block_nr with the value of rb->pg_vec_len which in the second invocation makes it bail out since req->tp_block_nr is greater zero but req->tp_block_size is zero. This patch solves the problem by passing a zeroed auto-variable to packet_set_ring() upon each invocation from packet_release(). As far as I can tell, this issue exists even since 69e3c75 (net: TX_RING and packet mmap), i.e. the original inclusion of TX ring support into af_packet, but applies only to sockets with both RX and TX ring allocated, which is probably why this was unnoticed all the time. Signed-off-by: Phil Sutter <phil.sutter@viprinet.com> Cc: Johann Baudy <johann.baudy@gnu-log.net> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20ipv6: do not create neighbor entries for local deliveryMarcelo Ricardo Leitner
[ Upstream commit bd30e947207e2ea0ff2c08f5b4a03025ddce48d3 ] They will be created at output, if ever needed. This avoids creating empty neighbor entries when TPROXYing/Forwarding packets for addresses that are not even directly reachable. Note that IPv4 already handles it this way. No neighbor entries are created for local input. Tested by myself and customer. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20pktgen: correctly handle failures when adding a deviceCong Wang
[ Upstream commit 604dfd6efc9b79bce432f2394791708d8e8f6efc ] The return value of pktgen_add_device() is not checked, so even if we fail to add some device, for example, non-exist one, we still see "OK:...". This patch fixes it. After this patch, I got: # echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0 -bash: echo: write error: No such device # cat /proc/net/pktgen/kpktgend_0 Running: Stopped: Result: ERROR: can not add device non-exist # echo "add_device eth0" > /proc/net/pktgen/kpktgend_0 # cat /proc/net/pktgen/kpktgend_0 Running: Stopped: eth0 Result: OK: add_device=eth0 (Candidate for -stable) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20ipv6: fix header length calculation in ip6_append_data()Romain KUNTZ
[ Upstream commit 7efdba5bd9a2f3e2059beeb45c9fa55eefe1bced ] Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem) has introduced a error in the header length calculation that provokes corrupted packets when non-fragmentable extensions headers (Destination Option or Routing Header Type 2) are used. rt->rt6i_nfheader_len is the length of the non-fragmentable extension header, and it should be substracted to rt->dst.header_len, and not to exthdrlen, as it was done before commit 299b0767. This patch reverts to the original and correct behavior. It has been successfully tested with and without IPsec on packets that include non-fragmentable extensions headers. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20ipv6: fix the noflags test in addrconf_get_prefix_routeRomain Kuntz
[ Upstream commit 85da53bf1c336bb07ac038fb951403ab0478d2c5 ] The tests on the flags in addrconf_get_prefix_route() does no make much sense: the 'noflags' parameter contains the set of flags that must not match with the route flags, so the test must be done against 'noflags', and not against 'flags'. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20net: prevent setting ttl=0 via IP_TTLCong Wang
[ Upstream commit c9be4a5c49cf51cc70a993f004c5bb30067a65ce ] A regression is introduced by the following commit: commit 4d52cfbef6266092d535237ba5a4b981458ab171 Author: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue Jun 2 00:42:16 2009 -0700 net: ipv4/ip_sockglue.c cleanups Pure cleanups but it is not a pure cleanup... - if (val != -1 && (val < 1 || val>255)) + if (val != -1 && (val < 0 || val > 255)) Since there is no reason provided to allow ttl=0, change it back. Reported-by: nitin padalia <padalia.nitin@gmail.com> Cc: nitin padalia <padalia.nitin@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-20Bluetooth: Fix sending HCI commands after resetSzymon Janc
commit dbccd791a3fbbdac12c33834b73beff3984988e9 upstream. After sending reset command wait for its command complete event before sending next command. Some chips sends CC event for command received before reset if reset was send before chip replied with CC. This is also required by specification that host shall not send additional HCI commands before receiving CC for reset. < HCI Command: Reset (0x03|0x0003) plen 0 [hci0] 18.404612 > HCI Event: Command Complete (0x0e) plen 4 [hci0] 18.405850 Write Extended Inquiry Response (0x03|0x0052) ncmd 1 Status: Success (0x00) < HCI Command: Read Local Supported Features (0x04|0x0003) plen 0 [hci0] 18.406079 > HCI Event: Command Complete (0x0e) plen 4 [hci0] 18.407864 Reset (0x03|0x0003) ncmd 1 Status: Success (0x00) < HCI Command: Read Local Supported Features (0x04|0x0003) plen 0 [hci0] 18.408062 > HCI Event: Command Complete (0x0e) plen 12 [hci0] 18.408835 Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>