summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2011-03-07arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.Ian Campbell
commit d11327ad6695db8117c78d70611e71102ceec2ac upstream. NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link notification while NETDEV_UP/NETDEV_CHANGEADDR generate link notifications as a sort of side effect. In the later cases the sysctl option is present because link notification events can have undesired effects e.g. if the link is flapping. I don't think this applies in the case of an explicit request from a driver. This patch makes NETDEV_NOTIFY_PEER unconditional, if preferred we could add a new sysctl for this case which defaults to on. This change causes Xen post-migration ARP notifications (which cause switches to relearn their MAC tables etc) to be sent by default. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> [reported to solve hyperv live migration problem - gkh] Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Mike Surcouf <mike@surcouf.co.uk> Cc: Hank Janssen <hjanssen@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-07dccp: fix oops on Reset after closeGerrit Renker
commit 720dc34bbbe9493c7bd48b2243058b4e447a929d upstream. This fixes a bug in the order of dccp_rcv_state_process() that still permitted reception even after closing the socket. A Reset after close thus causes a NULL pointer dereference by not preventing operations on an already torn-down socket. dccp_v4_do_rcv() | | state other than OPEN v dccp_rcv_state_process() | | DCCP_PKT_RESET v dccp_rcv_reset() | v dccp_time_wait() WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128() Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah [<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common) [<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n) [<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd) [<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai) [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces) [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_) [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0) [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380) [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70) The fix is by testing the socket state first. Receiving a packet in Closed state now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1. Reported-and-tested-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-07sctp: Fix oops when sending queued ASCONF chunksVlad Yasevich
commit c0786693404cffd80ca3cb6e75ee7b35186b2825 upstream. When we finish processing ASCONF_ACK chunk, we try to send the next queued ASCONF. This action runs the sctp state machine recursively and it's not prepared to do so. kernel BUG at kernel/timer.c:790! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/module/ipv6/initstate Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0 EIP is at add_timer+0xd/0x1b EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4 ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000) Stack: c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004 <0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004 <0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0 Call Trace: [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp] [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp] [<d1863386>] ? sctp_pname+0x0/0x1d [sctp] [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp] [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp] [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp] [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp] [<d1863334>] ? sctp_cname+0x0/0x52 [sctp] [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp] [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp] [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp] Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie> Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02x25: Do not reference freed memory.David S. Miller
commit 96642d42f076101ba98866363d908cab706d156c upstream. In x25_link_free(), we destroy 'nb' before dereferencing 'nb->dev'. Don't do this, because 'nb' might be freed by then. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02tcp: Make TCP_MAXSEG minimum more correct.David S. Miller
commit c39508d6f118308355468314ff414644115a07f3 upstream. Use TCP_MIN_MSS instead of constant 64. Reported-by: Min Zhang <mzhang@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02tcp: Increase TCP_MAXSEG socket option minimum.David S. Miller
commit 7a1abd08d52fdeddb3e9a5a33f2f15cc6a5674d2 upstream. As noted by Steve Chen, since commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390 ("tcp: advertise MSS requested by user") we can end up with a situation where tcp_select_initial_window() does a divide by a zero (or even negative) mss value. The problem is that sometimes we effectively subtract TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss. Fix this by increasing the minimum from 8 to 64. Reported-by: Steve Chen <schen@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02sunrpc/cache: fix module refcnt leak in a failure pathLi Zefan
commit a5990ea1254cd186b38744507aeec3136a0c1c95 upstream. Don't forget to release the module refcnt if seq_open() returns failure. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02x25: decrement netdev reference counts on unloadApollon Oikonomopoulos
commit 171995e5d82dcc92bea37a7d2a2ecc21068a0f19 upstream. x25 does not decrement the network device reference counts on module unload. Thus unregistering any pre-existing interface after unloading the x25 module hangs and results in unregister_netdevice: waiting for tap0 to become free. Usage count = 1 This patch decrements the reference counts of all interfaces in x25_link_free, the way it is already done in x25_link_device_down for NETDEV_DOWN events. Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02filter: make sure filters dont read uninitialized memoryDavid S. Miller
commit 57fe93b374a6b8711995c2d466c502af9f3a08bb upstream. There is a possibility malicious users can get limited information about uninitialized stack mem array. Even if sk_run_filter() result is bound to packet length (0 .. 65535), we could imagine this can be used by hostile user. Initializing mem[] array, like Dan Rosenberg suggested in his patch is expensive since most filters dont even use this array. Its hard to make the filter validation in sk_chk_filter(), because of the jumps. This might be done later. In this patch, I use a bitmap (a single long var) so that only filters using mem[] loads/stores pay the price of added security checks. For other filters, additional cost is a single instruction. [ Since we access fentry->k a lot now, cache it in a local variable and mark filter entry pointer as const. -DaveM ] Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [Backported by dann frazier <dannf@debian.org>] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac()Dan Rosenberg
commit 51e97a12bef19b7e43199fc153cf9bd5f2140362 upstream. The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids array and attempts to ensure that only a supported hmac entry is returned. The current code fails to do this properly - if the last id in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the id integer remains set after exiting the loop, and the address of an out-of-bounds entry will be returned and subsequently used in the parent function, causing potentially ugly memory corruption. This patch resets the id integer to 0 on encountering an invalid id so that NULL will be returned after finishing the loop if no valid ids are found. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17sched: Fix softirq time accountingVenkatesh Pallipadi
Commit: 75e1056f5c57050415b64cb761a3acc35d91f013 upstream Peter Zijlstra found a bug in the way softirq time is accounted in VIRT_CPU_ACCOUNTING on this thread: http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html The problem is, softirq processing uses local_bh_disable internally. There is no way, later in the flow, to differentiate between whether softirq is being processed or is it just that bh has been disabled. So, a hardirq when bh is disabled results in time being wrongly accounted as softirq. Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING as well. As account_system_time() in normal tick based accouting also uses softirq_count, which will be set even when not in softirq with bh disabled. Peter also suggested solution of using 2*SOFTIRQ_OFFSET as irq count for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while softirq processing. The patch below does that and adds API in_serving_softirq() which returns whether we are currently processing softirq or not. Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c to in_serving_softirq. Looks like many usages of in_softirq really want in_serving_softirq. Those changes can be made individually on a case by case basis. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07sctp: Fix a race between ICMP protocol unreachable and connect()Vlad Yasevich
commit 50b5d6ad63821cea324a5a7a19854d4de1a0a819 upstream. ICMP protocol unreachable handling completely disregarded the fact that the user may have locked the socket. It proceeded to destroy the association, even though the user may have held the lock and had a ref on the association. This resulted in the following: Attempt to release alive inet socket f6afcc00 ========================= [ BUG: held lock freed! ] ------------------------- somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held there! (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c 1 lock held by somenu/2672: #0: (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c stack backtrace: Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55 Call Trace: [<c1232266>] ? printk+0xf/0x11 [<c1038553>] debug_check_no_locks_freed+0xce/0xff [<c10620b4>] kmem_cache_free+0x21/0x66 [<c1185f25>] __sk_free+0x9d/0xab [<c1185f9c>] sk_free+0x1c/0x1e [<c1216e38>] sctp_association_put+0x32/0x89 [<c1220865>] __sctp_connect+0x36d/0x3f4 [<c122098a>] ? sctp_connect+0x13/0x4c [<c102d073>] ? autoremove_wake_function+0x0/0x33 [<c12209a8>] sctp_connect+0x31/0x4c [<c11d1e80>] inet_dgram_connect+0x4b/0x55 [<c11834fa>] sys_connect+0x54/0x71 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239 [<c1054026>] ? might_fault+0x42/0x7c [<c1054026>] ? might_fault+0x42/0x7c [<c11847ab>] sys_socketcall+0x6d/0x178 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c1002959>] syscall_call+0x7/0xb This was because the sctp_wait_for_connect() would aqcure the socket lock and then proceed to release the last reference count on the association, thus cause the fully destruction path to finish freeing the socket. The simplest solution is to start a very short timer in case the socket is owned by user. When the timer expires, we can do some verification and be able to do the release properly. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07net: release dst entry while cache-hot for GSO case tooKrishna Kumar
commit 068a2de57ddf4f472e32e7af868613c574ad1d88 upstream. Non-GSO code drops dst entry for performance reasons, but the same is missing for GSO code. Drop dst while cache-hot for GSO case too. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07sunrpc: prevent use-after-free on clearing XPT_BUSYNeilBrown
commit ed2849d3ecfa339435818eeff28f6c3424300cec upstream. When an xprt is created, it has a refcount of 1, and XPT_BUSY is set. The refcount is *not* owned by the thread that created the xprt (as is clear from the fact that creators never put the reference). Rather, it is owned by the absence of XPT_DEAD. Once XPT_DEAD is set, (And XPT_BUSY is clear) that initial reference is dropped and the xprt can be freed. So when a creator clears XPT_BUSY it is dropping its only reference and so must not touch the xprt again. However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY reference on a new xprt), calls svc_xprt_recieved. This clears XPT_BUSY and then svc_xprt_enqueue - this last without owning a reference. This is dangerous and has been seen to leave svc_xprt_enqueue working with an xprt containing garbage. So we need to hold an extra counted reference over that call to svc_xprt_received. For safety, any time we clear XPT_BUSY and then use the xprt again, we first get a reference, and the put it again afterwards. Note that svc_close_all does not need this extra protection as there are no threads running, and the final free can only be called asynchronously from such a thread. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09net sched: fix some kernel memory leaksEric Dumazet
commit 1c40be12f7d8ca1d387510d39787b12e512a7ce8 upstream. We leak at least 32bits of kernel memory to user land in tc dump, because we dont init all fields (capab ?) of the dumped structure. Use C99 initializers so that holes and non explicit fields are zeroed. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09act_nat: use stack variableChangli Gao
commit 504f85c9d05f7c605306e808f0d835fe11bfd18d upstream. act_nat: use stack variable structure tc_nat isn't too big for stack, so we can put it in stack. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09net: Limit socket I/O iovec total length to INT_MAX.David S. Miller
commit 8acfe468b0384e834a303f08ebc4953d72fb690a upstream. This helps protect us from overflow issues down in the individual protocol sendmsg/recvmsg handlers. Once we hit INT_MAX we truncate out the rest of the iovec by setting the iov_len members to zero. This works because: 1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial writes are allowed and the application will just continue with another write to send the rest of the data. 2) For datagram oriented sockets, where there must be a one-to-one correspondance between write() calls and packets on the wire, INT_MAX is going to be far larger than the packet size limit the protocol is going to check for and signal with -EMSGSIZE. Based upon a patch by Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09net: Truncate recvfrom and sendto length to INT_MAX.Linus Torvalds
commit 253eacc070b114c2ec1f81b067d2fed7305467b0 upstream. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09rds: Integer overflow in RDS cmsg handlingDan Rosenberg
commit 218854af84038d828a32f061858b1902ed2beec6 upstream. In rds_cmsg_rdma_args(), the user-provided args->nr_local value is restricted to less than UINT_MAX. This seems to need a tighter upper bound, since the calculation of total iov_size can overflow, resulting in a small sock_kmalloc() allocation. This would probably just result in walking off the heap and crashing when calling rds_rdma_pages() with a high count value. If it somehow doesn't crash here, then memory corruption could occur soon after. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09econet: fix CVE-2010-3850Phil Blundell
commit 16c41745c7b92a243d0874f534c1655196c64b74 upstream. Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation. Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849Phil Blundell
commit fa0e846494792e722d817b9d3d625a4ef4896c96 upstream. Later parts of econet_sendmsg() rely on saddr != NULL, so return early with EINVAL if NULL was passed otherwise an oops may occur. Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x25: Prevent crashing when parsing bad X.25 facilitiesDan Rosenberg
commit 5ef41308f94dcbb3b7afc56cdef1c2ba53fa5d2f upstream. Now with improved comma support. On parsing malformed X.25 facilities, decrementing the remaining length may cause it to underflow. Since the length is an unsigned integer, this will result in the loop continuing until the kernel crashes. This patch adds checks to ensure decrementing the remaining length does not cause it to wrap around. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09can-bcm: fix minor heap overflowOliver Hartkopp
commit 0597d1b99fcfc2c0eada09a698f85ed413d4ba84 upstream. On 64-bit platforms the ASCII representation of a pointer may be up to 17 bytes long. This patch increases the length of the buffer accordingly. http://marc.info/?l=linux-netdev&m=128872251418192&w=2 Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> CC: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09memory corruption in X.25 facilities parsingandrew hendry
commit a6331d6f9a4298173b413cf99a40cc86a9d92c37 upstream. Signed-of-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.John Hughes
commit f5eb917b861828da18dc28854308068c66d1449a upstream. Here is a patch to stop X.25 examining fields beyond the end of the packet. For example, when a simple CALL ACCEPTED was received: 10 10 0f x25_parse_facilities was attempting to decode the FACILITIES field, but this packet contains no facilities field. Signed-off-by: John Hughes <john@calva.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09Limit sysctl_tcp_mem and sysctl_udp_mem initializers to prevent integer ↵Robin Holt
overflows. [ Upstream fixed this in a different way as parts of the commits: 8d987e5c7510 (net: avoid limits overflow) a9febbb4bd13 (sysctl: min/max bounds are optional) 27b3d80a7b6a (sysctl: fix min/max handling in __do_proc_doulongvec_minmax()) -DaveM ] On a 16TB x86_64 machine, sysctl_tcp_mem[2], sysctl_udp_mem[2], and sysctl_sctp_mem[2] can integer overflow. Set limit such that they are maximized without overflowing. Signed-off-by: Robin Holt <holt@sgi.com> To: "David S. Miller" <davem@davemloft.net> Cc: Willy Tarreau <w@1wt.eu> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-sctp@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09net sched: fix kernel leak in act_policeJeff Mahoney
commit 0f04cfd098fb81fded74e78ea1a1b86cc6c6c31e upstream. While reviewing commit 1c40be12f7d8ca1d387510d39787b12e512a7ce8, I audited other users of tc_action_ops->dump for information leaks. That commit covered almost all of them but act_police still had a leak. opt.limit and opt.capab aren't zeroed out before the structure is passed out. This patch uses the C99 initializers to zero everything unused out. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09DECnet: don't leak uninitialized stack byteDan Rosenberg
commit 3c6f27bf33052ea6ba9d82369fb460726fb779c0 upstream. A single uninitialized padding byte is leaked to userspace. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09netfilter: nf_conntrack: allow nf_ct_alloc_hashtable() to get highmem pagesEric Dumazet
commit 6b1686a71e3158d3c5f125260effce171cc7852b upstream. commit ea781f197d6a8 (use SLAB_DESTROY_BY_RCU and get rid of call_rcu()) did a mistake in __vmalloc() call in nf_ct_alloc_hashtable(). I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09net: NETIF_F_HW_CSUM does not imply FCoE CRC offloadBen Hutchings
commit 66c68bcc489fadd4f5e8839e966e3a366e50d1d5 upstream. NETIF_F_HW_CSUM indicates the ability to update an TCP/IP-style 16-bit checksum with the checksum of an arbitrary part of the packet data, whereas the FCoE CRC is something entirely different. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09net: clear heap allocation for ETHTOOL_GRXCLSRLALLKees Cook
commit ae6df5f96a51818d6376da5307d773baeece4014 upstream. Calling ETHTOOL_GRXCLSRLALL with a large rule_cnt will allocate kernel heap without clearing it. For the one driver (niu) that implements it, it will leave the unused portion of heap unchanged and copy the full contents back to userspace. Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09irda: Fix heap memory corruption in iriap.cSamuel Ortiz
commit 37f9fc452d138dfc4da2ee1ce5ae85094efc3606 upstream. While parsing the GetValuebyClass command frame, we could potentially write passed the skb->data pointer. Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09irda: Fix parameter extraction stack overflowSamuel Ortiz
commit efc463eb508798da4243625b08c7396462cabf9f upstream. Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28Phonet: disable network namespace supportRémi Denis-Courmont
[Solved differently upstream] Network namespace in the Phonet socket stack causes an OOPS when a namespace is destroyed. This occurs as the loopback exit_net handler is called after the Phonet exit_net handler, and re-enters the Phonet stack. I cannot think of any nice way to fix this in kernel <= 2.6.32. For lack of a better solution, disable namespace support completely. If you need that, upgrade to a newer kernel. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by: David S. Miller <davem@davemloft.net>
2010-10-28net: blackhole route should always be recalculatedJianzhao Wang
[ Upstream commit ae2688d59b5f861dc70a091d003773975d2ae7fb ] Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error triggered by IKE for example), hence this kind of route is always temporary and so we should check if a better route exists for next packets. Bug has been introduced by commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92. Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28rose: Fix signedness issues wrt. digi count.David S. Miller
[ Upstream commit 9828e6e6e3f19efcb476c567b9999891d051f52f ] Just use explicit casts, since we really can't change the types of structures exported to userspace which have been around for 15 years or so. Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28tcp: Fix race in tcp_pollTom Marshall
[ Upstream commit a4d258036ed9b2a1811c3670c6099203a0f284a0 ] If a RST comes in immediately after checking sk->sk_err, tcp_poll will return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end of tcp_poll. Additionally, ensure the correct order of operations on SMP machines with memory barriers. Signed-off-by: Tom Marshall <tdm.code@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28net: clear heap allocations for privileged ethtool actionsKees Cook
[ Upstream commit b00916b189d13a615ff05c9242201135992fcda3 ] Several other ethtool functions leave heap uncleared (potentially) by drivers. Some interfaces appear safe (eeprom, etc), in that the sizes are well controlled. In some situations (e.g. unchecked error conditions), the heap will remain unchanged in areas before copying back to userspace. Note that these are less of an issue since these all require CAP_NET_ADMIN. Cc: stable@kernel.org Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28ip: fix truesize mismatch in ip fragmentationEric Dumazet
[ Upstream commit 3d13008e7345fa7a79d8f6438150dc15d6ba6e9d ] Special care should be taken when slow path is hit in ip_fragment() : When walking through frags, we transfert truesize ownership from skb to frags. Then if we hit a slow_path condition, we must undo this or risk uncharging frags->truesize twice, and in the end, having negative socket sk_wmem_alloc counter, or even freeing socket sooner than expected. Many thanks to Nick Bowler, who provided a very clean bug report and test program. Thanks to Jarek for reviewing my first patch and providing a V2 While Nick bisection pointed to commit 2b85a34e911 (net: No more expensive sock_hold()/sock_put() on each tx), underlying bug is older (2.6.12-rc5) A side effect is to extend work done in commit b2722b1c3a893e (ip_fragment: also adjust skb->truesize for packets not owned by a socket) to ipv6 as well. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Tested-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28net: Fix IPv6 PMTU disc. w/ asymmetric routesMaciej Żenczykowski
[ Upstream commit ae878ae280bea286ff2b1e1cb6e609dd8cb4501d ] Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28Phonet: Correct header retrieval after pskb_may_pullKumar Sanghvi
[ Upstream commit a91e7d471e2e384035b9746ea707ccdcd353f5dd ] Retrieve the header after doing pskb_may_pull since, pskb_may_pull could change the buffer structure. This is based on the comment given by Eric Dumazet on Phonet Pipe controller patch for a similar problem. Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28net: Fix the condition passed to sk_wait_event()Nagendra Tomar
[ Upstream commit 482964e56e1320cb7952faa1932d8ecf59c4bf75 ] This patch fixes the condition (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory() causes the following soft lockup in tcp_sendmsg() when the global tcp memory pool has exhausted. >>> snip <<< localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429] localhost kernel: CPU 3: localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200 localhost kernel: localhost kernel: Call Trace: localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0 localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140 localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170 localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190 localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90 localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83 >>> snip <<< What is happening is, that the sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to true for the case of tcp global memory exhaustion. This is because both sk_stream_memory_free() and vm_wait are true which causes sk_wait_event() to *not* call schedule_timeout(). Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping. This causes the caller to again try allocation, which again fails and again calls sk_stream_wait_memory(), and so on. [ Bug introduced by commit c1cbe4b7ad0bc4b1d98ea708a3fecb7362aa4088 ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ] Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28tcp: Fix >4GB writes on 64-bit.David S. Miller
[ Upstream commit 01db403cf99f739f86903314a489fb420e0e254f ] Fixes kernel bugzilla #16603 tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines). So it could trigger false-positives on 64-bit as written. So fix it to use 'long'. Reported-by: Olaf Bonorden <bono@onlinehome.de> Reported-by: Daniel Büse <dbuese@gmx.de> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28xfrm4: strip ECN and IP Precedence bits in policy lookupUlrich Weber
[ Upstream commit 94e2238969e89f5112297ad2a00103089dde7e8f ] dont compare ECN and IP Precedence bits in find_bundle and use ECN bit stripped TOS value in xfrm_lookup Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28De-pessimize rds_page_copy_userLinus Torvalds
commit 799c10559d60f159ab2232203f222f18fa3c4a5f upstream. Don't try to "optimize" rds_page_copy_user() by using kmap_atomic() and the unsafe atomic user mode accessor functions. It's actually slower than the straightforward code on any reasonable modern CPU. Back when the code was written (although probably not by the time it was actually merged, though), 32-bit x86 may have been the dominant architecture. And there kmap_atomic() can be a lot faster than kmap() (unless you have very good locality, in which case the virtual address caching by kmap() can overcome all the downsides). But these days, x86-64 may not be more populous, but it's getting there (and if you care about performance, it's definitely already there - you'd have upgraded your CPU's already in the last few years). And on x86-64, the non-kmap_atomic() version is faster, simply because the code is simpler and doesn't have the "re-try page fault" case. People with old hardware are not likely to care about RDS anyway, and the optimization for the 32-bit case is simply buggy, since it doesn't verify the user addresses properly. Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28wext: fix potential private ioctl memory content leakJohannes Berg
commit df6d02300f7c2fbd0fbe626d819c8e5237d72c62 upstream. When a driver doesn't fill the entire buffer, old heap contents may remain, and if it also doesn't update the length properly, this old heap content will be copied back to userspace. It is very unlikely that this happens in any of the drivers using private ioctls since it would show up as junk being reported by iwpriv, but it seems better to be safe here, so use kzalloc. Reported-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26gro: Fix bogus gso_size on the first fraglist entryHerbert Xu
commit 622e0ca1cd4d459f5af4f2c65f4dc0dd823cb4c3 upstream. When GRO produces fraglist entries, and the resulting skb hits an interface that is incapable of TSO but capable of FRAGLIST, we end up producing a bogus packet with gso_size non-zero. This was reported in the field with older versions of KVM that did not set the TSO bits on tuntap. This patch fixes that. Reported-by: Igor Zhang <yugzhang@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26sctp: Do not reset the packet during sctp_packet_config().Vlad Yasevich
commit 4bdab43323b459900578b200a4b8cf9713ac8fab upstream. sctp_packet_config() is called when getting the packet ready for appending of chunks. The function should not touch the current state, since it's possible to ping-pong between two transports when sending, and that can result packet corruption followed by skb overlfow crash. Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26net/llc: make opt unsigned in llc_ui_setsockopt()Dan Carpenter
commit 339db11b219f36cf7da61b390992d95bb6b7ba2e upstream. The members of struct llc_sock are unsigned so if we pass a negative value for "opt" it can cause a sign bug. Also it can cause an integer overflow when we multiply "opt * HZ". Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26UNIX: Do not loop forever at unix_autobind().Tetsuo Handa
[ Upstream commit a9117426d0fcc05a194f728159a2d43df43c7add ] We assumed that unix_autobind() never fails if kzalloc() succeeded. But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can consume all names using fork()/socket()/bind(). If all names are in use, those who call bind() with addr_len == sizeof(short) or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue while (1) yield(); loop at unix_autobind() till a name becomes available. This patch adds a loop counter in order to give up after 1048576 attempts. Calling yield() for once per 256 attempts may not be sufficient when many names are already in use, for __unix_find_socket_byname() can take long time under such circumstance. Therefore, this patch also adds cond_resched() call. Note that currently a local user can consume 2GB of kernel memory if the user is allowed to create and autobind 1048576 UNIX domain sockets. We should consider adding some restriction for autobind operation. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>