summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-02-10mempolicy: fix a race in shared_policy_replace()Mel Gorman
commit b22d127a39ddd10d93deee3d96e643657ad53a49 upstream. shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot be dereferenced if sp->lock is not held and 2) another thread can modify sp_node between spin_unlock for allocating a new sp node and next spin_lock. The bug was introduced before 2.6.12-rc2. Kosaki's original patch for this problem was to allocate an sp node and policy within shared_policy_replace and initialise it when the lock is reacquired. I was not keen on this approach because it partially duplicates sp_alloc(). As the paths were sp->lock is taken are not that performance critical this patch converts sp->lock to sp->mutex so it can sleep when calling sp_alloc(). [kosaki.motohiro@jp.fujitsu.com: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDEDChristoffer Dall
commit ad4b3fb7ff9940bcdb1e4cd62bd189d10fa636ba upstream. Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and (PageHead) is true, for tail pages. If this is indeed the intended behavior, which I doubt because it breaks cache cleaning on some ARM systems, then the nomenclature is highly problematic. This patch makes sure PageHead is only true for head pages and PageTail is only true for tail pages, and neither is true for non-compound pages. [ This buglet seems ancient - seems to have been introduced back in Apr 2008 in commit 6a1e7f777f61: "pageflags: convert to the use of new macros". And the reason nobody noticed is because the PageHead() tests are almost all about just sanity-checking, and only used on pages that are actual page heads. The fact that the old code returned true for tail pages too was thus not really noticeable. - Linus ] Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Steve Capper <Steve.Capper@arm.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10tracing: Don't call page_to_pfn() if page is NULLWen Congyang
commit 85f2a2ef1d0ab99523e0b947a2b723f5650ed6aa upstream. When allocating memory fails, page is NULL. page_to_pfn() will cause the kernel panicked if we don't use sparsemem vmemmap. Link: http://lkml.kernel.org/r/505AB1FF.8020104@cn.fujitsu.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10exec: use -ELOOP for max recursion depthKees Cook
commit d740269867021faf4ce38a449353d2b986c34a67 upstream. To avoid an explosion of request_module calls on a chain of abusive scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon as maximum recursion depth is hit, the error will fail all the way back up the chain, aborting immediately. This also has the side-effect of stopping the user's shell from attempting to reexecute the top-level file as a shell script. As seen in the dash source: if (cmd != path_bshell && errno == ENOEXEC) { *argv-- = cmd; *argv = cmd = path_bshell; goto repeat; } The above logic was designed for running scripts automatically that lacked the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC, things continue to behave as the shell expects. Additionally, when tracking recursion, the binfmt handlers should not be involved. The recursion being tracked is the depth of calls through search_binary_handler(), so that function should be exclusively responsible for tracking the depth. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10exec: do not leave bprm->interp on stackKees Cook
commit b66c5984017533316fd1951770302649baf1aa33 upstream. If a series of scripts are executed, each triggering module loading via unprintable bytes in the script header, kernel stack contents can leak into the command line. Normally execution of binfmt_script and binfmt_misc happens recursively. However, when modules are enabled, and unprintable bytes exist in the bprm->buf, execution will restart after attempting to load matching binfmt modules. Unfortunately, the logic in binfmt_script and binfmt_misc does not expect to get restarted. They leave bprm->interp pointing to their local stack. This means on restart bprm->interp is left pointing into unused stack memory which can then be copied into the userspace argv areas. After additional study, it seems that both recursion and restart remains the desirable way to handle exec with scripts, misc, and modules. As such, we need to protect the changes to interp. This changes the logic to require allocation for any changes to the bprm->interp. To avoid adding a new kmalloc to every exec, the default value is left as-is. Only when passing through binfmt_script or binfmt_misc does an allocation take place. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10net: fix a race in sock_queue_err_skb()Eric Dumazet
commit 110c43304db6f06490961529536c362d9ac5732f upstream. As soon as an skb is queued into socket error queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [PG: net/core/skbuff.c --> include/net/sock.h on 2.6.34 baseline] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10tcp: tcp_sendpages() should call tcp_push() onceEric Dumazet
commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 upstream. commit 2f533844242 (tcp: allow splice() to build full TSO packets) added a regression for splice() calls using SPLICE_F_MORE. We need to call tcp_flush() at the end of the last page processed in tcp_sendpages(), or else transmits can be deferred and future sends stall. Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but with different semantic. For all sendpage() providers, its a transparent change. Only sock_sendpage() and tcp_sendpages() can differentiate the two different flags provided by pipe_to_sendpage() Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10HID: provide a helper for validating hid reportsKees Cook
commit 331415ff16a12147d57d5c953f3a961b7ede348b upstream. Many drivers need to validate the characteristics of their HID report during initialization to avoid misusing the reports. This adds a common helper to perform validation of the report exisitng, the field existing, and the expected number of values within the field. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> [PG: in 2.6.34 it is err_hid(), original baseline had hid_err().] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10HID: validate HID report id sizeKees Cook
commit 43622021d2e2b82ea03d883926605bdd0525e1d1 upstream. The "Report ID" field of a HID report is used to build indexes of reports. The kernel's index of these is limited to 256 entries, so any malicious device that sets a Report ID greater than 255 will trigger memory corruption on the host: [ 1347.156239] BUG: unable to handle kernel paging request at ffff88094958a878 [ 1347.156261] IP: [<ffffffff813e4da0>] hid_register_report+0x2a/0x8b CVE-2013-2888 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> [PG: hid_err() --> dbg_hid() in 2.6.34 baseline] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa
pending data commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 upstream. We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [PG: The line "flowi6 *fl6 = &inet->cork.fl.u.ip6" was "flowi *fl = &inet->cork.fl" in 2.6.34 kernel.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10block: fail SCSI passthrough ioctls on partition devicesPaolo Bonzini
commit 0bfc96cb77224736dfa35c3c555d37b3646ef35e upstream. Linux allows executing the SG_IO ioctl on a partition or LVM volume, and will pass the command to the underlying block device. This is well-known, but it is also a large security problem when (via Unix permissions, ACLs, SELinux or a combination thereof) a program or user needs to be granted access only to part of the disk. This patch lets partitions forward a small set of harmless ioctls; others are logged with printk so that we can see which ioctls are actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred. Of course it was being sent to a (partition on a) hard disk, so it would have failed with ENOTTY and the patch isn't changing anything in practice. Still, I'm treating it specially to avoid spamming the logs. In principle, this restriction should include programs running with CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and /dev/sdb, it still should not be able to read/write outside the boundaries of /dev/sda2 independent of the capabilities. However, for now programs with CAP_SYS_RAWIO will still be allowed to send the ioctls. Their actions will still be logged. This patch does not affect the non-libata IDE driver. That driver however already tests for bd != bd->bd_contains before issuing some ioctl; it could be restricted further to forbid these ioctls even for programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [ Make it also print the command name when warning - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10block: add and use scsi_blk_cmd_ioctlPaolo Bonzini
commit 577ebb374c78314ac4617242f509e2f5e7156649 upstream. Introduce a wrapper around scsi_cmd_ioctl that takes a block device. The function will then be enhanced to detect partition block devices and, in that case, subject the ioctls to whitelisting. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10KVM: unmap pages from the iommu when slots are removedAlex Williamson
commit 32f6daad4651a748a58a3ab6da0611862175722f upstream. We've been adding new mappings, but not destroying old mappings. This can lead to a page leak as pages are pinned using get_user_pages, but only unpinned with put_page if they still exist in the memslots list on vm shutdown. A memslot that is destroyed while an iommu domain is enabled for the guest will therefore result in an elevated page reference count that is never cleared. Additionally, without this fix, the iommu is only programmed with the first translation for a gpa. This can result in peer-to-peer errors if a mapping is destroyed and replaced by a new mapping at the same gpa as the iommu will still be pointing to the original, pinned memory address. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> [PG: minor tweak since 2.6.34 doesnt have kvm_for_each_memslot] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10inet: add RCU protection to inet->optEric Dumazet
commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream. We lack proper synchronization to manipulate inet->opt ip_options Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt. Another thread can change inet->opt pointer and free old one under us. Use RCU to protect inet->opt (changed to inet->inet_opt). Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying. We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> [dannf/bwh: backported to Debian's 2.6.32] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> [PG: use 2.6.32 patch, since it is closer to 2.6.34 than original baseline; drop net/l2tp/l2tp_ip.c chunk as we don't have that file] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16x86: Get rid of asmregparmRichard Weinberger
commit 1b4ac2a935aaf194241a2f4165d6407ba9650e1a upstream. As UML does no longer need asmregparm we can remove it. Signed-off-by: Richard Weinberger <richard@nod.at> Cc: namhyung@gmail.com Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: dhowells@redhat.com Link: http://lkml.kernel.org/r/%3C1306189085-29896-1-git-send-email-richard%40nod.at%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16mutex: Place lock in contended state after fastpath_lock failureWill Deacon
commit 0bce9c46bf3b15f485d82d7e81dabed6ebcc24b1 upstream. ARM recently moved to asm-generic/mutex-xchg.h for its mutex implementation after the previous implementation was found to be missing some crucial memory barriers. However, this has revealed some problems running hackbench on SMP platforms due to the way in which the MUTEX_SPIN_ON_OWNER code operates. The symptoms are that a bunch of hackbench tasks are left waiting on an unlocked mutex and therefore never get woken up to claim it. This boils down to the following sequence of events: Task A Task B Task C Lock value 0 1 1 lock() 0 2 lock() 0 3 spin(A) 0 4 unlock() 1 5 lock() 0 6 cmpxchg(1,0) 0 7 contended() -1 8 lock() 0 9 spin(C) 0 10 unlock() 1 11 cmpxchg(1,0) 0 12 unlock() 1 At this point, the lock is unlocked, but Task B is in an uninterruptible sleep with nobody to wake it up. This patch fixes the problem by ensuring we put the lock into the contended state if we fail to acquire it on the fastpath, ensuring that any blocked waiters are woken up when the mutex is released. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@vger.kernel.org> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-6e9lrw2avczr0617fzl5vqb8@git.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16perf_events: Fix races in group compositionPeter Zijlstra
commit 8a49542c0554af7d0073aac0ee73ee65b807ef34 upstream. Group siblings don't pin each-other or the parent, so when we destroy events we must make sure to clean up all cross referencing pointers. In particular, for destruction of a group leader we must be able to find all its siblings and remove their reference to it. This means that detaching an event from its context must not detach it from the group, otherwise we can end up failing to clear all pointers. Solve this by clearly separating the attachment to a context and attachment to a group, and keep the group composed until we destroy the events. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16sctp: Enforce retransmission limit during shutdownThomas Graf
commit f8d9605243280f1870dd2c6c37a735b925c15f3c upstream. When initiating a graceful shutdown while having data chunks on the retransmission queue with a peer which is in zero window mode the shutdown is never completed because the retransmission error count is reset periodically by the following two rules: - Do not timeout association while doing zero window probe. - Reset overall error count when a heartbeat request has been acknowledged. The graceful shutdown will wait for all outstanding TSN to be acknowledged before sending the SHUTDOWN request. This never happens due to the peer's zero window not acknowledging the continuously retransmitted data chunks. Although the error counter is incremented for each failed retransmission, the receiving of the SACK announcing the zero window clears the error count again immediately. Also heartbeat requests continue to be sent periodically. The peer acknowledges these requests causing the error counter to be reset as well. This patch changes behaviour to only reset the overall error counter for the above rules while not in shutdown. After reaching the maximum number of retransmission attempts, the T5 shutdown guard timer is scheduled to give the receiver some additional time to recover. The timer is stopped as soon as the receiver acknowledges any data. The issue can be easily reproduced by establishing a sctp association over the loopback device, constantly queueing data at the sender while not reading any at the receiver. Wait for the window to reach zero, then initiate a shutdown by killing both processes simultaneously. The association will never be freed and the chunks on the retransmission queue will be retransmitted indefinitely. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16sctp: ABORT if receive, reassmbly, or reodering queue is not empty while ↵Thomas Graf
closing socket commit cd4fcc704f30f2064ab30b5300d44d431e46db50 upstream. Trigger user ABORT if application closes a socket which has data queued on the socket receive queue or chunks waiting on the reassembly or ordering queue as this would imply data being lost which defeats the point of a graceful shutdown. This behavior is already practiced in TCP. We do not check the input queue because that would mean to parse all chunks on it to look for unacknowledged data which seems too much of an effort. Control chunks or duplicated chunks may also be in the input queue and should not be stopping a graceful shutdown. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16linux/log2.h: Fix rounddown_pow_of_two(1)Linus Torvalds
commit 13c07b0286d340275f2d97adf085cecda37ede37 upstream. Exactly like roundup_pow_of_two(1), the rounddown version was buggy for the case of a compile-time constant '1' argument. Probably because it originated from the same code, sharing history with the roundup version from before the bugfix (for that one, see commit 1a06a52ee1b0: "Fix roundup_pow_of_two(1)"). However, unlike the roundup version, the fix for rounddown is to just remove the broken special case entirely. It's simply not needed - the generic code 1UL << ilog2(n) does the right thing for the constant '1' argment too. The only reason roundup needed that special case was because rounding up does so by subtracting one from the argument (and then adding one to the result) causing the obvious problems with "ilog2(0)". But rounddown doesn't do any of that, since ilog2() naturally truncates (ie "rounds down") to the right rounded down value. And without the ilog2(0) case, there's no reason for the special case that had the wrong value. tl;dr: rounddown_pow_of_two(1) should be 1, not 0. Acked-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16time: Move ktime_t overflow checking into timespec_valid_strictJohn Stultz
commit cee58483cf56e0ba355fdd97ff5e8925329aa936 upstream. Andreas Bombe reported that the added ktime_t overflow checking added to timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of timekeeping inputs") was causing problems with X.org because it caused timeouts larger then KTIME_T to be invalid. Previously, these large timeouts would be clamped to KTIME_MAX and would never expire, which is valid. This patch splits the ktime_t overflow checking into a new timespec_valid_strict function, and converts the timekeeping codes internal checking to use this more strict function. Reported-and-tested-by: Andreas Bombe <aeb@debian.org> Cc: Zhouping Liu <zliu@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux Kernel <linux-kernel@vger.kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16time: Improve sanity checking of timekeeping inputsJohn Stultz
commit 4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b upstream. Unexpected behavior could occur if the time is set to a value large enough to overflow a 64bit ktime_t (which is something larger then the year 2262). Also unexpected behavior could occur if large negative offsets are injected via adjtimex. So this patch improves the sanity check timekeeping inputs by improving the timespec_valid() check, and then makes better use of timespec_valid() to make sure we don't set the time to an invalid negative value or one that overflows ktime_t. Note: This does not protect from setting the time close to overflowing ktime_t and then letting natural accumulation cause the overflow. Reported-by: CAI Qian <caiqian@redhat.com> Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Zhouping Liu <zliu@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linux Kernel <linux-kernel@vger.kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17random: remove rand_initialize_irq()Theodore Ts'o
commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream. With the new interrupt sampling system, we are no longer using the timer_rand_state structure in the irq descriptor, so we can stop initializing it now. [ Merged in fixes from Sedat to find some last missing references to rand_initialize_irq() ] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> [PG: in .34 the irqdesc.h content is in irq.h instead.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17random: add tracepoints for easier debugging and verificationTheodore Ts'o
commit 00ce1db1a634746040ace24c09a4e3a7949a3145 upstream. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17random: add new get_random_bytes_arch() functionTheodore Ts'o
commit c2557a303ab6712bb6e09447df828c557c710ac9 upstream. Create a new function, get_random_bytes_arch() which will use the architecture-specific hardware random number generator if it is present. Change get_random_bytes() to not use the HW RNG, even if it is avaiable. The reason for this is that the hw random number generator is fast (if it is present), but it requires that we trust the hardware manufacturer to have not put in a back door. (For example, an increasing counter encrypted by an AES key known to the NSA.) It's unlikely that Intel (for example) was paid off by the US Government to do this, but it's impossible for them to prove otherwise --- especially since Bull Mountain is documented to use AES as a whitener. Hence, the output of an evil, trojan-horse version of RDRAND is statistically indistinguishable from an RDRAND implemented to the specifications claimed by Intel. Short of using a tunnelling electronic microscope to reverse engineer an Ivy Bridge chip and disassembling and analyzing the CPU microcode, there's no way for us to tell for sure. Since users of get_random_bytes() in the Linux kernel need to be able to support hardware systems where the HW RNG is not present, most time-sensitive users of this interface have already created their own cryptographic RNG interface which uses get_random_bytes() as a seed. So it's much better to use the HW RNG to improve the existing random number generator, by mixing in any entropy returned by the HW RNG into /dev/random's entropy pool, but to always _use_ /dev/random's entropy pool. This way we get almost of the benefits of the HW RNG without any potential liabilities. The only benefits we forgo is the speed/performance enhancements --- and generic kernel code can't depend on depend on get_random_bytes() having the speed of a HW RNG anyway. For those places that really want access to the arch-specific HW RNG, if it is available, we provide get_random_bytes_arch(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17random: create add_device_randomness() interfaceLinus Torvalds
commit a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream. Add a new interface, add_device_randomness() for adding data to the random pool that is likely to differ between two devices (or possibly even per boot). This would be things like MAC addresses or serial numbers, or the read-out of the RTC. This does *not* add any actual entropy to the pool, but it initializes the pool to different values for devices that might otherwise be identical and have very little entropy available to them (particularly common in the embedded world). [ Modified by tytso to mix in a timestamp, since there may be some variability caused by the time needed to detect/configure the hardware in question. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17random: make 'add_interrupt_randomness()' do something saneTheodore Ts'o
commit 775f4b297b780601e61787b766f306ed3e1d23eb upstream. We've been moving away from add_interrupt_randomness() for various reasons: it's too expensive to do on every interrupt, and flooding the CPU with interrupts could theoretically cause bogus floods of entropy from a somewhat externally controllable source. This solves both problems by limiting the actual randomness addition to just once a second or after 64 interrupts, whicever comes first. During that time, the interrupt cycle data is buffered up in a per-cpu pool. Also, we make sure the the nonblocking pool used by urandom is initialized before we start feeding the normal input pool. This assures that /dev/urandom is returning unpredictable data as soon as possible. (Based on an original patch by Linus, but significantly modified by tytso.) Tested-by: Eric Wustrow <ewust@umich.edu> Reported-by: Eric Wustrow <ewust@umich.edu> Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu> Reported-by: Zakir Durumeric <zakir@umich.edu> Reported-by: J. Alex Halderman <jhalderm@umich.edu>. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> [PG: minor adjustment required since .34 doesn't have f9e4989eb8 which renames "status" to "random" in kernel/irq/handle.c ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17random: Add support for architectural random hooksH. Peter Anvin
commit 63d77173266c1791f1553e9e8ccea65dc87c4485 upstream. Add support for architecture-specific hooks into the kernel-directed random number generator interfaces. This patchset does not use the architecture random number generator interfaces for the userspace-directed interfaces (/dev/random and /dev/urandom), thus eliminating the need to distinguish between them based on a pool pointer. Changes in version 3: - Moved the hooks from extract_entropy() to get_random_bytes(). - Changes the hooks to inlines. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Theodore Ts'o" <tytso@mit.edu> [PG: .34 already had "unsigned int ret" in get_random_int, so the diffstat here is slightly smaller than that of 63d7717. ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17mm: avoid null pointer access in vm_struct via /proc/vmallocinfoMitsuo Hayasaka
commit f5252e009d5b87071a919221e4f6624184005368 upstream. The /proc/vmallocinfo shows information about vmalloc allocations in vmlist that is a linklist of vm_struct. It, however, may access pages field of vm_struct where a page was not allocated. This results in a null pointer access and leads to a kernel panic. Why this happens: In __vmalloc_node_range() called from vmalloc(), newly allocated vm_struct is added to vmlist at __get_vm_area_node() and then, some fields of vm_struct such as nr_pages and pages are set at __vmalloc_area_node(). In other words, it is added to vmlist before it is fully initialized. At the same time, when the /proc/vmallocinfo is read, it accesses the pages field of vm_struct according to the nr_pages field at show_numa_info(). Thus, a null pointer access happens. The patch adds the newly allocated vm_struct to the vmlist *after* it is fully initialized. So, it can avoid accessing the pages field with unallocated page when show_numa_info() is called. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [PG: .34 has VMALLOC_START/END vs. start/end in f5252e009d5] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17timekeeping: Provide hrtimer update functionThomas Gleixner
commit f6c06abfb3972ad4914cef57d8348fcb2932bc3b upstream. To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linux Kernel <linux-kernel@vger.kernel.org> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17hrtimer: Provide clock_was_set_delayed()John Stultz
commit f55a6faa384304c89cfef162768e88374d3312cb upstream. clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notificiation from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linux Kernel <linux-kernel@vger.kernel.org> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17ntp: Fix leap-second hrtimer livelockJohn Stultz
commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d upstream. This should have been backported when it was commited, but I mistook the problem as requiring the ntp_lock changes that landed in 3.4 in order for it to occur. Unfortunately the same issue can happen (with only one cpu) as follows: do_adjtimex() write_seqlock_irq(&xtime_lock); process_adjtimex_modes() process_adj_status() ntp_start_leap_timer() hrtimer_start() hrtimer_reprogram() tick_program_event() clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); [DEADLOCK] This deadlock will no always occur, as it requires the leap_timer to force a hrtimer_reprogram which only happens if its set and there's no sooner timer to expire. NOTE: This patch, being faithful to the original commit, introduces a bug (we don't update wall_to_monotonic), which will be resovled by backporting a following fix. Original commit message below: Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp subsystem has used an hrtimer for triggering the leapsecond adjustment. However, this can cause a potential livelock. Thomas diagnosed this as the following pattern: CPU 0 CPU 1 do_adjtimex() spin_lock_irq(&ntp_lock); process_adjtimex_modes(); timer_interrupt() process_adj_status(); do_timer() ntp_start_leap_timer(); write_lock(&xtime_lock); hrtimer_start(); update_wall_time(); hrtimer_reprogram(); ntp_tick_length() tick_program_event() spin_lock(&ntp_lock); clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); This patch tries to avoid the problem by reverting back to not using an hrtimer to inject leapseconds, and instead we handle the leapsecond processing in the second_overflow() function. The downside to this change is that on systems that support highres timers, the leap second processing will occur on a HZ tick boundary, (ie: ~1-10ms, depending on HZ) after the leap second instead of possibly sooner (~34us in my tests w/ x86_64 lapic). This patch applies on top of tip/timers/core. CC: Sasha Levin <levinsasha928@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sasha Levin <levinsasha928@gmail.com> Diagnoised-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sasha Levin <levinsasha928@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linux Kernel <linux-kernel@vger.kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodesTheodore Ts'o
commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream. This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17NLM: Don't hang forever on NLM unlock requestsTrond Myklebust
commit 0b760113a3a155269a3fba93a409c640031dd68f upstream. If the NLM daemon is killed on the NFS server, we can currently end up hanging forever on an 'unlock' request, instead of aborting. Basically, if the rpcbind request fails, or the server keeps returning garbage, we really want to quit instead of retrying. Tested-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [PG: struct rpc_task in sched.h slightly different layout vs. v3.0] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17scm: lower SCM_MAX_FDEric Dumazet
commit bba14de98753cb6599a2dae0e520714b2153522d upstream. Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are halved. (commit f8d570a4 added two pointers in this structure) scm_fp_dup() should not copy whole structure (and trigger kmemcheck warnings), but only the used part. While we are at it, only allocate needed size. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17time: Change jiffies_to_clock_t() argument type to unsigned longhank
commit cbbc719fccdb8cbd87350a05c0d33167c9b79365 upstream. The parameter's origin type is long. On an i386 architecture, it can easily be larger than 0x80000000, causing this function to convert it to a sign-extended u64 type. Change the type to unsigned long so we get the correct result. Signed-off-by: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> [ build fix ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17KVM: Ensure all vcpus are consistent with in-kernel irqchip settingsAvi Kivity
commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e upstream. If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long winded because vcpu->arch.apic is created without kvm->lock held. Based on earlier patch by Michael Ellerman. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Avi Kivity <avi@redhat.com> [PG: in .34 label "unlock_vcpu_destroy" is just "vcpu_destroy"] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17regset: Prevent null pointer reference on readonly regsetsH. Peter Anvin
commit c8e252586f8d5de906385d8cf6385fee289a825e upstream. The regset common infrastructure assumed that regsets would always have .get and .set methods, but not necessarily .active methods. Unfortunately people have since written regsets without .set methods. Rather than putting in stub functions everywhere, handle regsets with null .get or .set methods explicitly. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17drm: integer overflow in drm_mode_dirtyfb_ioctl()Xi Wang
commit a5cd335165e31db9dbab636fd29895d41da55dd2 upstream. There is a potential integer overflow in drm_mode_dirtyfb_ioctl() if userspace passes in a large num_clips. The call to kmalloc would allocate a small buffer, and the call to fb->funcs->dirty may result in a memory corruption. Reported-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17rose: Add length checks to CALL_REQUEST parsingBen Hutchings
commit e0bccd315db0c2f919e7fcf9cb60db21d9986f52 upstream. Define some constant offsets for CALL_REQUEST based on the description at <http://www.techfest.com/networking/wan/x25plp.htm> and the definition of ROSE as using 10-digit (5-byte) addresses. Use them consistently. Validate all implicit and explicit facilities lengths. Validate the address length byte rather than either trusting or assuming its value. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17tty: Make tiocgicount a handlerAlan Cox
commit d281da7ff6f70efca0553c288bb883e8605b3862 upstream. Dan Rosenberg noted that various drivers return the struct with uncleared fields. Instead of spending forever trying to stomp all the drivers that get it wrong (and every new driver) do the job in one place. This first patch adds the needed operations and hooks them up, including the needed USB midlayer and serial core plumbing. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17net: Compute protocol sequence numbers and fragment IDs using MD5.David S. Miller
commit 6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upstream. Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> [PG: diffstat vs. 6e5714 differs, since no secure_ipv6_id to delete in 34] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17crypto: Move md5_transform to lib/md5.cDavid S. Miller
commit bc0b96b54a21246e377122d54569eef71cec535f upstream. We are going to use this for TCP/IP sequence number and fragment ID generation. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17mm: prevent concurrent unmap_mapping_range() on the same inodeMiklos Szeredi
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream. Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [PG: Some chunks dropped, since no ebdfed4dc5 in 34; came in at 2.6.37] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17af_packet: prevent information leakEric Dumazet
commit 13fcb7bd322164c67926ffe272846d4860196dc6 upstream. In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace) added a small information leak. Add padding field and make sure its zeroed before copy to user. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17bug.h: Add WARN_RATELIMITJoe Perches
commit b3eec79b0776e5340a3db75b34953977c7e5086e upstream. Add a generic mechanism to ratelimit WARN(foo, fmt, ...) messages using a hidden per call site static struct ratelimit_state. Also add an __WARN_RATELIMIT variant to be able to use a specific struct ratelimit_state. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17clocksource: Make watchdog robust vs. interruptionThomas Gleixner
commit b5199515c25cca622495eb9c6a8a1d275e775088 upstream. The clocksource watchdog code is interruptible and it has been observed that this can trigger false positives which disable the TSC. The reason is that an interrupt storm or a long running interrupt handler between the read of the watchdog source and the read of the TSC brings the two far enough apart that the delta is larger than the unstable treshold. Move both reads into a short interrupt disabled region to avoid that. Reported-and-tested-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlierIan Campbell
commit 9bab0b7fbaceec47d32db51cd9e59c82fb071f5a upstream This adds a mechanism to resume selected IRQs during syscore_resume instead of dpm_resume_noirq. Under Xen we need to resume IRQs associated with IPIs early enough that the resched IPI is unmasked and we can therefore schedule ourselves out of the stop_machine where the suspend/resume takes place. This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME". Back ported to 2.6.32 (which lacks syscore support) by calling the relavant resume function directly from sysdev_resume). v2: Fixed non-x86 build errors. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> [PG: this v2 backport taken from v2.6.32.49 stable, commit 5e87d8ee34e3] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17genirq: Add IRQF_FORCE_RESUMEThomas Gleixner
commit dc5f219e88294b93009eef946251251ffffb6d60 upstream. Xen needs to reenable interrupts which are marked IRQF_NO_SUSPEND in the resume path. Add a flag to force the reenabling in the resume code. Tested-and-acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17pata_cm64x: fix boot crash on pariscJames Bottomley
commit 9281b16caac1276817b77033c5b8a1f5ca30102c upstream. The old IDE cmd64x checks the status of the CNTRL register to see if the ports are enabled before probing them. pata_cmd64x doesn't do this, which causes a HPMC on parisc when it tries to poke at the secondary port because apparently the BAR isn't wired up (and a non-responding piece of memory causes a HPMC). Fix this by porting the CNTRL register port detection logic from IDE cmd64x. In addition, following converns from Alan Cox, add a check to see if a mobility electronics bridge is the immediate parent and forgo the check if it is (prevents problems on hotplug controllers). Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>