path: root/include
Age  Commit message  (Author)
2013-04-12  spinlocks and preemption points need to be at least compiler barriers  (Linus Torvalds)
commit 386afc91144b36b42117b0092893f15bc8798a80 upstream. In UP and non-preempt respectively, the spinlocks and preemption disable/enable points are stubbed out entirely, because there is no regular code that can ever hit the kind of concurrency they are meant to protect against. However, while there is no regular code that can cause scheduling, we _do_ end up having some exceptional (literally!) code that can do so, and that we need to make sure does not ever get moved into the critical region by the compiler. In particular, get_user() and put_user() are generally implemented as inline asm statements (even if the inline asm may then make a call instruction to call out-of-line), and can obviously cause a page fault and IO as a result. If that inline asm has been scheduled into the middle of a preemption-safe (or spinlock-protected) code region, we obviously lose. Now, admittedly this is *very* unlikely to actually ever happen, and we've not seen examples of actual bugs related to this. But partly exactly because it's so hard to trigger and the resulting bug is so subtle, we should be extra careful to get this right. So even when preemption is disabled, and we don't have to generate any actual *code* to explicitly tell the system that we are in a preemption-disabled region, we need to at least tell the compiler not to move things around the critical region. This patch grew out of the same discussion that caused commits 79e5f05edcbf ("ARC: Add implicit compiler barrier to raw_local_irq* functions") and 3e2e0d2c222b ("tile: comment assumption about __insn_mtspr for <asm/irqflags.h>") to come about. Note for stable: use discretion when/if applying this. As mentioned, this bug may never have actually bitten anybody, and gcc may never have done the required code motion for it to possibly ever trigger in practice.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
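The idea in the commit above can be sketched in user-space C. This is only an illustration, assuming GCC or Clang: the `barrier()` macro uses the well-known empty-asm-with-memory-clobber idiom, while `sketch_preempt_disable()`, `sketch_preempt_enable()` and `bump_counter()` are hypothetical names invented here, not the kernel's definitions.

```c
#include <assert.h>

/* Empty asm with a "memory" clobber: it emits no instructions, but the
 * compiler may not move memory accesses across it. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-ins for the stubbed-out (non-preempt) variants:
 * no run-time code, but still a compiler barrier. */
#define sketch_preempt_disable() barrier()
#define sketch_preempt_enable()  barrier()

static int shared_counter;

/* The increment cannot be hoisted above the first barrier or sunk
 * below the second one by the compiler. */
static int bump_counter(void)
{
    sketch_preempt_disable();
    shared_counter++;
    sketch_preempt_enable();
    return shared_counter;
}
```

The barriers cost nothing at run time; they only constrain what the compiler is allowed to reorder.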
2013-04-12  libata: Set max sector to 65535 for Slimtype DVD A DS8A8SH drive  (Shan Hai)
commit a32450e127fc6e5ca6d958ceb3cfea4d30a00846 upstream. The Slimtype DVD A DS8A8SH drive locks up when max sector is smaller than 65535, and the backtrace below is observed when it locks up:
INFO: task flush-8:32:1130 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:32 D ffffffff8180cf60 0 1130 2 0x00000000
 ffff880273aef618 0000000000000046 0000000000000005 ffff880273aee000
 ffff880273aee000 ffff880273aeffd8 ffff880273aee010 ffff880273aee000
 ffff880273aeffd8 ffff880273aee000 ffff88026e842ea0 ffff880274a10000
Call Trace:
 [<ffffffff8168fc2d>] schedule+0x5d/0x70
 [<ffffffff8168fccc>] io_schedule+0x8c/0xd0
 [<ffffffff81324461>] get_request+0x731/0x7d0
 [<ffffffff8133dc60>] ? cfq_allow_merge+0x50/0x90
 [<ffffffff81083aa0>] ? wake_up_bit+0x40/0x40
 [<ffffffff81320443>] ? bio_attempt_back_merge+0x33/0x110
 [<ffffffff813248ea>] blk_queue_bio+0x23a/0x3f0
 [<ffffffff81322176>] generic_make_request+0xc6/0x120
 [<ffffffff81322308>] submit_bio+0x138/0x160
 [<ffffffff811d7596>] ? bio_alloc_bioset+0x96/0x120
 [<ffffffff811d1f61>] submit_bh+0x1f1/0x220
 [<ffffffff811d48b8>] __block_write_full_page+0x228/0x340
 [<ffffffff811d3650>] ? attach_nobh_buffers+0xc0/0xc0
 [<ffffffff811d8960>] ? I_BDEV+0x10/0x10
 [<ffffffff811d8960>] ? I_BDEV+0x10/0x10
 [<ffffffff811d4ab6>] block_write_full_page_endio+0xe6/0x100
 [<ffffffff811d4ae5>] block_write_full_page+0x15/0x20
 [<ffffffff811d9268>] blkdev_writepage+0x18/0x20
 [<ffffffff81142527>] __writepage+0x17/0x40
 [<ffffffff811438ba>] write_cache_pages+0x34a/0x4a0
 [<ffffffff81142510>] ? set_page_dirty+0x70/0x70
 [<ffffffff81143a61>] generic_writepages+0x51/0x80
 [<ffffffff81143ab0>] do_writepages+0x20/0x50
 [<ffffffff811c9ed6>] __writeback_single_inode+0xa6/0x2b0
 [<ffffffff811ca861>] writeback_sb_inodes+0x311/0x4d0
 [<ffffffff811caaa6>] __writeback_inodes_wb+0x86/0xd0
 [<ffffffff811cad43>] wb_writeback+0x1a3/0x330
 [<ffffffff816916cf>] ? _raw_spin_lock_irqsave+0x3f/0x50
 [<ffffffff811b8362>] ? get_nr_inodes+0x52/0x70
 [<ffffffff811cb0ac>] wb_do_writeback+0x1dc/0x260
 [<ffffffff8168dd34>] ? schedule_timeout+0x204/0x240
 [<ffffffff811cb232>] bdi_writeback_thread+0x102/0x2b0
 [<ffffffff811cb130>] ? wb_do_writeback+0x260/0x260
 [<ffffffff81083550>] kthread+0xc0/0xd0
 [<ffffffff81083490>] ? kthread_worker_fn+0x1b0/0x1b0
 [<ffffffff8169a3ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81083490>] ? kthread_worker_fn+0x1b0/0x1b0
The above trace was triggered by "dd if=/dev/zero of=/dev/sr0 bs=2048 count=32768". It was previously working by accident, since another bug introduced by 4dce8ba94c7 (libata: Use 'bool' return value for ata_id_XXX) caused all drives to use maxsect=65535. Signed-off-by: Shan Hai <shan.hai@windriver.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-12  libata: Use integer return value for atapi_command_packet_set  (Shan Hai)
commit d8668fcb0b257d9fdcfbe5c172a99b8d85e1cd82 upstream. The function returns the type of an ATAPI drive, so it should return an integer value. Commit 4dce8ba94c7 (libata: Use 'bool' return value for ata_id_XXX), present since v2.6.39, changed the return type from int to bool. Because the function then returned true for every ATAPI class drive and TYPE_TAPE is defined as 0x01, all ATAPI class drives were treated as TYPE_TAPE, and, because of commit f8d8e5799b7 (libata: increase 128 KB / cmd limit for ATAPI tape drives), their max_sectors was set to 65535. Signed-off-by: Shan Hai <shan.hai@windriver.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
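The failure mode described in the commit above is a plain C conversion effect, sketched here in isolation. The helper names and the `SKETCH_*` constants are hypothetical; the field position (bits 12..8 of identify word 0) and the type code 0x05 for CD/DVD devices are assumptions made for illustration, while 0x01 for TYPE_TAPE comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TYPE_TAPE 0x01  /* value stated in the commit message */
#define SKETCH_TYPE_ROM  0x05  /* assumed type code for CD/DVD drives */

/* Returns the device type field as an integer (assumed field layout). */
static int packet_set_as_int(const uint16_t *dev_id)
{
    return (dev_id[0] >> 8) & 0x1f;
}

/* Same expression, but the bool return type collapses every non-zero
 * value to 1, which happens to equal SKETCH_TYPE_TAPE. */
static bool packet_set_as_bool(const uint16_t *dev_id)
{
    return (dev_id[0] >> 8) & 0x1f;
}
```

With an identify word of 0x0500, the integer version reports type 0x05, while the bool version reports 1, so a DVD drive would be classified as a tape drive.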
2013-04-05  net: fix *_DIAG_MAX constants  (Andrey Vagin)
[ Upstream commit ae5fc98728c8bbbd6d7cab0b9781671fc4419c1b ] Follow the common pattern and define *_DIAG_MAX like: [...] __XXX_DIAG_MAX, }; Because everyone is used to do: struct nlattr *attrs[XXX_DIAG_MAX+1]; nla_parse([...], XXX_DIAG_MAX, [...] Reported-by: Thomas Graf <tgraf@suug.ch> Cc: "David S. Miller" <davem@davemloft.net> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
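The pattern named in the commit above can be written out as follows. The `XXX_DIAG_*` attribute names are placeholders in the spirit of the commit message, and `attr_table_slots()` is a hypothetical helper that only demonstrates the `MAX + 1` sizing convention.

```c
#include <assert.h>

enum {
    XXX_DIAG_NONE,
    XXX_DIAG_FOO,
    XXX_DIAG_BAR,
    __XXX_DIAG_MAX,   /* sentinel: one past the last real attribute */
};

/* MAX names the last valid attribute, not the count. */
#define XXX_DIAG_MAX (__XXX_DIAG_MAX - 1)

/* Callers size attribute tables as MAX + 1 so that index MAX is valid. */
static int attr_table_slots(void)
{
    const void *attrs[XXX_DIAG_MAX + 1] = { 0 };
    return (int)(sizeof(attrs) / sizeof(attrs[0]));
}
```

If `XXX_DIAG_MAX` were instead defined as the sentinel itself, every `MAX + 1` array would carry one unused slot and parsers would accept one attribute number too many.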
2013-04-05  thermal: shorten too long mcast group name  (Masatake YAMATO)
[ Upstream commits 73214f5d9f33b79918b1f7babddd5c8af28dd23d and f1e79e208076ffe7bad97158275f1c572c04f5c7, the latter adds an assertion to genetlink to prevent this from happening again in the future. ] The original name is too long. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05  xen/blkback: correctly respond to unknown, non-native requests  (David Vrabel)
commit 0e367ae46503cfe7791460c8ba8434a5d60b2bd5 upstream. If the frontend is using a non-native protocol (e.g., a 64-bit frontend with a 32-bit backend) and it sent an unrecognized request, the request was not translated and the response would have the incorrect ID. This may cause the frontend driver to behave incorrectly or crash. Since the ID field in the request is always in the same place, regardless of the request type we can get the correct ID and make a valid response (which will report BLKIF_RSP_EOPNOTSUPP). This bug affected 64-bit SLES 11 guests when using a 32-bit backend. This guest does a BLKIF_OP_RESERVED_1 (BLKIF_OP_PACKET in the SLES source) and would crash in blkif_int() as the ID in the response would be invalid. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05  signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorer  (Ben Hutchings)
Vaguely based on upstream commit 574c4866e33d 'consolidate kernel-side struct sigaction declarations'. flush_signal_handlers() needs to know whether sigaction::sa_restorer is defined, not whether SA_RESTORER is defined. Define the __ARCH_HAS_SA_RESTORER macro to indicate this. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28  exec: use -ELOOP for max recursion depth  (Kees Cook)
commit d740269867021faf4ce38a449353d2b986c34a67 upstream. To avoid an explosion of request_module calls on a chain of abusive scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon as maximum recursion depth is hit, the error will fail all the way back up the chain, aborting immediately. This also has the side-effect of stopping the user's shell from attempting to reexecute the top-level file as a shell script. As seen in the dash source: if (cmd != path_bshell && errno == ENOEXEC) { *argv-- = cmd; *argv = cmd = path_bshell; goto repeat; } The above logic was designed for running scripts automatically that lacked the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC, things continue to behave as the shell expects. Additionally, when tracking recursion, the binfmt handlers should not be involved. The recursion being tracked is the depth of calls through search_binary_handler(), so that function should be exclusively responsible for tracking the depth. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28  drm/radeon: add Richland pci ids  (Alex Deucher)
commit b75bbaa038ffc426e88ea3df6c4ae11834fc3e4f upstream. Reviewed-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28  inet: limit length of fragment queue hash table bucket lists  (Hannes Frederic Sowa)
[ Upstream commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055 ] This patch introduces a constant limit on the fragment queue hash table bucket list lengths. Currently the limit of 128 is chosen somewhat arbitrarily and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect against list iteration eating considerable amounts of CPU. If we reach the maximum length in one hash bucket, a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning from the slab allocator. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28  tcp: fix skb_availroom()  (Eric Dumazet)
[ Upstream commit 16fad69cfe4adbbfa813de516757b87bcae36d93 ] Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack: https://code.google.com/p/chromium/issues/detail?id=182056 Commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx path) made a poor choice in adding an 'avail_size' field to skb, while what we really needed was a 'reserved_tailroom' one. It would have avoided commit 22b4a4f22da (tcp: fix retransmit of partially acked frames) and this commit. The crash occurs because skb_split() is not aware of the 'avail_size' management (and should not be aware). Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mukesh Agrawal <quiche@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28  ipv4: fix definition of FIB_TABLE_HASHSZ  (Denis V. Lunev)
[ Upstream commit 5b9e12dbf92b441b37136ea71dac59f05f2673a9 ] A long time ago, commit 93456b6d7753def8760b423ac6b986eb9d5a4a95 (Author: Denis V. Lunev <den@openvz.org>, Date: Thu Jan 10 03:23:38 2008 -0800, "[IPV4]: Unify access to the routing tables.") gave the definition of the FIB_HASH_TABLE size the wrong dependency: it should depend on CONFIG_IP_MULTIPLE_TABLES (as in the original code), but it was made to depend on CONFIG_IP_ROUTE_MULTIPATH. This patch restores the original state. The problem was spotted by Tingwei Liu. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Tingwei Liu <tingw.liu@gmail.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20  atmel_lcdfb: fix 16-bpp modes on older SOCs  (Johan Hovold)
commit a79eac7165ed62114e6ca197195aa5060a54f137 upstream. Fix regression introduced by commit 787f9fd23283 ("atmel_lcdfb: support 16bit BGR:565 mode, remove unsupported 15bit modes") which broke 16-bpp modes for older SOCs which use IBGR:555 (msb is intensity) rather than BGR:565. Use SOC-type to determine the pixel layout. Tested on at91sam9263 and at91sam9g45. Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20  perf,x86: fix link failure for non-Intel configs  (David Rientjes)
commit 6c4d3bc99b3341067775efd4d9d13cc8e655fd7c upstream. Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after suspend/resume") introduces a link failure since perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL: arch/x86/power/built-in.o: In function `restore_processor_state': (.text+0x45c): undefined reference to `perf_restore_debug_store' Fix it by defining the dummy function appropriately. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20  perf,x86: fix kernel crash with PEBS/BTS after suspend/resume  (Stephane Eranian)
commit 1d9d8639c063caf6efc2447f5f26aa637f844ff6 upstream. This patch fixes a kernel crash when using precise sampling (PEBS) after a suspend/resume. It turns out the CPU notifier code is not invoked on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly by the kernel and keeps its power-on/resume value of 0, causing any PEBS measurement to crash when running on CPU0. The workaround is to add a hook in the actual resume code to restore the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0, the DS_AREA will be restored twice, but this is harmless. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-04  pstore: Avoid deadlock in panic and emergency-restart path  (Seiji Aguchi)
commit 9f244e9cfd70c7c0f82d3c92ce772ab2a92d9f64 upstream.
[Issue] When pstore is in the panic and emergency-restart paths, it may block in those paths because it simply takes a spin_lock. This is an example scenario in which pstore may hang in a panic path:
- cpuA grabs psinfo->buf_lock
- cpuB panics and calls smp_send_stop
- smp_send_stop sends IRQ to cpuA
- after 1 second, cpuB gives up on cpuA and sends an NMI instead
- cpuA is now in an NMI handler while still holding buf_lock
- cpuB is deadlocked
This case may happen if the firmware has a bug and cpuA is stuck talking to it for more than one second. A similar scenario exists in an emergency-restart path:
- cpuA grabs psinfo->buf_lock and gets stuck in the firmware
- cpuB kicks emergency-restart via either sysrq-b or the hangcheck timer. Then cpuB is deadlocked by taking psinfo->buf_lock again.
[Solution] This patch avoids the deadlocks in both the panic and emergency_restart paths by introducing a function, is_non_blocking_path(), to check whether a cpu can be blocked in the current path. With this patch, pstore is not blocked in those paths even if another cpu has taken the spin_lock, because spin_lock_irqsave is changed to spin_trylock_irqsave there. In addition, according to the comment on emergency_restart() in kernel/sys.c, spin_lock shouldn't be taken in an emergency_restart path, to avoid deadlock. This patch conforms to the comment below.
<snip> /** * emergency_restart - reboot the system * * Without shutting down any hardware or taking any locks * reboot the system. This is called when we know we are in * trouble so this is our best effort to reboot. This is * safe to call in interrupt context. */ void emergency_restart(void) <snip>
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
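The lock-or-trylock decision from the commit above can be modelled in user space with a C11 `atomic_flag` acting as a toy spinlock. All `sketch_*` names are hypothetical, and what to do when the trylock fails (here: skip the write) is a choice made in this sketch; the commit message only says that pstore no longer blocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for psinfo->buf_lock. */
static atomic_flag buf_lock = ATOMIC_FLAG_INIT;

static bool sketch_trylock(void) { return !atomic_flag_test_and_set(&buf_lock); }
static void sketch_lock(void)    { while (atomic_flag_test_and_set(&buf_lock)) { } }
static void sketch_unlock(void)  { atomic_flag_clear(&buf_lock); }

/* Returns true if the record was written, false if it was skipped. */
static bool sketch_dump(bool non_blocking_path)
{
    if (non_blocking_path) {
        /* Panic/emergency-restart: the lock owner may never run again,
         * so never spin here. */
        if (!sketch_trylock())
            return false;
    } else {
        sketch_lock();
    }
    /* ... write the record while holding the lock ... */
    sketch_unlock();
    return true;
}
```

In the normal path the caller still waits for the lock; only the paths that must not block use the trylock.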
2013-03-04  unbreak automounter support on 64-bit kernel with 32-bit userspace (v2)  (Helge Deller)
commit 4f4ffc3a5398ef9bdbb32db04756d7d34e356fcf upstream. automount-support is broken on the parisc architecture, because the existing #if list does not include a check for defined(__hppa__). The HPPA (parisc) architecture is similar to other 64bit Linux targets where we have to define autofs_wqt_t (which is passed back and forth to user space) as int type, which has a size of 32bit across 32 and 64bit kernels. During the discussion on the mailing list, H. Peter Anvin suggested inverting the #if list, since only specific platforms (specifically those which do not have a 32bit userspace, like IA64 and Alpha) should have autofs_wqt_t as unsigned long type. This suggestion is probably the best way to go, since Arm64 (and maybe others?) seems to have a non-working automounter. So in the long run, even for other new upcoming architectures, this inverted check seems to be the best solution, since it will not require them to change this #if again (unless they are 64bit only). Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Ian Kent <raven@themaw.net> Acked-by: Catalin Marinas <catalin.marinas@arm.com> CC: James Bottomley <James.Bottomley@HansenPartnership.com> CC: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-04  quota: autoload the quota_v2 module for QFMT_VFS_V1 quota format  (Theodore Ts'o)
commit c3ad83d9efdfe6a86efd44945a781f00c879b7b4 upstream. Otherwise, ext4 file systems with the quota featured enable will get a very confusing "No such process" error message if the quota code is built as a module and the quota_v2 module has not been loaded. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28  vlan: adjust vlan_set_encap_proto() for its callers  (Cong Wang)
[ Upstream commit da8c87241c26aac81a64c7e4d21d438a33018f4e ] There are two places that call vlan_set_encap_proto(): vlan_untag() and __pop_vlan_tci(). vlan_untag() assumes skb->data points after mac addr, otherwise the following code vhdr = (struct vlan_hdr *) skb->data; vlan_tci = ntohs(vhdr->h_vlan_TCI); __vlan_hwaccel_put_tag(skb, vlan_tci); skb_pull_rcsum(skb, VLAN_HLEN); won't be correct. But __pop_vlan_tci() assumes it points _before_ mac addr. In vlan_set_encap_proto(), it looks for some magic L2 value after mac addr: rawp = skb->data; if (*(unsigned short *) rawp == 0xFFFF) ... Therefore __pop_vlan_tci() is obviously wrong. A quick fix is to avoid using skb->data in vlan_set_encap_proto(); using 'vhdr+1' is always correct in both cases. Signed-off-by: Cong Wang <amwang@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Jesse Gross <jesse@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28  ipv6: use a stronger hash for tcp  (Eric Dumazet)
[ Upstream commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 ] It looks like it's possible to open thousands of TCP IPv6 sessions on a server, all landing in a single slot of the TCP hash table. Incoming packets then have to look up sockets in a very long list. We should hash all bits from foreign IPv6 addresses, using a salt and hash mix, not a simple XOR. inet6_ehashfn() can also separately use the ports, instead of xoring them. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
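Why a plain XOR fold is weak can be shown with a small sketch. The mixing step below is an arbitrary illustrative function chosen for this example, not the kernel's hash, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Weak: fold the four 32-bit words of an IPv6 address with XOR.
 * Reordering words, or flipping the same bits in two words, collides. */
static uint32_t xor_fold(const uint32_t a[4])
{
    return a[0] ^ a[1] ^ a[2] ^ a[3];
}

/* Illustrative mixing step (an assumption, not the kernel's function). */
static uint32_t mix(uint32_t h, uint32_t w)
{
    h ^= w;
    h *= 0x9e3779b1u;
    h ^= h >> 15;
    return h;
}

/* Stronger: every word is mixed in sequence, starting from a secret salt,
 * so word order matters and the result is not predictable from outside. */
static uint32_t salted_hash(const uint32_t a[4], uint32_t salt)
{
    uint32_t h = salt;
    for (int i = 0; i < 4; i++)
        h = mix(h, a[i]);
    return h;
}
```

An attacker who controls the remote address can trivially produce many addresses with the same XOR fold; with a per-boot salt and a mixing function, the bucket an address lands in is much harder to steer.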
2013-02-28  net: fix a compile error when SOCK_REFCNT_DEBUG is enabled  (Ying Xue)
[ Upstream commit dec34fb0f5b7873de45132a84a3af29e61084a6b ] When SOCK_REFCNT_DEBUG is enabled, below build error is met: kernel/sysctl_binary.o: In function `sk_refcnt_debug_release': include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release' kernel/sysctl.o:include/net/sock.h:1025: first defined here kernel/audit.o: In function `sk_refcnt_debug_release': include/net/sock.h:1025: multiple definition of `sk_refcnt_debug_release' kernel/sysctl.o:include/net/sock.h:1025: first defined here make[1]: *** [kernel/built-in.o] Error 1 make: *** [kernel] Error 2 So we decide to make sk_refcnt_debug_release static to eliminate the error. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28  fb: Yet another band-aid for fixing lockdep mess  (Takashi Iwai)
commit e93a9a868792ad71cdd09d75e5a02d8067473c4e upstream. I've still got lockdep warnings even after Alan's patch, and it seems that yet more band aids are required to paper over similar paths for unbind_con_driver() and unregister_con_driver(). After this hack, lockdep warnings are finally gone. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Alan Cox <alan@linux.intel.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Jiri Kosina <jkosina@suse.cz> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28  fb: rework locking to fix lock ordering on takeover  (Alan Cox)
commit 50e244cc793d511b86adea24972f3a7264cae114 upstream. Adjust the console layer to allow a take over call where the caller already holds the locks. Make the fb layer lock in order. This is partly a band aid, the fb layer is terminally confused about the locking rules it uses for its notifiers it seems. [akpm@linux-foundation.org: remove stray non-ascii char, tidy comment] [akpm@linux-foundation.org: export do_take_over_console()] [airlied: cleanup another non-ascii char] Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jiri Kosina <jkosina@suse.cz> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28  vgacon/vt: clear buffer attributes when we load a 512 character font (v2)  (Dave Airlie)
commit 2a2483072393b27f4336ab068a1f48ca19ff1c1e upstream. When we switch from 256->512 byte font rendering mode, it means the current contents of the screen are being reinterpreted. The bit that holds the high bit of the 9-bit font may have been previously set, and thus the new font misrenders. The problem case we see is that grub2 writes spaces with the bit set, so it ends up with data like 0x820, which gets reinterpreted into the 0x120 char, which the font translates into G with a circumflex. This flashes up on screen at boot and is quite ugly. A current side effect of this patch, though, is that any rendering on the screen changes color to a slightly darker color, but at least the screen no longer corrupts. v2: as suggested by hpa, always clear the attribute space, whether we are going to or from 512 chars. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
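The reinterpretation of 0x820 as glyph 0x120 can be reproduced with a few lines of bit arithmetic. This is a sketch under the assumption that each text cell is a 16-bit value with the character in the low byte and that, in 512-glyph mode, bit 0x0800 of the cell selects the upper half of the font; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: in 512-glyph mode this attribute bit is the ninth
 * bit of the glyph index. */
#define SKETCH_HI_FONT_BIT 0x0800u

/* Glyph index a 512-glyph font would use for this cell. */
static unsigned glyph_index_512(uint16_t cell)
{
    return (cell & 0xffu) | ((cell & SKETCH_HI_FONT_BIT) ? 0x100u : 0u);
}

/* Clear the bit in every cell so stale attributes cannot select
 * glyphs from the upper font half. */
static void clear_hi_font_bit(uint16_t *cells, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cells[i] &= (uint16_t)~SKETCH_HI_FONT_BIT;
}
```

A space written as 0x0820 maps to glyph 0x120 before clearing and to glyph 0x20 afterwards. Because the cleared bit is also part of the attribute byte, the change in on-screen color mentioned in the commit message is consistent with this layout.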
2013-02-28  ALSA: usb: Fix Processing Unit Descriptor parsers  (Pawel Moll)
commit b531f81b0d70ffbe8d70500512483227cc532608 upstream. Commit 99fc86450c439039d2ef88d06b222fd51a779176 "ALSA: usb-mixer: parse descriptors with structs" introduced a set of useful parsers for descriptors. Unfortunately the parsers for the Processing Unit Descriptor came with a very subtle bug... Functions uac_processing_unit_iProcessing() and uac_processing_unit_specific() were indexing the baSourceID array, forgetting the fields before the iProcessing and process-specific descriptors. The problem was observed with the Sound Blaster Extigy mixer, where nNrModes in the Up/Down-mix Processing Unit Descriptor was accessed at offset 10 of the descriptor (value 0) instead of offset 15 (value 7). As a result, the control had interesting limit values: Simple mixer control 'Channel Routing Mode Select',0 Capabilities: volume volume-joined penum Playback channels: Mono Capture channels: Mono Limits: 0 - -1 Mono: -1 [100%] Fixed by starting from the bmControls, which was calculated correctly, instead of baSourceID. Now the mentioned control is fine: Simple mixer control 'Channel Routing Mode Select',0 Capabilities: volume volume-joined penum Playback channels: Mono Capture channels: Mono Limits: 0 - 6 Mono: 0 [0%] Signed-off-by: Pawel Moll <mail@pawelmoll.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28  mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may safely schedule  (Sagi Grimberg)
commit 21a92735f660eaecf69a6f2e777f18463760ec32 upstream. With an RCU based mmu_notifier implementation, any callout to mmu_notifier_invalidate_range_{start,end}() or mmu_notifier_invalidate_page() would not be allowed to call schedule() as that could potentially allow a modification to the mmu_notifier structure while it is currently being used. Since srcu allocs 4 machine words per instance per cpu, we may end up with memory exhaustion if we use srcu per mm. So all mms share a global srcu. Note that during large mmu_notifier activity exit & unregister paths might hang for longer periods, but it is tolerable for current mmu_notifier clients. Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Haggai Eran <haggaie@mellanox.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-21  printk: fix buffer overflow when calling log_prefix function from call_console_drivers  (Alexandre SIMON)
This patch corrects a buffer overflow in kernels from 3.0 to 3.4 when calling the log_prefix() function from call_console_drivers(). This bug existed in previous releases but was revealed by commit 162a7e7500f9664636e649ba59defe541b7c2c60 (2.6.39 => 3.0), which changed how memory is allocated for the early printk buffer (use of memblock_alloc). It disappears with commit 7ff9554bb578ba02166071d2d487b7fc7d860d62 (3.4 => 3.5), which refactors printk buffer management. In log_prefix(), the access to "p[0]", "p[1]", "p[2]" or "simple_strtoul(&p[1], &endp, 10)" may cause a buffer overflow, as this function is called from call_console_drivers by passing "&LOG_BUF(cur_index)", where the index must be masked so as not to exceed the buffer's boundary. The trick is to prepare in call_console_drivers() a buffer with the necessary data (PRI field of syslog message) to be safely evaluated in log_prefix(). This patch can be applied to stable kernel branches 3.0.y, 3.2.y and 3.4.y. Without this patch, one can freeze a server by running this loop from a shell:
$ export DUMMY=`cat /dev/urandom | tr -dc '12345AZERTYUIOPQSDFGHJKLMWXCVBNazertyuiopqsdfghjklmwxcvbn' | head -c255`
$ while true ; do echo $DUMMY > /dev/kmsg ; done
The "server freeze" depends on where memblock_alloc allocates the printk buffer: if the buffer overflow lands inside another kernel allocation the problem may not be revealed, otherwise the server may hang. Signed-off-by: Alexandre SIMON <Alexandre.Simon@univ-lorraine.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-14  efi: Make 'efi_enabled' a function to query EFI facilities  (Matt Fleming)
commit 83e68189745ad931c2afd45d8ee3303929233e7f upstream. Originally 'efi_enabled' indicated whether a kernel was booted from EFI firmware. Over time its semantics have changed, and it now indicates whether or not we are booted on an EFI machine with bit-native firmware, e.g. 64-bit kernel with 64-bit firmware. The immediate motivation for this patch is the bug report at, https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 which details how running a platform driver on an EFI machine that is designed to run under BIOS can cause the machine to become bricked. Also, the following report, https://bugzilla.kernel.org/show_bug.cgi?id=47121 details how running said driver can also cause Machine Check Exceptions. Drivers need a new means of detecting whether they're running on an EFI machine, as sadly the expression, if (!efi_enabled) hasn't been a sufficient condition for quite some time. Users actually want to query 'efi_enabled' for different reasons - what they really want access to is the list of available EFI facilities. For instance, the x86 reboot code needs to know whether it can invoke the ResetSystem() function provided by the EFI runtime services, while the ACPI OSL code wants to know whether the EFI config tables were mapped successfully. There are also checks in some of the platform driver code to simply see if they're running on an EFI machine (which would make it a bad idea to do BIOS-y things). This patch is a prereq for the samsung-laptop fix patch. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Peter Jones <pjones@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve Langasek <steve.langasek@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
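Turning a single flag into a per-facility query, as the commit above describes, might look roughly like the following user-space sketch. The enum members, the bitmask variable and both functions are hypothetical names chosen for this illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical facility identifiers, one bit each. */
enum sketch_efi_facility {
    SKETCH_EFI_BOOT,              /* booted from EFI firmware */
    SKETCH_EFI_RUNTIME_SERVICES,  /* runtime services are callable */
    SKETCH_EFI_CONFIG_TABLES,     /* config tables were mapped */
};

static unsigned long sketch_facilities;

/* Record that a facility became available during boot. */
static void sketch_efi_set(enum sketch_efi_facility f)
{
    sketch_facilities |= 1UL << f;
}

/* Query one specific facility instead of a single global flag. */
static bool sketch_efi_enabled(enum sketch_efi_facility f)
{
    return (sketch_facilities >> f) & 1UL;
}
```

A reboot path would then ask only about the runtime-services bit, while a platform driver that must avoid BIOS-style accesses would ask only whether the machine was booted via EFI.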
2013-02-11  usb: Using correct way to clear usb3.0 device's remote wakeup feature.  (Lan Tianyu)
commit 54a3ac0c9e5b7213daa358ce74d154352657353a upstream. A usb3.0 device defines function remote wakeup, which applies only to the interface recipient rather than the device recipient. This differs from a usb2.0 device's remote wakeup feature, which is defined for the device recipient. According to section 9.4.5 of the usb3.0 spec, the function remote wakeup can be modified by SetFeature() requests using the FUNCTION_SUSPEND feature selector. This patch uses the correct way to disable a usb3.0 device's function remote wakeup after a suspend error and on resuming. This should be backported to kernels as old as 3.4 that contain commit 623bef9e03a60adc623b09673297ca7a1cdfb367 "USB/xhci: Enable remote wakeup for USB3 devices." Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-27ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()Oleg Nesterov
commit 910ffdb18a6408e14febbb6e4b6840fd2c928c82 upstream. Cleanup and preparation for the next change. signal_wake_up(resume => true) is overused. None of ptrace/jctl callers actually want to wakeup a TASK_WAKEKILL task, but they can't specify the necessary mask. Turn signal_wake_up() into signal_wake_up_state(state), reintroduce signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up() which adds __TASK_TRACED. This way ptrace_signal_wake_up() can work "inside" ptrace_request() even if the tracee doesn't have the TASK_WAKEKILL bit set. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
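The split into a state-taking primitive and two thin helpers can be sketched as follows. This is a userspace model, not kernel code: struct task, the woken field and the numeric bit values are assumptions, and TASK_TRACED_BIT stands in for __TASK_TRACED.

```c
#include <stdbool.h>

/* Illustrative state bits; the numeric values are assumptions. */
#define TASK_INTERRUPTIBLE 0x01u
#define TASK_TRACED_BIT    0x08u /* stands in for __TASK_TRACED */
#define TASK_WAKEKILL      0x80u

struct task {
    unsigned int state; /* current sleep state bits */
    bool woken;         /* set when a wakeup would be delivered */
};

/* Wake the task if it sleeps interruptibly or in one of the extra states. */
static void signal_wake_up_state(struct task *t, unsigned int extra)
{
    if (t->state & (TASK_INTERRUPTIBLE | extra))
        t->woken = true;
}

/* Trivial helper: "resume" means killable sleepers are woken too. */
static void signal_wake_up(struct task *t, bool resume)
{
    signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
}

/* ptrace variant: "resume" targets traced tasks instead. */
static void ptrace_signal_wake_up(struct task *t, bool resume)
{
    signal_wake_up_state(t, resume ? TASK_TRACED_BIT : 0);
}
```

With this shape, a tracee whose state carries only the traced bit is left alone by signal_wake_up() but is woken by ptrace_signal_wake_up().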
2013-01-21target: Add link_magic for fabric allow_link destination target_itemsNicholas Bellinger
commit 0ff8754981261a80f4b77db2536dfea92c2d4539 upstream. This patch adds [dev,lun]_link_magic value assignment + checks within generic target_fabric_port_link() and target_fabric_mappedlun_link() code to ensure destination config_item *target_item sent from configfs_symlink() -> config_item_operations->allow_link() is the underlying se_device->dev_group and se_lun->lun_group that we expect to symlink. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17libceph: remove 'osdtimeout' optionSage Weil
This would reset a connection with any OSD that had an outstanding request that was taking more than N seconds. The idea was that if the OSD was buggy, the client could compensate by resending the request. In reality, this only served to hide server bugs, and we haven't actually seen such a bug in quite a while. Moreover, the userspace client code never did this. More importantly, often the request is taking a long time because the OSD is trying to recover, or overloaded, and killing the connection and retrying would only make the situation worse by giving the OSD more work to do. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> (cherry picked from commit 83aff95eb9d60aff5497e9f44a2ae906b86d8e88) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPTMichal Hocko
commit 53a59fc67f97374758e63a9c785891ec62324c81 upstream. Since commit e303297e6c3a ("mm: extended batches for generic mmu_gather") we are batching pages to be freed until either tlb_next_batch cannot allocate a new batch or we are done. This works just fine most of the time, but we can get into trouble with a non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY) on large machines, where overly aggressive batching might lead to soft lockups in the process exit path (exit_mmap): there are no scheduling points down the free_pages_and_swap_cache path, so the freeing can take long enough to trigger the soft lockup. The lockup is harmless except when the system is set up to panic on softlockup, which is not that unusual. The simplest way to work around this issue is to limit the maximum number of batches in a single mmu_gather. 10k collected pages should be safe against soft lockups (we would have 2ms for one) even if they are all freed without an explicit scheduling point. This patch doesn't add any new explicit scheduling points because it relies on zap_pmd_range during page tables zapping, which calls cond_resched per PMD. The following lockup has been reported for a 3.0 kernel with a huge process (on the order of hundreds of gigabytes, but I do not know any more details). BUG: soft lockup - CPU#56 stuck for 22s! 
[kernel:31053] Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod Supported: Yes CPU 56 Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7 RIP: 0010: _raw_spin_unlock_irqrestore+0x8/0x10 RSP: 0018:ffff883ec1037af0 EFLAGS: 00000206 RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80 RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206 RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400 R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e FS: 00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100) Call Trace: release_pages+0xc5/0x260 free_pages_and_swap_cache+0x9d/0xc0 tlb_flush_mmu+0x5c/0x80 tlb_finish_mmu+0xe/0x50 exit_mmap+0xbd/0x120 mmput+0x49/0x120 exit_mm+0x122/0x160 do_exit+0x17a/0x430 do_group_exit+0x3d/0xb0 get_signal_to_deliver+0x247/0x480 do_signal+0x71/0x1b0 do_notify_resume+0x98/0xb0 int_signal+0x12/0x17 DWARF2 unwinder 
stuck at int_signal+0x12/0x17 Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
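The batch cap described above can be modelled in a few lines. This is a sketch under assumptions: PAGES_PER_BATCH is an arbitrary illustrative value, and struct gather with its helpers is hypothetical, not the kernel's mmu_gather. Only the 10k page limit comes from the commit message.

```c
#include <stdbool.h>

#define PAGES_PER_BATCH 500u   /* assumed batch capacity */
#define MAX_PAGES       10000u /* the 10k limit from the commit message */
#define MAX_BATCH_COUNT (MAX_PAGES / PAGES_PER_BATCH)

struct gather {
    unsigned int batch_count;    /* batches currently chained */
    unsigned int pages_in_batch; /* pages in the active batch */
    unsigned int flushes;        /* forced flushes so far */
};

static void gather_init(struct gather *g)
{
    g->batch_count = 1;
    g->pages_in_batch = 0;
    g->flushes = 0;
}

/* Chain one more batch unless the cap has been reached. */
static bool gather_next_batch(struct gather *g)
{
    if (g->batch_count == MAX_BATCH_COUNT)
        return false;
    g->batch_count++;
    g->pages_in_batch = 0;
    return true;
}

/* Free everything collected so far and start over with one batch. */
static void gather_flush(struct gather *g)
{
    g->flushes++;
    g->batch_count = 1;
    g->pages_in_batch = 0;
}

/* Queue one page; a full gather is flushed before the page is added. */
static void gather_add_page(struct gather *g)
{
    if (g->pages_in_batch == PAGES_PER_BATCH && !gather_next_batch(g))
        gather_flush(g);
    g->pages_in_batch++;
}
```

The point of the cap is that no single flush ever has to free more than MAX_PAGES pages, which bounds the time spent without a scheduling point.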
2013-01-11PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHzAndy Lutomirski
commit 812089e01b9f65f90fc8fc670d8cce72a0e01fbb upstream. Otherwise it fails like this on cards like the Transcend 16GB SDHC card: mmc0: new SDHC card at address b368 mmcblk0: mmc0:b368 SDC 15.0 GiB mmcblk0: error -110 sending status command, retrying mmcblk0: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb0 Tested on my Lenovo x200 laptop. [bhelgaas: changelog] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Chris Ball <cjb@laptop.org> CC: Manoj Iyer <manoj.iyer@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11tcp: implement RFC 5961 4.2Eric Dumazet
[ Upstream commit 0c24604b68fc7810d429d6c3657b6f148270e528 ] Implement the RFC 5961 mitigation against the Blind Reset attack using the SYN bit. Section 4.2 of RFC 5961 advises sending a Challenge ACK and dropping the incoming packet, instead of resetting the session. Add a new SNMP counter to count the number of challenge ACKs sent in response to SYN packets. (netstat -s | grep TCPSYNChallenge) Remove the obsolete TCPAbortOnSyn, since we no longer abort a TCP session because of a SYN flag. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11tcp: implement RFC 5961 3.2Eric Dumazet
[ Upstream commit 282f23c6ee343126156dd41218b22ece96d747e3 ] Implement the RFC 5961 mitigation against the Blind Reset attack using the RST bit. The idea is to validate the incoming RST sequence number so that it matches the RCV.NXT value exactly, instead of accepting anything in the previously accepted window: (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If the sequence is in the window but not an exact match, send a "challenge ACK", so that the other party can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit the number of challenge ACKs sent per second. Add a new SNMP counter to count the number of challenge ACKs sent. (netstat -s | grep TCPChallengeACK) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
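The RST check described in this commit message can be sketched as a pure function over 32-bit sequence numbers. The names classify_rst(), seq_before() and enum rst_action are hypothetical; the sketch models only the decision and leaves out the rate limit and the counters.

```c
#include <stdbool.h>
#include <stdint.h>

enum rst_action {
    RST_DROP,          /* sequence outside the receive window */
    RST_CHALLENGE_ACK, /* in window, but not exactly RCV.NXT */
    RST_RESET          /* exact match: accept the reset */
};

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static enum rst_action classify_rst(uint32_t seg_seq, uint32_t rcv_nxt,
                                    uint32_t rcv_wnd)
{
    if (seg_seq == rcv_nxt)
        return RST_RESET;
    /* RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND, modulo 2^32 */
    if (!seq_before(seg_seq, rcv_nxt) &&
        seq_before(seg_seq, rcv_nxt + rcv_wnd))
        return RST_CHALLENGE_ACK;
    return RST_DROP;
}
```

A blind attacker who can only guess a sequence number somewhere in the window therefore triggers a challenge ACK rather than a connection reset.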
2013-01-11inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sockChristoph Paasch
[ Upstream commit e337e24d6624e74a558aa69071e112a65f7b5758 ] If in either of the above functions inet_csk_route_child_sock() or __inet_inherit_port() fails, the newsk will not be freed: unreferenced object 0xffff88022e8a92c0 (size 1592): comm "softirq", pid 0, jiffies 4294946244 (age 726.160s) hex dump (first 32 bytes): 0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................ 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5 [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481 [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416 [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701 [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4 [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233 [<ffffffff814cee68>] ip_rcv+0x217/0x267 [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553 [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82 This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus a single sock_put() is not enough to free the memory. Additionally, things like xfrm, memcg, cookie_values,... may have been initialized. We have to free them properly. This is fixed by forcing a call to tcp_done(), ending up in inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary, because it ends up doing all the cleanup on xfrm, memcg, cookie_values, xfrm,... Before calling tcp_done, we have to set the socket to SOCK_DEAD, to force it entering inet_csk_destroy_sock. To avoid the warning in inet_csk_destroy_sock, inet_num has to be set to 0. As inet_csk_destroy_sock does a dec on orphan_count, we first have to increase it. 
Calling tcp_done() allows us to remove the calls to tcp_clear_xmit_timer() and tcp_cleanup_congestion_control(). A similar approach is taken for dccp by calling dccp_done(). This is in the kernel since 093d282321 (tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()), thus since version >= 2.6.37. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11freezer: add missing mb's to freezer_count() and freezer_should_skip()Tejun Heo
commit dd67d32dbc5de299d70cc9e10c6c1e29ffa56b92 upstream. A task is considered frozen enough between freezer_do_not_count() and freezer_count() and freezers use freezer_should_skip() to test this condition. This supposedly works because freezer_count() always calls try_to_freeze() after clearing %PF_FREEZER_SKIP. However, there currently is nothing which guarantees that freezer_count() sees %true freezing() after clearing %PF_FREEZER_SKIP when freezing is in progress, and vice-versa. A task can escape the freezing condition in effect by freezer_count() seeing !freezing() and freezer_should_skip() seeing %PF_FREEZER_SKIP. This patch adds smp_mb()'s to freezer_count() and freezer_should_skip() such that either %true freezing() is visible to freezer_count() or !PF_FREEZER_SKIP is visible to freezer_should_skip(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
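The barrier pairing can be approximated with C11 atomics. This is a userspace analogy under assumptions: the two atomic_bool variables stand in for the freezing() condition and the PF_FREEZER_SKIP flag, a sequentially consistent fence stands in for smp_mb(), and all function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool freezing_flag; /* stands in for freezing() */
static atomic_bool skip_flag;     /* stands in for PF_FREEZER_SKIP */

/* Freezer side: announce that freezing is in progress. */
static void model_start_freezing(void)
{
    atomic_store_explicit(&freezing_flag, true, memory_order_relaxed);
}

/* Task side: mark itself as safe to skip while it blocks. */
static void model_freezer_do_not_count(void)
{
    atomic_store_explicit(&skip_flag, true, memory_order_relaxed);
}

/* Task side: clear the skip flag, then check whether it must freeze.
 * The fence orders the store before the load (store, barrier, load). */
static bool model_freezer_count(void)
{
    atomic_store_explicit(&skip_flag, false, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&freezing_flag, memory_order_relaxed);
}

/* Freezer side, called after model_start_freezing(): the matching
 * fence orders the earlier store before this load. */
static bool model_freezer_should_skip(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&skip_flag, memory_order_relaxed);
}
```

With a fence between each side's store and load, at least one side observes the other's store, so the task cannot both see no freezing in progress and be skipped by the freezer.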
2013-01-11mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDEDChristoffer Dall
commit ad4b3fb7ff9940bcdb1e4cd62bd189d10fa636ba upstream. Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and (PageHead) is true, for tail pages. If this is indeed the intended behavior, which I doubt because it breaks cache cleaning on some ARM systems, then the nomenclature is highly problematic. This patch makes sure PageHead is only true for head pages and PageTail is only true for tail pages, and neither is true for non-compound pages. [ This buglet seems ancient - seems to have been introduced back in Apr 2008 in commit 6a1e7f777f61: "pageflags: convert to the use of new macros". And the reason nobody noticed is because the PageHead() tests are almost all about just sanity-checking, and only used on pages that are actual page heads. The fact that the old code returned true for tail pages too was thus not really noticeable. - Linus ] Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Steve Capper <Steve.Capper@arm.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
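The difference between the old and the fixed test can be shown with a two-bit encoding. This sketch assumes that, without extended page flags, a head page sets one "compound" bit and a tail page sets that bit plus a second one; the bit positions and all names below are illustrative, not copied from the kernel headers.

```c
#include <stdbool.h>

#define PG_COMPOUND_BIT   (1u << 0) /* set on head and tail pages */
#define PG_SECOND_BIT     (1u << 1) /* additionally set on tail pages */
#define PG_HEAD_MASK      PG_COMPOUND_BIT
#define PG_HEAD_TAIL_MASK (PG_COMPOUND_BIT | PG_SECOND_BIT)

/* Old behaviour: tests only the compound bit, so tails match too. */
static bool page_head_old(unsigned int flags)
{
    return (flags & PG_COMPOUND_BIT) != 0;
}

/* Fixed: head means the compound bit is set and the second bit is clear. */
static bool page_head(unsigned int flags)
{
    return (flags & PG_HEAD_TAIL_MASK) == PG_HEAD_MASK;
}

/* Fixed: tail means both bits are set. */
static bool page_tail(unsigned int flags)
{
    return (flags & PG_HEAD_TAIL_MASK) == PG_HEAD_TAIL_MASK;
}
```

Comparing against the full two-bit mask is what makes the predicates mutually exclusive, and both are false for a page with neither bit set.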
2013-01-11exec: do not leave bprm->interp on stackKees Cook
commit b66c5984017533316fd1951770302649baf1aa33 upstream. If a series of scripts are executed, each triggering module loading via unprintable bytes in the script header, kernel stack contents can leak into the command line. Normally execution of binfmt_script and binfmt_misc happens recursively. However, when modules are enabled, and unprintable bytes exist in the bprm->buf, execution will restart after attempting to load matching binfmt modules. Unfortunately, the logic in binfmt_script and binfmt_misc does not expect to get restarted. They leave bprm->interp pointing to their local stack. This means on restart bprm->interp is left pointing into unused stack memory which can then be copied into the userspace argv areas. After additional study, it seems that both recursion and restart remains the desirable way to handle exec with scripts, misc, and modules. As such, we need to protect the changes to interp. This changes the logic to require allocation for any changes to the bprm->interp. To avoid adding a new kmalloc to every exec, the default value is left as-is. Only when passing through binfmt_script or binfmt_misc does an allocation take place. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
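The rule "any change to interp requires an allocation, while the default stays as-is" can be sketched in userspace C. The struct and the helpers change_interp() and release_interp() are hypothetical simplifications, with malloc() and free() standing in for kernel allocators.

```c
#include <stdlib.h>
#include <string.h>

struct binprm {
    const char *filename; /* owned by the caller */
    const char *interp;   /* defaults to filename; heap copy once changed */
};

/* Replace interp with a private heap copy so it can never point into a
 * handler's stack frame. Returns 0 on success, -1 if allocation fails. */
static int change_interp(struct binprm *b, const char *interp)
{
    size_t len = strlen(interp) + 1;
    char *copy = malloc(len);

    if (!copy)
        return -1;
    memcpy(copy, interp, len);
    /* Free a previous copy, but never the default alias of filename. */
    if (b->interp != b->filename)
        free((char *)b->interp);
    b->interp = copy;
    return 0;
}

/* Drop any heap copy and fall back to the default. */
static void release_interp(struct binprm *b)
{
    if (b->interp != b->filename)
        free((char *)b->interp);
    b->interp = b->filename;
}
```

Because the copy outlives the handler's stack frame, a restarted exec still sees a valid interpreter string, and the common path that never changes interp pays for no allocation.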
2012-12-17tmpfs: fix shared mempolicy leakMel Gorman
commit 18a2f371f5edf41810f6469cb9be39931ef9deb9 upstream. This fixes a regression in 3.7-rc, which has since gone into stable. Commit 00442ad04a5e ("mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went on expecting alloc_page_vma() to drop the refcount it had acquired. This deserves a rework: but for now fix the leak in shmem_alloc_page(). Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use the same refcounting there as in shmem_alloc_page(), delete its onstack mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() - those were invented to let swapin_readahead() make an unknown number of calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a5e, alloc_pages_vma() has kept refcount in balance, so now no problem. Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03drm/radeon: add new SI pci idAlex Deucher
commit 0181bd5dea2ed0696f84591a92da0b6a1f1a2e62 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26libceph: drop declaration of ceph_con_get()Alex Elder
commit 261030215d970c62f799e6e508e3c68fc7ec2aa9 upstream. For some reason the declaration of ceph_con_get() and ceph_con_put() did not get deleted in this commit: d59315ca libceph: drop ceph_con_get/put helpers and nref member Clean that up. Signed-off-by: Alex Elder <elder@inktank.com> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26libceph: check for invalid mappingSage Weil
(cherry picked from commit d63b77f4c552cc3a20506871046ab0fcbc332609) If we encounter an invalid (e.g., zeroed) mapping, return an error and avoid a divide by zero. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26libceph: clean up con flagsSage Weil
(cherry picked from commit 4a8616920860920abaa51193146fe36b38ef09aa) Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26libceph: replace connection state bits with statesSage Weil
(cherry picked from commit 8dacc7da69a491c515851e68de6036f21b5663ce) Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
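Checking for "anything other than the expected state" becomes a single comparison once the state is an enumeration. In this sketch the six state names, struct connection and the handler are assumptions for illustration; the commit message only says that there are six CON_STATE_* values and that checks happen under the connection mutex.

```c
/* Six illustrative states; the names are assumptions. */
enum con_state {
    CON_STATE_CLOSED = 1,
    CON_STATE_PREOPEN,
    CON_STATE_CONNECTING,
    CON_STATE_NEGOTIATING,
    CON_STATE_OPEN,
    CON_STATE_STANDBY
};

struct connection {
    enum con_state state; /* assumed to be protected by a mutex */
};

/* Handle a negotiation reply. The caller is assumed to hold the
 * connection mutex. Any state other than NEGOTIATING means the reply
 * raced with a close or reopen, so it is rejected. */
static int handle_negotiation_reply(struct connection *con)
{
    if (con->state != CON_STATE_NEGOTIATING)
        return -1;
    con->state = CON_STATE_OPEN;
    return 0;
}
```

With independent state bits, the same check would have to enumerate every bit combination that a race might produce.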
2012-11-26libceph: prevent the race of incoming work during teardownGuanjun He
(cherry picked from commit a2a3258417eb6a1799cf893350771428875a8287) Add an atomic variable 'stopping' as a flag in struct ceph_messenger, set this flag to 1 in ceph_destroy_client(), and test it in ceph_data_ready(): if the flag is set (1), return immediately without queueing any work. Signed-off-by: Guanjun He <gjhe@suse.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
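The stopping flag amounts to an early return in the data-ready callback. This userspace sketch uses C11 atomics; struct messenger, the handled counter and the function names are hypothetical stand-ins for the structures named in the commit message.

```c
#include <stdatomic.h>

struct messenger {
    atomic_int stopping; /* set to 1 once teardown has started */
    int handled;         /* events accepted; illustrative only */
};

static void messenger_init(struct messenger *m)
{
    atomic_init(&m->stopping, 0);
    m->handled = 0;
}

/* Teardown path: raise the flag before anything is freed. */
static void messenger_begin_teardown(struct messenger *m)
{
    atomic_store(&m->stopping, 1);
}

/* Socket callback: ignore incoming events once teardown has begun. */
static void messenger_data_ready(struct messenger *m)
{
    if (atomic_load(&m->stopping))
        return;
    m->handled++;
}
```

Events that arrive after teardown has started are dropped, so no new work is created for a messenger that is going away.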
2012-11-26libceph: initialize msgpool message typesSage Weil
(cherry picked from commit d50b409fb8698571d8209e5adfe122e287e31290) Initialize the type field for messages in a msgpool. The caller was doing this for osd ops, but not for the reply messages. Reported-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26libceph: set peer name on con_open, not initSage Weil
(cherry picked from commit b7a9e5dd40f17a48a72f249b8bbc989b63bae5fd) The peer name may change on each open attempt, even when the connection is reused. Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26libceph: define and use an explicit CONNECTED stateAlex Elder
(cherry picked from commit e27947c767f5bed15048f4e4dad3e2eb69133697) There is no state explicitly defined when a ceph connection is fully operational. So define one. It's set when the connection sequence completes successfully, and is cleared when the connection gets closed. Be a little more careful when examining the old state when a socket disconnect event is reported. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>