summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2013-02-17efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot parameterSatoru Takeuchi
commit 1de63d60cd5b0d33a812efa455d5933bf1564a51 upstream. There was a serious problem in samsung-laptop that its platform driver is designed to run under BIOS and running under EFI can cause the machine to become bricked or can cause Machine Check Exceptions. Discussion about this problem: https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 https://bugzilla.kernel.org/show_bug.cgi?id=47121 The patches to fix this problem: efi: Make 'efi_enabled' a function to query EFI facilities 83e68189745ad931c2afd45d8ee3303929233e7f samsung-laptop: Disable on EFI hardware e0094244e41c4d0c7ad69920681972fc45d8ce34 Unfortunately this problem comes back again if users specify "noefi" option. This parameter clears EFI_BOOT and that driver continues to run even if running under EFI. Refer to the document, this parameter should clear EFI_RUNTIME_SERVICES instead. Documentation/kernel-parameters.txt: =============================================================================== ... noefi [X86] Disable EFI runtime services support. ... =============================================================================== Documentation/x86/x86_64/uefi.txt: =============================================================================== ... - If some or all EFI runtime services don't work, you can try following kernel command line parameters to turn off some or all EFI runtime services. noefi turn off all EFI runtime services ... =============================================================================== Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Link: http://lkml.kernel.org/r/511C2C04.2070108@jp.fujitsu.com Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-17x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.Jan Beulich
commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream. This fixes CVE-2013-0228 / XSA-42 Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user in 32bit PV guest can use to crash the > guest with the panic like this: ------------- general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/vbd-51712/block/xvda/dev Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1 EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0 EIP is at xen_iret+0x12/0x2b EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010 ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069 Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000) Stack: 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000 Call Trace: Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02 EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0 general protection fault: 0000 [#2] ---[ end trace ab0d29a492dcd330 ]--- Kernel panic - not syncing: Fatal exception Pid: 1250, comm: r Tainted: G D --------------- 2.6.32-356.el6.i686 #1 Call Trace: [<c08476df>] ? panic+0x6e/0x122 [<c084b63c>] ? oops_end+0xbc/0xd0 [<c084b260>] ? do_general_protection+0x0/0x210 [<c084a9b7>] ? error_code+0x73/ ------------- Petr says: " I've analysed the bug and I think that xen_iret() cannot cope with mangled DS, in this case zeroed out (null selector/descriptor) by either xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT entry was invalidated by the reproducer. " Jan took a look at the preliminary patch and came up a fix that solves this problem: "This code gets called after all registers other than those handled by IRET got already restored, hence a null selector in %ds or a non-null one that got loaded from a code or read-only data descriptor would cause a kernel mode fault (with the potential of crashing the kernel as a whole, if panic_on_oops is set)." The way to fix this is to realize that the we can only relay on the registers that IRET restores. The two that are guaranteed are the %cs and %ss as they are always fixed GDT selectors. Also they are inaccessible from user mode - so they cannot be altered. This is the approach taken in this patch. Another alternative option suggested by Jan would be to relay on the subtle realization that using the %ebp or %esp relative references uses the %ss segment. In which case we could switch from using %eax to %ebp and would not need the %ss over-rides. That would also require one extra instruction to compensate for the one place where the register is used as scaled index. However Andrew pointed out that is too subtle and if further work was to be done in this code-path it could escape folks attention and lead to accidents. Reviewed-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Petr Matousek <pmatouse@redhat.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-17x86/mm: Check if PUD is large when validating a kernel addressMel Gorman
commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream. A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110 [...] Call Trace: [<ffffffff811b8aaa>] read_kcore+0x17a/0x370 [<ffffffff811ad847>] proc_reg_read+0x77/0xc0 [<ffffffff81151687>] vfs_read+0xc7/0x130 [<ffffffff811517f3>] sys_read+0x53/0xa0 [<ffffffff81449692>] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it was a PMD. If it happens to look like pmd_none then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD though, it will be walked resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible and now they are running the backup program without accessing /proc/kcore so the patch has not been validated but I think it makes sense. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.coM> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-17x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systemsStoney Wang
commit cb214ede7657db458fd0b2a25ea0b28dbf900ebc upstream. When a HP ProLiant DL980 G7 Server boots a regular kernel, there will be intermittent lost interrupts which could result in a hang or (in extreme cases) data loss. The reason is that this system only supports x2apic physical mode, while the kernel boots with a logical-cluster default setting. This bug can be worked around by specifying the "x2apic_phys" or "nox2apic" boot option, but we want to handle this system without requiring manual workarounds. The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table. As all apicids are smaller than 255, BIOS need to pass the control to the OS with xapic mode, according to x2apic-spec, chapter 2.9. Current code handle x2apic when BIOS pass with xapic mode enabled: When user specifies x2apic_phys, or FADT indicates PHYSICAL: 1. During madt oem check, apic driver is set with xapic logical or xapic phys driver at first. 2. enable_IR_x2apic() will enable x2apic_mode. 3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe() will install the correct x2apic phys driver and use x2apic phys mode. Otherwise it will skip the driver will let x2apic_cluster_probe to take over to install x2apic cluster driver (wrong one) even though FADT indicates PHYSICAL, because x2apic_phys_probe does not check FADT PHYSICAL. Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the problem. Signed-off-by: Stoney Wang <song-bo.wang@hp.com> [ updated the changelog and simplified the code ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-17x86: Do not leak kernel page mapping locationsKees Cook
commit e575a86fdc50d013bf3ad3aa81d9100e8e6cc60d upstream. Without this patch, it is trivial to determine kernel page mappings by examining the error code reported to dmesg[1]. Instead, declare the entire kernel memory space as a violation of a present page. Additionally, since show_unhandled_signals is enabled by default, switch branch hinting to the more realistic expectation, and unobfuscate the setting of the PF_PROT bit to improve readability. [1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/ Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130207174413.GA12485@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-17s390/timer: avoid overflow when programming clock comparatorHeiko Carstens
commit d911e03d097bdc01363df5d81c43f69432eb785c upstream. Since ed4f209 "s390/time: fix sched_clock() overflow" a new helper function is used to avoid overflows when converting TOD format values to nanosecond values. The kvm interrupt code formerly however only worked by accident because of an overflow. It tried to program a timer that would expire in more than ~29 years. Because of the old TOD-to-nanoseconds overflow bug the real expiry value however was much smaller, but now it isn't anymore. This however triggers yet another bug in the function that programs the clock comparator s390_next_ktime(): if the absolute "expires" value is after 2042 this will result in an overflow and the programmed value is lower than the current TOD value which immediatly triggers a clock comparator (= timer) interrupt. Since the timer isn't expired it will be programmed immediately again and so on... the result is a dead system. To fix this simply program the maximum possible value if an overflow is detected. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-11x86-64: Replace left over sti/cli in ia32 audit exit codeJan Beulich
commit 40a1ef95da85843696fc3ebe5fce39b0db32669f upstream. For some reason they didn't get replaced so far by their paravirt equivalents, resulting in code to be run with interrupts disabled that doesn't expect so (causing, in the observed case, a BUG_ON() to trigger) when syscall auditing is enabled. David (Cc-ed) came up with an identical fix, so likely this can be taken to count as an ack from him. Reported-by: Peter Moody <pmoody@google.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/5108E01902000078000BA9C5@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Tested-by: Peter Moody <pmoody@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-11powerpc/mm: Fix hash computation functionAneesh Kumar K.V
commit eda8eebdd153c48a4e2a3a3ac3cd9e2e31f5c6b3 upstream. The ASM version of hash computation function was truncating the upper bit. Make the ASM version similar to hpt_hash function. Remove masking vsid bits. Without this patch, we observed hang during bootup due to not satisfying page fault request correctly. The fault handler used wrong hash values to update the HPTE. Hence we kept looping with page fault. hash_page(ea=000001003e260008, access=203, trap=300 ip=3fff91787134 dsisr 42000000 The computed value of hash 000000000f22f390 update: avpnv=4003e46054003e00, hash=000000000722f390, f=80000006, psize: 2 ... BenH: The over-masking has been there for ever but only hurts with the new 64T support introduced in 3.7 Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCIH. Peter Anvin
commit e43b3cec711a61edf047adf6204d542f3a659ef8 upstream. early_pci_allowed() and read_pci_config_16() are only available if CONFIG_PCI is defined. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Abdallah Chatila <abdallah.chatila@ericsson.com>
2013-02-03x86, efi: Set runtime_version to the EFI spec revisionMatt Fleming
commit 712ba9e9afc4b3d3d6fa81565ca36fe518915c01 upstream. efi.runtime_version is erroneously being set to the value of the vendor's firmware revision instead of that of the implemented EFI specification. We can't deduce which EFI functions are available based on the revision of the vendor's firmware since the version scheme is likely to be unique to each vendor. What we really need to know is the revision of the implemented EFI specification, which is available in the EFI System Table header. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03efi, x86: Pass a proper identity mapping in efi_call_phys_prelogNathan Zimmer
commit b8f2c21db390273c3eaf0e5308faeaeb1e233840 upstream. Update efi_call_phys_prelog to install an identity mapping of all available memory. This corrects a bug on very large systems with more then 512 GB in which bios would not be able to access addresses above not in the mapping. The result is a crash that looks much like this. BUG: unable to handle kernel paging request at 000000effd870020 IP: [<0000000078bce331>] 0x78bce330 PGD 0 Oops: 0000 [#1] SMP Modules linked in: CPU 0 Pid: 0, comm: swapper/0 Tainted: G W 3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform RIP: 0010:[<0000000078bce331>] [<0000000078bce331>] 0x78bce330 RSP: 0000:ffffffff81601d28 EFLAGS: 00010006 RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004 RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000 RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030 R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000 FS: 0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400) Stack: 0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff 0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400 0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a Call Trace: [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83 [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed [<ffffffff81035946>] ? efi_call4+0x46/0x80 [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305 [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2 [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60 [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1 [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120 [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163 Code: Bad RIP value. RIP [<0000000078bce331>] 0x78bce330 RSP <ffffffff81601d28> CR2: 000000effd870020 ---[ end trace ead828934fef5eab ]--- Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03x86, efi: Fix 32-bit EFI handover protocol entry pointDavid Woodhouse
commit f791620fa7517e1045742c475a7f005db9a634b8 upstream. If the bootloader calls the EFI handover entry point as a standard function call, then it'll have a return address on the stack. We need to pop that before calling efi_main(), or the arguments will all be out of position on the stack. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03x86, efi: Fix display detection in EFI boot stubDavid Woodhouse
commit 70a479cbe80296d3113e65cc2f713a5101061daf upstream. When booting under OVMF we have precisely one GOP device, and it implements the ConOut protocol. We break out of the loop when we look at it... and then promptly abort because 'first_gop' never gets set. We should set first_gop *before* breaking out of the loop. Yes, it doesn't really mean "first" any more, but that doesn't matter. It's only a flag to indicate that a suitable GOP was found. In fact, we'd do just as well to initialise 'width' to zero in this function, then just check *that* instead of first_gop. But I'll do the minimal fix for now (and for stable@). Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03efi: Make 'efi_enabled' a function to query EFI facilitiesMatt Fleming
commit 83e68189745ad931c2afd45d8ee3303929233e7f upstream. Originally 'efi_enabled' indicated whether a kernel was booted from EFI firmware. Over time its semantics have changed, and it now indicates whether or not we are booted on an EFI machine with bit-native firmware, e.g. 64-bit kernel with 64-bit firmware. The immediate motivation for this patch is the bug report at, https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 which details how running a platform driver on an EFI machine that is designed to run under BIOS can cause the machine to become bricked. Also, the following report, https://bugzilla.kernel.org/show_bug.cgi?id=47121 details how running said driver can also cause Machine Check Exceptions. Drivers need a new means of detecting whether they're running on an EFI machine, as sadly the expression, if (!efi_enabled) hasn't been a sufficient condition for quite some time. Users actually want to query 'efi_enabled' for different reasons - what they really want access to is the list of available EFI facilities. For instance, the x86 reboot code needs to know whether it can invoke the ResetSystem() function provided by the EFI runtime services, while the ACPI OSL code wants to know whether the EFI config tables were mapped successfully. There are also checks in some of the platform driver code to simply see if they're running on an EFI machine (which would make it a bad idea to do BIOS-y things). This patch is a prereq for the samsung-laptop fix patch. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Peter Jones <pjones@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve Langasek <steve.langasek@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03x86/msr: Add capabilities checkAlan Cox
commit c903f0456bc69176912dee6dd25c6a66ee1aed00 upstream. At the moment the MSR driver only relies upon file system checks. This means that anything as root with any capability set can write to MSRs. Historically that wasn't very interesting but on modern processors the MSRs are such that writing to them provides several ways to execute arbitary code in kernel space. Sample code and documentation on doing this is circulating and MSR attacks are used on Windows 64bit rootkits already. In the Linux case you still need to be able to open the device file so the impact is fairly limited and reduces the security of some capability and security model based systems down towards that of a generic "root owns the box" setup. Therefore they should require CAP_SYS_RAWIO to prevent an elevation of capabilities. The impact of this is fairly minimal on most setups because they don't have heavy use of capabilities. Those using SELinux, SMACK or AppArmor rules might want to consider if their rulesets on the MSR driver could be tighter. Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03s390/thp: implement pmdp_set_wrprotect()Gerald Schaefer
commit be3286507dab888d4aad9f91fd6ff5202b24cd5b upstream. On s390, an architecture-specific implementation of the function pmdp_set_wrprotect() is missing and the generic version is currently being used. The generic version does not flush the tlb as it would be needed on s390 when modifying an active pmd, which can lead to subtle tlb errors on s390 when using transparent hugepages. This patch adds an s390-specific implementation of pmdp_set_wrprotect() including the missing tlb flush. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: virt: simplify __hyp_stub_install epilogMarc Zyngier
commit d01723479e6a6c70c83295f7847477a016d5e14a upstream. __hyp_stub_install duplicates quite a bit of safe_svcmode_maskall by forcing the CPU back to SVC. This is unnecessary, as safe_svcmode_maskall is called just after. Furthermore, the way we build SPSR_hyp is buggy as we fail to mask the interrupts, leading to interesting behaviours on TC2 + UEFI. The fix is to simply remove this code and rely on safe_svcmode_maskall to do the right thing. Reviewed-by: Dave Martin <dave.martin@linaro.org> Reported-by: Harry Liebel <harry.liebel@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: virt: boot secondary CPUs through the right entry pointMarc Zyngier
commit 6e484be1ccca3ea495db45900fd42aac8d49d754 upstream. Secondary CPUs should use the __hyp_stub_install_secondary entry point, so boot mode inconsistencies can be detected. Acked-by: Dave Martin <dave.martin@linaro.org> Reported-by: Ian Molton <ian.molton@collabora.co.uk> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: virt: Avoid bx instruction for compatibility with <=ARMv4Dave Martin
commit a4a12e008e292a81d312659529b71be2026ab355 upstream. Non-T variants of ARMv4 do not support the bx instruction. However, __hyp_stub_install is always called from the same instruction set used to build the bulk of the kernel, so bx should not be necessary. This patch uses the traditional "mov pc" instead of bx. Signed-off-by: Dave Martin <dave.martin@linaro.org> [will: fixed up remaining bx instruction] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: 7628/1: head.S: map one extra section for the ATAG/DTB areaNicolas Pitre
commit 6f16f4998f98e42e3f2dedf663cfb691ff0324af upstream. We currently use a temporary 1MB section aligned to a 1MB boundary for mapping the provided device tree until the final page table is created. However, if the device tree happens to cross that 1MB boundary, the end of it remains unmapped and the kernel crashes when it attempts to access it. Given no restriction on the location of that DTB, it could end up with only a few bytes mapped at the end of a section. Solve this issue by mapping two consecutive sections. Signed-off-by: Nicolas Pitre <nico@linaro.org> Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: Tomasz Figa <t.figa@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: 7627/1: Predicate preempt logic on PREEMP_COUNT not PREEMPT aloneStephen Boyd
commit 568dca15aa2a0f4ddee255894ec393a159f13147 upstream. Patrik Kluba reports that the preempt count becomes invalid due to the preempt_enable() call being unbalanced with a preempt_disable() call in the vfp assembly routines. This happens because preempt_enable() and preempt_disable() update preempt counts under PREEMPT_COUNT=y but the vfp assembly routines do so under PREEMPT=y. In a configuration where PREEMPT=n and DEBUG_ATOMIC_SLEEP=y, PREEMPT_COUNT=y and so the preempt_enable() call in VFP_bounce() keeps subtracting from the preempt count until it goes negative. Fix this by always using PREEMPT_COUNT to decided when to update preempt counts in the ARM assembly code. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Patrik Kluba <pkluba@dension.com> Tested-by: Patrik Kluba <pkluba@dension.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: S3C64XX: Fix up IRQ mapping for balblair on CragganmoreDimitris Papastamos
commit b86dc0d8c12bbb9fed3f392c284bdc7114ce00c1 upstream. We are using S3C_EINT(4) instead of S3C_EINT(5). Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: at91: rm9200: remake the BGA as default versionJean-Christophe PLAGNIOL-VILLARD
commit 36224d0fe0f34cdde66a381708853ebadeac799c upstream. Make BGA as the default version as we are supposed to just have to specify when we use the PQFP version. Issue was existing since commit: 3e90772 (ARM: at91: fix at91rm9200 soc subtype handling). Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: OMAP2+: omap4-panda: add UART2 muxing for WiLink shared transportLuciano Coelho
commit 7662a9c60fee25d7234da4be6d8eab2b2ac88448 upstream. Add the UART2 muxing data to the board file (this used to be, erroneously, done in the bootloader). Signed-off-by: Luciano Coelho <coelho@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03ARM: DMA: Fix struct page iterator in dma_cache_maint() to work with sparsememRussell King
commit 15653371c67c3fbe359ae37b720639dd4c7b42c5 upstream. Subhash Jadavani reported this partial backtrace: Now consider this call stack from MMC block driver (this is on the ARMv7 based board): [<c001b50c>] (v7_dma_inv_range+0x30/0x48) from [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from [<c0017ff8>] (dma_map_sg+0x3c/0x114) This is caused by incrementing the struct page pointer, and running off the end of the sparsemem page array. Fix this by incrementing by pfn instead, and convert the pfn to a struct page. Suggested-by: James Bottomley <JBottomley@Parallels.com> Tested-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03powerpc/book3e: Disable interrupt after preempt_schedule_irqTiejun Chen
commit 572177d7c77db1981ba2563e01478126482c43bc upstream. In preempt case current arch_local_irq_restore() from preempt_schedule_irq() may enable hard interrupt but we really should disable interrupts when we return from the interrupt, and so that we don't get interrupted after loading SRR0/1. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03KVM: PPC: Emulate dcbfAlexander Graf
commit d3286144c92ec876da9e30320afa875699b7e0f1 upstream. Guests can trigger MMIO exits using dcbf. Since we don't emulate cache incoherent MMIO, just do nothing and move on. Reported-by: Ben Collins <ben.c@servergy.com> Signed-off-by: Alexander Graf <agraf@suse.de> Tested-by: Ben Collins <ben.c@servergy.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-27arm64: elf: fix core dumping to match what glibc expectsWill Deacon
commit 9cf2b72b25f3f6a5a1a46a4f36037e66de52465c upstream. The kernel's internal definition of ELF_NGREG uses struct pt_regs, which means that we disagree with userspace on the size of coredumps since glibc correctly uses the user-visible struct user_pt_regs. This patch fixes our ELF_NGREG definition to use struct user_pt_regs and introduces our own ELF_CORE_COPY_REGS to convert between the user and kernel structure definitions. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-27ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILLOleg Nesterov
commit 9899d11f654474d2d54ea52ceaa2a1f4db3abd68 upstream. putreg() assumes that the tracee is not running and pt_regs_access() can safely play with its stack. However a killed tracee can return from ptrace_stop() to the low-level asm code and do RESTORE_REST, this means that debugger can actually read/modify the kernel stack until the tracee does SAVE_REST again. set_task_blockstep() can race with SIGKILL too and in some sense this race is even worse, the very fact the tracee can be woken up breaks the logic. As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace() call, this ensures that nobody can ever wakeup the tracee while the debugger looks at it. Not only this fixes the mentioned problems, we can do some cleanups/simplifications in arch_ptrace() paths. Probably ptrace_unfreeze_traced() needs more callers, for example it makes sense to make the tracee killable for oom-killer before access_process_vm(). While at it, add the comment into may_ptrace_stop() to explain why ptrace_stop() still can't rely on SIGKILL and signal_pending_state(). Reported-by: Salman Qazi <sqazi@google.com> Reported-by: Suleiman Souhlal <suleiman@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-27perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel sideDavid Ahern
commit a706d965dcfdff73bf2bad1c300f8119900714c7 upstream. This patch is brought to you by the letter 'H'. Commit 20b279 breaks compatiblity with older perf binaries when run with precise modifier (:p or :pp) by requiring the exclude_guest attribute to be set. Older binaries default exclude_guest to 0 (ie., wanting guest-based samples) unless host only profiling is requested (:H modifier). The workaround for older binaries is to add H to the modifier list (e.g., -e cycles:ppH - toggles exclude_guest to 1). This was deemed unacceptable by Linus: https://lkml.org/lkml/2012/12/12/570 Between family in town and the fresh snow in Breckenridge there is no time left to be working on the proper fix for this over the holidays. In the New Year I have more pressing problems to resolve -- like some memory leaks in perf which are proving to be elusive -- although the aforementioned snow is probably why they are proving to be elusive. Either way I do not have any spare time to work on this and from the time I have managed to spend on it the solution is more difficult than just moving to a new exclude_guest flag (does not work) or flipping the logic to include_guest (which is not as trivial as one would think). So, two options: silently force exclude_guest on as suggested by Gleb which means no impact to older perf binaries or revert the original patch which caused the breakage. This patch does the latter -- reverts the original patch that introduced the regression. The problem can be revisited in the future as time allows. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.Frediano Ziglio
commit 9174adbee4a9a49d0139f5d71969852b36720809 upstream. This fixes CVE-2013-0190 / XSA-40 There has been an error on the xen_failsafe_callback path for failed iret, which causes the stack pointer to be wrong when entering the iret_exc error path. This can result in the kernel crashing. In the classic kernel case, the relevant code looked a little like: popl %eax # Error code from hypervisor jz 5f addl $16,%esp jmp iret_exc # Hypervisor said iret fault 5: addl $16,%esp # Hypervisor said segment selector fault Here, there are two identical addls on either option of a branch which appears to have been optimised by hoisting it above the jz, and converting it to an lea, which leaves the flags register unaffected. In the PVOPS case, the code looks like: popl_cfi %eax # Error from the hypervisor lea 16(%esp),%esp # Add $16 before choosing fault path CFI_ADJUST_CFA_OFFSET -16 jz 5f addl $16,%esp # Incorrectly adjust %esp again jmp iret_exc It is possible unprivileged userspace applications to cause this behaviour, for example by loading an LDT code selector, then changing the code selector to be not-present. At this point, there is a race condition where it is possible for the hypervisor to return back to userspace from an interrupt, fault on its own iret, and inject a failsafe_callback into the kernel. This bug has been present since the introduction of Xen PVOPS support in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23. Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21x86/Sandy Bridge: reserve pages when integrated graphics is presentJesse Barnes
commit a9acc5365dbda29f7be2884efb63771dc24bd815 upstream. SNB graphics devices have a bug that prevent them from accessing certain memory ranges, namely anything below 1M and in the pages listed in the table. So reserve those at boot if set detect a SNB gfx device on the CPU to avoid GPU hangs. Stephane Marchesin had a similar patch to the page allocator awhile back, but rather than reserving pages up front, it leaked them at allocation time. [ hpa: made a number of stylistic changes, marked arrays as static const, and made less verbose; use "memblock=debug" for full verbosity. ] Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21s390/time: fix sched_clock() overflowHeiko Carstens
commit ed4f20943cd4c7b55105c04daedf8d63ab6d499c upstream. Converting a 64 Bit TOD format value to nanoseconds means that the value must be divided by 4.096. In order to achieve that we multiply with 125 and divide by 512. When used within sched_clock() this triggers an overflow after appr. 417 days. Resulting in a sched_clock() return value that is much smaller than previously and therefore may cause all sort of weird things in subsystems that rely on a monotonic sched_clock() behaviour. To fix this implement a tod_to_ns() helper function which converts TOD values without overflow and call this function from both places that open coded the conversion: sched_clock() and kvm_s390_handle_wait(). Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21arm64: mm: only wrprotect clean ptes if they are presentWill Deacon
commit 02522463c84748b3b8ad770f9424bcfa70a5b4c4 upstream. Marking non-present ptes as read-only can corrupt file ptes, breaking things like swap and file mappings. This patch ensures that we only manipulate user pte bits when the pte is marked present. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21sh: Fix FDPIC binary loaderThomas Schwinge
commit 4a71997a3279a339e7336ea5d0cd27282e2dea44 upstream. Ensure that the aux table is properly initialized, even when optional features are missing. Without this, the FDPIC loader did not work. This was meant to be included in commit d5ab780305bb6d60a7b5a74f18cf84eb6ad153b1. Signed-off-by: Thomas Schwinge <thomas@codesourcery.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17KVM: PPC: 44x: fix DCR read/writeAlexander Graf
commit e43a028752fed049e4bd94ef895542f96d79fa74 upstream. When remembering the direction of a DCR transaction, we should write to the same variable that we interpret on later when doing vcpu_run again. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17Revert "MIPS: Optimise TLB handlers for MIPS32/64 R2 cores."Ralf Baechle
commit 9120963578320532dfb3a9a7947e8d05b39900b5 upstream. This reverts commit ff401e52100dcdc85e572d1ad376d3307b3fe28e. This breaks on MIPS64 R2 cores such as Broadcom's. Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ALSA: pxa27x: fix ac97 warm resetMike Dunn
commit 3b4bc7bccc7857274705b05cf81a0c72cfd0b0dd upstream. This patch fixes some code that implements a work-around to a hardware bug in the ac97 controller on the pxa27x. A bug in the controller's warm reset functionality requires that the mfp used by the controller as the AC97_nRESET line be temporarily reconfigured as a generic output gpio (AF0) and manually held high for the duration of the warm reset cycle. This is what was done in the original code, but it was broken long ago by commit fb1bf8cd ([ARM] pxa: introduce processor specific pxa27x_assert_ac97reset()) which changed the mfp to a GPIO input instead of a high output. The fix requires the ac97 controller to obtain the gpio via gpio_request_one(), with arguments that configure the gpio as an output initially driven high. Tested on a palm treo 680 machine. Reportedly, this broken code only prevents a warm reset on hardware that lacks a pull-up on the line, which appears to be the case for me. Signed-off-by: Mike Dunn <mikedunn@newsguy.com> Signed-off-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17OMAP: board-files: fix i2c_bus for tfp410Tomi Valkeinen
commit ca2e16faa7378878c1522a7c1b6c38211de3331d upstream. The i2c handling in tfp410 driver, which handles converting parallel RGB to DVI, was changed in 958f2717b84e88bf833d996997fda8f73276f2af (OMAPDSS: TFP410: pdata rewrite). The patch changed what value the driver considers as invalid/undefined. Before the patch, 0 was the invalid value, but as 0 is a valid bus number, the patch changed this to -1. However, the fact was missed that many board files do not define the bus number at all, thus it's left to 0. This causes the driver to fail to get the i2c bus, exiting from the driver's probe with an error, meaning that the DVI output does not work for those boards. This patch fixes the issue by changing the i2c_bus number field in the driver's platform data from u16 to int, and setting the bus number to -1 in the board files for the boards that did not define the bus. The exception is devkit8000, for which the bus is set to 1, which is the correct bus for that board. The bug exists in v3.5+ kernels. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Reported-by: Thomas Weber <thomas@tomweber.eu> Cc: Thomas Weber <thomas@tomweber.eu> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17MIPS: Fix poweroff failure when HOTPLUG_CPU configured.Huacai Chen
commit 8add1ecb81f541ef2fcb0b85a5470ad9ecfb4a84 upstream. When poweroff machine, kernel_power_off() call disable_nonboot_cpus(). And if we have HOTPLUG_CPU configured, disable_nonboot_cpus() is not an empty function but attempt to actually disable the nonboot cpus. Since system state is SYSTEM_POWER_OFF, play_dead() won't be called and thus disable_nonboot_cpus() hangs. Therefore, we make this patch to avoid poweroff failure. Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Hongliang Tao <taohl@lemote.com> Signed-off-by: Hua Yan <yanh@lemote.com> Cc: Yong Zhang <yong.zhang@windriver.com> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/4211/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17KVM: PPC: e500: fix allocation size error on g2h_tlb1_mapScott Wood
commit e400e72f250d2567e89c9bafb47ab91e8d9a15a2 upstream. We were only allocating half the bytes we need, which was made more obvious by a recent fix to the memset in clear_tlb1_bitmap(). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17powerpc: Add missing NULL terminator to avoid boot panic on PPC40xGabor Juhos
commit e6449c9b2d90c1bd9a5985bf05ddebfd1631cd6b upstream. The missing NULL terminator can cause a panic on PPC405 boards during boot: Linux/PowerPC load: console=ttyS0,115200 root=/dev/mtdblock1 rootfstype=squashfs,jffs2 noinitrd init=/etc/preinit Finalizing device tree... flat tree at 0x6a5160 bootconsole [udbg0] enabled Page fault in user mode with in_atomic() = 1 mm = (null) NIP = c0275f50 MSR = fffffffe Oops: Weird page fault, sig: 11 [#1] PowerPC 40x Platform Modules linked in: NIP: c0275f50 LR: c0275f60 CTR: c0280000 REGS: c0275eb0 TRAP: 636f7265 Not tainted (3.7.1) MSR: fffffffe <VEC,VSX,EE,PR,FP,ME,SE,BE,IR,DR,PMM,RI> CR: c06a6190 XER: 00000001 TASK = c02662a8[0] 'swapper' THREAD: c0274000 GPR00: c0275ec0 c000c658 c027c4bf 00000000 c0275ee0 c000a0ec c020a1a8 c020a1f0 GPR08: c020f631 c020f404 c025f078 c025f080 c0275f10 Call Trace: ---[ end trace 31fd0ba7d8756001 ]--- Kernel panic - not syncing: Attempted to kill the idle task! The panic happens since commit 9597abe00c1bab2aedce6b49866bf6d1e81c9eed (sections: fix section conflicts in arch/powerpc), however the root cause of this is that the NULL terminator were not added in commit a4f740cf33f7f6c164bbde3c0cdbcc77b0c4997c (of/flattree: Add of_flat_dt_match() helper function). Signed-off-by: Gabor Juhos <juhosg@openwrt.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17powerpc/vdso: Remove redundant locking in update_vsyscall_tz()Shan Hai
commit ce73ec6db47af84d1466402781ae0872a9e7873c upstream. The locking in update_vsyscall_tz() is not only unnecessary because the vdso code copies the data unproteced in __kernel_gettimeofday() but also introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which causes user space process to loop forever in vdso code. The following patch removes the locking from update_vsyscall_tz(). Locking is not only unnecessary because the vdso code copies the data unprotected in __kernel_gettimeofday() but also erroneous because updating the tb_update_count is not atomic and introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which further causes user space process to loop forever in vdso code. The below scenario describes the race condition, x==0 Boot CPU other CPU proc_P: x==0 timer interrupt update_vsyscall x==1 x++;sync settimeofday update_vsyscall_tz x==2 x++;sync x==3 sync;x++ sync;x++ proc_P: x==3 (loops until x becomes even) Because the ++ operator would be implemented as three instructions and not atomic on powerpc. A similar change was made for x86 in commit 6c260d58634 ("x86: vdso: Remove bogus locking in update_vsyscall_tz") Signed-off-by: Shan Hai <shan.hai@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17powerpc: Fix CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n buildAnton Blanchard
commit 11ee7e99f35ecb15f59b21da6a82d96d2cd3fcc8 upstream. If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n, the kernel fails when we run at a non zero offset. It turns out we were incorrectly wrapping some of the relocatable kernel code with CONFIG_CRASH_DUMP. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17s390/kvm: Fix address space mixupChristian Borntraeger
commit ce6a04ac1b759beafc88dbc443ae5da867579eeb upstream. I was chasing down a bug of random validity intercepts on s390. (guest prefix page not mapped in the host virtual aspace). Turns out that the problem was a wrong address space control element. The cause was quite complex: During paging activity a DAT protection during SIE caused a program interrupt. Normally, the sie retry loop tries to catch all interrupts during and shortly before sie to rerun the setup. The problem is now that protection causes a suppressing program interrupt, causing the PSW to point to the instruction AFTER SIE in case of DAT protection. This confused the logic of the retry loop to not trigger, instead we jumped directly back to SIE after return from the program interrupt. (the protection fault handler itself did a rewind of the psw). This usually works quite well, but: If now the protection fault handler has to wait, another program might be scheduled in. Later on the sie process will be schedules in again. In that case the content of CR1 (primary address space) will be wrong because switch_to will put the user space ASCE into CR1 and not the guest ASCE. In addition the program parameter is also wrong for every protection fault of a guest, since we dont issue the SPP instruction. So lets also check for PSW == instruction after SIE in the program check handler. Instead of expensively checking all program interruption codes that might be suppressing we assume that a program interrupt pointing after SIE was always a program interrupt in SIE. (Otherwise we have a kernel bug anyway). We also have to compensate the rewinding, since the C-level handlers will do that. Therefore we need to add a nop with the same length as SIE before the sie_loop. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17s390/kvm: dont announce RRBM supportChristian Borntraeger
commit 87cac8f879a5ecd7109dbe688087e8810b3364eb upstream. Newer kernels (linux-next with the transparent huge page patches) use rrbm if the feature is announced via feature bit 66. RRBM will cause intercepts, so KVM does not handle it right now, causing an illegal instruction in the guest. The easy solution is to disable the feature bit for the guest. This fixes bugs like: Kernel BUG at 0000000000124c2a [verbose debug info unavailable] illegal operation: 0001 [#1] SMP Modules linked in: virtio_balloon virtio_net ipv6 autofs4 CPU: 0 Not tainted 3.5.4 #1 Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670) Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 00000000003cc000 0000000000000004 0000000000000000 0000000079800000 0000000000040000 0000000000000000 000000007bed3918 000000007cf40000 0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c 000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0 Krnl Code:>0000000000124c2a: b9810025 ogr %r2,%r5 0000000000124c2e: 41343000 la %r3,0(%r4,%r3) 0000000000124c32: a716fffa brct %r1,124c26 0000000000124c36: b9010022 lngr %r2,%r2 0000000000124c3a: e3d0f0800004 lg %r13,128(%r15) 0000000000124c40: eb22003f000c srlg %r2,%r2,63 [ 2150.713198] Call Trace: [ 2150.713223] ([<00000000002312c4>] page_referenced_one+0x6c/0x27c) [ 2150.713749] [<0000000000233812>] page_referenced+0x32a/0x410 [...] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Alex Graf <agraf@suse.de> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11CRIS: fix I/O macrosCorey Minyard
commit c24bf9b4cc6a0f330ea355d73bfdf1dae7e63a05 upstream. The inb/outb macros for CRIS are broken from a number of points of view, missing () around parameters and they have an unprotected if statement in them. This was breaking the compile of IPMI on CRIS and thus I was being annoyed by build regressions, so I fixed them. Plus I don't think they would have worked at all, since the data values were missing "&" and the outsl had a "3" instead of a "4" for the size. From what I can tell, this stuff is not used at all, so this can't be any more broken than it was before, anyway. Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Mikael Starvik <starvik@axis.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)Myron Stowe
commit 1278998f8ff6d66044ed00b581bbf14aacaba215 upstream. Commit 284f5f9 was intended to disable the "only_one_child()" optimization on Stratus ftServer systems, but its DMI check is wrong. It looks for DMI_SYS_VENDOR that contains "ftServer", when it should look for DMI_SYS_VENDOR containing "Stratus" and DMI_PRODUCT_NAME containing "ftServer". Tested on Stratus ftServer 6400. Reported-by: Fadeeva Marina <astarta@rat.ru> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=51331 Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11ARM: 7606/1: cache: flush to LoUU instead of LoUIS on uniprocessor CPUsWill Deacon
commit d056a699dd3d9366dd3b4d9996e7848209199cda upstream. flush_cache_louis flushes the D-side caches to the point of unification inner-shareable. On uniprocessor CPUs, this is defined as zero and therefore no flushing will take place. Rather than invent a new interface for UP systems, instead use our SMP_ON_UP patching code to read the LoUU from the CLIDR instead. Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com> Tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11ARM: 7607/1: realview: fix private peripheral memory base for EB rev. B boardsWill Deacon
commit e6ee4b2b57a8e0d8e551031173de080b338d3969 upstream. Commit 34ae6c96a6a7 ("ARM: 7298/1: realview: fix mapping of MPCore private memory region") accidentally broke the definition for the base address of the private peripheral region on revision B Realview-EB boards. This patch uses the correct address for REALVIEW_EB11MP_PRIV_MEM_BASE. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>