summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2012-05-11percpu, x86: don't use PMD_SIZE as embedded atom_size on 32bitTejun Heo
commit d5e28005a1d2e67833852f4c9ea8ec206ea3ff85 upstream. With the embed percpu first chunk allocator, x86 uses either PAGE_SIZE or PMD_SIZE for atom_size. PMD_SIZE is used when CPU supports PSE so that percpu areas are aligned to PMD mappings and possibly allow using PMD mappings in vmalloc areas in the future. Using larger atom_size doesn't waste actual memory; however, it does require larger vmalloc space allocation later on for !first chunks. With reasonably sized vmalloc area, PMD_SIZE shouldn't be a problem but x86_32 at this point is anything but reasonable in terms of address space and using larger atom_size reportedly leads to frequent percpu allocation failures on certain setups. As there is no reason to not use PMD_SIZE on x86_64 as vmalloc space is aplenty and most x86_64 configurations support PSE, fix the issue by always using PMD_SIZE on x86_64 and PAGE_SIZE on x86_32. v2: drop cpu_has_pse test and make x86_64 always use PMD_SIZE and x86_32 PAGE_SIZE as suggested by hpa. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Yanmin Zhang <yanmin.zhang@intel.com> Reported-by: ShuoX Liu <shuox.liu@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <4F97BA98.6010001@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11xen/pci: don't use PCI BIOS service for configuration space accessesDavid Vrabel
commit 76a8df7b49168509df02461f83fab117a4a86e08 upstream. The accessing PCI configuration space with the PCI BIOS32 service does not work in PV guests. On systems without MMCONFIG or where the BIOS hasn't marked the MMCONFIG region as reserved in the e820 map, the BIOS service is probed (even though direct access is preferred) and this hangs. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> [v1: Fixed compile error when CONFIG_PCI is not set] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEsKonrad Rzeszutek Wilk
commit b7e5ffe5d83fa40d702976d77452004abbe35791 upstream. If I try to do "cat /sys/kernel/debug/kernel_page_tables" I end up with: BUG: unable to handle kernel paging request at ffffc7fffffff000 IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480 PGD 0 Oops: 0000 [#1] SMP CPU 0 .. snip.. RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000 RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000 which is due to the fact we are trying to access a PFN that is not accessible to us. The reason (at least in this case) was that PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the hypervisor) to point to a read-only linear map of the MFN->PFN array. During our parsing we would get the MFN (a valid one), try to look it up in the MFN->PFN tree and find it invalid and return ~0 as PFN. Then pte_mfn_to_pfn would happilly feed that in, attach the flags and return it back to the caller. 'ptdump_show' bitshifts it and gets and invalid value that it tries to dereference. Instead of doing all of that, we detect the ~0 case and just return !_PAGE_PRESENT. This bug has been in existence .. at least until 2.6.37 (yikes!) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: 7414/1: SMP: prevent use of the console when using idmap_pgdColin Cross
commit fde165b2a29673aabf18ceff14dea1f1cfb0daad upstream. Commit 4e8ee7de227e3ab9a72040b448ad728c5428a042 (ARM: SMP: use idmap_pgd for mapping MMU enable during secondary booting) switched secondary boot to use idmap_pgd, which is initialized during early_initcall, instead of a page table initialized during __cpu_up. This causes idmap_pgd to contain the static mappings but be missing all dynamic mappings. If a console is registered that creates a dynamic mapping, the printk in secondary_start_kernel will trigger a data abort on the missing mapping before the exception handlers have been initialized, leading to a hang. Initial boot is not affected because no consoles have been registered, and resume is usually not affected because the offending console is suspended. Onlining a cpu with hotplug triggers the problem. A workaround is to the printk in secondary_start_kernel until after the page tables have been switched back to init_mm. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: 7410/1: Add extra clobber registers for assembly in kernel_execveTim Bird
commit e787ec1376e862fcea1bfd523feb7c5fb43ecdb9 upstream. The inline assembly in kernel_execve() uses r8 and r9. Since this code sequence does not return, it usually doesn't matter if the register clobber list is accurate. However, I saw a case where a particular version of gcc used r8 as an intermediate for the value eventually passed to r9. Because r8 is used in the inline assembly, and not mentioned in the clobber list, r9 was set to an incorrect value. This resulted in a kernel panic on execution of the first user-space program in the system. r9 is used in ret_to_user as the thread_info pointer, and if it's wrong, bad things happen. Signed-off-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11x86, relocs: Remove an unused variableKusanagi Kouichi
commit 7c77cda0fe742ed07622827ce80963bbeebd1e3f upstream. sh_symtab is set but not used. [ hpa: putting this in urgent because of the sheer harmlessness of the patch: it quiets a build warning but does not change any generated code. ] Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Link: http://lkml.kernel.org/r/20120401082932.D5E066FC03D@msa105.auone-net.jp Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: 7406/1: hotplug: copy the affinity mask when forcefully migrating IRQsWill Deacon
commit 5e7371ded05adfcfcee44a8bc070bfc37979b8f2 upstream. When a CPU is hotplugged off, we migrate any IRQs currently affine to it away and onto another online CPU by calling the irq_set_affinity function of the relevant interrupt controller chip. This function returns either IRQ_SET_MASK_OK or IRQ_SET_MASK_OK_NOCOPY, to indicate whether irq_data.affinity was updated. If we are forcefully migrating an interrupt (because the affinity mask no longer identifies any online CPUs) then we should update the IRQ affinity mask to reflect the new CPU set. Failure to do so can potentially leave /proc/irq/n/smp_affinity identifying only offline CPUs, which may confuse userspace IRQ balancing daemons. This patch updates migrate_one_irq to copy the affinity mask when the interrupt chip returns IRQ_SET_MASK_OK after forcefully changing the affinity of an interrupt. Reported-by: Leif Lindholm <leif.lindholm@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: 7403/1: tls: remove covert channel via TPIDRURWWill Deacon
commit 6a1c53124aa161eb624ce7b1e40ade728186d34c upstream. TPIDRURW is a user read/write register forming part of the group of thread registers in more recent versions of the ARM architecture (~v6+). Currently, the kernel does not touch this register, which allows tasks to communicate covertly by reading and writing to the register without context-switching affecting its contents. This patch clears TPIDRURW when TPIDRURO is updated via the set_tls macro, which is called directly from __switch_to. Since the current behaviour makes the register useless to userspace as far as thread pointers are concerned, simply clearing the register (rather than saving and restoring it) will not cause any problems to userspace. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: 7398/1: l2x0: only write to debug registers on PL310Will Deacon
commit ab4d536890853ab6675ede65db40e2c0980cb0ea upstream. PL310 errata #588369 and #727915 require writes to the debug registers of the cache controller to work around known problems. Writing these registers on L220 may cause deadlock, so ensure that we only perform this operation when we identify a PL310 at probe time. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: 7397/1: l2x0: only apply workaround for erratum #753970 on PL310Will Deacon
commit f154fe9b806574437b47f08e924ad10c0e240b23 upstream. The workaround for PL310 erratum #753970 can lead to deadlock on systems with an L220 cache controller. This patch makes the workaround effective only when the cache controller is identified as a PL310 at probe time. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: 7396/1: errata: only handle ARM erratum #326103 on affected coresWill Deacon
commit f0c4b8d653f5ee091fb8d4d02ed7eaad397491bb upstream. Erratum #326103 ("FSR write bit incorrect on a SWP to read-only memory") only affects the ARM 1136 core prior to r1p0. The workaround disassembles the faulting instruction to determine whether it was a read or write access on all v6 cores. An issue has been reported on the ARM 11MPCore whereby loading the faulting instruction may happen in parallel with that page being unmapped, resulting in a deadlock due to the lack of TLB broadcasting in hardware: http://lists.infradead.org/pipermail/linux-arm-kernel/2012-March/091561.html This patch limits the workaround so that it is only used on affected cores, which are known to be UP only. Other v6 cores can rely on the FSR to indicate the access type correctly. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11xen/smp: Fix crash when booting with ACPI hotplug CPUs.Konrad Rzeszutek Wilk
commit cf405ae612b0f7e2358db7ff594c0e94846137aa upstream. When we boot on a machine that can hotplug CPUs and we are using 'dom0_max_vcpus=X' on the Xen hypervisor line to clip the amount of CPUs available to the initial domain, we get this: (XEN) Command line: com1=115200,8n1 dom0_mem=8G noreboot dom0_max_vcpus=8 sync_console mce_verbosity=verbose console=com1,vga loglvl=all guest_loglvl=all .. snip.. DMI: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x032.072520111118 07/25/2011 .. snip. SMP: Allowing 64 CPUs, 32 hotplug CPUs installing Xen timer for CPU 7 cpu 7 spinlock event irq 361 NMI watchdog: disabled (cpu7): hardware events not enabled Brought up 8 CPUs .. snip.. [acpi processor finds the CPUs are not initialized and starts calling arch_register_cpu, which creates /sys/devices/system/cpu/cpu8/online] CPU 8 got hotplugged CPU 9 got hotplugged CPU 10 got hotplugged .. snip.. initcall 1_acpi_battery_init_async+0x0/0x1b returned 0 after 406 usecs calling erst_init+0x0/0x2bb @ 1 [and the scheduler sticks newly started tasks on the new CPUs, but said CPUs cannot be initialized b/c the hypervisor has limited the amount of vCPUS to 8 - as per the dom0_max_vcpus=8 flag. The spinlock tries to kick the other CPU, but the structure for that is not initialized and we crash.] BUG: unable to handle kernel paging request at fffffffffffffed8 IP: [<ffffffff81035289>] xen_spin_lock+0x29/0x60 PGD 180d067 PUD 180e067 PMD 0 Oops: 0002 [#1] SMP CPU 7 Modules linked in: Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc2upstream-00001-gf5154e8 #1 Intel Corporation S2600CP/S2600CP RIP: e030:[<ffffffff81035289>] [<ffffffff81035289>] xen_spin_lock+0x29/0x60 RSP: e02b:ffff8801fb9b3a70 EFLAGS: 00010282 With this patch, we cap the amount of vCPUS that the initial domain can run, to exactly what dom0_max_vcpus=X has specified. In the future, if there is a hypercall that will allow a running domain to expand past its initial set of vCPUS, this patch should be re-evaluated. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11xen: correctly check for pending events when restoring irq flagsDavid Vrabel
commit 7eb7ce4d2e8991aff4ecb71a81949a907ca755ac upstream. In xen_restore_fl_direct(), xen_force_evtchn_callback() was being called even if no events were pending. This resulted in (depending on workload) about a 100 times as many xen_version hypercalls as necessary. Fix this by correcting the sense of the conditional jump. This seems to give a significant performance benefit for some workloads. There is some subtle tricksy "..since the check here is trying to check both pending and masked in a single cmpw, but I think this is correct. It will call check_events now only when the combined mask+pending word is 0x0001 (aka unmasked, pending)." (Ian) Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11x86, apic: APIC code touches invalid MSR on P5 class machinesBryan O'Donoghue
commit cbf2829b61c136edcba302a5e1b6b40e97d32c00 upstream. Current APIC code assumes MSR_IA32_APICBASE is present for all systems. Pentium Classic P5 and friends didn't have this MSR. MSR_IA32_APICBASE was introduced as an architectural MSR by Intel @ P6. Code paths that can touch this MSR invalidly are when vendor == Intel && cpu-family == 5 and APIC bit is set in CPUID - or when you simply pass lapic on the kernel command line, on a P5. The below patch stops Linux incorrectly interfering with the MSR_IA32_APICBASE for P5 class machines. Other code paths exist that touch the MSR - however those paths are not currently reachable for a conformant P5. Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linux.intel.com> Link: http://lkml.kernel.org/r/4F8EEDD3.1080404@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11x86, microcode: Fix sysfs warning during module unload on unsupported CPUsAndreas Herrmann
commit a956bd6f8583326b18348ab1452b4686778f785d upstream. Loading the microcode driver on an unsupported CPU and subsequently unloading the driver causes WARNING: at fs/sysfs/group.c:138 mc_device_remove+0x5f/0x70 [microcode]() Hardware name: 01972NG sysfs group ffffffffa00013d0 not found for kobject 'cpu0' Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel btusb snd_hda_codec bluetooth thinkpad_acpi rfkill microcode(-) [last unloaded: cfg80211] Pid: 4560, comm: modprobe Not tainted 3.4.0-rc2-00002-g258f742 #5 Call Trace: [<ffffffff8103113b>] ? warn_slowpath_common+0x7b/0xc0 [<ffffffff81031235>] ? warn_slowpath_fmt+0x45/0x50 [<ffffffff81120e74>] ? sysfs_remove_group+0x34/0x120 [<ffffffffa00000ef>] ? mc_device_remove+0x5f/0x70 [microcode] [<ffffffff81331eb9>] ? subsys_interface_unregister+0x69/0xa0 [<ffffffff81563526>] ? mutex_lock+0x16/0x40 [<ffffffffa0000c3e>] ? microcode_exit+0x50/0x92 [microcode] [<ffffffff8107051d>] ? sys_delete_module+0x16d/0x260 [<ffffffff810a0065>] ? wait_iff_congested+0x45/0x110 [<ffffffff815656af>] ? page_fault+0x1f/0x30 [<ffffffff81565ba2>] ? system_call_fastpath+0x16/0x1b on recent kernels. This is due to commit 8a25a2fd126c ("cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem") which renders commit 6c53cbfced04 ("x86, microcode: Correct sysdev_add error path") useless. See http://marc.info/?l=linux-kernel&m=133416246406478 Avoid above warning by restoring the old driver behaviour before 6c53cbfced04 ("x86, microcode: Correct sysdev_add error path"). Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20120411163849.GE4794@alberich.amd.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> [bwh: Backported to 3.2: deleted line uses sys_dev, not dev] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11ARM: OMAP1: DMTIMER: fix broken timer clock source selectionPaul Walmsley
commit 6aaec67da1e41a0752a2b903b989e73b9f02e182 upstream. DMTIMER source selection on OMAP1 is broken. omap1_dm_timer_set_src() tries to use __raw_{read,write}l() to read from and write to physical addresses, but those functions take virtual addresses. sparse caught this: arch/arm/mach-omap1/timer.c:50:13: warning: incorrect type in argument 1 (different base types) arch/arm/mach-omap1/timer.c:50:13: expected void const volatile [noderef] <asn:2>*<noident> arch/arm/mach-omap1/timer.c:50:13: got unsigned int arch/arm/mach-omap1/timer.c:52:9: warning: incorrect type in argument 1 (different base types) arch/arm/mach-omap1/timer.c:52:9: expected void const volatile [noderef] <asn:2>*<noident> arch/arm/mach-omap1/timer.c:52:9: got unsigned int Fix by using omap_{read,writel}(), just like the other users of the MOD_CONF_CTRL_1 register in the OMAP1 codebase. Of course, in the long term, removing omap_{read,write}l() is the appropriate thing to do; but this will take some work to do this cleanly. Looks like this was caused by 97933d6 (ARM: OMAP1: dmtimer: conversion to platform devices) that dangerously moved code and changed it in the same patch. Signed-off-by: Paul Walmsley <paul@pwsan.com> Cc: Tarun Kanti DebBarma <tarun.kanti@ti.com> [tony@atomide.com: updated comments to include the breaking commit] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-04-22fix tlb flushing for page table pagesMartin Schwidefsky
commit cd94154cc6a28dd9dc271042c1a59c08d26da886 upstream. Git commit 36409f6353fc2d7b6516e631415f938eadd92ffa "use generic RCU page-table freeing code" introduced a tlb flushing bug. Partially revert the above git commit and go back to s390 specific page table flush code. For s390 the TLB can contain three types of entries, "normal" TLB page-table entries, TLB combined region-and-segment-table (CRST) entries and real-space entries. Linux does not use real-space entries which leaves normal TLB entries and CRST entries. The CRST entries are intermediate steps in the page-table translation called translation paths. For example a 4K page access in a three-level page table setup will create two CRST TLB entries and one page-table TLB entry. The advantage of that approach is that a page access next to the previous one can reuse the CRST entries and needs just a single read from memory to create the page-table TLB entry. The disadvantage is that the TLB flushing rules are more complicated, before any page-table may be freed the TLB needs to be flushed. In short: the generic RCU page-table freeing code is incorrect for the CRST entries, in particular the check for mm_users < 2 is troublesome. This is applicable to 3.0+ kernels. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-22sparc64: Fix bootup crash on sun4v.David S. Miller
commit 9e0daff30fd7ecf698e5d20b0fa7f851e427cca5 upstream. The DS driver registers as a subsys_initcall() but this can be too early, in particular this risks registering before we've had a chance to allocate and setup module_kset in kernel/params.c which is performed also as a subsyts_initcall(). Register DS using device_initcall() insteal. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-22sparc64: Eliminate obsolete __handle_softirq() functionPaul E. McKenney
commit 3d3eeb2ef26112a200785e5fca58ec58dd33bf1e upstream. The invocation of softirq is now handled by irq_exit(), so there is no need for sparc64 to invoke it on the trap-return path. In fact, doing so is a bug because if the trap occurred in the idle loop, this invocation can result in lockdep-RCU failures. The problem is that RCU ignores idle CPUs, and the sparc64 trap-return path to the softirq handlers fails to tell RCU that the CPU must be considered non-idle while those handlers are executing. This means that RCU is ignoring any RCU read-side critical sections in those handlers, which in turn means that RCU-protected data can be yanked out from under those read-side critical sections. The shiny new lockdep-RCU ability to detect RCU read-side critical sections that RCU is ignoring located this problem. The fix is straightforward: Make sparc64 stop manually invoking the softirq handlers. Reported-by: Meelis Roos <mroos@linux.ee> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-22ia64: fix futex_atomic_cmpxchg_inatomic()Luck, Tony
commit c76f39bddb84f93f70a5520d9253ec0317bec216 upstream. Michel Lespinasse cleaned up the futex calling conventions in commit 37a9d912b24f ("futex: Sanitize cmpxchg_futex_value_locked API"). But the ia64 implementation was subtly broken. Gcc does not know that register "r8" will be updated by the fault handler if the cmpxchg instruction takes an exception. So it feels safe in letting the initialization of r8 slide to after the cmpxchg. Result: we always return 0 whether the user address faulted or not. Fix by moving the initialization of r8 into the __asm__ code so gcc won't move it. Reported-by: <emeric.maschino@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42757 Tested-by: <emeric.maschino@gmail.com> Acked-by: Michel Lespinasse <walken@google.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-22ARM: 7384/1: ThumbEE: Disable userspace TEEHBR access for !CONFIG_ARM_THUMBEEJonathan Austin
commit 078c04545ba56da21567728a909a496df5ff730d upstream. Currently when ThumbEE is not enabled (!CONFIG_ARM_THUMBEE) the ThumbEE register states are not saved/restored at context switch. The default state of the ThumbEE Ctrl register (TEECR) allows userspace accesses to the ThumbEE Base Handler register (TEEHBR). This can cause unexpected behaviour when people use ThumbEE on !CONFIG_ARM_THUMBEE kernels, as well as allowing covert communication - eg between userspace tasks running inside chroot jails. This patch sets up TEECR in order to prevent user-space access to TEEHBR when !CONFIG_ARM_THUMBEE. In this case, tasks are sent SIGILL if they try to access TEEHBR. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jonathan Austin <jonathan.austin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-22ARM: 7379/1: DT: fix atags_to_fdt() second call siteMarc Zyngier
commit 9c5fd9e85f574d9d0361b2b878f55732290afe5b upstream. atags_to_fdt() returns 1 when it fails to find a valid FDT signature. The CONFIG_ARM_ATAG_DTB_COMPAT code is supposed to retry with another location, but only does so when the initial call doesn't fail. Fix this by using the correct condition in the assembly code. Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13sched/x86: Fix overflow in cyc2ns_offsetSalman Qazi
commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 upstream. When a machine boots up, the TSC generally gets reset. However, when kexec is used to boot into a kernel, the TSC value would be carried over from the previous kernel. The computation of cycns_offset in set_cyc2ns_scale is prone to an overflow, if the machine has been up more than 208 days prior to the kexec. The overflow happens when we multiply *scale, even though there is enough room to store the final answer. We fix this issue by decomposing tsc_now into the quotient and remainder of division by CYC2NS_SCALE_FACTOR and then performing the multiplication separately on the two components. Refactor code to share the calculation with the previous fix in __cycles_2_ns(). Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13Revert "x86/ioapic: Add register level checks to detect bogus io-apic entries"Greg Kroah-Hartman
This reverts commit 273fb194e86b795b08a724c7646d0f694949070b [73d63d038ee9f769f5e5b46792d227fe20e442c5 upstream] It causes problems, so needs to be reverted from 3.2-stable for now. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jon Dufresne <jon@jondufresne.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <yinghai@kernel.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Teck Choon Giam <giamteckchoon@gmail.com> Cc: Ben Guthro <ben@guthro.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13x86/PCI: do not tie MSI MS-7253 use_crs quirk to BIOS versionJonathan Nieder
commit a97f4f5e524bcd09a85ef0b8821a14d35e69335f upstream. Carlos was getting WARNING: at drivers/pci/pci.c:118 pci_ioremap_bar+0x24/0x52() when probing his sound card, and sound did not work. After adding pci=use_crs to the kernel command line, no more trouble. Ok, we can add a quirk. dmidecode output reveals that this is an MSI MS-7253, for which we already have a quirk, but the short-sighted author tied the quirk to a single BIOS version, making it not kick in on Carlos's machine with BIOS V1.2. If a later BIOS update makes it no longer necessary to look at the _CRS info it will still be harmless, so let's stop trying to guess which versions have and don't have accurate _CRS tables. Addresses https://bugtrack.alsa-project.org/alsa-bug/view.php?id=5533 Also see <https://bugzilla.kernel.org/show_bug.cgi?id=42619>. Reported-by: Carlos Luna <caralu74@gmail.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13x86/PCI: use host bridge _CRS info on MSI MS-7253Jonathan Nieder
commit 8411371709610c826bf65684f886bfdfb5780ca1 upstream. In the spirit of commit 29cf7a30f8a0 ("x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE"), this DMI quirk turns on "pci_use_crs" by default on a board that needs it. This fixes boot failures and oopses introduced in 3e3da00c01d0 ("x86/pci: AMD one chain system to use pci read out res"). The quirk is quite targetted (to a specific board and BIOS version) for two reasons: (1) to emphasize that this method of tackling the problem one quirk at a time is a little insane (2) to give BIOS vendors an opportunity to use simpler tables and allow us to return to generic behavior (whatever that happens to be) with a later BIOS update In other words, I am not at all happy with having quirks like this. But it is even worse for the kernel not to work out of the box on these machines, so... Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42619 Reported-by: Svante Signell <svante.signell@telia.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13ARM: tegra: remove Tegra30 errata from MACH_TEGRA_DTStephen Warren
[no upstream commit match, as this is a fix for a mis-applied patch in the previous 3.2-stable release. - gregkh] Commit 83e4194 "ARM: tegra: select required CPU and L2 errata options" contained two chunks; one was errata for Tegra20 (correctly applied) and the second errata for Tegra30. The latter was accidentally applied to the wrong config option; Tegra30 support wasn't added until v3.3, and so the second chunk should have just been dropped. This patch does so. Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()Jason Wessel
commit 3751d3e85cf693e10e2c47c03c8caa65e171099b upstream. There has long been a limitation using software breakpoints with a kernel compiled with CONFIG_DEBUG_RODATA going back to 2.6.26. For this particular patch, it will apply cleanly and has been tested all the way back to 2.6.36. The kprobes code uses the text_poke() function which accommodates writing a breakpoint into a read-only page. The x86 kgdb code can solve the problem similarly by overriding the default breakpoint set/remove routines and using text_poke() directly. The x86 kgdb code will first attempt to use the traditional probe_kernel_write(), and next try using a the text_poke() function. The break point install method is tracked such that the correct break point removal routine will get called later on. Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Inspried-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13m68k/mac: Add missing platform check before registering platform devicesGeert Uytterhoeven
commit 6cfeba53911d6d2f17ebbd1246893557d5ff5aeb upstream. On multi-platform kernels, the Mac platform devices should be registered when running on Mac only. Else it may crash later. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of ANDzhuangfeiran@ict.ac.cn
[ Upstream commit 1d24fb3684f347226747c6b11ea426b7b992694e ] When K >= 0xFFFF0000, AND needs the two least significant bytes of K as its operand, but EMIT2() gives it the least significant byte of K and 0x2. EMIT() should be used here to replace EMIT2(). Signed-off-by: Feiran Zhuang <zhuangfeiran@ict.ac.cn> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02x86, tls: Off by one limit checkDan Carpenter
commit 8f0750f19789cf352d7e24a6cc50f2ab1b4f1372 upstream. These are used as offsets into an array of GDT_ENTRY_TLS_ENTRIES members so GDT_ENTRY_TLS_ENTRIES is one past the end of the array. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20120324075250.GA28258@elgon.mountain Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02x86, tsc: Skip refined tsc calibration on systems with reliable TSCAlok Kataria
commit 57779dc2b3b75bee05ef5d1ada47f615f7a13932 upstream. While running the latest Linux as guest under VMware in highly over-committed situations, we have seen cases when the refined TSC algorithm fails to get a valid tsc_start value in tsc_refine_calibration_work from multiple attempts. As a result the kernel keeps on scheduling the tsc_irqwork task for later. Subsequently after several attempts when it gets a valid start value it goes through the refined calibration and either bails out or uses the new results. Given that the kernel originally read the TSC frequency from the platform, which is the best it can get, I don't think there is much value in refining it. So for systems which get the TSC frequency from the platform we should skip the refined tsc algorithm. We can use the TSC_RELIABLE cpu cap flag to detect this, right now it is set only on VMware and for Moorestown Penwell both of which have there own TSC calibration methods. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Dirk Brandewie <dirk.brandewie@gmail.com> Cc: Alan Cox <alan@linux.intel.com> [jstultz: Reworked to simply not schedule the refining work, rather then scheduling the work and bombing out later] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02net: bpf_jit: fix BPF_S_LDX_B_MSH compilationEric Dumazet
[ Upstream commit dc72d99dabb870ca5bd6d9fff674be853bb4a88d ] Matt Evans spotted that x86 bpf_jit was incorrectly handling negative constant offsets in BPF_S_LDX_B_MSH instruction. We need to abort JIT compilation like we do in common_load so that filter uses the interpreter code and can call __load_pointer() Reference: http://lists.openwall.net/netdev/2011/07/19/11 Thanks to Indan Zupancic to bring back this issue. Reported-by: Matt Evans <matt@ozlabs.org> Reported-by: Indan Zupancic <indan@nul.nu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02ARM: tegra: select required CPU and L2 errata optionsStephen Warren
commit f35b431dde39fb40944d1024f08d88fbf04a3193 upstream. The ARM IP revisions in Tegra are: Tegra20: CPU r1p1, PL310 r2p0 Tegra30: CPU A01=r2p7/>=A02=r2p9, NEON r2p3-50, PL310 r3p1-50 Based on work by Olof Johansson, although the actual list of errata is somewhat different here, since I added a bunch more and removed one PL310 erratum that doesn't seem applicable. Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02x86-32: Fix endless loop when processing signals for kernel tasksDmitry Adamushko
commit 29a2e2836ff9ea65a603c89df217f4198973a74f upstream. The problem occurs on !CONFIG_VM86 kernels [1] when a kernel-mode task returns from a system call with a pending signal. A real-life scenario is a child of 'khelper' returning from a failed kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ]. kernel_execve() fails due to a pending SIGKILL, which is the result of "kill -9 -1" (at least, busybox's init does it upon reboot). The loop is as follows: * syscall_exit_work: - work_pending: // start_of_the_loop - work_notify_sig: - do_notify_resume() - do_signal() - if (!user_mode(regs)) return; - resume_userspace // TIF_SIGPENDING is still set - work_pending // so we call work_pending => goto // start_of_the_loop More information can be found in another LKML thread: http://www.serverphorums.com/read.php?12,457826 [1] the problem was also seen on MIPS. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Link: http://lkml.kernel.org/r/1332448765.2299.68.camel@dimm Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02KVM: x86: fix missing checks in syscall emulationStephan Bärwolf
commit c2226fc9e87ba3da060e47333657cd6616652b84 upstream. On hosts without this patch, 32bit guests will crash (and 64bit guests may behave in a wrong way) for example by simply executing following nasm-demo-application: [bits 32] global _start SECTION .text _start: syscall (I tested it with winxp and linux - both always crashed) Disassembly of section .text: 00000000 <_start>: 0: 0f 05 syscall The reason seems a missing "invalid opcode"-trap (int6) for the syscall opcode "0f05", which is not available on Intel CPUs within non-longmodes, as also on some AMD CPUs within legacy-mode. (depending on CPU vendor, MSR_EFER and cpuid) Because previous mentioned OSs may not engage corresponding syscall target-registers (STAR, LSTAR, CSTAR), they remain NULL and (non trapping) syscalls are leading to multiple faults and finally crashs. Depending on the architecture (AMD or Intel) pretended by guests, various checks according to vendor's documentation are implemented to overcome the current issue and behave like the CPUs physical counterparts. [mtosatti: cleanup/beautify code] Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid"Stephan Bärwolf
commit bdb42f5afebe208eae90406959383856ae2caf2b upstream. In order to be able to proceed checks on CPU-specific properties within the emulator, function "get_cpuid" is introduced. With "get_cpuid" it is possible to virtually call the guests "cpuid"-opcode without changing the VM's context. [mtosatti: cleanup/beautify code] Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read modeAndrea Arcangeli
commit 1a5a9906d4e8d1976b701f889d8f35d54b928f25 upstream. In some cases it may happen that pmd_none_or_clear_bad() is called with the mmap_sem hold in read mode. In those cases the huge page faults can allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a false positive from pmd_bad() that will not like to see a pmd materializing as trans huge. It's not khugepaged causing the problem, khugepaged holds the mmap_sem in write mode (and all those sites must hold the mmap_sem in read mode to prevent pagetables to go away from under them, during code review it seems vm86 mode on 32bit kernels requires that too unless it's restricted to 1 thread per process or UP builds). The race is only with the huge pagefaults that can convert a pmd_none() into a pmd_trans_huge(). Effectively all these pmd_none_or_clear_bad() sites running with mmap_sem in read mode are somewhat speculative with the page faults, and the result is always undefined when they run simultaneously. This is probably why it wasn't common to run into this. For example if the madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page fault, the hugepage will not be zapped, if the page fault runs first it will be zapped. Altering pmd_bad() not to error out if it finds hugepmds won't be enough to fix this, because zap_pmd_range would then proceed to call zap_pte_range (which would be incorrect if the pmd become a pmd_trans_huge()). The simplest way to fix this is to read the pmd in the local stack (regardless of what we read, no need of actual CPU barriers, only compiler barrier needed), and be sure it is not changing under the code that computes its value. Even if the real pmd is changing under the value we hold on the stack, we don't care. If we actually end up in zap_pte_range it means the pmd was not none already and it was not huge, and it can't become huge from under us (khugepaged locking explained above). All we need is to enforce that there is no way anymore that in a code path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad can run into a hugepmd. The overhead of a barrier() is just a compiler tweak and should not be measurable (I only added it for THP builds). I don't exclude different compiler versions may have prevented the race too by caching the value of *pmd on the stack (that hasn't been verified, but it wouldn't be impossible considering pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines and there's no external function called in between pmd_trans_huge and pmd_none_or_clear_bad). if (pmd_trans_huge(*pmd)) { if (next-addr != HPAGE_PMD_SIZE) { VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem)); split_huge_page_pmd(vma->vm_mm, pmd); } else if (zap_huge_pmd(tlb, vma, pmd, addr)) continue; /* fall through */ } if (pmd_none_or_clear_bad(pmd)) Because this race condition could be exercised without special privileges this was reported in CVE-2012-1179. The race was identified and fully explained by Ulrich who debugged it. I'm quoting his accurate explanation below, for reference. ====== start quote ======= mapcount 0 page_mapcount 1 kernel BUG at mm/huge_memory.c:1384! At some point prior to the panic, a "bad pmd ..." message similar to the following is logged on the console: mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7). The "bad pmd ..." message is logged by pmd_clear_bad() before it clears the page's PMD table entry. 143 void pmd_clear_bad(pmd_t *pmd) 144 { -> 145 pmd_ERROR(*pmd); 146 pmd_clear(pmd); 147 } After the PMD table entry has been cleared, there is an inconsistency between the actual number of PMD table entries that are mapping the page and the page's map count (_mapcount field in struct page). When the page is subsequently reclaimed, __split_huge_page() detects this inconsistency. 1381 if (mapcount != page_mapcount(page)) 1382 printk(KERN_ERR "mapcount %d page_mapcount %d\n", 1383 mapcount, page_mapcount(page)); -> 1384 BUG_ON(mapcount != page_mapcount(page)); The root cause of the problem is a race of two threads in a multithreaded process. Thread B incurs a page fault on a virtual address that has never been accessed (PMD entry is zero) while Thread A is executing an madvise() system call on a virtual address within the same 2 MB (huge page) range. virtual address space .---------------------. | | | | .-|---------------------| | | | | | |<-- B(fault) | | | 2 MB | |/////////////////////|-. huge < |/////////////////////| > A(range) page | |/////////////////////|-' | | | | | | '-|---------------------| | | | | '---------------------' - Thread A is executing an madvise(..., MADV_DONTNEED) system call on the virtual address range "A(range)" shown in the picture. sys_madvise // Acquire the semaphore in shared mode. down_read(&current->mm->mmap_sem) ... madvise_vma switch (behavior) case MADV_DONTNEED: madvise_dontneed zap_page_range unmap_vmas unmap_page_range zap_pud_range zap_pmd_range // // Assume that this huge page has never been accessed. // I.e. content of the PMD entry is zero (not mapped). // if (pmd_trans_huge(*pmd)) { // We don't get here due to the above assumption. } // // Assume that Thread B incurred a page fault and .---------> // sneaks in here as shown below. | // | if (pmd_none_or_clear_bad(pmd)) | { | if (unlikely(pmd_bad(*pmd))) | pmd_clear_bad | { | pmd_ERROR | // Log "bad pmd ..." message here. | pmd_clear | // Clear the page's PMD entry. | // Thread B incremented the map count | // in page_add_new_anon_rmap(), but | // now the page is no longer mapped | // by a PMD entry (-> inconsistency). | } | } | v - Thread B is handling a page fault on virtual address "B(fault)" shown in the picture. ... do_page_fault __do_page_fault // Acquire the semaphore in shared mode. down_read_trylock(&mm->mmap_sem) ... handle_mm_fault if (pmd_none(*pmd) && transparent_hugepage_enabled(vma)) // We get here due to the above assumption (PMD entry is zero). do_huge_pmd_anonymous_page alloc_hugepage_vma // Allocate a new transparent huge page here. ... __do_huge_pmd_anonymous_page ... spin_lock(&mm->page_table_lock) ... page_add_new_anon_rmap // Here we increment the page's map count (starts at -1). atomic_set(&page->_mapcount, 0) set_pmd_at // Here we set the page's PMD entry which will be cleared // when Thread A calls pmd_clear_bad(). ... spin_unlock(&mm->page_table_lock) The mmap_sem does not prevent the race because both threads are acquiring it in shared mode (down_read). Thread B holds the page_table_lock while the page's map count and PMD table entry are updated. However, Thread A does not synchronize on that lock. ====== end quote ======= [akpm@linux-foundation.org: checkpatch fixes] Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Jones <davej@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02x86/ioapic: Add register level checks to detect bogus io-apic entriesSuresh Siddha
commit 73d63d038ee9f769f5e5b46792d227fe20e442c5 upstream. With the recent changes to clear_IO_APIC_pin() which tries to clear remoteIRR bit explicitly, some of the users started to see "Unable to reset IRR for apic .." messages. Close look shows that these are related to bogus IO-APIC entries which return's all 1's for their io-apic registers. And the above mentioned error messages are benign. But kernel should have ignored such io-apic's in the first place. Check if register 0, 1, 2 of the listed io-apic are all 1's and ignore such io-apic. Reported-by: Álvaro Castillo <midgoon@gmail.com> Tested-by: Jon Dufresne <jon@jondufresne.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: kernel-team@fedoraproject.org Cc: Josh Boyer <jwboyer@redhat.com> Link: http://lkml.kernel.org/r/1331577393.31585.94.camel@sbsiddha-desk.sc.intel.com [ Performed minor cleanup of affected code. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-23powerpc/pmac: Fix SMP kernels on pre-core99 UP machinesBenjamin Herrenschmidt
commit 78c5c68a4cf4329d17abfa469345ddf323d4fd62 upstream. The code for "powersurge" SMP would kick in and cause a crash at boot due to the lack of a NULL test. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com> Reported-by: Adam Conrad <adconrad@ubuntu.com> Tested-by: Adam Conrad <adconrad@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19sparc32: Add -Av8 to assembler command line.David S. Miller
commit e0adb9902fb338a9fe634c3c2a3e474075c733ba upstream. Newer version of binutils are more strict about specifying the correct options to enable certain classes of instructions. The sparc32 build is done for v7 in order to support sun4c systems which lack hardware integer multiply and divide instructions. So we have to pass -Av8 when building the assembler routines that use these instructions and get patched into the kernel when we find out that we have a v8 capable cpu. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19perf/x86: Fix local vs remote memory events for NHM/WSMPeter Zijlstra
commit 87e24f4b67e68d9fd8df16e0bf9c66d1ad2a2533 upstream. Verified using the below proglet.. before: [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0 remote write Performance counter stats for './numa 0': 2,101,554 node-stores 2,096,931 node-store-misses 5.021546079 seconds time elapsed [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1 local write Performance counter stats for './numa 1': 501,137 node-stores 199 node-store-misses 5.124451068 seconds time elapsed After: [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 0 remote write Performance counter stats for './numa 0': 2,107,516 node-stores 2,097,187 node-store-misses 5.012755149 seconds time elapsed [root@westmere ~]# perf stat -e node-stores -e node-store-misses ./numa 1 local write Performance counter stats for './numa 1': 2,063,355 node-stores 165 node-store-misses 5.082091494 seconds time elapsed #define _GNU_SOURCE #include <sched.h> #include <stdio.h> #include <errno.h> #include <sys/mman.h> #include <sys/types.h> #include <dirent.h> #include <signal.h> #include <unistd.h> #include <numaif.h> #include <stdlib.h> #define SIZE (32*1024*1024) volatile int done; void sig_done(int sig) { done = 1; } int main(int argc, char **argv) { cpu_set_t *mask, *mask2; size_t size; int i, err, t; int nrcpus = 1024; char *mem; unsigned long nodemask = 0x01; /* node 0 */ DIR *node; struct dirent *de; int read = 0; int local = 0; if (argc < 2) { printf("usage: %s [0-3]\n", argv[0]); printf(" bit0 - local/remote\n"); printf(" bit1 - read/write\n"); exit(0); } switch (atoi(argv[1])) { case 0: printf("remote write\n"); break; case 1: printf("local write\n"); local = 1; break; case 2: printf("remote read\n"); read = 1; break; case 3: printf("local read\n"); local = 1; read = 1; break; } mask = CPU_ALLOC(nrcpus); size = CPU_ALLOC_SIZE(nrcpus); CPU_ZERO_S(size, mask); node = opendir("/sys/devices/system/node/node0/"); if (!node) perror("opendir"); while ((de = readdir(node))) { int cpu; if (sscanf(de->d_name, "cpu%d", &cpu) == 1) CPU_SET_S(cpu, size, mask); } closedir(node); mask2 = CPU_ALLOC(nrcpus); CPU_ZERO_S(size, mask2); for (i = 0; i < size; i++) CPU_SET_S(i, size, mask2); CPU_XOR_S(size, mask2, mask2, mask); // invert if (!local) mask = mask2; err = sched_setaffinity(0, size, mask); if (err) perror("sched_setaffinity"); mem = mmap(0, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); err = mbind(mem, SIZE, MPOL_BIND, &nodemask, 8*sizeof(nodemask), MPOL_MF_MOVE); if (err) perror("mbind"); signal(SIGALRM, sig_done); alarm(5); if (!read) { while (!done) { for (i = 0; i < SIZE; i++) mem[i] = 0x01; } } else { while (!done) { for (i = 0; i < SIZE; i++) t += *(volatile char *)(mem + i); } } return 0; } Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-tq73sxus35xmqpojf7ootxgs@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19x86: Derandom delay_tsc for 64 bitThomas Gleixner
commit a7f4255f906f60f72e00aad2fb000939449ff32e upstream. Commit f0fbf0abc093 ("x86: integrate delay functions") converted delay_tsc() into a random delay generator for 64 bit. The reason is that it merged the mostly identical versions of delay_32.c and delay_64.c. Though the subtle difference of the result was: static void delay_tsc(unsigned long loops) { - unsigned bclock, now; + unsigned long bclock, now; Now the function uses rdtscl() which returns the lower 32bit of the TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64 bit this fails when the lower 32bit are close to wrap around when bclock is read, because the following check if ((now - bclock) >= loops) break; evaluated to true on 64bit for e.g. bclock = 0xffffffff and now = 0 because the unsigned long (now - bclock) of these values results in 0xffffffff00000001 which is definitely larger than the loops value. That explains Tvortkos observation: "Because I am seeing udelay(500) (_occasionally_) being short, and that by delaying for some duration between 0us (yep) and 491us." Make those variables explicitely u32 again, so this works for both 32 and 64 bit. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12ARM: OMAP: fix iommu, not mailboxOhad Ben-Cohen
commit 134d12fae0bb8f3d60dc7440a9e1950bb5427167 upstream. For some weird (freudian?) reason, commit 435792d "ARM: OMAP: make iommu subsys_initcall to fix builtin omap3isp" unintentionally changed the mailbox's initcall instead of the iommu's. Fix that. Reported-by: Fernando Guzman Lugo <fernando.lugo@ti.com> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12ARM: 7357/1: perf: fix overflow handling for xscale2 PMUsWill Deacon
commit 3f31ae121348afd9ed39700ea2a63c17cd7eeed1 upstream. xscale2 PMUs indicate overflow not via the PMU control register, but by a separate overflow FLAG register instead. This patch fixes the xscale2 PMU code to use this register to detect to overflow and ensures that we clear any pending overflow when disabling a counter. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12ARM: 7356/1: perf: check that we have an event in the PMU IRQ handlersWill Deacon
commit f6f5a30c834135c9f2fa10400c59ebbdd9188567 upstream. The PMU IRQ handlers in perf assume that if a counter has overflowed then perf must be responsible. In the paranoid world of crazy hardware, this could be false, so check that we do have a valid event before attempting to dereference NULL in the interrupt path. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12ARM: 7355/1: perf: clear overflow flag when disabling counter on ARMv7 PMUWill Deacon
commit 99c1745b9c76910e195889044f914b4898b7c9a5 upstream. When disabling a counter on an ARMv7 PMU, we should also clear the overflow flag in case an overflow occurred whilst stopping the counter. This prevents a spurious overflow being picked up later and leading to either false accounting or a NULL dereference. Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12ARM: 7354/1: perf: limit sample_period to half max_period in non-sampling modeWill Deacon
commit 5727347180ebc6b4a866fcbe00dcb39cc03acb37 upstream. On ARM, the PMU does not stop counting after an overflow and therefore IRQ latency affects the new counter value read by the kernel. This is significant for non-sampling runs where it is possible for the new value to overtake the previous one, causing the delta to be out by up to max_period events. Commit a737823d ("ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency") attempted to fix this problem by allowing interrupt handlers to pass an overflow flag to the event update function, causing the overflow calculation to assume that the counter passed through zero when going from prev to new. Unfortunately, this doesn't work when overflow occurs on the perf_task_tick path because we have the flag cleared and end up computing a large negative delta. This patch removes the overflow flag from armpmu_event_update and instead limits the sample_period to half of the max_period for non-sampling profiling runs. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12ARM: 7345/1: errata: update workaround for A9 erratum #743622Will Deacon
commit efbc74ace95338484f8d732037b99c7c77098fce upstream. Erratum #743622 affects all r2 variants of the Cortex-A9 processor, so ensure that the workaround is applied regardless of the revision. Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12OMAPDSS: HDMI: PHY burnout fixTomi Valkeinen
commit c49d005b6cc8491fad5b24f82805be2d6bcbd3dd upstream. A hardware bug in the OMAP4 HDMI PHY causes physical damage to the board if the HDMI PHY is kept powered on when the cable is not connected. This patch solves the problem by adding hot-plug-detection into the HDMI IP driver. This is not a real HPD support in the sense that nobody else than the IP driver gets to know about the HPD events, but is only meant to fix the HW bug. The strategy is simple: If the display device is turned off by the user, the PHY power is set to OFF. When the display device is turned on by the user, the PHY power is set either to LDOON or TXON, depending on whether the HDMI cable is connected. The reason to avoid PHY OFF when the display device is on, but the cable is disconnected, is that when the PHY is turned OFF, the HDMI IP is not "ticking" and thus the DISPC does not receive pixel clock from the HDMI IP. This would, for example, prevent any VSYNCs from happening, and would thus affect the users of omapdss. By using LDOON when the cable is disconnected we'll avoid the HW bug, but keep the HDMI working as usual from the user's point of view. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>