path: root/arch/x86/kernel
Age | Commit message | Author
2011-01-07 | x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree() | Jesper Juhl
commit 5cdd2de0a76d0ac47f107c8a7b32d75d25768dc1 upstream.

In arch/x86/kernel/microcode_intel.c::generic_load_microcode() we have this:

    while (leftover) {
        ...
        if (get_ucode_data(mc, ucode_ptr, mc_size) ||
            microcode_sanity_check(mc) < 0) {
            vfree(mc);
            break;
        }
        ...
    }

    if (mc)
        vfree(mc);

This will cause a double free of 'mc'. This patch fixes that by just removing the vfree() call in the loop since 'mc' will be freed nicely just after we break out of the loop.

There's also a second change in the patch. I noticed a lot of checks for pointers being NULL before passing them to vfree(). That's completely redundant since vfree() deals gracefully with being passed a NULL pointer. Removing the redundant checks yields a nice size decrease for the object file.

Size before the patch:
    text  data  bss   dec   hex   filename
    4578  240   1032  5850  16da  arch/x86/kernel/microcode_intel.o
Size after the patch:
    text  data  bss   dec   hex   filename
    4489  240   984   5713  1651  arch/x86/kernel/microcode_intel.o

Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
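The single-release pattern the patch arrives at can be sketched in user space. This is only an illustration, not the kernel code: `demo_alloc()` and `demo_free()` are hypothetical stand-ins for `vmalloc()` and `vfree()`, and the "sanity check" is invented for the example. The error path breaks out without freeing, and one NULL-safe call after the loop releases whatever is still held.

```c
#include <stdlib.h>
#include <string.h>

/* Counters so a caller can check that every allocation is released once. */
static int demo_allocs;
static int demo_frees;

/* Hypothetical stand-ins for vmalloc()/vfree(); like vfree(), NULL is a no-op. */
static void *demo_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		demo_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p)
		demo_frees++;
	free(p);
}

/* Hypothetical sanity check: a chunk starting with 0xff is rejected. */
static int demo_chunk_ok(const unsigned char *chunk)
{
	return chunk[0] != 0xff;
}

/* Returns the number of chunks accepted before the first bad one. */
int demo_load_chunks(const unsigned char *data, size_t nchunks, size_t size)
{
	unsigned char *mc = NULL;
	int accepted = 0;
	size_t i;

	for (i = 0; i < nchunks; i++) {
		mc = demo_alloc(size);
		if (!mc)
			break;
		memcpy(mc, data + i * size, size);
		if (!demo_chunk_ok(mc))
			break;		/* no free here: the exit path owns mc */
		accepted++;
		demo_free(mc);
		mc = NULL;		/* nothing left for the exit path */
	}

	demo_free(mc);			/* single release point, NULL-safe */
	return accepted;
}

/* Zero when allocations and releases match. */
int demo_alloc_balance(void)
{
	return demo_allocs - demo_frees;
}

int demo_free_count(void)
{
	return demo_frees;
}
```

Because the release helper tolerates NULL, no `if (mc)` guard is needed before the final call, which mirrors the second half of the patch.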
2011-01-07 | x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode | Kenji Kaneshige
commit 086e8ced65d9bcc4a8e8f1cd39b09640f2883f90 upstream. In x2apic mode, we need to set the upper address register of the fault handling interrupt register of the vt-d hardware. Without this irq migration of the vt-d fault handling interrupt is broken. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07 | x86: Enable the intr-remap fault handling after local APIC setup | Kenji Kaneshige
commit 7f7fbf45c6b748074546f7f16b9488ca71de99c1 upstream. Interrupt-remapping gets enabled very early in the boot, as it determines the apic mode that the processor can use. The current code enables the vt-d fault handling before setup_local_APIC(), so the APIC LDR registers and the data structures in memory may not be initialized yet. As a result, the vt-d fault handling in logical xapic/x2apic modes was broken. Fix this by enabling the vt-d fault handling in end_local_APIC_setup(). A cleaner fix of enabling fault handling while enabling intr-remapping will be addressed for v2.6.38. [ Enabling intr-remapping determines the usage of x2apic mode and the apic mode determines the fault-handling configuration. ] Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <20101201062244.541996375@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07 | x86, amd: Fix panic on AMD CPU family 0x15 | Andreas Herrmann
[The mainline kernel doesn't have this problem. Commit "(23588c3) x86, amd: Add support for CPUID topology extension of AMD CPUs" removed the family check. But 2.6.32.y needs to be fixed.] This CPU family check is not required -- existence of the NodeId MSR is indicated by a CPUID feature flag which is already checked in amd_fixup_dcm() -- and it needlessly prevents amd_fixup_dcm() from being called for newer AMD CPUs. In the worst case this can lead to a panic in the scheduler code for AMD family 0x15 multi-node AMD CPUs. I just have a picture of VGA console output so I can't copy-and-paste it here, but the call stack of such a panic looked like: do_divide_error ... find_busiest_group run_rebalance_domains ... apic_timer_interrupt ... cpu_idle Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07 | x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() | Suresh Siddha
commit 10340ae130fb70352eae1ae8a00b7906d91bf166 upstream. Alignment of alloc_bootmem() depends on the value of L1_CACHE_SHIFT. What we need here, however, is 64 byte alignment. Use alloc_bootmem_align() and explicitly specify the alignment instead. This fixes a kernel boot crash reported by Jody when the cpu in .config is set to MPENTIUMII but the kernel is booted on a xsave-capable CPU. Reported-by: Jody Bruchon <jody@nctritech.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20101116212442.059967454@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
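The alignment problem can be illustrated with plain address arithmetic. This is a sketch with hypothetical helper names, assuming (for the example only) a cache-line shift of 5, i.e. 32-byte alignment: an address aligned to the cache line is not necessarily aligned to the 64 bytes the xsave area needs, so the alignment has to be requested explicitly.

```c
#include <stdint.h>

/* Round addr up to a multiple of align; align must be a power of two. */
uintptr_t demo_align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Non-zero when addr is a multiple of align (a power of two). */
int demo_is_aligned(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Alignment implied by a cache-line shift, e.g. shift 5 -> 32 bytes. */
uintptr_t demo_cache_align(unsigned int l1_cache_shift)
{
	return (uintptr_t)1 << l1_cache_shift;
}
```

With a shift of 5, `demo_align_up(0x1001, demo_cache_align(5))` yields 0x1020, which is 32-byte aligned but fails `demo_is_aligned(..., 64)`; asking for 64 directly yields 0x1040.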
2011-01-07 | x86, hotplug: Use mwait to offline a processor, fix the legacy case | H. Peter Anvin
upstream ea53069231f9317062910d6e772cca4ce93de8c8 x86, hotplug: Use mwait to offline a processor, fix the legacy case Here included also some small follow-on patches to the same code: upstream a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a x86, hotplug: Move WBINVD back outside the play_dead loop upstream ce5f68246bf2385d6174856708d0b746dc378f20 x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line https://bugzilla.kernel.org/show_bug.cgi?id=5471 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09 | x86: Ignore trap bits on single step exceptions | Frederic Weisbecker
commit 6c0aca288e726405b01dacb12cac556454d34b2a upstream. When a single step exception fires, the trap bits, used to signal hardware breakpoints, are in a random state. These trap bits might be set if another exception will follow, like a breakpoint in the next instruction, or a watchpoint in the previous one. Or there can be any junk there. So if we handle these trap bits during the single step exception, we are going to handle an exception twice, or we are going to handle junk. Just ignore them in this case. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332 Reported-by: Michael Stefaniuc <mstefani@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Alexandre Julliard <julliard@winehq.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
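A minimal sketch of the idea, assuming the usual DR6 layout (breakpoint trap bits in the low four bits, single-step flag in bit 14). The helper `demo_filter_dr6()` is hypothetical and is not how the kernel patch is structured; it only shows the masking rule "if the single-step bit is set, do not act on the trap bits".

```c
/* DR6 bit layout assumed for this sketch. */
#define DEMO_DR_TRAP_BITS	0x0fUL		/* B0..B3: hardware breakpoint hits */
#define DEMO_DR_STEP		0x4000UL	/* BS: single-step exception */

/*
 * Returns the DR6 bits a breakpoint handler should act on.
 * During a single-step exception the trap bits are unreliable,
 * so they are dropped rather than handled.
 */
unsigned long demo_filter_dr6(unsigned long dr6)
{
	if (dr6 & DEMO_DR_STEP)
		dr6 &= ~DEMO_DR_TRAP_BITS;
	return dr6;
}
```

A value such as 0x4003 (single step plus two stale trap bits) is reduced to 0x4000, while a pure breakpoint hit such as 0x3 passes through unchanged.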
2010-12-09 | acpi-cpufreq: fix a memleak when unloading driver | Zhang Rui
commit dab5fff14df2cd16eb1ad4c02e83915e1063fece upstream. We didn't free per_cpu(acfreq_data, cpu)->freq_table when acpi_freq driver is unloaded. Resulting in the following messages in /sys/kernel/debug/kmemleak: unreferenced object 0xf6450e80 (size 64): comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s) hex dump (first 32 bytes): 00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00 ......$.......$. 02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00 .....j.......5.. backtrace: [<c123ba97>] kmemleak_alloc+0x27/0x50 [<c109f96f>] __kmalloc+0xcf/0x110 [<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq] [<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0 [<c11920b7>] sysdev_driver_register+0x97/0x110 [<c11cce56>] cpufreq_register_driver+0x86/0x140 [<f9dad080>] 0xf9dad080 [<c1001130>] do_one_initcall+0x30/0x160 [<c10626e9>] sys_init_module+0x99/0x1e0 [<c1002d97>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21 Tested-by: Toralf Forster <toralf.foerster@gmx.de> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 | x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers. | Bart Oldeman
commit 6554287b1de0448f1e02e200d02b43914e997d15 upstream.

Impact: fix kernel bug such as: BUG: scheduling while atomic: dosemu.bin/19680/0x00000004 See also Ubuntu bug 455067 at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/455067

Commits 4915a35e35a037254550a2ba9f367a812bc37d40 ("Use preempt_conditional_sti/cli in do_int3, like on x86_64.") and 3d2a71a596bd9c761c8487a2178e95f8a61da083 ("x86, traps: converge do_debug handlers") started disabling preemption in int1 and int3 handlers on i386. The problem with vm86 is that the call to handle_vm86_trap() may jump straight to entry_32.S and never return, so preemption is never enabled again and there is an imbalance in the preempt count.

Commit be716615fe596ee117292dc615e95f707fb67fd1 ("x86, vm86: fix preemption bug"), which was later (accidentally?) reverted by commit 08d68323d1f0c34452e614263b212ca556dae47f ("hw-breakpoints: modifying generic debug exception to use thread-specific debug registers") fixed the problem for debug exceptions but not for breakpoints.

There are three solutions to this problem.

1. Reenable preemption before calling handle_vm86_trap(). This was the approach that was later reverted.
2. Do not disable preemption for i386 in breakpoint and debug handlers. This was the situation before October 2008. As far as I understand preemption only needs to be disabled on x86_64 because a separate stack is used, but it's nice to have things work the same way on i386 and x86_64.
3. Let handle_vm86_trap() return instead of jumping to assembly code. By setting a flag in _TIF_WORK_MASK, either TIF_IRET or TIF_NOTIFY_RESUME, the code in entry_32.S is instructed to return to 32 bit mode from V86 mode. The logic in entry_32.S was already present to handle signals. (I chose TIF_IRET because it's slightly more efficient in do_notify_resume() in signal.c, but in fact TIF_IRET can probably be replaced by TIF_NOTIFY_RESUME everywhere.)

I'm submitting approach 3, because I believe it is the most elegant and prevents future confusion. Still, an obvious preempt_conditional_cli(regs); is necessary in traps.c to correct the bug.

[ hpa: This is technically a regression, but because: 1. the regression is so old, 2. the patch seems relatively high risk, justifying more testing, and 3. we're late in the 2.6.36-rc cycle, I'm queuing it up for the 2.6.37 merge window. It might, however, qualify as a -stable backport at a later time, hence Cc: stable. ]

Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net> LKML-Reference: <alpine.DEB.2.00.1009231312330.4732@localhost.localdomain> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 | x86, kdump: Change copy_oldmem_page() to use cached addressing | Cliff Wickman
commit 37a2f9f30a360fb03522d15c85c78265ccd80287 upstream. The copy of /proc/vmcore to a user buffer proceeds much faster if the kernel addresses memory as cached. With this patch we have seen an increase in transfer rate from less than 15MB/s to 80-460MB/s, depending on size of the transfer. This makes a big difference in time needed to save a system dump. Signed-off-by: Cliff Wickman <cpw@sgi.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: kexec@lists.infradead.org LKML-Reference: <E1OtMLz-0001yp-Ia@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 | x86, intr-remap: Set redirection hint in the IRTE | Suresh Siddha
commit 75e3cfbed6f71a8f151dc6e413b6ce3c390030cb upstream. Currently the redirection hint in the interrupt-remapping table entry is set to 0, which means the remapped interrupt is directed to the processors listed in the destination. So in logical flat mode in the presence of intr-remapping, this results in a single interrupt multi-casted to multiple cpu's as specified by the destination bit mask. But what we really want is to send that interrupt to one of the cpus based on the lowest priority delivery mode. Set the redirection hint in the IRTE to '1' to indicate that we want the remapped interrupt to be directed to only one of the processors listed in the destination. This fixes the issue of same interrupt getting delivered to multiple cpu's in the logical flat mode in the presence of interrupt-remapping. While there is no functional issue observed with this behavior, this will impact performance of such configurations (<=8 cpu's using logical flat mode in the presence of interrupt-remapping) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100827181049.013051492@sbsiddha-MOBL3.sc.intel.com> Cc: Weidong Han <weidong.han@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 | x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs | Andreas Herrmann
commit 3fdbf004c1706480a7c7fac3c9d836fa6df20d7d upstream. Instead of adapting the CPU family check in amd_special_default_mtrr() for each new CPU family, assume that all new AMD CPUs support the necessary bits in the SYS_CFG MSR. Tom2Enabled is architectural (defined in APM Vol.2). Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT. In pre K8-NPT BKDGs this bit is reserved (read as zero). Without this adaptation Linux would unnecessarily complain about bad MTRR settings on every new AMD CPU family, e.g. [ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123235.GB20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 | x86, olpc: Don't retry EC commands forever | Paul Fox
commit 286e5b97eb22baab9d9a41ca76c6b933a484252c upstream. Avoids a potential infinite loop. It was observed once, during an EC hacking/debugging session - not in regular operation. Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: dilinger@queued.net Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
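A bounded retry loop of the kind the subject line describes might look like the sketch below. Everything here is hypothetical: the retry limit of 3, the `demo_ec` structure and the simulated controller are invented for illustration and are not taken from the OLPC driver.

```c
/* Hypothetical retry limit; the real driver's value is not shown in this log. */
#define DEMO_EC_MAX_TRIES 3

/* Simulated embedded controller: succeeds on attempt `succeed_on` (0 = never). */
struct demo_ec {
	int calls;
	int succeed_on;
};

/* One attempt at issuing a command; 0 on success, -1 on failure. */
int demo_ec_try(struct demo_ec *ec)
{
	ec->calls++;
	return (ec->succeed_on && ec->calls >= ec->succeed_on) ? 0 : -1;
}

/* Retry a bounded number of times instead of looping until success. */
int demo_ec_cmd(struct demo_ec *ec)
{
	int attempt;

	for (attempt = 0; attempt < DEMO_EC_MAX_TRIES; attempt++) {
		if (demo_ec_try(ec) == 0)
			return 0;
	}
	return -1;	/* give up and report the failure to the caller */
}
```

With a controller that never answers, `demo_ec_cmd()` returns -1 after exactly three attempts rather than spinning forever.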
2010-11-22 | x86, kexec: Make sure to stop all CPUs before exiting the kernel | Alok Kataria
commit 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce upstream. x86 smp_ops now has a new op, stop_other_cpus, which takes a parameter "wait". This allows the caller to specify whether it wants to wait until all the cpus have processed the stop IPI. This is required specifically for the kexec case where we should wait for all the cpus to be stopped before starting the new kernel. We now wait for the cpus to stop in all cases except for panic/kdump where we expect things to be broken and we are doing our best to make things work anyway. This patch fixes a legitimate regression, which was introduced during 2.6.30, by commit id 4ef702c10b5df18ab04921fc252c26421d4d6c75. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 | mm, x86: Saving vmcore with non-lazy freeing of vmas | Cliff Wickman
commit 3ee48b6af49cf534ca2f481ecc484b156a41451d upstream. During the reading of /proc/vmcore the kernel is doing ioremap()/iounmap() repeatedly. And the buildup of un-flushed vm_area_struct's is causing a great deal of overhead. (rb_next() is chewing up most of that time). This solution is to provide function set_iounmap_nonlazy(). It causes a subsequent call to iounmap() to immediately purge the vma area (with try_purge_vmap_area_lazy()). With this patch we have seen the time for writing a 250MB compressed dump drop from 71 seconds to 44 seconds. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: kexec@lists.infradead.org LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22 | perf_events: Fix bogus AMD64 generic TLB events | Stephane Eranian
commit ba0cef3d149ce4db293c572bf36ed352b11ce7b9 upstream. PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x7 (to count all possibilities). PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x3 (to count all possibilities). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-11 | x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order | Borislav Petkov
This fixes possible cases of not collecting valid error info in the MCE error thresholding groups on F10h hardware. The current code contains a subtle problem of checking only the Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM thresholding group) in its first iteration and breaking out if the bit is cleared. But (!), this MSR contains an offset value, BlkPtr[31:24], which points to the remaining MSRs in this thresholding group which might contain valid information too. But if we bail out only after we checked the valid bit in the first MSR and not the block pointer too, we miss that other information. The thing is, MC4_MISC0[BlkPtr] is not predicated on MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked prior to iterating over the MCi_MISCj thresholding group, irrespective of the MC4_MISC0[Valid] setting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 | x86, mce, therm_throt.c: Fix missing curly braces in error handling logic | Jin Dongming
When the PTS feature is not supported by the CPU, the sysfs file package_power_limit_count for the package should not be generated. This patch adds the missing { and }. The patch is not complete as there are other error handling problems in this function - but that can wait until the merge window. Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@initel.com> Acked-by: Jean Delvare <khali@linux-fr.org> Cc: Brown Len <len.brown@intel.com> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org> LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-05 | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf trace scripting: Fix extern struct definitions
  perf ui hist browser: Fix segfault on 'a' for annotate
  perf tools: Fix build breakage
  perf, x86: Handle in flight NMIs on P4 platform
  oprofile, ARM: Release resources on failure
  oprofile: Add Support for Intel CPU Family 6 / Model 29
2010-10-05 | modules: Fix module_bug_list list corruption race | Linus Torvalds
With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-04 | Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq | Linus Torvalds
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc
  [CPUFREQ] acpi-cpufreq: add missing __percpu markup
2010-10-01 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hpet: Fix bogus error check in hpet_assign_irq()
  x86, irq: Plug memory leak in sparse irq
  x86, cpu: After uncapping CPUID, re-run CPU feature detection
2010-09-30 | x86, hpet: Fix bogus error check in hpet_assign_irq() | Thomas Gleixner
create_irq() returns -1 if the interrupt allocation failed, but the code checks for irq == 0. Use create_irq_nr() instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
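The mismatch between the failure value and the check can be reproduced with two hypothetical stand-in allocators. This sketch assumes, based on the message above, that one variant reports failure as -1 and the other as 0; the `demo_*` names and the IRQ number 42 are invented for illustration.

```c
#define DEMO_EINVAL 22

/* Stand-in for an allocator that reports failure as -1. */
static int demo_create_irq(int pool_empty)
{
	return pool_empty ? -1 : 42;
}

/* Stand-in for an allocator that reports failure as 0. */
static unsigned int demo_create_irq_nr(int pool_empty)
{
	return pool_empty ? 0 : 42;
}

/* Buggy: tests for 0, but the allocator signals failure with -1. */
int demo_assign_irq_buggy(int pool_empty, int *out)
{
	int irq = demo_create_irq(pool_empty);

	if (!irq)
		return -DEMO_EINVAL;
	*out = irq;		/* -1 slips through as if it were a valid IRQ */
	return 0;
}

/* Fixed: the check matches the allocator's failure value. */
int demo_assign_irq_fixed(int pool_empty, int *out)
{
	unsigned int irq = demo_create_irq_nr(pool_empty);

	if (!irq)
		return -DEMO_EINVAL;
	*out = (int)irq;
	return 0;
}
```

When the pool is empty, the buggy variant returns success and hands back -1 as the IRQ number, while the fixed variant returns the error.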
2010-09-30 | x86, irq: Plug memory leak in sparse irq | Thomas Gleixner
free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this triggers a use after free caused by the fact that copying struct irq_cfg is done with memcpy, which copies the pointer not the cpumask. Fix both places. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
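Why a plain memcpy of the structure leads to a use after free once the masks are actually freed can be seen in a small user-space sketch. The `demo_cfg` structure below is hypothetical and much simpler than struct irq_cfg; it only shows the difference between copying the pointer and copying the mask contents.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical config with a separately allocated mask. */
struct demo_cfg {
	unsigned long *mask;
	size_t nlongs;
	int vector;
};

int demo_cfg_init(struct demo_cfg *cfg, size_t nlongs)
{
	cfg->mask = calloc(nlongs, sizeof(*cfg->mask));
	cfg->nlongs = nlongs;
	cfg->vector = 0;
	return cfg->mask ? 0 : -1;
}

void demo_cfg_free(struct demo_cfg *cfg)
{
	free(cfg->mask);	/* the mask must be released along with the config */
	cfg->mask = NULL;
}

/*
 * Copy field by field. A memcpy(dst, src, sizeof(*dst)) would make both
 * configs share one mask, so freeing either would leave the other dangling.
 * Both configs are assumed to be initialized with the same nlongs.
 */
void demo_cfg_copy(struct demo_cfg *dst, const struct demo_cfg *src)
{
	dst->vector = src->vector;
	memcpy(dst->mask, src->mask, src->nlongs * sizeof(*src->mask));
}
```

After `demo_cfg_copy()`, each config owns its own mask, so both can be freed independently.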
2010-09-30 | [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc | Pekka Enberg
If acpi_evaluate_object() function call doesn't fail, we must kfree() output.buffer before returning from pcc_cpufreq_do_osc(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Dave Jones <davej@redhat.com>
2010-09-30 | [CPUFREQ] acpi-cpufreq: add missing __percpu markup | Namhyung Kim
acpi_perf_data is a percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Jones <davej@redhat.com>
2010-09-30 | perf, x86: Handle in flight NMIs on P4 platform | Cyrill Gorcunov
Stephane reported that we forgot to guard the P4 platform against spurious in-flight performance IRQs. Fix it. This fixes potential spurious 'dazed and confused' NMI messages. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Robert Richter <robert.richter@amd.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1285815698-4298-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-28 | ACPI: add missing __percpu markup in arch/x86/kernel/acpi/cstate.c | Namhyung Kim
cpu_cstate_entry is a percpu pointer but was missing __percpu markup. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Len Brown <len.brown@intel.com>
2010-09-28 | x86, cpu: After uncapping CPUID, re-run CPU feature detection | H. Peter Anvin
After uncapping the CPUID level, we need to also re-run the CPU feature detection code. This resolves kernel bugzilla 16322. Reported-by: boris64 <bugzilla.kernel.org@boris64.net> Cc: <stable@kernel.org> v2.6.29..2.6.35 LKML-Reference: <tip-@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-27 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Fix rounding-bug in __unmap_single
  x86/amd-iommu: Work around S3 BIOS bug
  x86/amd-iommu: Set iommu configuration flags in enable-loop
  x86, setup: Fix earlyprintk=serial,0x3f8,115200
  x86, setup: Fix earlyprintk=serial,ttyS0,115200
2010-09-27 | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Catch spurious interrupts after disabling counters
  tracing/x86: Don't use mcount in kvmclock.c
  tracing/x86: Don't use mcount in pvclock.c
2010-09-24 | x86/hwmon: fix initialization of coretemp | Jan Beulich
Using cpuid_eax() to determine feature availability on other than the current CPU is invalid. And feature availability should also be checked in the hotplug code path. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2010-09-24 | perf, x86: Catch spurious interrupts after disabling counters | Robert Richter
Some cpus still deliver spurious interrupts after disabling a counter. This caused 'undelivered NMI' messages. This patch fixes this. Introduced by: 4177c42: perf, x86: Try to handle unknown nmis with an enabled PMU Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Don Zickus <dzickus@redhat.com> Cc: gorcunov@gmail.com <gorcunov@gmail.com> Cc: fweisbec@gmail.com <fweisbec@gmail.com> Cc: ying.huang@intel.com <ying.huang@intel.com> Cc: ming.m.lin@intel.com <ming.m.lin@intel.com> Cc: yinghai@kernel.org <yinghai@kernel.org> Cc: andi@firstfloor.org <andi@firstfloor.org> Cc: eranian@google.com <eranian@google.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100915162034.GO13563@erda.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-24 | Merge branch 'amd-iommu/2.6.36' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent | Ingo Molnar
2010-09-23 | x86/amd-iommu: Fix rounding-bug in __unmap_single | Joerg Roedel
In the __unmap_single function the dma_addr is rounded down to a page boundary before the dma pages are unmapped. The address is later also used to flush the TLB entries for that mapping. But without the offset into the dma page the amount of pages to flush might be miscalculated in the TLB flushing path. This patch fixes this bug by using the original address to flush the TLB. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
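The miscalculation is easy to reproduce with the usual "pages spanned by a buffer" arithmetic. This is a sketch under stated assumptions: a 4096-byte page and a hypothetical helper `demo_num_pages()` that counts pages from the offset within the first page plus the length. It is not the driver's actual helper.

```c
#define DEMO_PAGE_SIZE 4096UL

/*
 * Number of pages touched by [addr, addr + size).
 * The offset of addr inside its page is part of the span, so dropping
 * it (by rounding addr down first) can undercount by one page.
 */
unsigned long demo_num_pages(unsigned long addr, unsigned long size)
{
	unsigned long span = (addr & (DEMO_PAGE_SIZE - 1)) + size;

	return (span + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

A 0x20-byte mapping at 0x1ff0 crosses a page boundary and spans two pages; the same length computed from the rounded-down address 0x1000 gives only one, so a flush sized that way would miss the second page.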
2010-09-23 | x86/amd-iommu: Work around S3 BIOS bug | Joerg Roedel
This patch adds a workaround for an IOMMU BIOS problem to the AMD IOMMU driver. The result of the bug is that the IOMMU does not execute commands anymore when the system comes out of the S3 state, resulting in system failure. The bug in the BIOS is that it does not restore certain hardware specific registers correctly. This workaround reads out the contents of these registers at boot time and restores them on resume from S3. The workaround is limited to the specific IOMMU chipset where this problem occurs. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-09-23 | x86/amd-iommu: Set iommu configuration flags in enable-loop | Joerg Roedel
This patch moves the setting of the configuration and feature flags out of the acpi table parsing path and into the iommu-enable path. This is needed to reliably fix resume-from-s3. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-09-22 | tracing/x86: Don't use mcount in kvmclock.c | Steven Rostedt
The guest can use the paravirt clock in kvmclock.c which is used by sched_clock(), which in turn is used by the tracing mechanism for timestamps, which leads to infinite recursion. Disable mcount/tracing for kvmclock.o. Cc: stable@kernel.org Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-22 | tracing/x86: Don't use mcount in pvclock.c | Jeremy Fitzhardinge
When using a paravirt clock, pvclock.c can be used by sched_clock(), which in turn is used by the tracing mechanism for timestamps, which leads to infinite recursion. Disable mcount/tracing for pvclock.o. Cc: stable@kernel.org Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4C9A9A3F.4040201@goop.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-21 | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hw breakpoints: Fix pid namespace bug
  x86: Fix instruction breakpoint encoding
  oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540)
  kprobes: Fix Kconfig dependency
2010-09-16 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: hpet: Work around hardware stupidity
  x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y
  x86, cpufeature: Suppress compiler warning with gcc 3.x
  x86, UV: Fix initialization of max_pnode
2010-09-17 | x86: Fix instruction breakpoint encoding | Frederic Weisbecker
Lengths and types of breakpoints are encoded in a half byte into CPU registers. However when we extract these values and store them, we add a high half byte part to them: 0x40 to the length and 0x80 to the type. When that gets reloaded to the CPU registers, the high part is masked. While making the instruction breakpoints available for perf, I zapped that high part on instruction breakpoint encoding and that broke the arch -> generic translation used by ptrace instruction breakpoints. Writing dr7 to set an inst breakpoint was then failing. There is no apparent reason for these high parts so we could get rid of them altogether. That's an invasive change though so let's do that later and for now fix the problem by restoring that inst breakpoint high part encoding in this sole patch. Reported-by: Kelvie Wong <kelvie@ieee.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com>
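The effect of the "high part" can be sketched as follows. The constants are assumptions for illustration (a 0x80 marker on the type, low-nibble values of 0, 1 and 3 for execute, write and read/write) and the `demo_*` names are hypothetical. The point is only that the hardware sees the same bits with or without the marker, while a translation that matches on the full stored value does not.

```c
/* Assumed encoding for this sketch: type marker in the high half byte. */
#define DEMO_TYPE_HIGH		0x80
#define DEMO_TYPE_EXECUTE	(DEMO_TYPE_HIGH | 0x0)
#define DEMO_TYPE_WRITE		(DEMO_TYPE_HIGH | 0x1)
#define DEMO_TYPE_RW		(DEMO_TYPE_HIGH | 0x3)

/* What ends up in the CPU register: the high part is masked off. */
int demo_hw_bits(int stored)
{
	return stored & 0x0f;
}

/* Arch -> generic translation that matches on the full stored value. */
int demo_arch_to_generic(int stored_type)
{
	switch (stored_type) {
	case DEMO_TYPE_EXECUTE:
		return 'x';
	case DEMO_TYPE_WRITE:
		return 'w';
	case DEMO_TYPE_RW:
		return 'r';
	default:
		return -1;	/* unknown encoding */
	}
}
```

An execute type stored as 0x0 instead of 0x80 programs the hardware identically, yet `demo_arch_to_generic()` rejects it, which is the kind of breakage the message describes for the ptrace path.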
2010-09-15 | x86: hpet: Work around hardware stupidity | Thomas Gleixner
This more or less reverts commits 08be979 (x86: Force HPET readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict read back to affected ATI chipsets) to the status of commit 8da854c (x86, hpet: Erratum workaround for read after write of HPET comparator). The delta to commit 8da854c is mostly comments and the change from WARN_ONCE to printk_once as we know the call path of this function already.

This really needs an in-depth explanation:

First of all the HPET design is a complete failure. Having a counter compare register which generates an interrupt on matching values forces the software to do at least one superfluous readback of the counter register.

While it is nice in theory to program "absolute" time events it is practically useless because the timer runs at some absurd frequency which can never be matched to real world units. So we are forced to calculate a relative delta and this forces a readout of the actual counter value, adding the delta and programming the compare register.

When the delta is small enough we run into the danger that we program a compare value which is already in the past. Due to the compare for equal nature of HPET we need to read back the counter value after writing the compare register (btw. this is necessary for absolute timeouts as well) to make sure that we did not miss the timer event. We try to work around that by setting the minimum delta to a value which is larger than the theoretical time which elapses between the counter readout and the compare register write, but that's only true in theory. An NMI or SMI which hits between the readout and the write can easily push us beyond that limit. This would result in waiting for the next HPET timer interrupt until the 32bit wraparound of the counter happens which takes about 306 seconds.

So we designed the next event function to look like:

    match = read_cnt() + delta;
    write_compare_ref(match);
    return read_cnt() < match ? 0 : -ETIME;

At some point we got into trouble with certain ATI chipsets. Even the above "safe" procedure failed. The reason was that the write to the compare register was delayed probably for performance reasons. The theory was that they wanted to avoid the synchronization of the write with the HPET clock, which is understandable. So the write does not hit the compare register directly; instead it goes to some intermediate register which is copied to the real compare register in sync with the HPET clock. That opens another window for hitting the dreaded "wait for a wraparound" problem.

To work around that "optimization" we added a read back of the compare register which either enforced the update of the just written value or just delayed the readout of the counter enough to avoid the issue. We unfortunately never got any affirmative info from ATI/AMD about this.

One thing is sure, that we nuked the performance "optimization" that way completely and I'm pretty sure that the result is worse than before some HW folks came up with those.

Just for paranoia reasons I added a check whether the read back compare register value was the same as the value we wrote right before. That paranoia check triggered a couple of years after it was added on an Intel ICH9 chipset. Venki added a workaround (commit 8da854c) which was reading the compare register twice when the first check failed. We considered this to be a penalty in general and restricted the readback (thus the wasted CPU cycles) to the known to be affected ATI chipsets.

This turned out to be an utterly wrong decision. 2.6.35 testers experienced massive problems and finally one of them bisected it down to commit 30a564be which spurred some further investigation.

Finally we got confirmation that the write to the compare register can be delayed by up to two HPET clock cycles which explains the problems nicely. All we can do about this is to go back to Venki's initial workaround in a slightly modified version.
Just for the record I need to say, that all of this could have been avoided if hardware designers and of course the HPET committee would have thought about the consequences for a split second. It's out of my comprehension why designing a working timer is so hard. There are two ways to achieve it: 1) Use a counter wrap around aware compare_reg <= counter_reg implementation instead of the easy compare_reg == counter_reg Downsides: - It needs more silicon. - It needs a readout of the counter to apply a relative timeout. This is necessary as the counter does not run in any useful (and adjustable) frequency and there is no guarantee that the counter which is used for timer events is the same which is used for reading the actual time (and therefor for calculating the delta) Upsides: - None 2) Use a simple down counter for relative timer events Downsides: - Absolute timeouts are not possible, which is not a problem at all in the context of an OS and the expected max. latencies/jitter (also see Downsides of #1) Upsides: - It needs less or equal silicon. - It works ALWAYS - It is way faster than a compare register based solution (One write versus one write plus at least one and up to four reads) I would not be so grumpy about all of this, if I would not have been ignored for many years when pointing out these flaws to various hardware folks. I really hate timers (at least those which seem to be designed by janitors). Though finally we got a reasonable explanation plus a solution and I want to thank all the folks involved in chasing it down and providing valuable input to this. Bisected-by: Nix <nix@esperi.org.uk> Reported-by: Artur Skawina <art.08.09@gmail.com> Reported-by: Damien Wyart <damien.wyart@free.fr> Reported-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. 
Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
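The next-event procedure described in the HPET commit message above (read the counter, program the compare register, read the compare register back, then check the counter again) can be sketched in user space roughly as follows. This is only an illustrative model, not the kernel's code: `struct fake_hpet`, its fields, the helper names and the "latch after two further accesses" behaviour are all assumptions made up for this sketch, and the wrap-safe signed-difference check is a substitution for the plain `read_cnt() < match` comparison quoted in the message.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical software model of an HPET-like timer (not real hardware
 * access): a free-running 32-bit counter, a compare register, and a write
 * latch that reaches the real compare register only after two further
 * register accesses, loosely mimicking the delayed write described above.
 */
struct fake_hpet {
	uint32_t counter;          /* free-running counter */
	uint32_t compare;          /* "real" compare register */
	uint32_t pending;          /* written value not yet latched */
	int      latch_delay;      /* accesses left until the latch happens */
	uint32_t ticks_per_access; /* modelled cost of one register access */
};

/* Every register access lets time pass and may complete a pending write. */
static void fake_hpet_tick(struct fake_hpet *h)
{
	h->counter += h->ticks_per_access;
	if (h->latch_delay > 0 && --h->latch_delay == 0)
		h->compare = h->pending;
}

static uint32_t fake_hpet_read_counter(struct fake_hpet *h)
{
	fake_hpet_tick(h);
	return h->counter;
}

static void fake_hpet_write_compare(struct fake_hpet *h, uint32_t v)
{
	fake_hpet_tick(h);
	h->pending = v;
	h->latch_delay = 2;
}

static uint32_t fake_hpet_read_compare(struct fake_hpet *h)
{
	fake_hpet_tick(h);
	return h->compare;
}

/* Returns 0 if the event is safely in the future, -ETIME if it may be missed. */
int fake_hpet_next_event(struct fake_hpet *h, uint32_t delta)
{
	uint32_t match = fake_hpet_read_counter(h) + delta;

	fake_hpet_write_compare(h, match);

	/* Read back; if the value has not arrived yet, read once more. */
	if (fake_hpet_read_compare(h) != match)
		(void)fake_hpet_read_compare(h);

	/*
	 * Signed difference so the check survives the 32-bit wraparound.
	 * The unsigned-to-signed conversion is implementation-defined in C
	 * but behaves as two's complement on mainstream compilers.
	 */
	return (int32_t)(match - fake_hpet_read_counter(h)) > 0 ? 0 : -ETIME;
}
```

With `ticks_per_access` set to 1, a call with a comfortable delta returns 0 and leaves the written value in `compare`, while a delta smaller than the cost of the five register accesses returns `-ETIME`, so the caller can retry with a larger delta instead of waiting for the counter to wrap.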
2010-09-10  x86, tsc: Fix a preemption leak in restore_sched_clock_state()  Peter Zijlstra
A real life genuine preemption leak.. Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10  x86, UV: Fix initialization of max_pnode  Jack Steiner
Fix calculation of "max_pnode" for systems where the highest blade has neither cpus nor memory. (And, yes, although rare this does occur). Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20100910150808.GA19802@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-08  Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
  io-mapping: Fix the address space annotations
  x86: Fix the address space annotations of iomap_atomic_prot_pfn()
  x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
2010-09-08  Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Try to handle unknown nmis with an enabled PMU
  perf, x86: Fix handle_irq return values
  perf, x86: Fix accidentally ack'ing a second event on intel perf counter
  oprofile, x86: fix init_sysfs() function stub
  lockup_detector: Sync touch_*_watchdog back to old semantics
  tracing: Fix a race in function profile
  oprofile, x86: fix init_sysfs error handling
  perf_events: Fix time tracking for events with pid != -1 and cpu != -1
  perf: Initialize callchains roots's childen hits
  oprofile: fix crash when accessing freed task structs
2010-09-05  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks  Andreas Herrmann
kobject_add_internal failed for threshold_bank2 with -EEXIST, don't try to register things with the same name in the same directory:

  Pid: 1, comm: swapper Tainted: G W 2.6.31 #1
  Call Trace:
   [<ffffffff81161b07>] ? kobject_add_internal+0x156/0x180
   [<ffffffff81161cc0>] ? kobject_add+0x66/0x6b
   [<ffffffff81161793>] ? kobject_init+0x42/0x82
   [<ffffffff81161cf9>] ? kobject_create_and_add+0x34/0x63
   [<ffffffff81393963>] ? threshold_create_bank+0x14f/0x259
   [<ffffffff8139310a>] ? mce_create_device+0x8d/0x1b8
   [<ffffffff81646497>] ? threshold_init_device+0x3f/0x80
   [<ffffffff81646458>] ? threshold_init_device+0x0/0x80
   [<ffffffff81009050>] ? do_one_initcall+0x4f/0x143
   [<ffffffff816413a0>] ? kernel_init+0x14c/0x1a2
   [<ffffffff8100c8da>] ? child_rip+0xa/0x20
   [<ffffffff81641254>] ? kernel_init+0x0/0x1a2
   [<ffffffff8100c8d0>] ? child_rip+0x0/0x20
  kobject_create_and_add: kobject_add error: -17

(Probably the for_each_cpu loop should be entirely removed.)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100827092006.GB5348@loge.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03  perf, x86: Try to handle unknown nmis with an enabled PMU  Robert Richter
When the PMU is enabled it is valid to have unhandled nmis: two events could trigger 'simultaneously', raising two back-to-back NMIs. If the first NMI handles both, the latter will be empty and daze the CPU.

The solution to avoid an 'unknown nmi' message in this case was simply to stop the nmi handler chain when the PMU is enabled by stating the nmi was handled. This has the drawback that a) we can not detect unknown nmis anymore, and b) subsequent nmi handlers are not called.

This patch addresses this. Now we check whether an unknown NMI could be a PMU back-to-back NMI. Otherwise we pass it on and let the kernel handle the unknown nmi.

This is a debug log:

  cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
  cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
  cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
  cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
  cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
  cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
  cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
  cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
  cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
  cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
  cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
  cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
  cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
  cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
  cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
  cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
  Uhhuh. NMI received for unknown reason 00 on CPU 6.
  Do you have a strange power saving mode enabled?
  Dazed and confused, but trying to continue

Deltas:

  nmi #32334 340186
  nmi #32335 1327704
  nmi #32336 1819 <<<< back-to-back nmi [1]
  nmi #32337 85961
  nmi #32338 284507
  nmi #32339 1578809
  nmi #32340 217616
  nmi #32341 1798 <<<< back-to-back nmi [2]
  nmi #32342 240913
  nmi #32343 1512809
  nmi #32344 116672
  nmi #32345 412453
  nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
  nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
  nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]

For back-to-back nmi detection there are the following rules:

The PMU nmi handler was handling more than one counter and no counter was handled in the subsequent nmi (see [1] and [2] above).

There is another case if there are two subsequent back-to-back nmis [3]. The 2nd is detected as back-to-back because the first handled more than one counter. If the second handles one counter and the 3rd handles nothing, we drop the 3rd nmi because it could be a back-to-back nmi.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: fweisbec@gmail.com
Cc: ying.huang@intel.com
Cc: ming.m.lin@intel.com
Cc: eranian@google.com
LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
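The back-to-back detection rules from the commit message above can be sketched as a small user-space state machine. This is a hedged illustration of the rules as described, not the patch itself: `struct pmu_nmi_state`, its fields and `pmu_nmi_claim()` are hypothetical names chosen for this sketch, and the real handler works on per-CPU state inside the kernel's NMI notifier chain.

```c
/*
 * Hypothetical per-CPU bookkeeping for back-to-back NMI detection.
 * 'marked' is the number of the NMI we expect might arrive empty;
 * 'handled' is how many counters the NMI that set the mark handled.
 */
struct pmu_nmi_state {
	unsigned long marked;
	int           handled;
};

/*
 * Decide whether NMI number 'this_nmi', in which the PMU handler
 * processed 'handled' counters, belongs to the PMU.
 * Returns 1 if the PMU claims it (stop the handler chain),
 * 0 if it should be passed on as a possibly unknown NMI.
 */
int pmu_nmi_claim(struct pmu_nmi_state *st, unsigned long this_nmi, int handled)
{
	if (handled == 0) {
		/* Empty NMI: swallow it only if it was predicted. */
		return st->marked == this_nmi;
	}

	/*
	 * Mark the next NMI as a potential back-to-back one if this NMI
	 * handled more than one counter, or if this NMI was itself marked
	 * by an NMI that handled more than one counter (case [3]).
	 */
	if (handled > 1 || (st->marked == this_nmi && st->handled > 1)) {
		st->marked = this_nmi + 1;
		st->handled = handled;
	}

	return 1;
}
```

Feeding the `handled` values from the debug log (starting at nmi #32333) through `pmu_nmi_claim()` claims every NMI, including the empty #32336, #32341 and #32348, whereas a further empty NMI that no predecessor marked is passed on to the remaining handlers.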
2010-09-03  perf, x86: Fix handle_irq return values  Peter Zijlstra
Now that we rely on the number of handled overflows, ensure all handle_irq implementations actually return the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: robert.richter@amd.com Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>