summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2009-10-05Fix NULL ptr regression in powernow-k8Kurt Roeckx
commit f0adb134d8dc9993a9998dc50845ec4f6ff4fadc upstream. Fixes bugzilla #13780 From: Kurt Roeckx <kurt@roeckx.be> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05Revert "KVM: x86: check for cr3 validity in ioctl_set_sregs"Marcelo Tosatti
(cherry picked from commit dc7e795e3dd2a763e5ceaa1615f307e808cf3932) This reverts commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba. To my understanding, it became obsolete with the advent of the more robust check in mmu_alloc_roots (89da4ff17f). Moreover, it prevents the conceptually safe pattern 1. set sregs 2. register mem-slots 3. run vcpu by setting a sticky triple fault during step 1. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: Protect update_cr8_intercept() when running without an apicAvi Kivity
(cherry picked from commit 88c808fd42b53a7e01a2ac3253ef31fef74cb5af) update_cr8_intercept() can be triggered from userspace while there is no apic present. Signed-off-by: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: MMU: fix bogus alloc_mmu_pages assignmentMarcelo Tosatti
(cherry picked from commit b90c062c65cc8839edfac39778a37a55ca9bda36) Remove the bogus n_free_mmu_pages assignment from alloc_mmu_pages. It breaks accounting of mmu pages, since n_free_mmu_pages is modified but the real number of pages remains the same. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: MMU: fix missing locking in alloc_mmu_pagesMarcelo Tosatti
(cherry picked from commit 6a1ac77110ee3e8d8dfdef8442f3b30b3d83e6a2) n_requested_mmu_pages/n_free_mmu_pages are used by kvm_mmu_change_mmu_pages to calculate the number of pages to zap. alloc_mmu_pages, called from the vcpu initialization path, modifies this variables without proper locking, which can result in a negative value in kvm_mmu_change_mmu_pages (say, with cpu hotplug). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: fix cpuid E2BIG handling for extended request typesMark McLoughlin
(cherry picked from commit cb007648de83cf226d69ec76e1c01848b4e8e49f) If we run out of cpuid entries for extended request types we should return -E2BIG, just like we do for the standard request types. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05agp/intel: Fix the pre-9xx chipset flush.Eric Anholt
commit e517a5e97080bbe52857bd0d7df9b66602d53c4d upstream. Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had serious stability issues. Back in May a wbinvd was added to the DRM to work around much of the problem. Some failure remained -- easily visible by dragging a window around on an X -retro desktop, or by looking at bugzilla. The chipset flush was on the right track -- hitting the right amount of memory, and it appears to be the only way to flush on these chipsets, but the flush page was mapped uncached. As a result, the writes trying to clear the writeback cache ended up bypassing the cache, and not flushing anything! The wbinvd would flush out other writeback data and often cause the data we wanted to get flushed, but not always. By removing the setting of the page to UC and instead just clflushing the data we write to try to flush it, we get the desired behavior with no wbinvd. This exports clflush_cache_range(), which was laying around and happened to basically match the code I was otherwise going to copy from the DRM. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05x86: SGI UV: Fix IPI macrosJack Steiner
commit d2374aecda3f6c9b0d13287027132a37311da300 upstream. The UV BIOS has changed the way interrupt remapping is being done. This affects the id used for sending IPIs. The upper id bits no longer need to be masked off. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20090909154104.GA25083@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05xen: check EFER for NX before setting up GDT mappingJeremy Fitzhardinge
commit b75fe4e5b869f8dbebd36df64a7fcda0c5b318ed upstream. x86-64 assumes NX is available by default, so we need to explicitly check for it before using NX. Some first-generation Intel x86-64 processors didn't support NX, and even recent systems allow it to be disabled in BIOS. [ Impact: prevent Xen crash on NX-less 64-bit machines ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05xen: use stronger barrier after unlocking lockYang Xiaowei
commit 2496afbf1e50c70f80992656bcb730c8583ddac3 upstream. We need to have a stronger barrier between releasing the lock and checking for any waiting spinners. A compiler barrier is not sufficient because the CPU's ordering rules do not prevent the read xl->spinners from happening before the unlock assignment, as they are different memory locations. We need to have an explicit barrier to enforce the write-read ordering to different memory locations. Because of it, I can't bring up > 4 HVM guests on one SMP machine. [ Code and commit comments expanded -J ] [ Impact: avoid deadlock when using Xen PV spinlocks ] Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05xen: only enable interrupts while actually blocking for spinlockJeremy Fitzhardinge
commit 4d576b57b50a92801e6493e76e5243d6cff193d2 upstream. Where possible we enable interrupts while waiting for a spinlock to become free, in order to reduce big latency spikes in interrupt handling. However, at present if we manage to pick up the spinlock just before blocking, we'll end up holding the lock with interrupts enabled for a while. This will cause a deadlock if we recieve an interrupt in that window, and the interrupt handler tries to take the lock too. Solve this by shrinking the interrupt-enabled region to just around the blocking call. [ Impact: avoid race/deadlock when using Xen PV spinlocks ] Reported-by: "Yang, Xiaowei" <xiaowei.yang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05xen: make -fstack-protector work under XenJeremy Fitzhardinge
commit 577eebeae34d340685d8985dfdb7dfe337c511e8 upstream. -fstack-protector uses a special per-cpu "stack canary" value. gcc generates special code in each function to test the canary to make sure that the function's stack hasn't been overrun. On x86-64, this is simply an offset of %gs, which is the usual per-cpu base segment register, so setting it up simply requires loading %gs's base as normal. On i386, the stack protector segment is %gs (rather than the usual kernel percpu %fs segment register). This requires setting up the full kernel GDT and then loading %gs accordingly. We also need to make sure %gs is initialized when bringing up secondary cpus too. To keep things consistent, we do the full GDT/segment register setup on both architectures. Because we need to avoid -fstack-protected code before setting up the GDT and because there's no way to disable it on a per-function basis, several files need to have stack-protector inhibited. [ Impact: allow Xen booting with stack-protector enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05x86: Increase MIN_GAP to include randomized stackMichal Hocko
commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream. Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: VMX: Fix EPT with WP bit change during pagingSheng Yang
commit 95eb84a7588d7d7afd3096807efc052adc7479e1 upstream QNX update WP bit when paging enabled, which is not covered yet. This one fix QNX boot with EPT. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM: limit lapic periodic timer frequencyMarcelo Tosatti
commit 1444885a045fe3b1905a14ea1b52540bf556578b upstream. Otherwise its possible to starve the host by programming lapic timer with a very high frequency. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM: x86 emulator: fix jmp far decoding (opcode 0xea)Avi Kivity
commit ee3d29e8bee8d7c321279a9bd9bd25d4cfbf79b7 upstream. The jump target should not be sign extened; use an unsigned decode flag. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM: MMU: make __kvm_mmu_free_some_pages handle empty listIzik Eidus
commit 3b80fffe2b31fb716d3ebe729c54464ee7856723 upstream. First check if the list is empty before attempting to look at list entries. Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM: x86 emulator: Implement zero-extended immediate decodingAvi Kivity
commit c9eaf20f268c7051bfde2ba212c5ea76a6cbc7a1 upstream. Absolute jumps use zero extended immediate operands. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM: VMX: Fix cr8 exiting control clobbering by EPTGleb Natapov
commit 5fff7d270bd6a4759b6d663741b729cdee370257 upstream. Don't call adjust_vmx_controls() two times for the same control. It restores options that were dropped earlier. This loses us the cr8 exit control, which causes a massive performance regression Windows x64. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM: x86: Disallow hypercalls for guest callers in rings > 0Jan Kiszka
commit 07708c4af1346ab1521b26a202f438366b7bcffd upstream. So far unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall base in the future once required. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM guest: fix bogus wallclock physical address calculationGlauber Costa
commit a20316d2aa41a8f4fd171648bad8f044f6060826 upstream. The use of __pa() to calculate the address of a C-visible symbol is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h for details. It should be replaced with __pa_symbol(), that does the correct math here, by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor. Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM: VMX: Check cpl before emulating debug register accessAvi Kivity
commit 0a79b009525b160081d75cef5dbf45817956acf2 upstream. Debug registers may only be accessed from cpl 0. Unfortunately, vmx will code to emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24KVM guest: do not batch pte updates from interrupt contextMarcelo Tosatti
commit 6ba661787594868512a71c129062ebd57d0c01e7 upstream. Commit b8bcfe997e4 made paravirt pte updates synchronous in interrupt context. Unfortunately the KVM pv mmu code caches the lazy/nonlazy mode internally, so a pte update from interrupt context during a lazy mmu operation can be batched while it should be performed synchronously. https://bugzilla.redhat.com/show_bug.cgi?id=518022 Drop the internal mode variable and use paravirt_get_lazy_mode(), which returns the correct state. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24x86, pat: Fix cacheflush address in change_page_attr_set_clr()Jack Steiner
commit fa526d0d641b5365676a1fb821ce359e217c9b85 upstream. Fix address passed to cpa_flush_range() when changing page attributes from WB to UC. The address (*addr) is modified by __change_page_attr_set_clr(). The result is that the pages being flushed start at the _end_ of the changed range instead of the beginning. This should be considered for 2.6.30-stable and 2.6.31-stable. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24x86/i386: Make sure stack-protector segment base is cache alignedJeremy Fitzhardinge
commit 1ea0d14e480c245683927eecc03a70faf06e80c8 upstream. The Intel Optimization Reference Guide says: In Intel Atom microarchitecture, the address generation unit assumes that the segment base will be 0 by default. Non-zero segment base will cause load and store operations to experience a delay. - If the segment base isn't aligned to a cache line boundary, the max throughput of memory operations is reduced to one [e]very 9 cycles. [...] Assembly/Compiler Coding Rule 15. (H impact, ML generality) For Intel Atom processors, use segments with base set to 0 whenever possible; avoid non-zero segment base address that is not aligned to cache line boundary at all cost. We can't avoid having a non-zero base for the stack-protector segment, but we can make it cache-aligned. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4AA01893.6000507@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24x86: Fix x86_model test in es7000_apic_is_cluster()Roel Kluin
commit 005155b1f626d2b2d7932e4afdf4fead168c6888 upstream. For the x86_model to be greater than 6 or less than 12 is logically always true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24x86/amd-iommu: fix broken check in amd_iommu_flush_all_devicesJoerg Roedel
commit e0faf54ee82bf9c07f0307b4391caad4020bd659 upstream. The amd_iommu_pd_table is indexed by protection domain number and not by device id. So this check is broken and must be removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-26x86: Fix vSMP boot crashYinghai Lu
2.6.31-rc7 does not boot on vSMP systems: [ 8.501108] CPU31: Thermal monitoring enabled (TM1) [ 8.501127] CPU 31 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8 [ 8.650254] CPU31: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz stepping 04 [ 8.710324] Brought up 32 CPUs [ 8.713916] Total of 32 processors activated (162314.96 BogoMIPS). [ 8.721489] ERROR: parent span is not a superset of domain->span [ 8.727686] ERROR: domain->groups does not contain CPU0 [ 8.733091] ERROR: groups don't span domain->span [ 8.737975] ERROR: domain->cpu_power not set [ 8.742416] Ravikiran Thirumalai bisected it to: | commit 2759c3287de27266e06f1f4e82cbd2d65f6a044c | x86: don't call read_apic_id if !cpu_has_apic The problem is that on vSMP systems the CPUID derived initial-APICIDs are overlapping - so we need to fall back on hard_smp_processor_id() which reads the local APIC. Both come from the hardware (influenced by firmware though) so it's a tough call which one to trust. Doing the quirk expresses the vSMP property properly and also does not affect other systems, so we go for this solution instead of a revert. Reported-and-Tested-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shai Fultheim <shai@scalex86.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4A944D3C.5030100@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-25x86, xen: Initialize cx to suppress warningH. Peter Anvin
Initialize cx before calling xen_cpuid(), in order to suppress the "may be used uninitialized in this function" warning. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
2009-08-25x86, xen: Suppress WP test on XenJeremy Fitzhardinge
Xen always runs on CPUs which properly support WP enforcement in privileged mode, so there's no need to test for it. This also works around a crash reported by Arnd Hannemann, though I think its just a band-aid for that case. Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-25Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clockevent: Prevent dead lock on clockevents_lock timers: Drop write permission on /proc/timer_list
2009-08-25Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix build with older binutils and consolidate linker script x86: Fix an incorrect argument of reserve_bootmem() x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile xen: rearrange things to fix stackprotector x86: make sure load_percpu_segment has no stackprotector i386: Fix section mismatches for init code with !HOTPLUG_CPU x86, pat: Allow ISA memory range uncacheable mapping requests
2009-08-25x86: Fix build with older binutils and consolidate linker scriptJan Beulich
binutils prior to 2.17 can't deal with the currently possible situation of a new segment following the per-CPU segment, but that new segment being empty - objcopy misplaces the .bss (and perhaps also the .brk) sections outside of any segment. However, the current ordering of sections really just appears to be the effect of cumulative unrelated changes; re-ordering things allows to easily guarantee that the segment following the per-CPU one is non-empty, and at once eliminates the need for the bogus data.init2 segment. Once touching this code, also use the various data section helper macros from include/asm-generic/vmlinux.lds.h. -v2: fix !SMP builds. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: <sam@ravnborg.org> LKML-Reference: <4A94085D02000078000119A5@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-24x86: Fix an incorrect argument of reserve_bootmem()Amerigo Wang
This line looks suspicious, because if this is true, then the 'flags' parameter of function reserve_bootmem_generic() will be unused when !CONFIG_NUMA. I don't think this is what we want. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21x86: don't call '->send_IPI_mask()' with an empty maskLinus Torvalds
As noted in 83d349f35e1ae72268c5104dbf9ab2ae635425d4 ("x86: don't send an IPI to the empty set of CPU's"), some APIC's will be very unhappy with an empty destination mask. That commit added a WARN_ON() for that case, and avoided the resulting problem, but didn't fix the underlying reason for why those empty mask cases happened. This fixes that, by checking the result of 'cpumask_andnot()' of the current CPU actually has any other CPU's left in the set of CPU's to be sent a TLB flush, and not calling down to the IPI code if the mask is empty. The reason this started happening at all is that we started passing just the CPU mask pointers around in commit 4595f9620 ("x86: change flush_tlb_others to take a const struct cpumask"), and when we did that, the cpumask was no longer thread-local. Before that commit, flush_tlb_mm() used to create it's own copy of 'mm->cpu_vm_mask' and pass that copy down to the low-level flush routines after having tested that it was not empty. But after changing it to just pass down the CPU mask pointer, the lower level TLB flush routines would now get a pointer to that 'mm->cpu_vm_mask', and that could still change - and become empty - after the test due to other CPU's having flushed their own TLB's. See http://bugzilla.kernel.org/show_bug.cgi?id=13933 for details. Tested-by: Thomas Björnell <thomas.bjornell@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-21x86: don't send an IPI to the empty set of CPU'sLinus Torvalds
The default_send_IPI_mask_logical() function uses the "flat" APIC mode to send an IPI to a set of CPU's at once, but if that set happens to be empty, some older local APIC's will apparently be rather unhappy. So just warn if a caller gives us an empty mask, and ignore it. This fixes a regression in 2.6.30.x, due to commit 4595f9620 ("x86: change flush_tlb_others to take a const struct cpumask"), documented here: http://bugzilla.kernel.org/show_bug.cgi?id=13933 which causes a silent lock-up. It only seems to happen on PPro, P2, P3 and Athlon XP cores. Most developers sadly (or not so sadly, if you're a developer..) have more modern CPU's. Also, on x86-64 we don't use the flat APIC mode, so it would never trigger there even if the APIC didn't like sending an empty IPI mask. Reported-by: Pavel Vilim <wylda@volny.cz> Reported-and-tested-by: Thomas Björnell <thomas.bjornell@gmail.com> Reported-and-tested-by: Martin Rogge <marogge@onlinehome.de> Cc: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-20x86: add vmlinux.lds to targets in arch/x86/boot/compressed/MakefileJan Beulich
The absence of vmlinux.lds here keeps .vmlinux.lds.cmd from being included, which in turn leads to it and all its dependents always getting rebuilt independent of whether they are already up-to-date. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4A8D84670200007800010D31@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-20Merge branch 'bugfix' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent
2009-08-19xen: rearrange things to fix stackprotectorJeremy Fitzhardinge
Make sure the stack-protector segment registers are properly set up before calling any functions which may have stack-protection compiled into them. [ Impact: prevent Xen early-boot crash when stack-protector is enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-08-19x86: make sure load_percpu_segment has no stackprotectorJeremy Fitzhardinge
load_percpu_segment() is used to set up the per-cpu segment registers, which are also used for -fstack-protector. Make sure that the load_percpu_segment() function doesn't have stackprotector enabled. [ Impact: allow percpu setup before calling stack-protected functions ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-08-19clockevent: Prevent dead lock on clockevents_lockSuresh Siddha
Currently clockevents_notify() is called with interrupts enabled at some places and interrupts disabled at some other places. This results in a deadlock in this scenario. cpu A holds clockevents_lock in clockevents_notify() with irqs enabled cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled cpu C doing set_mtrr() which will try to rendezvous of all the cpus. This will result in C and A come to the rendezvous point and waiting for B. B is stuck forever waiting for the spinlock and thus not reaching the rendezvous point. Fix the clockevents code so that clockevents_lock is taken with interrupts disabled and thus avoid the above deadlock. Also call lapic_timer_propagate_broadcast() on the destination cpu so that we avoid calling smp_call_function() in the clockevents notifier chain. This issue left us wondering if we need to change the MTRR rendezvous logic to use stop machine logic (instead of smp_call_function) or add a check in spinlock debug code to see if there are other spinlocks which gets taken under both interrupts enabled/disabled conditions. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com> Cc: "Brown Len" <len.brown@intel.com> LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: use the right flag for get_vm_area() percpu, sparc64: fix sparse possible cpu map handling init: set nr_cpu_ids before setup_per_cpu_areas()
2009-08-18i386: Fix section mismatches for init code with !HOTPLUG_CPUJan Beulich
Commit 0e83815be719d3391bf5ea24b7fe696c07dbd417 changed the section the initial_code variable gets allocated in, in an attempt to address a section conflict warning. This, however created a new section conflict when building without HOTPLUG_CPU. The apparently only (reasonable) way to address this is to always use __REFDATA. Once at it, also fix a second section mismatch when not using HOTPLUG_CPU. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4A8AE7CD020000780001054B@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-17x86, pat: Allow ISA memory range uncacheable mapping requestsSuresh Siddha
Max Vozeler reported: > Bug 13877 - bogl-term broken with CONFIG_X86_PAT=y, works with =n > > strace of bogl-term: > 814 mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) > = -1 EAGAIN (Resource temporarily unavailable) > 814 write(2, "bogl: mmaping /dev/fb0: Resource temporarily unavailable\n", > 57) = 57 PAT code maps the ISA memory range as WB in the PAT attribute, so that fixed range MTRR registers define the actual memory type (UC/WC/WT etc). But the upper level is_new_memtype_allowed() API checks are failing, as the request here is for UC and the return tracked type is WB (Tracked type is WB as MTRR type for this legacy range potentially will be different for each 4k page). Fix is_new_memtype_allowed() by always succeeding the ISA address range checks, as the null PAT (WB) and def MTRR fixed range register settings satisfy the memory type needs of the applications that map the ISA address range. Reported-and-Tested-by: Max Vozeler <xam@debian.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-17x86, mce: Don't initialize MCEs on unknown CPUsIngo Molnar
An older test-box started hanging at the following point during bootup: [ 0.022996] Mount-cache hash table entries: 512 [ 0.024996] Initializing cgroup subsys debug [ 0.025996] Initializing cgroup subsys cpuacct [ 0.026995] Initializing cgroup subsys devices [ 0.027995] Initializing cgroup subsys freezer [ 0.028995] mce: CPU supports 5 MCE banks I've bisected it down to commit 4efc0670 ("x86, mce: use 64bit machine check code on 32bit"), which utilizes the MCE code on 32-bit systems too. The problem is caused by this detail in my config: # CONFIG_CPU_SUP_INTEL is not set This disables the quirks in mce_cpu_quirks() but still enables MCE support - which then hangs due to the missing quirk workaround needed on this CPU: if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0) mce_banks[0].init = 0; The safe solution is to not initialize MCEs if we dont know on what CPU we are running (or if that CPU's support code got disabled in the config). Also be a bit more defensive on 32-bit systems: dont do a boot-time dump of pending MCEs not just on the specific system that we found a problem with (Pentium-M), but earlier ones as well. Now this problem is probably not common and disabling CPU support is rare - but still being more defensive in something we turned on for a wide range of CPUs is prudent. Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> LKML-Reference: Message-ID: <4A88E3E4.40506@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-17x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUsBartlomiej Zolnierkiewicz
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog): MCE 0 HARDWARE ERROR. This is *NOT* a software problem! Please contact your hardware vendor CPU 0 BANK 1 MCG status: MCi status: Error overflow Uncorrected error Error enabled Processor context corrupt MCA: Data CACHE Level-1 UNKNOWN Error STATUS f200000000000195 MCGSTATUS 0 [ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error) and f200000000000115 (... READ Error). To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump content of STATUS MSR before it is cleared during initialization. ] Since the bogus MCE results in a kernel taint (which in turn disables lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs by default ("mce=bootlog" boot parameter can be be used to get the old behavior). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-16x86: Annotate section mismatch warnings in kernel/apic/x2apic_uv_x.cLeonardo Potenza
The function uv_acpi_madt_oem_check() has been marked __init, the struct apic_x2apic_uv_x has been marked __refdata. The aim is to address the following section mismatch messages: WARNING: arch/x86/kernel/apic/built-in.o(.data+0x1368): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary() The variable apic_x2apic_uv_x references the function __cpuinit uv_wakeup_secondary() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: *driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console, WARNING: arch/x86/kernel/built-in.o(.data+0x68e8): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary() The variable apic_x2apic_uv_x references the function __cpuinit uv_wakeup_secondary() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: *driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console, WARNING: arch/x86/built-in.o(.text+0x7b36f): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_ioremap() The function uv_acpi_madt_oem_check() references the function __init early_ioremap(). This is often because uv_acpi_madt_oem_check lacks a __init annotation or the annotation of early_ioremap is wrong. WARNING: arch/x86/built-in.o(.text+0x7b38d): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_iounmap() The function uv_acpi_madt_oem_check() references the function __init early_iounmap(). This is often because uv_acpi_madt_oem_check lacks a __init annotation or the annotation of early_iounmap is wrong. WARNING: arch/x86/built-in.o(.data+0x8668): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary() The variable apic_x2apic_uv_x references the function __cpuinit uv_wakeup_secondary() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: *driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console, Signed-off-by: Leonardo Potenza <lpotenza@inwind.it> LKML-Reference: <200908161855.48302.lpotenza@inwind.it> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-16x86, mce: therm_throt: Don't log redundant normalityHugh Dickins
0d01f31439c1e4d602bf9fdc924ab66f407f5e38 "x86, mce: therm_throt - change when we print messages" removed redundant announcements of "Temperature/speed normal". They're not worth logging and remove their accompanying "Machine check events logged" messages as well from the console. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dmitry Torokhov <dtor@mail.ru> LKML-Reference: <Pine.LNX.4.64.0908161544100.7929@sister.anvils> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-15x86: Fix UV BAU destination subnode idCliff Wickman
The SGI UV Broadcast Assist Unit is used to send TLB shootdown messages to remote nodes of the system. The header of the message must contain the subnode id of the block in the receiving hub that handles such messages. It should always be 0x10, the id of the "LB" block. It had previously been documented as a "must be zero" field. Signed-off-by: Cliff Wickman <cpw@sgi.com> Acked-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <E1Mc1x7-0005Ce-6t@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-14percpu, sparc64: fix sparse possible cpu map handlingTejun Heo
percpu code has been assuming num_possible_cpus() == nr_cpu_ids which is incorrect if cpu_possible_map contains holes. This causes percpu code to access beyond allocated memories and vmalloc areas. On a sparc64 machine with cpus 0 and 2 (u60), this triggers the following warning or fails boot. WARNING: at /devel/tj/os/work/mm/vmalloc.c:106 vmap_page_range_noflush+0x1f0/0x240() Modules linked in: Call Trace: [00000000004b17d0] vmap_page_range_noflush+0x1f0/0x240 [00000000004b1840] map_vm_area+0x20/0x60 [00000000004b1950] __vmalloc_area_node+0xd0/0x160 [0000000000593434] deflate_init+0x14/0xe0 [0000000000583b94] __crypto_alloc_tfm+0xd4/0x1e0 [00000000005844f0] crypto_alloc_base+0x50/0xa0 [000000000058b898] alg_test_comp+0x18/0x80 [000000000058dad4] alg_test+0x54/0x180 [000000000058af00] cryptomgr_test+0x40/0x60 [0000000000473098] kthread+0x58/0x80 [000000000042b590] kernel_thread+0x30/0x60 [0000000000472fd0] kthreadd+0xf0/0x160 ---[ end trace 429b268a213317ba ]--- This patch fixes generic percpu functions and sparc64 setup_per_cpu_areas() so that they handle sparse cpu_possible_map properly. Please note that on x86, cpu_possible_map() doesn't contain holes and thus num_possible_cpus() == nr_cpu_ids and this patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu>