summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2015-09-21ACPI, PCI: Penalize legacy IRQ used by ACPI SCIJiang Liu
commit 5d0ddfebb93069061880fc57ee4ba7246bd1e1ee upstream. Nick Meier reported a regression with HyperV that " After rebooting the VM, the following messages are logged in syslog when trying to load the tulip driver: tulip: Linux Tulip drivers version 1.1.15 (Feb 27, 2007) tulip: 0000:00:0a.0: PCI INT A: failed to register GSI tulip: Cannot enable tulip board #0, aborting tulip: probe of 0000:00:0a.0 failed with error -16 Errors occur in 3.19.0 kernel Works in 3.17 kernel. " According to the ACPI dump file posted by Nick at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072 The ACPI MADT table includes an interrupt source overridden entry for ACPI SCI: [236h 0566 1] Subtable Type : 02 <Interrupt Source Override> [237h 0567 1] Length : 0A [238h 0568 1] Bus : 00 [239h 0569 1] Source : 09 [23Ah 0570 4] Interrupt : 00000009 [23Eh 0574 2] Flags (decoded below) : 000D Polarity : 1 Trigger Mode : 3 And in DSDT table, we have _PRT method to define PCI interrupts, which eventually goes to: Name (PRSA, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) Name (PRSB, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) Name (PRSC, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) Name (PRSD, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) According to the MADT and DSDT tables, IRQ 9 may be used for: 1) ACPI SCI in level, high mode 2) PCI legacy IRQ in level, low mode So there's a conflict in polarity setting for IRQ 9. Prior to commit cd68f6bd53cf ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI"), ACPI SCI is handled specially and there's no check for conflicts between ACPI SCI and PCI legagy IRQ. And it seems that the HyperV hypervisor doesn't make use of the polarity configuration in IOAPIC entry, so it just works. Commit cd68f6bd53cf gets rid of the specially handling of ACPI SCI, and then the pin attribute checking code discloses the conflicts between ACPI SCI and PCI legacy IRQ on HyperV virtual machine, and rejects the request to assign IRQ9 to PCI devices. So penalize legacy IRQ used by ACPI SCI and mark it unusable if ACPI SCI attributes conflict with PCI IRQ attributes. Please refer to following links for more information: https://bugzilla.kernel.org/show_bug.cgi?id=101301 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072 Fixes: cd68f6bd53cf ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI") Reported-and-tested-by: Nick Meier <nmeier@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21x86/mce: Reenable CMCI banks when swiching back to interrupt modeXie XiuQi
commit 1b48465500611a2dc5e75800c61ac352e22d41c3 upstream. Zhang Liguang reported the following issue: 1) System detects a CMCI storm on the current CPU. 2) Kernel disables the CMCI interrupt on banks owned by the current CPU and switches to poll mode 3) After the CMCI storm subsides, kernel switches back to interrupt mode 4) We expect the system to reenable the CMCI interrupt on banks owned by the current CPU mce_intel_adjust_timer |-> cmci_reenable |-> cmci_discover # owned banks are ignored here static void cmci_discover(int banks) ... for (i = 0; i < banks; i++) { ... if (test_bit(i, owned)) # ownd banks is ignore here continue; So convert cmci_storm_disable_banks() to cmci_toggle_interrupt_mode() which controls whether to enable or disable CMCI interrupts with its argument. NB: We cannot clear the owned bit because the banks won't be polled, otherwise. See: 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms") for more info. Reported-by: Zhang Liguang <zhangliguang@huawei.com> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: huawei.libin@huawei.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: rui.xiang@huawei.com Link: http://lkml.kernel.org/r/1439396985-12812-10-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUSTHaozhong Zhang
commit d7add05458084a5e3d65925764a02ca9c8202c1e upstream. When kvm_set_msr_common() handles a guest's write to MSR_IA32_TSC_ADJUST, it will calcuate an adjustment based on the data written by guest and then use it to adjust TSC offset by calling a call-back adjust_tsc_offset(). The 3rd parameter of adjust_tsc_offset() indicates whether the adjustment is in host TSC cycles or in guest TSC cycles. If SVM TSC scaling is enabled, adjust_tsc_offset() [i.e. svm_adjust_tsc_offset()] will first scale the adjustment; otherwise, it will just use the unscaled one. As the MSR write here comes from the guest, the adjustment is in guest TSC cycles. However, the current kvm_set_msr_common() uses it as a value in host TSC cycles (by using true as the 3rd parameter of adjust_tsc_offset()), which can result in an incorrect adjustment of TSC offset if SVM TSC scaling is enabled. This patch fixes this problem. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21KVM: MMU: fix validation of mmio page faultXiao Guangrong
commit 6f691251c0350ac52a007c54bf3ef62e9d8cdc5e upstream. We got the bug that qemu complained with "KVM: unknown exit, hardware reason 31" and KVM shown these info: [84245.284948] EPT: Misconfiguration. [84245.285056] EPT: GPA: 0xfeda848 [84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4 [84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3 [84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2 [84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1 This is because we got a mmio #PF and the handler see the mmio spte becomes normal (points to the ram page) However, this is valid after introducing fast mmio spte invalidation which increases the generation-number instead of zapping mmio sptes, a example is as follows: 1. QEMU drops mmio region by adding a new memslot 2. invalidate all mmio sptes 3. VCPU 0 VCPU 1 access the invalid mmio spte access the region originally was MMIO before set the spte to the normal ram map mmio #PF check the spte and see it becomes normal ram mapping !!! This patch fixes the bug just by dropping the check in mmio handler, it's good for backport. Full check will be introduced in later patches Reported-by: Pavel Shirshov <ru.pchel@gmail.com> Tested-by: Pavel Shirshov <ru.pchel@gmail.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21crypto: ghash-clmulni: specify context size for ghash async algorithmAndrey Ryabinin
commit 71c6da846be478a61556717ef1ee1cea91f5d6a8 upstream. Currently context size (cra_ctxsize) doesn't specified for ghash_async_alg. Which means it's zero. Thus crypto_create_tfm() doesn't allocate needed space for ghash_async_ctx, so any read/write to ctx (e.g. in ghash_async_init_tfm()) is not valid. Signed-off-by: Andrey Ryabinin <aryabinin@odin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21x86/ldt: Further fix FPU emulationAndy Lutomirski
commit 12e244f4b550498bbaf654a52f93633f7dde2dc7 upstream. The previous fix confused a selector with a segment prefix. Fix it. Compile-tested only. Cc: Juergen Gross <jgross@suse.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 4809146b86c3 ("x86/ldt: Correct FPU emulation access to LDT") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21x86/ldt: Correct FPU emulation access to LDTJuergen Gross
commit 4809146b86c3d41ce588fdb767d021e2a80600dd upstream. Commit 37868fe113ff ("x86/ldt: Make modify_ldt synchronous") introduced a new struct ldt_struct anchored at mm->context.ldt. Adapt the x86 fpu emulation code to use that new structure. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: billm@melbpc.org.au Link: http://lkml.kernel.org/r/1438883674-1240-1-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21x86/ldt: Correct LDT access in single stepping logicJuergen Gross
commit 136d9d83c07c5e30ac49fc83b27e8c4842f108fc upstream. Commit 37868fe113ff ("x86/ldt: Make modify_ldt synchronous") introduced a new struct ldt_struct anchored at mm->context.ldt. convert_ip_to_linear() was changed to reflect this, but indexing into the ldt has to be changed as the pointer is no longer void *. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Link: http://lkml.kernel.org/r/1438848278-12906-1-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21x86/ldt: Make modify_ldt synchronousAndy Lutomirski
commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream. modify_ldt() has questionable locking and does not synchronize threads. Improve it: redesign the locking and synchronize all threads' LDTs using an IPI on all modifications. This will dramatically slow down modify_ldt in multithreaded programs, but there shouldn't be any multithreaded programs that care about modify_ldt's performance in the first place. This fixes some fallout from the CVE-2015-5157 fixes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13x86/idle: Restore trace_cpu_idle to mwait_idle() callsJisheng Zhang
commit e43d0189ac02415fe4487f79fc35e8f147e9ea0d upstream. Commit b253149b843f ("sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance") restores mwait_idle(), but the trace_cpu_idle related calls are missing. This causes powertop on my old desktop powered by Intel Core2 E6550 to report zero wakeups and zero events. Add them back to restore the proper behaviour. Fixes: b253149b843f ("sched/idle/x86: Restore mwait_idle() to ...") Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Cc: <len.brown@intel.com> Link: http://lkml.kernel.org/r/1440046479-4262-1-git-send-email-jszhang@marvell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13x86/apic: Fix fallout from x2apic cleanupThomas Gleixner
commit a57e456a7b28431b55e407e5ab78ebd5b378d19e upstream. In the recent x2apic cleanup I got two things really wrong: 1) The safety check in __disable_x2apic which allows the function to be called unconditionally is backwards. The check is there to prevent access to the apic MSR in case that the machine has no apic. Though right now it returns if the machine has an apic and therefor the disabling of x2apic is never invoked. 2) x2apic_disable() sets x2apic_mode to 0 after registering the local apic. That's wrong, because register_lapic_address() checks x2apic mode and therefor takes the wrong code path. This results in boot failures on machines with x2apic preenabled by BIOS and can also lead to an fatal MSR access on machines without apic. The solutions are simple: 1) Correct the sanity check for apic availability 2) Clear x2apic_mode _before_ calling register_lapic_address() Fixes: 659006bf3ae3 'x86/x2apic: Split enable and setup function' Reported-and-tested-by: Javier Monteagudo <javiermon@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764 Cc: Laura Abbott <labbott@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13x86/xen: make CONFIG_XEN depend on CONFIG_X86_LOCAL_APICDavid Vrabel
commit 87ffd2b9bb74061c120f450e4d0f3409bb603ae0 upstream. Since commit feb44f1f7a4ac299d1ab1c3606860e70b9b89d69 (x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs) Xen guests need a full APIC driver and thus should depend on X86_LOCAL_APIC. This fixes an i386 build failure with !SMP && !CONFIG_X86_UP_APIC by disabling Xen support in this configuration. Users needing Xen support in a non-SMP i386 kernel will need to enable CONFIG_X86_UP_APIC. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13Revert x86 sigcontext cleanupsLinus Torvalds
commit ed596cde9425509ec6ce88e19f03e9b13b6f518b upstream. This reverts commits 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs' from sigcontext") and c6f2062935c8 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs"). They were cleanups, but they break dosemu by changing the signal return behavior (and removing 'fs' and 'gs' from the sigcontext struct - while not actually changing any behavior - causes build problems). Reported-and-tested-by: Stas Sergeev <stsp@list.ru> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13x86/xen: build "Xen PV" APIC driver for domU as wellJason A. Donenfeld
commit fc5fee86bdd3d720e2d1d324e4fae0c35845fa63 upstream. It turns out that a PV domU also requires the "Xen PV" APIC driver. Otherwise, the flat driver is used and we get stuck in busy loops that never exit, such as in this stack trace: (gdb) target remote localhost:9999 Remote debugging using localhost:9999 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY) (gdb) bt #0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 #1 __default_send_IPI_shortcut (shortcut=<optimized out>, dest=<optimized out>, vector=<optimized out>) at ./arch/x86/include/asm/ipi.h:75 #2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54 #3 0xffffffff81011336 in arch_irq_work_raise () at arch/x86/kernel/irq_work.c:47 #4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at kernel/irq_work.c:100 #5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633 #6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:1778 #7 0xffffffff816010c8 in printk (fmt=<optimized out>) at kernel/printk/printk.c:1868 #8 0xffffffffc00013ea in ?? () #9 0x0000000000000000 in ?? () Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16kvm: x86: fix kvm_apic_has_events to check for NULL pointerPaolo Bonzini
commit ce40cd3fc7fa40a6119e5fe6c0f2bc0eb4541009 upstream. Malicious (or egregiously buggy) userspace can trigger it, but it should never happen in normal operation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/xen: Probe target addresses in set_aliased_prot() before the hypercallAndy Lutomirski
commit aa1acff356bbedfd03b544051f5b371746735d89 upstream. The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space. While we're at it, add comments to help explain what's going on. This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <dvrabel@cantab.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detectionAndy Lutomirski
commit 810bc075f78ff2c221536eb3008eac6a492dba2d upstream. We have a tricky bug in the nested NMI code: if we see RSP pointing to the NMI stack on NMI entry from kernel mode, we assume that we are executing a nested NMI. This isn't quite true. A malicious userspace program can point RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to happen while RSP is still pointing at the NMI stack. Fix it with a sneaky trick. Set DF in the region of code that the RSP check is intended to detect. IRET will clear DF atomically. ( Note: other than paravirt, there's little need for all this complexity. We could check RIP instead of RSP. ) Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/nmi/64: Reorder nested NMI checksAndy Lutomirski
commit a27507ca2d796cfa8d907de31ad730359c8a6d06 upstream. Check the repeat_nmi .. end_repeat_nmi special case first. The next patch will rework the RSP check and, as a side effect, the RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so we'll need this ordering of the checks. Note: this is more subtle than it appears. The check for repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code instead of adjusting the "iret" frame to force a repeat. This is necessary, because the code between repeat_nmi and end_repeat_nmi sets "NMI executing" and then writes to the "iret" frame itself. If a nested NMI comes in and modifies the "iret" frame while repeat_nmi is also modifying it, we'll end up with garbage. The old code got this right, as does the new code, but the new code is a bit more explicit. If we were to move the check right after the "NMI executing" check, then we'd get it wrong and have random crashes. ( Because the "NMI executing" check would jump to the code that would modify the "iret" frame without checking if the interrupted NMI was currently modifying it. ) Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/nmi/64: Improve nested NMI commentsAndy Lutomirski
commit 0b22930ebad563ae97ff3f8d7b9f12060b4c6e6b upstream. I found the nested NMI documentation to be difficult to follow. Improve the comments. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/nmi/64: Switch stacks on userspace NMI entryAndy Lutomirski
commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream. Returning to userspace is tricky: IRET can fail, and ESPFIX can rearrange the stack prior to IRET. The NMI nesting fixup relies on a precise stack layout and atomic IRET. Rather than trying to teach the NMI nesting fixup to handle ESPFIX and failed IRET, punt: run NMIs that came from user mode on the normal kernel stack. This will make some nested NMIs visible to C code, but the C code is okay with that. As a side effect, this should speed up perf: it eliminates an RDMSR when NMIs come from user mode. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/nmi/64: Remove asm code that saves CR2Andy Lutomirski
commit 0e181bb58143cb4a2e8f01c281b0816cd0e4798e upstream. Now that do_nmi saves CR2, we don't need to save it in asm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/nmi: Enable nested do_nmi() handling for 64-bit kernelsAndy Lutomirski
commit 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream. 32-bit kernels handle nested NMIs in C. Enable the exact same handling on 64-bit kernels as well. This isn't currently necessary, but it will become necessary once the asm code starts allowing limited nesting. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16x86/asm/entry/64: Remove pointless jump to irq_returnAndy Lutomirski
commit 5ca6f70f387b4f82903037cc3c5488e2c97dcdbc upstream. INTERRUPT_RETURN turns into a jmp instruction. There's no need for extra indirection. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2f2318653dbad284a59311f13f08cea71298fd7c.1433449436.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10perf/x86/intel/cqm: Return cached counter value from IRQ contextMatt Fleming
commit 2c534c0da0a68418693e10ce1c4146e085f39518 upstream. Peter reported the following potential crash which I was able to reproduce with his test program, [ 148.765788] ------------[ cut here ]------------ [ 148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260() [ 148.765797] Modules linked in: [ 148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4 [ 148.765803] ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007 [ 148.765805] 0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000 [ 148.765807] ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640 [ 148.765809] Call Trace: [ 148.765810] <NMI> [<ffffffff818bdfd5>] dump_stack+0x45/0x57 [ 148.765818] [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0 [ 148.765822] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765824] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765825] [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20 [ 148.765827] [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260 [ 148.765829] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765831] [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60 [ 148.765832] [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0 [ 148.765836] [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400 [ 148.765839] [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590 [ 148.765840] [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380 [ 148.765841] [<ffffffff811d3497>] perf_event_output+0x47/0x60 [ 148.765843] [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240 [ 148.765844] [<ffffffff811d4124>] perf_event_overflow+0x14/0x20 [ 148.765847] [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440 [ 148.765849] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765853] [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0 [ 148.765854] [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20 [ 148.765859] [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0 [ 148.765863] [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30 [ 148.765865] [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20 [ 148.765869] [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40 [ 148.765872] [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80 [ 148.765875] [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40 [ 148.765877] [<ffffffff81063ed9>] nmi_handle+0x79/0x100 [ 148.765879] [<ffffffff81064422>] default_do_nmi+0x42/0x100 [ 148.765880] [<ffffffff81064563>] do_nmi+0x83/0xb0 [ 148.765884] [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e [ 148.765886] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765888] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765890] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765891] <<EOE>> [<ffffffff8110ab66>] finish_task_switch+0x156/0x210 [ 148.765898] [<ffffffff818c1671>] __schedule+0x341/0x920 [ 148.765899] [<ffffffff818c1c87>] schedule+0x37/0x80 [ 148.765903] [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80 [ 148.765905] [<ffffffff818c1f4a>] schedule_user+0x1a/0x50 [ 148.765907] [<ffffffff818c666c>] retint_careful+0x14/0x32 [ 148.765908] ---[ end trace e33ff2be78e14901 ]--- The CQM task events are not safe to be called from within interrupt context because they require performing an IPI to read the counter value on all sockets. And performing IPIs from within IRQ context is a "no-no". Make do with the last read counter value currently event in event->count when we're invoked in this context. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10x86/efi: Use all 64 bit of efi_memmap in setup_e820()Dmitry Skorodumov
commit 7cc03e48965453b5df1cce5062c826189b04b960 upstream. The efi_info structure stores low 32 bits of memory map in efi_memmap and high 32 bits in efi_memmap_hi. While constructing pointer in the setup_e820(), need to take into account all 64 bit of the pointer. It is because on 64bit machine the function efi_get_memory_map() may return full 64bit pointer and before the patch that pointer was truncated. The issue is triggered on Parallles virtual machine and fixed with this patch. Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com> Cc: Denis V. Lunev <den@openvz.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10efi: Check for NULL efi kernel parametersRicardo Neri
commit 9115c7589b11349a1c3099758b4bded579ff69e0 upstream. Even though it is documented how to specifiy efi parameters, it is possible to cause a kernel panic due to a dereference of a NULL pointer when parsing such parameters if "efi" alone is given: PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450 [ 0.000000] ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8 [ 0.000000] 0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520 [ 0.000000] 0000000000000069 000000000000005f 0000000000000000 0000000000000000 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff8184bb0f>] dump_stack+0x45/0x57 [ 0.000000] [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae [ 0.000000] [<ffffffff812fb361>] ? parse_option_str+0x11/0x90 [ 0.000000] [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42 [ 0.000000] [<ffffffff81f376e1>] do_early_param+0x50/0x8a [ 0.000000] [<ffffffff8106b1b3>] parse_args+0x1e3/0x400 [ 0.000000] [<ffffffff81f37a43>] parse_early_options+0x24/0x28 [ 0.000000] [<ffffffff81f37691>] ? loglevel+0x31/0x31 [ 0.000000] [<ffffffff81f37a78>] parse_early_param+0x31/0x3d [ 0.000000] [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08 [ 0.000000] [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20 [ 0.000000] [<ffffffff81f37b20>] start_kernel+0x90/0x423 [ 0.000000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c [ 0.000000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef [ 0.000000] RIP 0xffffffff81ba2efc This panic is not reproducible with "efi=" as this will result in a non-NULL zero-length string. Thus, verify that the pointer to the parameter string is not NULL. This is consistent with other parameter-parsing functions which check for NULL pointers. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10x86/mm: Add parenthesis for TLB tracepoint size calculationDave Hansen
commit bbc03778b9954a2ec93baed63718e4df0192f130 upstream. flush_tlb_info->flush_start/end are both normal virtual addresses. When calculating 'nr_pages' (only used for the tracepoint), I neglected to put parenthesis in. Thanks to David Koufaty for pointing this out. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@sr71.net Link: http://lkml.kernel.org/r/20150720230153.9E834081@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10x86, perf: Fix static_key bug in load_mm_cr4()Peter Zijlstra
commit a833581e372a4adae2319d8dc379493edbc444e9 upstream. Mikulas reported his K6-3 not booting. This is because the static_key API confusion struck and bit Andy, this wants to be static_key_false(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Vince Weaver <vince@deater.net> Cc: hillf.zj <hillf.zj@alibaba-inc.com> Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks") Link: http://lkml.kernel.org/r/20150709172338.GC19282@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10x86/kasan: Fix boot crash on AMD processorsAndrey Ryabinin
commit d4f86beacc21d538dc41e1fc75a22e084f547edf upstream. While populating zero shadow wrong bits in upper level page tables used. __PAGE_KERNEL_RO that was used for pgd/pud/pmd has _PAGE_BIT_GLOBAL set. Global bit is present only in the lowest level of the page translation hierarchy (ptes), and it should be zero in upper levels. This bug seems doesn't cause any troubles on Intel cpus, while on AMDs it cause kernel crash on boot. Use _KERNPG_TABLE bits for pgds/puds/pmds to fix this. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-5-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10x86/kasan: Flush TLBs after switching CR3Andrey Ryabinin
commit 241d2c54c62fa0939fc9a9512b48ac3434e90a89 upstream. load_cr3() doesn't cause tlb_flush if PGE enabled. This may cause tons of false positive reports spamming the kernel to death. To fix this __flush_tlb_all() should be called explicitly after CR3 changed. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-4-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10x86/kasan: Fix KASAN shadow region page tablesAlexander Popov
commit 5d5aa3cfca5cf74cd928daf3674642e6004328d1 upstream. Currently KASAN shadow region page tables created without respect of physical offset (phys_base). This causes kernel halt when phys_base is not zero. So let's initialize KASAN shadow region page tables in kasan_early_init() using __pa_nodebug() which considers phys_base. This patch also separates x86_64_start_kernel() from KASAN low level details by moving kasan_map_early_shadow(init_level4_pgt) into kasan_early_init(). Remove the comment before clear_bss() which stopped bringing much profit to the code readability. Otherwise describing all the new order dependencies would be too verbose. Signed-off-by: Alexander Popov <alpopov@ptsecurity.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10x86/init: Clear 'init_level4_pgt' earlierAndrey Ryabinin
commit d0f77d4d04b222a817925d33ba3589b190bfa863 upstream. Currently x86_64_start_kernel() has two KASAN related function calls. The first call maps shadow to early_level4_pgt, the second maps shadow to init_level4_pgt. If we move clear_page(init_level4_pgt) earlier, we could hide KASAN low level detail from generic x86_64 initialization code. The next patch will do it. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-2-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03x86/mpx: Do not set ->vm_ops on MPX VMAsKirill A. Shutemov
commit a89652769470d12cd484ee3d3f7bde0742be8d96 upstream. MPX setups private anonymous mapping, but uses vma->vm_ops too. This can confuse core VM, as it relies on vm->vm_ops to distinguish file VMAs from anonymous. As result we will get SIGBUS, because handle_pte_fault() thinks it's file VMA without vm_ops->fault and it doesn't know how to handle the situation properly. Let's fix that by not setting ->vm_ops. We don't really need ->vm_ops here: MPX VMA can be detected with VM_MPX flag. And vma_merge() will not merge MPX VMA with non-MPX VMA, because ->vm_flags won't match. The only thing left is name of VMA. I'm not sure if it's part of ABI, or we can just drop it. The patch keep it by providing arch_vma_name() on x86. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@sr71.net Link: http://lkml.kernel.org/r/20150720212958.305CC3E9@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10KVM: x86: make vapics_in_nmi_mode atomicRadim Krčmář
commit 42720138b06301cc8a7ee8a495a6d021c4b6a9bc upstream. Writes were a bit racy, but hard to turn into a bug at the same time. (Particularly because modern Linux doesn't use this feature anymore.) Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> [Actually the next patch makes it much, much easier to trigger the race so I'm including this one for stable@ as well. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10KVM: x86: properly restore LVT0Radim Krčmář
commit db1385624c686fe99fe2d1b61a36e1537b915d08 upstream. Legacy NMI watchdog didn't work after migration/resume, because vapics_in_nmi_mode was left at 0. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237ABjorn Helgaas
commit 1dace0116d0b05c967d94644fc4dfe96be2ecd3d upstream. The Foxconn K8M890-8237A has two PCI host bridges, and we can't assign resources correctly without the information from _CRS that tells us which address ranges are claimed by which bridge. In the bugs mentioned below, we incorrectly assign a sound card address (this example is from 1033299): bus: 00 index 2 [mem 0x80000000-0xfcffffffff] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f]) pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xbfefffff] (ignored) pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xdfffffff] (ignored) pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfebfffff] (ignored) ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff]) pci_root PNP0A08:01: host bridge window [mem 0xbff00000-0xbfffffff] (ignored) pci 0000:80:01.0: [1106:3288] type 0 class 0x000403 pci 0000:80:01.0: reg 10: [mem 0xbfffc000-0xbfffffff 64bit] pci 0000:80:01.0: address space collision: [mem 0xbfffc000-0xbfffffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff] pci 0000:80:01.0: BAR 0: assigned [mem 0xfd00000000-0xfd00003fff 64bit] BUG: unable to handle kernel paging request at ffffc90000378000 IP: [<ffffffffa0345f63>] azx_create+0x37c/0x822 [snd_hda_intel] We assigned 0xfd_0000_0000, but that is not in any of the host bridge windows, and the sound card doesn't work. Turn on pci=use_crs automatically for this system. Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/931368 Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1033299 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10x86/PCI: Use host bridge _CRS info on systems with >32 bit addressingBjorn Helgaas
commit 3d9fecf6bfb8b12bc2f9a4c7109895a2a2bb9436 upstream. We enable _CRS on all systems from 2008 and later. On older systems, we ignore _CRS and assume the whole physical address space (excluding RAM and other devices) is available for PCI devices, but on systems that support physical address spaces larger than 4GB, it's doubtful that the area above 4GB is really available for PCI. After d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible"), we try to use that space above 4GB *first*, so we're more likely to put a device there. On Juan's Toshiba Satellite Pro U200, BIOS left the graphics, sound, 1394, and card reader devices unassigned (but only after Windows had been booted). Only the sound device had a 64-bit BAR, so it was the only device placed above 4GB, and hence the only device that didn't work. Keep _CRS enabled even on pre-2008 systems if they support physical address space larger than 4GB. Fixes: d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible") Reported-and-tested-by: Juan Dayer <jdayer@outlook.com> Reported-and-tested-by: Alan Horsfield <alan@hazelgarth.co.uk> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99221 Link: https://bugzilla.opensuse.org/show_bug.cgi?id=907092 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10KVM: nSVM: Check for NRIPS support before updating control fieldBandan Das
commit f104765b4f81fd74d69e0eb161e89096deade2db upstream. If hardware doesn't support DecodeAssist - a feature that provides more information about the intercept in the VMCB, KVM decodes the instruction and then updates the next_rip vmcb control field. However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS]. Since skip_emulated_instruction() doesn't verify nrip support before accepting control.next_rip as valid, avoid writing this field if support isn't present. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-29perf/x86: Honor the architectural performance monitoring versionPalik, Imre
commit 2c33645d366d13b969d936b68b9f4875b1fdddea upstream. Architectural performance monitoring, version 1, doesn't support fixed counters. Currently, even if a hypervisor advertises support for architectural performance monitoring version 1, perf may still try to use the fixed counters, as the constraints are set up based on the CPU model. This patch ensures that perf honors the architectural performance monitoring version returned by CPUID, and it only uses the fixed counters for version 2 and above. (Some of the ideas in this patch came from Peter Zijlstra.) Signed-off-by: Imre Palik <imrep@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Anthony Liguori <aliguori@amazon.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433767609-1039-1-git-send-email-imrep.amz@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-29perf/x86/intel/bts: Fix DS area sharing with x86_pmu eventsAlexander Shishkin
commit 6b099d9b040b0f3d0aec05b560d7caf879af5077 upstream. Currently, the intel_bts driver relies on the DS area allocated by the x86_pmu code in its event_init() path, which is a bug: creating a BTS event while no x86_pmu events are present results in a NULL pointer dereference. The same DS area is also used by PEBS sampling, which makes it quite a bit trickier to have a separate one for intel_bts' purposes. This patch makes intel_bts driver use the same DS allocation and reference counting code as x86_pmu to make sure it is always present when either intel_bts or x86_pmu need it. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Link: http://lkml.kernel.org/r/1434024837-9916-2-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-29perf/x86: Add more Broadwell model numbersAndi Kleen
commit 4b36f1a4139c9284df74c0f5d7655603d67807df upstream. This patch adds additional model numbers for Broadwell to perf. Support for Broadwell with Iris Pro (Intel Core i7-57xxC) and support for Broadwell Server Xeon. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434055942-28253-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-29x86/boot: Fix overflow warning with 32-bit binutilsBorislav Petkov
commit 04c17341b42699a5859a8afa05e64ba08a4e5235 upstream. When building the kernel with 32-bit binutils built with support only for the i386 target, we get the following warning: arch/x86/kernel/head_32.S:66: Warning: shift count out of range (32 is not between 0 and 31) The problem is that in that case, binutils' internal type representation is 32-bit wide and the shift range overflows. In order to fix this, manipulate the shift expression which creates the 4GiB constant to not overflow the shift count. Suggested-by: Michael Matz <matz@suse.de> Reported-and-tested-by: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-17Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm bugfix from Marcelo Tosatti: "Rrestore APIC migration functionality" * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix lapic.timer_mode on restore
2015-06-16KVM: x86: fix lapic.timer_mode on restoreRadim Krčmář
lapic.timer_mode was not properly initialized after migration, which broke few useful things, like login, by making every sleep eternal. Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore. There are other slowpaths that update lvtt, so this patch makes sure something similar doesn't happen again by calling apic_update_lvtt after every modification. Cc: stable@vger.kernel.org Fixes: f30ebc312ca9 ("KVM: x86: optimize some accesses to LVTT and SPIV") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-06-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A regression fix for a crash, and a Intel HSW uncore PMU driver fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization" perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP
2015-06-10arch/x86/kvm/mmu.c: work around gcc-4.4.4 bugAndrew Morton
Fix this compile issue with gcc-4.4.4: arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write': arch/x86/kvm/mmu.c:4256: error: unknown field 'cr0_wp' specified in initializer arch/x86/kvm/mmu.c:4257: error: unknown field 'cr4_pae' specified in initializer arch/x86/kvm/mmu.c:4257: warning: excess elements in union initializer ... gcc-4.4.4 (at least) has issues when using anonymous unions in initializers. Fixes: edc90b7dc4ceef6 ("KVM: MMU: fix SMAP virtualization") Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-09Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver ↵Ingo Molnar
initialization" This reverts commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc. Vince Weaver reported the following crash while perf fuzzing: [ 79.473121] kernel BUG at mm/vmalloc.c:1335! [ 79.694391] Call Trace: [ 79.696997] <IRQ> [ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50 [ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90 [ 79.712414] [<ffffffff810635e5>] __ioremap_caller+0x195/0x350 [ 79.718610] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90 [ 79.725462] [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0 [ 79.732346] [<ffffffff810637b7>] ioremap_nocache+0x17/0x20 [ 79.738283] [<ffffffff81039f4d>] snb_uncore_imc_init_box+0x6d/0x90 [ 79.744945] [<ffffffff81039cf7>] snb_uncore_imc_event_start+0xb7/0x110 [ 79.752020] [<ffffffff81039d97>] snb_uncore_imc_event_add+0x47/0x60 [ 79.758832] [<ffffffff81162cbb>] event_sched_in.isra.85+0xfb/0x330 [ 79.765519] [<ffffffff81162f5f>] group_sched_in+0x6f/0x1e0 [ 79.771481] [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90 [ 79.777858] [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0 [ 79.784418] [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30 [ 79.790820] [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40 [ 79.797546] [<ffffffff8115ef80>] remote_function+0x50/0x60 [ 79.803535] [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180 [ 79.810840] [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60 [ 79.819328] [<ffffffff8104b5e8>] smp_trace_call_function_single_interrupt+0x38/0xc0 [ 79.827614] [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80 [ 79.835465] <EOI> [ 79.837543] [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160 [ 79.844377] [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160 [ 79.851015] [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20 [ 79.856791] [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440 [ 79.863165] [<ffffffff816c9ddb>] rest_init+0xbb/0xd0 The offending commit is clearly confused as it moves heavy initialization work into IPI context. Revert it. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EPKan Liang
CBOX counters are increased to 48b on HSX. Correct the MSR address for HSWEP_U_MSR_PMON_CTR0 and HSWEP_U_MSR_PMON_CTL0. See specification in: http://www.intel.com/content/www/us/en/processors/xeon/ xeon-e5-v3-uncore-performance-monitoring.html Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1432645835-7918-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - early_idt_handlers[] fix that fixes the build with bleeding edge tooling - build warning fix on GCC 5.1 - vm86 fix plus self-test to make it harder to break it again" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode
2015-06-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "The biggest chunk of the changes are two regression fixes: a HT workaround fix and an event-group scheduling fix. It's been verified with 5 days of fuzzer testing. Other fixes: - eBPF fix - a BIOS breakage detection fix - PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix a refactoring bug perf/x86: Tweak broken BIOS rules during check_hw_exists() perf/x86/intel/pt: Untangle pt_buffer_reset_markers() perf: Disallow sparse AUX allocations for non-SG PMUs in overwrite mode perf/x86: Improve HT workaround GP counter constraint perf/x86: Fix event/group validation perf: Fix race in BPF program unregister