summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2014-12-14MIPS: Loongson: Make platform serial setup always built-in.Aaro Koskinen
commit 26927f76499849e095714452b8a4e09350f6a3b9 upstream. If SERIAL_8250 is compiled as a module, the platform specific setup for Loongson will be a module too, and it will not work very well. At least on Loongson 3 it will trigger a build failure, since loongson_sysconf is not exported to modules. Fix by making the platform specific serial code always built-in. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reported-by: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Markos Chandras <Markos.Chandras@imgtec.com> Patchwork: https://patchwork.linux-mips.org/patch/8533/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regsAndy Lutomirski
commit 7ddc6a2199f1da405a2fb68c40db8899b1a8cd87 upstream. These functions can be executed on the int3 stack, so kprobes are dangerous. Tracing is probably a bad idea, too. Fixes: b645af2d5905 ("x86_64, traps: Rework bad_iret") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: - Use __kprobes instead of NOKPROBE_SYMBOL() - Don't use __visible] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86_64, traps: Rework bad_iretAndy Lutomirski
commit b645af2d5905c4e32399005b867987919cbfc3ae upstream. It's possible for iretq to userspace to fail. This can happen because of a bad CS, SS, or RIP. Historically, we've handled it by fixing up an exception from iretq to land at bad_iret, which pretends that the failed iret frame was really the hardware part of #GP(0) from userspace. To make this work, there's an extra fixup to fudge the gs base into a usable state. This is suboptimal because it loses the original exception. It's also buggy because there's no guarantee that we were on the kernel stack to begin with. For example, if the failing iret happened on return from an NMI, then we'll end up executing general_protection on the NMI stack. This is bad for several reasons, the most immediate of which is that general_protection, as a non-paranoid idtentry, will try to deliver signals and/or schedule from the wrong stack. This patch throws out bad_iret entirely. As a replacement, it augments the existing swapgs fudge into a full-blown iret fixup, mostly written in C. It's should be clearer and more correct. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - We didn't use the _ASM_EXTABLE macro - Don't use __visible] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in CAndy Lutomirski
commit af726f21ed8af2cdaa4e93098dc211521218ae65 upstream. There's nothing special enough about the espfix64 double fault fixup to justify writing it in assembly. Move it to C. This also fixes a bug: if the double fault came from an IST stack, the old asm code would return to a partially uninitialized stack frame. Fixes: 3891a04aafd668686239349ea58f3314ea2af86b Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Keep using the paranoiderrorentry macro to generate the asm code - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86_64, traps: Stop using IST for #SSAndy Lutomirski
commit 6f442be2fb22be02cafa606f1769fa1e6f894441 upstream. On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - No need to define trace_stack_segment - Use the errorentry macro to generate #SS asm code - Adjust context - Checked that this matches Luis's backport for Ubuntu] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14ARM: 8216/1: xscale: correct auxiliary register in suspend/resumeDmitry Eremin-Solenikov
commit ef59a20ba375aeb97b3150a118318884743452a8 upstream. According to the manuals I have, XScale auxiliary register should be reached with opc_2 = 1 instead of crn = 1. cpu_xscale_proc_init correctly uses c1, c0, 1 arguments, but cpu_xscale_do_suspend and cpu_xscale_do_resume use c1, c1, 0. Correct suspend/resume functions to also use c1, c0, 1. The issue was primarily noticed thanks to qemu reporing "unsupported instruction" on the pxa suspend path. Confirmed in PXA210/250 and PXA255 XScale Core manuals and in PXA270 and PXA320 Developers Guides. Harware tested by me on tosa (pxa255). Robert confirmed on pxa270 board. Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14MIPS: oprofile: Fix backtrace on 64-bit kernelAaro Koskinen
commit bbaf113a481b6ce32444c125807ad3618643ce57 upstream. Fix incorrect cast that always results in wrong address for the new frame on 64-bit kernels. Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8110/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86, mm: Set NX across entire PMD at bootKees Cook
commit 45e2a9d4701d8c624d4a4bcdd1084eae31e92f58 upstream. When setting up permissions on kernel memory at boot, the end of the PMD that was split from bss remained executable. It should be NX like the rest. This performs a PMD alignment instead of a PAGE alignment to get the correct span of memory. Before: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82c00000 10M RW PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82df5000 2004K RW GLB NX pte 0xffffffff82df5000-0xffffffff82e00000 44K RW GLB x pte 0xffffffff82e00000-0xffffffffc0000000 978M pmd After: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82e00000 12M RW PSE GLB NX pmd 0xffffffff82e00000-0xffffffffc0000000 978M pmd [ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment. We really should unmap the reminder along with the holes caused by init,initdata etc. but thats a different issue ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: BAckported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86, 64bit, mm: Mark data/bss/brk to nxYinghai Lu
commit 72212675d1c96f5db8ec6fb35701879911193158 upstream. HPA said, we should not have RW and +x set at the time. for kernel layout: [ 0.000000] Kernel Layout: [ 0.000000] .text: [0x01000000-0x021434f8] [ 0.000000] .rodata: [0x02200000-0x02a13fff] [ 0.000000] .data: [0x02c00000-0x02dc763f] [ 0.000000] .init: [0x02dc9000-0x0312cfff] [ 0.000000] .bss: [0x0313b000-0x03dd6fff] [ 0.000000] .brk: [0x03dd7000-0x03dfffff] before the patch, we have ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff82200000 18M ro PSE GLB x pmd 0xffffffff82200000-0xffffffff82c00000 10M ro PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82dc9000 1828K RW GLB x pte 0xffffffff82dc9000-0xffffffff82e00000 220K RW GLB NX pte 0xffffffff82e00000-0xffffffff83000000 2M RW PSE GLB NX pmd 0xffffffff83000000-0xffffffff8313a000 1256K RW GLB NX pte 0xffffffff8313a000-0xffffffff83200000 792K RW GLB x pte 0xffffffff83200000-0xffffffff83e00000 12M RW PSE GLB x pmd 0xffffffff83e00000-0xffffffffa0000000 450M pmd after patch,, we get ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff82200000 18M ro PSE GLB x pmd 0xffffffff82200000-0xffffffff82c00000 10M ro PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82e00000 2M RW GLB NX pte 0xffffffff82e00000-0xffffffff83000000 2M RW PSE GLB NX pmd 0xffffffff83000000-0xffffffff83200000 2M RW GLB NX pte 0xffffffff83200000-0xffffffff83e00000 12M RW PSE GLB NX pmd 0xffffffff83e00000-0xffffffffa0000000 450M pmd so data, bss, brk get NX ... Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86: Require exact match for 'noxsave' command line optionDave Hansen
commit 2cd3949f702692cf4c5d05b463f19cd706a92dd3 upstream. We have some very similarly named command-line options: arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup); __setup() is designed to match options that take arguments, like "foo=bar" where you would have: __setup("foo", x86_foo_func...); The problem is that "noxsave" actually _matches_ "noxsaves" in the same way that "foo" matches "foo=bar". If you boot an old kernel that does not know about "noxsaves" with "noxsaves" on the command line, it will interpret the argument as "noxsave", which is not what you want at all. This makes the "noxsave" handler only return success when it finds an *exact* match. [ tglx: We really need to make __setup() more robust. ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14kvm: x86: don't kill guest on unknown exit reasonMichael S. Tsirkin
commit 2bc19dc3754fc066c43799659f0d848631c44cfe upstream. KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was triggered by a priveledged application. Let's not kill the guest: WARN and inject #UD instead. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14MIPS: ftrace: Fix a microMIPS build problemMarkos Chandras
commit aedd153f5bb5b1f1d6d9142014f521ae2ec294cc upstream. Code before the .fixup section needs to have the .insn directive. This has no side effects on MIPS32/64 but it affects the way microMIPS loads the address for the return label. Fixes the following build problem: mips-linux-gnu-ld: arch/mips/built-in.o: .fixup+0x4a0: Unsupported jump between ISA modes; consider recompiling with interlinking enabled. mips-linux-gnu-ld: final link failed: Bad value Makefile:819: recipe for target 'vmlinux' failed The fix is similar to 1658f914ff91c3bf ("MIPS: microMIPS: Disable LL/SC and fix linker bug.") Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8117/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86, apic: Handle a bad TSC more gracefullyAndy Lutomirski
commit b47dcbdc5161d3d5756f430191e2840d9b855492 upstream. If the TSC is unusable or disabled, then this patch fixes: - Confusion while trying to clear old APIC interrupts. - Division by zero and incorrect programming of the TSC deadline timer. This fixes boot if the CPU has a TSC deadline timer but a missing or broken TSC. The failure to boot can be observed with qemu using -cpu qemu64,-tsc,+tsc-deadline This also happens to me in nested KVM for unknown reasons. With this patch, I can boot cleanly (although without a TSC). Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Bandan Das <bsd@redhat.com> Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86: Conditionally update time when ack-ing pending irqsShai Fultheim
commit 42fa4250436304d4650fa271f37671f6cee24e08 upstream. On virtual environments, apic_read could take a long time. As a result, under certain conditions the ack pending loop may exit without any queued irqs left, but after more than one second. A warning will be printed needlessly in this case. If the loop is about to exit regardless of max_loops, don't update it. Signed-off-by: Shai Fultheim <shai@scalemp.com> [ rebased and reworded the commit message] Signed-off-by: Ido Yariv <ido@wizery.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14xtensa: re-wire umount syscall to sys_oldumountMax Filippov
commit 2651cc6974d47fc43bef1cd8cd26966e4f5ba306 upstream. Userspace actually passes single parameter (path name) to the umount syscall, so new umount just fails. Fix it by requesting old umount syscall implementation and re-wiring umount to it. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()Geert Uytterhoeven
commit e4dc601bf99ccd1c95b7e6eef1d3cf3c4b0d4961 upstream. hwreg_present() and hwreg_write() temporarily change the VBR register to another vector table. This table contains a valid bus error handler only, all other entries point to arbitrary addresses. If an interrupt comes in while the temporary table is active, the processor will start executing at such an arbitrary address, and the kernel will crash. While most callers run early, before interrupts are enabled, or explicitly disable interrupts, Finn Thain pointed out that macsonic has one callsite that doesn't, causing intermittent boot crashes. There's another unsafe callsite in hilkbd. Fix this for good by disabling and restoring interrupts inside hwreg_present() and hwreg_write(). Explicitly disabling interrupts can be removed from the callsites later. Reported-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 insteadBryan O'Donoghue
commit ee1b5b165c0a2f04d2107e634e51f05d0eb107de upstream. Quark x1000 advertises PGE via the standard CPUID method PGE bits exist in Quark X1000's PTEs. In order to flush an individual PTE it is necessary to reload CR3 irrespective of the PTE.PGE bit. See Quark Core_DevMan_001.pdf section 6.4.11 This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to crash and burn on this platform. Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14KVM: s390: unintended fallthrough for external callChristian Borntraeger
commit f346026e55f1efd3949a67ddd1dcea7c1b9a615e upstream. We must not fallthrough if the conditions for external call are not met. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14kvm: x86: fix stale mmio cache bugDavid Matlack
commit 56f17dd3fbc44adcdbc3340fe3988ddb833a47a7 upstream. The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Signed-off-by: David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86: Fix far-jump to non-canonical checkNadav Amit
commit 7e46dddd6f6cd5dbf3c7bd04a7e75d19475ac9f2 upstream. Commit d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps") introduced a bug that caused the fix to be incomplete. Due to incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may not trigger #GP. As we know, this imposes a security problem. In addition, the condition for two warnings was incorrect. Fixes: d1442d85cc30ea75f7d399474ca738e0bc96f715 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05x86,kvm,vmx: Preserve CR4 across VM entryAndy Lutomirski
commit d974baa398f34393db76be45f7d4d04fbdbb4a0a upstream. CR4 isn't constant; at least the TSD and PCE bits can vary. TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks like it's correct. This adds a branch and a read from cr4 to each vm entry. Because it is extremely likely that consecutive entries into the same vcpu will have the same host cr4 value, this fixes up the vmcs instead of restoring cr4 after the fact. A subsequent patch will add a kernel-wide cr4 shadow, reducing the overhead in the common case to just two memory reads and a branch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Cc: Petr Matousek <pmatouse@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjust context - Add struct vcpu_vmx *vmx parameter to vmx_set_constant_host_state(), done upstream in commit a547c6db4d2f ("KVM: VMX: Enable acknowledge interupt on vmexit")] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86: Handle errors when RIP is set during far jumpsNadav Amit
commit d1442d85cc30ea75f7d399474ca738e0bc96f715 upstream. Far jmp/call/ret may fault while loading a new RIP. Currently KVM does not handle this case, and may result in failed vm-entry once the assignment is done. The tricky part of doing so is that loading the new CS affects the VMCS/VMCB state, so if we fail during loading the new RIP, we are left in unconsistent state. Therefore, this patch saves on 64-bit the old CS descriptor and restores it if loading RIP failed. This fixes CVE-2014-3647. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: - Adjust context - __load_segment_descriptor() does not take an in_task_switch parameter] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86: use new CS.RPL as CPL during task switchPaolo Bonzini
commit 2356aaeb2f58f491679dc0c38bc3f6dbe54e7ded upstream. During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition to all the other requirements) and will be the new CPL. So far this worked by carefully setting the CS selector and flag before doing the task switch; setting CS.selector will already change the CPL. However, this will not work once we get the CPL from SS.DPL, because then you will have to set the full segment descriptor cache to change the CPL. ctxt->ops->cpl(ctxt) will then return the old CPL during the task switch, and the check that SS.DPL == CPL will fail. Temporarily assume that the CPL comes from CS.RPL during task switch to a protected-mode task. This is the same approach used in QEMU's emulation code, which (until version 2.0) manually tracks the CPL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: - Adjust context - load_state_from_tss32() does not support VM86 mode] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86: Emulator fixes for eip canonical checks on near branchesNadav Amit
commit 234f3ce485d54017f15cf5e0699cff4100121601 upstream. Before changing rip (during jmp, call, ret, etc.) the target should be asserted to be canonical one, as real CPUs do. During sysret, both target rsp and rip should be canonical. If any of these values is noncanonical, a #GP exception should occur. The exception to this rule are syscall and sysenter instructions in which the assigned rip is checked during the assignment to the relevant MSRs. This patch fixes the emulator to behave as real CPUs do for near branches. Far branches are handled by the next patch. This fixes CVE-2014-3647. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: - Adjust context - Use ctxt->regs[] instead of reg_read(), reg_write(), reg_rmw()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86: Fix wrong masking on relative jump/callNadav Amit
commit 05c83ec9b73c8124555b706f6af777b10adf0862 upstream. Relative jumps and calls do the masking according to the operand size, and not according to the address size as the KVM emulator does today. This patch fixes KVM behavior. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86 emulator: Use opcode::execute for CALLTakuya Yoshikawa
commit d4ddafcdf2201326ec9717172767cfad0ede1472 upstream. CALL: E8 Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05kvm: vmx: handle invvpid vm exit gracefullyPetr Matousek
commit a642fc305053cc1c6e47e4f4df327895747ab485 upstream. On systems with invvpid instruction support (corresponding bit in IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid causes vm exit, which is currently not handled and results in propagation of unknown exit to userspace. Fix this by installing an invvpid vm exit handler. This is CVE-2014-3646. Signed-off-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: - Adjust filename - Drop inapplicable change to exit reason string array] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05nEPT: Nested INVEPTNadav Har'El
commit bfd0a56b90005f8c8a004baf407ad90045c2b11e upstream. If we let L1 use EPT, we should probably also support the INVEPT instruction. In our current nested EPT implementation, when L1 changes its EPT table for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course of this modification already calls INVEPT. But if last level of shadow page is unsync not all L1's changes to EPT12 are intercepted, which means roots need to be synced when L1 calls INVEPT. Global INVEPT should not be different since roots are synced by kvm_mmu_load() each time EPTP02 changes. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: - Adjust context, filename - Simplify handle_invept() as recommended by Paolo - nEPT is not supported so we always raise #UD] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86: Improve thread safety in pitAndy Honig
commit 2febc839133280d5a5e8e1179c94ea674489dae2 upstream. There's a race condition in the PIT emulation code in KVM. In __kvm_migrate_pit_timer the pit_timer object is accessed without synchronization. If the race condition occurs at the wrong time this can crash the host kernel. This fixes CVE-2014-3611. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: x86: Check non-canonical addresses upon WRMSRNadav Amit
commit 854e8bb1aa06c578c2c9145fa6bfe3680ef63b23 upstream. Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is written to certain MSRs. The behavior is "almost" identical for AMD and Intel (ignoring MSRs that are not implemented in either architecture since they would anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if non-canonical address is written on Intel but not on AMD (which ignores the top 32-bits). Accordingly, this patch injects a #GP on the MSRs which behave identically on Intel and AMD. To eliminate the differences between the architecutres, the value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to canonical value before writing instead of injecting a #GP. Some references from Intel and AMD manuals: According to Intel SDM description of WRMSR instruction #GP is expected on WRMSR "If the source register contains a non-canonical address and ECX specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP." According to AMD manual instruction manual: LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical form, a general-protection exception (#GP) occurs." IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the base field must be in canonical form or a #GP fault will occur." IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must be in canonical form." This patch fixes CVE-2014-3610. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: - The various set_msr() functions all separate msr_index and data parameters] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcachesYoichi Yuasa
commit 5596b0b245fb9d2cefb5023b11061050351c1398 upstream. [ 1.904000] BUG: scheduling while atomic: swapper/1/0x00000002 [ 1.908000] Modules linked in: [ 1.916000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-rc2-lemote-los.git-5318619-dirty #1 [ 1.920000] Stack : 0000000031aac000 ffffffff810d0000 0000000000000052 ffffffff802730a4 0000000000000000 0000000000000001 ffffffff810cdf90 ffffffff810d0000 ffffffff8068b968 ffffffff806f5537 ffffffff810cdf90 980000009f0782e8 0000000000000001 ffffffff80720000 ffffffff806b0000 980000009f078000 980000009f290000 ffffffff805f312c 980000009f05b5d8 ffffffff80233518 980000009f05b5e8 ffffffff80274b7c 980000009f078000 ffffffff8068b968 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 980000009f05b520 0000000000000000 ffffffff805f2f6c 0000000000000000 ffffffff80700000 ffffffff80700000 ffffffff806fc758 ffffffff80700000 ffffffff8020be98 ffffffff806fceb0 ffffffff805f2f6c ... [ 2.028000] Call Trace: [ 2.032000] [<ffffffff8020be98>] show_stack+0x80/0x98 [ 2.036000] [<ffffffff805f2f6c>] __schedule_bug+0x44/0x6c [ 2.040000] [<ffffffff805fac58>] __schedule+0x518/0x5b0 [ 2.044000] [<ffffffff805f8a58>] schedule_timeout+0x128/0x1f0 [ 2.048000] [<ffffffff80240314>] msleep+0x3c/0x60 [ 2.052000] [<ffffffff80495400>] do_probe+0x238/0x3a8 [ 2.056000] [<ffffffff804958b0>] ide_probe_port+0x340/0x7e8 [ 2.060000] [<ffffffff80496028>] ide_host_register+0x2d0/0x7a8 [ 2.064000] [<ffffffff8049c65c>] ide_pci_init_two+0x4e4/0x790 [ 2.068000] [<ffffffff8049f9b8>] amd74xx_probe+0x148/0x2c8 [ 2.072000] [<ffffffff803f571c>] pci_device_probe+0xc4/0x130 [ 2.076000] [<ffffffff80478f60>] driver_probe_device+0x98/0x270 [ 2.080000] [<ffffffff80479298>] __driver_attach+0xe0/0xe8 [ 2.084000] [<ffffffff80476ab0>] bus_for_each_dev+0x78/0xe0 [ 2.088000] [<ffffffff80478468>] bus_add_driver+0x230/0x310 [ 2.092000] [<ffffffff80479b44>] driver_register+0x84/0x158 [ 2.096000] [<ffffffff80200504>] do_one_initcall+0x104/0x160 Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: linux-mips@linux-mips.org Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Patchwork: https://patchwork.linux-mips.org/patch/5941/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05MIPS: mcount: Adjust stack pointer for static trace in MIPS32Markos Chandras
commit 8a574cfa2652545eb95595d38ac2a0bb501af0ae upstream. Every mcount() call in the MIPS 32-bit kernel is done as follows: [...] move at, ra jal _mcount addiu sp, sp, -8 [...] but upon returning from the mcount() function, the stack pointer is not adjusted properly. This is explained in details in 58b69401c797 (MIPS: Function tracer: Fix broken function tracing). Commit ad8c396936e3 ("MIPS: Unbreak function tracer for 64-bit kernel.) fixed the stack manipulation for 64-bit but it didn't fix it completely for MIPS32. Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7792/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05ARM: 8165/1: alignment: don't break misaligned NEON load/storeRobin Murphy
commit 5ca918e5e3f9df4634077c06585c42bc6a8d699a upstream. The alignment fixup incorrectly decodes faulting ARM VLDn/VSTn instructions (where the optional alignment hint is given but incorrect) as LDR/STR, leading to register corruption. Detect these and correctly treat them as unhandled, so that userspace gets the fault it expects. Reported-by: Simon Hosie <simon.hosie@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05sched: Fix unreleased llc_shared_mask bit during CPU hotplugWanpeng Li
commit 03bd4e1f7265548832a76e7919a81f3137c44fd1 upstream. The following bug can be triggered by hot adding and removing a large number of xen domain0's vcpus repeatedly: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group PGD 5a9d5067 PUD 13067 PMD 0 Oops: 0000 [#3] SMP [...] Call Trace: load_balance ? _raw_spin_unlock_irqrestore idle_balance __schedule schedule schedule_timeout ? lock_timer_base schedule_timeout_uninterruptible msleep lock_device_hotplug_sysfs online_store dev_attr_store sysfs_write_file vfs_write SyS_write system_call_fastpath Last level cache shared mask is built during CPU up and the build_sched_domain() routine takes advantage of it to setup the sched domain CPU topology. However, llc_shared_mask is not released during CPU disable, which leads to an invalid sched domainCPU topology. This patch fix it by releasing the llc_shared_mask correctly during CPU disable. Yasuaki also reported that this can happen on real hardware: https://lkml.org/lkml/2014/7/22/1018 His case is here: == Here is an example on my system. My system has 4 sockets and each socket has 15 cores and HT is enabled. In this case, each core of sockes is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-44, 90-104 Socket#3 | 45-59, 105-119 Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-44 and 90-104. When hot-removing socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains having 0x3fff80000001fffc0000000. After that, when hot-adding socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-59 Socket#3 | 90-119 Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-59 and 90-104. So the mask has the wrong value. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Tested-by: Linn Crosetto <linn@hp.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05parisc: Only use -mfast-indirect-calls option for 32-bit kernel buildsJohn David Anglin
commit d26a7730b5874a5fa6779c62f4ad7c5065a94723 upstream. In spite of what the GCC manual says, the -mfast-indirect-calls has never been supported in the 64-bit parisc compiler. Indirect calls have always been done using function descriptors irrespective of the -mfast-indirect-calls option. Recently, it was noticed that a function descriptor was always requested when the -mfast-indirect-calls option was specified. This caused problems when the option was used in application code and doesn't make any sense because the whole point of the option is to avoid using a function descriptor for indirect calls. Fixing this broke 64-bit kernel builds. I will fix GCC but for now we need the attached change. This results in the same kernel code as before. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05MIPS: ZBOOT: add missing <linux/string.h> includeAurelien Jarno
commit 29593fd5a8149462ed6fad0d522234facdaee6c8 upstream. Commit dc4d7b37 (MIPS: ZBOOT: gather string functions into string.c) moved the string related functions into a separate file, which might cause the following build error, depending on the configuration: | CC arch/mips/boot/compressed/decompress.o | In file included from linux/arch/mips/boot/compressed/../../../../lib/decompress_unxz.c:234:0, | from linux/arch/mips/boot/compressed/decompress.c:67: | linux/arch/mips/boot/compressed/../../../../lib/xz/xz_dec_stream.c: In function 'fill_temp': | linux/arch/mips/boot/compressed/../../../../lib/xz/xz_dec_stream.c:162:2: error: implicit declaration of function 'memcpy' [-Werror=implicit-function-declaration] | cc1: some warnings being treated as errors | linux/scripts/Makefile.build:308: recipe for target 'arch/mips/boot/compressed/decompress.o' failed | make[6]: *** [arch/mips/boot/compressed/decompress.o] Error 1 | linux/arch/mips/Makefile:308: recipe for target 'vmlinuz' failed It does not fail with the standard configuration, as when CONFIG_DYNAMIC_DEBUG is not enabled <linux/string.h> gets included in include/linux/dynamic_debug.h. There might be other ways for it to get indirectly included. We can't add the include directly in xz_dec_stream.c as some architectures might want to use a different version for the boot/ directory (see for example arch/x86/boot/string.h). Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7420/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: s390: Fix user triggerable bug in dead codeChristian Borntraeger
commit 614a80e474b227cace52fd6e3c790554db8a396e upstream. In the early days, we had some special handling for the KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit d7b0b5eb3000 (KVM: s390: Make psw available on all exits, not just a subset). Now this switch statement is just a sanity check for userspace not messing with the kvm_run structure. Unfortunately, this allows userspace to trigger a kernel BUG. Let's just remove this switch statement. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13microblaze: Fix makefile to work with latest toolchainMichal Simek
commit 00708d421a22a0f82de2dbb91ca6213b3dcc5267 upstream. When building with latest binutils, vmlinux includes some sections which need to be stripped out when building the binary image. Signed-off-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13x86/espfix/xen: Fix allocation of pages for paravirt page tablesBoris Ostrovsky
commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream. init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13x86_64/entry/xen: Do not invoke espfix64 on XenAndy Lutomirski
commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream. This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13x86, espfix: Make it possible to disable 16-bit supportH. Peter Anvin
commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream. Embedded systems, which may be very memory-size-sensitive, are extremely unlikely to ever encounter any 16-bit software, so make it a CONFIG_EXPERT option to turn off support for any 16-bit software whatsoever. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13x86, espfix: Make espfix64 a Kconfig option, fix UMLH. Peter Anvin
commit 197725de65477bc8509b41388157c1a2283542bb upstream. Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13x86, espfix: Fix broken header guardH. Peter Anvin
commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream. Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13x86, espfix: Move espfix definitions into a separate header fileH. Peter Anvin
commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream. Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stackH. Peter Anvin
commit 3891a04aafd668686239349ea58f3314ea2af86b upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"H. Peter Anvin
commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream. This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in preparation of merging in the proper fix (espfix64). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13sparc: use asm-generic version of types.hSam Ravnborg
commit cbf1ef6b3345d2cc7e62407eec6a6f72a8b1346f upstream. In sparc headers we use the following pattern: #if defined(__sparc__) && defined(__arch64__) sparc64 specific stuff #else sparc32 specific stuff #endif In types.h this pattern was not followed and here we only checked for __sparc__ for no good reason. It was a left-over from long time ago. I checked other architectures - and most of them do not have any such checks. And all the recently merged versions uses the asm-generic version. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> [bwh: Guenter backported this to 3.2: - Adjusted filenames, context - There's no duplicate export of types.h to delete] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13arch/sparc/math-emu/math_32.c: drop stray break operatorAndrey Utkin
[ Upstream commit 093758e3daede29cb4ce6aedb111becf9d4bfc57 ] This commit is a guesswork, but it seems to make sense to drop this break, as otherwise the following line is never executed and becomes dead code. And that following line actually saves the result of local calculation by the pointer given in function argument. So the proposed change makes sense if this code in the whole makes sense (but I am unable to analyze it in the whole). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81641 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13sparc64: ldc_connect() should not return EINVAL when handshake is in progress.Sowmini Varadhan
[ Upstream commit 4ec1b01029b4facb651b8ef70bc20a4be4cebc63 ] The LDC handshake could have been asynchronously triggered after ldc_bind() enables the ldc_rx() receive interrupt-handler (and thus intercepts incoming control packets) and before vio_port_up() calls ldc_connect(). If that is the case, ldc_connect() should return 0 and let the state-machine progress. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Karl Volz <karl.volz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-13sparc64: Guard against flushing openfirmware mappings.David S. Miller
[ Upstream commit 4ca9a23765da3260058db3431faf5b4efd8cf926 ] Based almost entirely upon a patch by Christopher Alexander Tobias Schulze. In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap layer") lazy VMAP tlb flushing was added to the vmalloc layer. This causes problems on sparc64. Sparc64 has two VMAP mapped regions and they are not contiguous with eachother. First we have the malloc mapping area, then another unrelated region, then the vmalloc region. This "another unrelated region" is where the firmware is mapped. If the lazy TLB flushing logic in the vmalloc code triggers after we've had both a module unload and a vfree or similar, it will pass an address range that goes from somewhere inside the malloc region to somewhere inside the vmalloc region, and thus covering the openfirmware area entirely. The sparc64 kernel learns about openfirmware's dynamic mappings in this region early in the boot, and then services TLB misses in this area. But openfirmware has some locked TLB entries which are not mentioned in those dynamic mappings and we should thus not disturb them. These huge lazy TLB flush ranges causes those openfirmware locked TLB entries to be removed, resulting in all kinds of problems including hard hangs and crashes during reboot/reset. Besides causing problems like this, such huge TLB flush ranges are also incredibly inefficient. A plea has been made with the author of the VMAP lazy TLB flushing code, but for now we'll put a safety guard into our flush_tlb_kernel_range() implementation. Since the implementation has become non-trivial, stop defining it as a macro and instead make it a function in a C source file. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>