summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2015-02-11x86,kvm,vmx: Preserve CR4 across VM entryAndy Lutomirski
commit d974baa398f34393db76be45f7d4d04fbdbb4a0a upstream. CR4 isn't constant; at least the TSD and PCE bits can vary. TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks like it's correct. This adds a branch and a read from cr4 to each vm entry. Because it is extremely likely that consecutive entries into the same vcpu will have the same host cr4 value, this fixes up the vmcs instead of restoring cr4 after the fact. A subsequent patch will add a kernel-wide cr4 shadow, reducing the overhead in the common case to just two memory reads and a branch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wangkai: Backport to 3.10: adjust context] Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11kvm: vmx: handle invvpid vm exit gracefullyPetr Matousek
commit a642fc305053cc1c6e47e4f4df327895747ab485 upstream. On systems with invvpid instruction support (corresponding bit in IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid causes vm exit, which is currently not handled and results in propagation of unknown exit to userspace. Fix this by installing an invvpid vm exit handler. This is CVE-2014-3646. Cc: stable@vger.kernel.org Signed-off-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [wangkai: Backport to 3.10: adjust context] Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11arm64: Fix up /proc/cpuinfoMark Rutland
commit 44b82b7700d05a52cd983799d3ecde1a976b3bed upstream. Commit d7a49086f263164a (arm64: cpuinfo: print info for all CPUs) attempted to clean up /proc/cpuinfo, but due to concerns regarding further changes was reverted in commit 5e39977edf6500fd (Revert "arm64: cpuinfo: print info for all CPUs"). There are two major issues with the arm64 /proc/cpuinfo format currently: * The "Features" line describes (only) the 64-bit hwcaps, which is problematic for some 32-bit applications which attempt to parse it. As the same names are used for analogous ISA features (e.g. aes) despite these generally being architecturally unrelated, it is not possible to simply append the 64-bit and 32-bit hwcaps in a manner that might not be misleading to some applications. Various potential solutions have appeared in vendor kernels. Typically the format of the Features line varies depending on whether the task is 32-bit. * Information is only printed regarding a single CPU. This does not match the ARM format, and does not provide sufficient information in big.LITTLE systems where CPUs are heterogeneous. The CPU information printed is queried from the current CPU's registers, which is racy w.r.t. cross-cpu migration. This patch attempts to solve these issues. The following changes are made: * When a task with a LINUX32 personality attempts to read /proc/cpuinfo, the "Features" line contains the decoded 32-bit hwcaps, as with the arm port. Otherwise, the decoded 64-bit hwcaps are shown. This aligns with the behaviour of COMPAT_UTS_MACHINE and COMPAT_ELF_PLATFORM. In the absense of compat support, the Features line is empty. The set of hwcaps injected into a task's auxval are unaffected. * Properties are printed per-cpu, as with the ARM port. The per-cpu information is queried from pre-recorded cpu information (as used by the sanity checks). * As with the previous attempt at fixing up /proc/cpuinfo, the hardware field is removed. The only users so far are 32-bit applications tied to particular boards, so no portable applications should be affected, and this should prevent future tying to particular boards. The following differences remain: * No model_name is printed, as this cannot be queried from the hardware and cannot be provided in a stable fashion. Use of the CPU {implementor,variant,part,revision} fields is sufficient to identify a CPU and is portable across arm and arm64. * The following system-wide properties are not provided, as they are not possible to provide generally. Programs relying on these are already tied to particular (32-bit only) boards: - Hardware - Revision - Serial No software has yet been identified for which these remaining differences are problematic. Cc: Greg Hackmann <ghackmann@google.com> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Serban Constantinescu <serban.constantinescu@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: cross-distro@lists.linaro.org Cc: linux-api@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: Catalin Marinas <catalin.marinas@arm.com> [Mark: backport to v3.10.x] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11MIPS: Fix kernel lockup or crash after CPU offline/onlineHemmo Nieminen
commit c7754e75100ed5e3068ac5085747f2bfc386c8d6 upstream. As printk() invocation can cause e.g. a TLB miss, printk() cannot be called before the exception handlers have been properly initialized. This can happen e.g. when netconsole has been loaded as a kernel module and the TLB table has been cleared when a CPU was offline. Call cpu_report() in start_secondary() only after the exception handlers have been initialized to fix this. Without the patch the kernel will randomly either lockup or crash after a CPU is onlined and the console driver is a module. Signed-off-by: Hemmo Nieminen <hemmo.nieminen@iki.fi> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/8953/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11MIPS: IRQ: Fix disable_irq on CPU IRQsFelix Fietkau
commit a3e6c1eff54878506b2dddcc202df9cc8180facb upstream. If the irq_chip does not define .irq_disable, any call to disable_irq will defer disabling the IRQ until it fires while marked as disabled. This assumes that the handler function checks for this condition, which handle_percpu_irq does not. In this case, calling disable_irq leads to an IRQ storm, if the interrupt fires while disabled. This optimization is only useful when disabling the IRQ is slow, which is not true for the MIPS CPU IRQ. Disable this optimization by implementing .irq_disable and .irq_enable Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8949/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11PCI: Add NEC variants to Stratus ftServer PCIe DMI checkCharlotte Richardson
commit 51ac3d2f0c505ca36ffc9715ffd518d756589ef8 upstream. NEC OEMs the same platforms as Stratus does, which have multiple devices on some PCIe buses under downstream ports. Link: https://bugzilla.kernel.org/show_bug.cgi?id=51331 Fixes: 1278998f8ff6 ("PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)") Signed-off-by: Charlotte Richardson <charlotte.richardson@stratus.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAESteven Capper
commit ded9477984690d026e46dd75e8157392cea3f13f upstream. For LPAE, we have the following means for encoding writable or dirty ptes: L_PTE_DIRTY L_PTE_RDONLY !pte_dirty && !pte_write 0 1 !pte_dirty && pte_write 0 1 pte_dirty && !pte_write 1 1 pte_dirty && pte_write 1 0 So we can't distinguish between writeable clean ptes and read only ptes. This can cause problems with ptes being incorrectly flagged as read only when they are writeable but not dirty. This patch renumbers L_PTE_RDONLY from AP[2] to a software bit #58, and adds additional logic to set AP[2] whenever the pte is read only or not dirty. That way we can distinguish between clean writeable ptes and read only ptes. HugeTLB pages will use this new logic automatically. We need to add some logic to Transparent HugePages to ensure that they correctly interpret the revised pgprot permissions (L_PTE_RDONLY has moved and no longer matches PMD_SECT_AP2). In the process of revising THP, the names of the PMD software bits have been prefixed with L_ to make them easier to distinguish from their hardware bit counterparts. Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> [hpy: Backported to 3.10 - adjust the context - ignore change related to pmd, because 3.10 does not support HugePage ] Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclearSteven Capper
commit f2950706871c4b6e8c0f0d7c3f62d35930b8de63 upstream. Long descriptors on ARM are 64 bits, and some pte functions such as pte_dirty return a bitwise-and of a flag with the pte value. If the flag to be tested resides in the upper 32 bits of the pte, then we run into the danger of the result being dropped if downcast. For example: gather_stats(page, md, pte_dirty(*pte), 1); where pte_dirty(*pte) is downcast to an int. This patch introduces a new macro pte_isset which performs the bitwise and, then performs a double logical invert (where needed) to ensure predictable downcasting. The logical inverse pte_isclear is also introduced. Equivalent pmd functions for Transparent HugePages have also been added. Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> [hpy: Backported to 3.10: - adjust the context - ignore change to pmd, because 3.10 does not support HugePage.] Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: DMA: ensure that old section mappings are flushed from the TLBRussell King
commit 6b076991dca9817e75c37e2f0db6d52611ea42fa upstream. When setting up the CMA region, we must ensure that the old section mappings are flushed from the TLB before replacing them with page tables, otherwise we can suffer from mismatched aliases if the CPU speculatively prefetches from these mappings at an inopportune time. A mismatched alias can occur when the TLB contains a section mapping, but a subsequent prefetch causes it to load a page table mapping, resulting in the possibility of the TLB containing two matching mappings for the same virtual address region. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: 7931/1: Correct virt_addr_validLaura Abbott
commit efea3403d4b7c6d1dd5d5ac3234c161e8b314d66 upstream. The definition of virt_addr_valid is that virt_addr_valid should return true if and only if virt_to_page returns a valid pointer. The current definition of virt_addr_valid only checks against the virtual address range. There's no guarantee that just because a virtual address falls bewteen PAGE_OFFSET and high_memory the associated physical memory has a valid backing struct page. Follow the example of other architectures and convert to pfn_valid to verify that the virtual address is actually valid. The check for an address between PAGE_OFFSET and high_memory is still necessary as vmalloc/highmem addresses are not valid with virt_to_page. Cc: Will Deacon <will.deacon@arm.com> Cc: Nicolas Pitre <nico@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: fix asm/memory.h build errorRussell King
commit b713aa0b15015a65ad5421543b80df86de043d62 upstream. Jason Gunthorpe reports a build failure when ARM_PATCH_PHYS_VIRT is not defined: In file included from arch/arm/include/asm/page.h:163:0, from include/linux/mm_types.h:16, from include/linux/sched.h:24, from arch/arm/kernel/asm-offsets.c:13: arch/arm/include/asm/memory.h: In function '__virt_to_phys': arch/arm/include/asm/memory.h:244:40: error: 'PHYS_OFFSET' undeclared (first use in this function) arch/arm/include/asm/memory.h:244:40: note: each undeclared identifier is reported only once for each function it appears in arch/arm/include/asm/memory.h: In function '__phys_to_virt': arch/arm/include/asm/memory.h:249:13: error: 'PHYS_OFFSET' undeclared (first use in this function) Fixes: ca5a45c06cd4 ("ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions") Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> [hpy: Backported to 3.10: - adjust the context - MPU is not supported by 3.10, so ignore fix to MPU compared with the original patch.] Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' ↵Chen Gang
in atomic_cmpxchg(). commit 4dcc1cf7316a26e112f5c9fcca531ff98ef44700 upstream. For atomic_cmpxchg(), the type of 'oldval' need be 'int' to match the type of "*ptr" (used by 'ldrex' instruction) and 'old' (used by 'teq' instruction). Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.hChen Gang
commit 237f12337cfa2175474e4dd015bc07a25eb9080d upstream. atomic* value is signed value, and atomic* functions need also process signed value (parameter value, and return value), so 32-bit arm need use 'long long' instead of 'u64'. After replacement, it will also fix a bug for atomic64_add_negative(): "u64 is never less than 0". The modifications are: in vim, use "1,% s/\<u64\>/long long/g" command. remove '__aligned(8)' which is useless for 64-bit. be sure of 80 column limitation after replacement. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: lpae: fix definition of PTE_HWTABLE_PTRSWill Deacon
commit e38a517578d6c0f764b0d0f6e26dcdf9f70c69d7 upstream. For 2-level page tables, PTE_HWTABLE_PTRS describes the offset between Linux PTEs and hardware PTEs. On LPAE, there is no distinction (since we have 64-bit descriptors with plenty of space) so PTE_HWTABLE_PTRS should be 0. Unfortunately, it is wrongly defined as PTRS_PER_PTE, meaning that current pte table flushing is off by a page. Luckily, all current LPAE implementations are SMP, so the hardware walker can snoop L1. This patch fixes the broken definition. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: fix type of PHYS_PFN_OFFSET to unsigned longCyril Chemparathy
commit 5b20c5b2f014ecc0a6310988af69cd7ede9e7c67 upstream. On LPAE machines, PHYS_OFFSET evaluates to a phys_addr_t and this type is inherited by the PHYS_PFN_OFFSET definition as well. Consequently, the kernel build emits warnings of the form: init/main.c: In function 'start_kernel': init/main.c:588:7: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'phys_addr_t' [-Wformat] This patch fixes this warning by pinning down the PFN type to unsigned long. Signed-off-by: Cyril Chemparathy <cyril@ti.com> Acked-by: Nicolas Pitre <nico@linaro.org> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Subash Patel <subash.rp@samsung.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: LPAE: use phys_addr_t in alloc_init_pud()Vitaly Andrianov
commit 20d6956d8cd2452cec0889ff040f18afc03c2e6b upstream. This patch fixes the alloc_init_pud() function to use phys_addr_t instead of unsigned long when passing in the phys argument. This is an extension to commit 97092e0c56830457af0639f6bd904537a150ea4a (ARM: pgtable: use phys_addr_t for physical addresses), which applied similar changes elsewhere in the ARM memory management code. Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Signed-off-by: Cyril Chemparathy <cyril@ti.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Subash Patel <subash.rp@samsung.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: LPAE: use signed arithmetic for mask definitionsCyril Chemparathy
commit 926edcc747e2efb3c9add7ed4dbc4e7a3a959d02 upstream. This patch applies to PAGE_MASK, PMD_MASK, and PGDIR_MASK, where forcing unsigned long math truncates the mask at the 32-bits. This clearly does bad things on PAE systems. This patch fixes this problem by defining these masks as signed quantities. We then rely on sign extension to do the right thing. Signed-off-by: Cyril Chemparathy <cyril@ti.com> Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Subash Patel <subash.rp@samsung.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: mm: correct pte_same behaviour for LPAE.Steve Capper
commit dde1b65110353517816bcbc58539463396202244 upstream. For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes that are written to a page table but not for ptes created with mk_pte. This can cause some comparison tests made by pte_same to fail spuriously and lead to other problems. To correct this behaviour, we mask off PTE_EXT_NG for any pte that is present before running the comparison. Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tablesDouglas Anderson
commit 849b882b52df0f276d9ffded01d85654aa0da422 upstream. It appears that gcc may put some code in ".text.unlikely" or ".text.hot" sections. Right now those aren't accounted for in unwind tables. Add them. I found some docs about this at: http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc.pdf Without this, if you have slub_debug turned on, you can get messages that look like this: unwind: Index not found 7f008c50 Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> [wangkai: backport to 3.10 - adjust context ] Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05powerpc/xmon: Fix another endiannes issue in RTAS call from xmonLaurent Dufour
commit e6eb2eba494d6f99e69ca3c3748cd37a2544ab38 upstream. The commit 3b8a3c010969 ("powerpc/pseries: Fix endiannes issue in RTAS call from xmon") was fixing an endianness issue in the call made from xmon to RTAS. However, as Michael Ellerman noticed, this fix was not complete, the token value was not byte swapped. This lead to call an unexpected and most of the time unexisting RTAS function, which is silently ignored by RTAS. This fix addresses this hole. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29crypto: add missing crypto module aliasesMathias Krause
commit 3e14dcf7cb80b34a1f38b55bc96f02d23fdaaaaf upstream. Commit 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"") changed the automatic module loading when requesting crypto algorithms to prefix all module requests with "crypto-". This requires all crypto modules to have a crypto specific module alias even if their file name would otherwise match the requested crypto algorithm. Even though commit 5d26a105b5a7 added those aliases for a vast amount of modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO annotations to those files to make them get loaded automatically, again. This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work with kernels v3.18 and below. Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former won't work for crypto modules any more. Fixes: 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29crypto: include crypto- module prefix in templateKees Cook
commit 4943ba16bbc2db05115707b3ff7b4874e9e3c560 upstream. This adds the module loading prefix "crypto-" to the template lookup as well. For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly includes the "crypto-" prefix at every level, correctly rejecting "vfat": net-pf-38 algif-hash crypto-vfat(blowfish) crypto-vfat(blowfish)-all crypto-vfat Reported-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29crypto: prefix module autoloading with "crypto-"Kees Cook
commit 5d26a105b5a73e5635eae0629b42fa0a90e07b7b upstream. This prefixes all crypto module loading with "crypto-" so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regsAndy Lutomirski
commit 7ddc6a2199f1da405a2fb68c40db8899b1a8cd87 upstream. These functions can be executed on the int3 stack, so kprobes are dangerous. Tracing is probably a bad idea, too. Fixes: b645af2d5905 ("x86_64, traps: Rework bad_iret") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.10: - Use __kprobes instead of NOKPROBE_SYMBOL() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29x86, tls: Interpret an all-zero struct user_desc as "no segment"Andy Lutomirski
commit 3669ef9fa7d35f573ec9c0e0341b29251c2734a7 upstream. The Witcher 2 did something like this to allocate a TLS segment index: struct user_desc u_info; bzero(&u_info, sizeof(u_info)); u_info.entry_number = (uint32_t)-1; syscall(SYS_set_thread_area, &u_info); Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix. The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game. This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't *quite* what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times. According to FrostbittenKing on Github, this fixes The Witcher 2. If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29x86, tls, ldt: Stop checking lm in LDT_emptyAndy Lutomirski
commit e30ab185c490e9a9381385529e0fd32f0a399495 upstream. 32-bit programs don't have an lm bit in their ABI, so they can't reliably cause LDT_empty to return true without resorting to memset. They shouldn't need to do this. This should fix a longstanding, if minor, issue in all 64-bit kernels as well as a potential regression in the TLS hardening code. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29x86/tsc: Change Fast TSC calibration failed from error to infoAlexandre Demers
commit 520452172e6b318f3a8bd9d4fe1e25066393de25 upstream. Many users see this message when booting without knowning that it is of no importance and that TSC calibration may have succeeded by another way. As explained by Paul Bolle in http://lkml.kernel.org/r/1348488259.1436.22.camel@x61.thuisdomein "Fast TSC calibration failed" should not be considered as an error since other calibration methods are being tried afterward. At most, those send a warning if they fail (not an error). So let's change the message from error to warning. [ tglx: Make if pr_info. It's really not important at all ] Fixes: c767a54ba065 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Link: http://lkml.kernel.org/r/1418106470-6906-1-git-send-email-alexandre.f.demers@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29x86, hyperv: Mark the Hyper-V clocksource as being continuousK. Y. Srinivasan
commit 32c6590d126836a062b3140ed52d898507987017 upstream. The Hyper-V clocksource is continuous; mark it accordingly. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: jasowang@redhat.com Cc: gregkh@linuxfoundation.org Cc: devel@linuxdriverproject.org Cc: olaf@aepfle.de Cc: apw@canonical.com Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29ARM: dts: imx25: Fix PWM "per" clocksFabio Estevam
commit 7ecd0bde5bfea524a843ad8fa8cb66ccbce68779 upstream. Currently PWM functionality is broken on mx25 due to the wrong assignment of the PWM "per" clock. According to Documentation/devicetree/bindings/clock/imx25-clock.txt: pwm_ipg_per 52 ,so update the pwm "per" to use 'pwm_ipg_per' instead of 'per10' clock. With this change PWM can work fine on mx25. Reported-by: Carlos Soto <csotoalonso@gmail.com> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27x86, um: actually mark system call tables readonlyDaniel Borkmann
commit b485342bd79af363c77ef1a421c4a0aef2de9812 upstream. Commit a074335a370e ("x86, um: Mark system call tables readonly") was supposed to mark the sys_call_table in UML as RO by adding the const, but it doesn't have the desired effect as it's nevertheless being placed into the data section since __cacheline_aligned enforces sys_call_table being placed into .data..cacheline_aligned instead. We need to use the ____cacheline_aligned version instead to fix this issue. Before: $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table" U sys_writev 0000000000000000 D sys_call_table 0000000000000000 D syscall_table_size After: $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table" U sys_writev 0000000000000000 R sys_call_table 0000000000000000 D syscall_table_size Fixes: a074335a370e ("x86, um: Mark system call tables readonly") Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27um: Skip futex_atomic_cmpxchg_inatomic() testRichard Weinberger
commit f911d731054ab3d82ee72a16b889e17ca3a2332a upstream. futex_atomic_cmpxchg_inatomic() does not work on UML because it triggers a copy_from_user() in kernel context. On UML copy_from_user() can only be used if the kernel was called by a real user space process such that UML can use ptrace() to fetch the value. Reported-by: Miklos Szeredi <miklos@szeredi.hu> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at> Tested-by: Daniel Walter <d.walter@0x90.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instancesGeert Uytterhoeven
commit b0ddb319db3d7a1943445f0de0a45c07a7f3457a upstream. The sh73a0 INTC can't mask interrupts properly most likely due to a hardware bug. Set the .control_parent flag to delegate masking to the parent interrupt controller, like was already done for irqpin1. Without this, accessing the three-axis digital accelerometer ADXL345 on kzm9g through /dev/input/event1 causes an interrupt storm, which requires a power-cycle to recover from. This was inspired by a patch for arch/arm/boot/dts/sh73a0.dtsi from Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Fixes: 341eb5465f67437a ("ARM: shmobile: INTC External IRQ pin driver on sh73a0") Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27ARM: omap5/dra7xx: Fix frequency typosLennart Sorensen
commit 572b24e6d85d98cdc552f07e9fb9870d9460d81b upstream. The switch statement of the possible list of SYSCLK1 frequencies is missing a 0 in 4 out of the 7 frequencies. Fixes: fa6d79d27614 ("ARM: OMAP: Add initialisation for the real-time counter") Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca> Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27ARM: clk-imx6q: fix video divider for rev T0 1.0Gary Bisson
commit 81ef447950bf0955aca46f4a7617d8ce435cf0ce upstream. The post dividers do not work on i.MX6Q rev T0 1.0 so they must be fixed to 1. As the table index was wrong, a divider a of 4 could still be requested which implied the clock not to be set properly. This is the root cause of the HDMI not working at high resolution on rev T0 1.0 of the SoC. Signed-off-by: Gary Bisson <bisson.gary@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27ARM: imx6q: drop unnecessary semicolonDmitry Voytik
commit d2a10a1727b3948019128e83162f22c65859f1fd upstream. Drop unnecessary semicolon after closing curly bracket. Signed-off-by: Dmitry Voytik <voytikd@gmail.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27ARM: dts: imx25: Fix the SPI1 clocksFabio Estevam
commit 7a87e9cbc3a2f0ff0955815335e08c9862359130 upstream. From Documentation/devicetree/bindings/clock/imx25-clock.txt: cspi1_ipg 78 cspi2_ipg 79 cspi3_ipg 80 , so fix the SPI1 clocks accordingly to avoid a kernel hang when trying to access SPI1. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracingSteven Rostedt (Red Hat)
commit 237d28db036e411f22c03cfd5b0f6dc2aa9bf3bc upstream. If the function graph tracer traces a jprobe callback, the system will crash. This can easily be demonstrated by compiling the jprobe sample module that is in the kernel tree, loading it and running the function graph tracer. # modprobe jprobe_example.ko # echo function_graph > /sys/kernel/debug/tracing/current_tracer # ls The first two commands end up in a nice crash after the first fork. (do_fork has a jprobe attached to it, so "ls" just triggers that fork) The problem is caused by the jprobe_return() that all jprobe callbacks must end with. The way jprobes works is that the function a jprobe is attached to has a breakpoint placed at the start of it (or it uses ftrace if fentry is supported). The breakpoint handler (or ftrace callback) will copy the stack frame and change the ip address to return to the jprobe handler instead of the function. The jprobe handler must end with jprobe_return() which swaps the stack and does an int3 (breakpoint). This breakpoint handler will then put back the saved stack frame, simulate the instruction at the beginning of the function it added a breakpoint to, and then continue on. For function tracing to work, it hijakes the return address from the stack frame, and replaces it with a hook function that will trace the end of the call. This hook function will restore the return address of the function call. If the function tracer traces the jprobe handler, the hook function for that handler will not be called, and its saved return address will be used for the next function. This will result in a kernel crash. To solve this, pause function tracing before the jprobe handler is called and unpause it before it returns back to the function it probed. Some other updates: Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the code look a bit cleaner and easier to understand (various tries to fix this bug required this change). Note, if fentry is being used, jprobes will change the ip address before the function graph tracer runs and it will not be able to trace the function that the jprobe is probing. Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27ARC: [nsimosci] move peripherals to match model to FPGAVineet Gupta
commit e8ef060b37c2d3cc5fd0c0edbe4e42ec1cb9768b upstream. This allows the sdplite/Zebu images to run on OSCI simulation platform Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16perf/x86/intel/uncore: Make sure only uncore events are collectedJiri Olsa
commit af91568e762d04931dcbdd6bef4655433d8b9418 upstream. The uncore_collect_events functions assumes that event group might contain only uncore events which is wrong, because it might contain any type of events. This bug leads to uncore framework touching 'not' uncore events, which could end up all sorts of bugs. One was triggered by Vince's perf fuzzer, when the uncore code touched breakpoint event private event space as if it was uncore event and caused BUG: BUG: unable to handle kernel paging request at ffffffff82822068 IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250 ... The code in uncore_assign_events() function was looking for event->hw.idx data while the event was initialized as a breakpoint with different members in event->hw union. This patch forces uncore_collect_events() to collect only uncore events. Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XPThomas Petazzoni
commit e55355453600a33bb5ca4f71f2d7214875f3b061 upstream. Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada 38x and Armada XP requires a certain number of conditions: - On Armada 370, the cache policy must be set to write-allocate. - On Armada 375, 38x and XP, the cache policy must be set to write-allocate, the pages must be mapped with the shareable attribute, and the SMP bit must be set Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none of these conditions are met. With Armada 370, the situation is worse: since the processor is single core, regardless of whether CONFIG_SMP or !CONFIG_SMP is used, the cache policy will be set to write-back by the kernel and not write-allocate. Since solving this problem turns out to be quite complicated, and we don't want to let users with a mainline kernel known to have infrequent but existing data corruptions, this commit proposes to simply disable hardware I/O coherency in situations where it is known not to work. And basically, the is_smp() function of the kernel tells us whether it is OK to enable hardware I/O coherency or not, so this commit slightly refactors the coherency_type() function to return COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate type of the coherency fabric in the other case. Thanks to this, the I/O coherency fabric will no longer be used at all in !CONFIG_SMP configurations. It will continue to be used in CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x (which are multiple cores processors), but will no longer be used on Armada 370 (which is a single core processor). In the process, it simplifies the implementation of the coherency_type() function, and adds a missing call to of_node_put(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: e60304f8cb7bb545e79fe62d9b9762460c254ec2 ("arm: mvebu: Add hardware I/O Coherency support") Cc: <stable@vger.kernel.org> # v3.8+ Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.com Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16x86, vdso: Use asm volatile in __getcpuAndy Lutomirski
commit 1ddf0b1b11aa8a90cef6706e935fc31c75c406ba upstream. In Linux 3.18 and below, GCC hoists the lsl instructions in the pvclock code all the way to the beginning of __vdso_clock_gettime, slowing the non-paravirt case significantly. For unknown reasons, presumably related to the removal of a branch, the performance issue is gone as of e76b027e6408 x86,vdso: Use LSL unconditionally for vgetcpu but I don't trust GCC enough to expect the problem to stay fixed. There should be no correctness issue, because the __getcpu calls in __vdso_vlock_gettime were never necessary in the first place. Note to stable maintainers: In 3.18 and below, depending on configuration, gcc 4.9.2 generates code like this: 9c3: 44 0f 03 e8 lsl %ax,%r13d 9c7: 45 89 eb mov %r13d,%r11d 9ca: 0f 03 d8 lsl %ax,%ebx This patch won't apply as is to any released kernel, but I'll send a trivial backported version if needed. [ Backported by Andy Lutomirski. Should apply to all affected versions. This fixes a functionality bug as well as a performance bug: buggy kernels can infinite loop in __vdso_clock_gettime on affected compilers. See, for exammple: https://bugzilla.redhat.com/show_bug.cgi?id=1178975 ] Fixes: 51c19b4f5927 x86: vdso: pvclock gettime support Cc: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16x86_64, vdso: Fix the vdso address randomization algorithmAndy Lutomirski
commit 394f56fe480140877304d342dec46d50dc823d46 upstream. The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements. The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD. This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.) It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias. In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged. As an easy test, doing this: for i in `seq 10000` do grep -P vdso /proc/self/maps |cut -d- -f1 done |sort |uniq -d used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this: 7fffdfffe000 7fffe01fe000 7fffe05fe000 7fffe07fe000 7fffe09fe000 7fffe0bfe000 7fffe0dfe000 Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08groups: Consolidate the setgroups permission checksEric W. Biederman
commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream. Today there are 3 instances of setgroups and due to an oversight their permission checking has diverged. Add a common function so that they may all share the same permission checking code. This corrects the current oversight in the current permission checks and adds a helper to avoid this in the future. A user namespace security fix will update this new helper, shortly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86/tls: Don't validate lm in set_thread_area() after allAndy Lutomirski
commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream. It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefitAndy Lutomirski
commit 29fa6825463c97e5157284db80107d1bfac5d77b upstream. paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should *not* be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. Fixes CVE-2014-8134. Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86_64, switch_to(): Load TLS descriptors before switching DS and ESAndy Lutomirski
commit f647d7c155f069c1a068030255c300663516420e upstream. Otherwise, if buggy user code points DS or ES into the TLS array, they would be corrupted after a context switch. This also significantly improves the comments and documents some gotchas in the code. Before this patch, the both tests below failed. With this patch, the es test passes, although the gsbase test still fails. ----- begin es test ----- /* * Copyright (c) 2014 Andy Lutomirski * GPL v2 */ static unsigned short GDT3(int idx) { return (idx << 3) | 3; } static int create_tls(int idx, unsigned int base) { struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; if (syscall(SYS_set_thread_area, &desc) != 0) err(1, "set_thread_area"); return desc.entry_number; } int main() { int idx = create_tls(-1, 0); printf("Allocated GDT index %d\n", idx); unsigned short orig_es; asm volatile ("mov %%es,%0" : "=rm" (orig_es)); int errors = 0; int total = 1000; for (int i = 0; i < total; i++) { asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx))); usleep(100); unsigned short es; asm volatile ("mov %%es,%0" : "=rm" (es)); asm volatile ("mov %0,%%es" : : "rm" (orig_es)); if (es != GDT3(idx)) { if (errors == 0) printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n", GDT3(idx), es); errors++; } } if (errors) { printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total); return 1; } else { printf("[OK]\tES was preserved\n"); return 0; } } ----- end es test ----- ----- begin gsbase test ----- /* * gsbase.c, a gsbase test * Copyright (c) 2014 Andy Lutomirski * GPL v2 */ static unsigned char *testptr, *testptr2; static unsigned char read_gs_testvals(void) { unsigned char ret; asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr)); return ret; } int main() { int errors = 0; testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0); if (testptr == MAP_FAILED) err(1, "mmap"); testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0); if (testptr2 == MAP_FAILED) err(1, "mmap"); *testptr = 0; *testptr2 = 1; if (syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)testptr2 - (unsigned long)testptr) != 0) err(1, "ARCH_SET_GS"); usleep(100); if (read_gs_testvals() == 1) { printf("[OK]\tARCH_SET_GS worked\n"); } else { printf("[FAIL]\tARCH_SET_GS failed\n"); errors++; } asm volatile ("mov %0,%%gs" : : "r" (0)); if (read_gs_testvals() == 0) { printf("[OK]\tWriting 0 to gs worked\n"); } else { printf("[FAIL]\tWriting 0 to gs failed\n"); errors++; } usleep(100); if (read_gs_testvals() == 0) { printf("[OK]\tgsbase is still zero\n"); } else { printf("[FAIL]\tgsbase was corrupted\n"); errors++; } return errors == 0 ? 0 : 1; } ----- end gsbase test ----- Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86/tls: Disallow unusual TLS segmentsAndy Lutomirski
commit 0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream. Users have no business installing custom code segments into the GDT, and segments that are not present but are otherwise valid are a historical source of interesting attacks. For completeness, block attempts to set the L bit. (Prior to this patch, the L bit would have been silently dropped.) This is an ABI break. I've checked glibc, musl, and Wine, and none of them look like they'll have any trouble. Note to stable maintainers: this is a hardening patch that fixes no known bugs. Given the possibility of ABI issues, this probably shouldn't be backported quickly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86/tls: Validate TLS entries to protect espfixAndy Lutomirski
commit 41bdc78544b8a93a9c6814b8bbbfef966272abbe upstream. Installing a 16-bit RW data segment into the GDT defeats espfix. AFAICT this will not affect glibc, Wine, or dosemu at all. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16powerpc: 32 bit getcpu VDSO function uses 64 bit instructionsAnton Blanchard
commit 152d44a853e42952f6c8a504fb1f8eefd21fd5fd upstream. I used some 64 bit instructions when adding the 32 bit getcpu VDSO function. Fix it. Fixes: 18ad51dd342a ("powerpc: Add VDSO version of getcpu") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16ARM: sched_clock: Load cycle count after epoch stabilizesStephen Boyd
commit 336ae1180df5f69b9e0fb6561bec01c5f64361cf upstream. There is a small race between when the cycle count is read from the hardware and when the epoch stabilizes. Consider this scenario: CPU0 CPU1 ---- ---- cyc = read_sched_clock() cyc_to_sched_clock() update_sched_clock() ... cd.epoch_cyc = cyc; epoch_cyc = cd.epoch_cyc; ... epoch_ns + cyc_to_ns((cyc - epoch_cyc) The cyc on cpu0 was read before the epoch changed. But we calculate the nanoseconds based on the new epoch by subtracting the new epoch from the old cycle count. Since epoch is most likely larger than the old cycle count we calculate a large number that will be converted to nanoseconds and added to epoch_ns, causing time to jump forward too much. Fix this problem by reading the hardware after the epoch has stabilized. Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>