summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2014-08-14arch/sparc/math-emu/math_32.c: drop stray break operatorAndrey Utkin
[ Upstream commit 093758e3daede29cb4ce6aedb111becf9d4bfc57 ] This commit is a guesswork, but it seems to make sense to drop this break, as otherwise the following line is never executed and becomes dead code. And that following line actually saves the result of local calculation by the pointer given in function argument. So the proposed change makes sense if this code in the whole makes sense (but I am unable to analyze it in the whole). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81641 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: ldc_connect() should not return EINVAL when handshake is in progress.Sowmini Varadhan
[ Upstream commit 4ec1b01029b4facb651b8ef70bc20a4be4cebc63 ] The LDC handshake could have been asynchronously triggered after ldc_bind() enables the ldc_rx() receive interrupt-handler (and thus intercepts incoming control packets) and before vio_port_up() calls ldc_connect(). If that is the case, ldc_connect() should return 0 and let the state-machine progress. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Karl Volz <karl.volz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Guard against flushing openfirmware mappings.David S. Miller
[ Upstream commit 4ca9a23765da3260058db3431faf5b4efd8cf926 ] Based almost entirely upon a patch by Christopher Alexander Tobias Schulze. In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap layer") lazy VMAP tlb flushing was added to the vmalloc layer. This causes problems on sparc64. Sparc64 has two VMAP mapped regions and they are not contiguous with eachother. First we have the malloc mapping area, then another unrelated region, then the vmalloc region. This "another unrelated region" is where the firmware is mapped. If the lazy TLB flushing logic in the vmalloc code triggers after we've had both a module unload and a vfree or similar, it will pass an address range that goes from somewhere inside the malloc region to somewhere inside the vmalloc region, and thus covering the openfirmware area entirely. The sparc64 kernel learns about openfirmware's dynamic mappings in this region early in the boot, and then services TLB misses in this area. But openfirmware has some locked TLB entries which are not mentioned in those dynamic mappings and we should thus not disturb them. These huge lazy TLB flush ranges causes those openfirmware locked TLB entries to be removed, resulting in all kinds of problems including hard hangs and crashes during reboot/reset. Besides causing problems like this, such huge TLB flush ranges are also incredibly inefficient. A plea has been made with the author of the VMAP lazy TLB flushing code, but for now we'll put a safety guard into our flush_tlb_kernel_range() implementation. Since the implementation has become non-trivial, stop defining it as a macro and instead make it a function in a C source file. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Do not insert non-valid PTEs into the TSB hash table.David S. Miller
[ Upstream commit 18f38132528c3e603c66ea464727b29e9bbcb91b ] The assumption was that update_mmu_cache() (and the equivalent for PMDs) would only be called when the PTE being installed will be accessible by the user. This is not true for code paths originating from remove_migration_pte(). There are dire consequences for placing a non-valid PTE into the TSB. The TLB miss frramework assumes thatwhen a TSB entry matches we can just load it into the TLB and return from the TLB miss trap. So if a non-valid PTE is in there, we will deadlock taking the TLB miss over and over, never satisfying the miss. Just exit early from update_mmu_cache() and friends in this situation. Based upon a report and patch from Christopher Alexander Tobias Schulze. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Add membar to Niagara2 memcpy code.David S. Miller
[ Upstream commit 5aa4ecfd0ddb1e6dcd1c886e6c49677550f581aa ] This is the prevent previous stores from overlapping the block stores done by the memcpy loop. Based upon a glibc patch by Jose E. Marchesi Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.David S. Miller
[ Upstream commit b18eb2d779240631a098626cb6841ee2dd34fda0 ] Access to the TSB hash tables during TLB misses requires that there be an atomic 128-bit quad load available so that we fetch a matching TAG and DATA field at the same time. On cpus prior to UltraSPARC-III only virtual address based quad loads are available. UltraSPARC-III and later provide physical address based variants which are easier to use. When we only have virtual address based quad loads available this means that we have to lock the TSB into the TLB at a fixed virtual address on each cpu when it runs that process. We can't just access the PAGE_OFFSET based aliased mapping of these TSBs because we cannot take a recursive TLB miss inside of the TLB miss handler without risking running out of hardware trap levels (some trap combinations can be deep, such as those generated by register window spill and fill traps). Without huge pages it's working perfectly fine, but when the huge TSB got added another chunk of fixed virtual address space was not allocated for this second TSB mapping. So we were mapping both the 8K and 4MB TSBs to the same exact virtual address, causing multiple TLB matches which gives undefined behavior. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault ↵David S. Miller
addresses. [ Upstream commit e5c460f46ae7ee94831cb55cb980f942aa9e5a85 ] This was found using Dave Jone's trinity tool. When a user process which is 32-bit performs a load or a store, the cpu chops off the top 32-bits of the effective address before translating it. This is because we run 32-bit tasks with the PSTATE_AM (address masking) bit set. We can't run the kernel with that bit set, so when the kernel accesses userspace no address masking occurs. Since a 32-bit process will have no mappings in that region we will properly fault, so we don't try to handle this using access_ok(), which can safely just be a NOP on sparc64. Real faults from 32-bit processes should never generate such addresses so a bug check was added long ago, and it barks in the logs if this happens. But it also barks when a kernel user access causes this condition, and that _can_ happen. For example, if a pointer passed into a system call is "0xfffffffc" and the kernel access 4 bytes offset from that pointer. Just handle such faults normally via the exception entries. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Fix top-level fault handling bugs.David S. Miller
[ Upstream commit 70ffc6ebaead783ac8dafb1e87df0039bb043596 ] Make get_user_insn() able to cope with huge PMDs. Next, make do_fault_siginfo() more robust when get_user_insn() can't actually fetch the instruction. In particular, use the MMU announced fault address when that happens, instead of calling compute_effective_address() and computing garbage. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Handle 32-bit tasks properly in compute_effective_address().David S. Miller
[ Upstream commit d037d16372bbe4d580342bebbb8826821ad9edf0 ] If we have a 32-bit task we must chop off the top 32-bits of the 64-bit value just as the cpu would. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Make itc_sync_lock rawKirill Tkhai
[ Upstream commit 49b6c01f4c1de3b5e5427ac5aba80f9f6d27837a ] One more place where we must not be able to be preempted or to be interrupted in RT. Always actually disable interrupts during synchronization cycle. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14sparc64: Fix argument sign extension for compat_sys_futex().David S. Miller
[ Upstream commit aa3449ee9c87d9b7660dd1493248abcc57769e31 ] Only the second argument, 'op', is signed. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07x86/espfix/xen: Fix allocation of pages for paravirt page tablesBoris Ostrovsky
commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream. init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07x86_64/entry/xen: Do not invoke espfix64 on XenAndy Lutomirski
commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream. This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07x86, espfix: Make it possible to disable 16-bit supportH. Peter Anvin
commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream. Embedded systems, which may be very memory-size-sensitive, are extremely unlikely to ever encounter any 16-bit software, so make it a CONFIG_EXPERT option to turn off support for any 16-bit software whatsoever. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07x86, espfix: Make espfix64 a Kconfig option, fix UMLH. Peter Anvin
commit 197725de65477bc8509b41388157c1a2283542bb upstream. Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07x86, espfix: Fix broken header guardH. Peter Anvin
commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream. Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07x86, espfix: Move espfix definitions into a separate header fileH. Peter Anvin
commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream. Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stackH. Peter Anvin
commit 3891a04aafd668686239349ea58f3314ea2af86b upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"H. Peter Anvin
commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream. This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in preparation of merging in the proper fix (espfix64). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layoutKonstantin Khlebnikov
commit 811a2407a3cf7bbd027fbe92d73416f17485a3d8 upstream. On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2 (pmd) entries map 2MiB. When the identity mapping is created on LPAE, the pgd pointers are copied from the swapper_pg_dir. If we find that we need to modify the contents of a pmd, we allocate a new empty pmd table and insert it into the appropriate 1GB slot, before then filling it with the identity mapping. However, if the 1GB slot covers the kernel lowmem mappings, we obliterate those mappings. When replacing a PMD, first copy the old PMD contents to the new PMD, so that we preserve the existing mappings, particularly the mappings of the kernel itself. [rewrote commit message and added code comment -- rmk] Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format") Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-31x86/efi: Include a .bss section within the PE/COFF headersMichael Brown
commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream. The PE/COFF headers currently describe only the initialised-data portions of the image, and result in no space being allocated for the uninitialised-data portions. Consequently, the EFI boot stub will end up overwriting unexpected areas of memory, with unpredictable results. Fix by including a .bss section in the PE/COFF headers (functionally equivalent to the init_size field in the bzImage header). Signed-off-by: Michael Brown <mbrown@fensystems.co.uk> Cc: Thomas Bächler <thomas@archlinux.org> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-31s390/ptrace: fix PSW mask checkMartin Schwidefsky
commit dab6cf55f81a6e16b8147aed9a843e1691dcd318 upstream. The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect. The PSW_MASK_USER define contains the PSW_MASK_ASC bits, the ptrace interface accepts all combinations for the address-space-control bits. To protect the kernel space the PSW mask check in ptrace needs to reject the address-space-control bit combination for home space. Fixes CVE-2014-3534 Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-31x86_32, entry: Store badsys error code in %eaxSven Wegener
commit 8142b215501f8b291a108a202b3a053a265b03dd upstream. Commit 554086d ("x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry code, resulting in syscall() not returning proper errors for undefined syscalls on CPUs supporting the sysenter feature. The following code: > int result = syscall(666); > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno)); results in: > result=666 errno=0 error=Success Obviously, the syscall return value is the called syscall number, but it should have been an ENOSYS error. When run under ptrace it behaves correctly, which makes it hard to debug in the wild: > result=-1 errno=38 error=Function not implemented The %eax register is the return value register. For debugging via ptrace the syscall entry code stores the complete register context on the stack. The badsys handlers only store the ENOSYS error code in the ptrace register set and do not set %eax like a regular syscall handler would. The old resume_userspace call chain contains code that clobbers %eax and it restores %eax from the ptrace registers afterwards. The same goes for the ptrace-enabled call chain. When ptrace is not used, the syscall return value is the passed-in syscall number from the untouched %eax register. Use %eax as the return value register in syscall_badsys and sysenter_badsys, like a real syscall handler does, and have the caller push the value onto the stack for ptrace access. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-31parisc: Remove SA_RESTORER defineJohn David Anglin
commit 20dbea494543aefaace874cc3ec93a39b94b1ec4 upstream. The sa_restorer field in struct sigaction is obsolete and no longer in the parisc implementation. However, the core code assumes the field is present if SA_RESTORER is defined. So, the define needs to be removed. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)Anton Kolesov
commit a4b6cb735b25aa84a462a1985e3e43bebaf5beb4 upstream. This patch adds implementation of GET_THREAD_AREA ptrace request type. This is required by GDB to debug NPTL applications. Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28locking/mutex: Disable optimistic spinning on some architecturesPeter Zijlstra
commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream. The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Jason Low <jason.low2@hp.com> Cc: Waiman Long <waiman.long@hp.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: John David Anglin <dave.anglin@bell.net> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28perf/x86/intel: ignore CondChgd bit to avoid false NMI handlingHATAYAMA Daisuke
commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream. Currently, any NMI is falsely handled by a NMI handler of NMI watchdog if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. For example, we use external NMI to make system panic to get crash dump, but in this case, the external NMI is falsely handled do to the issue. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies an owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that the intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. As a result, if CondChgd bit is set, any NMI is falsely handled by the NMI handler of NMI watchdog. Also, if type of the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding action is never performed until CondChgd bit is cleared. I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. Then the CondChgd bit is immediately cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0. On the other hand, on older processors such as Nehalem, Xeon E7540, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out that, the descriptions I found are: In 18.9.1: "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to indicate changes to the state of performancmonitoring hardware" In Table 35-2 IA-32 Architectural MSRs 63 CondChg: status bits of this register has changed. These are different from the bahviour I see on the actual system as I explained above. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17x86, ioremap: Speed up check for RAM pagesRoland Dreier
commit c81c8a1eeede61e92a15103748c23d100880cc8a upstream. In __ioremap_caller() (the guts of ioremap), we loop over the range of pfns being remapped and checks each one individually with page_is_ram(). For large ioremaps, this can be very slow. For example, we have a device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+ seconds -- sometimes long enough to trigger the soft lockup detector! Internally, page_is_ram() calls walk_system_ram_range() on a single page. Instead, we can make a single call to walk_system_ram_range() from __ioremap_caller(), and do our further checks only for any RAM pages that we find. For the common case of MMIO, this saves an enormous amount of work, since the range being ioremapped doesn't intersect system RAM at all. With this change, ioremap on our 256 GiB BAR takes less than 1 second. Signed-off-by: Roland Dreier <roland@purestorage.com> Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17Score: Modify the Makefile of Score, remove -mlong-calls for compilingLennox Wu
commit df9e4d1c39c472cb44d81ab2ed2db503fc486e3b upstream. Signed-off-by: Lennox Wu <lennox.wu@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17Score: The commit is for compiling successfully.Lennox Wu
commit 5fbbf8a1a93452b26e7791cf32cefce62b0a480b upstream. The modifications include: 1. Kconfig of Score: we don't support ioremap 2. Missed headfile including 3. There are some errors in other people's commit not checked by us, we fix it now 3.1 arch/score/kernel/entry.S: wrong instructions 3.2 arch/score/kernel/process.c : just some typos Signed-off-by: Lennox Wu <lennox.wu@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17Score: Implement the function csum_ipv6_magicLennox Wu
commit 1ed62ca648557b884d117a4a8bbcf2ae4e2d1153 upstream. Signed-off-by: Lennox Wu <lennox.wu@gmail.com> Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17score: normalize global variables exported by vmlinux.ldsJiang Liu
commit ae49b83dcacfb69e22092cab688c415c2f2d870c upstream. Generate mandatory global variables _sdata in file vmlinux.lds. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17arm64: implement TASK_SIZE_OFColin Cross
commit fa2ec3ea10bd377f9d55772b1dab65178425a1a2 upstream. include/linux/sched.h implements TASK_SIZE_OF as TASK_SIZE if it is not set by the architecture headers. TASK_SIZE uses the current task to determine the size of the virtual address space. On a 64-bit kernel this will cause reading /proc/pid/pagemap of a 64-bit process from a 32-bit process to return EOF when it reads past 0xffffffff. Implement TASK_SIZE_OF exactly the same as TASK_SIZE with test_tsk_thread_flag instead of test_thread_flag. Signed-off-by: Colin Cross <ccross@android.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17crypto: sha512_ssse3 - fix byte count to bit count conversionJussi Kivilinna
commit cfe82d4f45c7cc39332a2be7c4c1d3bf279bbd3d upstream. Byte-to-bit-count computation is only partly converted to big-endian and is mixing in CPU-endian values. Problem was noticed by sparce with warning: CHECK arch/x86/crypto/sha512_ssse3_glue.c arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types) arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident> arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17powerpc/perf: Clear MMCR2 when enabling PMUJoel Stanley
commit b50a6c584bb47b370f84bfd746770c0bbe7129b7 upstream. On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze the PMU counters. Aside from on boot they are then never reset, resulting in stuck perf counters for any user in the guest or host. We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane state for perf to use the PMU counters under either the guest or the host. This was manifesting as a bug with ppc64_cpu --frequency: $ sudo ppc64_cpu --frequency WARNING: couldn't run on cpu 0 WARNING: couldn't run on cpu 8 ... WARNING: couldn't run on cpu 144 WARNING: couldn't run on cpu 152 min: 18446744073.710 GHz (cpu -1) max: 0.000 GHz (cpu -1) avg: 0.000 GHz The command uses a perf counter to measure CPU cycles over a fixed amount of time, in order to approximate the frequency of the machine. The counters were returning zero once a guest was started, regardless of weather it was still running or had been shut down. By dumping the value of MMCR2, it was observed that once a guest is running MMCR2 is set to 1s - which stops counters from running: $ sudo sh -c 'echo p > /proc/sysrq-trigger' CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6 PMC1: 5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000 PMC5: 1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000 MMCR2: fffffffffffffc00 EBBHR: 0000000000000000 EBBRR: 0000000000000000 BESCR: 0000000000000000 SIAR: 00000000000a51cc SDAR: c00000000fc40000 SIER: 0000000001000000 This is done unconditionally in book3s_hv_interrupts.S upon entering the guest, and the original value is only save/restored if the host has indicated it was using the PMU. This is okay, however the user of the PMU needs to ensure that it is in a defined state when it starts using it. Fixes: e05b9b9e5c10 ("powerpc/perf: Power8 PMU support") Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17powerpc/perf: Add PPMU_ARCH_207S defineJoel Stanley
commit 4d9690dd56b0d18f2af8a9d4a279cb205aae3345 upstream. Instead of separate bits for every POWER8 PMU feature, have a single one for v2.07 of the architecture. This saves us adding a MMCR2 define for a future patch. Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17powerpc/perf: Never program book3s PMCs with values >= 0x80000000Anton Blanchard
commit f56029410a13cae3652d1f34788045c40a13ffc7 upstream. We are seeing a lot of PMU warnings on POWER8: Can't find PMC that caused IRQ Looking closer, the active PMC is 0 at this point and we took a PMU exception on the transition from negative to 0. Some versions of POWER8 have an issue where they edge detect and not level detect PMC overflows. A number of places program the PMC with (0x80000000 - period_left), where period_left can be negative. We can either fix all of these or just ensure that period_left is always >= 1. This patch takes the second option. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-17parisc: add serial ports of C8000/1GHz machine to hardware databaseHelge Deller
commit eadcc7208a2237016be7bdff4551ba7614da85c8 upstream. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09arch/unicore32/mm/alignment.c: include "asm/pgtable.h" to avoid compiling errorChen Gang
commit 1ff38c56cbd095c4c0dfa581a859ba3557830f78 upstream. Need include "asm/pgtable.h" to include "asm-generic/pgtable-nopmd.h", so can let 'pmd_t' defined. The related error with allmodconfig: CC arch/unicore32/mm/alignment.o In file included from arch/unicore32/mm/alignment.c:24: arch/unicore32/include/asm/tlbflush.h:135: error: expected .). before .*. token arch/unicore32/include/asm/tlbflush.h:154: error: expected .). before .*. token In file included from arch/unicore32/mm/alignment.c:27: arch/unicore32/mm/mm.h:15: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token arch/unicore32/mm/mm.h:20: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token arch/unicore32/mm/mm.h:25: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token make[1]: *** [arch/unicore32/mm/alignment.o] Error 1 make: *** [arch/unicore32/mm] Error 2 Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn> Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09KVM: x86: preserve the high 32-bits of the PAT registerPaolo Bonzini
commit 7cb060a91c0efc5ff94f83c6df3ed705e143cdb9 upstream. KVM does not really do much with the PAT, so this went unnoticed for a long time. It is exposed however if you try to do rdmsr on the PAT register. Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09KVM: x86: Increase the number of fixed MTRR regs to 10Nadav Amit
commit 682367c494869008eb89ef733f196e99415ae862 upstream. Recent Intel CPUs have 10 variable range MTRRs. Since operating systems sometime make assumptions on CPUs while they ignore capability MSRs, it is better for KVM to be consistent with recent CPUs. Reporting more MTRRs than actually supported has no functional implications. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09arm64: Bug fix in stack alignment exceptionChiaHao
commit 3906c2b53cd23c2ae03e6ce41432c8e7f0a3cbbb upstream. The value of ESR has been stored into x1, and should be directly pass to do_sp_pc_abort function, "MOV x1, x25" is an extra operation and do_sp_pc_abort will get the wrong value of ESR. Signed-off-by: ChiaHao <andy.jhshiu@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09ARM: OMAP2+: Fix parser-bug in platform muxing codeDavid R. Piegdon
commit c021f241f4fab2bb4fc4120a38a828a03dd3f970 upstream. Fix a parser-bug in the omap2 muxing code where muxtable-entries will be wrongly selected if the requested muxname is a *prefix* of their m0-entry and they have a matching mN-entry. Fix by additionally checking that the length of the m0_entry is equal. For example muxing of "dss_data2.dss_data2" on omap32xx will fail because the prefix "dss_data2" will match the mux-entries "dss_data2" as well as "dss_data20", with the suffix "dss_data2" matching m0 (for dss_data2) and m4 (for dss_data20). Thus both are recognized as signal path candidates: Relevant muxentries from mux34xx.c: _OMAP3_MUXENTRY(DSS_DATA20, 90, "dss_data20", NULL, "mcspi3_somi", "dss_data2", "gpio_90", NULL, NULL, "safe_mode"), _OMAP3_MUXENTRY(DSS_DATA2, 72, "dss_data2", NULL, NULL, NULL, "gpio_72", NULL, NULL, "safe_mode"), This will result in a failure to mux the pin at all: _omap_mux_get_by_name: Multiple signal paths (2) for dss_data2.dss_data2 Patch should apply to linus' latest master down to rather old linux-2.6 trees. Signed-off-by: David R. Piegdon <lkml@p23q.org> Cc: stable@vger.kernel.org [tony@atomide.com: updated description to include full description] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06ptrace,x86: force IRET path after a ptrace_stop()Tejun Heo
commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a upstream. The 'sysret' fastpath does not correctly restore even all regular registers, much less any segment registers or reflags values. That is very much part of why it's faster than 'iret'. Normally that isn't a problem, because the normal ptrace() interface catches the process using the signal handler infrastructure, which always returns with an iret. However, some paths can get caught using ptrace_event() instead of the signal path, and for those we need to make sure that we aren't going to return to user space using 'sysret'. Otherwise the modifications that may have been done to the register set by the tracer wouldn't necessarily take effect. Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from arch_ptrace_stop_needed() which is invoked from ptrace_stop(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06MIPS: KVM: Fix memory leak on VCPUDeng-Cheng Zhu
commit 8c9eb041cf76038eb3b62ee259607eec9b89f48d upstream. kvm_arch_vcpu_free() is called in 2 code paths: 1) kvm_vm_ioctl() kvm_vm_ioctl_create_vcpu() kvm_arch_vcpu_destroy() kvm_arch_vcpu_free() 2) kvm_put_kvm() kvm_destroy_vm() kvm_arch_destroy_vm() kvm_mips_free_vcpus() kvm_arch_vcpu_free() Neither of the paths handles VCPU free. We need to do it in kvm_arch_vcpu_free() corresponding to the memory allocation in kvm_arch_vcpu_create(). Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06MIPS: KVM: Remove redundant NULL checks before kfree()James Hogan
commit c6c0a6637f9da54f9472144d44f71cf847f92e20 upstream. The kfree() function already NULL checks the parameter so remove the redundant NULL checks before kfree() calls in arch/mips/kvm/. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06powerpc: Add AT_HWCAP2 to indicate V.CRYPTO category supportBenjamin Herrenschmidt
commit dd58a092c4202f2bd490adab7285b3ff77f8e467 upstream. The Vector Crypto category instructions are supported by current POWER8 chips, advertise them to userspace using a specific bit to properly differentiate with chips of the same architecture level that might not have them. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06powerpc: fix typo 'CONFIG_PPC_CPU'Paul Bolle
commit b69a1da94f3d1589d1942b5d1b384d8cfaac4500 upstream. Commit cd64d1697cf0 ("powerpc: mtmsrd not defined") added a check for CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended. Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined") Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06powerpc: fix typo 'CONFIG_PMAC'Paul Bolle
commit 6e0fdf9af216887e0032c19d276889aad41cad00 upstream. Commit b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending") added a check for CONFIG_PMAC were a check for CONFIG_PPC_PMAC was clearly intended. Fixes: b0d278b7d3ae ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending") Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06powerpc: 64bit sendfile is capped at 2GBAnton Blanchard
commit 5d73320a96fcce80286f1447864c481b5f0b96fa upstream. commit 8f9c0119d7ba (compat: fs: Generic compat_sys_sendfile implementation) changed the PowerPC 64bit sendfile call from sys_sendile64 to sys_sendfile. Unfortunately this broke sendfile of lengths greater than 2G because sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which fixes the bug. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>