summaryrefslogtreecommitdiff
path: root/Documentation/x86
AgeCommit message (Collapse)Author
2019-11-12x86/speculation/taa: Add documentation for TSX Async AbortPawan Gupta
commit a7a248c593e4fd7a67c50b5f5318fe42a0db335e upstream. Add the documenation for TSX Async Abort. Include the description of the issue, how to check the mitigation state, control the mitigation, guidance for system administrators. [ bp: Add proper SPDX tags, touch ups by Josh and me. ] Co-developed-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21x86/speculation/mds: Improve CPU buffer clear documentationAndy Lutomirski
commit 9d8d0294e78a164d407133dea05caf4b84247d6a upstream. On x86_64, all returns to usermode go through prepare_exit_to_usermode(), with the sole exception of do_nmi(). This even includes machine checks -- this was added several years ago to support MCE recovery. Update the documentation. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jon Masters <jcm@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user") Link: http://lkml.kernel.org/r/999fa9e126ba6a48e9d214d2f18dbde5c62ac55c.1557865329.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21x86/speculation/mds: Revert CPU buffer clear on double fault exitAndy Lutomirski
commit 88640e1dcd089879530a49a8d212d1814678dfe7 upstream. The double fault ESPFIX path doesn't return to user mode at all -- it returns back to the kernel by simulating a #GP fault. prepare_exit_to_usermode() will run on the way out of general_protection before running user code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jon Masters <jcm@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user") Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/speculation/mds: Fix documentation typoJosh Poimboeuf
commit 95310e348a321b45fb746c176961d4da72344282 upstream Fix a minor typo in the MDS documentation: "eanbled" -> "enabled". Reported-by: Jeff Bastian <jbastian@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/mds: Add MDSUM variant to the MDS documentationspeck for Pawan Gupta
commit e672f8bf71c66253197e503f75c771dd28ada4a0 upstream Updated the documentation for a new CVE-2019-11091 Microarchitectural Data Sampling Uncacheable Memory (MDSUM) which is a variant of Microarchitectural Data Sampling (MDS). MDS is a family of side channel attacks on internal buffers in Intel CPUs. MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from memory that takes a fault or assist can leave data in a microarchitectural structure that may later be observed using one of the same methods used by MSBDS, MFBDS or MLPDS. There are no new code changes expected for MDSUM. The existing mitigation for MDS applies to MDSUM as well. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Jon Masters <jcm@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/speculation/mds: Add mitigation mode VMWERVThomas Gleixner
commit 22dd8365088b6403630b82423cf906491859b65e upstream In virtualized environments it can happen that the host has the microcode update which utilizes the VERW instruction to clear CPU buffers, but the hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit to guests. Introduce an internal mitigation mode VMWERV which enables the invocation of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the system has no updated microcode this results in a pointless execution of the VERW instruction wasting a few CPU cycles. If the microcode is updated, but not exposed to a guest then the CPU buffers will be cleared. That said: Virtual Machines Will Eventually Receive Vaccine Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/speculation/mds: Conditionally clear CPU buffers on idle entryThomas Gleixner
commit 07f07f55a29cb705e221eda7894dd67ab81ef343 upstream Add a static key which controls the invocation of the CPU buffer clear mechanism on idle entry. This is independent of other MDS mitigations because the idle entry invocation to mitigate the potential leakage due to store buffer repartitioning is only necessary on SMT systems. Add the actual invocations to the different halt/mwait variants which covers all usage sites. mwaitx is not patched as it's not available on Intel CPUs. The buffer clear is only invoked before entering the C-State to prevent that stale data from the idling CPU is spilled to the Hyper-Thread sibling after the Store buffer got repartitioned and all entries are available to the non idle sibling. When coming out of idle the store buffer is partitioned again so each sibling has half of it available. Now CPU which returned from idle could be speculatively exposed to contents of the sibling, but the buffers are flushed either on exit to user space or on VMENTER. When later on conditional buffer clearing is implemented on top of this, then there is no action required either because before returning to user space the context switch will set the condition flag which causes a flush on the return to user path. Note, that the buffer clearing on idle is only sensible on CPUs which are solely affected by MSBDS and not any other variant of MDS because the other MDS variants cannot be mitigated when SMT is enabled, so the buffer clearing on idle would be a window dressing exercise. This intentionally does not handle the case in the acpi/processor_idle driver which uses the legacy IO port interface for C-State transitions for two reasons: - The acpi/processor_idle driver was replaced by the intel_idle driver almost a decade ago. Anything Nehalem upwards supports it and defaults to that new driver. - The legacy IO port interface is likely to be used on older and therefore unaffected CPUs or on systems which do not receive microcode updates anymore, so there is no point in adding that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/speculation/mds: Clear CPU buffers on exit to userThomas Gleixner
commit 04dcbdb8057827b043b3c71aa397c4c63e67d086 upstream Add a static key which controls the invocation of the CPU buffer clear mechanism on exit to user space and add the call into prepare_exit_to_usermode() and do_nmi() right before actually returning. Add documentation which kernel to user space transition this covers and explain why some corner cases are not mitigated. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14x86/speculation/mds: Add mds_clear_cpu_buffers()Thomas Gleixner
commit 6a9e529272517755904b7afa639f6db59ddb793e upstream The Microarchitectural Data Sampling (MDS) vulernabilities are mitigated by clearing the affected CPU buffers. The mechanism for clearing the buffers uses the unused and obsolete VERW instruction in combination with a microcode update which triggers a CPU buffer clear when VERW is executed. Provide a inline function with the assembly magic. The argument of the VERW instruction must be a memory operand as documented: "MD_CLEAR enumerates that the memory-operand variant of VERW (for example, VERW m16) has been extended to also overwrite buffers affected by MDS. This buffer overwriting functionality is not guaranteed for the register operand variant of VERW." Documentation also recommends to use a writable data segment selector: "The buffer overwriting occurs regardless of the result of the VERW permission check, as well as when the selector is null or causes a descriptor load segment violation. However, for lowest latency we recommend using a selector that indicates a valid writable data segment." Add x86 specific documentation about MDS and the internal workings of the mitigation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-27x86/mm: Move LDT remap out of KASLR region on 5-level pagingKirill A. Shutemov
commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream On 5-level paging the LDT remap area is placed in the middle of the KASLR randomization region and it can overlap with the direct mapping, the vmalloc or the vmap area. The LDT mapping is per mm, so it cannot be moved into the P4D page table next to the CPU_ENTRY_AREA without complicating PGD table allocation for 5-level paging. The 4 PGD slot gap just before the direct mapping is reserved for hypervisors, so it cannot be used. Move the direct mapping one slot deeper and use the resulting gap for the LDT remap area. The resulting layout is the same for 4 and 5 level paging. [ tglx: Massaged changelog ] Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: willy@infradead.org Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-01-23x86/pti: Document fix wrong indexzhenwei.pi
commit 98f0fceec7f84d80bc053e49e596088573086421 upstream. In section <2. Runtime Cost>, fix wrong index. Signed-off-by: zhenwei.pi <zhenwei.pi@youruncloud.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@linux.intel.com Link: https://lkml.kernel.org/r/1516237492-27739-1-git-send-email-zhenwei.pi@youruncloud.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17x86/Documentation: Add PTI descriptionDave Hansen
commit 01c9b17bf673b05bb401b76ec763e9730ccf1376 upstream. Add some details about how PTI works, what some of the downsides are, and how to debug it when things go wrong. Also document the kernel parameter: 'pti/nopti'. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Moritz Lipp <moritz.lipp@iaik.tugraz.at> Cc: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Cc: Richard Fellner <richard.fellner@student.tugraz.at> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Andi Lutomirsky <luto@kernel.org> Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10x86/kaslr: Fix the vaddr_end messThomas Gleixner
commit 1dddd25125112ba49706518ac9077a1026a18f37 upstream. vaddr_end for KASLR is only documented in the KASLR code itself and is adjusted depending on config options. So it's not surprising that a change of the memory layout causes KASLR to have the wrong vaddr_end. This can map arbitrary stuff into other areas causing hard to understand problems. Remove the whole ifdef magic and define the start of the cpu_entry_area to be the end of the KASLR vaddr range. Add documentation to that effect. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10x86/mm: Map cpu_entry_area at the same place on 4/5 levelThomas Gleixner
commit f2078904810373211fb15f91888fba14c01a4acc upstream. There is no reason for 4 and 5 level pagetables to have a different layout. It just makes determining vaddr_end for KASLR harder than necessary. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10x86/mm: Set MODULES_END to 0xffffffffff000000Andrey Ryabinin
commit f5a40711fa58f1c109165a4fec6078bf2dfd2bdc upstream. Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary. So passing page unaligned address to kasan_populate_zero_shadow() have two possible effects: 1) It may leave one page hole in supposed to be populated area. After commit 21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that hole happens to be in the shadow covering fixmap area and leads to crash: BUG: unable to handle kernel paging request at fffffbffffe8ee04 RIP: 0010:check_memory_region+0x5c/0x190 Call Trace: <NMI> memcpy+0x1f/0x50 ghes_copy_tofrom_phys+0xab/0x180 ghes_read_estatus+0xfb/0x280 ghes_notify_nmi+0x2b2/0x410 nmi_handle+0x115/0x2c0 default_do_nmi+0x57/0x110 do_nmi+0xf8/0x150 end_repeat_nmi+0x1a/0x1e Note, the crash likely disappeared after commit 92a0f81d8957, which changed kasan_populate_zero_shadow() call the way it was before commit 21506525fb8d. 2) Attempt to load module near MODULES_END will fail, because __vmalloc_node_range() called from kasan_module_alloc() will hit the WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error. To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned which means that MODULES_END should be 8*PAGE_SIZE aligned. The whole point of commit f06bdd4001c2 was to move MODULES_END down if NR_CPUS is big, so the cpu_entry_area takes a lot of space. But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") the cpu_entry_area is no longer in fixmap, so we could just set MODULES_END to a fixed 8*PAGE_SIZE aligned address. Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") Reported-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02x86/pti: Put the LDT in its own PGD if PTI is onAndy Lutomirski
commit f55f0501cbf65ec41cca5058513031b711730b1d upstream. With PTI enabled, the LDT must be mapped in the usermode tables somewhere. The LDT is per process, i.e. per mm. An earlier approach mapped the LDT on context switch into a fixmap area, but that's a big overhead and exhausted the fixmap space when NR_CPUS got big. Take advantage of the fact that there is an address space hole which provides a completely unused pgd. Use this pgd to manage per-mm LDT mappings. This has a down side: the LDT isn't (currently) randomized, and an attack that can write the LDT is instant root due to call gates (thanks, AMD, for leaving call gates in AMD64 but designing them wrong so they're only useful for exploits). This can be mitigated by making the LDT read-only or randomizing the mapping, either of which is strightforward on top of this patch. This will significantly slow down LDT users, but that shouldn't matter for important workloads -- the LDT is only used by DOSEMU(2), Wine, and very old libc implementations. [ tglx: Cleaned it up. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02x86/mm/64: Make a full PGD-entry size hole in the memory mapAndy Lutomirski
commit 9f449772a3106bcdd4eb8fdeb281147b0e99fb30 upstream. Shrink vmalloc space from 16384TiB to 12800TiB to enlarge the hole starting at 0xff90000000000000 to be a full PGD entry. A subsequent patch will use this hole for the pagetable isolation LDT alias. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-29x86/cpu_entry_area: Move it out of the fixmapThomas Gleixner
commit 92a0f81d89571e3e8759366e050ee05cc545ef99 upstream. Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big and 0-day already hit a case where the fixmap PTEs were cleared by cleanup_highmap(). Aside of that the fixmap API is a pain as it's all backwards. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-29x86/doc: Remove obvious weirdnesses from the x86 MM layout documentationPeter Zijlstra
commit e8ffe96e5933d417195268478479933d56213a3f upstream. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-29x86/mm/64: Improve the memory map documentationAndy Lutomirski
commit 5a7ccf4754fb3660569a6de52ba7f7fc3dfaf280 upstream. The old docs had the vsyscall range wrong and were missing the fixmap. Fix both. There used to be 8 MB reserved for future vsyscalls, but that's long gone. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25x86/kasan: Use the same shadow offset for 4- and 5-level pagingAndrey Ryabinin
commit 12a8cc7fcf54a8575f094be1e99032ec38aa045c upstream. We are going to support boot-time switching between 4- and 5-level paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET for different paging modes: the constant is passed to gcc to generate code and cannot be changed at runtime. This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset for both 4- and 5-level paging. For 5-level paging it means that shadow memory region is not aligned to PGD boundary anymore and we have to handle unaligned parts of the region properly. In addition, we have to exclude paravirt code from KASAN instrumentation as we now use set_pgd() before KASAN is fully ready. [kirill.shutemov@linux.intel.com: clenaup, changelog message] Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*'Josh Poimboeuf
commit 11af847446ed0d131cf24d16a7ef3d5ea7a49554 upstream. Rename the unwinder config options from: CONFIG_ORC_UNWINDER CONFIG_FRAME_POINTER_UNWINDER CONFIG_GUESS_UNWINDER to: CONFIG_UNWINDER_ORC CONFIG_UNWINDER_FRAME_POINTER CONFIG_UNWINDER_GUESS ... in order to give them a more logical config namespace. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-04Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache quality monitoring update from Thomas Gleixner: "This update provides a complete rewrite of the Cache Quality Monitoring (CQM) facility. The existing CQM support was duct taped into perf with a lot of issues and the attempts to fix those turned out to be incomplete and horrible. After lengthy discussions it was decided to integrate the CQM support into the Resource Director Technology (RDT) facility, which is the obvious choise as in hardware CQM is part of RDT. This allowed to add Memory Bandwidth Monitoring support on top. As a result the mechanisms for allocating cache/memory bandwidth and the corresponding monitoring mechanisms are integrated into a single management facility with a consistent user interface" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/intel_rdt: Turn off most RDT features on Skylake x86/intel_rdt: Add command line options for resource director technology x86/intel_rdt: Move special case code for Haswell to a quirk function x86/intel_rdt: Remove redundant ternary operator on return x86/intel_rdt/cqm: Improve limbo list processing x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug x86/intel_rdt: Modify the intel_pqr_state for better performance x86/intel_rdt/cqm: Clear the default RMID during hotcpu x86/intel_rdt: Show bitmask of shareable resource with other executing units x86/intel_rdt/mbm: Handle counter overflow x86/intel_rdt/mbm: Add mbm counter initialization x86/intel_rdt/mbm: Basic counting of MBM events (total and local) x86/intel_rdt/cqm: Add CPU hotplug support x86/intel_rdt/cqm: Add sched_in support x86/intel_rdt: Introduce rdt_enable_key for scheduling x86/intel_rdt/cqm: Add mount,umount support x86/intel_rdt/cqm: Add rmdir support x86/intel_rdt: Separate the ctrl bits from rmdir x86/intel_rdt/cqm: Add mon_data x86/intel_rdt: Prepare for RDT monitor data support ...
2017-09-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "PCID support, 5-level paging support, Secure Memory Encryption support The main changes in this cycle are support for three new, complex hardware features of x86 CPUs: - Add 5-level paging support, which is a new hardware feature on upcoming Intel CPUs allowing up to 128 PB of virtual address space and 4 PB of physical RAM space - a 512-fold increase over the old limits. (Supercomputers of the future forecasting hurricanes on an ever warming planet can certainly make good use of more RAM.) Many of the necessary changes went upstream in previous cycles, v4.14 is the first kernel that can enable 5-level paging. This feature is activated via CONFIG_X86_5LEVEL=y - disabled by default. (By Kirill A. Shutemov) - Add 'encrypted memory' support, which is a new hardware feature on upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system RAM to be encrypted and decrypted (mostly) transparently by the CPU, with a little help from the kernel to transition to/from encrypted RAM. Such RAM should be more secure against various attacks like RAM access via the memory bus and should make the radio signature of memory bus traffic harder to intercept (and decrypt) as well. This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled by default. (By Tom Lendacky) - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a hardware feature that attaches an address space tag to TLB entries and thus allows to skip TLB flushing in many cases, even if we switch mm's. (By Andy Lutomirski) All three of these features were in the works for a long time, and it's coincidence of the three independent development paths that they are all enabled in v4.14 at once" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits) x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y) x86/mm: Use pr_cont() in dump_pagetable() x86/mm: Fix SME encryption stack ptr handling kvm/x86: Avoid clearing the C-bit in rsvd_bits() x86/CPU: Align CR3 defines x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages acpi, x86/mm: Remove encryption mask from ACPI page protection type x86/mm, kexec: Fix memory corruption with SME on successive kexecs x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y x86/mm: Allow userspace have mappings above 47-bit x86/mm: Prepare to expose larger address space to userspace x86/mpx: Do not allow MPX if we have mappings above 47-bit x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit() x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD x86/mm/dump_pagetables: Fix printout of p4d level x86/mm/dump_pagetables: Generalize address normalization x86/boot: Fix memremap() related build failure ...
2017-09-04Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Ingo Molnar: "Update documentation, improve robustness and fix a memory leak" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/intel: Improve microcode patches saving flow x86/microcode: Document the three loading methods x86/microcode/AMD: Free unneeded patch before exit from update_cache()
2017-08-01x86/intel_rdt: Show bitmask of shareable resource with other executing unitsFenghua Yu
CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource. Each set bit within the length of the CBM indicates the corresponding unit of the resource allocation may be used by other entities in the platform (e.g. an integrated graphics engine or hardware units outside the processor core and have direct access to the resource). Each cleared bit within the length of the CBM indicates the corresponding allocation unit can be configured to implement a priority-based allocation scheme without interference with other hardware agents in the system. Bits outside the length of the CBM are reserved. More details on the bit mask are described in x86 Software Developer's Manual. The bitmask is shown in "info" directory for each resource. It's up to user to decide how to use the bitmask within a CBM in a partition to share or isolate a resource with other executing units. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: vikas.shivappa@linux.intel.com Link: http://lkml.kernel.org/r/20170725223904.12996-1-tony.luck@intel.com
2017-08-01x86/intel_rdt/cqm: Documentation for resctrl based RDT MonitoringVikas Shivappa
Add a description of resctrl based RDT(resource director technology) monitoring extension and its usage. [Tony: Added descriptions for how monitoring and allocation are measured and some cleanups] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-3-git-send-email-vikas.shivappa@linux.intel.com
2017-07-26x86/unwind: Add the ORC unwinderJosh Poimboeuf
Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y. It plugs into the existing x86 unwinder framework. It relies on objtool to generate the needed .orc_unwind and .orc_unwind_ip sections. For more details on why ORC is used instead of DWARF, see Documentation/x86/orc-unwinder.txt - but the short version is that it's a simplified, fundamentally more robust debugninfo data structure, which also allows up to two orders of magnitude faster lookups than the DWARF unwinder - which matters to profiling workloads like perf. Thanks to Andy Lutomirski for the performance improvement ideas: splitting the ORC unwind table into two parallel arrays and creating a fast lookup table to search a subset of the unwind table. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com [ Extended the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-25x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txtWang Kai
Replace PKEY_DENY_WRITE with PKEY_DISABLE_WRITE, to match the source code. Signed-off-by: Wang Kai <morgan.wang@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: dave.hansen@intel.com Cc: dave.hansen@linux.intel.com Cc: linux-doc@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-25x86/microcode: Document the three loading methodsBorislav Petkov
Paul Menzel recently asked how to load microcode on a system and I realized that we don't really have all the methods written down somewhere. Do that, so people can go and look them up. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170724101228.17326-3-bp@alien8.de [ Fix whitespace noise in the new description. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-21x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=yKirill A. Shutemov
Most of things are in place and we can enable support for 5-level paging. The patch makes XEN_PV and XEN_PVH dependent on !X86_5LEVEL. Both are not ready to work with 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-9-kirill.shutemov@linux.intel.com [ Minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18x86/cpu/AMD: Document AMD Secure Memory Encryption (SME)Tom Lendacky
Create a Documentation entry to describe the AMD Secure Memory Encryption (SME) feature and add documentation for the mem_encrypt= kernel parameter. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ca0a0c13b055fd804cfc92cbaca8acd68057eed0.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14x86/mce: Update bootlog description to reflect behavior on AMDYazen Ghannam
The bootlog option is only disabled by default on AMD Fam10h and older systems. Update bootlog description to say this. Change the family value to hex to avoid confusion. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-9-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-09x86/intel_rdt: Fix a typo in DocumentationXiaochen Shen
Example 3 contains a typo: "C0" in "# echo C0 > p0/cpus" is wrong because it specifies core 6-7 instead of wanted core 4-7. Correct this typo to avoid confusion. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Cc: vikas.shivappa@linux.intel.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1493781356-24229-1-git-send-email-xiaochen.shen@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-01Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main x86 MM changes in this cycle were: - continued native kernel PCID support preparation patches to the TLB flushing code (Andy Lutomirski) - various fixes related to 32-bit compat syscall returning address over 4Gb in applications, launched from 64-bit binaries - motivated by C/R frameworks such as Virtuozzo. (Dmitry Safonov) - continued Intel 5-level paging enablement: in particular the conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov) - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel) - ... plus misc updates, fixes and cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash x86/mm: Fix flush_tlb_page() on Xen x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm/64: Fix crash in remove_pagetable() Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" x86/boot/e820: Remove a redundant self assignment x86/mm: Fix dump pagetables for 4 levels of page tables x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" x86/espfix: Add support for 5-level paging x86/kasan: Extend KASAN to support 5-level paging x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y x86/paravirt: Add 5-level support to the paravirt code x86/mm: Define virtual memory map for 5-level paging x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert x86/boot: Detect 5-level paging support x86/mm/numa: Remove numa_nodemask_from_meminfo() ...
2017-05-01Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "The biggest changes are an extension of the Intel RDT code to extend it with Intel Memory Bandwidth Allocation CPU support: MBA allows bandwidth allocation between cores, while CBM (already upstream) allows CPU cache partitioning. There's also misc smaller fixes and updates" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/intel_rdt: Return error for incorrect resource names in schemata x86/intel_rdt: Trim whitespace while parsing schemata input x86/intel_rdt: Fix padding when resource is enabled via mount x86/intel_rdt: Get rid of anon union x86/cpu: Keep model defines sorted by model number x86/intel_rdt/mba: Add schemata file support for MBA x86/intel_rdt: Make schemata file parsers resource specific x86/intel_rdt/mba: Add info directory files for Memory Bandwidth Allocation x86/intel_rdt: Make information files resource specific x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA) x86/intel_rdt/mba: Memory bandwith allocation feature detect x86/intel_rdt: Add resource specific msr update function x86/intel_rdt: Move CBM specific data into a struct x86/intel_rdt: Cleanup namespace to support multiple resource types Documentation, x86: Intel Memory bandwidth allocation x86/intel_rdt: Organize code properly x86/intel_rdt: Init padding only if a device exists x86/intel_rdt: Add cpus_list rdtgroup file x86/intel_rdt: Cleanup kernel-doc x86/intel_rdt: Update schemata read to show data in tabular format ...
2017-04-14Documentation, x86: Intel Memory bandwidth allocationVikas Shivappa
Update the 'intel_rdt_ui' documentation to have Memory bandwidth(b/w) allocation interface usage. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-2-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-11Merge branch 'x86/boot' into x86/mm, to avoid conflictIngo Molnar
There's a conflict between ongoing level-5 paging support and the E820 rewrite. Since the E820 rewrite is essentially ready, merge it into x86/mm to reduce tree conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-10x86/intel_rdt: Add cpus_list rdtgroup fileJiri Olsa
The resource control filesystem provides only a bitmask based cpus file for assigning CPUs to a resource group. That's cumbersome with large cpumasks and non-intuitive when modifying the file from the command line. Range based cpu lists are commonly used along with bitmask based cpu files in various subsystems throughout the kernel. Add 'cpus_list' file which is CPU range based. # cd /sys/fs/resctrl/ # echo 1-10 > krava/cpus_list # cat krava/cpus_list 1-10 # cat krava/cpus 0007fe # cat cpus fffff9 # cat cpus_list 0,3-23 [ tglx: Massaged changelog and replaced "bitmask lists" by "CPU ranges" ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20170410145232.GF25354@krava Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-05x86/intel_rdt: Implement "update" mode when writing schemata fileTony Luck
The schemata file can have multiple lines and it is cumbersome to update all lines. Remove code that requires that the user provides values for every resource (in the right order). If the user provides values for just a few resources, update them and leave the rest unchanged. Side benefit: we now check which values were updated and only send IPIs to cpus that actually have updates. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: h.peter.anvin@intel.com Link: http://lkml.kernel.org/r/1491255857-17213-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-04x86/mm: Define virtual memory map for 5-level pagingKirill A. Shutemov
The first part of memory map (up to %esp fixup) simply scales existing map for 4-level paging by factor of 9 -- number of bits addressed by the additional page table level. The rest of the map is unchanged. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16x86/mm: Adapt MODULES_END based on fixmap section sizeThomas Garnier
This patch aligns MODULES_END to the beginning of the fixmap section. It optimizes the space available for both sections. The address is pre-computed based on the number of pages required by the fixmap section. It will allow GDT remapping in the fixmap section. The current MODULES_END static address does not provide enough space for the kernel to support a large number of processors. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-1-thgarnie@google.com [ Small build fix. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01Merge branch 'linus' into WIP.x86/boot, to fix up conflicts and to pick up ↵Ingo Molnar
updates Conflicts: arch/x86/xen/setup.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two documentation updates, plus a debugging annotation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/crash: Update the stale comment in reserve_crashkernel() x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers Documentation, x86, resctrl: Recommend locking for resctrlfs
2017-02-20Merge branches 'x86/cache', 'x86/debug' and 'x86/irq' into x86/urgentIngo Molnar
Pick up simple singular commits from their topic branches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07efi: Get and store the secure boot statusDavid Howells
Get the firmware's secure-boot status in the kernel boot wrapper and stash it somewhere that the main kernel image can find. The efi_get_secureboot() function is extracted from the ARM stub and (a) generalised so that it can be called from x86 and (b) made to use efi_call_runtime() so that it can be run in mixed-mode. For x86, it is stored in boot_params and can be overridden by the boot loader or kexec. This allows secure-boot mode to be passed on to a new kernel. Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1486380166-31868-5-git-send-email-ard.biesheuvel@linaro.org [ Small readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename everything to e820_tableIngo Molnar
No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename 'e820_map' variables to 'e820_array'Ingo Molnar
In line with the rename to 'struct e820_array', harmonize the naming of common e820 table variable names as well: e820 => e820_array e820_saved => e820_array_saved e820_map => e820_array initial_e820 => e820_array_init This makes the variable names more consistent and easier to grep for. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and ↵Ingo Molnar
'struct e820_array' The 'e820entry' and 'e820map' names have various annoyances: - the missing underscore departs from the usual kernel style and makes the code look weird, - in the past I kept confusing the 'map' with the 'entry', because a 'map' is ambiguous in that regard, - it's not really clear from the 'e820map' that this is a regular C array. Rename them to 'struct e820_entry' and 'struct e820_array' accordingly. ( Leave the legacy UAPI header alone but do the rename in the bootparam.h and e820/types.h file - outside tools relying on these defines should either adjust their code, or should use the legacy header, or should create their private copies for the definitions. ) No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-23Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "There's a number of fixes: - a round of fixes for CPUID-less legacy CPUs - a number of microcode loader fixes - i8042 detection robustization fixes - stack dump/unwinder fixes - x86 SoC platform driver fixes - a GCC 7 warning fix - virtualization related fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) Revert "x86/unwind: Detect bad stack return address" x86/paravirt: Mark unused patch_default label x86/microcode/AMD: Reload proper initrd start address x86/platform/intel/quark: Add printf attribute to imr_self_test_result() x86/platform/intel-mid: Switch MPU3050 driver to IIO x86/alternatives: Do not use sync_core() to serialize I$ x86/topology: Document cpu_llc_id x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic x86/asm: Rewrite sync_core() to use IRET-to-self x86/microcode/intel: Replace sync_core() with native_cpuid() Revert "x86/boot: Fail the boot if !M486 and CPUID is missing" x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6 x86/tools: Fix gcc-7 warning in relocs.c x86/unwind: Dump stack data on warnings x86/unwind: Adjust last frame check for aligned function stacks x86/init: Fix a couple of comment typos x86/init: Remove i8042_detect() from platform ops Input: i8042 - Trust firmware a bit more when probing on X86 x86/init: Add i8042 state to the platform data ...