summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/paravirt.h
AgeCommit message (Collapse)Author
2018-03-18x86/paravirt, objtool: Annotate indirect callsPeter Zijlstra
commit 3010a0663fd949d122eca0561b06b0a9453f7866 upstream. Paravirt emits indirect calls which get flagged by objtool retpoline checks, annotate it away because all these indirect calls will be patched out before we start userspace. This patching happens through alternative_instructions() -> apply_paravirt() -> pv_init_ops.patch() which will eventually end up in paravirt_patch_default(). This function _will_ write direct alternatives. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-03Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull low-level x86 updates from Ingo Molnar: "In this cycle this topic tree has become one of those 'super topics' that accumulated a lot of changes: - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on x86 - preceded by an array of changes. v4.8 saw preparatory changes in this area already - this is the rest of the work. Includes the thread stack caching performance optimization. (Andy Lutomirski) - switch_to() cleanups and all around enhancements. (Brian Gerst) - A large number of dumpstack infrastructure enhancements and an unwinder abstraction. The secret long term plan is safe(r) live patching plus maybe another attempt at debuginfo based unwinding - but all these current bits are standalone enhancements in a frame pointer based debug environment as well. (Josh Poimboeuf) - More __ro_after_init and const annotations. (Kees Cook) - Enable KASLR for the vmemmap memory region. (Thomas Garnier)" [ The virtually mapped stack changes are pretty fundamental, and not x86-specific per se, even if they are only used on x86 right now. ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) x86/asm: Get rid of __read_cr4_safe() thread_info: Use unsigned long for flags x86/alternatives: Add stack frame dependency to alternative_call_2() x86/dumpstack: Fix show_stack() task pointer regression x86/dumpstack: Remove dump_trace() and related callbacks x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder oprofile/x86: Convert x86_backtrace() to use the new unwinder x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder perf/x86: Convert perf_callchain_kernel() to use the new unwinder x86/unwind: Add new unwind interface and implementations x86/dumpstack: Remove NULL task pointer convention fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK lib/syscall: Pin the task stack in collect_syscall() x86/process: Pin the target stack in get_wchan() x86/dumpstack: Pin the target stack when dumping it kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function sched/core: Add try_get_task_stack() and put_task_stack() x86/entry/64: Fix a minor comment rebase error iommu/amd: Don't put completion-wait semaphore on stack ...
2016-09-30x86/asm: Get rid of __read_cr4_safe()Andy Lutomirski
We use __read_cr4() vs __read_cr4_safe() inconsistently. On CR4-less CPUs, all CR4 bits are effectively clear, so we can make the code simpler and more robust by making __read_cr4() always fix up faults on 32-bit kernels. This may fix some bugs on old 486-like CPUs, but I don't have any easy way to test that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: david@saggiorato.net Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30x86, locking/spinlocks: Remove ticket (spin)lock implementationPeter Zijlstra
We've unconditionally used the queued spinlock for many releases now. Its time to remove the old ticket lock code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Waiman.Long@hpe.com Cc: david.vrabel@citrix.com Cc: dhowells@redhat.com Cc: pbonzini@redhat.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20160518184302.GO3193@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-16Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The biggest changes in this cycle were: - prepare for more KASLR related changes, by restructuring, cleaning up and fixing the existing boot code. (Kees Cook, Baoquan He, Yinghai Lu) - simplifly/concentrate subarch handling code, eliminate paravirt_enabled() usage. (Luis R Rodriguez)" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) x86/KASLR: Clarify purpose of each get_random_long() x86/KASLR: Add virtual address choosing function x86/KASLR: Return earliest overlap when avoiding regions x86/KASLR: Add 'struct slot_area' to manage random_addr slots x86/boot: Add missing file header comments x86/KASLR: Initialize mapping_info every time x86/boot: Comment what finalize_identity_maps() does x86/KASLR: Build identity mappings on demand x86/boot: Split out kernel_ident_mapping_init() x86/boot: Clean up indenting for asm/boot.h x86/KASLR: Improve comments around the mem_avoid[] logic x86/boot: Simplify pointer casting in choose_random_location() x86/KASLR: Consolidate mem_avoid[] entries x86/boot: Clean up pointer casting x86/boot: Warn on future overlapping memcpy() use x86/boot: Extract error reporting functions x86/boot: Correctly bounds-check relocations x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size' x86/boot: Fix "run_size" calculation x86/boot: Calculate decompression size during boot not build ...
2016-04-22x86/paravirt: Remove paravirt_enabled()Luis R. Rodriguez
Now that all previous paravirt_enabled() uses were replaced with proper x86 semantics by the previous patches we can remove the unused paravirt_enabled() mechanism. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-15-git-send-email-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22x86/rtc: Replace paravirt rtc check with platform legacy quirkLuis R. Rodriguez
We have 4 types of x86 platforms that disable RTC: * Intel MID * Lguest - uses paravirt * Xen dom-U - uses paravirt * x86 on legacy systems annotated with an ACPI legacy flag We can consolidate all of these into a platform specific legacy quirk set early in boot through i386_start_kernel() and through x86_64_start_reservations(). This deals with the RTC quirks which we can rely on through the hardware subarch, the ACPI check can be dealt with separately. For Xen things are bit more complex given that the @X86_SUBARCH_XEN x86_hardware_subarch is shared on for Xen which uses the PV path for both domU and dom0. Since the semantics for differentiating between the two are Xen specific we provide a platform helper to help override default legacy features -- x86_platform.set_legacy_features(). Use of this helper is highly discouraged, its only purpose should be to account for the lack of semantics available within your given x86_hardware_subarch. As per 0-day, this bumps the vmlinux size using i386-tinyconfig as follows: TOTAL TEXT init.text x86_early_init_platform_quirks() +70 +62 +62 +43 Only 8 bytes overhead total, as the main increase in size is all removed via __init. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13x86/paravirt: Make "unsafe" MSR accesses unsafe even if PARAVIRT=yAndy Lutomirski
Enabling CONFIG_PARAVIRT had an unintended side effect: rdmsr() turned into rdmsr_safe() and wrmsr() turned into wrmsr_safe(), even on bare metal. Undo that by using the new unsafe paravirt MSR callbacks. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/414fabd6d3527703077c6c2a797223d0a9c3b081.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13x86/paravirt: Add paravirt_{read,write}_msr()Andy Lutomirski
This adds paravirt callbacks for unsafe MSR access. On native, they call native_{read,write}_msr(). On Xen, they use xen_{read,write}_msr_safe(). Nothing uses them yet for ease of bisection. The next patch will use them in rdmsrl(), wrmsrl(), etc. I intentionally didn't make them warn on #GP on Xen. I think that should be done separately by the Xen maintainers. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/880eebc5dcd2ad9f310d41345f82061ea500e9fa.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13x86/paravirt: Add _safe to the read_ms()r and write_msr() PV callbacksAndy Lutomirski
These callbacks match the _safe variants, so name them accordingly. This will make room for unsafe PV callbacks. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/9ee3fb6a196a514c93325bdfa15594beecf04876.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24x86/paravirt: Create a stack frame in PV_CALLEE_SAVE_REGS_THUNKJosh Poimboeuf
A function created with the PV_CALLEE_SAVE_REGS_THUNK macro doesn't set up a new stack frame before the call instruction, which breaks frame pointer convention if CONFIG_FRAME_POINTER is enabled and can result in a bad stack trace. Also, the thunk functions aren't annotated as ELF callable functions. Create a stack frame when CONFIG_FRAME_POINTER is enabled and add the ELF function type. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/a2cad74e87c4aba7fd0f54a1af312e66a824a575.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-11Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "The main changes in this cycle were: - code patching and cpu_has cleanups (Borislav Petkov) - paravirt cleanups (Juergen Gross) - TSC cleanup (Thomas Gleixner) - ptrace cleanup (Chen Gang)" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86/kernel/ptrace.c: Remove unused arg_offs_table x86/mm: Align macro defines x86/cpu: Provide a config option to disable static_cpu_has x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros x86/cpufeature: Cleanup get_cpu_cap() x86/cpufeature: Move some of the scattered feature bits to x86_capability x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer x86/paravirt: Remove unused pv_apic_ops structure x86/tsc: Remove unused tsc_pre_init() hook x86: Remove unused function cpu_has_ht_siblings() x86/paravirt: Kill some unused patching functions
2016-01-11Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this cycle were: - vDSO and asm entry improvements (Andy Lutomirski) - Xen paravirt entry enhancements (Boris Ostrovsky) - asm entry labels enhancement (Borislav Petkov) - and other misc changes (Thomas Gleixner, me)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks" x86/entry/64_compat: Make labels local x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog() x86/vdso: Enable vdso pvclock access on all vdso variants x86/vdso: Remove pvclock fixmap machinery x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader x86/kvm: On KVM re-enable (e.g. after suspend), update clocks x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots x86/asm: Add asm macros for static keys/jump labels x86/asm: Error out if asm/jump_label.h is included inappropriately context_tracking: Switch to new static_branch API x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op x86/paravirt: Remove the unused irq_enable_sysexit pv op x86/xen: Avoid fast syscall path for Xen PV guests
2015-12-19x86/paravirt: Prevent rtc_cmos platform device init on PV guestsDavid Vrabel
Adding the rtc platform device in non-privileged Xen PV guests causes an IRQ conflict because these guests do not have legacy PIC and may allocate irqs in the legacy range. In a single VCPU Xen PV guest we should have: /proc/interrupts: CPU0 0: 4934 xen-percpu-virq timer0 1: 0 xen-percpu-ipi spinlock0 2: 0 xen-percpu-ipi resched0 3: 0 xen-percpu-ipi callfunc0 4: 0 xen-percpu-virq debug0 5: 0 xen-percpu-ipi callfuncsingle0 6: 0 xen-percpu-ipi irqwork0 7: 321 xen-dyn-event xenbus 8: 90 xen-dyn-event hvc_console ... But hvc_console cannot get its interrupt because it is already in use by rtc0 and the console does not work. genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0) We can avoid this problem by realizing that unprivileged PV guests (both Xen and lguests) are not supposed to have rtc_cmos device and so adding it is not necessary. Privileged guests (i.e. Xen's dom0) do use it but they should not have irq conflicts since they allocate irqs above legacy range (above gsi_top, in fact). Instead of explicitly testing whether the guest is privileged we can extend pv_info structure to include information about guest's RTC support. Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: vkuznets@redhat.com Cc: xen-devel@lists.xenproject.org Cc: konrad.wilk@oracle.com Cc: stable@vger.kernel.org # 4.2+ Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-25x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_deferJuergen Gross
pte_update_defer can be removed as it is always set to the same function as pte_update. So any usage of pte_update_defer() can be replaced by pte_update(). pmd_update and pmd_update_defer are always set to paravirt_nop, so they can just be nuked. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: david.vrabel@citrix.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447771879-1806-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-23x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV opBoris Ostrovsky
As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", usergs_sysret32 pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-4-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23x86/paravirt: Remove the unused irq_enable_sysexit pv opBoris Ostrovsky
As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", the irq_enable_sysexit pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-19x86/paravirt: Remove unused pv_apic_ops structureJuergen Gross
The only member of that structure is startup_ipi_hook which is always set to paravirt_nop. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447767872-16730-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-23x86/asm/msr: Make wrmsrl() a functionAndy Lutomirski
As of cf991de2f614 ("x86/asm/msr: Make wrmsrl_safe() a function"), wrmsrl_safe is a function, but wrmsrl is still a macro. The wrmsrl macro performs invalid shifts if the value argument is 32 bits. This makes it unnecessarily awkward to write code that puts an unsigned long into an MSR. To make this work, syscall_init needs tweaking to stop passing a function pointer to wrmsrl. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Willy Tarreau <w@1wt.eu> Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooksAndy Lutomirski
We've had ->read_tsc() and ->read_tscp() paravirt hooks since the very beginning of paravirt, i.e., d3561b7fa0fb ("[PATCH] paravirt: header and stubs for paravirtualisation"). AFAICT, the only paravirt guest implementation that ever replaced these calls was vmware, and it's gone. Arguably even vmware shouldn't have hooked RDTSC -- we fully support systems that don't have a TSC at all, so there's no point for a paravirt implementation to pretend that we have a TSC but to replace it. I also doubt that these hooks actually worked. Calls to rdtscl() and rdtscll(), which respected the hooks, were used seemingly interchangeably with native_read_tsc(), which did not. Just remove them. If anyone ever needs them again, they can try to make a case for why they need them. Before, on a paravirt config: text data bss dec hex filename 12618257 1816384 1093632 15528273 ecf151 vmlinux After: text data bss dec hex filename 12617207 1816384 1093632 15527223 eced37 vmlinux Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKSIngo Molnar
Valentin Rothberg reported that we use CONFIG_QUEUED_SPINLOCKS in arch/x86/kernel/paravirt_patch_32.c, while the symbol is called CONFIG_QUEUED_SPINLOCK. (Note the extra 'S') But the typo was natural: the proper English term for such a generic object would be 'queued spinlocks' - so rename this and related symbols accordingly to the plural form. Reported-by: Valentin Rothberg <valentinrothberg@gmail.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08locking/pvqspinlock, x86: Implement the paravirt qspinlock call patchingPeter Zijlstra (Intel)
We use the regular paravirt call patching to switch between: native_queued_spin_lock_slowpath() __pv_queued_spin_lock_slowpath() native_queued_spin_unlock() __pv_queued_spin_unlock() We use a callee saved call for the unlock function which reduces the i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions again. We further optimize the unlock path by patching the direct call with a "movb $0,%arg1" if we are indeed using the native unlock code. This makes the unlock code almost as fast as the !PARAVIRT case. This significantly lowers the overhead of having CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-14x86: expose number of page table levels on Kconfig levelKirill A. Shutemov
We would want to use number of page table level to define mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-03x86/asm/entry: Drop now unused ENABLE_INTERRUPTS_SYSEXIT32Borislav Petkov
Commit: 4214a16b0297 ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER") removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the macro now too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04x86: Store a per-cpu shadow copy of CR4Andy Lutomirski
Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-19x86: Cleanly separate use of asm-generic/mm_hooks.hDave Hansen
asm-generic/mm_hooks.h provides some generic fillers for the 90% of architectures that do not need to hook some mmap-manipulation functions. A comment inside says: > Define generic no-op hooks for arch_dup_mmap and > arch_exit_mmap, to be included in asm-FOO/mmu_context.h > for any arch FOO which doesn't need to hook these. So, does x86 need to hook these? It depends on CONFIG_PARAVIRT. We *conditionally* include this generic header if we have CONFIG_PARAVIRT=n. That's madness. With this patch, x86 stops using asm-generic/mmu_hooks.h entirely. We use our own copies of the functions. The paravirt code provides some stubs if it is disabled, and we always call those stubs in our x86-private versions of arch_exit_mmap() and arch_dup_mmap(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-01-29x86, asmlinkage, paravirt: Make paravirt thunks globalAndi Kleen
The paravirt thunks use a hack of using a static reference to a static function to reference that function from the top level statement. This assumes that gcc always generates static function names in a specific format, which is not necessarily true. Simply make these functions global and asmlinkage or __visible. This way the static __used variables are not needed and everything works. Functions with arguments are __visible to keep the register calling convention on 32bit. Changed in paravirt and in all users (Xen and vsmp) v2: Use __visible for functions with arguments Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ido Yariv <ido@wizery.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09x86, ticketlock: Add slowpath logicJeremy Fitzhardinge
Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention). In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker: Unlocker Locker test for lock pickup -> fail unlock test slowpath -> false set slowpath flags block Whereas this works in any ordering: Unlocker Locker set slowpath flags test for lock pickup -> fail block unlock test slowpath -> true, kick If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag. The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that its safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock. Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code. Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't the generated code isn't too bad, but its definitely suboptimal. Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09x86, pvticketlock: Use callee-save for lock_spinningJeremy Fitzhardinge
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09x86, spinlock: Replace pv spinlocks with pv ticketlocksJeremy Fitzhardinge
Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow patch (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets), it looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away. Results: ======= setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM base = 3.11-rc patched = base + pvspinlock V12 +-----------------+----------------+--------+ dbench (Throughput in MB/sec. Higher is better) +-----------------+----------------+--------+ | base (stdev %)|patched(stdev%) | %gain | +-----------------+----------------+--------+ | 15035.3 (0.3) |15150.0 (0.6) | 0.8 | | 1470.0 (2.2) | 1713.7 (1.9) | 16.6 | | 848.6 (4.3) | 967.8 (4.3) | 14.0 | | 652.9 (3.5) | 685.3 (3.7) | 5.0 | +-----------------+----------------+--------+ pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases, and undercommits results are flat Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-30Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt update from Ingo Molnar: "Various paravirtualization related changes - the biggest one makes guest support optional via CONFIG_HYPERVISOR_GUEST" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, wakeup, sleep: Use pvops functions for changing GDT entries x86, xen, gdt: Remove the pvops variant of store_gdt. x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. x86: Make Linux guest support optional x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu
2013-04-11x86, xen, gdt: Remove the pvops variant of store_gdt.Konrad Rzeszutek Wilk
The two use-cases where we needed to store the GDT were during ACPI S3 suspend and resume. As the patches: x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed. have demonstrated - there are other mechanism by which the GDT is saved and reloaded during early resume path. Hence we do not need to worry about the pvops call-chain for saving the GDT and can and can eliminate it. The other areas where the store_gdt is used are never going to be hit when running under the pvops platforms. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-10x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metalBoris Ostrovsky
Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com Tested-by: Josh Boyer <jwboyer@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
2012-12-18x86, paravirt: fix build error when thp is disabledDavid Rientjes
With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks because set_pmd_at() is undeclared: mm/memory.c: In function 'do_pmd_numa_page': mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at' mm/mprotect.c: In function 'change_pmd_protnuma': mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at' This is because paravirt defines set_pmd_at() only when CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded. The fix is to define it for all CONFIG_PARAVIRT configurations. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-26Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Peter Anvin: "The big change here is the patchset by Alex Shi to use INVLPG to flush only the affected pages when we only need to flush a small page range. It also removes the special INVALIDATE_TLB_VECTOR interrupts (32 vectors!) and replace it with an ordinary IPI function call." Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next to changed line) * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix build warning and crash when building for !SMP x86/tlb: do flush_tlb_kernel_range by 'invlpg' x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR x86/tlb: enable tlb flush range support for x86 mm/mmu_gather: enable tlb flush range in generic mmu_gather x86/tlb: add tlb_flushall_shift knob into debugfs x86/tlb: add tlb_flushall_shift for specific CPU x86/tlb: fall back to flush all when meet a THP large page x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86/tlb_info: get last level TLB entry number of CPU x86: Add read_mostly declaration/definition to variables from smp.h x86: Define early read-mostly per-cpu macros
2012-07-05Merge branch 'x86/cpu' into perf/coreIngo Molnar
Merge this branch because we changed the wrmsr*_safe() API and there's a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-27x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_rangeAlex Shi
x86 has no flush_tlb_range support in instruction level. Currently the flush_tlb_range just implemented by flushing all page table. That is not the best solution for all scenarios. In fact, if we just use 'invlpg' to flush few lines from TLB, we can get the performance gain from later remain TLB lines accessing. But the 'invlpg' instruction costs much of time. Its execution time can compete with cr3 rewriting, and even a bit more on SNB CPU. So, on a 512 4KB TLB entries CPU, the balance points is at: (512 - X) * 100ns(assumed TLB refill cost) = X(TLB flush entries) * 100ns(assumed invlpg cost) Here, X is 256, that is 1/2 of 512 entries. But with the mysterious CPU pre-fetcher and page miss handler Unit, the assumed TLB refill cost is far lower then 100ns in sequential access. And 2 HT siblings in one core makes the memory access more faster if they are accessing the same memory. So, in the patch, I just do the change when the target entries is less than 1/16 of whole active tlb entries. Actually, I have no data support for the percentage '1/16', so any suggestions are welcomed. As to hugetlb, guess due to smaller page table, and smaller active TLB entries, I didn't see benefit via my benchmark, so no optimizing now. My micro benchmark show in ideal scenarios, the performance improves 70 percent in reading. And in worst scenario, the reading/writing performance is similar with unpatched 3.4-rc4 kernel. Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP 'always': multi thread testing, '-t' paramter is thread number: with patch unpatched 3.4-rc4 ./mprotect -t 1 14ns 24ns ./mprotect -t 2 13ns 22ns ./mprotect -t 4 12ns 19ns ./mprotect -t 8 14ns 16ns ./mprotect -t 16 28ns 26ns ./mprotect -t 32 54ns 51ns ./mprotect -t 128 200ns 199ns Single process with sequencial flushing and memory accessing: with patch unpatched 3.4-rc4 ./mprotect 7ns 11ns ./mprotect -p 4096 -l 8 -n 10240 21ns 21ns [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com has additional performance numbers. ] Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07x86, pvops: Remove hooks for {rd,wr}msr_safe_regsAndre Przywara
There were paravirt_ops hooks for the full register set variant of {rd,wr}msr_safe which are actually not used by anyone anymore. Remove them to make the code cleaner and avoid silent breakages when the pvops members were uninitialized. This has been boot-tested natively and under Xen with PVOPS enabled and disabled on one machine. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Link: http://lkml.kernel.org/r/1338562358-28182-2-git-send-email-bp@amd64.org Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-06x86: Add rdpmcl()Andi Kleen
Add a version of rdpmc() that directly reads into a u64 Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338944211-28275-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-19x86, paravirt: Replace GET_CR2_INTO_RCX with GET_CR2_INTO_RAXH. Peter Anvin
GET_CR2_INTO_RCX is asinine: it is only used in one place, the actual paravirt call returns the value in %rax, not %rcx; and the one place that wants it wants the result in %r9. We actually generate as a result of this call: call ... movq %rax, %rcx xorq %rax, %rax /* this value isn't even used... */ movq %rcx, %r9 At least make the macro do what the paravirt call does, which is put the value into %rax. Nevermind the fact that the macro clobbers all the volatile registers. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1334794610-5546-4-git-send-email-hpa@zytor.com Cc: Glauber de Oliveira Costa <glommer@parallels.com>
2012-03-24Merge tag 'bug-for-3.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/bug.h> cleanup from Paul Gortmaker: "The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: *** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/*.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414" Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul and linux-next. * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: kernel.h: doesn't explicitly use bug.h, so don't include it. bug: consolidate BUILD_BUG_ON with other bug code BUG: headers with BUG/BUG_ON etc. need linux/bug.h bug.h: add include of it to various implicit C users lib: fix implicit users of kernel.h for TAINT_WARN spinlock: macroize assert_spin_locked to avoid bug.h dependency x86: relocate get/set debugreg fcns to include/asm/debugreg.
2012-03-04BUG: headers with BUG/BUG_ON etc. need linux/bug.hPaul Gortmaker
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and ↵Ingo Molnar
static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-14KVM guest: Add a pv_ops stub for steal timeGlauber Costa
This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-26thp: fix PARAVIRT x86 32bit noPAEAndrea Arcangeli
This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64=n. The #ifdef that this patch removes was erratically introduced to fix a build error for noPAE (where pmd.pmd doesn't exist). So then the kernel built but it failed at runtime because set_pmd_at was a noop. This will correct it by enabling set_pmd_at for noPAE mode too. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: werner <w.landgraf@ru.ru> Reported-by: Minchan Kim <minchan.kim@gmail.com> Tested-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13thp: add pmd paravirt opsAndrea Arcangeli
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs pmd_update_defer), but this is to keep full simmetry with pte paravirt ops, which looks cleaner and simpler from a common code POV. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-27x86, paravirt: Use native_halt on a halt, not native_safe_haltCliff Wickman
halt() should use native_halt() safe_halt() uses native_safe_halt() If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as static inline void halt(void) { PVOP_VCALL0(pv_irq_ops.safe_halt); } Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is static inline void halt(void) { native_halt(); } So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt() for a halt() is an oversight. Am I missing something? It probably hasn't shown up as a problem because the local apic is disabled on a shutdown or restart. But if we disable interrupts and call halt() we shouldn't expect that the halt() will re-enable interrupts. Signed-off-by: Cliff Wickman <cpw@sgi.com> LKML-Reference: <E1PSBcz-0001g1-FM@eag09.americas.sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10tracing: Force arch_local_irq_* notrace for paravirtSteven Rostedt
When running ktest.pl randconfig tests, I would sometimes trigger a lockdep annotation bug (possible reason: unannotated irqs-on). This triggering happened right after function tracer self test was executed. After doing a config bisect I found that this was caused with having function tracer, paravirt guest, prove locking, and rcu torture all enabled. The rcu torture just enhanced the likelyhood of triggering the bug. Prove locking was needed, since it was the thing that was bugging. Function tracer would trace and disable interrupts in all sorts of funny places. paravirt guest would turn arch_local_irq_* into functions that would be traced. Besides the fact that tracing arch_local_irq_* is just a bad idea, this is what is happening. The bug happened simply in the local_irq_restore() code: if (raw_irqs_disabled_flags(flags)) { \ raw_local_irq_restore(flags); \ trace_hardirqs_off(); \ } else { \ trace_hardirqs_on(); \ raw_local_irq_restore(flags); \ } \ The raw_local_irq_restore() was defined as arch_local_irq_restore(). Now imagine, we are about to enable interrupts. We go into the else case and call trace_hardirqs_on() which tells lockdep that we are enabling interrupts, so it sets the current->hardirqs_enabled = 1. Then we call raw_local_irq_restore() which calls arch_local_irq_restore() which gets traced! Now in the function tracer we disable interrupts with local_irq_save(). This is fine, but flags is stored that we have interrupts disabled. When the function tracer calls local_irq_restore() it does it, but this time with flags set as disabled, so we go into the if () path. This keeps interrupts disabled and calls trace_hardirqs_off() which sets current->hardirqs_enabled = 0. When the tracer is finished and proceeds with the original code, we enable interrupts but leave current->hardirqs_enabled as 0. Which now breaks lockdeps internal processing. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflagsLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags: Fix IRQ flag handling naming MIPS: Add missing #inclusions of <linux/irq.h> smc91x: Add missing #inclusion of <linux/irq.h> Drop a couple of unnecessary asm/system.h inclusions SH: Add missing consts to sys_execve() declaration Blackfin: Rename IRQ flags handling functions Blackfin: Add missing dep to asm/irqflags.h Blackfin: Rename DES PC2() symbol to avoid collision Blackfin: Split the BF532 BFIN_*_FIO_FLAG() functions to their own header Blackfin: Split PLL code from mach-specific cdef headers
2010-10-07Fix IRQ flag handling namingDavid Howells
Fix the IRQ flag handling naming. In linux/irqflags.h under one configuration, it maps: local_irq_enable() -> raw_local_irq_enable() local_irq_disable() -> raw_local_irq_disable() local_irq_save() -> raw_local_irq_save() ... and under the other configuration, it maps: raw_local_irq_enable() -> local_irq_enable() raw_local_irq_disable() -> local_irq_disable() raw_local_irq_save() -> local_irq_save() ... This is quite confusing. There should be one set of names expected of the arch, and this should be wrapped to give another set of names that are expected by users of this facility. Change this to have the arch provide: flags = arch_local_save_flags() flags = arch_local_irq_save() arch_local_irq_restore(flags) arch_local_irq_disable() arch_local_irq_enable() arch_irqs_disabled_flags(flags) arch_irqs_disabled() arch_safe_halt() Then linux/irqflags.h wraps these to provide: raw_local_save_flags(flags) raw_local_irq_save(flags) raw_local_irq_restore(flags) raw_local_irq_disable() raw_local_irq_enable() raw_irqs_disabled_flags(flags) raw_irqs_disabled() raw_safe_halt() with type checking on the flags 'arguments', and then wraps those to provide: local_save_flags(flags) local_irq_save(flags) local_irq_restore(flags) local_irq_disable() local_irq_enable() irqs_disabled_flags(flags) irqs_disabled() safe_halt() with tracing included if enabled. The arch functions can now all be inline functions rather than some of them having to be macros. Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300] Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile] Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze] Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM] Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR] Acked-by: Tony Luck <tony.luck@intel.com> [IA-64] Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R] Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU] Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS] Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC] Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC] Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390] Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score] Acked-by: Matt Fleming <matt@console-pimps.org> [SH] Acked-by: David S. Miller <davem@davemloft.net> [Sparc] Acked-by: Chris Zankel <chris@zankel.net> [Xtensa] Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha] Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300] Cc: starvik@axis.com [CRIS] Cc: jesper.nilsson@axis.com [CRIS] Cc: linux-cris-kernel@axis.com