summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/svm.c
AgeCommit message (Collapse)Author
2014-11-14kvm: x86: don't kill guest on unknown exit reasonMichael S. Tsirkin
commit 2bc19dc3754fc066c43799659f0d848631c44cfe upstream. KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was triggered by a priveledged application. Let's not kill the guest: WARN and inject #UD instead. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14KVM: x86: Check non-canonical addresses upon WRMSRNadav Amit
commit 854e8bb1aa06c578c2c9145fa6bfe3680ef63b23 upstream. Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is written to certain MSRs. The behavior is "almost" identical for AMD and Intel (ignoring MSRs that are not implemented in either architecture since they would anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if non-canonical address is written on Intel but not on AMD (which ignores the top 32-bits). Accordingly, this patch injects a #GP on the MSRs which behave identically on Intel and AMD. To eliminate the differences between the architecutres, the value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to canonical value before writing instead of injecting a #GP. Some references from Intel and AMD manuals: According to Intel SDM description of WRMSR instruction #GP is expected on WRMSR "If the source register contains a non-canonical address and ECX specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP." According to AMD manual instruction manual: LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical form, a general-protection exception (#GP) occurs." IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the base field must be in canonical form or a #GP fault will occur." IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must be in canonical form." This patch fixes CVE-2014-3610. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-11KVM: x86: return all bits from get_interrupt_shadowPaolo Bonzini
For the next patch we will need to know the full state of the interrupt shadow; we will then set KVM_REQ_EVENT when one bit is cleared. However, right now get_interrupt_shadow only returns the one corresponding to the emulated instruction, or an unconditional 0 if the emulated instruction does not have an interrupt shadow. This is confusing and does not allow us to check for cleared bits as mentioned above. Clean the callback up, and modify toggle_interruptibility to match the comment above the call. As a small result, the call to set_interrupt_shadow will be skipped in the common case where int_shadow == 0 && mask == 0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11KVM: Synthesize G bit for all segments.Jim Mattson
We have noticed that qemu-kvm hangs early in the BIOS when runnning nested under some versions of VMware ESXi. The problem we believe is because KVM assumes that the platform preserves the 'G' but for any segment register. The SVM specification itemizes the segment attribute bits that are observed by the CPU, but the (G)ranularity bit is not one of the bits itemized, for any segment. Though current AMD CPUs keep track of the (G)ranularity bit for all segment registers other than CS, the specification does not require it. VMware's virtual CPU may not track the (G)ranularity bit for any segment register. Since kvm already synthesizes the (G)ranularity bit for the CS segment. It should do so for all segments. The patch below does that, and helps get rid of the hangs. Patch applies on top of Linus' tree. Signed-off-by: Jim Mattson <jmattson@vmware.com> Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09KVM: nSVM: Set correct port for IOIO interception evaluationJan Kiszka
Obtaining the port number from DX is bogus as a) there are immediate port accesses and b) user space may have changed the register content while processing the PIO access. Forward the correct value from the instruction emulator instead. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09KVM: nSVM: Fix IOIO size reported on emulationJan Kiszka
The access size of an in/ins is reported in dst_bytes, and that of out/outs in src_bytes. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09KVM: nSVM: Fix IOIO bitmap evaluationJan Kiszka
First, kvm_read_guest returns 0 on success. And then we need to take the access size into account when testing the bitmap: intercept if any of bits corresponding to the access is set. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09KVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1Jan Kiszka
CLTS only changes TS which is not monitored by selected CR0 interception. So skip any attempt to translate WRITE_CR0 to CR0_SEL_WRITE for this instruction. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-30KVM: SVM: Fix CPL export via SS.DPLJan Kiszka
We import the CPL via SS.DPL since ae9fedc793. However, we fail to export it this way so far. This caused spurious guest crashes, e.g. of Linux when accessing the vmport from guest user space which triggered register saving/restoring to/from host user space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: get CPL from SS.DPLPaolo Bonzini
CS.RPL is not equal to the CPL in the few instructions between setting CR0.PE and reloading CS. And CS.DPL is also not equal to the CPL for conforming code segments. However, SS.DPL *is* always equal to the CPL except for the weird case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the value in the STAR MSR, but force CPL=3 (Intel instead forces SS.DPL=SS.RPL=CPL=3). So this patch: - modifies SVM to update the CPL from SS.DPL rather than CS.RPL; the above case with SYSRET is not broken further, and the way to fix it would be to pass the CPL to userspace and back - modifies VMX to always return the CPL from SS.DPL (except forcing it to 0 if we are emulating real mode via vm86 mode; in vm86 mode all DPLs have to be 3, but real mode does allow privileged instructions). It also removes the CPL cache, which becomes a duplicate of the SS access rights cache. This fixes doing KVM_IOCTL_SET_SREGS exactly after setting CR0.PE=1 but before CS has been reloaded. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-08kvm: x86: emulate monitor and mwait instructions as nopGabriel L. Somlo
Treat monitor and mwait instructions as nop, which is architecturally correct (but inefficient) behavior. We do this to prevent misbehaving guests (e.g. OS X <= 10.7) from crashing after they fail to check for monitor/mwait availability via cpuid. Since mwait-based idle loops relying on these nop-emulated instructions would keep the host CPU pegged at 100%, do NOT advertise their presence via cpuid, to prevent compliant guests from using them inadvertently. Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-02Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "PPC and ARM do not have much going on this time. Most of the cool stuff, instead, is in s390 and (after a few releases) x86. ARM has some caching fixes and PPC has transactional memory support in guests. MIPS has some fixes, with more probably coming in 3.16 as QEMU will soon get support for MIPS KVM. For x86 there are optimizations for debug registers, which trigger on some Windows games, and other important fixes for Windows guests. We now expose to the guest Broadwell instruction set extensions and also Intel MPX. There's also a fix/workaround for OS X guests, nested virtualization features (preemption timer), and a couple kvmclock refinements. For s390, the main news is asynchronous page faults, together with improvements to IRQs (floating irqs and adapter irqs) that speed up virtio devices" * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits) KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode KVM: PPC: Book3S HV: Return ENODEV error rather than EIO KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state KVM: PPC: Book3S HV: Add transactional memory support KVM: Specify byte order for KVM_EXIT_MMIO KVM: vmx: fix MPX detection KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write KVM: s390: clear local interrupts at cpu initial reset KVM: s390: Fix possible memory leak in SIGP functions KVM: s390: fix calculation of idle_mask array size KVM: s390: randomize sca address KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP KVM: Bump KVM_MAX_IRQ_ROUTES for s390 KVM: s390: irq routing for adapter interrupts. KVM: s390: adapter interrupt sources ...
2014-03-17KVM: x86: handle missing MPX in nested virtualizationPaolo Bonzini
When doing nested virtualization, we may be able to read BNDCFGS but still not be allowed to write to GUEST_BNDCFGS in the VMCS. Guard writes to the field with vmx_mpx_supported(), and similarly hide the MSR from userspace if the processor does not support the field. We could work around this with the generic MSR save/load machinery, but there is only a limited number of MSR save/load slots and it is not really worthwhile to waste one for a scenario that should not happen except in the nested virtualization case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-12KVM: SVM: fix cr8 intercept windowRadim Krčmář
We always disable cr8 intercept in its handler, but only re-enable it if handling KVM_REQ_EVENT, so there can be a window where we do not intercept cr8 writes, which allows an interrupt to disrupt a higher priority task. Fix this by disabling intercepts in the same function that re-enables them when needed. This fixes BSOD in Windows 2008. Cc: <stable@vger.kernel.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: svm: Allow the guest to run with dirty debug registersPaolo Bonzini
When not running in guest-debug mode (i.e. the guest controls the debug registers, having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we can let it write freely to the debug registers and reload them on the next exit. We still need to exit on the first access, so that the KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further accesses to the debug registers will not cause a vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: svm: set/clear all DR intercepts in one swoopPaolo Bonzini
Unlike other intercepts, debug register intercepts will be modified in hot paths if the guest OS is bad or otherwise gets tricked into doing so. Avoid calling recalc_intercepts 16 times for debug registers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: x86: Remove return code from enable_irq/nmi_windowJan Kiszka
It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, thus the need to return an error code from both enable_irq_window and enable_nmi_window. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-18KVM: SVM: fix NMI window after iretRadim Krčmář
We should open NMI window right after an iret, but SVM exits before it. We wanted to single step using the trap flag and then open it. (or we could emulate the iret instead) We don't do it since commit 3842d135ff2 (likely), because the iret exit handler does not request an event, so NMI window remains closed until the next exit. Fix this by making KVM_REQ_EVENT request in the iret handler. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: SVM: Fix reading of DR6Jan Kiszka
In contrast to VMX, SVM dose not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of 020df0794f. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-03KVM: mmu: change useless int return types to voidPaolo Bonzini
kvm_mmu initialization is mostly filling in function pointers, there is no way for it to fail. Clean up unused return values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-27kvm: Add a tracepoint write_tsc_offsetYoshihiro YUNOMAE
Add a tracepoint write_tsc_offset for tracing TSC offset change. We want to merge ftrace's trace data of guest OSs and the host OS using TSC for timestamp in chronological order. We need "TSC offset" values for each guest when merge those because the TSC value on a guest is always the host TSC plus guest's TSC offset. If we get the TSC offset values, we can calculate the host TSC value for each guest events from the TSC offset and the event TSC value. The host TSC values of the guest events are used when we want to merge trace data of guests and the host in chronological order. (Note: the trace_clock of both the host and the guest must be set x86-tsc in this case) This tracepoint also records vcpu_id which can be used to merge trace data for SMP guests. A merge tool will read TSC offset for each vcpu, then the tool converts guest TSC values to host TSC values for each vcpu. TSC offset is stored in the VMCS by vmx_write_tsc_offset() or vmx_adjust_tsc_offset(). KVM executes the former function when a guest boots. The latter function is executed when kvm clock is updated. Only host can read TSC offset value from VMCS, so a host needs to output TSC offset value when TSC offset is changed. Since the TSC offset is not often changed, it could be overwritten by other frequent events while tracing. To avoid that, I recommend to use a special instance for getting this event: 1. set a instance before booting a guest # cd /sys/kernel/debug/tracing/instances # mkdir tsc_offset # cd tsc_offset # echo x86-tsc > trace_clock # echo 1 > events/kvm/kvm_write_tsc_offset/enable 2. boot a guest Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-05Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
2013-05-02KVM: x86: Account for failing enable_irq_window for NMI window requestJan Kiszka
With VMX, enable_irq_window can now return -EBUSY, in which case an immediate exit shall be requested before entering the guest. Account for this also in enable_nmi_window which uses enable_irq_window in absence of vnmi support, e.g. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-28KVM: x86: Rework request for immediate exitJan Kiszka
The VMX implementation of enable_irq_window raised KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This caused infinite loops on vmentry. Fix it by letting enable_irq_window signal the need for an immediate exit via its return value and drop KVM_REQ_IMMEDIATE_EXIT. This issue only affects nested VMX scenarios. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-28kvm, svm: Fix typo in printk messageBorislav Petkov
It is "exit_int_info". It is actually EXITINTINFO in the official docs but we don't like screaming docs. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-16KVM: VMX: Add the deliver posted interrupt algorithmYang Zhang
Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: VMX: Enable acknowledge interupt on vmexitYang Zhang
The "acknowledge interrupt on exit" feature controls processor behavior for external interrupt acknowledgement. When this control is set, the processor acknowledges the interrupt controller to acquire the interrupt vector on VM exit. After enabling this feature, an interrupt which arrived when target cpu is running in vmx non-root mode will be handled by vmx handler instead of handler in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt table to let real handler to handle it. Further, we will recognize the interrupt and only delivery the interrupt which not belong to current vcpu through idt table. The interrupt which belonged to current vcpu will be handled inside vmx handler. This will reduce the interrupt handle cost of KVM. Also, interrupt enable logic is changed if this feature is turnning on: Before this patch, hypervior call local_irq_enable() to enable it directly. Now IF bit is set on interrupt stack frame, and will be enabled on a return from interrupt handler if exterrupt interrupt exists. If no external interrupt, still call local_irq_enable() to enable it. Refer to Intel SDM volum 3, chapter 33.2. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-02x86, cpu: Convert AMD Erratum 383Borislav Petkov
Convert the AMD erratum 383 testing code to the bug infrastructure. This allows keeping the AMD-specific erratum testing machinery private to amd.c and not export symbols to modules needlessly. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1363788448-31325-6-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-03-20KVM: x86: correctly initialize the CS base on resetPaolo Bonzini
The CS base was initialized to 0 on VMX (wrong, but usually overridden by userspace before starting) or 0xf0000 on SVM. The correct value is 0xffff0000, and VMX is able to emulate it now, so use it. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-13KVM: x86: Rework INIT and SIPI handlingJan Kiszka
A VCPU sending INIT or SIPI to some other VCPU races for setting the remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED was overwritten by kvm_emulate_halt and, thus, got lost. This introduces APIC events for those two signals, keeping them in kvm_apic until kvm_apic_accept_events is run over the target vcpu context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there are pending events, thus if vcpu blocking should end. The patch comes with the side effect of effectively obsoleting KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI. The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED state. That also means we no longer exit to user space after receiving a SIPI event. Furthermore, we already reset the VCPU on INIT, only fixing up the code segment later on when SIPI arrives. Moreover, we fix INIT handling for the BSP: it never enter wait-for-SIPI but directly starts over on INIT. Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-03-12KVM: x86: Drop unused return code from VCPU reset callbackJan Kiszka
Neither vmx nor svm nor the common part may generate an error on kvm_vcpu_reset. So drop the return code. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29x86, apicv: add virtual interrupt delivery supportYang Zhang
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. This needs some special awareness into existing interrupr injection path: - for pending interrupt, instead of direct injection, we may need update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in above update action, since hardware will decide when to inject it at right time. Current has_interrupt and get_interrupt only returns a valid vector from injection p.o.v. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29x86, apicv: add virtual x2apic supportYang Zhang
basically to benefit from apicv, we need to enable virtualized x2apic mode. Currently, we only enable it when guest is really using x2apic. Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic: 0x800 - 0x8ff: no read intercept for apicv register virtualization, except APIC ID and TMCCT which need software's assistance to get right value. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05KVM: x86: Make register state after reset conform to specificationJulian Stecklina
VMX behaves now as SVM wrt to FPU initialization. Code has been moved to generic code path. General-purpose registers are now cleared on reset and INIT. SVM code properly initializes EDX. Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-11-30KVM: x86: Emulate IA32_TSC_ADJUST MSRWill Auld
CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-30KVM: x86: Add code to track call origin for msr assignmentWill Auld
In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initializationMarcelo Tosatti
TSC initialization will soon make use of online_vcpus. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27KVM: x86: pass host_tsc to read_l1_tscMarcelo Tosatti
Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-22KVM: SVM: Cleanup error statementsBorislav Petkov
Use __func__ instead of the function name in svm_hardware_enable since those things tend to get out of sync. This also slims down printk line length in conjunction with using pr_err. No functionality change. Cc: Joerg Roedel <joro@8bytes.org> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-23KVM: x86: Fix guest debug across vcpu INIT resetJan Kiszka
If we reset a vcpu on INIT, we so far overwrote dr7 as provided by KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally. Fix this by saving the dr7 used for guest debugging and calculating the effective register value as well as switch_db_regs on any potential change. This will change to focus of the set_guest_debug vendor op to update_dp_bp_intercept. Found while trying to stop on start_secondary. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-17KVM: SVM: Make use of asm.hAvi Kivity
Use macros for bitness-insensitive register names, instead of rolling our own. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-05KVM: SVM: constify lookup tablesMathias Krause
We never modify direct_access_msrs[], msrpm_ranges[], svm_exit_handlers[] or x86_intercept_map[] at runtime. Mark them r/o. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: do not release the error pageXiao Guangrong
After commit a2766325cf9f9, the error page is replaced by the error code, it need not be released anymore [ The patch has been compiling tested for powerpc ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-20KVM: SVM: Fix typosGuo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-12KVM: VMX: Implement PCID/INVPCID for guests with EPTMao, Junjie
This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details. For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-06KVM: Cleanup the kvm_print functions and introduce pr_XX wrappersChristoffer Dall
Introduces a couple of print functions, which are essentially wrappers around standard printk functions, with a KVM: prefix. Functions introduced or modified are: - kvm_err(fmt, ...) - kvm_info(fmt, ...) - kvm_debug(fmt, ...) - kvm_pr_unimpl(fmt, ...) - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...) Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-16KVM: SVM: Auto-load on CPUs with SVMJosh Triplett
Enable x86 feature-based autoloading for the kvm-amd module on CPUs with X86_FEATURE_SVM. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-08KVM: SVM: count all irq windows exitJason Wang
Also count the exits of fast-path. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-28Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...
2012-03-08KVM: x86 emulator: Allow PM/VM86 switch during task switchKevin Wolf
Task switches can switch between Protected Mode and VM86. The current mode must be updated during the task switch emulation so that the new segment selectors are interpreted correctly. In order to let privilege checks succeed, rflags needs to be updated in the vcpu struct as this causes a CPL update. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>