summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/vmx.c
AgeCommit message (Collapse)Author
2014-08-05KVM: nVMX: Fix nested vmexit ack intr before load vmcs01Wanpeng Li
An external interrupt will cause a vmexit with reason "external interrupt" when L2 is running. L1 will pick up the interrupt through vmcs12 if L1 set the ack interrupt bit. Commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to) retrieves the interrupt that belongs to L1 before vmcs01 is loaded. This will lead to problems in the next patch, which would write to SVI of vmcs02 instead of vmcs01 (SVI of vmcs02 doesn't make sense because L2 runs without APICv). Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Liu, RongrongX <rongrongx.liu@intel.com> Tested-by: Felipe Reyes <freyes@suse.com> Fixes: 77b0f5d67ff2781f36831cba79674c3e97bd7acf Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [Move tracepoint as well. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-30KVM: vmx: remove duplicate vmx_mpx_supported() prototypeChris J Arges
Remove a prototype which was added by both 93c4adc7afe and 36be0b9deb2. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-24Replace NR_VMX_MSR with its definitionPaolo Bonzini
Using ARRAY_SIZE directly makes it easier to read the code. While touching the code, replace the division by a multiplication in the recently added BUILD_BUG_ON. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-24KVM: x86: Assertions to check no overrun in MSR listsNadav Amit
Currently there is no check whether shared MSRs list overrun the allocated size which can results in bugs. In addition there is no check that vmx->guest_msrs has sufficient space to accommodate all the VMX msrs. This patch adds the assertions. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21KVM: x86: DR6/7.RTM cannot be writtenNadav Amit
Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the current KVM implementation. This bug is apparent only if the MOV-DR instruction is emulated or the host also debugs the guest. This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and set respectively. It also sets DR6.RTM upon every debug exception. Obviously, it is not a complete fix, as debugging of RTM is still unsupported. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21KVM: nVMX: clean up nested_release_vmcs12 and code around itPaolo Bonzini
Make nested_release_vmcs12 idempotent. Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21KVM: nVMX: fix lifetime issues for vmcs02Paolo Bonzini
free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in order to detach it from the shadow vmcs. However, this is not available anymore after commit 26a865f4aa8e (KVM: VMX: fix use after free of vmx->loaded_vmcs, 2014-01-03). Revert that patch, and fix its problem by forcing a vmcs01 as the active VMCS before freeing all the nested VMX state. Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-17KVM: nVMX: Fix virtual interrupt delivery injectionWanpeng Li
This patch fix bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331, after the patch http://www.spinics.net/lists/kvm/msg105230.html applied, there is some progress and the L2 can boot up, however, slowly. The original idea of this fix vid injection patch is from "Zhang, Yang Z" <yang.z.zhang@intel.com>. Interrupt which delivered by vid should be injected to L1 by L0 if current is in L1, or should be injected to L2 by L0 through the old injection way if L1 doesn't have set External-interrupt exiting bit. The current logic doen't consider these cases. This patch fix it by vid intr to L1 if current is L1 or L2 through old injection way if L1 doen't have External-interrupt exiting bit set. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: "Zhang, Yang Z" <yang.z.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11KVM: x86: return all bits from get_interrupt_shadowPaolo Bonzini
For the next patch we will need to know the full state of the interrupt shadow; we will then set KVM_REQ_EVENT when one bit is cleared. However, right now get_interrupt_shadow only returns the one corresponding to the emulated instruction, or an unconditional 0 if the emulated instruction does not have an interrupt shadow. This is confusing and does not allow us to check for cleared bits as mentioned above. Clean the callback up, and modify toggle_interruptibility to match the comment above the call. As a small result, the call to set_interrupt_shadow will be skipped in the common case where int_shadow == 0 && mask == 0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11KVM: vmx: speed up emulation of invalid guest statePaolo Bonzini
About 25% of the time spent in emulation of invalid guest state is wasted in checking whether emulation is required for the next instruction. However, this almost never changes except when a segment register (or TR or LDTR) changes, or when there is a mode transition (i.e. CR0 changes). In fact, vmx_set_segment and vmx_set_cr0 already modify vmx->emulation_required (except that the former for some reason uses |= instead of just an assignment). So there is no need to call guest_state_valid in the emulation loop. Emulation performance test results indicate 1650-2600 cycles for common instructions, versus 2300-3200 before this patch on a Sandy Bridge Xeon. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: vmx: vmx instructions handling does not consider cs.lNadav Amit
VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in 64-bit mode. The current implementation is broken since it does not use the register operands correctly, and always uses 64-bit for reads and writes. Moreover, write to memory in vmwrite only considers long-mode, so it ignores cs.l. This patch fixes this behavior. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: vmx: handle_cr ignores 32/64-bit modeNadav Amit
On 32-bit mode only bits [31:0] of the CR should be used for setting the CR value. Otherwise, the host may incorrectly assume the value is invalid if bits [63:32] are not zero. Moreover, the CR is currently being read twice when CR8 is used. Last, nested mov-cr exiting is modified to handle the CR value correctly as well. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: x86: check DR6/7 high-bits are clear only on long-modeNadav Amit
When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and otherwise injects a #GP exception. This exception should only be injected only if running in long-mode. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: nVMX: Fix returned value of MSR_IA32_VMX_VMCS_ENUMJan Kiszka
Many real CPUs get this wrong as well, but ours is totally off: bits 9:1 define the highest index value. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: nVMX: Allow to disable VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLSJan Kiszka
Allow L1 to "leak" its debug controls into L2, i.e. permit cleared VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS. This requires to manually transfer the state of DR7 and IA32_DEBUGCTLMSR from L1 into L2 as both run on different VMCS. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLSJan Kiszka
SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: nVMX: Allow to disable CR3 access interceptionJan Kiszka
We already have this control enabled by exposing a broken MSR_IA32_VMX_PROCBASED_CTLS value. This will properly advertise our capability once the value is fixed by clearing the right bits in MSR_IA32_VMX_TRUE_PROCBASED_CTLS. We also have to ensure to test the right value on L2 entry. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19KVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLSJan Kiszka
We already implemented them but failed to advertise them. Currently they all return the identical values to the capability MSRs they are augmenting. So there is no change in exposed features yet. Drop related comments at this chance that are partially incorrect and redundant anyway. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19arch/x86/kvm/vmx.c: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZEFabian Frederick
use mm.h definition Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into nextLinus Torvalds
Pull KVM updates from Paolo Bonzini: "At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits) KVM: add missing cleanup_srcu_struct KVM: PPC: Book3S PR: Rework SLB switching code KVM: PPC: Book3S PR: Use SLB entry 0 KVM: PPC: Book3S HV: Fix machine check delivery to guest KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs KVM: PPC: Book3S HV: Make sure we don't miss dirty pages KVM: PPC: Book3S HV: Fix dirty map for hugepages KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number KVM: PPC: Book3S: Add ONE_REG register names that were missed KVM: PPC: Add CAP to indicate hcall fixes KVM: PPC: MPIC: Reset IRQ source private members KVM: PPC: Graciously fail broken LE hypercalls PPC: ePAPR: Fix hypercall on LE guest KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler KVM: PPC: BOOK3S: Always use the saved DAR value PPC: KVM: Make NX bit available with magic page KVM: PPC: Disable NX for old magic page using guests KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest ...
2014-05-22KVM: vmx: DR7 masking on task switch emulation is wrongNadav Amit
The DR7 masking which is done on task switch emulation should be in hex format (clearing the local breakpoints enable bits 0,2,4 and 6). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22KVM: x86: get CPL from SS.DPLPaolo Bonzini
CS.RPL is not equal to the CPL in the few instructions between setting CR0.PE and reloading CS. And CS.DPL is also not equal to the CPL for conforming code segments. However, SS.DPL *is* always equal to the CPL except for the weird case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the value in the STAR MSR, but force CPL=3 (Intel instead forces SS.DPL=SS.RPL=CPL=3). So this patch: - modifies SVM to update the CPL from SS.DPL rather than CS.RPL; the above case with SYSRET is not broken further, and the way to fix it would be to pass the CPL to userspace and back - modifies VMX to always return the CPL from SS.DPL (except forcing it to 0 if we are emulating real mode via vm86 mode; in vm86 mode all DPLs have to be 3, but real mode does allow privileged instructions). It also removes the CPL cache, which becomes a duplicate of the SS access rights cache. This fixes doing KVM_IOCTL_SET_SREGS exactly after setting CR0.PE=1 but before CS has been reloaded. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-08kvm: x86: emulate monitor and mwait instructions as nopGabriel L. Somlo
Treat monitor and mwait instructions as nop, which is architecturally correct (but inefficient) behavior. We do this to prevent misbehaving guests (e.g. OS X <= 10.7) from crashing after they fail to check for monitor/mwait availability via cpuid. Since mwait-based idle loops relying on these nop-emulated instructions would keep the host CPU pegged at 100%, do NOT advertise their presence via cpuid, to prevent compliant guests from using them inadvertently. Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07KVM: vmx: handle_dr does not handle RSP correctlyNadav Amit
The RSP register is not automatically cached, causing mov DR instruction with RSP to fail. Instead the regular register accessing interface should be used. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07KVM: vmx: disable APIC virtualization in nested guestsPaolo Bonzini
While running a nested guest, we should disable APIC virtualization controls (virtualized APIC register accesses, virtual interrupt delivery and posted interrupts), because we do not expose them to the nested guest. Reported-by: Hu Yaohui <loki2441@gmail.com> Suggested-by: Abel Gordon <abel@stratoscale.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptrBandan Das
Some checks are common to all, and moreover, according to the spec, the check for whether any bits beyond the physical address width are set are also applicable to all of them Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: fail on invalid vmclear/vmptrld pointerBandan Das
The spec mandates that if the vmptrld or vmclear address is equal to the vmxon region pointer, the instruction should fail with error "VMPTRLD with VMXON pointer" or "VMCLEAR with VMXON pointer" Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: additional checks on vmxon regionBandan Das
Currently, the vmxon region isn't used in the nested case. However, according to the spec, the vmxon instruction performs additional sanity checks on this region and the associated pointer. Modify emulated vmxon to better adhere to the spec requirements Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06KVM: nVMX: rearrange get_vmx_mem_addressBandan Das
Our common function for vmptr checks (in 2/4) needs to fetch the memory address Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-28KVM: x86: Check for host supported fields in shadow vmcsBandan Das
We track shadow vmcs fields through two static lists, one for read only and another for r/w fields. However, with addition of new vmcs fields, not all fields may be supported on all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on unsupported hosts will result in a vmwrite error. For example, commit 36be0b9deb23161 introduced GUEST_BNDCFGS, which is not supported by all processors. Filter out host unsupported fields before letting guests use shadow vmcs Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-22KVM: nVMX: Advertise support for interrupt acknowledgementBandan Das
Some Type 1 hypervisors such as XEN won't enable VMX without it present Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-22KVM: nVMX: Ack and write vector info to intr_info if L1 asks us toBandan Das
This feature emulates the "Acknowledge interrupt on exit" behavior. We can safely emulate it for L1 to run L2 even if L0 itself has it disabled (to run L1). Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-22KVM: nVMX: Don't advertise single context invalidation for inveptBandan Das
For single context invalidation, we fall through to global invalidation in handle_invept() except for one case - when the operand supplied by L1 is different from what we have in vmcs12. However, typically hypervisors will only call invept for the currently loaded eptp, so the condition will never be true. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-22KVM: VMX: Advance rip to after an ICEBP instructionHuw Davies
When entering an exception after an ICEBP, the saved instruction pointer should point to after the instruction. This fixes the bug here: https://bugs.launchpad.net/qemu/+bug/1119686 Signed-off-by: Huw Davies <huw@codeweavers.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-17KVM: VMX: speed up wildcard MMIO EVENTFDMichael S. Tsirkin
With KVM, MMIO is much slower than PIO, due to the need to do page walk and emulation. But with EPT, it does not have to be: we know the address from the VMCS so if the address is unique, we can look up the eventfd directly, bypassing emulation. Unfortunately, this only works if userspace does not need to match on access length and data. The implementation adds a separate FAST_MMIO bus internally. This serves two purposes: - minimize overhead for old userspace that does not use eventfd with lengtth = 0 - minimize disruption in other code (since we don't know the length, devices on the MMIO bus only get a valid address in write, this way we don't need to touch all devices to teach them to handle an invalid length) At the moment, this optimization only has effect for EPT on x86. It will be possible to speed up MMIO for NPT and MMU using the same idea in the future. With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO. I was unable to detect any measureable slowdown to non-eventfd MMIO. Making MMIO faster is important for the upcoming virtio 1.0 which includes an MMIO signalling capability. The idea was suggested by Peter Anvin. Lots of thanks to Gleb for pre-review and suggestions. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14KVM: Disable SMAP for guests in EPT realmode and EPT unpaging modeFeng Wu
SMAP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMAP needs to be manually disabled when guest switches to non-paging mode. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-03-17KVM: x86: handle missing MPX in nested virtualizationPaolo Bonzini
When doing nested virtualization, we may be able to read BNDCFGS but still not be allowed to write to GUEST_BNDCFGS in the VMCS. Guard writes to the field with vmx_mpx_supported(), and similarly hide the MSR from userspace if the processor does not support the field. We could work around this with the generic MSR save/load machinery, but there is only a limited number of MSR save/load slots and it is not really worthwhile to waste one for a scenario that should not happen except in the nested virtualization case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-17KVM: x86: Add nested virtualization support for MPXPaolo Bonzini
This is simple to do, the "host" BNDCFGS is either 0 or the guest value. However, both controls have to be present. We cannot provide MPX if we only have one of the "load BNDCFGS" or "clear BNDCFGS" controls. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Allow nested guests to run with dirty debug registersPaolo Bonzini
When preparing the VMCS02, the CPU-based execution controls is computed by vmx_exec_control. Turn off DR access exits there, too, if the KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: vmx: Allow the guest to run with dirty debug registersPaolo Bonzini
When not running in guest-debug mode (i.e. the guest controls the debug registers, having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we can let it write freely to the debug registers and reload them on the next exit. We still need to exit on the first access, so that the KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further accesses to the debug registers will not cause a vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: vmx: we do rely on loading DR7 on entryPaolo Bonzini
Currently, this works even if the bit is not in "min", because the bit is always set in MSR_IA32_VMX_ENTRY_CTLS. Mention it for the sake of documentation, and to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: x86: Remove return code from enable_irq/nmi_windowJan Kiszka
It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, thus the need to return an error code from both enable_irq_window and enable_nmi_window. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interruptJan Kiszka
According to SDM 27.2.3, IDT vectoring information will not be valid on vmexits caused by external NMIs. So we have to avoid creating such scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we have a pending interrupt because that one would be migrated to L1's IDT vectoring info on nested exit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Fully emulate preemption timerJan Kiszka
We cannot rely on the hardware-provided preemption timer support because we are holding L2 in HLT outside non-root mode. Furthermore, emulating the preemption will resolve tick rate errata on older Intel CPUs. The emulation is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. As we no longer rely on hardware features, we can enable both the preemption timer support and value saving unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Rework interception of IRQs and NMIsJan Kiszka
Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback has the task to switch from L2 to L1 if needed and inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-03kvm, vmx: Really fix lazy FPU on nested guestPaolo Bonzini
Commit e504c9098ed6 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong. nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1. The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use it. Fixes: e504c9098ed6acd9e1079c5e10e4910724ad429f Reported-by: Kashyap Chamarty <kchamart@redhat.com> Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Kashyap Chamarty <kchamart@redhat.com> Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-25KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_saveLiu, Jinsong
From 5d5a80cd172ea6fb51786369bcc23356b1e9e956 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:11:55 +0800 Subject: [PATCH v5 2/3] KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save Add MSR_IA32_BNDCFGS to msrs_to_save, and corresponding logic to kvm_get/set_msr(). Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-24KVM: x86: Intel MPX vmx and msr handleLiu, Jinsong
From caddc009a6d2019034af8f2346b2fd37a81608d0 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:11:11 +0800 Subject: [PATCH v5 1/3] KVM: x86: Intel MPX vmx and msr handle This patch handle vmx and msr of Intel MPX feature. Signed-off-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-27KVM: x86: Validate guest writes to MSR_IA32_APICBASEJan Kiszka
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This address both enabling of the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...