summaryrefslogtreecommitdiff
path: root/Documentation/virtual
AgeCommit message (Collapse)Author
2019-11-16kvm: Convert kvm_lock to a mutexJunaid Shahid
commit 0d9ce162cf46c99628cc5da9510b959c7976735b upstream. It doesn't seem as if there is any particular need for kvm_lock to be a spinlock, so convert the lock to a mutex so that sleepable functions (in particular cond_resched()) can be called while holding it. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 4.9: - Drop changes in kvm_hyperv_tsc_notifier(), vm_stat_clear(), vcpu_stat_clear(), kvm_uevent_notify_change() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03KVM: Reject device ioctls from processes other than the VM's creatorSean Christopherson
commit ddba91801aeb5c160b660caed1800eb3aef403f8 upstream. KVM's API requires thats ioctls must be issued from the same process that created the VM. In other words, userspace can play games with a VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the creator can do anything useful. Explicitly reject device ioctls that are issued by a process other than the VM's creator, and update KVM's API documentation to extend its requirements to device ioctls. Fixes: 852b6d57dc7f ("kvm: add device control API") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15KVM: x86: Add a framework for supporting MSR-based featuresTom Lendacky
commit 801e459a6f3a63af9d447e6249088c76ae16efc4 upstream Provide a new KVM capability that allows bits within MSRs to be recognized as features. Two new ioctls are added to the /dev/kvm ioctl routine to retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops callback is used to determine support for the listed MSR-based features. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Tweaked documentation. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-09arm/arm64: KVM: Add PSCI version selection APIMarc Zyngier
commit 85bd0ba1ff9875798fad94218b627ea9f768f3c3 upstream. Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1 or 1.0 to a guest, defaulting to the latest version of the PSCI implementation that is compatible with the requested version. This is no different from doing a firmware upgrade on KVM. But in order to give a chance to hypothetical badly implemented guests that would have a fit by discovering something other than PSCI 0.2, let's provide a new API that allows userspace to pick one particular version of the API. This is implemented as a new class of "firmware" registers, where we expose the PSCI version. This allows the PSCI version to be save/restored as part of a guest migration, and also set to any supported version if the guest requires it. Cc: stable@vger.kernel.org #4.16 Reviewed-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09KVM: PPC: Book3S HV: Save/restore XER in checkpointed register statePaul Mackerras
commit 0d808df06a44200f52262b6eb72bcb6042f5a7c5 upstream. When switching from/to a guest that has a transaction in progress, we need to save/restore the checkpointed register state. Although XER is part of the CPU state that gets checkpointed, the code that does this saving and restoring doesn't save/restore XER. This fixes it by saving and restoring the XER. To allow userspace to read/write the checkpointed XER value, we also add a new ONE_REG specifier. The visible effect of this bug is that the guest may see its XER value being corrupted when it uses transactions. Fixes: e4e38121507a ("KVM: PPC: Book3S HV: Add transactional memory support") Fixes: 0a8eccefcb34 ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-19kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in usePaolo Bonzini
Userspace can read the exact value of kvmclock by reading the TSC and fetching the timekeeping parameters out of guest memory. This however is brittle and not necessary anymore with KVM 4.11. Provide a mechanism that lets userspace know if the new KVM_GET_CLOCK semantics are in effect, and---since we are at it---if the clock is stable across all VCPUs. Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-10-27KVM: document lock ordersPaolo Bonzini
This is long overdue, and not really hard. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1476357057-17899-1-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-27KVM: arm64: Require in-kernel irqchip for PMU supportChristoffer Dall
If userspace creates a PMU for the VCPU, but doesn't create an in-kernel irqchip, then we end up in a nasty path where we try to take an uninitialized spinlock, which can lead to all sorts of breakages. Luckily, QEMU always creates the VGIC before the PMU, so we can establish this as ABI and check for the VGIC in the PMU init stage. This can be relaxed at a later time if we want to support PMU with a userspace irqchip. Cc: stable@vger.kernel.org Cc: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08KVM: arm/arm64: Add VGICv3 save/restore API documentationChristoffer Dall
Factor out the GICv3 and ITS-specific documentation into a separate documentation file. Add description for how to access distributor, redistributor, and CPU interface registers for GICv3 in this new file, and add a group for accessing level triggered IRQ information for GICv3 as well. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-08-04KVM: documentation: fix KVM_CAP_X2APIC_API informationPaolo Bonzini
The KVM_X2APIC_API_USE_32BIT_IDS feature applies to both KVM_SET_GSI_ROUTING and KVM_SIGNAL_MSI, but was not mentioned in the documentation for the latter ioctl. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-04Merge tag 'kvm-arm-for-4.8-take2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.8 - Take 2 Includes GSI routing support to go along with the new VGIC and a small fix that has been cooking in -next for a while.
2016-07-22Merge tag 'kvm-arm-for-4.8' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM changes for Linux 4.8 - GICv3 ITS emulation - Simpler idmap management that fixes potential TLB conflicts - Honor the kernel protection in HYP mode - Removal of the old vgic implementation
2016-07-22KVM: arm/arm64: Enable MSI routingEric Auger
Up to now, only irqchip routing entries could be set. This patch adds the capability to insert MSI routing entries. For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: this include SPI irqchip routes plus MSI routes. In the future this might be extended. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-22KVM: arm/arm64: Enable irqchip routingEric Auger
This patch adds compilation and link against irqchip. Main motivation behind using irqchip code is to enable MSI routing code. In the future irqchip routing may also be useful when targeting multiple irqchips. Routing standard callbacks now are implemented in vgic-irqfd: - kvm_set_routing_entry - kvm_set_irq - kvm_set_msi They only are supported with new_vgic code. Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined. KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed. So from now on IRQCHIP routing is enabled and a routing table entry must exist for irqfd injection to succeed for a given SPI. This patch builds a default flat irqchip routing table (gsi=irqchip.pin) covering all the VGIC SPI indexes. This routing table is overwritten by the first first user-space call to KVM_SET_GSI_ROUTING ioctl. MSI routing setup is not yet allowed. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-22KVM: api: Pass the devid in the msi routing entryEric Auger
On ARM, the MSI msg (address and data) comes along with out-of-band device ID information. The device ID encodes the device that writes the MSI msg. Let's convey the device id in kvm_irq_routing_msi and use KVM_MSI_VALID_DEVID flag value in kvm_irq_routing_entry to indicate the msi devid is populated. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controllerAndre Przywara
Now that all ITS emulation functionality is in place, we advertise MSI functionality to userland and also the ITS device to the guest - if userland has configured that. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: arm64: vgic-its: Introduce new KVM ITS deviceAndre Przywara
Introduce a new KVM device that represents an ARM Interrupt Translation Service (ITS) controller. Since there can be multiple of this per guest, we can't piggy back on the existing GICv3 distributor device, but create a new type of KVM device. On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data structure and store the pointer in the kvm_device data. Upon an explicit init ioctl from userland (after having setup the MMIO address) we register the handlers with the kvm_io_bus framework. Any reference to an ITS thus has to go via this interface. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: Extend struct kvm_msi to hold a 32-bit device IDAndre Przywara
The ARM GICv3 ITS MSI controller requires a device ID to be able to assign the proper interrupt vector. On real hardware, this ID is sampled from the bus. To be able to emulate an ITS controller, extend the KVM MSI interface to let userspace provide such a device ID. For PCI devices, the device ID is simply the 16-bit bus-device-function triplet, which should be easily available to the userland tool. Also there is a new KVM capability which advertises whether the current VM requires a device ID to be set along with the MSI data. This flag is still reported as not available everywhere, later we will enable it when ITS emulation is used. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: s390: allow user space to handle instr 0x0000David Hildenbrand
We will use illegal instruction 0x0000 for handling 2 byte sw breakpoints from user space. As it can be enabled dynamically via a capability, let's move setting of ICTL_OPEREXC to the post creation step, so we avoid any races when enabling that capability just while adding new cpus. Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-07-14KVM: x86: add a flag to disable KVM x2apic broadcast quirkRadim Krčmář
Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to KVM_CAP_X2APIC_API. The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode. The enableable capability is needed in order to support standard x2APIC and remain backward compatible. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> [Expand kvm_apic_mda comment. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: add KVM_CAP_X2APIC_APIRadim Krčmář
KVM_CAP_X2APIC_API is a capability for features related to x2APIC enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to extend APIC ID in get/set ioctl and MSI addresses to 32 bits. Both are needed to support x2APIC. The feature has to be enableable and disabled by default, because get/set ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol inspired by xAPIC and the change is not backward-compatible. Changes to MSI addresses follow the format used by interrupt remapping unit. The upper address word, that used to be 0, contains upper 24 bits of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as 0. Using the upper address word is not backward-compatible either as we didn't check that userspace zeroed the word. Reserved bits are still not explicitly checked, but non-zero data will affect LAPIC addresses, which will cause a bug. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-15MIPS: KVM: Add KScratch registersJames Hogan
Allow up to 6 KVM guest KScratch registers to be enabled and accessed via the KVM guest register API and from the guest itself (the fallback reading and writing of commpage registers is sufficient for KScratch registers to work as expected). User mode can expose the registers by setting the appropriate bits of the guest Config4.KScrExist field. KScratch registers that aren't usable won't be writeable via the KVM Ioctl API. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-15Merge tag 'kvm-s390-next-4.8-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Features and fixes for 4.8 part1 Four bigger things: 1. The implementation of the STHYI opcode in the kernel. This is used in libraries like qclib [1] to provide enough information for a capacity and usage based software licence pricing. The STHYI content is defined by the related z/VM documentation [2]. Its data can be composed by accessing several other interfaces provided by LPAR or the machine. This information is partially sensitive or root-only so the kernel does the necessary filtering. 2. Preparation for nested virtualization (VSIE). KVM should query the proper sclp interfaces for the availability of some features before using it. In the past we have been sloppy and simply assumed that several features are available. With this we should be able to handle most cases of a missing feature. 3. CPU model interfaces extended by some additional features that are not covered by a facility bit in STFLE. For example all the crypto instructions of the coprocessor provide a query function. As reality tends to be more complex (e.g. export regulations might block some algorithms) we have to provide additional interfaces to query or set these non-stfle features. 4. Several fixes and changes detected and fixed when doing 1-3. All features change base s390 code. All relevant patches have an ACK from the s390 or component maintainers. The next pull request for 4.8 (part2) will contain the implementation of VSIE. [1] http://www.ibm.com/developerworks/linux/linux390/qclib.html [2] https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb4/hcpb4sth.htm
2016-06-14KVM: x86: Fix typosAndrea Gelmini
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-10KVM: s390: provide CMMA attributes only if availableDavid Hildenbrand
Let's not provide the device attribute for cmma enabling and clearing if the hardware doesn't support it. This also helps getting rid of the undocumented return value "-EINVAL" in case CMMA is not available when trying to enable it. Also properly document the meaning of -EINVAL for CMMA clearing. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: interface to query and configure cpu subfunctionsDavid Hildenbrand
We have certain instructions that indicate available subfunctions via a query subfunction (crypto functions and ptff), or via a test bit function (plo). By exposing these "subfunction blocks" to user space, we allow user space to 1) query available subfunctions and make sure subfunctions won't get lost during migration - e.g. properly indicate them via a CPU model 2) change the subfunctions to be reported to the guest (even adding unavailable ones) This mechanism works just like the way we indicate the stfl(e) list to user space. This way, user space could even emulate some subfunctions in QEMU in the future. If this is ever applicable, we have to make sure later on, that unsupported subfunctions result in an intercept to QEMU. Please note that support to indicate them to the guest is still missing and requires hardware support. Usually, the IBC takes already care of these subfunctions for migration safety. QEMU should make sure to always set these bits properly according to the machine generation to be emulated. Available subfunctions are only valid in combination with STFLE bits retrieved via KVM_S390_VM_CPU_MACHINE and enabled via KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the indicated subfunctions are guaranteed to be correct. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: interface to query and configure cpu featuresDavid Hildenbrand
For now, we only have an interface to query and configure facilities indicated via STFL(E). However, we also have features indicated via SCLP, that have to be indicated to the guest by user space and usually require KVM support. This patch allows user space to query and configure available cpu features for the guest. Please note that disabling a feature doesn't necessarily mean that it is completely disabled (e.g. ESOP is mostly handled by the SIE). We will try our best to disable it. Most features (e.g. SCLP) can't directly be forwarded, as most of them need in addition to hardware support, support in KVM. As we later on want to turn these features in KVM explicitly on/off (to simulate different behavior), we have to filter all features provided by the hardware and make them configurable. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-11kvm: introduce KVM_MAX_VCPU_IDGreg Kurz
The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and also the upper limit for vCPU ids. This is okay for all archs except PowerPC which can have higher ids, depending on the cpu/core/thread topology. In the worst case (single threaded guest, host with 8 threads per core), it limits the maximum number of vCPUS to KVM_MAX_VCPUS / 8. This patch separates the vCPU numbering from the total number of vCPUs, with the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids plus one. The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids before passing them to KVM_CREATE_VCPU. This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC. Other archs continue to return KVM_MAX_VCPUS instead. Suggested-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-10Merge tag 'kvm-s390-next-4.7-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: features and fixes for 4.7 part2 - Use hardware provided information about facility bits that do not need any hypervisor activitiy - Add missing documentation for KVM_CAP_S390_RI - Some updates/fixes for handling cpu models and facilities
2016-05-09KVM: s390: document KVM_CAP_S390_RIDavid Hildenbrand
We forgot to document that capability, let's add documentation. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-04-25Documentation: virtual: fix spelling mistakeEric Engestrom
Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-04-20KVM: s390: add clear I/O irq operation for FLICHalil Pasic
Introduce a FLIC operation for clearing I/O interrupts for a subchannel. Rationale: According to the platform specification, pending I/O interruption requests have to be revoked in certain situations. For instance, according to the Principles of Operation (page 17-27), a subchannel put into the installed parameters initialized state is in the same state as after an I/O system reset (just parameters possibly changed). This implies that any I/O interrupts for that subchannel are no longer pending (as I/O system resets clear I/O interrupts). Therefore, we need an interface to clear pending I/O interrupts. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-04-20KVM: s390: document FLIC behavior on unsupportedHalil Pasic
FLIC behavior deviates from the API documentation in reporting EINVAL instead of ENXIO for KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR when the group or attribute is unknown/unsupported. Unfortunately this can not be fixed for historical reasons. Let us at least have it documented. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ...
2016-03-10KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 comboPaolo Bonzini
Yes, all of these are needed. :) This is admittedly a bit odd, but kvm-unit-tests access.flat tests this if you run it with "-cpu host" and of course ept=0. KVM runs the guest with CR0.WP=1, so it must handle supervisor writes specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0. When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and restarts execution. This will still cause a user write to fault, while supervisor writes will succeed. User reads will fault spuriously now, and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads will be enabled and supervisor writes disabled, going back to the originary situation where supervisor writes fault spuriously. When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0. If the guest has not enabled NX, the result is a continuous stream of page faults due to the NX bit being reserved. The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry control, so they do not use user-return notifiers for EFER---if they did, EFER.NX would be forced to the same value as the host). There is another bug in the reserved bit check, which I've split to a separate patch for easier application to stable kernels. Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-09Merge tag 'kvm-arm-for-4.6' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.6 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - Various optimizations to the vgic save/restore code Conflicts: include/uapi/linux/kvm.h
2016-03-04KVM: document KVM_REINJECT_CONTROL ioctlRadim Krčmář
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowedXiao Guangrong
kvm_lpage_info->write_count is used to detect if the large page mapping for the gfn on the specified level is allowed, rename it to disallow_lpage to reflect its purpose, also we rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code more clearer Later we will extend this mechanism for page tracking: if the gfn is tracked then large mapping for that gfn on any level is not allowed. The new name is more straightforward Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The highlights are: * Enable VFIO device on PowerPC, from David Gibson * Optimizations to speed up IPIs between vcpus in HV KVM, from Suresh Warrier (who is also Suresh E. Warrier) * In-kernel handling of IOMMU hypercalls, and support for dynamic DMA windows (DDW), from Alexey Kardashevskiy. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-02KVM: PPC: Add support for 64bit TCE windowsAlexey Kardashevskiy
The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not enough for directly mapped windows as the guest can get more than 4GB. This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against the locked memory limit. Since 64bit windows are to support Dynamic DMA windows (DDW), let's add @bus_offset and @page_shift which are also required by DDW. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29arm64: KVM: Add a new vcpu device control group for PMUv3Shannon Zhao
To configure the virtual PMUv3 overflow interrupt number, we use the vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group. After configuring the PMUv3, call the vcpu ioctl with attribute KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Introduce per-vcpu kvm device controlsShannon Zhao
In some cases it needs to get/set attributes specific to a vcpu and so needs something else than ONE_REG. Let's copy the KVM_DEVICE approach, and define the respective ioctls for the vcpu file descriptor. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Acked-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add a new feature bit for PMUv3Shannon Zhao
To support guest PMUv3, use one bit of the VCPU INIT feature array. Initialize the PMU when initialzing the vcpu with that bit and PMU overflow interrupt set. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-16kvm/x86: Hyper-V VMBus hypercall userspace exitAndrey Smetanin
The patch implements KVM_EXIT_HYPERV userspace exit functionality for Hyper-V VMBus hypercalls: HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT. Changes v3: * use vcpu->arch.complete_userspace_io to setup hypercall result Changes v2: * use KVM_EXIT_HYPERV for hypercalls Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16KVM: PPC: Add support for multiple-TCE hcallsAlexey Kardashevskiy
This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO devices or emulated PCI. These calls allow adding multiple entries (up to 512) into the TCE table in one call which saves time on transition between kernel and user space. The current implementation of kvmppc_h_stuff_tce() allows it to be executed in both real and virtual modes so there is one helper. The kvmppc_rm_h_put_tce_indirect() needs to translate the guest address to the host address and since the translation is different, there are 2 helpers - one for each mode. This implements the KVM_CAP_PPC_MULTITCE capability. When present, the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE if these are enabled by the userspace via KVM_CAP_PPC_ENABLE_HCALL. If they can not be handled by the kernel, they are passed on to the user space. The user space still has to have an implementation for these. Both HV and PR-syle KVM are supported. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-10KVM: s390: usage hint for adapter mappingsCornelia Huck
The interface for adapter mappings was designed with code in mind that maps each address only once; let's document this. Otherwise, duplicate mappings are added to the list, which makes the code ineffective and uses up the limited amount of mapping needlessly. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: add documentation of KVM_S390_VM_CRYPTODavid Hildenbrand
Let's properly document KVM_S390_VM_CRYPTO and its attributes. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: add documentation of KVM_S390_VM_TODDavid Hildenbrand
Let's properly document KVM_S390_VM_TOD and its attributes. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-26KVM doc: Fix KVM_SMI chapter numberAlexey Kardashevskiy
The KVM_SMI capability is following the KVM_S390_SET_IRQ_STATE capability which is "4.95", this changes the number of the KVM_SMI chapter to 4.96. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-15KVM: s390: fix mismatch between user and in-kernel guest limitDominik Dingel
While the userspace interface requests the maximum size the gmap code expects to get a maximum address. This error resulted in bigger page tables than necessary for some guest sizes, e.g. a 2GB guest used 3 levels instead of 2. At the same time we introduce KVM_S390_NO_MEM_LIMIT, which allows in a bright future that a guest spans the complete 64 bit address space. We also switch to TASK_MAX_SIZE for the initial memory size, this is a cosmetic change as the previous size also resulted in a 4 level pagetable creation. Reported-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>