summaryrefslogtreecommitdiff
path: root/arch/arm/kvm
AgeCommit message (Collapse)Author
2021-03-17KVM: arm64: Fix exclusive limit for IPA sizeMarc Zyngier
Commit 262b003d059c6671601a19057e9fe1a5e7f23722 upstream. When registering a memslot, we check the size and location of that memslot against the IPA size to ensure that we can provide guest access to the whole of the memory. Unfortunately, this check rejects memslot that end-up at the exact limit of the addressing capability for a given IPA size. For example, it refuses the creation of a 2GB memslot at 0x8000000 with a 32bit IPA space. Fix it by relaxing the check to accept a memslot reaching the limit of the IPA space. Fixes: c3058d5da222 ("arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE") Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org # 4.4, 4.9 Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20210311100016.3830038-3-maz@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26KVM: arm/arm64: Don't reschedule in unmap_stage2_range()Will Deacon
Upstream commits fdfe7cbd5880 ("KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()") and b5331379bc62 ("KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set") fix a "sleeping from invalid context" BUG caused by unmap_stage2_range() attempting to reschedule when called on the OOM path. Unfortunately, these patches rely on the MMU notifier callback being passed knowledge about whether or not blocking is permitted, which was introduced in 4.19. Rather than backport this considerable amount of infrastructure just for KVM on arm, instead just remove the conditional reschedule. Cc: <stable@vger.kernel.org> # v4.4 only Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-10KVM: arm/arm64: Only skip MMIO insn onceAndrew Jones
[ Upstream commit 2113c5f62b7423e4a72b890bd479704aa85c81ba ] If after an MMIO exit to userspace a VCPU is immediately run with an immediate_exit request, such as when a signal is delivered or an MMIO emulation completion is needed, then the VCPU completes the MMIO emulation and immediately returns to userspace. As the exit_reason does not get changed from KVM_EXIT_MMIO in these cases we have to be careful not to complete the MMIO emulation again, when the VCPU is eventually run again, because the emulation does an instruction skip (and doing too many skips would be a waste of guest code :-) We need to use additional VCPU state to track if the emulation is complete. As luck would have it, we already have 'mmio_needed', which even appears to be used in this way by other architectures already. Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation") Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11KVM: arm/arm64: Ensure vcpu target is unset on reset failureAndrew Jones
[ Upstream commit 811328fc3222f7b55846de0cd0404339e2e1e6d7 ] A failed KVM_ARM_VCPU_INIT should not set the vcpu target, as the vcpu target is used by kvm_vcpu_initialized() to determine if other vcpu ioctls may proceed. We need to set the target before calling kvm_reset_vcpu(), but if that call fails, we should then unset it and clear the feature bitmap while we're at it. Signed-off-by: Andrew Jones <drjones@redhat.com> [maz: Simplified patch, completed commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23KVM: arm/arm64: Fix MMIO emulation data handlingChristoffer Dall
commit 83091db981e105d97562d3ed3ffe676e21927e3a upstream. When the kernel was handling a guest MMIO read access internally, we need to copy the emulation result into the run->mmio structure in order for the kvm_handle_mmio_return() function to pick it up and inject the result back into the guest. Currently the only user of kvm_io_bus for ARM is the VGIC, which did this copying itself, so this was not causing issues so far. But with the upcoming new vgic implementation we need this done properly. Update the kvm_handle_mmio_return description and cleanup the code to only perform a single copying when needed. Code and commit message inspired by Andre Przywara. Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23arm/arm64: KVM: Feed initialized memory to MMIO accessesMarc Zyngier
commit 1d6a821277aaa0cdd666278aaff93298df313d41 upstream. On an MMIO access, we always copy the on-stack buffer info the shared "run" structure, even if this is a read access. This ends up leaking up to 8 bytes of uninitialized memory into userspace, depending on the size of the access. An obvious fix for this one is to only perform the copy if this is an actual write. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20arm64: KVM: Skip MMIO insn after emulationMark Rutland
[ Upstream commit 0d640732dbebed0f10f18526de21652931f0b2f2 ] When we emulate an MMIO instruction, we advance the CPU state within decode_hsr(), before emulating the instruction effects. Having this logic in decode_hsr() is opaque, and advancing the state before emulation is problematic. It gets in the way of applying consistent single-step logic, and it prevents us from being able to fail an MMIO instruction with a synchronous exception. Clean this up by only advancing the CPU state *after* the effects of the instruction are emulated. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-09-05KVM: arm/arm64: Skip updating PMD entry if no changePunit Agrawal
commit 86658b819cd0a9aa584cd84453ed268a6f013770 upstream. Contention on updating a PMD entry by a large number of vcpus can lead to duplicate work when handling stage 2 page faults. As the page table update follows the break-before-make requirement of the architecture, it can lead to repeated refaults due to clearing the entry and flushing the tlbs. This problem is more likely when - * there are large number of vcpus * the mapping is large block mapping such as when using PMD hugepages (512MB) with 64k pages. Fix this by skipping the page table update if there is no change in the entry being updated. Cc: stable@vger.kernel.org Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages") Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05KVM: arm/arm64: Skip updating PTE entry if no changePunit Agrawal
commit 976d34e2dab10ece5ea8fe7090b7692913f89084 upstream. When there is contention on faulting in a particular page table entry at stage 2, the break-before-make requirement of the architecture can lead to additional refaulting due to TLB invalidation. Avoid this by skipping a page table update if the new value of the PTE matches the previous value. Cc: stable@vger.kernel.org Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC callsMarc Zyngier
commit 20e8175d246e9f9deb377f2784b3e7dfb2ad3e86 upstream. KVM doesn't follow the SMCCC when it comes to unimplemented calls, and inject an UNDEF instead of returning an error. Since firmware calls are now used for security mitigation, they are becoming more common, and the undef is counter productive. Instead, let's follow the SMCCC which states that -1 must be returned to the caller when getting an unknown function number. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17KVM: Fix stack-out-of-bounds read in write_mmioWanpeng Li
commit e39d200fa5bf5b94a0948db0dae44c1b73b84a56 upstream. Reported by syzkaller: BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm] Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298 CPU: 6 PID: 32298 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #18 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016 Call Trace: dump_stack+0xab/0xe1 print_address_description+0x6b/0x290 kasan_report+0x28a/0x370 write_mmio+0x11e/0x270 [kvm] emulator_read_write_onepage+0x311/0x600 [kvm] emulator_read_write+0xef/0x240 [kvm] emulator_fix_hypercall+0x105/0x150 [kvm] em_hypercall+0x2b/0x80 [kvm] x86_emulate_insn+0x2b1/0x1640 [kvm] x86_emulate_instruction+0x39a/0xb90 [kvm] handle_exception+0x1b4/0x4d0 [kvm_intel] vcpu_enter_guest+0x15a0/0x2640 [kvm] kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm] kvm_vcpu_ioctl+0x479/0x880 [kvm] do_vfs_ioctl+0x142/0x9a0 SyS_ioctl+0x74/0x80 entry_SYSCALL_64_fastpath+0x23/0x9a The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall) to the guest memory, however, write_mmio tracepoint always prints 8 bytes through *(u64 *)val since kvm splits the mmio access into 8 bytes. This leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch fixes it by just accessing the bytes which we operate on. Before patch: syz-executor-5567 [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f After patch: syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16arm: KVM: Survive unknown traps from guestsMark Rutland
[ Upstream commit f050fe7a9164945dd1c28be05bf00e8cfb082ccf ] Currently we BUG() if we see a HSR.EC value we don't recognise. As configurable disables/enables are added to the architecture (controlled by RES1/RES0 bits respectively), with associated synchronous exceptions, it may be possible for a guest to trigger exceptions with classes that we don't recognise. While we can't service these exceptions in a manner useful to the guest, we can avoid bringing down the host. Per ARM DDI 0406C.c, all currently unallocated HSR EC encodings are reserved, and per ARM DDI 0487A.k_iss10775, page G6-4395, EC values within the range 0x00 - 0x2c are reserved for future use with synchronous exceptions, and EC values within the range 0x2d - 0x3f may be used for either synchronous or asynchronous exceptions. The patch makes KVM handle any unknown EC by injecting an UNDEFINED exception into the guest, with a corresponding (ratelimited) warning in the host dmesg. We could later improve on this with with a new (opt-in) exit to the host userspace. Cc: Dave Martin <dave.martin@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07kvm: arm/arm64: Force reading uncached stage2 PGDSuzuki K Poulose
commit 2952a6070e07ebdd5896f1f5b861acad677caded upstream. Make sure we don't use a cached value of the KVM stage2 PGD while resetting the PGD. Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07kvm: arm/arm64: Fix race in resetting stage2 PGDSuzuki K Poulose
commit 6c0d706b563af732adb094c5bf807437e8963e84 upstream. In kvm_free_stage2_pgd() we check the stage2 PGD before holding the lock and proceed to take the lock if it is valid. And we unmap the page tables, followed by releasing the lock. We reset the PGD only after dropping this lock, which could cause a race condition where another thread waiting on or even holding the lock, could potentially see that the PGD is still valid and proceed to perform a stage2 operation and later encounter a NULL PGD. [223090.242280] Unable to handle kernel NULL pointer dereference at virtual address 00000040 [223090.262330] PC is at unmap_stage2_range+0x8c/0x428 [223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c [223090.262531] Call trace: [223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428 [223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c [223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104 [223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc [223090.262543] [<ffff0000080a2478>] kvm_mmu_notifier_invalidate_page+0x50/0x8c [223090.262547] [<ffff0000082274f8>] __mmu_notifier_invalidate_page+0x5c/0x84 [223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0 [223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0 [223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4 [223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac [223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac [223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0 [223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290 [223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac [223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4 [223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254 [223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538 [223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4 [223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c [223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8 [223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c [223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c [223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774 [223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac [223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648 [223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768 [223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4 [223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4 [223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c This patch moves the stage2 PGD manipulation under the lock. Reported-by: Alexander Graf <agraf@suse.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-12KVM: arm/arm64: Handle hva aging while destroying the vmSuzuki K Poulose
commit 7e5a672289c9754d07e1c3b33649786d3d70f5e4 upstream. The mmu_notifier_release() callback of KVM triggers cleaning up the stage2 page table on kvm-arm. However there could be other notifier callbacks in parallel with the mmu_notifier_release(), which could cause the call backs ending up in an empty stage2 page table. Make sure we check it for all the notifier callbacks. Fixes: commit 293f29363 ("kvm-arm: Unmap shadow pagetables properly") Reported-by: Alex Graf <agraf@suse.de> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pagesMarc Zyngier
commit d6dbdd3c8558cad3b6d74cc357b408622d122331 upstream. Under memory pressure, we start ageing pages, which amounts to parsing the page tables. Since we don't want to allocate any extra level, we pass NULL for our private allocation cache. Which means that stage2_get_pud() is allowed to fail. This results in the following splat: [ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008 [ 1520.417741] pgd = ffff810f52fef000 [ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000 [ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 1520.435156] Modules linked in: [ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G W 4.12.0-rc4-00027-g1885c397eaec #7205 [ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016 [ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000 [ 1520.469666] PC is at stage2_get_pmd+0x34/0x110 [ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0 [ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145 [ 1520.486325] sp : ffff800ce04e33d0 [ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064 [ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000 [ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000 [ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000 [ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000 [ 1520.516274] x19: 0000000058264000 x18: 0000000000000000 [ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70 [ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008 [ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002 [ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940 [ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200 [ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000 [ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000 [ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008 [ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c [ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000) [...] [ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110 [ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0 [ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8 [ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0 [ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0 [ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0 [ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188 [ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250 [ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0 [ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180 [ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8 [ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600 [ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328 [ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340 [ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240 [...] The trivial fix is to handle this NULL pud value early, rather than dereferencing it blindly. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14arm: KVM: Allow unaligned accesses at HYPMarc Zyngier
commit 33b5c38852b29736f3b472dd095c9a18ec22746f upstream. We currently have the HSCTLR.A bit set, trapping unaligned accesses at HYP, but we're not really prepared to deal with it. Since the rest of the kernel is pretty happy about that, let's follow its example and set HSCTLR.A to zero. Modern CPUs don't really care. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20KVM: arm/arm64: fix races in kvm_psci_vcpu_onAndrew Jones
commit 6c7a5dce22b3f3cc44be098e2837fa6797edb8b8 upstream. Fix potential races in kvm_psci_vcpu_on() by taking the kvm->lock mutex. In general, it's a bad idea to allow more than one PSCI_CPU_ON to process the same target VCPU at the same time. One such problem that may arise is that one PSCI_CPU_ON could be resetting the target vcpu, which fills the entire sys_regs array with a temporary value including the MPIDR register, while another looks up the VCPU based on the MPIDR value, resulting in no target VCPU found. Resolves both races found with the kvm-unit-tests/arm/psci unit test. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reported-by: Levente Kurusa <lkurusa@redhat.com> Suggested-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-27kvm: arm/arm64: Fix locking for kvm_free_stage2_pgdSuzuki K Poulose
commit 8b3405e345b5a098101b0c31b264c812bba045d9 upstream. In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling unmap_stage2_range() on the entire memory range for the guest. This could cause problems with other callers (e.g, munmap on a memslot) trying to unmap a range. And since we have to unmap the entire Guest memory range holding a spinlock, make sure we yield the lock if necessary, after we unmap each PUD range. Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Cc: Paolo Bonzini <pbonzin@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> [ Avoid vCPU starvation and lockup detector warnings ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-12arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_regionMarc Zyngier
commit 72f310481a08db821b614e7b5d00febcc9064b36 upstream. We don't hold the mmap_sem while searching for VMAs (via find_vma), in kvm_arch_prepare_memory_region, which can end up in expected failures. Fixes: commit 8eef91239e57 ("arm/arm64: KVM: map MMIO regions at creation time") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Eric Auger <eric.auger@rehat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> [ Handle dirty page logging failure case ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-12arm/arm64: KVM: Take mmap_sem in stage2_unmap_vmMarc Zyngier
commit 90f6e150e44a0dc3883110eeb3ab35d1be42b6bb upstream. We don't hold the mmap_sem while searching for the VMAs when we try to unmap each memslot for a VM. Fix this properly to avoid unexpected results. Fixes: commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm") Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24kvm-arm: Unmap shadow pagetables properlySuzuki K Poulose
commit 293f293637b55db4f9f522a5a72514e98a541076 upstream. On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when the userspace buffer gets unmapped. However, when the Hypervisor process exits without explicit unmap of the guest buffers, the only notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release ) which does nothing on arm. Later this causes us to access pages that were already released [via exit_mmap() -> unmap_vmas()] when we actually get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() -> kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC, which unmaps any free'd pages from the linear map. [ 757.644120] Unable to handle kernel paging request at virtual address ffff800661e00000 [ 757.652046] pgd = ffff20000b1a2000 [ 757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003, *pmd=00000047fcc7c003, *pte=00e8004661e00712 [ 757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP [ 757.672041] Modules linked in: [ 757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G D 4.8.0-rc1 #3 [ 757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.15 Aug 19 2016 [ 757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000 [ 757.698840] PC is at __flush_dcache_area+0x1c/0x40 [ 757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70 [ 757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145 ... [ 758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40 [ 758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0 [ 758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60 [ 758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68 [ 758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358 [ 758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40 [ 758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8 [ 758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18 [ 758.400869] [<ffff200008104658>] task_work_run+0x108/0x138 [ 758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8 [ 758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130 [ 758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18 [ 758.421943] [<ffff20000808a098>] do_signal+0x158/0x860 [ 758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88 [ 758.432608] [<ffff200008083624>] work_pending+0x10/0x14 [ 758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20) This patch fixes the issue by moving the kvm_free_stage2_pgd() to kvm_arch_flush_shadow_all(). Tested-by: Itaru Kitayama <itaru.kitayama@riken.jp> Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp> Reported-by: James Morse <james.morse@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01arm/arm64: KVM: Enforce Break-Before-Make on Stage-2 page tablesMarc Zyngier
commit d4b9e0790aa764c0b01e18d4e8d33e93ba36d51f upstream. The ARM architecture mandates that when changing a page table entry from a valid entry to another valid entry, an invalid entry is first written, TLB invalidated, and only then the new entry being written. The current code doesn't respect this, directly writing the new entry and only then invalidating TLBs. Let's fix it up. Reported-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09arm/arm64: KVM: Fix ioctl error handlingMichael S. Tsirkin
commit 4cad67fca3fc952d6f2ed9e799621f07666a560f upstream. Calling return copy_to_user(...) in an ioctl will not do the right thing if there's a pagefault: copy_to_user returns the number of bytes not copied in this case. Fix up kvm to do return copy_to_user(...)) ? -EFAULT : 0; everywhere. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-04ARM/arm64: KVM: correct PTE uncachedness checkArd Biesheuvel
Commit e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness") modified the logic to test whether a HYP or stage-2 mapping needs flushing, from [incorrectly] interpreting the page table attributes to [incorrectly] checking whether the PFN that backs the mapping is covered by host system RAM. The PFN number is part of the output of the translation, not the input, so we have to use pte_pfn() on the contents of the PTE, not __phys_to_pfn() on the HYP virtual address or stage-2 intermediate physical address. Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness") Cc: stable@vger.kernel.org Tested-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-12-04arm64: KVM: Get rid of old vcpu_reg()Pavel Fedin
Using oldstyle vcpu_reg() accessor is proven to be inappropriate and unsafe on ARM64. This patch converts the rest of use cases to new accessors and completely removes vcpu_reg() on ARM64. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-12-04arm64: KVM: Correctly handle zero register during MMIOPavel Fedin
On ARM64 register index of 31 corresponds to both zero register and SP. However, all memory access instructions, use ZR as transfer register. SP is used only as a base register in indirect memory addressing, or by register-register arithmetics, which cannot be trapped here. Correct emulation is achieved by introducing new register accessor functions, which can do special handling for reg_num == 31. These new accessors intentionally do not rely on old vcpu_reg() on ARM64, because it is to be removed. Since the affected code is shared by both ARM flavours, implementations of these accessors are also added to ARM32 code. This patch fixes setting MMIO register to a random value (actually SP) instead of zero by something like: *((volatile int *)reg) = 0; compilers tend to generate "str wzr, [xx]" here [Marc: Fixed 32bit splat] Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-11-24KVM: arm/arm64: Fix preemptible timer active state crazynessChristoffer Dall
We were setting the physical active state on the GIC distributor in a preemptible section, which could cause us to set the active state on different physical CPU from the one we were actually going to run on, hacoc ensues. Since we are no longer descheduling/scheduling soft timers in the flush/sync timer functions, simply moving the timer flush into a non-preemptible section. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24ARM/arm64: KVM: test properly for a PTE's uncachednessArd Biesheuvel
The open coded tests for checking whether a PTE maps a page as uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern, which is not guaranteed to work since the type of a mapping is not a set of mutually exclusive bits For HYP mappings, the type is an index into the MAIR table (i.e, the index itself does not contain any information whatsoever about the type of the mapping), and for stage-2 mappings it is a bit field where normal memory and device types are defined as follows: #define MT_S2_NORMAL 0xf #define MT_S2_DEVICE_nGnRE 0x1 I.e., masking *and* comparing with the latter matches on the former, and we have been getting lucky merely because the S2 device mappings also have the PTE_UXN bit set, or we would misidentify memory mappings as device mappings. Since the unmap_range() code path (which contains one instance of the flawed test) is used both for HYP mappings and stage-2 mappings, and considering the difference between the two, it is non-trivial to fix this by rewriting the tests in place, as it would involve passing down the type of mapping through all the functions. However, since HYP mappings and stage-2 mappings both deal with host physical addresses, we can simply check whether the mapping is backed by memory that is managed by the host kernel, and only perform the D-cache maintenance if this is the case. Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM: Improve kvm_exit tracepointChristoffer Dall
The ARM architecture only saves the exit class to the HSR (ESR_EL2 for arm64) on synchronous exceptions, not on asynchronous exceptions like an IRQ. However, we only report the exception class on kvm_exit, which is confusing because an IRQ looks like it exited at some PC with the same reason as the previous exit. Add a lookup table for the exception index and prepend the kvm_exit tracepoint text with the exception type to clarify this situation. Also resolve the exception class (EC) to a human-friendly text version so the trace output becomes immediately usable for debugging this code. Cc: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: implement kvm_arm_[halt,resume]_guestEric Auger
We introduce kvm_arm_halt_guest and resume functions. They will be used for IRQ forward state change. Halt is synchronous and prevents the guest from being re-entered. We use the same mechanism put in place for PSCI former pause, now renamed power_off. A new flag is introduced in arch vcpu state, pause, only meant to be used by those functions. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: check power_off in critical section before VCPU runEric Auger
In case a vcpu off PSCI call is called just after we executed the vcpu_sleep check, we can enter the guest although power_off is set. Let's check the power_off state in the critical section, just before entering the guest. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reported-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnableEric Auger
kvm_arch_vcpu_runnable now also checks whether the power_off flag is set. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22KVM: arm/arm64: rename pause into power_offEric Auger
The kvm_vcpu_arch pause field is renamed into power_off to prepare for the introduction of a new pause field. Also vcpu_pause is renamed into vcpu_sleep since we will sleep until both power_off and pause are false. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM : Enable vhost device selection under KVM config menuWei Huang
vhost drivers provide guest VMs with better I/O performance and lower CPU utilization. This patch allows users to select vhost devices under KVM configuration menu on ARM. This makes vhost support on arm/arm64 on a par with other architectures (e.g. x86, ppc). Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM: Rework the arch timer to use level-triggered semanticsChristoffer Dall
The arch timer currently uses edge-triggered semantics in the sense that the line is never sampled by the vgic and lowering the line from the timer to the vgic doesn't have any effect on the pending state of virtual interrupts in the vgic. This means that we do not support a guest with the otherwise valid behavior of (1) disable interrupts (2) enable the timer (3) disable the timer (4) enable interrupts. Such a guest would validly not expect to see any interrupts on real hardware, but will see interrupts on KVM. This patch fixes this shortcoming through the following series of changes. First, we change the flow of the timer/vgic sync/flush operations. Now the timer is always flushed/synced before the vgic, because the vgic samples the state of the timer output. This has the implication that we move the timer operations in to non-preempible sections, but that is fine after the previous commit getting rid of hrtimer schedules on every entry/exit. Second, we change the internal behavior of the timer, letting the timer keep track of its previous output state, and only lower/raise the line to the vgic when the state changes. Note that in theory this could have been accomplished more simply by signalling the vgic every time the state *potentially* changed, but we don't want to be hitting the vgic more often than necessary. Third, we get rid of the use of the map->active field in the vgic and instead simply set the interrupt as active on the physical distributor whenever the input to the GIC is asserted and conversely clear the physical active state when the input to the GIC is deasserted. Fourth, and finally, we now initialize the timer PPIs (and all the other unused PPIs for now), to be level-triggered, and modify the sync code to sample the line state on HW sync and re-inject a new interrupt if it is still pending at that time. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_blockChristoffer Dall
We currently schedule a soft timer every time we exit the guest if the timer did not expire while running the guest. This is really not necessary, because the only work we do in the timer work function is to kick the vcpu. Kicking the vcpu does two things: (1) If the vpcu thread is on a waitqueue, make it runnable and remove it from the waitqueue. (2) If the vcpu is running on a different physical CPU from the one doing the kick, it sends a reschedule IPI. The second case cannot happen, because the soft timer is only ever scheduled when the vcpu is not running. The first case is only relevant when the vcpu thread is on a waitqueue, which is only the case when the vcpu thread has called kvm_vcpu_block(). Therefore, we only need to make sure a timer is scheduled for kvm_vcpu_block(), which we do by encapsulating all calls to kvm_vcpu_block() with kvm_timer_{un}schedule calls. Additionally, we only schedule a soft timer if the timer is enabled and unmasked, since it is useless otherwise. Note that theoretically userspace can use the SET_ONE_REG interface to change registers that should cause the timer to fire, even if the vcpu is blocked without a scheduled timer, but this case was not supported before this patch and we leave it for future work for now. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20KVM: arm: use GIC support unconditionallyArnd Bergmann
The vgic code on ARM is built for all configurations that enable KVM, but the parent_data field that it references is only present when CONFIG_IRQ_DOMAIN_HIERARCHY is set: virt/kvm/arm/vgic.c: In function 'kvm_vgic_map_phys_irq': virt/kvm/arm/vgic.c:1781:13: error: 'struct irq_data' has no member named 'parent_data' This flag is implied by the GIC driver, and indeed the VGIC code only makes sense if a GIC is present. This changes the CONFIG_KVM symbol to always select GIC, which avoids the issue. Fixes: 662d9715840 ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20KVM: arm/arm64: Fix memory leak if timer initialization failsPavel Fedin
Jump to correct label and free kvm_host_cpu_state Reviewed-by: Wei Huang <wei@redhat.com> Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-09-17arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'Ming Lei
This patch removes config option of KVM_ARM_MAX_VCPUS, and like other ARCHs, just choose the maximum allowed value from hardware, and follows the reasons: 1) from distribution view, the option has to be defined as the max allowed value because it need to meet all kinds of virtulization applications and need to support most of SoCs; 2) using a bigger value doesn't introduce extra memory consumption, and the help text in Kconfig isn't accurate because kvm_vpu structure isn't allocated until request of creating VCPU is sent from QEMU; 3) the main effect is that the field of vcpus[] in 'struct kvm' becomes a bit bigger(sizeof(void *) per vcpu) and need more cache lines to hold the structure, but 'struct kvm' is one generic struct, and it has worked well on other ARCHs already in this way. Also, the world switch frequecy is often low, for example, it is ~2000 when running kernel building load in VM from APM xgene KVM host, so the effect is very small, and the difference can't be observed in my test at all. Cc: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-17arm: KVM: Disable virtual timer even if the guest is not using itMarc Zyngier
When running a guest with the architected timer disabled (with QEMU and the kernel_irqchip=off option, for example), it is important to make sure the timer gets turned off. Otherwise, the guest may try to enable it anyway, leading to a screaming HW interrupt. The fix is to unconditionally turn off the virtual timer on guest exit. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-16arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resourcesPavel Fedin
Until b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops"), kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(), and vgic_v2_map_resources() still has it. But now vm_ops are not initialized until we call kvm_vgic_create(). Therefore kvm_vgic_map_resources() can being called without a VGIC, and we die because vm_ops.map_resources is NULL. Fixing this restores QEMU's kernel-irqchip=off option to a working state, allowing to use GIC emulation in userspace. Fixes: b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops") Cc: stable@vger.kernel.org Signed-off-by: Pavel Fedin <p.fedin@samsung.com> [maz: reworked commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-16arm: KVM: Fix incorrect device to IPA mappingMarek Majtyka
A critical bug has been found in device memory stage1 translation for VMs with more then 4GB of address space. Once vm_pgoff size is smaller then pa (which is true for LPAE case, u32 and u64 respectively) some more significant bits of pa may be lost as a shift operation is performed on u32 and later cast onto u64. Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12 expected pa(u64): 0x0000002010030000 produced pa(u64): 0x0000000010030000 The fix is to change the order of operations (casting first onto phys_addr_t and then shifting). Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> [maz: fixed changelog and patch formatting] Cc: stable@vger.kernel.org Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04arm/arm64: KVM: Fix PSCI affinity info return value for non valid coresAlexander Spyridakis
If a guest requests the affinity info for a non-existing vCPU we need to properly return an error, instead of erroneously reporting an off state. Signed-off-by: Alexander Spyridakis <a.spyridakis@virtualopensystems.com> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-19arm: KVM: keep arm vfp/simd exit handling consistent with arm64Mario Smarduch
After enhancing arm64 FP/SIMD exit handling, ARMv7 VFP exit branch is moved to guest trap handling. This allows us to keep exit handling flow between both architectures consistent. Signed-off-by: Mario Smarduch <m.smarduch@samsung.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: timer: Allow the timer to control the active stateMarc Zyngier
In order to remove the crude hack where we sneak the masked bit into the timer's control register, make use of the phys_irq_map API control the active state of the interrupt. This causes some limited changes to allow for potential error propagation. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interruptsMarc Zyngier
In order to be able to feed physical interrupts to a guest, we need to be able to establish the virtual-physical mapping between the two worlds. The mappings are kept in a set of RCU lists, indexed by virtual interrupts. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12arm/arm64: KVM: Move vgic handling to a non-preemptible sectionMarc Zyngier
As we're about to introduce some serious GIC-poking to the vgic code, it is important to make sure that we're going to poke the part of the GIC that belongs to the CPU we're about to run on (otherwise, we'd end up with some unexpected interrupts firing)... Introducing a non-preemptible section in kvm_arch_vcpu_ioctl_run prevents the problem from occuring. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12arm/arm64: KVM: Fix ordering of timer/GIC on guest entryMarc Zyngier
As we now inject the timer interrupt when we're about to enter the guest, it makes a lot more sense to make sure this happens before the vgic code queues the pending interrupts. Otherwise, we get the interrupt on the following exit, which is not great for latency (and leads to all kind of bizarre issues when using with active interrupts at the HW level). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-07-21KVM: arm64: introduce vcpu->arch.debug_ptrAlex Bennée
This introduces a level of indirection for the debug registers. Instead of using the sys_regs[] directly we store registers in a structure in the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the guest context. Because we no longer give the sys_regs offset for the sys_reg_desc->reg field, but instead the index into a debug-specific struct we need to add a number of additional trap functions for each register. Also as the generic generic user-space access code no longer works we have introduced a new pair of function pointers to the sys_reg_desc structure to override the generic code when needed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>