summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm/book3s_xive.c
AgeCommit message (Collapse)Author
2019-10-11KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VPCédric Le Goater
[ Upstream commit 237aed48c642328ff0ab19b63423634340224a06 ] When a vCPU is brought done, the XIVE VP (Virtual Processor) is first disabled and then the event notification queues are freed. When freeing the queues, we check for possible escalation interrupts and free them also. But when a XIVE VP is disabled, the underlying XIVE ENDs also are disabled in OPAL. When an END (Event Notification Descriptor) is disabled, its ESB pages (ESn and ESe) are disabled and loads return all 1s. Which means that any access on the ESB page of the escalation interrupt will return invalid values. When an interrupt is freed, the shutdown handler computes a 'saved_p' field from the value returned by a load in xive_do_source_set_mask(). This value is incorrect for escalation interrupts for the reason described above. This has no impact on Linux/KVM today because we don't make use of it but we will introduce in future changes a xive_get_irqchip_state() handler. This handler will use the 'saved_p' field to return the state of an interrupt and 'saved_p' being incorrect, softlockup will occur. Fix the vCPU cleanup sequence by first freeing the escalation interrupts if any, then disable the XIVE VP and last free the queues. Fixes: 90c73795afa2 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode") Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-09KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interruptsCédric Le Goater
commit ef9740204051d0e00f5402fe96cf3a43ddd2bbbf upstream. The passthrough interrupts are defined at the host level and their IRQ data should not be cleared unless specifically deconfigured (shutdown) by the host. They differ from the IPI interrupts which are allocated by the XIVE KVM device and reserved to the guest usage only. This fixes a host crash when destroying a VM in which a PCI adapter was passed-through. In this case, the interrupt is cleared and freed by the KVM device and then shutdown by vfio at the host level. [ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00 [ 1007.360285] Faulting instruction address: 0xc00000000009da34 [ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1] [ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV [ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core [ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4 [ 1007.360454] NIP: c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0 [ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300 Not tainted (5.1.0-gad7e7d0ef) [ 1007.360500] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002484 XER: 00000000 [ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1 [ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0 [ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200 [ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd [ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718 [ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100 [ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027 [ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c [ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0 [ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120 [ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50 [ 1007.360732] Call Trace: [ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable) [ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0 [ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0 [ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420 [ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0 [ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350 [ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0 [ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110 [ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0 [ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170 [ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0 [ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90 [ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70 [ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0 [ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0 [ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00 [ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100 [ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0 [ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430 [ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74 [ 1007.361159] Instruction dump: [ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020 [ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022 Cc: stable@vger.kernel.org # v4.12+ Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-29KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()Laurent Vivier
commit 7333b5aca412d6ad02667b5a513485838a91b136 upstream. When we migrate a VM from a POWER8 host (XICS) to a POWER9 host (XICS-on-XIVE), we have an error: qemu-kvm: Unable to restore KVM interrupt controller state \ (0xff000000) for CPU 0: Invalid argument This is because kvmppc_xics_set_icp() checks the new state is internaly consistent, and especially: ... 1129 if (xisr == 0) { 1130 if (pending_pri != 0xff) 1131 return -EINVAL; ... On the other side, kvmppc_xive_get_icp() doesn't set neither the pending_pri value, nor the xisr value (set to 0) (and kvmppc_xive_set_icp() ignores the pending_pri value) As xisr is 0, pending_pri must be set to 0xff. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-29KVM: PPC: Book3S: fix XIVE migration of pending interruptsCédric Le Goater
commit dc1c4165d189350cb51bdd3057deb6ecd164beda upstream. When restoring a pending interrupt, we are setting the Q bit to force a retrigger in xive_finish_unmask(). But we also need to force an EOI in this case to reach the same initial state : P=1, Q=0. This can be done by not setting 'old_p' for pending interrupts which will inform xive_finish_unmask() that an EOI needs to be sent. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-03KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()Sam Bobroff
In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the value of state->guest_server as "server". However, this value is not set by it's counterpart kvmppc_xive_set_xive(). When the guest uses this interface to migrate interrupts away from a CPU that is going offline, it sees all interrupts as belonging to CPU 0, so they are left assigned to (now) offline CPUs. This patch removes the guest_server field from the state, and returns act_server in it's place (that is, the CPU actually handling the interrupt, which may differ from the one requested). Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-12KVM: PPC: Book3S HV: Don't access XIVE PIPR register using byte accessesBenjamin Herrenschmidt
The XIVE interrupt controller on POWER9 machines doesn't support byte accesses to any register in the thread management area other than the CPPR (current processor priority register). In particular, when reading the PIPR (pending interrupt priority register), we need to do a 32-bit or 64-bit load. Cc: stable@vger.kernel.org # v4.13 Fixes: 2c4fb78f78b6 ("KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-07-03KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving codePaul Mackerras
This fixes a typo where the wrong loop index was used to index the kvmppc_xive_vcpu.queues[] array in xive_pre_save_scan(). The variable i contains the vcpu number; we need to index queues[] using j, which iterates from 0 to KVMPPC_XIVE_Q_COUNT-1. The effect of this bug is that things that save the interrupt controller state, such as "virsh dump", on a VM with more than 8 vCPUs, result in xive_pre_save_queue() getting called on a bogus queue structure, usually resulting in a crash like this: [ 501.821107] Unable to handle kernel paging request for data at address 0x00000084 [ 501.821212] Faulting instruction address: 0xc008000004c7c6f8 [ 501.821234] Oops: Kernel access of bad area, sig: 11 [#1] [ 501.821305] SMP NR_CPUS=1024 [ 501.821307] NUMA [ 501.821376] PowerNV [ 501.821470] Modules linked in: vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel kvm_hv nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm tg3 ptp pps_core [ 501.822477] CPU: 3 PID: 3934 Comm: live_migration Not tainted 4.11.0-4.git8caa70f.el7.centos.ppc64le #1 [ 501.822633] task: c0000003f9e3ae80 task.stack: c0000003f9ed4000 [ 501.822745] NIP: c008000004c7c6f8 LR: c008000004c7c628 CTR: 0000000030058018 [ 501.822877] REGS: c0000003f9ed7980 TRAP: 0300 Not tainted (4.11.0-4.git8caa70f.el7.centos.ppc64le) [ 501.823030] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> [ 501.823047] CR: 28022244 XER: 00000000 [ 501.823203] CFAR: c008000004c7c77c DAR: 0000000000000084 DSISR: 40000000 SOFTE: 1 [ 501.823203] GPR00: c008000004c7c628 c0000003f9ed7c00 c008000004c91450 00000000000000ff [ 501.823203] GPR04: c0000003f5580000 c0000003f559bf98 9000000000009033 0000000000000000 [ 501.823203] GPR08: 0000000000000084 0000000000000000 00000000000001e0 9000000000001003 [ 501.823203] GPR12: c00000000008a7d0 c00000000fdc1b00 000000000a9a0000 0000000000000000 [ 501.823203] GPR16: 00000000402954e8 000000000a9a0000 0000000000000004 0000000000000000 [ 501.823203] GPR20: 0000000000000008 c000000002e8f180 c000000002e8f1e0 0000000000000001 [ 501.823203] GPR24: 0000000000000008 c0000003f5580008 c0000003f4564018 c000000002e8f1e8 [ 501.823203] GPR28: 00003ff6e58bdc28 c0000003f4564000 0000000000000000 0000000000000000 [ 501.825441] NIP [c008000004c7c6f8] xive_get_attr+0x3b8/0x5b0 [kvm] [ 501.825671] LR [c008000004c7c628] xive_get_attr+0x2e8/0x5b0 [kvm] [ 501.825887] Call Trace: [ 501.825991] [c0000003f9ed7c00] [c008000004c7c628] xive_get_attr+0x2e8/0x5b0 [kvm] (unreliable) [ 501.826312] [c0000003f9ed7cd0] [c008000004c62ec4] kvm_device_ioctl_attr+0x64/0xa0 [kvm] [ 501.826581] [c0000003f9ed7d20] [c008000004c62fcc] kvm_device_ioctl+0xcc/0xf0 [kvm] [ 501.826843] [c0000003f9ed7d40] [c000000000350c70] do_vfs_ioctl+0xd0/0x8c0 [ 501.827060] [c0000003f9ed7de0] [c000000000351534] SyS_ioctl+0xd4/0xf0 [ 501.827282] [c0000003f9ed7e30] [c00000000000b8e0] system_call+0x38/0xfc [ 501.827496] Instruction dump: [ 501.827632] 419e0078 3b760008 e9160008 83fb000c 83db0010 80fb0008 2f280000 60000000 [ 501.827901] 60000000 60420000 419a0050 7be91764 <7d284c2c> 552a0ffe 7f8af040 419e003c [ 501.828176] ---[ end trace 2d0529a5bbbbafed ]--- Cc: stable@vger.kernel.org Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-09Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-04-27KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>