path: root/drivers/xen
Age  Commit message  Author
2013-06-07  xen/events: Handle VIRQ_TIMER before any other hardirq in event loop.  Keir Fraser
commit bee980d9e9642e96351fa3ca9077b853ecf62f57 upstream. This avoids any other hardirq handler seeing a very stale jiffies value immediately after wakeup from a long idle period. The one observable symptom of this was a USB keyboard, with software keyboard repeat, which would always repeat a key as soon as it was pressed. This happens because the key press wakes the guest, the key handler runs immediately and sees an old jiffies value, and that jiffies value is then significantly updated before the key is released. Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
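The ordering described above can be sketched in user space. This is a minimal model, not the kernel's event loop: `dispatch_order`, the 64-port limit, and the bitmask representation are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: fill `order` with the pending ports in the order
 * they would be serviced, putting the timer port first.
 */
size_t dispatch_order(uint64_t pending, int timer_port, int order[64])
{
    size_t n = 0;

    /* Service the timer first so later handlers see a fresh tick count. */
    if (timer_port >= 0 && timer_port < 64 &&
        (pending & (UINT64_C(1) << timer_port))) {
        order[n++] = timer_port;
        pending &= ~(UINT64_C(1) << timer_port);
    }

    /* Then service everything else in ascending port order. */
    for (int port = 0; port < 64; port++)
        if (pending & (UINT64_C(1) << port))
            order[n++] = port;

    return n;
}
```

With ports 1, 3 and 5 pending and port 5 acting as the timer, the model services 5 before 1 and 3.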
2013-03-20  xen/pciback: Don't disable a PCI device that is already disabled.  Konrad Rzeszutek Wilk
commit bdc5c1812cea6efe1aaefb3131fcba28cd0b2b68 upstream. While shutting down an HVM guest with PCI devices passed through we get this: pciback 0000:04:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100002) ------------[ cut here ]------------ WARNING: at drivers/pci/pci.c:1397 pci_disable_device+0x88/0xa0() Hardware name: MS-7640 Device pciback disabling already-disabled device Modules linked in: Pid: 53, comm: xenwatch Not tainted 3.9.0-rc1-20130304a+ #1 Call Trace: [<ffffffff8106994a>] warn_slowpath_common+0x7a/0xc0 [<ffffffff81069a31>] warn_slowpath_fmt+0x41/0x50 [<ffffffff813cf288>] pci_disable_device+0x88/0xa0 [<ffffffff814554a7>] xen_pcibk_reset_device+0x37/0xd0 [<ffffffff81454b6f>] ? pcistub_put_pci_dev+0x6f/0x120 [<ffffffff81454b8d>] pcistub_put_pci_dev+0x8d/0x120 [<ffffffff814582a9>] __xen_pcibk_release_devices+0x59/0xa0 This fixes the bug. Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
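The warning comes from disabling a device whose enable count is already zero, so the natural guard is to check the count first. The following is a user-space model with hypothetical names (`model_pci_dev`, `model_reset_device`); it does not claim to reproduce the patch itself.

```c
/* Hypothetical stand-in for the enable bookkeeping of a PCI device. */
struct model_pci_dev {
    int enable_cnt; /* how many times the device has been enabled */
    int warnings;   /* counts "disabling already-disabled device" events */
};

/* Models a disable call that complains when the device is not enabled. */
static void model_disable_device(struct model_pci_dev *dev)
{
    if (dev->enable_cnt == 0) {
        dev->warnings++;
        return;
    }
    dev->enable_cnt--;
}

/* Reset path: only disable when the device is actually enabled. */
void model_reset_device(struct model_pci_dev *dev)
{
    if (dev->enable_cnt > 0)
        model_disable_device(dev);
}
```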
2013-02-28  xen: close evtchn port if binding to irq fails  Wei Liu
commit e7e44e444876478d50630f57b0c31d29f6725020 upstream. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21  xen/grant-table: correctly initialize grant table version 1  Matt Wilson
commit d0b4d64aadb9f4a90669848de9ef3819050a98cd upstream. Commit 85ff6acb075a484780b3d763fdf41596d8fc0970 (xen/granttable: Grant tables V2 implementation) changed the GREFS_PER_GRANT_FRAME macro from a constant to a conditional expression. The expression depends on grant_table_version being appropriately set. Unfortunately, at init time grant_table_version will be 0. The GREFS_PER_GRANT_FRAME conditional expression checks for "grant_table_version == 1", and therefore returns the number of grant references per frame for v2. This causes gnttab_init() to allocate fewer pages for gnttab_list, as a frame can hold only half as many v2 entries as v1 entries. After gnttab_resume() is called, grant_table_version is appropriately set. nr_init_grefs will then be miscalculated and gnttab_free_count will hold a value larger than the actual number of free gref entries. If a guest is heavily utilizing improperly initialized v1 grant tables, memory corruption can occur. One common manifestation is corruption of the vmalloc list, resulting in a poisoned pointer dereference when accessing /proc/meminfo or /proc/vmallocinfo: [ 40.770064] BUG: unable to handle kernel paging request at 0000200200001407 [ 40.770083] IP: [<ffffffff811a6fb0>] get_vmalloc_info+0x70/0x110 [ 40.770102] PGD 0 [ 40.770107] Oops: 0000 [#1] SMP [ 40.770114] CPU 10 This patch introduces a static variable, grefs_per_grant_frame, to cache the calculated value. gnttab_init() now calls gnttab_request_version() early so that grant_table_version and grefs_per_grant_frame can be appropriately set. A few BUG_ON()s have been added to prevent this type of bug from reoccurring in the future.
Signed-off-by: Matt Wilson <msw@amazon.com> Reviewed-and-Tested-by: Steven Noonan <snoonan@amazon.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Annie Li <annie.li@oracle.com> Cc: xen-devel@lists.xen.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
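The miscalculation is easy to reproduce in a small model. The sketch below assumes a 4096-byte frame, 8-byte v1 entries and 16-byte v2 entries; those sizes and all function names are illustrative assumptions, and `assert()` stands in for the kernel's `BUG_ON()`.

```c
#include <assert.h>

#define FRAME_SIZE    4096u /* assumed frame size */
#define V1_ENTRY_SIZE 8u    /* assumed size of a v1 grant entry */
#define V2_ENTRY_SIZE 16u   /* assumed size of a v2 grant entry */

unsigned int grant_table_version;   /* 0 until a version is negotiated */
unsigned int grefs_per_grant_frame; /* cached once the version is known */

/* Old behaviour: any version other than 1, including 0, is treated as v2. */
unsigned int grefs_per_frame_uncached(void)
{
    return grant_table_version == 1 ? FRAME_SIZE / V1_ENTRY_SIZE
                                    : FRAME_SIZE / V2_ENTRY_SIZE;
}

/* Fixed behaviour, step 1: settle the version early and cache the result. */
void request_version(unsigned int version)
{
    assert(version == 1 || version == 2);
    grant_table_version = version;
    grefs_per_grant_frame =
        FRAME_SIZE / (version == 1 ? V1_ENTRY_SIZE : V2_ENTRY_SIZE);
}

/* Fixed behaviour, step 2: refuse to answer before the version is set. */
unsigned int grefs_per_frame(void)
{
    assert(grefs_per_grant_frame != 0);
    return grefs_per_grant_frame;
}
```

Before `request_version()` runs, the uncached calculation reports the v2 figure even though the table will end up as v1, which is the halved allocation the commit message describes.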
2012-11-26  xen/events: fix RCU warning, or Call idle notifier after irq_enter()  Mojiong Qiu
commit 772aebcefeff310f80e32b874988af0076cb799d upstream. exit_idle() should be called after irq_enter(), otherwise it throws: [ INFO: suspicious RCU usage. ] 3.6.5 #1 Not tainted ------------------------------- include/linux/rcupdate.h:725 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 1 RCU used illegally from extended quiescent state! 1 lock held by swapper/0/0: #0: (rcu_read_lock){......}, at: [<ffffffff810e9fe0>] __atomic_notifier_call_chain+0x0/0x140 stack backtrace: Pid: 0, comm: swapper/0 Not tainted 3.6.5 #1 Call Trace: <IRQ> [<ffffffff811259a2>] lockdep_rcu_suspicious+0xe2/0x130 [<ffffffff810ea10c>] __atomic_notifier_call_chain+0x12c/0x140 [<ffffffff810e9fe0>] ? atomic_notifier_chain_unregister+0x90/0x90 [<ffffffff811216cd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff810ea136>] atomic_notifier_call_chain+0x16/0x20 [<ffffffff810777c3>] exit_idle+0x43/0x50 [<ffffffff81568865>] xen_evtchn_do_upcall+0x25/0x50 [<ffffffff81aa690e>] xen_do_hypervisor_callback+0x1e/0x30 <EOI> [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [<ffffffff81061540>] ? xen_safe_halt+0x10/0x20 [<ffffffff81075cfa>] ? default_idle+0xba/0x570 [<ffffffff810778af>] ? cpu_idle+0xdf/0x140 [<ffffffff81a4d881>] ? rest_init+0x135/0x144 [<ffffffff81a4d74c>] ? csum_partial_copy_generic+0x16c/0x16c [<ffffffff82520c45>] ? start_kernel+0x3db/0x3e8 [<ffffffff8252066a>] ? repair_env_string+0x5a/0x5a [<ffffffff82520356>] ? x86_64_start_reservations+0x131/0x135 [<ffffffff82524aca>] ? xen_start_kernel+0x465/0x46 Git commit 98ad1cc14a5c4fd658f9d72c6ba5c86dfd3ce0d5 Author: Frederic Weisbecker <fweisbec@gmail.com> Date: Fri Oct 7 18:22:09 2011 +0200 x86: Call idle notifier after irq_enter() did this, but it missed the Xen code. 
Signed-off-by: Mojiong Qiu <mjqiu@tencent.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
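The required ordering can be modelled with a depth counter. Everything here is a hypothetical user-space stand-in: the `model_` functions only mimic the rule that the idle notifier must run after interrupt entry has been recorded.

```c
int model_irq_depth;      /* non-zero while "inside" an interrupt */
int model_rcu_violations; /* counts notifier calls made while still idle */

static void model_irq_enter(void) { model_irq_depth++; }
static void model_irq_exit(void)  { model_irq_depth--; }

/* The notifier chain takes an RCU read lock, which is illegal while idle. */
static void model_exit_idle(void)
{
    if (model_irq_depth == 0)
        model_rcu_violations++;
}

/* Upcall with the corrected order: enter the interrupt, then leave idle. */
void model_upcall(void)
{
    model_irq_enter();
    model_exit_idle();
    /* ... event channel handling would go here ... */
    model_irq_exit();
}
```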
2012-11-17  xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REF  David Vrabel
commit a67baeb77375199bbd842fa308cb565164dd1f19 upstream. map->kmap_ops allocated in gntdev_alloc_map() wasn't freed by gntdev_put_map(). Add a gntdev_free_map() helper function to free everything allocated by gntdev_alloc_map(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
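A single free helper that mirrors the allocator avoids this kind of leak, because every array the allocator creates has exactly one place where it is released. The sketch below uses hypothetical names and plain `int` arrays in place of the real grant structures.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a grant map with several parallel arrays. */
struct grant_map {
    size_t count;
    int *grants;
    int *map_ops;
    int *unmap_ops;
    int *kmap_ops; /* the array that was previously never freed */
};

/* Frees everything the allocator below may have allocated. */
void free_grant_map(struct grant_map *map)
{
    if (!map)
        return;
    free(map->grants);
    free(map->map_ops);
    free(map->unmap_ops);
    free(map->kmap_ops);
    free(map);
}

struct grant_map *alloc_grant_map(size_t count)
{
    struct grant_map *map;

    if (count == 0)
        return NULL;
    map = calloc(1, sizeof(*map));
    if (!map)
        return NULL;

    map->count = count;
    map->grants = calloc(count, sizeof(*map->grants));
    map->map_ops = calloc(count, sizeof(*map->map_ops));
    map->unmap_ops = calloc(count, sizeof(*map->unmap_ops));
    map->kmap_ops = calloc(count, sizeof(*map->kmap_ops));

    /* On partial failure the same helper cleans up; free(NULL) is a no-op. */
    if (!map->grants || !map->map_ops || !map->unmap_ops || !map->kmap_ops) {
        free_grant_map(map);
        return NULL;
    }
    return map;
}
```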
2012-10-02  xen/m2p: do not reuse kmap_op->dev_bus_addr  Stefano Stabellini
commit 2fc136eecd0c647a6b13fcd00d0c41a1a28f35a5 upstream. If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now the responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14  xen/pciback: Fix proper FLR steps.  Konrad Rzeszutek Wilk
commit 80ba77dfbce85f2d1be54847de3c866de1b18a9a upstream. When we do FLR and save PCI config we did it in the wrong order. The end result was that if a PCI device was unbound from its driver, then bound to xen-pciback, and then back to its driver we would get: > lspci -s 04:00.0 04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection 13:42:12 # 4 :~/ > echo "0000:04:00.0" > /sys/bus/pci/drivers/pciback/unbind > modprobe e1000e e1000e: Intel(R) PRO/1000 Network Driver - 2.0.0-k e1000e: Copyright(c) 1999 - 2012 Intel Corporation. e1000e 0000:04:00.0: Disabling ASPM L0s L1 e1000e 0000:04:00.0: enabling device (0000 -> 0002) xen: registering gsi 48 triggering 0 polarity 1 Already setup the GSI :48 e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode e1000e: probe of 0000:04:00.0 failed with error -2 This fixes it by first saving the PCI configuration space, then doing the FLR. Reported-by: Ren, Yongjie <yongjie.ren@intel.com> Reported-and-Tested-by: Tobias Geiger <tobias.geiger@vido.info> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
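Why the order matters can be shown with a toy device whose reset wipes its configuration. The model below is an assumption-laden sketch: a single integer stands in for config space, and the restore step is included only to make the effect visible.

```c
/* Hypothetical device: one word of "config space" plus a saved copy. */
struct model_dev {
    unsigned int config;
    unsigned int saved;
};

static void model_save_state(struct model_dev *dev)    { dev->saved = dev->config; }
static void model_flr(struct model_dev *dev)           { dev->config = 0; }
static void model_restore_state(struct model_dev *dev) { dev->config = dev->saved; }

/* Correct order: save first, then reset, so the saved copy is meaningful. */
void model_reset_preserving_config(struct model_dev *dev)
{
    model_save_state(dev);
    model_flr(dev);
    model_restore_state(dev);
}
```

Saving after the reset would capture the wiped value, and the next driver to bind would see an unconfigured device.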
2012-09-14  xen: Use correct masking in xen_swiotlb_alloc_coherent.  Ronny Hegewald
commit b5031ed1be0aa419250557123633453753181643 upstream. When running 32-bit pvops-dom0 and a driver tries to allocate a coherent DMA-memory the xen swiotlb-implementation returned memory beyond 4GB. The underlying reason is that if the supplied driver passes in a DMA_BIT_MASK(64) ( hwdev->coherent_dma_mask is set to 0xffffffffffffffff) our dma_mask will be u64 set to 0xffffffffffffffff even if we set it to DMA_BIT_MASK(32) previously. Meaning we do not reset the upper bits. By using the dma_alloc_coherent_mask function - it does the proper casting and we get 0xffffffff. This caused sound not to work on a system with 4 GB and a 64-bit compatible sound-card which sets the DMA-mask to 64bit. On bare-metal and the forward-ported xen-dom0 patches from OpenSuse a coherent DMA-memory is always allocated inside the 32-bit address-range by calling dma_alloc_coherent_mask. This patch adds the same functionality to xen swiotlb and is a rebase of the original patch from Ronny Hegewald which never got upstream b/c the underlying reason was not understood until now. The original email with the original patch is in: http://old-list-archives.xen.org/archives/html/xen-devel/2010-02/msg00038.html the original thread from where the discussion started is in: http://old-list-archives.xen.org/archives/html/xen-devel/2010-01/msg00928.html Signed-off-by: Ronny Hegewald <ronny.hegewald@online.de> Signed-off-by: Stefano Panella <stefano.panella@citrix.com> Acked-By: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
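The effect of the cast can be sketched as follows. The model assumes a 32-bit kernel where the mask helper returns a 32-bit `unsigned long`, represented here by `uint32_t`; `dma_bit_mask` and `coherent_mask_32bit_host` are hypothetical names, and the fallback to a 32-bit mask when none is set is likewise an assumption.

```c
#include <stdint.h>

/* Mask with the low n bits set; n of 64 or more yields all ones. */
uint64_t dma_bit_mask(unsigned int n)
{
    return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/*
 * Hypothetical model of taking the coherent mask through a 32-bit wide
 * return type: the upper 32 bits are dropped by the narrowing cast.
 */
uint64_t coherent_mask_32bit_host(uint64_t coherent_dma_mask)
{
    if (coherent_dma_mask == 0)
        coherent_dma_mask = dma_bit_mask(32); /* assumed default */
    return (uint32_t)coherent_dma_mask;
}
```

A device that asks for a 64-bit mask therefore still gets allocations bounded by 0xffffffff in this model, which matches the bare-metal behaviour the commit message refers to.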
2012-06-01  xen: do not map the same GSI twice in PVHVM guests.  Stefano Stabellini
commit 68c2c39a76b094e9b2773e5846424ea674bf2c46 upstream. PV on HVM guests map GSIs into event channels. At restore time the event channels are resumed by restore_pirqs. Device drivers might try to register the same GSI again through ACPI at restore time, but the GSI has already been mapped and bound by restore_pirqs. This patch detects these situations and avoids mapping the same GSI multiple times. Without this patch we get: (XEN) irq.c:2235: dom4: pirq 23 or emuirq 28 already mapped and waste a pirq. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
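Detecting the repeated registration amounts to a lookup before allocation. The table, the counters, and `map_gsi` below are hypothetical; the sketch only shows that a second request for the same GSI returns the existing mapping instead of consuming another pirq.

```c
#define NR_GSI 64

int gsi_to_irq[NR_GSI]; /* 0 means "not mapped yet" */
int next_irq = 1;       /* next free irq number in this model */
int pirqs_allocated;    /* how many pirqs the model has consumed */

/* Returns the irq bound to `gsi`, allocating one only on first use. */
int map_gsi(int gsi)
{
    if (gsi < 0 || gsi >= NR_GSI)
        return -1;

    /* Already mapped, for example by the restore path: reuse it. */
    if (gsi_to_irq[gsi] != 0)
        return gsi_to_irq[gsi];

    pirqs_allocated++;
    gsi_to_irq[gsi] = next_irq++;
    return gsi_to_irq[gsi];
}
```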
2012-05-07  xen/Kconfig: fix Kconfig layout  Andrew Morton
Fit it into 80 columns so that it is readable in menuconfig. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-26  xen/acpi: Workaround broken BIOSes exporting non-existing C-states.  Konrad Rzeszutek Wilk
We did a similar check for the P-states but did not do it for the C-states. What we want to do is ignore cases where the DSDT has definitions for sixteen CPUs, but the machine only has eight CPUs and we get: xen-acpi-processor: (CX): Hypervisor error (-22) for ACPI CPU14 Reported-by: Tobias Geiger <tobias.geiger@vido.info> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-26  xen: use the pirq number to check the pirq_eoi_map  Stefano Stabellini
In pirq_check_eoi_map use the pirq number rather than the Linux irq number to check whether an eoi is needed in the pirq_eoi_map. The reason is that the irq number is not always identical to the pirq number so if we wrongly use the irq number to check the pirq_eoi_map we are going to check for the wrong pirq to EOI. As a consequence some interrupts might not be EOI'ed by the guest correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-by: Tobias Geiger <tobias.geiger@vido.info> [v1: Added some extra wording to git commit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
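The difference between the two indices shows up as soon as irq and pirq numbers diverge. In this sketch the `irq_info` layout, the bitmap size, and the helper names are assumptions; only the idea of indexing the map by pirq comes from the commit message.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_PIRQS      256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-interrupt record holding both numbering schemes. */
struct irq_info {
    unsigned int irq;  /* Linux irq number */
    unsigned int pirq; /* Xen physical irq number */
};

unsigned long eoi_map[NR_PIRQS / BITS_PER_WORD];

void eoi_map_set(unsigned int pirq)
{
    eoi_map[pirq / BITS_PER_WORD] |= 1UL << (pirq % BITS_PER_WORD);
}

static bool eoi_map_test(unsigned int pirq)
{
    return (eoi_map[pirq / BITS_PER_WORD] >> (pirq % BITS_PER_WORD)) & 1UL;
}

/* The map is indexed by pirq, so the irq number must not be used here. */
bool needs_eoi(const struct irq_info *info)
{
    return info->pirq < NR_PIRQS && eoi_map_test(info->pirq);
}
```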
2012-04-19  xen/resume: Fix compile warnings.  Konrad Rzeszutek Wilk
linux/drivers/xen/manage.c: In function 'do_suspend': linux/drivers/xen/manage.c:160:5: warning: 'si.cancelled' may be used uninitialized in this function Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-19  xen/xenbus: Add quirk to deal with misconfigured backends.  Konrad Rzeszutek Wilk
A rather annoying and common case is when booting a PVonHVM guest and exposing the PV KBD and PV VFB - as broken toolstacks don't always initialize the backends correctly. Normally the HVM guest is using the VGA driver and the emulated keyboard for this (though upstream version of QEMU implements PV KBD, but still uses a VGA driver). We provide a very basic two-stage wait mechanism - where we wait for 30 seconds for all devices, and then for 270 more seconds for all of them except the two mentioned. That allows us to wait for the essential devices, like network or disk, for the full 300 seconds (5 minutes). To trigger this, put this in your guest config: vfb = [ 'vnc=1, vnclisten=0.0.0.0 ,vncunused=1'] instead of this: vnc=1 vnclisten="0.0.0.0" CC: stable@kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v3: Split delay in non-essential (30 seconds) and essential devices per Ian and Stefano suggestion] [v4: Added comments per Stefano suggestion] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
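The two-stage wait reduces to a small predicate over elapsed time. The function and parameter names below are hypothetical; the 30 and 270 second figures are the ones quoted in the commit message, giving a 300 second ceiling for essential devices.

```c
#include <stdbool.h>

enum {
    ALL_DEVICES_TIMEOUT = 30,      /* stage 1: wait for every device */
    ESSENTIAL_EXTRA_TIMEOUT = 270  /* stage 2: essential devices only */
};

/*
 * Hypothetical predicate: should boot keep waiting after `waited` seconds,
 * given which classes of device are still not connected?
 */
bool keep_waiting(unsigned int waited, bool essential_pending,
                  bool nonessential_pending)
{
    if (waited < ALL_DEVICES_TIMEOUT)
        return essential_pending || nonessential_pending;
    if (waited < ALL_DEVICES_TIMEOUT + ESSENTIAL_EXTRA_TIMEOUT)
        return essential_pending;
    return false;
}
```

A guest whose only missing devices are the PV keyboard and framebuffer stops waiting after the first stage, while a missing disk or network device keeps the wait going until the second deadline.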
2012-04-18  Merge commit 'c104f1fa1ecf4ee0fc06e31b1f77630b2551be81' into ↵  Konrad Rzeszutek Wilk
stable/for-linus-3.4 * commit 'c104f1fa1ecf4ee0fc06e31b1f77630b2551be81': (14566 commits) cpufreq: OMAP: fix build errors: depends on ARCH_OMAP2PLUS sparc64: Eliminate obsolete __handle_softirq() function sparc64: Fix bootup crash on sun4v. kconfig: delete last traces of __enabled_ from autoconf.h Revert "kconfig: fix __enabled_ macros definition for invisible and un-selected symbols" kconfig: fix IS_ENABLED to not require all options to be defined irq_domain: fix type mismatch in debugfs output format staging: android: fix mem leaks in __persistent_ram_init() staging: vt6656: Don't leak memory in drivers/staging/vt6656/ioctl.c::private_ioctl() staging: iio: hmc5843: Fix crash in probe function. panic: fix stack dump print on direct call to panic() drivers/rtc/rtc-pl031.c: enable clock on all ST variants Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()" hugetlb: fix race condition in hugetlb_fault() drivers/rtc/rtc-twl.c: use static register while reading time drivers/rtc/rtc-s3c.c: add placeholder for driver private data drivers/rtc/rtc-s3c.c: fix compilation error MAINTAINERS: add PCDP console maintainer memcg: do not open code accesses to res_counter members drivers/rtc/rtc-efi.c: fix section mismatch warning ...
2012-04-17  xen/gntdev: do not set VM_PFNMAP  Stefano Stabellini
Since we are using the m2p_override we do have struct pages corresponding to the user vma mmap'ed by gntdev. Removing the VM_PFNMAP flag makes get_user_pages work on that vma. An example test case would be using a Xen userspace block backend (QDISK) on a file on NFS using O_DIRECT. CC: stable@kernel.org Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-17  xen/grant-table: add error-handling code on failure of gnttab_resume  Julia Lawall
Jump to the label ini_nomem as done on the failure of the page allocations above. The code at ini_nomem is modified to accommodate different return values. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06  Merge tag 'stable/for-linus-3.4-rc1-tag' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: "Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code." * tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success xen/pciback: fix XEN_PCI_OP_enable_msix result xen/smp: Remove unnecessary call to smp_processor_id() xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' xen: only check xen_platform_pci_unplug if hvm
2012-04-06  xen/pciback: fix XEN_PCI_OP_enable_msix result  Jan Beulich
Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a positive value to indicate the number of vectors (less than the amount requested) that can be set up for a given device. Returning this as an operation value (secondary result) is fine, but (primary) operation results are expected to be negative (error) or zero (success) according to the protocol. With the frontend fixed to match the XenoLinux behavior, the backend can now validly return zero (success) here, passing the upper limit on the number of vectors in op->value. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
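The split between the primary result and the secondary value can be sketched like this. `model_pci_op` and `report_enable_msix` are hypothetical stand-ins; the only behaviour taken from the commit message is that a positive vector count is reported as success with the count carried in the value field.

```c
/* Hypothetical reply carried back to the frontend. */
struct model_pci_op {
    int err;   /* primary result: negative on error, zero on success */
    int value; /* secondary result: here, the usable vector count */
};

/*
 * `result` models the outcome of enabling MSI-X: negative for an error,
 * zero for success, positive for "only this many vectors are available".
 */
void report_enable_msix(struct model_pci_op *op, int result)
{
    op->value = result;
    /* A positive count is not an error under the protocol. */
    op->err = result > 0 ? 0 : result;
}
```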
2012-04-04  Merge branch 'for-linus' of ↵  Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA mapping branch from Marek Szyprowski: "Short summary for the whole series: A few limitations have been identified in the current dma-mapping design and its implementations for various architectures. There exist more than one function for allocating and freeing the buffers: currently these 3 are used dma_{alloc, free}_coherent, dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent. For most of the systems these calls are almost equivalent and can be interchanged. For others, especially the truly non-coherent ones (like ARM), the difference can be easily noticed in overall driver performance. Sadly not all architectures provide implementations for all of them, so the drivers might need to be adapted and cannot be easily shared between different architectures. The provided patches unify all these functions and hide the differences under the already existing dma attributes concept. The thread with more references is available here: http://www.spinics.net/lists/linux-sh/msg09777.html These patches are also a prerequisite for unifying DMA-mapping implementation on ARM architecture with the common one provided by dma_map_ops structure and extending it with IOMMU support. More information is available in the following thread: http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819 More works on dma-mapping framework are planned, especially in the area of buffer sharing and managing the shared mappings (together with the recently introduced dma_buf interface: commit d15bd7ee445d "dma-buf: Introduce dma buffer sharing mechanism"). The patches in the current set introduce a new alloc/free methods (with support for memory attributes) in dma_map_ops structure, which will later replace dma_alloc_coherent and dma_alloc_writecombine functions." People finally started piping up with support for merging this, so I'm merging it as the last of the pending stuff from the merge window. 
Looks like pohmelfs is going to wait for 3.5 and more external support for merging. * 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: common: DMA-mapping: add NON-CONSISTENT attribute common: DMA-mapping: add WRITE_COMBINE attribute common: dma-mapping: introduce mmap method common: dma-mapping: remove old alloc_coherent and free_coherent methods Hexagon: adapt for dma_map_ops changes Unicore32: adapt for dma_map_ops changes Microblaze: adapt for dma_map_ops changes SH: adapt for dma_map_ops changes Alpha: adapt for dma_map_ops changes SPARC: adapt for dma_map_ops changes PowerPC: adapt for dma_map_ops changes MIPS: adapt for dma_map_ops changes X86 & IA64: adapt for dma_map_ops changes common: dma-mapping: introduce generic alloc() and free() methods
2012-03-28  X86 & IA64: adapt for dma_map_ops changes  Andrzej Pietrasiewicz
Adapt core x86 and IA64 architecture code for dma_map_ops changes: replace alloc/free_coherent with generic alloc/free methods. Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> [removed swiotlb related changes and replaced it with wrappers, merged with IA64 patch to avoid inter-patch dependences in intel-iommu code] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tony Luck <tony.luck@intel.com>
2012-03-24  Merge tag 'stable/for-linus-3.4-tag-two' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull more xen updates from Konrad Rzeszutek Wilk: "One tiny feature that accidentally got lost in the initial git pull: * Add fast-EOI acking of interrupts (clear a bit instead of hypercall) And bug-fixes: * Fix CPU bring-up code missing a call to notify other subsystems. * Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded. * In Xen ACPI processor driver: remove too verbose WARN messages, fix up the Kconfig dependency to be a module by default, and add dependency on CPU_FREQ. * Disable CPU frequency drivers from loading when booting under Xen (as we want the Xen ACPI processor to be used instead). * Cleanups in tmem code." * tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/acpi: Fix Kconfig dependency on CPU_FREQ xen: initialize platform-pci even if xen_emul_unplug=never xen/smp: Fix bringup bug in AP code. xen/acpi: Remove the WARN's as they just create noise. xen/tmem: cleanup xen: support pirq_eoi_map xen/acpi-processor: Do not depend on CPU frequency scaling drivers. xen/cpufreq: Disable the cpu frequency scaling drivers from loading. provide disable_cpufreq() function to disable the API.
2012-03-24  xen/acpi: Fix Kconfig dependency on CPU_FREQ  Konrad Rzeszutek Wilk
The functions: "acpi_processor_*" sound like they depend on CONFIG_ACPI_PROCESSOR but in reality they are exposed when CONFIG_CPU_FREQ=[y|m]. As such update the Kconfig to have this dependency and fix compile issues: ERROR: "acpi_processor_unregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined! ERROR: "acpi_processor_notify_smm" [drivers/xen/xen-acpi-processor.ko] undefined! ERROR: "acpi_processor_register_performance" [drivers/xen/xen-acpi-processor.ko] undefined! ERROR: "acpi_processor_preregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined! Note: We still need the CONFIG_ACPI Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-22  Merge tag 'stable/for-linus-3.4-tag' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen updates from Konrad Rzeszutek Wilk: "which has three neat features: - PV multiconsole support, so that there can be hvc1, hvc2, etc; This can be used in HVM and in PV mode. - P-state and C-state power management driver that uploads said power management data to the hypervisor. It also inhibits cpufreq scaling drivers to load so that only the hypervisor can make power management decisions - fixing a weird perf bug. There is one thing in the Kconfig that you won't like: "default y if (X86_ACPI_CPUFREQ = y || X86_POWERNOW_K8 = y)" (note, that it all depends on CONFIG_XEN which depends on CONFIG_PARAVIRT which by default is off). I've a fix to convert that boolean expression into "default m" which I am going to post after the cpufreq git pull - as the two patches to make this work depend on a fix in Dave Jones's tree. - Function Level Reset (FLR) support in the Xen PCI backend. Fixes: - Kconfig dependencies for Xen PV keyboard and video - Compile warnings and constify fixes - Change over to use percpu_xxx instead of this_cpu_xxx" Fix up trivial conflicts in drivers/tty/hvc/hvc_xen.c due to changes to a removed commit. * tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps xen/acpi-processor: C and P-state driver that uploads said data to hypervisor. xen: constify all instances of "struct attribute_group" xen/xenbus: ignore console/0 hvc_xen: introduce HVC_XEN_FRONTEND hvc_xen: implement multiconsole support hvc_xen: support PV on HVM consoles xenbus: don't free other end details too early xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it. xen/setup/pm/acpi: Remove the call to boot_option_idle_override. xenbus: address compiler warnings xen: use this_cpu_xxx replace percpu_xxx funcs xen/pciback: Support pci_reset_function, aka FLR or D3 support. 
pci: Introduce __pci_reset_function_locked to be used when holding device_lock. xen: Utilize the restore_msi_irqs hook.
2012-03-22  Merge tag 'stable/for-linus-3.4' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm Pull cleancache changes from Konrad Rzeszutek Wilk: "This has some patches for the cleancache API that should have been submitted a _long_ time ago. They are basically cleanups: - rename of flush to invalidate - moving reporting of statistics into debugfs - use __read_mostly as necessary. Oh, and also the MAINTAINERS file change. The files (except the MAINTAINERS file) have been in #linux-next for months now. The late addition of MAINTAINERS file is a brain-fart on my side - didn't realize I needed that just until I was typing this up - and I based that patch on v3.3 - so the tree is on top of v3.3." * tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm: MAINTAINERS: Adding cleancache API to the list. mm: cleancache: Use __read_mostly as appropiate. mm: cleancache: report statistics via debugfs instead of sysfs. mm: zcache/tmem/cleancache: s/flush/invalidate/ mm: cleancache: s/flush/invalidate/
2012-03-22  xen: initialize platform-pci even if xen_emul_unplug=never  Igor Mammedov
When xen_emul_unplug=never is specified on kernel command line reading files from /sys/hypervisor is broken (returns -EBUSY). It is caused by xen_bus dependency on platform-pci and platform-pci isn't initialized when xen_emul_unplug=never is specified. Fix it by allowing platform-pci to ignore xen_emul_unplug=never, and do not initialize xen_[blk|net]front instead. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-21  xen/acpi: Remove the WARN's as they just create noise.  Konrad Rzeszutek Wilk
When booting the kernel on machines that do not have P-states we would end up with: ------------[ cut here ]------------ WARNING: at drivers/xen/xen-acpi-processor.c:504 xen_acpi_processor_init+0x286/0x2e0() Hardware name: ProLiant BL460c G6 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.39-200.0.3.el5uek #1 Call Trace: [<ffffffff8191d056>] ? xen_acpi_processor_init+0x286/0x2e0 [<ffffffff81068300>] warn_slowpath_common+0x90/0xc0 [<ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0 [<ffffffff8106834a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8191d056>] xen_acpi_processor_init+0x286/0x2e0 [<ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0 [<ffffffff81002168>] do_one_initcall+0xe8/0x130 .. snip.. Which is OK - the machines do not have P-states, so we fail to register to process the _PXX states. But there is no need to WARN the user of it. Oracle BZ# 13871288 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-20  xen/tmem: cleanup  Jan Beulich
Use 'bool' for boolean variables. Do proper section placement. Eliminate an unnecessary export. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-20  xen: support pirq_eoi_map  Stefano Stabellini
The pirq_eoi_map is a bitmap offered by Xen to check which pirqs need to be EOI'd without having to issue a hypercall every time. We use PHYSDEVOP_pirq_eoi_gmfn_v2 to map the bitmap, then if we succeed we use pirq_eoi_map to check whether pirqs need eoi. Changes in v3: - explicitly use PHYSDEVOP_pirq_eoi_gmfn_v2 rather than PHYSDEVOP_pirq_eoi_gmfn; - introduce pirq_check_eoi_map, a function to check if a pirq needs an eoi using the map; - rename pirq_needs_eoi into pirq_needs_eoi_flag; - introduce a function pointer called pirq_needs_eoi that is going to be set to the right implementation depending on the availability of PHYSDEVOP_pirq_eoi_gmfn_v2. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
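Selecting the implementation once through a function pointer keeps the hot path free of capability checks. This is a sketch with hypothetical names and a byte-array bitmap; the `map_registered` flag stands in for whether registering the shared bitmap with the hypervisor succeeded.

```c
#include <stdbool.h>

#define MODEL_NR_PIRQS 256

/* Hypothetical per-pirq record. */
struct pirq_info {
    unsigned int pirq;
    bool needs_eoi_flag; /* fallback answer when no shared map exists */
};

unsigned char model_eoi_bitmap[MODEL_NR_PIRQS / 8];

/* Fallback implementation: use the cached per-pirq flag. */
static bool check_eoi_flag(const struct pirq_info *info)
{
    return info->needs_eoi_flag;
}

/* Preferred implementation: read the bit for this pirq from the shared map. */
static bool check_eoi_map(const struct pirq_info *info)
{
    if (info->pirq >= MODEL_NR_PIRQS)
        return false;
    return (model_eoi_bitmap[info->pirq / 8] >> (info->pirq % 8)) & 1u;
}

/* Chosen once at init time, then called on every interrupt. */
bool (*model_pirq_needs_eoi)(const struct pirq_info *) = check_eoi_flag;

void model_init_eoi(bool map_registered)
{
    model_pirq_needs_eoi = map_registered ? check_eoi_map : check_eoi_flag;
}
```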
2012-03-20  xen/acpi-processor: Do not depend on CPU frequency scaling drivers.  Konrad Rzeszutek Wilk
With patch "xen/cpufreq: Disable the cpu frequency scaling drivers from loading." we do not have to worry about said drivers loading themselves before the xen-acpi-processor driver. Hence we can remove the default selection (=y if CPU frequency drivers were built-in, or =m if CPU frequency drivers were built as modules), and just select =m for the default case. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-19  Merge branch 'stable/cleancache.v13' into linux-next  Konrad Rzeszutek Wilk
* stable/cleancache.v13: mm: cleancache: Use __read_mostly as appropiate. mm: cleancache: report statistics via debugfs instead of sysfs. mm: zcache/tmem/cleancache: s/flush/invalidate/ mm: cleancache: s/flush/invalidate/
2012-03-14  xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.  Konrad Rzeszutek Wilk
This driver solves three problems: 1). Parse and upload ACPI0007 (or PROCESSOR_TYPE) information to the hypervisor - aka P-states (cpufreq data). 2). Upload the Cx state information (cpuidle data). 3). Inhibit CPU frequency scaling drivers from loading. The reason for wanting to solve 1) and 2) is such that the Xen hypervisor is the only one that knows the CPU usage of different guests and can make the proper decision of when to put CPUs and packages in proper states. Unfortunately the hypervisor has no support to parse ACPI DSDT tables, hence it needs help from the initial domain to provide this information. The reason for 3) is that we do not want the initial domain to change P-states while the hypervisor is doing it as well - it causes rather some funny cases of P-states transitions. For this to work, the driver parses the Power Management data and uploads said information to the Xen hypervisor. It also calls acpi_processor_notify_smm() to inhibit the other CPU frequency scaling drivers from being loaded. Everything revolves around the 'struct acpi_processor' structure which gets updated during the bootup cycle in different stages. At the startup, when the ACPI parser starts, the C-state information is processed (processor_idle) and saved in said structure as 'power' element. Later on, the CPU frequency scaling driver (powernow-k8 or acpi_cpufreq), would call the acpi_processor_* (processor_perflib functions) to parse P-states information and populate in the said structure the 'performance' element. Since we do not want the CPU frequency scaling drivers to load we have to call the acpi_processor_* functions to parse the P-states and call "acpi_processor_notify_smm" to stop them from loading. There is also one oddity in this driver which is that under Xen, the physical online CPU count can be different from the virtual online CPU count. Meaning that the macros 'for_[online|possible]_cpu' would process only up to virtual online CPU count.
We on the other hand want to process the full number of physical CPUs. For that, the driver checks if the ACPI ID count is different from the APIC ID count - which can happen if the user chooses to use the dom0_max_vcpus argument. In such a case a backup of the PM structure is used and uploaded to the hypervisor. [v1-v2: Initial RFC implementations that were posted] [v3: Changed the name to passthru suggested by Pasi Kärkkäinen <pasik@iki.fi>] [v4: Added vCPU != pCPU support - aka dom0_max_vcpus support] [v5: Cleaned up the driver, fix bug under Athlon XP] [v6: Changed the driver to a CPU frequency governor] [v7: Jan Beulich <jbeulich@suse.com> suggestion to make it a cpufreq scaling driver made me rework it as driver that inhibits cpufreq scaling driver] [v8: Per Jan's review comments, fixed up the driver] [v9: Allow to continue even if acpi_processor_preregister_perf.. fails] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-14xen: constify all instances of "struct attribute_group"Jan Beulich
The functions these get passed to have been taking pointers to const since at least 2.6.16. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-13xen/xenbus: ignore console/0Stefano Stabellini
Unfortunately xend creates a bogus console/0 frontend/backend entry pair on xenstore that console backends cannot properly cope with. Any guest behavior that is not completely ignoring console/0 is going to either cause problems with xenconsoled or qemu. Returning 0 or -ENODEV from xencons_probe is not enough because it is going to cause the frontend state to become 4 or 6 respectively. The best possible thing we can do here is just ignore the entry from xenbus_probe_frontend. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
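The filtering described in this commit can be pictured with a small userspace sketch. It is not the kernel code: the helper name should_ignore_entry is hypothetical, and it only models the decision to skip the console/0 entry before any probe would run, so that the frontend state is never touched.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel function): returns true when a
 * xenstore device entry with the given type and name should be skipped
 * before probing. Only the bogus console/0 pair is filtered out.
 */
bool should_ignore_entry(const char *type, const char *name)
{
    return strcmp(type, "console") == 0 && strcmp(name, "0") == 0;
}
```

A caller would check should_ignore_entry(type, name) first and return early without registering the device, leaving every other entry (for example console/1) to be probed as usual.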
2012-03-13xenbus: don't free other end details too earlyJan Beulich
The individual drivers' remove functions could legitimately attempt to access this information (for logging messages if nothing else). Note that I did not in fact observe a problem anywhere, but I came across this while looking into the reasons for what turned out to need the fix at https://lkml.org/lkml/2012/3/5/336 to vsprintf(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-04Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: PM / Freezer: Remove references to TIF_FREEZE in comments PM / Sleep: Add more wakeup source initialization routines PM / Hibernate: Enable usermodehelpers in hibernate() error path PM / Sleep: Make __pm_stay_awake() delete wakeup source timers PM / Sleep: Fix race conditions related to wakeup source timer function PM / Sleep: Fix possible infinite loop during wakeup source destruction PM / Hibernate: print physical addresses consistently with other parts of kernel PM: Add comment describing relationships between PM callbacks to pm.h PM / Sleep: Drop suspend_stats_update() PM / Sleep: Make enter_state() in kernel/power/suspend.c static PM / Sleep: Unify kerneldoc comments in kernel/power/suspend.c PM / Sleep: Remove unnecessary label from suspend_freeze_processes() PM / Sleep: Do not check wakeup too often in try_to_freeze_tasks() PM / Sleep: Initialize wakeup source locks in wakeup_source_add() PM / Hibernate: Refactor and simplify freezer_test_done PM / Hibernate: Thaw kernel threads in hibernation_snapshot() in error/test path PM / Freezer / Docs: Document the beauty of freeze/thaw semantics PM / Suspend: Avoid code duplication in suspend statistics update PM / Sleep: Introduce generic callbacks for new device PM phases PM / Sleep: Introduce "late suspend" and "early resume" of devices
2012-02-26xenbus: address compiler warningsJan Beulich
- casting pointers to integer types of different size is being warned on - an uninitialized variable warning occurred on certain gcc versions Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-02-03xenbus_dev: add missing error check to watch handlingJan Beulich
So far only the watch path was checked to be zero terminated, while the watch token was merely assumed to be. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
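The check described in this commit amounts to verifying that a buffer holds two consecutive NUL-terminated strings, the path and then the token. The following is a minimal userspace sketch of that idea, assuming a flat byte buffer of known length; the function name split_watch_payload and its return convention are hypothetical and do not come from the driver.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: checks that buf[0..len) holds two consecutive
 * NUL-terminated strings (path, then token). On success it stores a
 * pointer to the token in *token_out and returns 0; otherwise it
 * returns -1 and leaves *token_out untouched.
 */
int split_watch_payload(const char *buf, size_t len, const char **token_out)
{
    /* The path must be terminated inside the buffer. */
    const char *end_of_path = memchr(buf, '\0', len);
    if (end_of_path == NULL)
        return -1;

    /* The token starts right after the path's terminator. */
    const char *token = end_of_path + 1;
    size_t remaining = len - (size_t)(token - buf);

    /* The token must also be terminated within what is left. */
    if (memchr(token, '\0', remaining) == NULL)
        return -1;

    *token_out = token;
    return 0;
}
```

Checking only the first terminator, as the older code did for the path, would let a token without a terminator be read past the end of the message.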
2012-02-03xen/pci[front|back]: Use %d instead of %1x for displaying PCI devfn.Konrad Rzeszutek Wilk
.. as the rest of the kernel is using that format. Suggested-by: Марк Коренберг <socketpair@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
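For illustration, the conventional PCI address layout (as in the "0000:04:00.0" seen in pciback log output) prints the function number in decimal. The sketch below is a userspace example, not the pcifront/pciback code: format_pci_name is a hypothetical helper, and the two macros are redefined locally with the same arithmetic as the kernel's PCI_SLOT and PCI_FUNC.

```c
#include <stdio.h>

/* Local copies using the same arithmetic as the kernel's macros. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/*
 * Hypothetical helper: writes "domain:bus:slot.func" into out and
 * returns what snprintf returns. The function number uses %d.
 */
int format_pci_name(char *out, size_t size,
                    unsigned int domain, unsigned int bus, unsigned int devfn)
{
    return snprintf(out, size, "%04x:%02x:%02x.%d",
                    domain, bus, PCI_SLOT(devfn), (int)PCI_FUNC(devfn));
}
```

Since a PCI function number is at most 7, %d and %1x produce the same digit; the change is about matching the format used elsewhere in the kernel.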
2012-02-03xen/bootup: During bootup suppress XENBUS: Unable to read cpu stateKonrad Rzeszutek Wilk
When the initial domain starts, it prints (depending on the number of CPUs) a slew of XENBUS: Unable to read cpu state XENBUS: Unable to read cpu state XENBUS: Unable to read cpu state XENBUS: Unable to read cpu state which provide no useful information: the error is a valid issue, but not on the initial domain. The reason is that the XenStore is not accessible at that time (it is after all the first guest) so the CPU hotplug watch cannot parse the "availability/cpu" attribute. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-01-29PM / Sleep: Introduce "late suspend" and "early resume" of devicesRafael J. Wysocki
The current device suspend/resume phases during system-wide power transitions appear to be insufficient for some platforms that want to use the same callback routines for saving device states and related operations during runtime suspend/resume as well as during system suspend/resume. In principle, they could point their .suspend_noirq() and .resume_noirq() to the same callback routines as their .runtime_suspend() and .runtime_resume(), respectively, but at least some of them require device interrupts to be enabled while the code in those routines is running. It also makes sense to have device suspend-resume callbacks that will be executed with runtime PM disabled and with device interrupts enabled in case someone needs to run some special code in that context during system-wide power transitions. Apart from this, .suspend_noirq() and .resume_noirq() were introduced as a workaround for drivers using shared interrupts and failing to prevent their interrupt handlers from accessing suspended hardware. It appears to be better not to use them for other purposes, or we may have to deal with some serious confusion (which seems to be happening already). For the above reasons, introduce new device suspend/resume phases, "late suspend" and "early resume" (and analogously for hibernation) whose callback will be executed with runtime PM disabled and with device interrupts enabled and whose callback pointers generally may point to runtime suspend/resume routines. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Kevin Hilman <khilman@ti.com>
2012-01-27xen/granttable: Disable grant v2 for HVM domains.Konrad Rzeszutek Wilk
As proper scaffolding for supporting error status is not yet implemented. BUG: unable to handle kernel NULL pointer dereference at 0000000000000400 IP: [<ffffffff81375ae9>] gnttab_end_foreign_access_ref_v2+0x29/0x40 PGD 32aa3067 PUD 32a87067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU 0 Modules linked in: sg sr_mod cdrom ata_generic ata_piix libata scsi_mod xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront cmd Pid: 2307, comm: ip Not tainted 3.3.0-rc1 #1 Xen HVM domU RIP: 0010:[<ffffffff81375ae9>] [<ffffffff81375ae9>] gnttab_end_foreign_access_ref_v2+0x29/0x40 RSP: 0018:ffff88003be03d38 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff880033210640 RCX: 0000000000000040 RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000000000200 RBP: ffff88003be03d38 R08: 0000000000000101 R09: 0000000000000000 R10: dead000000100100 R11: 0000000000000000 R12: ffff88003be03e48 R13: 0000000000000001 R14: ffff880039461c00 R15: 0000000000000200 FS: 00007fb1f84ec700(0000) GS:ffff88003be00000(0000) knlGS:0000000000000000 ... Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-01-23mm: zcache/tmem/cleancache: s/flush/invalidate/Dan Magenheimer
Complete the renaming from "flush" to "invalidate" across both tmem frontends (cleancache and frontswap) and both tmem backends (Xen and zcache), as required by akpm. This change is completely cosmetic. [v10: no change] [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@novell.com> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Rik Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> [v11: Remove the frontswap part] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-01-18xen: using EXPORT_SYMBOL requires including export.hStephen Rothwell
Fix these warnings: drivers/xen/biomerge.c:14:1: warning: data definition has no type or storage class [enabled by default] drivers/xen/biomerge.c:14:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' [-Wimplicit-int] drivers/xen/biomerge.c:14:1: warning: parameter names (without types) in function declaration [enabled by default] And this build error: ERROR: "xen_biovec_phys_mergeable" [drivers/block/nvme.ko] undefined! Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-18Merge git://git.infradead.org/users/willy/linux-nvmeLinus Torvalds
* git://git.infradead.org/users/willy/linux-nvme: (105 commits) NVMe: Set number of queues correctly NVMe: Version 0.8 NVMe: Set queue flags correctly NVMe: Simplify nvme_unmap_user_pages NVMe: Mark the end of the sg list NVMe: Fix DMA mapping for admin commands NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT NVMe: Merge the nvme_bio and nvme_prp data structures NVMe: Change nvme_completion_fn to take a dev NVMe: Change get_nvmeq to take a dev instead of a namespace NVMe: Simplify completion handling NVMe: Update Identify Controller data structure NVMe: Implement doorbell stride capability NVMe: Version 0.7 NVMe: Don't probe namespace 0 Fix calculation of number of pages in a PRP List NVMe: Create nvme_identify and nvme_get_features functions NVMe: Fix memory leak in nvme_dev_add() NVMe: Fix calls to dma_unmap_sg NVMe: Correct sg list setup in nvme_map_user_pages ...
2012-01-17Merge branch 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xenLinus Torvalds
* 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/balloon: Move the registration from device to subsystem.
2012-01-13module_param: make bool parameters really bool (drivers & misc)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12xen/pciback: Support pci_reset_function, aka FLR or D3 support.Konrad Rzeszutek Wilk
We use the __pci_reset_function_locked to perform the action. Also on attaching ("bind") and detaching ("unbind") we save and restore the configuration states. When the device is disconnected from a guest we use the "pci_reset_function" to also reset the device before being passed to another guest. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-01-12xen/balloon: Move the registration from device to subsystem.Konrad Rzeszutek Wilk
With git commit 070680218379e15c1901f4bf21b98e3cbf12b527 "xen-balloon: convert sysdev_class to a regular subsystem" we would end up with the attributes being put in: /sys/devices/xen_memory0/target_kb instead of /sys/devices/system/xen_memory/xen_memory0/target_kb This makes the tools unable to deflate the kernel to make more space for launching another guest, and they print: Error: Failed to query current memory allocation of dom0 Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Suggested-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>