summaryrefslogtreecommitdiff
path: root/drivers/hv
AgeCommit message (Collapse)Author
2018-04-20Drivers: hv: vmbus: do not mark HV_PCIE as perf_deviceDexuan Cui
commit 238064f13d057390a8c5e1a6a80f4f0a0ec46499 upstream. The pci-hyperv driver's channel callback hv_pci_onchannelcallback() is not really a hot path, so we don't need to mark it as a perf_device, meaning with this patch all HV_PCIE channels' target_cpu will be CPU0. Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: stable@vger.kernel.org Cc: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17x86/retpoline/hyperv: Convert assembler indirect jumpsDavid Woodhouse
commit e70e5892b28c18f517f29ab6e83bd57705104b31 upstream. Convert all indirect jumps in hyperv inline asm code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-9-git-send-email-dwmw@amazon.co.uk [ backport to 4.9, hopefully correct, not tested... - gregkh ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20Drivers: hv: util: move waiting for release to hv_utils_transport itselfVitaly Kuznetsov
[ Upstream commit e9c18ae6eb2b312f16c63e34b43ea23926daa398 ] Waiting for release_event in all three drivers introduced issues on release as on_reset() hook is not always called. E.g. if the device was never opened we will never get the completion. Move the waiting code to hvutil_transport_destroy() and make sure it is only called when the device is open. hvt->lock serialization should guarantee the absence of races. Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue") Fixes: 20951c7535b5 ("Drivers: hv: util: Fcopy: Fix a rescind processing issue") Fixes: d77044d142e9 ("Drivers: hv: util: Backup: Fix a rescind processing issue") Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12Drivers: hv: fcopy: restore correct transfer lengthOlaf Hering
commit 549e658a0919e355a2b2144dc380b3729bef7f3e upstream. Till recently the expected length of bytes read by the daemon did depend on the context. It was either hv_start_fcopy or hv_do_fcopy. The daemon had a buffer size of two pages, which was much larger than needed. Now the expected length of bytes read by the daemon changed slightly. For START_FILE_COPY it is still the size of hv_start_fcopy. But for WRITE_TO_FILE and the other operations it is as large as the buffer that arrived via vmbus. In case of WRITE_TO_FILE that is slightly larger than a struct hv_do_fcopy. Since the buffer in the daemon was still larger everything was fine. Currently, the daemon reads only what is actually needed. The new buffer layout is as large as a struct hv_do_fcopy, for the WRITE_TO_FILE operation. Since the kernel expects a slightly larger size, hvt_op_read will return -EINVAL because the daemon will read slightly less than expected. Address this by restoring the expected buffer size in case of WRITE_TO_FILE. Fixes: 'c7e490fc23eb ("Drivers: hv: fcopy: convert to hv_utils_transport")' Fixes: '3f2baa8a7d2e ("Tools: hv: update buffer handling in hv_fcopy_daemon")' Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30Drivers: hv: vmbus: Don't leak memory when a channel is rescindedK. Y. Srinivasan
commit 5e030d5ce9d99a899b648413139ff65bab12b038 upstream. When we close a channel that has been rescinded, we will leak memory since vmbus_teardown_gpadl() returns an error. Fix this so that we can properly cleanup the memory allocated to the ring buffers. Fixes: ccb61f8a99e6 ("Drivers: hv: vmbus: Fix a rescind handling bug") Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Dexuan Cui <decui@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30Drivers: hv: vmbus: Don't leak channel idsK. Y. Srinivasan
commit 9a5476020a5f06a0fc6f17097efc80275d2f03cd upstream. If we cannot allocate memory for the channel, free the relid associated with the channel. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15drivers: hv: Turn off write permission on the hypercall pageK. Y. Srinivasan
commit 372b1e91343e657a7cc5e2e2bcecd5140ac28119 upstream. The hypercall page only needs to be executable but currently it is setup to be writable as well. Fix the issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Tested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12Drivers: hv: util: Backup: Fix a rescind processing issueK. Y. Srinivasan
commit d77044d142e960f7b5f814a91ecb8bcf86aa552c upstream. VSS may use a char device to support the communication between the user level daemon and the driver. When the VSS channel is rescinded we need to make sure that the char device is fully cleaned up before we can process a new VSS offer from the host. Implement this logic. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12Drivers: hv: util: Fcopy: Fix a rescind processing issueK. Y. Srinivasan
commit 20951c7535b5e6af46bc37b7142105f716df739c upstream. Fcopy may use a char device to support the communication between the user level daemon and the driver. When the Fcopy channel is rescinded we need to make sure that the char device is fully cleaned up before we can process a new Fcopy offer from the host. Implement this logic. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12Drivers: hv: util: kvp: Fix a rescind processing issueK. Y. Srinivasan
commit 5a66fecbf6aa528e375cbebccb1061cc58d80c84 upstream. KVP may use a char device to support the communication between the user level daemon and the driver. When the KVP channel is rescinded we need to make sure that the char device is fully cleaned up before we can process a new KVP offer from the host. Implement this logic. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12Drivers: hv: vmbus: Fix a rescind handling bugK. Y. Srinivasan
commit ccb61f8a99e6c29df4fb96a65dad4fad740d5be9 upstream. The host can rescind a channel that has been offered to the guest and once the channel is rescinded, the host does not respond to any requests on that channel. Deal with the case where the guest may be blocked waiting for a response from the host. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12Drivers: hv: vmbus: Prevent sending data on a rescinded channelK. Y. Srinivasan
commit e7e97dd8b77ee7366f2f8c70a033bf5fa05ec2e0 upstream. After the channel is rescinded, the host does not read from the rescinded channel. Fail writes to a channel that has already been rescinded. If we permit writes on a rescinded channel, since the host will not respond we will have situations where we will be unable to unload vmbus drivers that cannot have any outstanding requests to the host at the point they are unoaded. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12hv: don't reset hv_context.tsc_page on crashVitaly Kuznetsov
commit 56ef6718a1d8d77745033c5291e025ce18504159 upstream. It may happen that secondary CPUs are still alive and resetting hv_context.tsc_page will cause a consequent crash in read_hv_clock_tsc() as we don't check for it being not NULL there. It is safe as we're not freeing this page anyways. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12hv: init percpu_list in hv_synic_alloc()Vitaly Kuznetsov
commit 3c7630d35009e6635e5b58d62de554fd5b6db5df upstream. Initializing hv_context.percpu_list in hv_synic_alloc() helps to prevent a crash in percpu_channel_enq() when not all CPUs were online during initialization and it naturally belongs there. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12hv: allocate synic pages for all present CPUsVitaly Kuznetsov
commit 421b8f20d3c381b215f988b42428f56fc3b82405 upstream. It may happen that not all CPUs are online when we do hv_synic_alloc() and in case more CPUs come online later we may try accessing these allocated structures. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()Vitaly Kuznetsov
commit c0bb03924f1a80e7f65900e36c8e6b3dc167c5f8 upstream. DoS protection conditions were altered in WS2016 and now it's easy to get -EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a netvsc device in a loop). All vmbus_post_msg() callers don't retry the operation and we usually end up with a non-functional device or crash. While host's DoS protection conditions are unknown to me my tests show that it can take up to 10 seconds before the message is sent so doing udelay() is not an option, we really need to sleep. Almost all vmbus_post_msg() callers are ready to sleep but there is one special case: vmbus_initiate_unload() which can be called from interrupt/NMI context and we can't sleep there. I'm also not sure about the lonely vmbus_send_tl_connect_request() which has no in-tree users but its external users are most likely waiting for the host to reply so sleeping there is also appropriate. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()Dexuan Cui
commit 433e19cf33d34bb6751c874a9c00980552fe508c upstream. Commit a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read()") added the proper mb(), but removed the test "prev_write_sz < pending_sz" when making the signal decision. As a result, the guest can signal the host unnecessarily, and then the host can throttle the guest because the host thinks the guest is buggy or malicious; finally the user running stress test can perceive intermittent freeze of the guest. This patch brings back the test, and properly handles the in-place consumption APIs used by NetVSC (see get_next_pkt_raw(), put_pkt_raw() and commit_rd_index()). Fixes: a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read()") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reported-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Tested-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14Drivers: hv: vmbus: On the read path cleanup the logic to interrupt the hostK. Y. Srinivasan
commit 3372592a140db69fd63837e81f048ab4abf8111e upstream. Signal the host when we determine the host is to be signaled - on th read path. The currrent code determines the need to signal in the ringbuffer code and actually issues the signal elsewhere. This can result in the host viewing this interrupt as spurious since the host may also poll the channel. Make the necessary adjustments. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14Drivers: hv: vmbus: On write cleanup the logic to interrupt the hostK. Y. Srinivasan
commit 1f6ee4e7d83586c8b10bd4f2f4346353d04ce884 upstream. Signal the host when we determine the host is to be signaled. The currrent code determines the need to signal in the ringbuffer code and actually issues the signal elsewhere. This can result in the host viewing this interrupt as spurious since the host may also poll the channel. Make the necessary adjustments. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14Drivers: hv: vmbus: Base host signaling strictly on the ring stateK. Y. Srinivasan
commit 74198eb4a42c4a3c4fbef08fa01a291a282f7c2e upstream. One of the factors that can result in the host concluding that a given guest in mounting a DOS attack is if the guest generates interrupts to the host when the host is not expecting it. If these "spurious" interrupts reach a certain rate, the host can throttle the guest to minimize the impact. The host computation of the "expected number of interrupts" is strictly based on the ring transitions. Until the host logic is fixed, base the guest logic to interrupt solely on the ring state. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09hv: acquire vmbus_connection.channel_mutex in vmbus_free_channels()Vitaly Kuznetsov
commit abd1026da4a7700a8db370947f75cd17b6ae6f76 upstream. "kernel BUG at drivers/hv/channel_mgmt.c:350!" is observed when hv_vmbus module is unloaded. BUG_ON() was introduced in commit 85d9aa705184 ("Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()") as vmbus_free_channels() codepath was apparently forgotten. Fixes: 85d9aa705184 ("Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-01vmbus: make sysfs names consistent with PCIStephen Hemminger
In commit 9a56e5d6a0ba ("Drivers: hv: make VMBus bus ids persistent") the name of vmbus devices in sysfs changed to be (in 4.9-rc1): /sys/bus/vmbus/vmbus-6aebe374-9ba0-11e6-933c-00259086b36b The prefix ("vmbus-") is redundant and differs from how PCI is represented in sysfs. Therefore simplify to: /sys/bus/vmbus/6aebe374-9ba0-11e6-933c-00259086b36b Please merge this before 4.9 is released and the old format has to live forever. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-25hv: do not lose pending heartbeat vmbus packetsLong Li
The host keeps sending heartbeat packets independent of the guest responding to them. Even though we respond to the heartbeat messages at interrupt level, we can have situations where there maybe multiple heartbeat messages pending that have not been responded to. For instance this occurs when the VM is paused and the host continues to send the heartbeat messages. Address this issue by draining and responding to all the heartbeat messages that maybe pending. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-27Drivers: hv: get rid of id in struct vmbus_channelVitaly Kuznetsov
The auto incremented counter is not being used anymore, get rid of it. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-27Drivers: hv: make VMBus bus ids persistentVitaly Kuznetsov
Some tools use bus ids to identify devices and they count on the fact that these ids are persistent across reboot. This may be not true for VMBus as we use auto incremented counter from alloc_channel() as such id. Switch to using if_instance from channel offer, this id is supposed to be persistent. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-09Drivers: hv: hv_util: Avoid dynamic allocation in time synchVivek yadav
Under stress, we have seen allocation failure in time synch code. Avoid this dynamic allocation. Signed-off-by: Vivek Yadav <vyadav@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-08Drivers: hv: utils: Support TimeSync version 4.0 protocol samples.Alex Ng
This enables support for more accurate TimeSync v4 samples when hosted under Windows Server 2016 and newer hosts. The new time samples include a "vmreferencetime" field that represents the guest's TSC value when the host generated its time sample. This value lets the guest calculate the latency in receiving the time sample. The latency is added to the sample host time prior to updating the clock. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-08Drivers: hv: utils: Use TimeSync samples to adjust the clock after boot.Alex Ng
Only the first 50 samples after boot were being used to discipline the clock. After the first 50 samples, any samples from the host were ignored and the guest clock would eventually drift from the host clock. This patch allows TimeSync-enabled guests to continuously synchronize the clock with the host clock, even after the first 50 samples. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-08Drivers: hv: utils: Rename version definitions to reflect protocol version.Alex Ng
Different Windows host versions may reuse the same protocol version when negotiating the TimeSync, Shutdown, and Heartbeat protocols. We should only refer to the protocol version to avoid conflating the two concepts. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07Drivers: hv: vmbus: suppress some "hv_vmbus: Unknown GUID" warningsDexuan Cui
Some VMBus devices are not needed by Linux guest[1][2], and, VMBus channels of Hyper-V Sockets don't really mean usual synthetic devices, so let's suppress the warnings for them. [1] https://support.microsoft.com/en-us/kb/2925727 [2] https://msdn.microsoft.com/en-us/library/jj980180(v=winembedded.81).aspx Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07Driver: hv: vmbus: Make mmio resource localStephen Hemminger
This fixes a sparse warning because hyperv_mmio resources are only used in this one file and should be static. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-02Drivers: hv: utils: Check VSS daemon is listening before a hot backupAlex Ng
Hyper-V host will send a VSS_OP_HOT_BACKUP request to check if guest is ready for a live backup/snapshot. The driver should respond to the check only if the daemon is running and listening to requests. This allows the host to fallback to standard snapshots in case the VSS daemon is not running. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-02Drivers: hv: utils: Continue to poll VSS channel after handling requests.Alex Ng
Multiple VSS_OP_HOT_BACKUP requests may arrive in quick succession, even though the host only signals once. The driver wass handling the first request while ignoring the others in the ring buffer. We should poll the VSS channel after handling a request to continue processing other requests. Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-02Drivers: hv: Introduce a policy for controlling channel affinityK. Y. Srinivasan
Introduce a mechanism to control how channels will be affinitized. We will support two policies: 1. HV_BALANCED: All performance critical channels will be dstributed evenly amongst all the available NUMA nodes. Once the Node is assigned, we will assign the CPU based on a simple round robin scheme. 2. HV_LOCALIZED: Only the primary channels are distributed across all NUMA nodes. Sub-channels will be in the same NUMA node as the primary channel. This is the current behaviour. The default policy will be the HV_BALANCED as it can minimize the remote memory access on NUMA machines with applications that span NUMA nodes. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-02Drivers: hv: ring_buffer: use wrap around mappings in hv_copy{from, ↵Vitaly Kuznetsov
to}_ringbuffer() With wrap around mappings for ring buffers we can always use a single memcpy() to do the job. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-02Drivers: hv: ring_buffer: wrap around mappings for ring buffersVitaly Kuznetsov
Make it possible to always use a single memcpy() or to provide a direct link to a packet on the ring buffer by creating virtual mapping for two copies of the ring buffer with vmap(). Utilize currently empty hv_ringbuffer_cleanup() to do the unmap. While on it, replace sizeof(struct hv_ring_buffer) check in hv_ringbuffer_init() with BUILD_BUG_ON() as it is a compile time check. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-02Drivers: hv: cleanup vmbus_open() for wrap around mappingsVitaly Kuznetsov
In preparation for doing wrap around mappings for ring buffers cleanup vmbus_open() function: - check that ring sizes are PAGE_SIZE aligned (they are for all in-kernel drivers now); - kfree(open_info) on error only after we kzalloc() it (not an issue as it is valid to call kfree(NULL); - rename poorly named labels; - use alloc_pages() instead of __get_free_pages() as we need struct page pointer for future. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: balloon: Use available memory value in pressure reportAlex Ng
Reports for available memory should use the si_mem_available() value. The previous freeram value does not include available page cache memory. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: balloon: replace ha_region_mutex with spinlockVitaly Kuznetsov
lockdep reports possible circular locking dependency when udev is used for memory onlining: systemd-udevd/3996 is trying to acquire lock: ((memory_chain).rwsem){++++.+}, at: [<ffffffff810d137e>] __blocking_notifier_call_chain+0x4e/0xc0 but task is already holding lock: (&dm_device.ha_region_mutex){+.+.+.}, at: [<ffffffffa015382e>] hv_memory_notifier+0x5e/0xc0 [hv_balloon] ... which is probably a false positive because we take and release ha_region_mutex from memory notifier chain depending on the arg. No real deadlocks were reported so far (though I'm not really sure about preemptible kernels...) but we don't really need to hold the mutex for so long. We use it to protect ha_region_list (and its members) and the num_pages_onlined counter. None of these operations require us to sleep and nothing is slow, switch to using spinlock with interrupts disabled. While on it, replace list_for_each -> list_for_each_entry as we actually need entries in all these cases, drop meaningless list_empty() checks. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is ↵Vitaly Kuznetsov
enabled With the recently introduced in-kernel memory onlining (MEMORY_HOTPLUG_DEFAULT_ONLINE) these is no point in waiting for pages to come online in the driver and we can get rid of the waiting. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: balloon: account for gaps in hot add regionsVitaly Kuznetsov
I'm observing the following hot add requests from the WS2012 host: hot_add_req: start_pfn = 0x108200 count = 330752 hot_add_req: start_pfn = 0x158e00 count = 193536 hot_add_req: start_pfn = 0x188400 count = 239616 As the host doesn't specify hot add regions we're trying to create 128Mb-aligned region covering the first request, we create the 0x108000 - 0x160000 region and we add 0x108000 - 0x158e00 memory. The second request passes the pfn_covered() check, we enlarge the region to 0x108000 - 0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with the third request as it starts at 0x188400 so there is a 0x200 gap which is not covered. As the end of our region is 0x190000 now it again passes the pfn_covered() check were we just adjust the covered_end_pfn and make it 0x188400 instead of 0x188200 which means that we'll try to online 0x188200-0x188400 pages but these pages were never assigned to us and we crash. We can't react to such requests by creating new hot add regions as it may happen that the whole suggested range falls into the previously identified 128Mb-aligned area so we'll end up adding nothing or create intersecting regions and our current logic doesn't allow that. Instead, create a list of such 'gaps' and check for them in the page online callback. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: balloon: keep track of where ha_region startsVitaly Kuznetsov
Windows 2012 (non-R2) does not specify hot add region in hot add requests and the logic in hot_add_req() is trying to find a 128Mb-aligned region covering the request. It may also happen that host's requests are not 128Mb aligned and the created ha_region will start before the first specified PFN. We can't online these non-present pages but we don't remember the real start of the region. This is a regression introduced by the commit 5abbbb75d733 ("Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural"). While the idea of keeping the 'moving window' was wrong (as there is no guarantee that hot add requests come ordered) we should still keep track of covered_start_pfn. This is not a revert, the logic is different. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: vmbus: Implement a mechanism to tag the channel for low latencyK. Y. Srinivasan
On Hyper-V, performance critical channels use the monitor mechanism to signal the host when the guest posts mesages for the host. This mechanism minimizes the hypervisor intercepts and also makes the host more efficient in that each time the host is woken up, it processes a batch of messages as opposed to just one. The goal here is improve the throughput and this is at the expense of increased latency. Implement a mechanism to let the client driver decide if latency is important. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg()K. Y. Srinivasan
The current delay between retries is unnecessarily high and is negatively affecting the time it takes to boot the system. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: vmbus: Enable explicit signaling policy for NIC channelsK. Y. Srinivasan
For synthetic NIC channels, enable explicit signaling policy as netvsc wants to explicitly control when the host is to be signaled. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: vmbus: fix the race when querying & updating the percpu listDexuan Cui
There is a rare race when we remove an entry from the global list hv_context.percpu_list[cpu] in hv_process_channel_removal() -> percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel() is trying to query the list, we can get the kernel fault. Similarly, we also have the issue in the code path: vmbus_process_offer() -> percpu_channel_enq(). We can resolve the issue by disabling the tasklet when updating the list. The patch also moves vmbus_release_relid() to a later place where the channel has been removed from the per-cpu and the global lists. Reported-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: utils: fix a race on userspace daemons registrationVitaly Kuznetsov
Background: userspace daemons registration protocol for Hyper-V utilities drivers has two steps: 1) daemon writes its own version to kernel 2) kernel reads it and replies with module version at this point we consider the handshake procedure being completed and we do hv_poll_channel() transitioning the utility device to HVUTIL_READY state. At this point we're ready to handle messages from kernel. When hvutil_transport is in HVUTIL_TRANSPORT_CHARDEV mode we have a single buffer for outgoing message. hvutil_transport_send() puts to this buffer and till the buffer is cleared with hvt_op_read() returns -EFAULT to all consequent calls. Host<->guest protocol guarantees there is no more than one request at a time and we will not get new requests till we reply to the previous one so this single message buffer is enough. Now to the race. When we finish negotiation procedure and send kernel module version to userspace with hvutil_transport_send() it goes into the above mentioned buffer and if the daemon is slow enough to read it from there we can get a collision when a request from the host comes, we won't be able to put anything to the buffer so the request will be lost. To solve the issue we need to know when the negotiation is really done (when the version message is read by the daemon) and transition to HVUTIL_READY state after this happens. Implement a callback on read to support this. Old style netlink communication is not affected by the change, we don't really know when these messages are delivered but we don't have a single message buffer there. Reported-by: Barry Davis <barry_davis@stormagic.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: get rid of timeout in vmbus_open()Vitaly Kuznetsov
vmbus_teardown_gpadl() can result in infinite wait when it is called on 5 second timeout in vmbus_open(). The issue is caused by the fact that gpadl teardown operation won't ever succeed for an opened channel and the timeout isn't always enough. As a guest, we can always trust the host to respond to our request (and there is nothing we can do if it doesn't). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: don't leak memory in vmbus_establish_gpadl()Vitaly Kuznetsov
In some cases create_gpadl_header() allocates submessages but we never free them. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Drivers: hv: get rid of redundant messagecount in create_gpadl_header()Vitaly Kuznetsov
We use messagecount only once in vmbus_establish_gpadl() to check if it is safe to iterate through the submsglist. We can just initialize the list header in all cases in create_gpadl_header() instead. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>