summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/nouveau
AgeCommit message (Collapse)Author
2019-06-11drm/nouveau: add kconfig option to turn off nouveau legacy contexts. (v3)Dave Airlie
commit b30a43ac7132cdda833ac4b13dd1ebd35ace14b7 upstream. There was a nouveau DDX that relied on legacy context ioctls to work, but we fixed it years ago, give distros that have a modern DDX the option to break the uAPI and close the mess of holes that legacy context support is. Full context of the story: commit 0e975980d435d58df2d430d688b8c18778b42218 Author: Peter Antoine <peter.antoine@intel.com> Date: Tue Jun 23 08:18:49 2015 +0100 drm: Turn off Legacy Context Functions The context functions are not used by the i915 driver and should not be used by modeset drivers. These driver functions contain several bugs and security holes. This change makes these functions optional can be turned on by a setting, they are turned off by default for modeset driver with the exception of the nouvea driver that may require them with an old version of libdrm. The previous attempt was commit 7c510133d93dd6f15ca040733ba7b2891ed61fd1 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Aug 8 15:41:21 2013 +0200 drm: mark context support as a legacy subsystem but this had to be reverted commit c21eb21cb50d58e7cbdcb8b9e7ff68b85cfa5095 Author: Dave Airlie <airlied@redhat.com> Date: Fri Sep 20 08:32:59 2013 +1000 Revert "drm: mark context support as a legacy subsystem" v2: remove returns from void function, and formatting (Daniel Vetter) v3: - s/Nova/nouveau/ in the commit message, and add references to the previous attempts - drop the part touching the drm hw lock, that should be a separate patch. Signed-off-by: Peter Antoine <peter.antoine@intel.com> (v2) Cc: Peter Antoine <peter.antoine@intel.com> (v2) Reviewed-by: Peter Antoine <peter.antoine@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> v2: move DRM_VM dependency into legacy config. v3: fix missing dep (kbuild robot) Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-09drm/nouveau/i2c: Disable i2c bus access after ->fini()Lyude Paul
commit 342406e4fbba9a174125fbfe6aeac3d64ef90f76 upstream. For a while, we've had the problem of i2c bus access not grabbing a runtime PM ref when it's being used in userspace by i2c-dev, resulting in nouveau spamming the kernel log with errors if anything attempts to access the i2c bus while the GPU is in runtime suspend. An example: [ 130.078386] nouveau 0000:01:00.0: i2c: aux 000d: begin idle timeout ffffffff Since the GPU is in runtime suspend, the MMIO region that the i2c bus is on isn't accessible. On x86, the standard behavior for accessing an unavailable MMIO region is to just return ~0. Except, that turned out to be a lie. While computers with a clean concious will return ~0 in this scenario, some machines will actually completely hang a CPU on certian bad MMIO accesses. This was witnessed with someone's Lenovo ThinkPad P50, where sensors-detect attempting to access the i2c bus while the GPU was suspended would result in a CPU hang: CPU: 5 PID: 12438 Comm: sensors-detect Not tainted 5.0.0-0.rc4.git3.1.fc30.x86_64 #1 Hardware name: LENOVO 20EQS64N17/20EQS64N17, BIOS N1EET74W (1.47 ) 11/21/2017 RIP: 0010:ioread32+0x2b/0x30 Code: 81 ff ff ff 03 00 77 20 48 81 ff 00 00 01 00 76 05 0f b7 d7 ed c3 48 c7 c6 e1 0c 36 96 e8 2d ff ff ff b8 ff ff ff ff c3 8b 07 <c3> 0f 1f 40 00 49 89 f0 48 81 fe ff ff 03 00 76 04 40 88 3e c3 48 RSP: 0018:ffffaac3c5007b48 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13 RAX: 0000000001111000 RBX: 0000000001111000 RCX: 0000043017a97186 RDX: 0000000000000aaa RSI: 0000000000000005 RDI: ffffaac3c400e4e4 RBP: ffff9e6443902c00 R08: ffffaac3c400e4e4 R09: ffffaac3c5007be7 R10: 0000000000000004 R11: 0000000000000001 R12: ffff9e6445dd0000 R13: 000000000000e4e4 R14: 00000000000003c4 R15: 0000000000000000 FS: 00007f253155a740(0000) GS:ffff9e644f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005630d1500358 CR3: 0000000417c44006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: g94_i2c_aux_xfer+0x326/0x850 [nouveau] nvkm_i2c_aux_i2c_xfer+0x9e/0x140 [nouveau] __i2c_transfer+0x14b/0x620 i2c_smbus_xfer_emulated+0x159/0x680 ? _raw_spin_unlock_irqrestore+0x1/0x60 ? rt_mutex_slowlock.constprop.0+0x13d/0x1e0 ? __lock_is_held+0x59/0xa0 __i2c_smbus_xfer+0x138/0x5a0 i2c_smbus_xfer+0x4f/0x80 i2cdev_ioctl_smbus+0x162/0x2d0 [i2c_dev] i2cdev_ioctl+0x1db/0x2c0 [i2c_dev] do_vfs_ioctl+0x408/0x750 ksys_ioctl+0x5e/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x60/0x1e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f25317f546b Code: 0f 1e fa 48 8b 05 1d da 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed d9 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc88caab68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00005630d0fe7260 RCX: 00007f25317f546b RDX: 00005630d1598e80 RSI: 0000000000000720 RDI: 0000000000000003 RBP: 00005630d155b968 R08: 0000000000000001 R09: 00005630d15a1da0 R10: 0000000000000070 R11: 0000000000000246 R12: 00005630d1598e80 R13: 00005630d12f3d28 R14: 0000000000000720 R15: 00005630d12f3ce0 watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sensors-detect:12438] Yikes! While I wanted to try to make it so that accessing an i2c bus on nouveau would wake up the GPU as needed, airlied pointed out that pretty much any usecase for userspace accessing an i2c bus on a GPU (mainly for the DDC brightness control that some displays have) is going to only be useful while there's at least one display enabled on the GPU anyway, and the GPU never sleeps while there's displays running. Since teaching the i2c bus to wake up the GPU on userspace accesses is a good deal more difficult than it might seem, mostly due to the fact that we have to use the i2c bus during runtime resume of the GPU, we instead opt for the easiest solution: don't let userspace access i2c busses on the GPU at all while it's in runtime suspend. Changes since v1: * Also disable i2c busses that run over DP AUX Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31drm/nouveau/bar/nv50: ensure BAR is mappedJon Derrick
[ Upstream commit f10b83de1fd49216a4c657816f48001437e4bdd5 ] If the BAR is zero size, it indicates it was never successfully mapped. Ensure that the BAR is valid during initialization before attempting to use it. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20drm/nouveau/volt/gf117: fix speedo readout registerIlia Mirkin
[ Upstream commit fc782242749fa4235592854fafe1a1297583c1fb ] GF117 appears to use the same register as GK104 (but still with the general Fermi readout mechanism). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108980 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failureYueHaibing
[ Upstream commit 909e9c9c428376e2a43d178ed4b0a2d5ba9cb7d3 ] pm_runtime_get_sync returns negative on failure. Fixes: eaeb9010bb4b ("drm/nouveau/debugfs: Wake up GPU before doing any reclocking") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05drm/nouveau: Stop using drm_crtc_force_disableDaniel Vetter
[ Upstream commit 934c5b32a5e43d8de2ab4f1566f91d7c3bf8cb64 ] The correct way for legacy drivers to update properties that need to do a full modeset, is to do a full modeset. Note that we don't need to call the drm_mode_config_internal helper because we're not changing any of the refcounted paramters. v2: Fixup error handling (Ville). Since the old code didn't bother I decided to just delete it instead of adding even more code for just error handling. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1) Cc: Sean Paul <seanpaul@chromium.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181217194303.14397-2-daniel.vetter@ffwll.ch Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-20drm/nouveau/falcon: avoid touching registers if engine is offIlia Mirkin
[ Upstream commit a5176a4cb85bb6213daadf691097cf411da35df2 ] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108980 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-20drm/nouveau: Don't disable polling in fallback modeTakashi Iwai
[ Upstream commit 118780066e30c34de3d9349710b51780bfa0ba83 ] When a fan is controlled via linear fallback without cstate, we shouldn't stop polling. Otherwise it won't be adjusted again and keeps running at an initial crazy pace. Fixes: 800efb4c2857 ("drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios") Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1103356 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107447 Reported-by: Thomas Blume <thomas.blume@suse.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-13drm/nouveau/drm/nouveau: Check rc from drm_dp_mst_topology_mgr_resume()Lyude Paul
commit b89fdf7ae8500feae1100d8b283176a44d31d698 upstream. We need to actually make sure we check this on resume since otherwise we won't know whether or not the topology is still there once we've resumed, which will cause us to still think the topology is connected even after it's been removed if the removal happens mid-suspend. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19drm/nouveau/kms/nv50-: also flush fb writes when rewinding push bufferBen Skeggs
commit 970a5ee41c72df46e3b0f307528c7d8ef7734a2e upstream. Should hopefully fix a regression some people have been seeing since EVO push buffers were moved to VRAM by default on Pascal GPUs. Fixes: d00ddd9da ("drm/nouveau/kms/nv50-: allocate push buffers in vidmem on pascal") Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19drm/nouveau/kms: Fix memory leak in nv50_mstm_del()Lyude Paul
commit 24199c5436f267399afed0c4f1f57663c0408f57 upstream. Noticed this while working on redoing the reference counting scheme in the DP MST helpers. Nouveau doesn't attempt to call drm_dp_mst_topology_mgr_destroy() at all, which leaves it leaking all of the resources for drm_dp_mst_topology_mgr and it's children mstbs+ports. Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 multi-stream") Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21drm/nouveau: Fix nv50_mstc->best_encoder()Lyude Paul
commit 7b0f61e91b6056c71649efa3204112a4b6cf5fc8 upstream. As mentioned in the previous commit, we currently prevent new modesets on recently-removed MST connectors by returning no encoder from our ->best_encoder() callback once the MST port has disappeared. This is wrong however, because it prevents legacy modesetting users from being able to disable CRTCs on MST connectors after the connector's respective topology has disappeared. So, fix this by instead by just always returning a valid encoder. Changes since v2: - Remove usage of atomic MST helper for now, since that got replaced with a much simpler solution Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20181008232437.5571-3-lyude@redhat.com (cherry picked from commit e87b0bbc9f0380d403f8f2f6abba0d51c74d944f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21drm/nouveau: Check backlight IDs are >= 0, not > 0Lyude Paul
commit dc854914999d5d52ac1b31740cb0ea8d89d0372e upstream. Remember, ida IDs start at 0, not 1! Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21drm/nouveau/secboot/acr: fix memory leakGustavo A. R. Silva
[ Upstream commit 74a07c0a59fa372b069d879971ba4d9e341979cf ] In case memory resources for *bl_desc* were allocated, release them before return. Addresses-Coverity-ID: 1472021 ("Resource leak") Fixes: 0d466901552a ("drm/nouveau/secboot/acr: Remove VLA usage") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05drm/nouveau/drm/nouveau: Grab runtime PM ref in nv50_mstc_detect()Lyude Paul
While we currently grab a runtime PM ref in nouveau's normal connector detection code, we apparently don't do this for MST. This means if we're in a scenario where the GPU is suspended and userspace attempts to do a connector probe on an MSTC connector, the probe will fail entirely due to the DP aux channel and GPU not being woken up: [ 316.633489] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff [ 316.635713] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff [ 316.637785] nouveau 0000:01:00.0: i2c: aux 000a: begin idle timeout ffffffff ... So, grab a runtime PM ref here. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-13drm/nouveau/devinit: fix warning when PMU/PRE_OS is missingBen Skeggs
Messed up when sending pull request and sent an outdated version of previous patch, this fixes it up to remove warnings. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP ↵Ben Skeggs
panels Fixes eDP backlight issues on more recent laptops. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/disp: fix DP disable raceBen Skeggs
If a HPD pulse signalling the need to retrain the link occurs between the KMS driver releasing the output and the supervisor interrupt that finishes the teardown, it was possible get a NULL-ptr deref. Avoid this by marking the link as inactive earlier. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/disp: move eDP panel power handlingBen Skeggs
We need to do this earlier to prevent aux channel timeouts in resume paths on certain systems. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/disp: remove unused struct memberBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOSBen Skeggs
This Falcon application doesn't appear to be present on some newer systems, so let's not fail init if we can't find it. TBD: is there a way to determine whether it *should* be there? Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointerBen Skeggs
Fixes oopses in certain failure paths. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: fix oops in client init failure pathBen Skeggs
The NV_ERROR macro requires drm->client to be initialised, which it may not be at this stage of the init process. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Fix nouveau_connector_ddc_detect()Lyude Paul
It looks like that when we moved over to using drm_connector_for_each_possible_encoder() in nouveau, that one rather important part of this function got dropped by accident: /* Right v here */ for (i = 0; nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER; i++) { int id = connector->encoder_ids[i]; if (id == 0) break; Since it's rather difficult to notice: the conditional in this loop is actually: nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER Meaning that all early breaks result in nv_encoder keeping it's value, otherwise nv_encoder = NULL. Ugh. Since this got dropped, nouveau_connector_ddc_detect() now returns an encoder for every single connector, regardless of whether or not it's detected: [ 1780.056185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-2 So: fix this to ensure we only return an encoder if we actually found one, and clean up the rest of the function while we're at it since it's nearly impossible to read properly. Changes since v1: - Don't skip ddc probing for LVDS if we can't switch DDC through vga-switcheroo, just do the DDC probing without calling vga_switcheroo_lock_ddc() - skeggsb Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: ddba766dd07e ("drm/nouveau: Use drm_connector_for_each_possible_encoder()") Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unloadLyude Paul
Currently, there's nothing in nouveau that actually cancels this work struct. So, cancel it on suspend/unload. Otherwise, if we're unlucky enough hpd_work might try to keep running up until the system is suspended. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too earlyLyude Paul
On most systems with ACPI hotplugging support, it seems that we always receive a hotplug event once we re-enable EC interrupts even if the GPU hasn't even been resumed yet. This can cause problems since even though we schedule hpd_work to handle connector reprobing for us, hpd_work synchronizes on pm_runtime_get_sync() to wait until the device is ready to perform reprobing. Since runtime suspend/resume callbacks are disabled before the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during this period will grab a runtime PM ref and return immediately with -EACCES. Because we schedule hpd_work from our ACPI HPD handler, and hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch a connector reprobe immediately even if the GPU isn't actually resumed just yet. This causes various warnings in dmesg and occasionally, also prevents some displays connected to the dedicated GPU from coming back up after suspend. Example: usb 1-4: USB disconnect, device number 14 usb 1-4.1: USB disconnect, device number 15 WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau] CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1 Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018 Workqueue: events nouveau_display_hpd_work [nouveau] RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau] RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000 RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002 RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000 R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418 FS: 0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? _cond_resched+0x15/0x30 nouveau_connector_detect+0x2ce/0x520 [nouveau] ? _cond_resched+0x15/0x30 ? ww_mutex_lock+0x12/0x40 drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper] drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper] nouveau_display_hpd_work+0x2a/0x60 [nouveau] process_one_work+0x187/0x340 worker_thread+0x2e/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x35/0x40 Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff ---[ end trace 55d811b38fc8e71a ]--- So, to fix this we attempt to grab a runtime PM reference in the ACPI handler itself asynchronously. If the GPU is already awake (it will have normal hotplugging at this point) or runtime PM callbacks are currently disabled on the device, we drop our reference without updating the autosuspend delay. We only schedule connector reprobes when we successfully managed to queue up a resume request with our asynchronous PM ref. This also has the added benefit of preventing redundant connector reprobes from ACPI while the GPU is runtime resumed! Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <kherbst@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41 Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Reset MST branching unit before enablingLyude Paul
When probing a new MST device, it's not safe to make any assumptions about it's current state. While most well mannered MST hubs will just disable the branching unit on hotplug disconnects, this isn't enough to save us from various other scenarios that might have resulted in something writing to the MST branching unit before we got control of it. This could happen if a previous probe we tried failed, if we're booting in kexec context and the hub is still in the state the last kernel put it in, etc. Luckily; there is no reason we can't just reset the branching unit every time we enable a new topology. So, fix this by resetting it on enabling new topologies to ensure that we always start off with a clean, unmodified topology state on MST sinks. This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX times out all DPCD trasactions) observed after multiple docks, undocks, and module reloads. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Only write DP_MSTM_CTRL when neededLyude Paul
Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST hub every time it receives a long HPD pulse on DP. This isn't actually necessary and additionally, has some unintended side effects. With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to make it rather likely (1 out of 5 times usually) that bringing up MST with it's ThinkPad dock will fail and result in sideband messages timing out in the middle. Afterwards, successive probes don't manage to get the dock to communicate properly over MST sideband properly. Many times sideband message timeouts from MST hubs are indicative of either the source or the sink dropping an ESI event, which can cause DRM's perspective of the topology's current state to go out of sync with reality. While it's tough to really know for sure what's happening to the dock, using userspace tools to write to DP_MSTM_CTRL in the middle of the MST link probing process does appear to make things flaky. It's possible that when we write to DP_MSTM_CTRL, the function that gets triggered to respond in the dock's firmware temporarily puts it in a state where it might end up not reporting an ESI to the source, or ends up dropping a sideband message we sent it. So, to fix this we make it so that when probing an MST topology, we respect it's current state. If the dock's already enabled, we simply read DP_MSTM_CTRL and disable the topology if it's value is not what we expected. Otherwise, we perform the normal MST probing dance. We avoid taking any action except if the state of the MST topology actually changes. This fixes MST sideband message timeouts and detection failures on my P50 with its ThinkPad dock. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove useless poll_enable() call in drm_load()Lyude Paul
Again, this doesn't do anything. drm_kms_helper_poll_enable() will have already been called in nouveau_display_init() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()Lyude Paul
This won't do anything but potentially make us miss hotplugs. We already call drm_kms_helper_poll_disable() in nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()Lyude Paul
This doesn't do anything, drm_kms_helper_poll_enable() gets called in nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init() already. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Fix deadlocks in nouveau_connector_detect()Lyude Paul
When we disable hotplugging on the GPU, we need to be able to synchronize with each connector's hotplug interrupt handler before the interrupt is finally disabled. This can be a problem however, since nouveau_connector_detect() currently grabs a runtime power reference when handling connector probing. This will deadlock the runtime suspend handler like so: [ 861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds. [ 861.483290] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.486332] kworker/0:2 D 0 61 2 0x80000000 [ 861.487044] Workqueue: events nouveau_display_hpd_work [nouveau] [ 861.487737] Call Trace: [ 861.488394] __schedule+0x322/0xaf0 [ 861.489070] schedule+0x33/0x90 [ 861.489744] rpm_resume+0x19c/0x850 [ 861.490392] ? finish_wait+0x90/0x90 [ 861.491068] __pm_runtime_resume+0x4e/0x90 [ 861.491753] nouveau_display_hpd_work+0x22/0x60 [nouveau] [ 861.492416] process_one_work+0x231/0x620 [ 861.493068] worker_thread+0x44/0x3a0 [ 861.493722] kthread+0x12b/0x150 [ 861.494342] ? wq_pool_ids_show+0x140/0x140 [ 861.494991] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.495648] ret_from_fork+0x3a/0x50 [ 861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds. [ 861.496968] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.498341] kworker/6:2 D 0 320 2 0x80000080 [ 861.499045] Workqueue: pm pm_runtime_work [ 861.499739] Call Trace: [ 861.500428] __schedule+0x322/0xaf0 [ 861.501134] ? wait_for_completion+0x104/0x190 [ 861.501851] schedule+0x33/0x90 [ 861.502564] schedule_timeout+0x3a5/0x590 [ 861.503284] ? mark_held_locks+0x58/0x80 [ 861.503988] ? _raw_spin_unlock_irq+0x2c/0x40 [ 861.504710] ? wait_for_completion+0x104/0x190 [ 861.505417] ? trace_hardirqs_on_caller+0xf4/0x190 [ 861.506136] ? wait_for_completion+0x104/0x190 [ 861.506845] wait_for_completion+0x12c/0x190 [ 861.507555] ? wake_up_q+0x80/0x80 [ 861.508268] flush_work+0x1c9/0x280 [ 861.508990] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 861.509735] nvif_notify_put+0xb1/0xc0 [nouveau] [ 861.510482] nouveau_display_fini+0xbd/0x170 [nouveau] [ 861.511241] nouveau_display_suspend+0x67/0x120 [nouveau] [ 861.511969] nouveau_do_suspend+0x5e/0x2d0 [nouveau] [ 861.512715] nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau] [ 861.513435] pci_pm_runtime_suspend+0x6b/0x180 [ 861.514165] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.514897] __rpm_callback+0x7a/0x1d0 [ 861.515618] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.516313] rpm_callback+0x24/0x80 [ 861.517027] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.517741] rpm_suspend+0x142/0x6b0 [ 861.518449] pm_runtime_work+0x97/0xc0 [ 861.519144] process_one_work+0x231/0x620 [ 861.519831] worker_thread+0x44/0x3a0 [ 861.520522] kthread+0x12b/0x150 [ 861.521220] ? wq_pool_ids_show+0x140/0x140 [ 861.521925] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.522622] ret_from_fork+0x3a/0x50 [ 861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds. [ 861.523977] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.525349] kworker/6:0 D 0 1329 2 0x80000000 [ 861.526073] Workqueue: events nvif_notify_work [nouveau] [ 861.526751] Call Trace: [ 861.527411] __schedule+0x322/0xaf0 [ 861.528089] schedule+0x33/0x90 [ 861.528758] rpm_resume+0x19c/0x850 [ 861.529399] ? finish_wait+0x90/0x90 [ 861.530073] __pm_runtime_resume+0x4e/0x90 [ 861.530798] nouveau_connector_detect+0x7e/0x510 [nouveau] [ 861.531459] ? ww_mutex_lock+0x47/0x80 [ 861.532097] ? ww_mutex_lock+0x47/0x80 [ 861.532819] ? drm_modeset_lock+0x88/0x130 [drm] [ 861.533481] drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper] [ 861.534127] drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper] [ 861.534940] nouveau_connector_hotplug+0x98/0x120 [nouveau] [ 861.535556] nvif_notify_work+0x2d/0xb0 [nouveau] [ 861.536221] process_one_work+0x231/0x620 [ 861.536994] worker_thread+0x44/0x3a0 [ 861.537757] kthread+0x12b/0x150 [ 861.538463] ? wq_pool_ids_show+0x140/0x140 [ 861.539102] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.539815] ret_from_fork+0x3a/0x50 [ 861.540521] Showing all locks held in the system: [ 861.541696] 2 locks held by kworker/0:2/61: [ 861.542406] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543071] #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543814] 1 lock held by khungtaskd/64: [ 861.544535] #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 861.545160] 3 locks held by kworker/6:2/320: [ 861.545896] #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.546702] #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.547443] #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau] [ 861.548146] 1 lock held by dmesg/983: [ 861.548889] 2 locks held by zsh/1250: [ 861.549605] #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 861.550393] #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 861.551122] 6 locks held by kworker/6:0/1329: [ 861.551957] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.552765] #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.553582] #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper] [ 861.554357] #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper] [ 861.555227] #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper] [ 861.556133] #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm] [ 861.557864] ============================================= [ 861.559507] NMI backtrace for cpu 2 [ 861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018 [ 861.561948] Call Trace: [ 861.562757] dump_stack+0x8e/0xd3 [ 861.563516] nmi_cpu_backtrace.cold.3+0x14/0x5a [ 861.564269] ? lapic_can_unplug_cpu.cold.27+0x42/0x42 [ 861.565029] nmi_trigger_cpumask_backtrace+0xa1/0xae [ 861.565789] arch_trigger_cpumask_backtrace+0x19/0x20 [ 861.566558] watchdog+0x316/0x580 [ 861.567355] kthread+0x12b/0x150 [ 861.568114] ? reset_hung_task_detector+0x20/0x20 [ 861.568863] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.569598] ret_from_fork+0x3a/0x50 [ 861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7: [ 861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120 [ 861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120 [ 861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120 [ 861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120 [ 861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120 [ 861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120 [ 861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120 [ 861.572428] Kernel panic - not syncing: hung_task: blocked tasks So: fix this by making it so that normal hotplug handling /only/ happens so long as the GPU is currently awake without any pending runtime PM requests. In the event that a hotplug occurs while the device is suspending or resuming, we can simply defer our response until the GPU is fully runtime resumed again. Changes since v4: - Use a new trick I came up with using pm_runtime_get() instead of the hackish junk we had before Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()Lyude Paul
It's true we can't resume the device from poll workers in nouveau_connector_detect(). We can however, prevent the autosuspend timer from elapsing immediately if it hasn't already without risking any sort of deadlock with the runtime suspend/resume operations. So do that instead of entirely avoiding grabbing a power reference. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requestsLyude Paul
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed() function provided by DRM as it's output_poll_changed callback. Unfortunately however, this function doesn't grab runtime PM references early enough and even if it did-we can't block waiting for the device to resume in output_poll_changed() since it's very likely that we'll need to grab the fb_helper lock at some point during the runtime resume process. This currently results in deadlocking like so: [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] [ 246.678704] Call Trace: [ 246.679753] __schedule+0x322/0xaf0 [ 246.680916] schedule+0x33/0x90 [ 246.681924] schedule_preempt_disabled+0x15/0x20 [ 246.683023] __mutex_lock+0x569/0x9a0 [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.686179] mutex_lock_nested+0x1b/0x20 [ 246.687278] ? mutex_lock_nested+0x1b/0x20 [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] [ 246.692611] process_one_work+0x231/0x620 [ 246.693725] worker_thread+0x214/0x3a0 [ 246.694756] kthread+0x12b/0x150 [ 246.695856] ? wq_pool_ids_show+0x140/0x140 [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.697998] ret_from_fork+0x3a/0x50 [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 [ 246.703293] Workqueue: pm pm_runtime_work [ 246.704393] Call Trace: [ 246.705403] __schedule+0x322/0xaf0 [ 246.706439] ? wait_for_completion+0x104/0x190 [ 246.707393] schedule+0x33/0x90 [ 246.708375] schedule_timeout+0x3a5/0x590 [ 246.709289] ? mark_held_locks+0x58/0x80 [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 [ 246.711222] ? wait_for_completion+0x104/0x190 [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 [ 246.713094] ? wait_for_completion+0x104/0x190 [ 246.713964] wait_for_completion+0x12c/0x190 [ 246.714895] ? wake_up_q+0x80/0x80 [ 246.715727] ? get_work_pool+0x90/0x90 [ 246.716649] flush_work+0x1c9/0x280 [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 246.718442] __cancel_work_timer+0x146/0x1d0 [ 246.719247] cancel_delayed_work_sync+0x13/0x20 [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.723737] __rpm_callback+0x7a/0x1d0 [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.725607] rpm_callback+0x24/0x80 [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.727376] rpm_suspend+0x142/0x6b0 [ 246.728185] pm_runtime_work+0x97/0xc0 [ 246.728938] process_one_work+0x231/0x620 [ 246.729796] worker_thread+0x44/0x3a0 [ 246.730614] kthread+0x12b/0x150 [ 246.731395] ? wq_pool_ids_show+0x140/0x140 [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.732878] ret_from_fork+0x3a/0x50 [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] [ 246.737665] Call Trace: [ 246.738490] __schedule+0x322/0xaf0 [ 246.739250] schedule+0x33/0x90 [ 246.739908] rpm_resume+0x19c/0x850 [ 246.740750] ? finish_wait+0x90/0x90 [ 246.741541] __pm_runtime_resume+0x4e/0x90 [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] [ 246.752314] process_one_work+0x231/0x620 [ 246.752979] worker_thread+0x44/0x3a0 [ 246.753838] kthread+0x12b/0x150 [ 246.754619] ? wq_pool_ids_show+0x140/0x140 [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.756162] ret_from_fork+0x3a/0x50 [ 246.756847] Showing all locks held in the system: [ 246.758261] 3 locks held by kworker/4:0/37: [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.761516] 2 locks held by kworker/0:1/60: [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.763890] 1 lock held by khungtaskd/64: [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 246.765588] 5 locks held by kworker/4:2/422: [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] [ 246.770839] 1 lock held by dmesg/1038: [ 246.771739] 2 locks held by zsh/1172: [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 246.775522] ============================================= After trying dozens of different solutions, I found one very simple one that should also have the benefit of preventing us from having to fight locking for the rest of our lives. So, we work around these deadlocks by deferring all fbcon hotplug events that happen after the runtime suspend process starts until after the device is resumed again. Changes since v7: - Fixup commit message - Daniel Vetter Changes since v6: - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia Changes since v5: - Come up with the (hopefully final) solution for solving this dumb problem, one that is a lot less likely to cause issues with locking in the future. This should work around all deadlock conditions with fbcon brought up thus far. Changes since v4: - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock condition that Lukas described - Just move all of this out of drm_fb_helper. It seems that other DRM drivers have already figured out other workarounds for this. If other drivers do end up needing this in the future, we can just move this back into drm_fb_helper again. Changes since v3: - Actually check if fb_helper is NULL in both new helpers - Actually check drm_fbdev_emulation in both new helpers - Don't fire off a fb_helper hotplug unconditionally; only do it if the following conditions are true (as otherwise, calling this in the wrong spot will cause Bad Things to happen): - fb_helper hotplug handling was actually inhibited previously - fb_helper actually has a delayed hotplug pending - fb_helper is actually bound - fb_helper is actually initialized - Add __must_check to drm_fb_helper_suspend_hotplug(). There's no situation where a driver would actually want to use this without checking the return value, so enforce that - Rewrite and clarify the documentation for both helpers. - Make sure to return true in the drm_fb_helper_suspend_hotplug() stub that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION isn't enabled - Actually grab the toplevel fb_helper lock in drm_fb_helper_resume_hotplug(), since it's possible other activity (such as a hotplug) could be going on at the same time the driver calls drm_fb_helper_resume_hotplug(). We need this to check whether or not drm_fb_helper_hotplug_event() needs to be called anyway Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()Lyude Paul
Since actual hotplug notifications don't get disabled until nouveau_display_fini() is called, all this will do is cause any hotplugs that happen between this drm_kms_helper_poll_disable() call and the actual hotplug disablement to potentially be dropped if ACPI isn't around to help us. Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placementLyude Paul
Turns out this part is my fault for not noticing when reviewing 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work(). This makes basically no sense however, because that means we're calling drm_kms_helper_poll_enable() every time we schedule the hotplug detection work. This is also against the advice mentioned in drm_kms_helper_poll_enable()'s documentation: Note that calls to enable and disable polling must be strictly ordered, which is automatically the case when they're only call from suspend/resume callbacks. Of course, hotplugs can't really be ordered. They could even happen immediately after we called drm_kms_helper_poll_disable() in nouveau_display_fini(), which can lead to all sorts of issues. Additionally; enabling polling /after/ we call drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to probe connectors so long as polling is disabled. So; simply move this back into nouveau_display_init() again. The race condition that both of these patches attempted to work around has already been fixed properly in d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend") Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling") Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-30BackMerge v4.18-rc7 into drm-nextDave Airlie
rmk requested this for armada and I think we've had a few conflicts build up. Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-07-20Merge tag 'drm-misc-next-2018-07-18' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 4.19: Core Changes: - add support for DisplayPort CEC-Tunneling-over-AUX (Hans Verkuil) - more doc updates (Daniel Vetter) - fourcc: Add is_yuv field to drm_format_info (Ayan Kumar Halder) - dma-buf: correctly place BUG_ON (Michel Dänzer) Driver Changes: - more vkms support(Rodrigo Siqueira) - many fixes and small improments to all drivers Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180718200826.GA20165@juma
2018-07-20Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-nextDave Airlie
misc fixes and cleanups for next. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv55CfRonQ0bo2XiitkCiWTjKwhsP=+ZFhoa-BaJ72Ryew@mail.gmail.com
2018-07-20Merge branch 'linux-4.18' of git://github.com/skeggsb/linux into drm-fixesDave Airlie
- fix problem with pascal and large memory systems - fix a bunch of MST problems - fix a runtime PM interaction with MST Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv79O8deSts2fxJ_oS6=q8yA+OgwBSEpp5R=BQBmWa+oyg@mail.gmail.com
2018-07-19drm/nouveau/kms/nv50-: allocate push buffers in vidmem on pascalBen Skeggs
Workaround for issues seen on systems with large amounts of RAM, caused by display not supporting the same physical address limits as the other parts of the GPU. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-19drm/nouveau/fb/gp100-: disable address remapperBen Skeggs
This was causing problems on a system with a large amount of RAM, where display push buffers were being fetched incorrectly when placed in high system memory addresses. While this commit will resolve the issue on that particular system, the issue will be avoided completely with another patch to more fully solve problems with display and large amounts of system memory on Pascal. It's still probably a good idea to disable this to prevent weird issues in the future. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: tegra: Detach from ARM DMA/IOMMU mappingThierry Reding
Depending on the kernel configuration, early ARM architecture setup code may have attached the GPU to a DMA/IOMMU mapping that transparently uses the IOMMU to back the DMA API. Tegra requires special handling for IOMMU backed buffers (a special bit in the GPU's MMU page tables indicates the memory path to take: via the SMMU or directly to the memory controller). Transparently backing DMA memory with an IOMMU prevents Nouveau from properly handling such memory accesses and causes memory access faults. As a side-note: buffers other than those allocated in instance memory don't need to be physically contiguous from the GPU's perspective since the GPU can map them into contiguous buffers using its own MMU. Mapping these buffers through the IOMMU is unnecessary and will even lead to performance degradation because of the additional translation. One exception to this are compressible buffers which need large pages. In order to enable these large pages, multiple small pages will have to be combined into one large (I/O virtually contiguous) mapping via the IOMMU. However, that is a topic outside the scope of this fix and isn't currently supported. An implementation will want to explicitly create these large pages in the Nouveau driver, so detaching from a DMA/IOMMU mapping would still be required. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nicolas Chauvet <kwizart@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/secboot/acr: Remove VLA usageKees Cook
In the quest to remove all stack VLA usage from the kernel[1], this allocates the working buffers before starting the writing so it won't abort in the middle. This needs an initial walk of the lists to figure out how large the buffer should be. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_dev_unref with drm_dev_putThomas Zimmermann
This patch unifies the naming of DRM functions for reference counting of struct drm_device. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_gem_object_unreference_unlocked with put functionThomas Zimmermann
This patch unifies the naming of DRM functions for reference counting of struct drm_gem_object. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_framebuffer_{un/reference} with put, get functionsThomas Zimmermann
This patch unifies the naming of DRM functions for reference counting of struct drm_framebuffer. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/nvif: remove const attribute from nvif_mclassNick Desaulniers
Similar to commit 0bf8bf50eddc ("module: Remove const attribute from alias for MODULE_DEVICE_TABLE") Fixes many -Wduplicate-decl-specifier warnings due to the combination of const typeof() of already const variables. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/hwmon: potential uninitialized variablesDan Carpenter
Smatch complains that "value" can be uninitialized when kstrtol() returns -ERANGE. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Fix runtime PM leak in drm_open()Lyude Paul
Noticed this as I was skimming through, if we fail to allocate memory for cli we'll end up returning without dropping the runtime PM ref we got. Additionally, we'll even return the wrong return code! (ret most likely will == 0 here, we want -ENOMEM). Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>