summaryrefslogtreecommitdiff
path: root/drivers/cpufreq
AgeCommit message (Collapse)Author
2020-04-24cpufreq: powernv: Fix use-after-freeOliver O'Halloran
commit d0a72efac89d1c35ac55197895201b7b94c5e6ef upstream. The cpufreq driver has a use-after-free that we can hit if: a) There's an OCC message pending when the notifier is registered, and b) The cpufreq driver fails to register with the core. When a) occurs the notifier schedules a workqueue item to handle the message. The backing work_struct is located on chips[].throttle and when b) happens we clean up by freeing the array. Once we get to the (now free) queued item and the kernel crashes. Fixes: c5e29ea7ac14 ("cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200206062622.28235-1-oohall@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04cpufreq: Register drivers only after CPU devices have been registeredViresh Kumar
[ Upstream commit 46770be0cf94149ca48be87719bda1d951066644 ] The cpufreq core heavily depends on the availability of the struct device for CPUs and if they aren't available at the time cpufreq driver is registered, we will never succeed in making cpufreq work. This happens due to following sequence of events: - cpufreq_register_driver() - subsys_interface_register() - return 0; //successful registration of driver ... at a later point of time - register_cpu(); - device_register(); - bus_probe_device(); - sif->add_dev(); - cpufreq_add_dev(); - get_cpu_device(); //FAILS - per_cpu(cpu_sys_devices, num) = &cpu->dev; //used by get_cpu_device() - return 0; //CPU registered successfully Because the per-cpu variable cpu_sys_devices is set only after the CPU device is regsitered, cpufreq will never be able to get it when cpufreq_add_dev() is called. This patch avoids this failure by making sure device structure of at least CPU0 is available when the cpufreq driver is registered, else return -EPROBE_DEFER. Reported-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-28cpufreq: Add NULL checks to show() and store() methods of cpufreqKai Shen
commit e6e8df07268c1f75dd9215536e2ce4587b70f977 upstream. Add NULL checks to show() and store() in cpufreq.c to avoid attempts to invoke a NULL callback. Though some interfaces of cpufreq are set as read-only, users can still get write permission using chmod which can lead to a kernel crash, as follows: chmod +w /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq echo 1 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq This bug was found in linux 4.19. Signed-off-by: Kai Shen <shenkai8@huawei.com> Reported-by: Feilong Lin <linfeilong@huawei.com> Reviewed-by: Feilong Lin <linfeilong@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-28cpufreq: Skip cpufreq resume if it's not suspendedBo Yan
commit 703cbaa601ff3fb554d1246c336ba727cc083ea0 upstream. cpufreq_resume can be called even without preceding cpufreq_suspend. This can happen in following scenario: suspend_devices_and_enter --> dpm_suspend_start --> dpm_prepare --> device_prepare : this function errors out --> dpm_suspend: this is skipped due to dpm_prepare failure this means cpufreq_suspend is skipped over --> goto Recover_platform, due to previous error --> goto Resume_devices --> dpm_resume_end --> dpm_resume --> cpufreq_resume In case schedutil is used as frequency governor, cpufreq_resume will eventually call sugov_start, which does following: memset(sg_cpu, 0, sizeof(*sg_cpu)); .... This effectively erases function pointer for frequency update, causing crash later on. The function pointer would have been set correctly if subsequent cpufreq_add_update_util_hook runs successfully, but that function returns earlier because cpufreq_suspend was not called: if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu))) return; The fix is to check cpufreq_suspended first, if it's false, that means cpufreq_suspend was not called in the first place, so do not resume cpufreq. Signed-off-by: Bo Yan <byan@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Dropped printing a message ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29cpufreq: Avoid cpufreq_suspend() deadlock on system shutdownRafael J. Wysocki
commit 65650b35133ff20f0c9ef0abd5c3c66dbce3ae57 upstream. It is incorrect to set the cpufreq syscore shutdown callback pointer to cpufreq_suspend(), because that function cannot be run in the syscore stage of system shutdown for two reasons: (a) it may attempt to carry out actions depending on devices that have already been shut down at that point and (b) the RCU synchronization carried out by it may not be able to make progress then. The latter issue has been present since commit 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"), but the former one has been there since commit 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown") regardless. Fix that by dropping cpufreq_syscore_ops altogether and making device_shutdown() call cpufreq_suspend() directly before shutting down devices, which is along the lines of what system-wide power management does. Fixes: 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds") Fixes: 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown") Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.0+ <stable@vger.kernel.org> # 4.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()Wen Yang
[ Upstream commit e0a12445d1cb186d875410d093a00d215bec6a89 ] The cpu variable is still being used in the of_get_property() call after the of_node_put() call, which may result in use-after-free. Fixes: a9acc26b75f6 ("cpufreq/pasemi: fix possible object reference leak") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31cpufreq: pmac32: fix possible object reference leakWen Yang
[ Upstream commit 8d10dc28a9ea6e8c02e825dab28699f3c72b02d9 ] The call to of_find_node_by_name returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./drivers/cpufreq/pmac32-cpufreq.c:557:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 552, but without a corresponding object release within this function. ./drivers/cpufreq/pmac32-cpufreq.c:569:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 552, but without a corresponding object release within this function. ./drivers/cpufreq/pmac32-cpufreq.c:598:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 587, but without a corresponding object release within this function. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31cpufreq/pasemi: fix possible object reference leakWen Yang
[ Upstream commit a9acc26b75f652f697e02a9febe2ab0da648a571 ] The call to of_get_cpu_node returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./drivers/cpufreq/pasemi-cpufreq.c:212:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 147, but without a corresponding object release within this function. ./drivers/cpufreq/pasemi-cpufreq.c:220:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 147, but without a corresponding object release within this function. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31cpufreq: ppc_cbe: fix possible object reference leakWen Yang
[ Upstream commit 233298032803f2802fe99892d0de4ab653bfece4 ] The call to of_get_cpu_node returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./drivers/cpufreq/ppc_cbe_cpufreq.c:89:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 76, but without a corresponding object release within this function. ./drivers/cpufreq/ppc_cbe_cpufreq.c:89:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 76, but without a corresponding object release within this function. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31sched/cpufreq: Fix kobject memleakViresh Kumar
[ Upstream commit 9a4f26cc98d81b67ecc23b890c28e2df324e29f3 ] Currently the error return path from kobject_init_and_add() is not followed by a call to kobject_put() - which means we are leaking the kobject. Fix it by adding a call to kobject_put() in the error path of kobject_init_and_add(). Signed-off-by: Tobin C. Harding <tobin@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20190430001144.24890-1-tobin@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-14x86/cpu: Sanitize FAM6_ATOM namingPeter Zijlstra
commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream. Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \ -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \ -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \ -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \ -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \ -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \ -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \ -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \ -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \ -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \ -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 4.9: - Drop changes to CPU IDs that weren't already included - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23cpufreq: pxa2xx: remove incorrect __init annotationArnd Bergmann
commit 9505b98ccddc454008ca7efff90044e3e857c827 upstream. pxa_cpufreq_init_voltages() is marked __init but usually inlined into the non-__init pxa_cpufreq_init() function. When building with clang, it can stay as a standalone function in a discarded section, and produce this warning: WARNING: vmlinux.o(.text+0x616a00): Section mismatch in reference from the function pxa_cpufreq_init() to the function .init.text:pxa_cpufreq_init_voltages() The function pxa_cpufreq_init() references the function __init pxa_cpufreq_init_voltages(). This is often because pxa_cpufreq_init lacks a __init annotation or the annotation of pxa_cpufreq_init_voltages is wrong. Fixes: 50e77fcd790e ("ARM: pxa: remove __init from cpufreq_driver->init()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23cpufreq: tegra124: add missing of_node_put()Yangtao Li
commit 446fae2bb5395f3028d8e3aae1508737e5a72ea1 upstream. of_cpu_device_node_get() will increase the refcount of device_node, it is necessary to call of_node_put() at the end to release the refcount. Fixes: 9eb15dbbfa1a2 ("cpufreq: Add cpufreq driver for Tegra124") Cc: <stable@vger.kernel.org> # 4.4+ Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-13cpufreq: Use struct kobj_attribute instead of struct global_attrViresh Kumar
commit 625c85a62cb7d3c79f6e16de3cfa972033658250 upstream. The cpufreq_global_kobject is created using kobject_create_and_add() helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store routines are set to kobj_attr_show() and kobj_attr_store(). These routines pass struct kobj_attribute as an argument to the show/store callbacks. But all the cpufreq files created using the cpufreq_global_kobject expect the argument to be of type struct attribute. Things work fine currently as no one accesses the "attr" argument. We may not see issues even if the argument is used, as struct kobj_attribute has struct attribute as its first element and so they will both get same address. But this is logically incorrect and we should rather use struct kobj_attribute instead of struct global_attr in the cpufreq core and drivers and the show/store callbacks should take struct kobj_attribute as argument instead. This bug is caught using CFI CLANG builds in android kernel which catches mismatch in function prototypes for such callbacks. Reported-by: Donghee Han <dh.han@samsung.com> Reported-by: Sangkyu Kim <skwith.kim@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20cpufreq: check if policy is inactive early in __cpufreq_get()Sudeep Holla
[ Upstream commit 2f66196208c98b3d1b4294edffb2c5a8197be899 ] cpuinfo_cur_freq gets current CPU frequency as detected by hardware while scaling_cur_freq last known CPU frequency. Some platforms may not allow checking the CPU frequency of an offline CPU or the associated resources may have been released via cpufreq_exit when the CPU gets offlined, in which case the policy would have been invalidated already. If we attempt to get current frequency from the hardware, it may result in hang or crash. For example on Juno, I see: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188 [0000000000000188] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 5 PID: 4202 Comm: cat Not tainted 4.20.0-08251-ga0f2c0318a15-dirty #87 Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform pstate: 40000005 (nZcv daif -PAN -UAO) pc : scmi_cpufreq_get_rate+0x34/0xb0 lr : scmi_cpufreq_get_rate+0x34/0xb0 Call trace: scmi_cpufreq_get_rate+0x34/0xb0 __cpufreq_get+0x34/0xc0 show_cpuinfo_cur_freq+0x24/0x78 show+0x40/0x60 sysfs_kf_seq_show+0xc0/0x148 kernfs_seq_show+0x44/0x50 seq_read+0xd4/0x480 kernfs_fop_read+0x15c/0x208 __vfs_read+0x60/0x188 vfs_read+0x94/0x150 ksys_read+0x6c/0xd8 __arm64_sys_read+0x24/0x30 el0_svc_common+0x78/0x100 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc ---[ end trace 3d1024e58f77f6b2 ]--- So fix the issue by checking if the policy is invalid early in __cpufreq_get before attempting to get the current frequency. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-01cpufreq: imx6q: add return value check for voltage scaleAnson Huang
[ Upstream commit 6ef28a04d1ccf718eee069b72132ce4aa1e52ab9 ] Add return value check for voltage scale when ARM clock rate change fail. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-13cpufreq: dt: Try freeing static OPPs only if we have added themViresh Kumar
[ Upstream commit 51c99dd2c06b234575661fa1e0a1dea6c3ef566f ] We can not call dev_pm_opp_of_cpumask_remove_table() freely anymore since the latest OPP core updates as that uses reference counting to free resources. There are cases where no static OPPs are added (using DT) for a platform and trying to remove the OPP table may end up decrementing refcount which is already zero and hence generating warnings. Lets track if we were able to add static OPPs or not and then only remove the table based on that. Some reshuffling of code is also done to do that. Reported-by: Niklas Cassel <niklas.cassel@linaro.org> Tested-by: Niklas Cassel <niklas.cassel@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20sched/cputime: Convert kcpustat to nsecsFrederic Weisbecker
commit 7fb1327ee9b92fca27662f9b9d60c7c3376d6c69 upstream. Kernel CPU stats are stored in cputime_t which is an architecture defined type, and hence a bit opaque and requiring accessors and mutators for any operation. Converting them to nsecs simplifies the code and is one step toward the removal of cputime_t in the core code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [colona: minor conflict as 527b0a76f41d ("sched/cpuacct: Avoid %lld seq_printf warning") is missing from v4.9] Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26cpufreq: Fix new policy initialization during limits updates via sysfsTao Wang
commit c7d1f119c48f64bebf0fa1e326af577c6152fe30 upstream. If the policy limits are updated via cpufreq_update_policy() and subsequently via sysfs, the limits stored in user_policy may be set incorrectly. For example, if both min and max are set via sysfs to the maximum available frequency, user_policy.min and user_policy.max will also be the maximum. If a policy notifier triggered by cpufreq_update_policy() lowers both the min and the max at this point, that change is not reflected by the user_policy limits, so if the max is updated again via sysfs to the same lower value, then user_policy.max will be lower than user_policy.min which shouldn't happen. In particular, if one of the policy CPUs is then taken offline and back online, cpufreq_set_policy() will fail for it due to a failing limits check. To prevent that from happening, initialize the min and max fields of the new_policy object to the ones stored in user_policy that were previously set via sysfs. Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30cpufreq: Reorder cpufreq_online() error code pathViresh Kumar
[ Upstream commit b24b6478e65f140610ab1ffaadc7bc6bf0be8aad ] Ideally the de-allocation of resources should happen in the exact opposite order in which they were allocated. It helps maintain the code in long term, even if nothing really breaks with incorrect ordering. That wasn't followed in cpufreq_online() and it has some inconsistencies. For example, the symlinks were created from within the locked region while they are removed only after putting the locks. Also ->exit() should have been called only after the symlinks are removed and the lock is dropped, as that was the case when ->init() was first called. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure pathChunyu Hu
[ Upstream commit 55b55abc17f238c61921360e61dde90dd9a326d1 ] Kmemleak reported the below leak. When cppc_cpufreq_init went into failure path, the cpu mask is not freed. After fix, this report is gone. And to avaoid potential NULL pointer reference, check the cpu value first. unreferenced object 0xffff800fd5ea4880 (size 128): comm "swapper/0", pid 1, jiffies 4294939510 (age 668.680s) hex dump (first 32 bytes): 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 .... ........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffff0000082c4ae4>] __kmalloc_node+0x278/0x634 [<ffff0000088f4a74>] alloc_cpumask_var_node+0x28/0x60 [<ffff0000088f4af0>] zalloc_cpumask_var+0x14/0x1c [<ffff000008d20254>] cppc_cpufreq_init+0xd0/0x19c [<ffff000008083828>] do_one_initcall+0xec/0x15c [<ffff000008cd1018>] kernel_init_freeable+0x1f4/0x2a4 [<ffff0000089099b0>] kernel_init+0x18/0x10c [<ffff000008084d50>] ret_from_fork+0x10/0x18 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Chunyu Hu <chuhu@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30cpufreq: CPPC: Initialize shared perf capabilities of CPUsShunyong Yang
[ Upstream commit 8913315e9459b146e5888ab5138e10daa061b885 ] When multiple CPUs are related in one cpufreq policy, the first online CPU will be chosen by default to handle cpufreq operations. Let's take cpu0 and cpu1 as an example. When cpu0 is offline, policy->cpu will be shifted to cpu1. cpu1's perf capabilities should be initialized. Otherwise, perf capabilities are 0s and speed change can not take effect. This patch copies perf capabilities of the first online CPU to other shared CPUs when policy shared type is CPUFREQ_SHARED_TYPE_ANY. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-01cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interruptShilpasri G Bhat
commit c0f7f5b6c69107ca92909512533e70258ee19188 upstream. gpstate_timer_handler() uses synchronous smp_call to set the pstate on the requested core. This causes the below hard lockup: smp_call_function_single+0x110/0x180 (unreliable) smp_call_function_any+0x180/0x250 gpstate_timer_handler+0x1e8/0x580 call_timer_fn+0x50/0x1c0 expire_timers+0x138/0x1f0 run_timer_softirq+0x1e8/0x270 __do_softirq+0x158/0x3e4 irq_exit+0xe8/0x120 timer_interrupt+0x9c/0xe0 decrementer_common+0x114/0x120 -- interrupt: 901 at doorbell_global_ipi+0x34/0x50 LR = arch_send_call_function_ipi_mask+0x120/0x130 arch_send_call_function_ipi_mask+0x4c/0x130 smp_call_function_many+0x340/0x450 pmdp_invalidate+0x98/0xe0 change_huge_pmd+0xe0/0x270 change_protection_range+0xb88/0xe40 mprotect_fixup+0x140/0x340 SyS_mprotect+0x1b4/0x350 system_call+0x58/0x6c One way to avoid this is removing the smp-call. We can ensure that the timer always runs on one of the policy-cpus. If the timer gets migrated to a cpu outside the policy then re-queue it back on the policy->cpus. This way we can get rid of the smp-call which was being used to set the pstate on the policy->cpus. Fixes: 7bc54b652f13 ("timers, cpufreq/powernv: Initialize the gpstate timer as pinned") Cc: stable@vger.kernel.org # v4.8+ Reported-by: Nicholas Piggin <npiggin@gmail.com> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24cpufreq/sh: Replace racy task affinity logicThomas Gleixner
[ Upstream commit 205dcc1ecbc566cbc20acf246e68de3b080b3ecf ] The target() callback must run on the affected cpu. This is achieved by temporarily setting the affinity of the calling thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. concurrent affinity settings for that thread resulting in code executing on the wrong CPU. Replace it by work_on_cpu(). All call pathes which invoke the callbacks are already protected against CPU hotplug. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: linux-pm@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201042.958216363@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()Viresh Kumar
commit 0373ca74831b0f93cd4cdbf7ad3aec3c33a479a5 upstream. commit a307a1e6bc0d "cpufreq: s3c: use cpufreq_generic_init()" accidentally broke cpufreq on s3c2410 and s3c2412. These two platforms don't have a CPU frequency table and used to skip calling cpufreq_table_validate_and_show() for them. But with the above commit, we started calling it unconditionally and that will eventually fail as the frequency table pointer is NULL. Fix this by calling cpufreq_table_validate_and_show() conditionally again. Fixes: a307a1e6bc0d "cpufreq: s3c: use cpufreq_generic_init()" Cc: 3.13+ <stable@vger.kernel.org> # v3.13+ Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_steppingJia Zhang
commit b399151cb48db30ad1e0e93dd40d68c6d007b637 upstream. x86_mask is a confusing name which is hard to associate with the processor's stepping. Additionally, correct an indent issue in lib/cpu.c. Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com> [ Updated it to more recent kernels. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22cpufreq: powernv: Dont assume distinct pstate values for nominal and pminShilpasri G Bhat
commit 3fa4680b860bf48b437d6a2c039789c4abe202ae upstream. Some OpenPOWER boxes can have same pstate values for nominal and pmin pstates. In these boxes the current code will not initialize 'powernv_pstate_info.min' variable and result in erroneous CPU frequency reporting. This patch fixes this problem. Fixes: 09ca4c9b5958 (cpufreq: powernv: Replacing pstate_id with frequency table index) Reported-by: Alvin Wang <wangat@tw.ibm.com> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-03cpufreq: Add Loongson machine dependenciesJames Hogan
[ Upstream commit 0d307935fefa6389eb726c6362351c162c949101 ] The MIPS loongson cpufreq drivers don't build unless configured for the correct machine type, due to dependency on machine specific architecture headers and symbols in machine specific platform code. More specifically loongson1-cpufreq.c uses RST_CPU_EN and RST_CPU, neither of which is defined in asm/mach-loongson32/regs-clk.h unless CONFIG_LOONGSON1_LS1B=y, and loongson2_cpufreq.c references loongson2_clockmod_table[], which is only defined in arch/mips/loongson64/lemote-2f/clock.c, i.e. when CONFIG_LEMOTE_MACH2F=y. Add these dependencies to Kconfig to avoid randconfig / allyesconfig build failures (e.g. when based on BMIPS which also has a cpufreq driver). Signed-off-by: James Hogan <jhogan@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25cpufreq: Fix creation of symbolic links to policy directoriesRafael J. Wysocki
[ Upstream commit 2f0ba790df51721794c11abc7a076d407392f648 ] The cpufreq core only tries to create symbolic links from CPU directories in sysfs to policy directories in cpufreq_add_dev(), either when a given CPU is registered or when the cpufreq driver is registered, whichever happens first. That is not sufficient, however, because cpufreq_add_dev() may be called for an offline CPU whose policy object has not been created yet and, quite obviously, the symbolic cannot be added in that case. Fix that by making cpufreq_online() attempt to add symbolic links to policy objects for the CPUs in the related_cpus mask of every new policy object created by it. The cpufreq_driver_lock locking around the for_each_cpu() loop in cpufreq_online() is dropped, because it is not necessary and the code is somewhat simpler without it. Moreover, failures to create a symbolic link will not be regarded as hard errors any more and the CPUs without those links will not be taken offline automatically, but that should not be problematic in practice. Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-08cpufreq: Do not clear real_cpus mask on policy initRafael J. Wysocki
[ Upstream commit f451014692ae34e587b00de6745e16661cf734d8 ] If new_policy is set in cpufreq_online(), the policy object has just been created and its real_cpus mask has been zeroed on allocation, and the driver's ->init() callback should not touch it. It doesn't need to be cleared again, so don't do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21cpufreq: CPPC: add ACPI_PROCESSOR dependencyArnd Bergmann
[ Upstream commit a578884fa0d2768f13d37c6591a9e1ed600482d3 ] Without the Kconfig dependency, we can get this warning: warning: ACPI_CPPC_CPUFREQ selects ACPI_CPPC_LIB which has unmet direct dependencies (ACPI && ACPI_PROCESSOR) Fixes: 5477fb3bd1e8 (ACPI / CPPC: Add a CPUFreq driver for use with CPPC) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()Rafael J. Wysocki
[ Upstream commit 6e7408acd04d06c04981c0c0fb5a2462b16fae4f ] Fix the debugfs interface for PID tuning to actually update pid_params.sample_rate_ns on PID parameters updates, as changing pid_params.sample_rate_ms via debugfs has no effect now. Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05cpufreq: s3c2416: double free on driver init error pathDan Carpenter
commit a69261e4470d680185a15f748d9cdafb37c57a33 upstream. The "goto err_armclk;" error path already does a clk_put(s3c_freq->hclk); so this is a double free. Fixes: 34ee55075265 ([CPUFREQ] Add S3C2416/S3C2450 cpufreq driver) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24cpufreq: conservative: Allow down_threshold to take values from 1 to 10Tomasz Wilczyński
commit b8e11f7d2791bd9320be1c6e772a60b2aa093e45 upstream. Commit 27ed3cd2ebf4 (cpufreq: conservative: Fix the logic in frequency decrease checking) removed the 10 point substraction when comparing the load against down_threshold but did not remove the related limit for the down_threshold value. As a result, down_threshold lower than 11 is not allowed even though values from 1 to 10 do work correctly too. The comment ("cannot be lower than 11 otherwise freq will not fall") also does not apply after removing the substraction. For this reason, allow down_threshold to take any value from 1 to 99 and fix the related comment. Fixes: 27ed3cd2ebf4 (cpufreq: conservative: Fix the logic in frequency decrease checking) Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14cpufreq: cpufreq_register_driver() should return -ENODEV if init failsDavid Arcari
commit 6c77003677d5f1ce15f26d24360cb66c0bc07bb3 upstream. For a driver that does not set the CPUFREQ_STICKY flag, if all of the ->init() calls fail, cpufreq_register_driver() should return an error. This will prevent the driver from loading. Fixes: ce1bcfe94db8 (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs) Signed-off-by: David Arcari <darcari@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21cpufreq: Bring CPUs up even if cpufreq_online() failedChen Yu
commit c4a3fa261b16858416f1fd7db03a33d7ef5fc0b3 upstream. There is a report that after commit 27622b061eb4 ("cpufreq: Convert to hotplug state machine"), the normal CPU offline/online cycle fails on some platforms. According to the ftrace result, this problem was triggered on platforms using acpi-cpufreq as the default cpufreq driver, and due to the lack of some ACPI freq method (eg. _PCT), cpufreq_online() failed and returned a negative value, so the CPU hotplug state machine rolled back the CPU online process. Actually, from the user's perspective, the failure of cpufreq_online() should not prevent that CPU from being brought up, although cpufreq might not work on that CPU. BTW, during system startup cpufreq_online() is not invoked via CPU online but by the cpufreq device creation process, so the APs can be brought up even though cpufreq_online() fails in that stage. This patch ignores the return value of cpufreq_online/offline() and lets the cpufreq framework deal with the failure. cpufreq_online() itself will do a proper rollback in that case and if _PCT is missing, the ACPI cpufreq driver will print a warning if the corresponding debug options have been enabled. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581 Fixes: 27622b061eb4 ("cpufreq: Convert to hotplug state machine") Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30cpufreq: Restore policy min/max limits on CPU onlineViresh Kumar
commit ff010472fb75670cb5c08671e820eeea3af59c87 upstream. On CPU online the cpufreq core restores the previous governor (or the previous "policy" setting for ->setpolicy drivers), but it does not restore the min/max limits at the same time, which is confusing, inconsistent and real pain for users who set the limits and then suspend/resume the system (using full suspend), in which case the limits are reset on all CPUs except for the boot one. Fix this by making cpufreq_online() restore the limits when an inactive policy is brought online. The commit log and patch are inspired from Rafael's earlier work. Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26cpufreq: Fix and clean up show_cpuinfo_cur_freq()Rafael J. Wysocki
commit 9b4f603e7a9f4282aec451063ffbbb8bb410dcd9 upstream. There is a missing newline in show_cpuinfo_cur_freq(), so add it, but while at it clean that function up somewhat too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14cpufreq: intel_pstate: Disable energy efficiency optimizationSrinivas Pandruvada
commit 6e978b22efa1db9f6e71b24440b5f1d93e968ee3 upstream. Some Kabylake desktop processors may not reach max turbo when running in HWP mode, even if running under sustained 100% utilization. This occurs when the HWP.EPP (Energy Performance Preference) is set to "balance_power" (0x80) -- the default on most systems. It occurs because the platform BIOS may erroneously enable an energy-efficiency setting -- MSR_IA32_POWER_CTL BIT-EE, which is not recommended to be enabled on this SKU. On the failing systems, this BIOS issue was not discovered when the desktop motherboard was tested with Windows, because the BIOS also neglects to provide the ACPI/CPPC table, that Windows requires to enable HWP, and so Windows runs in legacy P-state mode, where this setting has no effect. Linux' intel_pstate driver does not require ACPI/CPPC to enable HWP, and so it runs in HWP mode, exposing this incorrect BIOS configuration. There are several ways to address this problem. First, Linux can also run in legacy P-state mode on this system. As intel_pstate is how Linux enables HWP, booting with "intel_pstate=disable" will run in acpi-cpufreq/ondemand legacy p-state mode. Or second, the "performance" governor can be used with intel_pstate, which will modify HWP.EPP to 0. Or third, starting in 4.10, the /sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference attribute in can be updated from "balance_power" to "performance". Or fourth, apply this patch, which fixes the erroneous setting of MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default configuration to function as designed. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19cpufreq: powernv: Disable preemption while checking CPU throttling stateDenis Kirjanov
commit 8a10c06a20ec8097a68fd7a4a1c0e285095b4d2f upstream. With preemption turned on we can read incorrect throttling state while being switched to CPU on a different chip. BUG: using smp_processor_id() in preemptible [00000000] code: cat/7343 caller is .powernv_cpufreq_throttle_check+0x2c/0x710 CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1 Call Trace: [c0000007d25b75b0] [c000000000971378] .dump_stack+0xe4/0x150 (unreliable) [c0000007d25b7640] [c0000000005162e4] .check_preemption_disabled+0x134/0x150 [c0000007d25b76e0] [c0000000007b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710 [c0000007d25b7790] [c0000000007b6d18] .powernv_cpufreq_target_index+0x288/0x360 [c0000007d25b7870] [c0000000007acee4] .__cpufreq_driver_target+0x394/0x8c0 [c0000007d25b7920] [c0000000007b22ac] .cpufreq_set+0x7c/0xd0 [c0000007d25b79b0] [c0000000007adf50] .store_scaling_setspeed+0x80/0xc0 [c0000007d25b7a40] [c0000000007ae270] .store+0xa0/0x100 [c0000007d25b7ae0] [c0000000003566e8] .sysfs_kf_write+0x88/0xb0 [c0000007d25b7b70] [c0000000003553b8] .kernfs_fop_write+0x178/0x260 [c0000007d25b7c10] [c0000000002ac3cc] .__vfs_write+0x3c/0x1c0 [c0000007d25b7cf0] [c0000000002ad584] .vfs_write+0xc4/0x230 [c0000007d25b7d90] [c0000000002aeef8] .SyS_write+0x58/0x100 [c0000007d25b7e30] [c00000000000bfec] system_call+0x38/0xfc Fixes: 09a972d16209 (cpufreq: powernv: Report cpu frequency throttling) Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06PM / OPP: Pass opp_table to dev_pm_opp_put_regulator()Stephen Boyd
commit 91291d9ad92faa65a56a9a19d658d8049b78d3d4 upstream. Joonyoung Shim reported an interesting problem on his ARM octa-core Odoroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator() was failing for a struct device for which dev_pm_opp_set_regulator() is called earlier. This happened because an earlier call to dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file) removed all the entries from opp_table->dev_list apart from the last CPU device in the cpumask of CPUs sharing the OPP. But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator() routines get CPU device for the first CPU in the cpumask. And so the OPP core failed to find the OPP table for the struct device. This patch attempts to fix this problem by returning a pointer to the opp_table from dev_pm_opp_set_regulator() and using that as the parameter to dev_pm_opp_put_regulator(). This ensures that the dev_pm_opp_put_regulator() doesn't fail to find the opp table. Note that similar design problem also exists with other dev_pm_opp_put_*() APIs, but those aren't used currently by anyone and so we don't need to update them for now. Reported-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ Viresh: Wrote commit log and tested on exynos 5250 ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-29Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes'Rafael J. Wysocki
* pm-cpufreq-fixes: cpufreq: intel_pstate: Always set max P-state in performance mode cpufreq: intel_pstate: Set P-state upfront in performance mode * pm-sleep-fixes: PM / suspend: Fix missing KERN_CONT for suspend message
2016-10-24cpufreq: intel_pstate: Always set max P-state in performance modeRafael J. Wysocki
The only times at which intel_pstate checks the policy set for a given CPU is the initialization of that CPU and updates of its policy settings from cpufreq when intel_pstate_set_policy() is invoked. That is insufficient, however, because intel_pstate uses the same P-state selection function for all CPUs regardless of the policy setting for each of them and the P-state limits are shared between them. Thus if the policy is set to "performance" for a particular CPU, it may not behave as expected if the cpufreq settings are changed subsequently for another CPU. That can be easily demonstrated by writing "performance" to scaling_governor for all CPUs and then switching it to "powersave" for one of them in which case all of the CPUs will behave as though their scaling_governor were all "powersave" (even though the policy still appears to be "performance" for the remaining CPUs). Fix this problem by modifying intel_pstate_adjust_busy_pstate() to always set the P-state to the maximum allowed by the current limits for all CPUs whose policy is set to "performance". Note that it still is recommended to always change the policy setting in the same way for all CPUs even with this fix applied to avoid confusion. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-21cpufreq: intel_pstate: Set P-state upfront in performance modeRafael J. Wysocki
After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) the cpufreq governor callbacks may not be invoked on NOHZ_FULL CPUs and, in particular, switching to the "performance" policy via sysfs may not have any effect on them. That is a problem, because it usually is desirable to squeeze the last bit of performance out of those CPUs, so work around it by setting the maximum P-state (within the limits) in intel_pstate_set_policy() upfront when the policy is CPUFREQ_POLICY_PERFORMANCE. Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-10-14Merge tag 'pm-extra-4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "This includes a couple of fixes for cpufreq regressions introduced in 4.8, a rework of the intel_pstate algorithm used on Atom processors (that took some time to test) plus a fix and a couple of cleanups in that driver, a CPPC cpufreq driver fix, and a some devfreq fixes and cleanups (core and exynos-nocp). Specifics: - Fix two cpufreq regressions causing undesirable changes in behavior to appear (one in the core and one in the conservative governor) introduced during the 4.8 cycle (Aaro Koskinen, Rafael Wysocki). - Fix the way the intel_pstate driver accesses MSRs related to the hardware-managed P-states (HWP) feature during the initialization which currently is unsafe and may cause the processor to generate a general protection fault (Srinivas Pandruvada). - Rework the intel_pstate's P-state selection algorithm used on Atom processors to avoid known problems with the current one and to make the computation more straightforward, which also happens to improve performance in multiple benchmarks a bit (Rafael Wysocki). - Improve two comments in the intel_pstate driver (Rafael Wysocki). - Fix the desired performance computation in the CPPC cpufreq driver (Hoan Tran). - Fix the devfreq core to avoid printing misleading error messages in some cases (Tobias Jakobi). - Fix the error code path in devfreq_add_device() to use proper locking around list modifications (Axel Lin). - Fix a build failure and remove a couple of redundant updates of variables in the exynos-nocp devfreq driver (Axel Lin)" * tag 'pm-extra-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: CPPC: Correct desired_perf calculation cpufreq: conservative: Fix next frequency selection cpufreq: skip invalid entries when searching the frequency cpufreq: intel_pstate: Fix struct pstate_adjust_policy kerneldoc cpufreq: intel_pstate: Proportional algorithm for Atom PM / devfreq: Skip status update on uninitialized previous_freq PM / devfreq: Add proper locking around list_del() PM / devfreq: exynos-nocp: Remove redundant code PM / devfreq: exynos-nocp: Select REGMAP_MMIO cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance() cpufreq: intel_pstate: Fix unsafe HWP MSR access
2016-10-13cpufreq: CPPC: Correct desired_perf calculationHoan Tran
The desired_perf is an abstract performance number. Its value should be in the range of [lowest perf, highest perf] of CPPC. The correct calculation is desired_perf = freq * cppc_highest_perf / cppc_dmi_max_khz And cppc_cpufreq_set_target() returns if desired_perf is exactly the same with the old perf. Signed-off-by: Hoan Tran <hotran@apm.com> Reviewed-by: Prashanth Prakash <pprakash@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-13cpufreq: conservative: Fix next frequency selectionRafael J. Wysocki
Commit d352cf47d93e (cpufreq: conservative: Do not use transition notifications) overlooked the case when the "frequency step" used by the conservative governor is small relative to the distances between the available frequencies and broke the algorithm by using policy->cur instead of the previously requested frequency when computing the next one. As a result, the governor may not be able to go outside of a narrow range between two consecutive available frequencies. Fix the problem by making the governor save the previously requested frequency and select the next one relative that value (unless it is out of range, in which case policy->cur will be used instead). Fixes: d352cf47d93e (cpufreq: conservative: Do not use transition notifications) Link: https://bugzilla.kernel.org/show_bug.cgi?id=177171 Reported-and-tested-by: Aleksey Rybalkin <aleksey@rybalkin.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-12cpufreq: intel_pstate: Fix struct pstate_adjust_policy kerneldocRafael J. Wysocki
It looks like the name of struct pstate_adjust_policy was updated without updating its kerneldoc comment accordingly, so fix that mistake. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-12cpufreq: intel_pstate: Proportional algorithm for AtomRafael J. Wysocki
The PID algorithm used by the intel_pstate driver tends to drive performance to the minimum for workloads with utilization below the setpoint, which is undesirable, so replace it with a modified "proportional" algorithm on Atom. The new algorithm will set the new P-state to be 1.25 times the available maximum times the (frequency-invariant) utilization during the previous sampling period except when the target P-state computed this way is lower than the average P-state during the previous sampling period. In the latter case, it will increase the target by 50% of the difference between it and the average P-state to prevent performance from dropping down too fast in some cases. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-10-09cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance()Rafael J. Wysocki
Make the comment explaining the meaning of the perf_scaled variable in get_target_pstate_use_performance() more straightforward. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>