summaryrefslogtreecommitdiff
path: root/drivers/cpufreq/powernv-cpufreq.c
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2016-07-26 17:29:07 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2016-07-26 17:29:07 -0700
commit6453dbdda30428a3c56568c96fe70ea3612f07e2 (patch)
tree9a3c6087a2832c36e8c49296fb05f95b877e0111 /drivers/cpufreq/powernv-cpufreq.c
parent27b79027bc112a63ad4004eb83c6acacae08a0de (diff)
parentbc841e260c95608921809a2c7481cf6f03bec21a (diff)
Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki: "Again, the majority of changes go into the cpufreq subsystem, but there are no big features this time. The cpufreq changes that stand out somewhat are the governor interface rework and improvements related to the handling of frequency tables. Apart from those, there are fixes and new device/CPU IDs in drivers, cleanups and an improvement of the new schedutil governor. Next, there are some changes in the hibernation core, including a fix for a nasty problem related to the MONITOR/MWAIT usage by CPU offline during resume from hibernation, a few core improvements related to memory management during resume, a couple of additional debug features and cleanups. Finally, we have some fixes and cleanups in the devfreq subsystem, generic power domains framework improvements related to system suspend/resume, support for some new chips in intel_idle and in the power capping RAPL driver, a new version of the AnalyzeSuspend utility and some assorted fixes and cleanups. Specifics: - Rework the cpufreq governor interface to make it more straightforward and modify the conservative governor to avoid using transition notifications (Rafael Wysocki). - Rework the handling of frequency tables by the cpufreq core to make it more efficient (Viresh Kumar). - Modify the schedutil governor to reduce the number of wakeups it causes to occur in cases when the CPU frequency doesn't need to be changed (Steve Muckle, Viresh Kumar). - Fix some minor issues and clean up code in the cpufreq core and governors (Rafael Wysocki, Viresh Kumar). - Add Intel Broxton support to the intel_pstate driver (Srinivas Pandruvada). - Fix problems related to the config TDP feature and to the validity of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka, Srinivas Pandruvada). - Make intel_pstate update the cpu_frequency tracepoint even if the frequency doesn't change to avoid confusing powertop (Rafael Wysocki). - Clean up the usage of __init/__initdata in intel_pstate, mark some of its internal variables as __read_mostly and drop an unused structure element from it (Jisheng Zhang, Carsten Emde). - Clean up the usage of some duplicate MSR symbols in intel_pstate and turbostat (Srinivas Pandruvada). - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay Adiga, Viresh Kumar, Ben Dooks). - Fix a regression (introduced during the 4.5 cycle) in the pcc-cpufreq driver by reverting the problematic commit (Andreas Herrmann). - Add support for Intel Denverton to intel_idle, clean up Broxton support in it and make it explicitly non-modular (Jacob Pan, Jan Beulich, Paul Gortmaker). - Add support for Denverton and Ivy Bridge server to the Intel RAPL power capping driver and make it more careful about the handing of MSRs that may not be present (Jacob Pan, Xiaolong Wang). - Fix resume from hibernation on x86-64 by making the CPU offline during resume avoid using MONITOR/MWAIT in the "play dead" loop which may lead to an inadvertent "revival" of a "dead" CPU and a page fault leading to a kernel crash from it (Rafael Wysocki). - Make memory management during resume from hibernation more straightforward (Rafael Wysocki). - Add debug features that should help to detect problems related to hibernation and resume from it (Rafael Wysocki, Chen Yu). - Clean up hibernation core somewhat (Rafael Wysocki). - Prevent KASAN from instrumenting the hibernation core which leads to large numbers of false-positives from it (James Morse). - Prevent PM (hibernate and suspend) notifiers from being called during the cleanup phase if they have not been called during the corresponding preparation phase which is possible if one of the other notifiers returns an error at that time (Lianwei Wang). - Improve suspend-related debug printout in the tasks freezer and clean up suspend-related console handling (Roger Lu, Borislav Petkov). - Update the AnalyzeSuspend script in the kernel sources to version 4.2 (Todd Brandt). - Modify the generic power domains framework to make it handle system suspend/resume better (Ulf Hansson). - Make the runtime PM framework avoid resuming devices synchronously when user space changes the runtime PM settings for them and improve its error reporting (Rafael Wysocki, Linus Walleij). - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus) and in the core, make some devfreq code explicitly non-modular and change some of it into tristate (Bartlomiej Zolnierkiewicz, Peter Chen, Paul Gortmaker). - Add DT support to the generic PM clocks management code and make it export some more symbols (Jon Hunter, Paul Gortmaker). - Make the PCI PM core code slightly more robust against possible driver errors (Andy Shevchenko). - Make it possible to change DESTDIR and PREFIX in turbostat (Andy Shevchenko)" * tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits) Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency" PM / hibernate: Introduce test_resume mode for hibernation cpufreq: export cpufreq_driver_resolve_freq() cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index() PCI / PM: check all fields in pci_set_platform_pm() cpufreq: acpi-cpufreq: use cached frequency mapping when possible cpufreq: schedutil: map raw required frequency to driver frequency cpufreq: add cpufreq_driver_resolve_freq() cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT intel_pstate: Update cpu_frequency tracepoint every time cpufreq: intel_pstate: clean remnant struct element PM / tools: scripts: AnalyzeSuspend v4.2 x86 / hibernate: Use hlt_play_dead() when resuming from hibernation cpufreq: powernv: Replacing pstate_id with frequency table index intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate() PM / hibernate: Image data protection during restoration PM / hibernate: Add missing braces in __register_nosave_region() PM / hibernate: Clean up comments in snapshot.c PM / hibernate: Clean up function headers in snapshot.c PM / hibernate: Add missing braces in hibernate_setup() ...
Diffstat (limited to 'drivers/cpufreq/powernv-cpufreq.c')
-rw-r--r--drivers/cpufreq/powernv-cpufreq.c181
1 files changed, 103 insertions, 78 deletions
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 6bd715b7f11c..87796e0864e9 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -64,12 +64,14 @@
/**
* struct global_pstate_info - Per policy data structure to maintain history of
* global pstates
- * @highest_lpstate: The local pstate from which we are ramping down
+ * @highest_lpstate_idx: The local pstate index from which we are
+ * ramping down
* @elapsed_time: Time in ms spent in ramping down from
- * highest_lpstate
+ * highest_lpstate_idx
* @last_sampled_time: Time from boot in ms when global pstates were
* last set
- * @last_lpstate,last_gpstate: Last set values for local and global pstates
+ * @last_lpstate_idx, Last set value of local pstate and global
+ * last_gpstate_idx pstate in terms of cpufreq table index
* @timer: Is used for ramping down if cpu goes idle for
* a long time with global pstate held high
* @gpstate_lock: A spinlock to maintain synchronization between
@@ -77,11 +79,11 @@
* governer's target_index calls
*/
struct global_pstate_info {
- int highest_lpstate;
+ int highest_lpstate_idx;
unsigned int elapsed_time;
unsigned int last_sampled_time;
- int last_lpstate;
- int last_gpstate;
+ int last_lpstate_idx;
+ int last_gpstate_idx;
spinlock_t gpstate_lock;
struct timer_list timer;
};
@@ -124,29 +126,47 @@ static int nr_chips;
static DEFINE_PER_CPU(struct chip *, chip_info);
/*
- * Note: The set of pstates consists of contiguous integers, the
- * smallest of which is indicated by powernv_pstate_info.min, the
- * largest of which is indicated by powernv_pstate_info.max.
+ * Note:
+ * The set of pstates consists of contiguous integers.
+ * powernv_pstate_info stores the index of the frequency table for
+ * max, min and nominal frequencies. It also stores number of
+ * available frequencies.
*
- * The nominal pstate is the highest non-turbo pstate in this
- * platform. This is indicated by powernv_pstate_info.nominal.
+ * powernv_pstate_info.nominal indicates the index to the highest
+ * non-turbo frequency.
*/
static struct powernv_pstate_info {
- int min;
- int max;
- int nominal;
- int nr_pstates;
+ unsigned int min;
+ unsigned int max;
+ unsigned int nominal;
+ unsigned int nr_pstates;
} powernv_pstate_info;
+/* Use following macros for conversions between pstate_id and index */
+static inline int idx_to_pstate(unsigned int i)
+{
+ return powernv_freqs[i].driver_data;
+}
+
+static inline unsigned int pstate_to_idx(int pstate)
+{
+ /*
+ * abs() is deliberately used so that is works with
+ * both monotonically increasing and decreasing
+ * pstate values
+ */
+ return abs(pstate - idx_to_pstate(powernv_pstate_info.max));
+}
+
static inline void reset_gpstates(struct cpufreq_policy *policy)
{
struct global_pstate_info *gpstates = policy->driver_data;
- gpstates->highest_lpstate = 0;
+ gpstates->highest_lpstate_idx = 0;
gpstates->elapsed_time = 0;
gpstates->last_sampled_time = 0;
- gpstates->last_lpstate = 0;
- gpstates->last_gpstate = 0;
+ gpstates->last_lpstate_idx = 0;
+ gpstates->last_gpstate_idx = 0;
}
/*
@@ -156,9 +176,10 @@ static inline void reset_gpstates(struct cpufreq_policy *policy)
static int init_powernv_pstates(void)
{
struct device_node *power_mgt;
- int i, pstate_min, pstate_max, pstate_nominal, nr_pstates = 0;
+ int i, nr_pstates = 0;
const __be32 *pstate_ids, *pstate_freqs;
u32 len_ids, len_freqs;
+ u32 pstate_min, pstate_max, pstate_nominal;
power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
if (!power_mgt) {
@@ -208,6 +229,7 @@ static int init_powernv_pstates(void)
return -ENODEV;
}
+ powernv_pstate_info.nr_pstates = nr_pstates;
pr_debug("NR PStates %d\n", nr_pstates);
for (i = 0; i < nr_pstates; i++) {
u32 id = be32_to_cpu(pstate_ids[i]);
@@ -216,15 +238,17 @@ static int init_powernv_pstates(void)
pr_debug("PState id %d freq %d MHz\n", id, freq);
powernv_freqs[i].frequency = freq * 1000; /* kHz */
powernv_freqs[i].driver_data = id;
+
+ if (id == pstate_max)
+ powernv_pstate_info.max = i;
+ else if (id == pstate_nominal)
+ powernv_pstate_info.nominal = i;
+ else if (id == pstate_min)
+ powernv_pstate_info.min = i;
}
+
/* End of list marker entry */
powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
-
- powernv_pstate_info.min = pstate_min;
- powernv_pstate_info.max = pstate_max;
- powernv_pstate_info.nominal = pstate_nominal;
- powernv_pstate_info.nr_pstates = nr_pstates;
-
return 0;
}
@@ -233,12 +257,12 @@ static unsigned int pstate_id_to_freq(int pstate_id)
{
int i;
- i = powernv_pstate_info.max - pstate_id;
+ i = pstate_to_idx(pstate_id);
if (i >= powernv_pstate_info.nr_pstates || i < 0) {
pr_warn("PState id %d outside of PState table, "
"reporting nominal id %d instead\n",
- pstate_id, powernv_pstate_info.nominal);
- i = powernv_pstate_info.max - powernv_pstate_info.nominal;
+ pstate_id, idx_to_pstate(powernv_pstate_info.nominal));
+ i = powernv_pstate_info.nominal;
}
return powernv_freqs[i].frequency;
@@ -252,7 +276,7 @@ static ssize_t cpuinfo_nominal_freq_show(struct cpufreq_policy *policy,
char *buf)
{
return sprintf(buf, "%u\n",
- pstate_id_to_freq(powernv_pstate_info.nominal));
+ powernv_freqs[powernv_pstate_info.nominal].frequency);
}
struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
@@ -426,7 +450,7 @@ static void set_pstate(void *data)
*/
static inline unsigned int get_nominal_index(void)
{
- return powernv_pstate_info.max - powernv_pstate_info.nominal;
+ return powernv_pstate_info.nominal;
}
static void powernv_cpufreq_throttle_check(void *data)
@@ -435,20 +459,22 @@ static void powernv_cpufreq_throttle_check(void *data)
unsigned int cpu = smp_processor_id();
unsigned long pmsr;
int pmsr_pmax;
+ unsigned int pmsr_pmax_idx;
pmsr = get_pmspr(SPRN_PMSR);
chip = this_cpu_read(chip_info);
/* Check for Pmax Capping */
pmsr_pmax = (s8)PMSR_MAX(pmsr);
- if (pmsr_pmax != powernv_pstate_info.max) {
+ pmsr_pmax_idx = pstate_to_idx(pmsr_pmax);
+ if (pmsr_pmax_idx != powernv_pstate_info.max) {
if (chip->throttled)
goto next;
chip->throttled = true;
- if (pmsr_pmax < powernv_pstate_info.nominal) {
- pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
+ if (pmsr_pmax_idx > powernv_pstate_info.nominal) {
+ pr_warn_once("CPU %d on Chip %u has Pmax(%d) reduced below nominal frequency(%d)\n",
cpu, chip->id, pmsr_pmax,
- powernv_pstate_info.nominal);
+ idx_to_pstate(powernv_pstate_info.nominal));
chip->throttle_sub_turbo++;
} else {
chip->throttle_turbo++;
@@ -484,34 +510,35 @@ next:
/**
* calc_global_pstate - Calculate global pstate
- * @elapsed_time: Elapsed time in milliseconds
- * @local_pstate: New local pstate
- * @highest_lpstate: pstate from which its ramping down
+ * @elapsed_time: Elapsed time in milliseconds
+ * @local_pstate_idx: New local pstate
+ * @highest_lpstate_idx: pstate from which its ramping down
*
* Finds the appropriate global pstate based on the pstate from which its
* ramping down and the time elapsed in ramping down. It follows a quadratic
* equation which ensures that it reaches ramping down to pmin in 5sec.
*/
static inline int calc_global_pstate(unsigned int elapsed_time,
- int highest_lpstate, int local_pstate)
+ int highest_lpstate_idx,
+ int local_pstate_idx)
{
- int pstate_diff;
+ int index_diff;
/*
* Using ramp_down_percent we get the percentage of rampdown
* that we are expecting to be dropping. Difference between
- * highest_lpstate and powernv_pstate_info.min will give a absolute
+ * highest_lpstate_idx and powernv_pstate_info.min will give a absolute
* number of how many pstates we will drop eventually by the end of
* 5 seconds, then just scale it get the number pstates to be dropped.
*/
- pstate_diff = ((int)ramp_down_percent(elapsed_time) *
- (highest_lpstate - powernv_pstate_info.min)) / 100;
+ index_diff = ((int)ramp_down_percent(elapsed_time) *
+ (powernv_pstate_info.min - highest_lpstate_idx)) / 100;
/* Ensure that global pstate is >= to local pstate */
- if (highest_lpstate - pstate_diff < local_pstate)
- return local_pstate;
+ if (highest_lpstate_idx + index_diff >= local_pstate_idx)
+ return local_pstate_idx;
else
- return highest_lpstate - pstate_diff;
+ return highest_lpstate_idx + index_diff;
}
static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
@@ -546,7 +573,7 @@ void gpstate_timer_handler(unsigned long data)
{
struct cpufreq_policy *policy = (struct cpufreq_policy *)data;
struct global_pstate_info *gpstates = policy->driver_data;
- int gpstate_id;
+ int gpstate_idx;
unsigned int time_diff = jiffies_to_msecs(jiffies)
- gpstates->last_sampled_time;
struct powernv_smp_call_data freq_data;
@@ -556,29 +583,29 @@ void gpstate_timer_handler(unsigned long data)
gpstates->last_sampled_time += time_diff;
gpstates->elapsed_time += time_diff;
- freq_data.pstate_id = gpstates->last_lpstate;
+ freq_data.pstate_id = idx_to_pstate(gpstates->last_lpstate_idx);
- if ((gpstates->last_gpstate == freq_data.pstate_id) ||
+ if ((gpstates->last_gpstate_idx == gpstates->last_lpstate_idx) ||
(gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
- gpstate_id = freq_data.pstate_id;
+ gpstate_idx = pstate_to_idx(freq_data.pstate_id);
reset_gpstates(policy);
- gpstates->highest_lpstate = freq_data.pstate_id;
+ gpstates->highest_lpstate_idx = gpstate_idx;
} else {
- gpstate_id = calc_global_pstate(gpstates->elapsed_time,
- gpstates->highest_lpstate,
- freq_data.pstate_id);
+ gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
+ gpstates->highest_lpstate_idx,
+ freq_data.pstate_id);
}
/*
* If local pstate is equal to global pstate, rampdown is over
* So timer is not required to be queued.
*/
- if (gpstate_id != freq_data.pstate_id)
+ if (gpstate_idx != gpstates->last_lpstate_idx)
queue_gpstate_timer(gpstates);
- freq_data.gpstate_id = gpstate_id;
- gpstates->last_gpstate = freq_data.gpstate_id;
- gpstates->last_lpstate = freq_data.pstate_id;
+ freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
+ gpstates->last_gpstate_idx = pstate_to_idx(freq_data.gpstate_id);
+ gpstates->last_lpstate_idx = pstate_to_idx(freq_data.pstate_id);
spin_unlock(&gpstates->gpstate_lock);
@@ -595,7 +622,7 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
unsigned int new_index)
{
struct powernv_smp_call_data freq_data;
- unsigned int cur_msec, gpstate_id;
+ unsigned int cur_msec, gpstate_idx;
struct global_pstate_info *gpstates = policy->driver_data;
if (unlikely(rebooting) && new_index != get_nominal_index())
@@ -607,15 +634,15 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
cur_msec = jiffies_to_msecs(get_jiffies_64());
spin_lock(&gpstates->gpstate_lock);
- freq_data.pstate_id = powernv_freqs[new_index].driver_data;
+ freq_data.pstate_id = idx_to_pstate(new_index);
if (!gpstates->last_sampled_time) {
- gpstate_id = freq_data.pstate_id;
- gpstates->highest_lpstate = freq_data.pstate_id;
+ gpstate_idx = new_index;
+ gpstates->highest_lpstate_idx = new_index;
goto gpstates_done;
}
- if (gpstates->last_gpstate > freq_data.pstate_id) {
+ if (gpstates->last_gpstate_idx < new_index) {
gpstates->elapsed_time += cur_msec -
gpstates->last_sampled_time;
@@ -626,34 +653,34 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
*/
if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
reset_gpstates(policy);
- gpstates->highest_lpstate = freq_data.pstate_id;
- gpstate_id = freq_data.pstate_id;
+ gpstates->highest_lpstate_idx = new_index;
+ gpstate_idx = new_index;
} else {
/* Elaspsed_time is less than 5 seconds, continue to rampdown */
- gpstate_id = calc_global_pstate(gpstates->elapsed_time,
- gpstates->highest_lpstate,
- freq_data.pstate_id);
+ gpstate_idx = calc_global_pstate(gpstates->elapsed_time,
+ gpstates->highest_lpstate_idx,
+ new_index);
}
} else {
reset_gpstates(policy);
- gpstates->highest_lpstate = freq_data.pstate_id;
- gpstate_id = freq_data.pstate_id;
+ gpstates->highest_lpstate_idx = new_index;
+ gpstate_idx = new_index;
}
/*
* If local pstate is equal to global pstate, rampdown is over
* So timer is not required to be queued.
*/
- if (gpstate_id != freq_data.pstate_id)
+ if (gpstate_idx != new_index)
queue_gpstate_timer(gpstates);
else
del_timer_sync(&gpstates->timer);
gpstates_done:
- freq_data.gpstate_id = gpstate_id;
+ freq_data.gpstate_id = idx_to_pstate(gpstate_idx);
gpstates->last_sampled_time = cur_msec;
- gpstates->last_gpstate = freq_data.gpstate_id;
- gpstates->last_lpstate = freq_data.pstate_id;
+ gpstates->last_gpstate_idx = gpstate_idx;
+ gpstates->last_lpstate_idx = new_index;
spin_unlock(&gpstates->gpstate_lock);
@@ -759,9 +786,7 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
struct cpufreq_policy policy;
cpufreq_get_policy(&policy, cpu);
- cpufreq_frequency_table_target(&policy, policy.freq_table,
- policy.cur,
- CPUFREQ_RELATION_C, &index);
+ index = cpufreq_table_find_index_c(&policy, policy.cur);
powernv_cpufreq_target_index(&policy, index);
cpumask_andnot(&mask, &mask, policy.cpus);
}
@@ -847,8 +872,8 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
struct powernv_smp_call_data freq_data;
struct global_pstate_info *gpstates = policy->driver_data;
- freq_data.pstate_id = powernv_pstate_info.min;
- freq_data.gpstate_id = powernv_pstate_info.min;
+ freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
+ freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
del_timer_sync(&gpstates->timer);
}