summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2016-12-13 10:41:53 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2016-12-13 10:41:53 -0800
commit7b9dc3f75fc8be046e76387a22a21f421ce55b53 (patch)
treedd42312eebdcb5273461b304384d49a7e7e5fa73 /Documentation
parent36869cb93d36269f34800b3384ba7991060a69cf (diff)
parentbbc17bb8a89b3eb31520abf3a9b362d5ee54f908 (diff)
Merge tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki: "Again, cpufreq gets more changes than the other parts this time (one new driver, one old driver less, a bunch of enhancements of the existing code, new CPU IDs, fixes, cleanups) There also are some changes in cpuidle (idle injection rework, a couple of new CPU IDs, online/offline rework in intel_idle, fixes and cleanups), in the generic power domains framework (mostly related to supporting power domains containing CPUs), and in the Operating Performance Points (OPP) library (mostly related to supporting devices with multiple voltage regulators) In addition to that, the system sleep state selection interface is modified to make it easier for distributions with unchanged user space to support suspend-to-idle as the default system suspend method, some issues are fixed in the PM core, the latency tolerance PM QoS framework is improved a bit, the Intel RAPL power capping driver is cleaned up and there are some fixes and cleanups in the devfreq subsystem Specifics: - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding for it (Markus Mayer) - Support for ARM Integrator/AP and Integrator/CP in the generic DT cpufreq driver and elimination of the old Integrator cpufreq driver (Linus Walleij) - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier, and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie, Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik) - cpufreq core fix to eliminate races that may lead to using inactive policy objects and related cleanups (Rafael Wysocki) - cpufreq schedutil governor update to make it use SCHED_FIFO kernel threads (instead of regular workqueues) for doing delayed work (to reduce the response latency in some cases) and related cleanups (Viresh Kumar) - New cpufreq sysfs attribute for resetting statistics (Markus Mayer) - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis, Viresh Kumar) - Support for using generic cpufreq governors in the intel_pstate driver (Rafael Wysocki) - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy Performance Preference/Energy Performance Bias) knobs in the intel_pstate driver (Srinivas Pandruvada) - New CPU ID for Knights Mill in intel_pstate (Piotr Luc) - intel_pstate driver modification to use the P-state selection algorithm based on CPU load on platforms with the system profile in the ACPI tables set to "mobile" (Srinivas Pandruvada) - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki, Srinivas Pandruvada) - cpufreq powernv driver updates including fast switching support (for the schedutil governor), fixes and cleanus (Akshay Adiga, Andrew Donnellan, Denis Kirjanov) - acpi-cpufreq driver rework to switch it over to the new CPU offline/online state machine (Sebastian Andrzej Siewior) - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth Prakash) - Idle injection rework (to make it use the regular idle path instead of a home-grown custom one) and related powerclamp thermal driver updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej Siewior) - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy Shevchenko, Piotr Luc) - intel_idle driver cleanups and switch over to using the new CPU offline/online state machine (Anna-Maria Gleixner, Sebastian Andrzej Siewior) - cpuidle DT driver update to support suspend-to-idle properly (Sudeep Holla) - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian, Rafael Wysocki) - Preliminary support for power domains including CPUs in the generic power domains (genpd) framework and related DT bindings (Lina Iyer) - Assorted fixes and cleanups in the generic power domains (genpd) framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven) - Preliminary support for devices with multiple voltage regulators and related fixes and cleanups in the Operating Performance Points (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd) - System sleep state selection interface rework to make it easier to support suspend-to-idle as the default system suspend method (Rafael Wysocki) - PM core fixes and cleanups, mostly related to the interactions between the system suspend and runtime PM frameworks (Ulf Hansson, Sahitya Tummala, Tony Lindgren) - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski) - New Knights Mill CPU ID for the Intel RAPL power capping driver (Piotr Luc) - Intel RAPL power capping driver fixes, cleanups and switch over to using the new CPU offline/online state machine (Jacob Pan, Thomas Gleixner, Sebastian Andrzej Siewior) - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc, rockchip-dfi devfreq drivers and the devfreq core (Axel Lin, Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar) - Fix for false-positive KASAN warnings during resume from ACPI S3 (suspend-to-RAM) on x86 (Josh Poimboeuf) - Memory map verification during resume from hibernation on x86 to ensure a consistent address space layout (Chen Yu) - Wakeup sources debugging enhancement (Xing Wei) - rockchip-io AVS driver cleanup (Shawn Lin)" * tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits) devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks devfreq: rk3399_dmc: Remove dangling rcu_read_unlock() devfreq: exynos: Don't use OPP structures outside of RCU locks Documentation: intel_pstate: Document HWP energy/performance hints cpufreq: intel_pstate: Support for energy performance hints with HWP cpufreq: intel_pstate: Add locking around HWP requests PM / sleep: Print active wakeup sources when blocking on wakeup_count reads PM / core: Fix bug in the error handling of async suspend PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend PM / Domains: Fix compatible for domain idle state PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators() PM / OPP: Allow platform specific custom set_opp() callbacks PM / OPP: Separate out _generic_set_opp() PM / OPP: Add infrastructure to manage multiple regulators PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage() PM / OPP: Manage supply's voltage/current in a separate structure PM / OPP: Don't use OPP structure outside of rcu protected section PM / OPP: Reword binding supporting multiple regulators per device PM / OPP: Fix incorrect cpu-supply property in binding cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state() ..
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-power45
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt22
-rw-r--r--Documentation/cpu-freq/cpufreq-stats.txt6
-rw-r--r--Documentation/cpu-freq/intel-pstate.txt54
-rw-r--r--Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt78
-rw-r--r--Documentation/devicetree/bindings/opp/opp.txt27
-rw-r--r--Documentation/devicetree/bindings/power/domain-idle-state.txt33
-rw-r--r--Documentation/devicetree/bindings/power/power_domain.txt43
-rw-r--r--Documentation/power/devices.txt14
-rw-r--r--Documentation/power/states.txt62
10 files changed, 321 insertions, 63 deletions
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index 50b368d490b5..f523e5a3ac33 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -7,30 +7,35 @@ Description:
subsystem.
What: /sys/power/state
-Date: May 2014
+Date: November 2016
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
The /sys/power/state file controls system sleep states.
Reading from this file returns the available sleep state
- labels, which may be "mem", "standby", "freeze" and "disk"
- (hibernation). The meanings of the first three labels depend on
- the relative_sleep_states command line argument as follows:
- 1) relative_sleep_states = 1
- "mem", "standby", "freeze" represent non-hibernation sleep
- states from the deepest ("mem", always present) to the
- shallowest ("freeze"). "standby" and "freeze" may or may
- not be present depending on the capabilities of the
- platform. "freeze" can only be present if "standby" is
- present.
- 2) relative_sleep_states = 0 (default)
- "mem" - "suspend-to-RAM", present if supported.
- "standby" - "power-on suspend", present if supported.
- "freeze" - "suspend-to-idle", always present.
-
- Writing to this file one of these strings causes the system to
- transition into the corresponding state, if available. See
- Documentation/power/states.txt for a description of what
- "suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean.
+ labels, which may be "mem" (suspend), "standby" (power-on
+ suspend), "freeze" (suspend-to-idle) and "disk" (hibernation).
+
+ Writing one of the above strings to this file causes the system
+ to transition into the corresponding state, if available.
+
+ See Documentation/power/states.txt for more information.
+
+What: /sys/power/mem_sleep
+Date: November 2016
+Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
+Description:
+ The /sys/power/mem_sleep file controls the operating mode of
+ system suspend. Reading from it returns the available modes
+ as "s2idle" (always present), "shallow" and "deep" (present if
+ supported). The mode that will be used on subsequent attempts
+ to suspend the system (by writing "mem" to the /sys/power/state
+ file described above) is enclosed in square brackets.
+
+ Writing one of the above strings to this file causes the mode
+ represented by it to be used on subsequent attempts to suspend
+ the system.
+
+ See Documentation/power/states.txt for more information.
What: /sys/power/disk
Date: September 2006
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 62d68b2056de..be2d6d0a03a4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1560,6 +1560,12 @@
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
+ passive
+ Use intel_pstate as a scaling driver, but configure it
+ to work with generic cpufreq governors (instead of
+ enabling its internal governor). This mode cannot be
+ used along with the hardware-managed P-states (HWP)
+ feature.
force
Enable intel_pstate on systems that prohibit it by default
in favor of acpi-cpufreq. Forcing the intel_pstate driver
@@ -1580,6 +1586,9 @@
Description Table, specifies preferred power management
profile as "Enterprise Server" or "Performance Server",
then this feature is turned on by default.
+ per_cpu_perf_limits
+ Allow per-logical-CPU P-State performance control limits using
+ cpufreq sysfs interface
intremap= [X86-64, Intel-IOMMU]
on enable Interrupt Remapping (default)
@@ -2122,6 +2131,12 @@
memory contents and reserves bad memory
regions that are detected.
+ mem_sleep_default= [SUSPEND] Default system suspend mode:
+ s2idle - Suspend-To-Idle
+ shallow - Power-On Suspend or equivalent (if supported)
+ deep - Suspend-To-RAM or equivalent (if supported)
+ See Documentation/power/states.txt.
+
meye.*= [HW] Set MotionEye Camera parameters
See Documentation/video4linux/meye.txt.
@@ -3475,13 +3490,6 @@
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/cgroup-v1/cpusets.txt.
- relative_sleep_states=
- [SUSPEND] Use sleep state labeling where the deepest
- state available other than hibernation is always "mem".
- Format: { "0" | "1" }
- 0 -- Traditional sleep state labels.
- 1 -- Relative sleep state labels.
-
reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
reservetop= [X86-32]
diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt
index 8d9773f23550..3c355f6ad834 100644
--- a/Documentation/cpu-freq/cpufreq-stats.txt
+++ b/Documentation/cpu-freq/cpufreq-stats.txt
@@ -44,11 +44,17 @@ the stats driver insertion.
total 0
drwxr-xr-x 2 root root 0 May 14 16:06 .
drwxr-xr-x 3 root root 0 May 14 15:58 ..
+--w------- 1 root root 4096 May 14 16:06 reset
-r--r--r-- 1 root root 4096 May 14 16:06 time_in_state
-r--r--r-- 1 root root 4096 May 14 16:06 total_trans
-r--r--r-- 1 root root 4096 May 14 16:06 trans_table
--------------------------------------------------------------------------------
+- reset
+Write-only attribute that can be used to reset the stat counters. This can be
+useful for evaluating system behaviour under different governors without the
+need for a reboot.
+
- time_in_state
This gives the amount of time spent in each of the frequencies supported by
this CPU. The cat output will have "<frequency> <time>" pair in each line, which
diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt
index e6bd1e6512a5..1953994ef5e6 100644
--- a/Documentation/cpu-freq/intel-pstate.txt
+++ b/Documentation/cpu-freq/intel-pstate.txt
@@ -48,7 +48,7 @@ In addition to the frequency-controlling interfaces provided by the cpufreq
core, the driver provides its own sysfs files to control the P-State selection.
These files have been added to /sys/devices/system/cpu/intel_pstate/.
Any changes made to these files are applicable to all CPUs (even in a
-multi-package system).
+multi-package system, Refer to later section on placing "Per-CPU limits").
max_perf_pct: Limits the maximum P-State that will be requested by
the driver. It states it as a percentage of the available performance. The
@@ -120,13 +120,57 @@ frequency is fictional for Intel Core processors. Even if the scaling
driver selects a single P-State, the actual frequency the processor
will run at is selected by the processor itself.
+Per-CPU limits
+
+The kernel command line option "intel_pstate=per_cpu_perf_limits" forces
+the intel_pstate driver to use per-CPU performance limits. When it is set,
+the sysfs control interface described above is subject to limitations.
+- The following controls are not available for both read and write
+ /sys/devices/system/cpu/intel_pstate/max_perf_pct
+ /sys/devices/system/cpu/intel_pstate/min_perf_pct
+- The following controls can be used to set performance limits, as far as the
+architecture of the processor permits:
+ /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
+ /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
+ /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
+- User can still observe turbo percent and number of P-States from
+ /sys/devices/system/cpu/intel_pstate/turbo_pct
+ /sys/devices/system/cpu/intel_pstate/num_pstates
+- User can read write system wide turbo status
+ /sys/devices/system/cpu/no_turbo
+
+Support of energy performance hints
+It is possible to provide hints to the HWP algorithms in the processor
+to be more performance centric to more energy centric. When the driver
+is using HWP, two additional cpufreq sysfs attributes are presented for
+each logical CPU.
+These attributes are:
+ - energy_performance_available_preferences
+ - energy_performance_preference
+
+To get list of supported hints:
+$ cat energy_performance_available_preferences
+ default performance balance_performance balance_power power
+
+The current preference can be read or changed via cpufreq sysfs
+attribute "energy_performance_preference". Reading from this attribute
+will display current effective setting. User can write any of the valid
+preference string to this attribute. User can always restore to power-on
+default by writing "default".
+
+Since threads can migrate to different CPUs, this is possible that the
+new CPU may have different energy performance preference than the previous
+one. To avoid such issues, either threads can be pinned to specific CPUs
+or set the same energy performance preference value to all CPUs.
+
Tuning Intel P-State driver
-When HWP mode is not used, debugfs files have also been added to allow the
-tuning of the internal governor algorithm. These files are located at
-/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional
-Integral Derivative) controller. The PID tunable parameters are:
+When the performance can be tuned using PID (Proportional Integral
+Derivative) controller, debugfs files are provided for adjusting performance.
+They are presented under:
+/sys/kernel/debug/pstate_snb/
+The PID tunable parameters are:
deadband
d_gain_pct
i_gain_pct
diff --git a/Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt b/Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt
new file mode 100644
index 000000000000..af2385795d78
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/brcm,stb-avs-cpu-freq.txt
@@ -0,0 +1,78 @@
+Broadcom AVS mail box and interrupt register bindings
+=====================================================
+
+A total of three DT nodes are required. One node (brcm,avs-cpu-data-mem)
+references the mailbox register used to communicate with the AVS CPU[1]. The
+second node (brcm,avs-cpu-l2-intr) is required to trigger an interrupt on
+the AVS CPU. The interrupt tells the AVS CPU that it needs to process a
+command sent to it by a driver. Interrupting the AVS CPU is mandatory for
+commands to be processed.
+
+The interface also requires a reference to the AVS host interrupt controller,
+so a driver can react to interrupts generated by the AVS CPU whenever a command
+has been processed. See [2] for more information on the brcm,l2-intc node.
+
+[1] The AVS CPU is an independent co-processor that runs proprietary
+firmware. On some SoCs, this firmware supports DFS and DVFS in addition to
+Adaptive Voltage Scaling.
+
+[2] Documentation/devicetree/bindings/interrupt-controller/brcm,l2-intc.txt
+
+
+Node brcm,avs-cpu-data-mem
+--------------------------
+
+Required properties:
+- compatible: must include: brcm,avs-cpu-data-mem and
+ should include: one of brcm,bcm7271-avs-cpu-data-mem or
+ brcm,bcm7268-avs-cpu-data-mem
+- reg: Specifies base physical address and size of the registers.
+- interrupts: The interrupt that the AVS CPU will use to interrupt the host
+ when a command completed.
+- interrupt-parent: The interrupt controller the above interrupt is routed
+ through.
+- interrupt-names: The name of the interrupt used to interrupt the host.
+
+Optional properties:
+- None
+
+Node brcm,avs-cpu-l2-intr
+-------------------------
+
+Required properties:
+- compatible: must include: brcm,avs-cpu-l2-intr and
+ should include: one of brcm,bcm7271-avs-cpu-l2-intr or
+ brcm,bcm7268-avs-cpu-l2-intr
+- reg: Specifies base physical address and size of the registers.
+
+Optional properties:
+- None
+
+
+Example
+=======
+
+ avs_host_l2_intc: interrupt-controller@f04d1200 {
+ #interrupt-cells = <1>;
+ compatible = "brcm,l2-intc";
+ interrupt-parent = <&intc>;
+ reg = <0xf04d1200 0x48>;
+ interrupt-controller;
+ interrupts = <0x0 0x19 0x0>;
+ interrupt-names = "avs";
+ };
+
+ avs-cpu-data-mem@f04c4000 {
+ compatible = "brcm,bcm7271-avs-cpu-data-mem",
+ "brcm,avs-cpu-data-mem";
+ reg = <0xf04c4000 0x60>;
+ interrupts = <0x1a>;
+ interrupt-parent = <&avs_host_l2_intc>;
+ interrupt-names = "sw_intr";
+ };
+
+ avs-cpu-l2-intr@f04d1100 {
+ compatible = "brcm,bcm7271-avs-cpu-l2-intr",
+ "brcm,avs-cpu-l2-intr";
+ reg = <0xf04d1100 0x10>;
+ };
diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
index ee91cbdd95ee..9f5ca4457b5f 100644
--- a/Documentation/devicetree/bindings/opp/opp.txt
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -86,8 +86,14 @@ Optional properties:
Single entry is for target voltage and three entries are for <target min max>
voltages.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node.
+ Entries for multiple regulators shall be provided in the same field separated
+ by angular brackets <>. The OPP binding doesn't provide any provisions to
+ relate the values to their power supplies or the order in which the supplies
+ need to be configured and that is left for the implementation specific
+ binding.
+
+ Entries for all regulators shall be of the same size, i.e. either all use a
+ single value or triplets.
- opp-microvolt-<name>: Named opp-microvolt property. This is exactly similar to
the above opp-microvolt property, but allows multiple voltage ranges to be
@@ -104,10 +110,13 @@ Optional properties:
Should only be set if opp-microvolt is set for the OPP.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node. If this property isn't required
- for few regulators, then this should be marked as zero for them. If it isn't
- required for any regulator, then this property need not be present.
+ Entries for multiple regulators shall be provided in the same field separated
+ by angular brackets <>. If current values aren't required for a regulator,
+ then it shall be filled with 0. If current values aren't required for any of
+ the regulators, then this field is not required. The OPP binding doesn't
+ provide any provisions to relate the values to their power supplies or the
+ order in which the supplies need to be configured and that is left for the
+ implementation specific binding.
- opp-microamp-<name>: Named opp-microamp property. Similar to
opp-microvolt-<name> property, but for microamp instead.
@@ -386,10 +395,12 @@ Example 4: Handling multiple regulators
/ {
cpus {
cpu@0 {
- compatible = "arm,cortex-a7";
+ compatible = "vendor,cpu-type";
...
- cpu-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
+ vcc0-supply = <&cpu_supply0>;
+ vcc1-supply = <&cpu_supply1>;
+ vcc2-supply = <&cpu_supply2>;
operating-points-v2 = <&cpu0_opp_table>;
};
};
diff --git a/Documentation/devicetree/bindings/power/domain-idle-state.txt b/Documentation/devicetree/bindings/power/domain-idle-state.txt
new file mode 100644
index 000000000000..eefc7ed22ca2
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/domain-idle-state.txt
@@ -0,0 +1,33 @@
+PM Domain Idle State Node:
+
+A domain idle state node represents the state parameters that will be used to
+select the state when there are no active components in the domain.
+
+The state node has the following parameters -
+
+- compatible:
+ Usage: Required
+ Value type: <string>
+ Definition: Must be "domain-idle-state".
+
+- entry-latency-us
+ Usage: Required
+ Value type: <prop-encoded-array>
+ Definition: u32 value representing worst case latency in
+ microseconds required to enter the idle state.
+ The exit-latency-us duration may be guaranteed
+ only after entry-latency-us has passed.
+
+- exit-latency-us
+ Usage: Required
+ Value type: <prop-encoded-array>
+ Definition: u32 value representing worst case latency
+ in microseconds required to exit the idle state.
+
+- min-residency-us
+ Usage: Required
+ Value type: <prop-encoded-array>
+ Definition: u32 value representing minimum residency duration
+ in microseconds after which the idle state will yield
+ power benefits after overcoming the overhead in entering
+i the idle state.
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
index 025b5e7df61c..723e1ad937da 100644
--- a/Documentation/devicetree/bindings/power/power_domain.txt
+++ b/Documentation/devicetree/bindings/power/power_domain.txt
@@ -29,6 +29,15 @@ Optional properties:
specified by this binding. More details about power domain specifier are
available in the next section.
+- domain-idle-states : A phandle of an idle-state that shall be soaked into a
+ generic domain power state. The idle state definitions are
+ compatible with domain-idle-state specified in [1].
+ The domain-idle-state property reflects the idle state of this PM domain and
+ not the idle states of the devices or sub-domains in the PM domain. Devices
+ and sub-domains have their own idle-states independent of the parent
+ domain's idle states. In the absence of this property, the domain would be
+ considered as capable of being powered-on or powered-off.
+
Example:
power: power-controller@12340000 {
@@ -59,6 +68,38 @@ The nodes above define two power controllers: 'parent' and 'child'.
Domains created by the 'child' power controller are subdomains of '0' power
domain provided by the 'parent' power controller.
+Example 3:
+ parent: power-controller@12340000 {
+ compatible = "foo,power-controller";
+ reg = <0x12340000 0x1000>;
+ #power-domain-cells = <0>;
+ domain-idle-states = <&DOMAIN_RET>, <&DOMAIN_PWR_DN>;
+ };
+
+ child: power-controller@12341000 {
+ compatible = "foo,power-controller";
+ reg = <0x12341000 0x1000>;
+ power-domains = <&parent 0>;
+ #power-domain-cells = <0>;
+ domain-idle-states = <&DOMAIN_PWR_DN>;
+ };
+
+ DOMAIN_RET: state@0 {
+ compatible = "domain-idle-state";
+ reg = <0x0>;
+ entry-latency-us = <1000>;
+ exit-latency-us = <2000>;
+ min-residency-us = <10000>;
+ };
+
+ DOMAIN_PWR_DN: state@1 {
+ compatible = "domain-idle-state";
+ reg = <0x1>;
+ entry-latency-us = <5000>;
+ exit-latency-us = <8000>;
+ min-residency-us = <7000>;
+ };
+
==PM domain consumers==
Required properties:
@@ -76,3 +117,5 @@ Example:
The node above defines a typical PM domain consumer device, which is located
inside a PM domain with index 0 of a power controller represented by a node
with the label "power".
+
+[1]. Documentation/devicetree/bindings/power/domain-idle-state.txt
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 8ba6625fdd63..73ddea39a9ce 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -607,7 +607,9 @@ individually. Instead, a set of devices sharing a power resource can be put
into a low-power state together at the same time by turning off the shared
power resource. Of course, they also need to be put into the full-power state
together, by turning the shared power resource on. A set of devices with this
-property is often referred to as a power domain.
+property is often referred to as a power domain. A power domain may also be
+nested inside another power domain. The nested domain is referred to as the
+sub-domain of the parent domain.
Support for power domains is provided through the pm_domain field of struct
device. This field is a pointer to an object of type struct dev_pm_domain,
@@ -629,6 +631,16 @@ support for power domains into subsystem-level callbacks, for example by
modifying the platform bus type. Other platforms need not implement it or take
it into account in any way.
+Devices may be defined as IRQ-safe which indicates to the PM core that their
+runtime PM callbacks may be invoked with disabled interrupts (see
+Documentation/power/runtime_pm.txt for more information). If an IRQ-safe
+device belongs to a PM domain, the runtime PM of the domain will be
+disallowed, unless the domain itself is defined as IRQ-safe. However, it
+makes sense to define a PM domain as IRQ-safe only if all the devices in it
+are IRQ-safe. Moreover, if an IRQ-safe domain has a parent domain, the runtime
+PM of the parent is only allowed if the parent itself is IRQ-safe too with the
+additional restriction that all child domains of an IRQ-safe parent must also
+be IRQ-safe.
Device Low Power (suspend) States
---------------------------------
diff --git a/Documentation/power/states.txt b/Documentation/power/states.txt
index 50f3ef9177c1..8a39ce45d8a0 100644
--- a/Documentation/power/states.txt
+++ b/Documentation/power/states.txt
@@ -8,25 +8,43 @@ for each state.
The states are represented by strings that can be read or written to the
/sys/power/state file. Those strings may be "mem", "standby", "freeze" and
-"disk", where the last one always represents hibernation (Suspend-To-Disk) and
-the meaning of the remaining ones depends on the relative_sleep_states command
-line argument.
-
-For relative_sleep_states=1, the strings "mem", "standby" and "freeze" label the
-available non-hibernation sleep states from the deepest to the shallowest,
-respectively. In that case, "mem" is always present in /sys/power/state,
-because there is at least one non-hibernation sleep state in every system. If
-the given system supports two non-hibernation sleep states, "standby" is present
-in /sys/power/state in addition to "mem". If the system supports three
-non-hibernation sleep states, "freeze" will be present in /sys/power/state in
-addition to "mem" and "standby".
-
-For relative_sleep_states=0, which is the default, the following descriptions
-apply.
-
-state: Suspend-To-Idle
+"disk", where the last three always represent Power-On Suspend (if supported),
+Suspend-To-Idle and hibernation (Suspend-To-Disk), respectively.
+
+The meaning of the "mem" string is controlled by the /sys/power/mem_sleep file.
+It contains strings representing the available modes of system suspend that may
+be triggered by writing "mem" to /sys/power/state. These modes are "s2idle"
+(Suspend-To-Idle), "shallow" (Power-On Suspend) and "deep" (Suspend-To-RAM).
+The "s2idle" mode is always available, while the other ones are only available
+if supported by the platform (if not supported, the strings representing them
+are not present in /sys/power/mem_sleep). The string representing the suspend
+mode to be used subsequently is enclosed in square brackets. Writing one of
+the other strings present in /sys/power/mem_sleep to it causes the suspend mode
+to be used subsequently to change to the one represented by that string.
+
+Consequently, there are two ways to cause the system to go into the
+Suspend-To-Idle sleep state. The first one is to write "freeze" directly to
+/sys/power/state. The second one is to write "s2idle" to /sys/power/mem_sleep
+and then to wrtie "mem" to /sys/power/state. Similarly, there are two ways
+to cause the system to go into the Power-On Suspend sleep state (the strings to
+write to the control files in that case are "standby" or "shallow" and "mem",
+respectively) if that state is supported by the platform. In turn, there is
+only one way to cause the system to go into the Suspend-To-RAM state (write
+"deep" into /sys/power/mem_sleep and "mem" into /sys/power/state).
+
+The default suspend mode (ie. the one to be used without writing anything into
+/sys/power/mem_sleep) is either "deep" (if Suspend-To-RAM is supported) or
+"s2idle", but it can be overridden by the value of the "mem_sleep_default"
+parameter in the kernel command line. On some ACPI-based systems, depending on
+the information in the FADT, the default may be "s2idle" even if Suspend-To-RAM
+is supported.
+
+The properties of all of the sleep states are described below.
+
+
+State: Suspend-To-Idle
ACPI state: S0
-Label: "freeze"
+Label: "s2idle" ("freeze")
This state is a generic, pure software, light-weight, system sleep state.
It allows more energy to be saved relative to runtime idle by freezing user
@@ -35,13 +53,13 @@ lower-power than available at run time), such that the processors can
spend more time in their idle states.
This state can be used for platforms without Power-On Suspend/Suspend-to-RAM
-support, or it can be used in addition to Suspend-to-RAM (memory sleep)
-to provide reduced resume latency. It is always supported.
+support, or it can be used in addition to Suspend-to-RAM to provide reduced
+resume latency. It is always supported.
State: Standby / Power-On Suspend
ACPI State: S1
-Label: "standby"
+Label: "shallow" ("standby")
This state, if supported, offers moderate, though real, power savings, while
providing a relatively low-latency transition back to a working system. No
@@ -58,7 +76,7 @@ state.
State: Suspend-to-RAM
ACPI State: S3
-Label: "mem"
+Label: "deep"
This state, if supported, offers significant power savings as everything in the
system is put into a low-power state, except for memory, which should be placed