summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorIngo Molnar <mingo@elte.hu>2011-03-18 10:38:53 +0100
committerIngo Molnar <mingo@elte.hu>2011-03-18 10:39:00 +0100
commit8dd8997d2c56c9f248294805e129e1fc69444380 (patch)
tree3b030a04295fc031db98746c4074c2df1ed6a19f /Documentation
parent1eda75c131ea42ec173323b6c34aeed78ae637c1 (diff)
parent016aa2ed1cc9cf704cf76d8df07751b6daa9750f (diff)
Merge branch 'linus' into x86/urgent
Merge reason: Merge upstream commits to avoid conflicts in upcoming patches. Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/RCU/whatisRCU.txt31
-rw-r--r--Documentation/devicetree/bindings/i2c/ce4100-i2c.txt93
-rw-r--r--Documentation/devicetree/bindings/rtc/rtc-cmos.txt28
-rw-r--r--Documentation/devicetree/bindings/x86/ce4100.txt38
-rw-r--r--Documentation/devicetree/bindings/x86/interrupt.txt26
-rw-r--r--Documentation/devicetree/bindings/x86/timer.txt6
-rw-r--r--Documentation/devicetree/booting-without-of.txt20
-rw-r--r--Documentation/kernel-parameters.txt4
-rw-r--r--Documentation/memory-barriers.txt58
-rw-r--r--Documentation/rtc.txt29
-rw-r--r--Documentation/spinlocks.txt24
-rw-r--r--Documentation/trace/ftrace-design.txt7
-rw-r--r--Documentation/trace/ftrace.txt151
-rw-r--r--Documentation/trace/kprobetrace.txt16
14 files changed, 360 insertions, 171 deletions
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index cfaac34c4557..6ef692667e2f 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -849,6 +849,37 @@ All: lockdep-checked RCU-protected pointer access
See the comment headers in the source code (or the docbook generated
from them) for more information.
+However, given that there are no fewer than four families of RCU APIs
+in the Linux kernel, how do you choose which one to use? The following
+list can be helpful:
+
+a. Will readers need to block? If so, you need SRCU.
+
+b. What about the -rt patchset? If readers would need to block
+ in an non-rt kernel, you need SRCU. If readers would block
+ in a -rt kernel, but not in a non-rt kernel, SRCU is not
+ necessary.
+
+c. Do you need to treat NMI handlers, hardirq handlers,
+ and code segments with preemption disabled (whether
+ via preempt_disable(), local_irq_save(), local_bh_disable(),
+ or some other mechanism) as if they were explicit RCU readers?
+ If so, you need RCU-sched.
+
+d. Do you need RCU grace periods to complete even in the face
+ of softirq monopolization of one or more of the CPUs? For
+ example, is your code subject to network-based denial-of-service
+ attacks? If so, you need RCU-bh.
+
+e. Is your workload too update-intensive for normal use of
+ RCU, but inappropriate for other synchronization mechanisms?
+ If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
+
+f. Otherwise, use RCU.
+
+Of course, this all assumes that you have determined that RCU is in fact
+the right tool for your job.
+
8. ANSWERS TO QUICK QUIZZES
diff --git a/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt
new file mode 100644
index 000000000000..569b16248514
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt
@@ -0,0 +1,93 @@
+CE4100 I2C
+----------
+
+CE4100 has one PCI device which is described as the I2C-Controller. This
+PCI device has three PCI-bars, each bar contains a complete I2C
+controller. So we have a total of three independent I2C-Controllers
+which share only an interrupt line.
+The driver is probed via the PCI-ID and is gathering the information of
+attached devices from the devices tree.
+Grant Likely recommended to use the ranges property to map the PCI-Bar
+number to its physical address and to use this to find the child nodes
+of the specific I2C controller. This were his exact words:
+
+ Here's where the magic happens. Each entry in
+ ranges describes how the parent pci address space
+ (middle group of 3) is translated to the local
+ address space (first group of 2) and the size of
+ each range (last cell). In this particular case,
+ the first cell of the local address is chosen to be
+ 1:1 mapped to the BARs, and the second is the
+ offset from be base of the BAR (which would be
+ non-zero if you had 2 or more devices mapped off
+ the same BAR)
+
+ ranges allows the address mapping to be described
+ in a way that the OS can interpret without
+ requiring custom device driver code.
+
+This is an example which is used on FalconFalls:
+------------------------------------------------
+ i2c-controller@b,2 {
+ #address-cells = <2>;
+ #size-cells = <1>;
+ compatible = "pci8086,2e68.2",
+ "pci8086,2e68",
+ "pciclass,ff0000",
+ "pciclass,ff00";
+
+ reg = <0x15a00 0x0 0x0 0x0 0x0>;
+ interrupts = <16 1>;
+
+ /* as described by Grant, the first number in the group of
+ * three is the bar number followed by the 64bit bar address
+ * followed by size of the mapping. The bar address
+ * requires also a valid translation in parents ranges
+ * property.
+ */
+ ranges = <0 0 0x02000000 0 0xdffe0500 0x100
+ 1 0 0x02000000 0 0xdffe0600 0x100
+ 2 0 0x02000000 0 0xdffe0700 0x100>;
+
+ i2c@0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "intel,ce4100-i2c-controller";
+
+ /* The first number in the reg property is the
+ * number of the bar
+ */
+ reg = <0 0 0x100>;
+
+ /* This I2C controller has no devices */
+ };
+
+ i2c@1 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "intel,ce4100-i2c-controller";
+ reg = <1 0 0x100>;
+
+ /* This I2C controller has one gpio controller */
+ gpio@26 {
+ #gpio-cells = <2>;
+ compatible = "ti,pcf8575";
+ reg = <0x26>;
+ gpio-controller;
+ };
+ };
+
+ i2c@2 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "intel,ce4100-i2c-controller";
+ reg = <2 0 0x100>;
+
+ gpio@26 {
+ #gpio-cells = <2>;
+ compatible = "ti,pcf8575";
+ reg = <0x26>;
+ gpio-controller;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/rtc/rtc-cmos.txt b/Documentation/devicetree/bindings/rtc/rtc-cmos.txt
new file mode 100644
index 000000000000..7382989b3052
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/rtc-cmos.txt
@@ -0,0 +1,28 @@
+ Motorola mc146818 compatible RTC
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Required properties:
+ - compatible : "motorola,mc146818"
+ - reg : should contain registers location and length.
+
+Optional properties:
+ - interrupts : should contain interrupt.
+ - interrupt-parent : interrupt source phandle.
+ - ctrl-reg : Contains the initial value of the control register also
+ called "Register B".
+ - freq-reg : Contains the initial value of the frequency register also
+ called "Regsiter A".
+
+"Register A" and "B" are usually initialized by the firmware (BIOS for
+instance). If this is not done, it can be performed by the driver.
+
+ISA Example:
+
+ rtc@70 {
+ compatible = "motorola,mc146818";
+ interrupts = <8 3>;
+ interrupt-parent = <&ioapic1>;
+ ctrl-reg = <2>;
+ freq-reg = <0x26>;
+ reg = <1 0x70 2>;
+ };
diff --git a/Documentation/devicetree/bindings/x86/ce4100.txt b/Documentation/devicetree/bindings/x86/ce4100.txt
new file mode 100644
index 000000000000..b49ae593a60b
--- /dev/null
+++ b/Documentation/devicetree/bindings/x86/ce4100.txt
@@ -0,0 +1,38 @@
+CE4100 Device Tree Bindings
+---------------------------
+
+The CE4100 SoC uses for in core peripherals the following compatible
+format: <vendor>,<chip>-<device>.
+Many of the "generic" devices like HPET or IO APIC have the ce4100
+name in their compatible property because they first appeared in this
+SoC.
+
+The CPU node
+------------
+ cpu@0 {
+ device_type = "cpu";
+ compatible = "intel,ce4100";
+ reg = <0>;
+ lapic = <&lapic0>;
+ };
+
+The reg property describes the CPU number. The lapic property points to
+the local APIC timer.
+
+The SoC node
+------------
+
+This node describes the in-core peripherals. Required property:
+ compatible = "intel,ce4100-cp";
+
+The PCI node
+------------
+This node describes the PCI bus on the SoC. Its property should be
+ compatible = "intel,ce4100-pci", "pci";
+
+If the OS is using the IO-APIC for interrupt routing then the reported
+interrupt numbers for devices is no longer true. In order to obtain the
+correct interrupt number, the child node which represents the device has
+to contain the interrupt property. Besides the interrupt property it has
+to contain at least the reg property containing the PCI bus address and
+compatible property according to "PCI Bus Binding Revision 2.1".
diff --git a/Documentation/devicetree/bindings/x86/interrupt.txt b/Documentation/devicetree/bindings/x86/interrupt.txt
new file mode 100644
index 000000000000..7d19f494f19a
--- /dev/null
+++ b/Documentation/devicetree/bindings/x86/interrupt.txt
@@ -0,0 +1,26 @@
+Interrupt chips
+---------------
+
+* Intel I/O Advanced Programmable Interrupt Controller (IO APIC)
+
+ Required properties:
+ --------------------
+ compatible = "intel,ce4100-ioapic";
+ #interrupt-cells = <2>;
+
+ Device's interrupt property:
+
+ interrupts = <P S>;
+
+ The first number (P) represents the interrupt pin which is wired to the
+ IO APIC. The second number (S) represents the sense of interrupt which
+ should be configured and can be one of:
+ 0 - Edge Rising
+ 1 - Level Low
+ 2 - Level High
+ 3 - Edge Falling
+
+* Local APIC
+ Required property:
+
+ compatible = "intel,ce4100-lapic";
diff --git a/Documentation/devicetree/bindings/x86/timer.txt b/Documentation/devicetree/bindings/x86/timer.txt
new file mode 100644
index 000000000000..c688af58e3bd
--- /dev/null
+++ b/Documentation/devicetree/bindings/x86/timer.txt
@@ -0,0 +1,6 @@
+Timers
+------
+
+* High Precision Event Timer (HPET)
+ Required property:
+ compatible = "intel,ce4100-hpet";
diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt
index 28b1c9d3d351..55fd2623445b 100644
--- a/Documentation/devicetree/booting-without-of.txt
+++ b/Documentation/devicetree/booting-without-of.txt
@@ -13,6 +13,7 @@ Table of Contents
I - Introduction
1) Entry point for arch/powerpc
+ 2) Entry point for arch/x86
II - The DT block format
1) Header
@@ -225,6 +226,25 @@ it with special cases.
cannot support both configurations with Book E and configurations
with classic Powerpc architectures.
+2) Entry point for arch/x86
+-------------------------------
+
+ There is one single 32bit entry point to the kernel at code32_start,
+ the decompressor (the real mode entry point goes to the same 32bit
+ entry point once it switched into protected mode). That entry point
+ supports one calling convention which is documented in
+ Documentation/x86/boot.txt
+ The physical pointer to the device-tree block (defined in chapter II)
+ is passed via setup_data which requires at least boot protocol 2.09.
+ The type filed is defined as
+
+ #define SETUP_DTB 2
+
+ This device-tree is used as an extension to the "boot page". As such it
+ does not parse / consider data which is already covered by the boot
+ page. This includes memory size, reserved ranges, command line arguments
+ or initrd address. It simply holds information which can not be retrieved
+ otherwise like interrupt routing or a list of devices behind an I2C bus.
II - The DT block format
========================
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index f4a04c0c7edc..738c6fda3fb0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2444,6 +2444,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
<deci-seconds>: poll all this frequency
0: no polling (default)
+ threadirqs [KNL]
+ Force threading of all interrupt handlers except those
+ marked explicitely IRQF_NO_THREAD.
+
topology= [S390]
Format: {off | on}
Specify if the kernel should make use of the cpu
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 631ad2f1b229..f0d3a8026a56 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -21,6 +21,7 @@ Contents:
- SMP barrier pairing.
- Examples of memory barrier sequences.
- Read memory barriers vs load speculation.
+ - Transitivity
(*) Explicit kernel barriers.
@@ -959,6 +960,63 @@ the speculation will be cancelled and the value reloaded:
retrieved : : +-------+
+TRANSITIVITY
+------------
+
+Transitivity is a deeply intuitive notion about ordering that is not
+always provided by real computer systems. The following example
+demonstrates transitivity (also called "cumulativity"):
+
+ CPU 1 CPU 2 CPU 3
+ ======================= ======================= =======================
+ { X = 0, Y = 0 }
+ STORE X=1 LOAD X STORE Y=1
+ <general barrier> <general barrier>
+ LOAD Y LOAD X
+
+Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
+This indicates that CPU 2's load from X in some sense follows CPU 1's
+store to X and that CPU 2's load from Y in some sense preceded CPU 3's
+store to Y. The question is then "Can CPU 3's load from X return 0?"
+
+Because CPU 2's load from X in some sense came after CPU 1's store, it
+is natural to expect that CPU 3's load from X must therefore return 1.
+This expectation is an example of transitivity: if a load executing on
+CPU A follows a load from the same variable executing on CPU B, then
+CPU A's load must either return the same value that CPU B's load did,
+or must return some later value.
+
+In the Linux kernel, use of general memory barriers guarantees
+transitivity. Therefore, in the above example, if CPU 2's load from X
+returns 1 and its load from Y returns 0, then CPU 3's load from X must
+also return 1.
+
+However, transitivity is -not- guaranteed for read or write barriers.
+For example, suppose that CPU 2's general barrier in the above example
+is changed to a read barrier as shown below:
+
+ CPU 1 CPU 2 CPU 3
+ ======================= ======================= =======================
+ { X = 0, Y = 0 }
+ STORE X=1 LOAD X STORE Y=1
+ <read barrier> <general barrier>
+ LOAD Y LOAD X
+
+This substitution destroys transitivity: in this example, it is perfectly
+legal for CPU 2's load from X to return 1, its load from Y to return 0,
+and CPU 3's load from X to return 0.
+
+The key point is that although CPU 2's read barrier orders its pair
+of loads, it does not guarantee to order CPU 1's store. Therefore, if
+this example runs on a system where CPUs 1 and 2 share a store buffer
+or a level of cache, CPU 2 might have early access to CPU 1's writes.
+General barriers are therefore required to ensure that all CPUs agree
+on the combined order of CPU 1's and CPU 2's accesses.
+
+To reiterate, if your code requires transitivity, use general barriers
+throughout.
+
+
========================
EXPLICIT KERNEL BARRIERS
========================
diff --git a/Documentation/rtc.txt b/Documentation/rtc.txt
index 9104c1062084..250160469d83 100644
--- a/Documentation/rtc.txt
+++ b/Documentation/rtc.txt
@@ -178,38 +178,29 @@ RTC class framework, but can't be supported by the older driver.
setting the longer alarm time and enabling its IRQ using a single
request (using the same model as EFI firmware).
- * RTC_UIE_ON, RTC_UIE_OFF ... if the RTC offers IRQs, it probably
- also offers update IRQs whenever the "seconds" counter changes.
- If needed, the RTC framework can emulate this mechanism.
+ * RTC_UIE_ON, RTC_UIE_OFF ... if the RTC offers IRQs, the RTC framework
+ will emulate this mechanism.
- * RTC_PIE_ON, RTC_PIE_OFF, RTC_IRQP_SET, RTC_IRQP_READ ... another
- feature often accessible with an IRQ line is a periodic IRQ, issued
- at settable frequencies (usually 2^N Hz).
+ * RTC_PIE_ON, RTC_PIE_OFF, RTC_IRQP_SET, RTC_IRQP_READ ... these icotls
+ are emulated via a kernel hrtimer.
In many cases, the RTC alarm can be a system wake event, used to force
Linux out of a low power sleep state (or hibernation) back to a fully
operational state. For example, a system could enter a deep power saving
state until it's time to execute some scheduled tasks.
-Note that many of these ioctls need not actually be implemented by your
-driver. The common rtc-dev interface handles many of these nicely if your
-driver returns ENOIOCTLCMD. Some common examples:
+Note that many of these ioctls are handled by the common rtc-dev interface.
+Some common examples:
* RTC_RD_TIME, RTC_SET_TIME: the read_time/set_time functions will be
called with appropriate values.
- * RTC_ALM_SET, RTC_ALM_READ, RTC_WKALM_SET, RTC_WKALM_RD: the
- set_alarm/read_alarm functions will be called.
+ * RTC_ALM_SET, RTC_ALM_READ, RTC_WKALM_SET, RTC_WKALM_RD: gets or sets
+ the alarm rtc_timer. May call the set_alarm driver function.
- * RTC_IRQP_SET, RTC_IRQP_READ: the irq_set_freq function will be called
- to set the frequency while the framework will handle the read for you
- since the frequency is stored in the irq_freq member of the rtc_device
- structure. Your driver needs to initialize the irq_freq member during
- init. Make sure you check the requested frequency is in range of your
- hardware in the irq_set_freq function. If it isn't, return -EINVAL. If
- you cannot actually change the frequency, do not define irq_set_freq.
+ * RTC_IRQP_SET, RTC_IRQP_READ: These are emulated by the generic code.
- * RTC_PIE_ON, RTC_PIE_OFF: the irq_set_state function will be called.
+ * RTC_PIE_ON, RTC_PIE_OFF: These are also emulated by the generic code.
If all else fails, check out the rtc-test.c driver!
diff --git a/Documentation/spinlocks.txt b/Documentation/spinlocks.txt
index 178c831b907d..2e3c64b1a6a5 100644
--- a/Documentation/spinlocks.txt
+++ b/Documentation/spinlocks.txt
@@ -86,7 +86,7 @@ to change the variables it has to get an exclusive write lock.
The routines look the same as above:
- rwlock_t xxx_lock = RW_LOCK_UNLOCKED;
+ rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
unsigned long flags;
@@ -196,25 +196,3 @@ appropriate:
For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or
__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate.
-
-SPIN_LOCK_UNLOCKED and RW_LOCK_UNLOCKED are deprecated. These interfere
-with lockdep state tracking.
-
-Most of the time, you can simply turn:
- static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
-into:
- static DEFINE_SPINLOCK(xxx_lock);
-
-Static structure member variables go from:
-
- struct foo bar {
- .lock = SPIN_LOCK_UNLOCKED;
- };
-
-to:
-
- struct foo bar {
- .lock = __SPIN_LOCK_UNLOCKED(bar.lock);
- };
-
-Declaration of static rw_locks undergo a similar transformation.
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index dc52bd442c92..79fcafc7fd64 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -247,6 +247,13 @@ You need very few things to get the syscalls tracing in an arch.
- Support the TIF_SYSCALL_TRACEPOINT thread flags.
- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace
in the ptrace syscalls tracing path.
+- If the system call table on this arch is more complicated than a simple array
+ of addresses of the system calls, implement an arch_syscall_addr to return
+ the address of a given system call.
+- If the symbol names of the system calls do not match the function names on
+ this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and
+ implement arch_syscall_match_sym_name with the appropriate logic to return
+ true if the function name corresponds with the symbol name.
- Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 557c1edeccaf..1ebc24cf9a55 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -80,11 +80,11 @@ of ftrace. Here is a list of some of the key files:
tracers listed here can be configured by
echoing their name into current_tracer.
- tracing_enabled:
+ tracing_on:
- This sets or displays whether the current_tracer
- is activated and tracing or not. Echo 0 into this
- file to disable the tracer or 1 to enable it.
+ This sets or displays whether writing to the trace
+ ring buffer is enabled. Echo 0 into this file to disable
+ the tracer or 1 to enable it.
trace:
@@ -202,10 +202,6 @@ Here is the list of current tracers that may be configured.
to draw a graph of function calls similar to C code
source.
- "sched_switch"
-
- Traces the context switches and wakeups between tasks.
-
"irqsoff"
Traces the areas that disable interrupts and saves
@@ -273,39 +269,6 @@ format, the function name that was traced "path_put" and the
parent function that called this function "path_walk". The
timestamp is the time at which the function was entered.
-The sched_switch tracer also includes tracing of task wakeups
-and context switches.
-
- ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
- ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
- ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
- events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
- kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
- ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
-
-Wake ups are represented by a "+" and the context switches are
-shown as "==>". The format is:
-
- Context switches:
-
- Previous task Next Task
-
- <pid>:<prio>:<state> ==> <pid>:<prio>:<state>
-
- Wake ups:
-
- Current task Task waking up
-
- <pid>:<prio>:<state> + <pid>:<prio>:<state>
-
-The prio is the internal kernel priority, which is the inverse
-of the priority that is usually displayed by user-space tools.
-Zero represents the highest priority (99). Prio 100 starts the
-"nice" priorities with 100 being equal to nice -20 and 139 being
-nice 19. The prio "140" is reserved for the idle task which is
-the lowest priority thread (pid 0).
-
-
Latency trace format
--------------------
@@ -491,78 +454,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
latencies, as described in "Latency
trace format".
-sched_switch
-------------
-
-This tracer simply records schedule switches. Here is an example
-of how to use it.
-
- # echo sched_switch > current_tracer
- # echo 1 > tracing_enabled
- # sleep 1
- # echo 0 > tracing_enabled
- # cat trace
-
-# tracer: sched_switch
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
- bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
- sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
- bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
- bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
- sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
- bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
- bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
- <idle>-0 [00] 240.132589: 0:140:R + 4:115:S
- <idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
- ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
- <idle>-0 [00] 240.132598: 0:140:R + 4:115:S
- <idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
- ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
- sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
- [...]
-
-
-As we have discussed previously about this format, the header
-shows the name of the trace and points to the options. The
-"FUNCTION" is a misnomer since here it represents the wake ups
-and context switches.
-
-The sched_switch file only lists the wake ups (represented with
-'+') and context switches ('==>') with the previous task or
-current task first followed by the next task or task waking up.
-The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
-Remember that the KERNEL-PRIO is the inverse of the actual
-priority with zero (0) being the highest priority and the nice
-values starting at 100 (nice -20). Below is a quick chart to map
-the kernel priority to user land priorities.
-
- Kernel Space User Space
- ===============================================================
- 0(high) to 98(low) user RT priority 99(high) to 1(low)
- with SCHED_RR or SCHED_FIFO
- ---------------------------------------------------------------
- 99 sched_priority is not used in scheduling
- decisions(it must be specified as 0)
- ---------------------------------------------------------------
- 100(high) to 139(low) user nice -20(high) to 19(low)
- ---------------------------------------------------------------
- 140 idle task priority
- ---------------------------------------------------------------
-
-The task states are:
-
- R - running : wants to run, may not actually be running
- S - sleep : process is waiting to be woken up (handles signals)
- D - disk sleep (uninterruptible sleep) : process must be woken up
- (ignores signals)
- T - stopped : process suspended
- t - traced : process is being traced (with something like gdb)
- Z - zombie : process waiting to be cleaned up
- X - unknown
-
+ overwrite - This controls what happens when the trace buffer is
+ full. If "1" (default), the oldest events are
+ discarded and overwritten. If "0", then the newest
+ events are discarded.
ftrace_enabled
--------------
@@ -607,10 +502,10 @@ an example:
# echo irqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
# ls -ltr
[...]
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
# cat trace
# tracer: irqsoff
#
@@ -715,10 +610,10 @@ is much like the irqsoff tracer.
# echo preemptoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
# ls -ltr
[...]
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
# cat trace
# tracer: preemptoff
#
@@ -863,10 +758,10 @@ tracers.
# echo preemptirqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
# ls -ltr
[...]
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
# cat trace
# tracer: preemptirqsoff
#
@@ -1026,9 +921,9 @@ Instead of performing an 'ls', we will run 'sleep 1' under
# echo wakeup > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
# chrt -f 5 sleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
# cat trace
# tracer: wakeup
#
@@ -1140,9 +1035,9 @@ ftrace_enabled is set; otherwise this tracer is a nop.
# sysctl kernel.ftrace_enabled=1
# echo function > current_tracer
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
# usleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
# cat trace
# tracer: function
#
@@ -1180,7 +1075,7 @@ int trace_fd;
[...]
int main(int argc, char *argv[]) {
[...]
- trace_fd = open(tracing_file("tracing_enabled"), O_WRONLY);
+ trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
[...]
if (condition_hit()) {
write(trace_fd, "0", 1);
@@ -1631,9 +1526,9 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt:
# echo sys_nanosleep hrtimer_interrupt \
> set_ftrace_filter
# echo function > current_tracer
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
# usleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
# cat trace
# tracer: ftrace
#
@@ -1879,9 +1774,9 @@ different. The trace is live.
# echo function > current_tracer
# cat trace_pipe > /tmp/trace.out &
[1] 4153
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
# usleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
# cat trace
# tracer: function
#
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 5f77d94598dd..6d27ab8d6e9f 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -42,11 +42,25 @@ Synopsis of kprobe_events
+|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
- (u8/u16/u32/u64/s8/s16/s32/s64) and string are supported.
+ (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield
+ are supported.
(*) only for return probe.
(**) this is useful for fetching a field of data structures.
+Types
+-----
+Several types are supported for fetch-args. Kprobe tracer will access memory
+by given type. Prefix 's' and 'u' means those types are signed and unsigned
+respectively. Traced arguments are shown in decimal (signed) or hex (unsigned).
+String type is a special type, which fetches a "null-terminated" string from
+kernel space. This means it will fail and store NULL if the string container
+has been paged out.
+Bitfield is another special type, which takes 3 parameters, bit-width, bit-
+offset, and container-size (usually 32). The syntax is;
+
+ b<bit-width>@<bit-offset>/<container-size>
+
Per-Probe Event Filtering
-------------------------