summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorCarsten Emde <C.Emde@osadl.org>2011-07-19 13:53:12 +0100
committerClark Williams <williams@redhat.com>2012-03-24 10:26:13 -0500
commitf5deacdcf48b9b121ff984ca42fc206eb76ba3a1 (patch)
treeb6ae992a58a8919ebf7ff6ec91d17a1826117c1f /Documentation
parent475709312125faad07c6f3f1aa5b447ed6259c95 (diff)
hwlatdetect.patch
Jon Masters developed this wonderful SMI detector. For details please consult Documentation/hwlat_detector.txt. It could be ported to Linux 3.0 RT without any major change. Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/hwlat_detector.txt64
1 files changed, 64 insertions, 0 deletions
diff --git a/Documentation/hwlat_detector.txt b/Documentation/hwlat_detector.txt
new file mode 100644
index 000000000000..cb61516483d3
--- /dev/null
+++ b/Documentation/hwlat_detector.txt
@@ -0,0 +1,64 @@
+Introduction:
+-------------
+
+The module hwlat_detector is a special purpose kernel module that is used to
+detect large system latencies induced by the behavior of certain underlying
+hardware or firmware, independent of Linux itself. The code was developed
+originally to detect SMIs (System Management Interrupts) on x86 systems,
+however there is nothing x86 specific about this patchset. It was
+originally written for use by the "RT" patch since the Real Time
+kernel is highly latency sensitive.
+
+SMIs are usually not serviced by the Linux kernel, which typically does not
+even know that they are occuring. SMIs are instead are set up by BIOS code
+and are serviced by BIOS code, usually for "critical" events such as
+management of thermal sensors and fans. Sometimes though, SMIs are used for
+other tasks and those tasks can spend an inordinate amount of time in the
+handler (sometimes measured in milliseconds). Obviously this is a problem if
+you are trying to keep event service latencies down in the microsecond range.
+
+The hardware latency detector works by hogging all of the cpus for configurable
+amounts of time (by calling stop_machine()), polling the CPU Time Stamp Counter
+for some period, then looking for gaps in the TSC data. Any gap indicates a
+time when the polling was interrupted and since the machine is stopped and
+interrupts turned off the only thing that could do that would be an SMI.
+
+Note that the SMI detector should *NEVER* be used in a production environment.
+It is intended to be run manually to determine if the hardware platform has a
+problem with long system firmware service routines.
+
+Usage:
+------
+
+Loading the module hwlat_detector passing the parameter "enabled=1" (or by
+setting the "enable" entry in "hwlat_detector" debugfs toggled on) is the only
+step required to start the hwlat_detector. It is possible to redefine the
+threshold in microseconds (us) above which latency spikes will be taken
+into account (parameter "threshold=").
+
+Example:
+
+ # modprobe hwlat_detector enabled=1 threshold=100
+
+After the module is loaded, it creates a directory named "hwlat_detector" under
+the debugfs mountpoint, "/debug/hwlat_detector" for this text. It is necessary
+to have debugfs mounted, which might be on /sys/debug on your system.
+
+The /debug/hwlat_detector interface contains the following files:
+
+count - number of latency spikes observed since last reset
+enable - a global enable/disable toggle (0/1), resets count
+max - maximum hardware latency actually observed (usecs)
+sample - a pipe from which to read current raw sample data
+ in the format <timestamp> <latency observed usecs>
+ (can be opened O_NONBLOCK for a single sample)
+threshold - minimum latency value to be considered (usecs)
+width - time period to sample with CPUs held (usecs)
+ must be less than the total window size (enforced)
+window - total period of sampling, width being inside (usecs)
+
+By default we will set width to 500,000 and window to 1,000,000, meaning that
+we will sample every 1,000,000 usecs (1s) for 500,000 usecs (0.5s). If we
+observe any latencies that exceed the threshold (initially 100 usecs),
+then we write to a global sample ring buffer of 8K samples, which is
+consumed by reading from the "sample" (pipe) debugfs file interface.