summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorMarcel Ziswiler <marcel.ziswiler@toradex.com>2019-12-18 22:52:20 +0100
committerMarcel Ziswiler <marcel.ziswiler@toradex.com>2019-12-18 22:52:20 +0100
commit1ddf624b0b268fdc0b80b1de618b98f8d117afea (patch)
tree3d3218332bcb34cb0afa01d6ad996058a3dbcb77 /Documentation
parent6b774eec1f9d3064e9b33634dfa99d5666d0a73a (diff)
parentbfb9e5c03076a446b1f4f6a523ddc8d723c907a6 (diff)
Merge tag 'v4.14.159' into 4.14-2.0.x-imx
This is the 4.14.159 stable release Conflicts: arch/arm/Kconfig.debug arch/arm/boot/dts/imx7s.dtsi arch/arm/mach-imx/cpuidle-imx6sx.c drivers/crypto/caam/caamalg.c drivers/crypto/mxs-dcp.c drivers/dma/imx-sdma.c drivers/input/keyboard/imx_keypad.c drivers/net/can/flexcan.c drivers/net/can/rx-offload.c drivers/net/wireless/ath/ath10k/pci.c drivers/pci/dwc/pci-imx6.c drivers/spi/spi-fsl-lpspi.c drivers/usb/dwc3/gadget.c
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-class-net-qmi4
-rw-r--r--Documentation/ABI/testing/sysfs-devices-system-cpu2
-rw-r--r--Documentation/admin-guide/hw-vuln/index.rst3
-rw-r--r--Documentation/admin-guide/hw-vuln/mds.rst7
-rw-r--r--Documentation/admin-guide/hw-vuln/multihit.rst163
-rw-r--r--Documentation/admin-guide/hw-vuln/spectre.rst769
-rw-r--r--Documentation/admin-guide/hw-vuln/tsx_async_abort.rst279
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt159
-rw-r--r--Documentation/arm64/cpu-feature-registers.txt26
-rw-r--r--Documentation/atomic_t.txt3
-rw-r--r--Documentation/devicetree/bindings/media/i2c/adv748x.txt4
-rw-r--r--Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt3
-rw-r--r--Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt1
-rw-r--r--Documentation/devicetree/bindings/net/marvell-orion-mdio.txt2
-rw-r--r--Documentation/devicetree/bindings/rtc/abracon,abx80x.txt2
-rw-r--r--Documentation/devicetree/bindings/serial/mvebu-uart.txt2
-rw-r--r--Documentation/devicetree/bindings/usb/dwc3.txt2
-rw-r--r--Documentation/hid/uhid.txt2
-rw-r--r--Documentation/networking/ip-sysctl.txt8
-rw-r--r--Documentation/robust-futexes.txt3
-rw-r--r--Documentation/scheduler/sched-bwc.txt45
-rw-r--r--Documentation/sysctl/net.txt8
-rw-r--r--Documentation/usb/rio.txt138
-rw-r--r--Documentation/userspace-api/spec_ctrl.rst2
-rw-r--r--Documentation/virtual/kvm/locking.txt4
-rw-r--r--Documentation/x86/index.rst1
-rw-r--r--Documentation/x86/tsx_async_abort.rst117
27 files changed, 1582 insertions, 177 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-net-qmi b/Documentation/ABI/testing/sysfs-class-net-qmi
index 7122d6264c49..c310db4ccbc2 100644
--- a/Documentation/ABI/testing/sysfs-class-net-qmi
+++ b/Documentation/ABI/testing/sysfs-class-net-qmi
@@ -29,7 +29,7 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
- Write a number ranging from 1 to 127 to add a qmap mux
+ Write a number ranging from 1 to 254 to add a qmap mux
based network device, supported by recent Qualcomm based
modems.
@@ -46,5 +46,5 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
- Write a number ranging from 1 to 127 to delete a previously
+ Write a number ranging from 1 to 254 to delete a previously
created qmap mux based network device.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 645687b1870d..9ebca6a750f3 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -381,6 +381,8 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/l1tf
/sys/devices/system/cpu/vulnerabilities/mds
+ /sys/devices/system/cpu/vulnerabilities/tsx_async_abort
+ /sys/devices/system/cpu/vulnerabilities/itlb_multihit
Date: January 2018
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Information about CPU vulnerabilities
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index ffc064c1ec68..0795e3c2643f 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -9,5 +9,8 @@ are configurable at compile, boot or run time.
.. toctree::
:maxdepth: 1
+ spectre
l1tf
mds
+ tsx_async_abort
+ multihit.rst
diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst
index e3a796c0d3a2..2d19c9f4c1fe 100644
--- a/Documentation/admin-guide/hw-vuln/mds.rst
+++ b/Documentation/admin-guide/hw-vuln/mds.rst
@@ -265,8 +265,11 @@ time with the option "mds=". The valid arguments for this option are:
============ =============================================================
-Not specifying this option is equivalent to "mds=full".
-
+Not specifying this option is equivalent to "mds=full". For processors
+that are affected by both TAA (TSX Asynchronous Abort) and MDS,
+specifying just "mds=off" without an accompanying "tsx_async_abort=off"
+will have no effect as the same mitigation is used for both
+vulnerabilities.
Mitigation selection guide
--------------------------
diff --git a/Documentation/admin-guide/hw-vuln/multihit.rst b/Documentation/admin-guide/hw-vuln/multihit.rst
new file mode 100644
index 000000000000..ba9988d8bce5
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/multihit.rst
@@ -0,0 +1,163 @@
+iTLB multihit
+=============
+
+iTLB multihit is an erratum where some processors may incur a machine check
+error, possibly resulting in an unrecoverable CPU lockup, when an
+instruction fetch hits multiple entries in the instruction TLB. This can
+occur when the page size is changed along with either the physical address
+or cache type. A malicious guest running on a virtualized system can
+exploit this erratum to perform a denial of service attack.
+
+
+Affected processors
+-------------------
+
+Variations of this erratum are present on most Intel Core and Xeon processor
+models. The erratum is not present on:
+
+ - non-Intel processors
+
+ - Some Atoms (Airmont, Bonnell, Goldmont, GoldmontPlus, Saltwell, Silvermont)
+
+ - Intel processors that have the PSCHANGE_MC_NO bit set in the
+ IA32_ARCH_CAPABILITIES MSR.
+
+
+Related CVEs
+------------
+
+The following CVE entry is related to this issue:
+
+ ============== =================================================
+ CVE-2018-12207 Machine Check Error Avoidance on Page Size Change
+ ============== =================================================
+
+
+Problem
+-------
+
+Privileged software, including OS and virtual machine managers (VMM), are in
+charge of memory management. A key component in memory management is the control
+of the page tables. Modern processors use virtual memory, a technique that creates
+the illusion of a very large memory for processors. This virtual space is split
+into pages of a given size. Page tables translate virtual addresses to physical
+addresses.
+
+To reduce latency when performing a virtual to physical address translation,
+processors include a structure, called TLB, that caches recent translations.
+There are separate TLBs for instruction (iTLB) and data (dTLB).
+
+Under this errata, instructions are fetched from a linear address translated
+using a 4 KB translation cached in the iTLB. Privileged software modifies the
+paging structure so that the same linear address using large page size (2 MB, 4
+MB, 1 GB) with a different physical address or memory type. After the page
+structure modification but before the software invalidates any iTLB entries for
+the linear address, a code fetch that happens on the same linear address may
+cause a machine-check error which can result in a system hang or shutdown.
+
+
+Attack scenarios
+----------------
+
+Attacks against the iTLB multihit erratum can be mounted from malicious
+guests in a virtualized system.
+
+
+iTLB multihit system information
+--------------------------------
+
+The Linux kernel provides a sysfs interface to enumerate the current iTLB
+multihit status of the system:whether the system is vulnerable and which
+mitigations are active. The relevant sysfs file is:
+
+/sys/devices/system/cpu/vulnerabilities/itlb_multihit
+
+The possible values in this file are:
+
+.. list-table::
+
+ * - Not affected
+ - The processor is not vulnerable.
+ * - KVM: Mitigation: Split huge pages
+ - Software changes mitigate this issue.
+ * - KVM: Vulnerable
+ - The processor is vulnerable, but no mitigation enabled
+
+
+Enumeration of the erratum
+--------------------------------
+
+A new bit has been allocated in the IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) msr
+and will be set on CPU's which are mitigated against this issue.
+
+ ======================================= =========== ===============================
+ IA32_ARCH_CAPABILITIES MSR Not present Possibly vulnerable,check model
+ IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '0' Likely vulnerable,check model
+ IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '1' Not vulnerable
+ ======================================= =========== ===============================
+
+
+Mitigation mechanism
+-------------------------
+
+This erratum can be mitigated by restricting the use of large page sizes to
+non-executable pages. This forces all iTLB entries to be 4K, and removes
+the possibility of multiple hits.
+
+In order to mitigate the vulnerability, KVM initially marks all huge pages
+as non-executable. If the guest attempts to execute in one of those pages,
+the page is broken down into 4K pages, which are then marked executable.
+
+If EPT is disabled or not available on the host, KVM is in control of TLB
+flushes and the problematic situation cannot happen. However, the shadow
+EPT paging mechanism used by nested virtualization is vulnerable, because
+the nested guest can trigger multiple iTLB hits by modifying its own
+(non-nested) page tables. For simplicity, KVM will make large pages
+non-executable in all shadow paging modes.
+
+Mitigation control on the kernel command line and KVM - module parameter
+------------------------------------------------------------------------
+
+The KVM hypervisor mitigation mechanism for marking huge pages as
+non-executable can be controlled with a module parameter "nx_huge_pages=".
+The kernel command line allows to control the iTLB multihit mitigations at
+boot time with the option "kvm.nx_huge_pages=".
+
+The valid arguments for these options are:
+
+ ========== ================================================================
+ force Mitigation is enabled. In this case, the mitigation implements
+ non-executable huge pages in Linux kernel KVM module. All huge
+ pages in the EPT are marked as non-executable.
+ If a guest attempts to execute in one of those pages, the page is
+ broken down into 4K pages, which are then marked executable.
+
+ off Mitigation is disabled.
+
+ auto Enable mitigation only if the platform is affected and the kernel
+ was not booted with the "mitigations=off" command line parameter.
+ This is the default option.
+ ========== ================================================================
+
+
+Mitigation selection guide
+--------------------------
+
+1. No virtualization in use
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ The system is protected by the kernel unconditionally and no further
+ action is required.
+
+2. Virtualization with trusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ If the guest comes from a trusted source, you may assume that the guest will
+ not attempt to maliciously exploit these errata and no further action is
+ required.
+
+3. Virtualization with untrusted guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ If the guest comes from an untrusted source, the guest host kernel will need
+ to apply iTLB multihit mitigation via the kernel command line or kvm
+ module parameter.
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
new file mode 100644
index 000000000000..e05e581af5cf
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -0,0 +1,769 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Spectre Side Channels
+=====================
+
+Spectre is a class of side channel attacks that exploit branch prediction
+and speculative execution on modern CPUs to read memory, possibly
+bypassing access controls. Speculative execution side channel exploits
+do not modify memory but attempt to infer privileged data in the memory.
+
+This document covers Spectre variant 1 and Spectre variant 2.
+
+Affected processors
+-------------------
+
+Speculative execution side channel methods affect a wide range of modern
+high performance processors, since most modern high speed processors
+use branch prediction and speculative execution.
+
+The following CPUs are vulnerable:
+
+ - Intel Core, Atom, Pentium, and Xeon processors
+
+ - AMD Phenom, EPYC, and Zen processors
+
+ - IBM POWER and zSeries processors
+
+ - Higher end ARM processors
+
+ - Apple CPUs
+
+ - Higher end MIPS CPUs
+
+ - Likely most other high performance CPUs. Contact your CPU vendor for details.
+
+Whether a processor is affected or not can be read out from the Spectre
+vulnerability files in sysfs. See :ref:`spectre_sys_info`.
+
+Related CVEs
+------------
+
+The following CVE entries describe Spectre variants:
+
+ ============= ======================= ==========================
+ CVE-2017-5753 Bounds check bypass Spectre variant 1
+ CVE-2017-5715 Branch target injection Spectre variant 2
+ CVE-2019-1125 Spectre v1 swapgs Spectre variant 1 (swapgs)
+ ============= ======================= ==========================
+
+Problem
+-------
+
+CPUs use speculative operations to improve performance. That may leave
+traces of memory accesses or computations in the processor's caches,
+buffers, and branch predictors. Malicious software may be able to
+influence the speculative execution paths, and then use the side effects
+of the speculative execution in the CPUs' caches and buffers to infer
+privileged data touched during the speculative execution.
+
+Spectre variant 1 attacks take advantage of speculative execution of
+conditional branches, while Spectre variant 2 attacks use speculative
+execution of indirect branches to leak privileged memory.
+See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>`
+:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`.
+
+Spectre variant 1 (Bounds Check Bypass)
+---------------------------------------
+
+The bounds check bypass attack :ref:`[2] <spec_ref2>` takes advantage
+of speculative execution that bypasses conditional branch instructions
+used for memory access bounds check (e.g. checking if the index of an
+array results in memory access within a valid range). This results in
+memory accesses to invalid memory (with out-of-bound index) that are
+done speculatively before validation checks resolve. Such speculative
+memory accesses can leave side effects, creating side channels which
+leak information to the attacker.
+
+There are some extensions of Spectre variant 1 attacks for reading data
+over the network, see :ref:`[12] <spec_ref12>`. However such attacks
+are difficult, low bandwidth, fragile, and are considered low risk.
+
+Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not
+only about user-controlled array bounds checks. It can affect any
+conditional checks. The kernel entry code interrupt, exception, and NMI
+handlers all have conditional swapgs checks. Those may be problematic
+in the context of Spectre v1, as kernel code can speculatively run with
+a user GS.
+
+Spectre variant 2 (Branch Target Injection)
+-------------------------------------------
+
+The branch target injection attack takes advantage of speculative
+execution of indirect branches :ref:`[3] <spec_ref3>`. The indirect
+branch predictors inside the processor used to guess the target of
+indirect branches can be influenced by an attacker, causing gadget code
+to be speculatively executed, thus exposing sensitive data touched by
+the victim. The side effects left in the CPU's caches during speculative
+execution can be measured to infer data values.
+
+.. _poison_btb:
+
+In Spectre variant 2 attacks, the attacker can steer speculative indirect
+branches in the victim to gadget code by poisoning the branch target
+buffer of a CPU used for predicting indirect branch addresses. Such
+poisoning could be done by indirect branching into existing code,
+with the address offset of the indirect branch under the attacker's
+control. Since the branch prediction on impacted hardware does not
+fully disambiguate branch address and uses the offset for prediction,
+this could cause privileged code's indirect branch to jump to a gadget
+code with the same offset.
+
+The most useful gadgets take an attacker-controlled input parameter (such
+as a register value) so that the memory read can be controlled. Gadgets
+without input parameters might be possible, but the attacker would have
+very little control over what memory can be read, reducing the risk of
+the attack revealing useful data.
+
+One other variant 2 attack vector is for the attacker to poison the
+return stack buffer (RSB) :ref:`[13] <spec_ref13>` to cause speculative
+subroutine return instruction execution to go to a gadget. An attacker's
+imbalanced subroutine call instructions might "poison" entries in the
+return stack buffer which are later consumed by a victim's subroutine
+return instructions. This attack can be mitigated by flushing the return
+stack buffer on context switch, or virtual machine (VM) exit.
+
+On systems with simultaneous multi-threading (SMT), attacks are possible
+from the sibling thread, as level 1 cache and branch target buffer
+(BTB) may be shared between hardware threads in a CPU core. A malicious
+program running on the sibling thread may influence its peer's BTB to
+steer its indirect branch speculations to gadget code, and measure the
+speculative execution's side effects left in level 1 cache to infer the
+victim's data.
+
+Attack scenarios
+----------------
+
+The following list of attack scenarios have been anticipated, but may
+not cover all possible attack vectors.
+
+1. A user process attacking the kernel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
+ The attacker passes a parameter to the kernel via a register or
+ via a known address in memory during a syscall. Such parameter may
+ be used later by the kernel as an index to an array or to derive
+ a pointer for a Spectre variant 1 attack. The index or pointer
+ is invalid, but bound checks are bypassed in the code branch taken
+ for speculative execution. This could cause privileged memory to be
+ accessed and leaked.
+
+ For kernel code that has been identified where data pointers could
+ potentially be influenced for Spectre attacks, new "nospec" accessor
+ macros are used to prevent speculative loading of data.
+
+Spectre variant 1 (swapgs)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ An attacker can train the branch predictor to speculatively skip the
+ swapgs path for an interrupt or exception. If they initialize
+ the GS register to a user-space value, if the swapgs is speculatively
+ skipped, subsequent GS-related percpu accesses in the speculation
+ window will be done with the attacker-controlled GS value. This
+ could cause privileged memory to be accessed and leaked.
+
+ For example:
+
+ ::
+
+ if (coming from user space)
+ swapgs
+ mov %gs:<percpu_offset>, %reg
+ mov (%reg), %reg1
+
+ When coming from user space, the CPU can speculatively skip the
+ swapgs, and then do a speculative percpu load using the user GS
+ value. So the user can speculatively force a read of any kernel
+ value. If a gadget exists which uses the percpu value as an address
+ in another load/store, then the contents of the kernel value may
+ become visible via an L1 side channel attack.
+
+ A similar attack exists when coming from kernel space. The CPU can
+ speculatively do the swapgs, causing the user GS to get used for the
+ rest of the speculative window.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
+ A spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
+ target buffer (BTB) before issuing syscall to launch an attack.
+ After entering the kernel, the kernel could use the poisoned branch
+ target buffer on indirect jump and jump to gadget code in speculative
+ execution.
+
+ If an attacker tries to control the memory addresses leaked during
+ speculative execution, he would also need to pass a parameter to the
+ gadget, either through a register or a known address in memory. After
+ the gadget has executed, he can measure the side effect.
+
+ The kernel can protect itself against consuming poisoned branch
+ target buffer entries by using return trampolines (also known as
+ "retpoline") :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` for all
+ indirect branches. Return trampolines trap speculative execution paths
+ to prevent jumping to gadget code during speculative execution.
+ x86 CPUs with Enhanced Indirect Branch Restricted Speculation
+ (Enhanced IBRS) available in hardware should use the feature to
+ mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is
+ more efficient than retpoline.
+
+ There may be gadget code in firmware which could be exploited with
+ Spectre variant 2 attack by a rogue user process. To mitigate such
+ attacks on x86, Indirect Branch Restricted Speculation (IBRS) feature
+ is turned on before the kernel invokes any firmware code.
+
+2. A user process attacking another user process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ A malicious user process can try to attack another user process,
+ either via a context switch on the same hardware thread, or from the
+ sibling hyperthread sharing a physical processor core on simultaneous
+ multi-threading (SMT) system.
+
+ Spectre variant 1 attacks generally require passing parameters
+ between the processes, which needs a data passing relationship, such
+ as remote procedure calls (RPC). Those parameters are used in gadget
+ code to derive invalid data pointers accessing privileged memory in
+ the attacked process.
+
+ Spectre variant 2 attacks can be launched from a rogue process by
+ :ref:`poisoning <poison_btb>` the branch target buffer. This can
+ influence the indirect branch targets for a victim process that either
+ runs later on the same hardware thread, or running concurrently on
+ a sibling hardware thread sharing the same physical core.
+
+ A user process can protect itself against Spectre variant 2 attacks
+ by using the prctl() syscall to disable indirect branch speculation
+ for itself. An administrator can also cordon off an unsafe process
+ from polluting the branch target buffer by disabling the process's
+ indirect branch speculation. This comes with a performance cost
+ from not using indirect branch speculation and clearing the branch
+ target buffer. When SMT is enabled on x86, for a process that has
+ indirect branch speculation disabled, Single Threaded Indirect Branch
+ Predictors (STIBP) :ref:`[4] <spec_ref4>` are turned on to prevent the
+ sibling thread from controlling branch target buffer. In addition,
+ the Indirect Branch Prediction Barrier (IBPB) is issued to clear the
+ branch target buffer when context switching to and from such process.
+
+ On x86, the return stack buffer is stuffed on context switch.
+ This prevents the branch target buffer from being used for branch
+ prediction when the return stack buffer underflows while switching to
+ a deeper call stack. Any poisoned entries in the return stack buffer
+ left by the previous process will also be cleared.
+
+ User programs should use address space randomization to make attacks
+ more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
+
+3. A virtualized guest attacking the host
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ The attack mechanism is similar to how user processes attack the
+ kernel. The kernel is entered via hyper-calls or other virtualization
+ exit paths.
+
+ For Spectre variant 1 attacks, rogue guests can pass parameters
+ (e.g. in registers) via hyper-calls to derive invalid pointers to
+ speculate into privileged memory after entering the kernel. For places
+ where such kernel code has been identified, nospec accessor macros
+ are used to stop speculative memory access.
+
+ For Spectre variant 2 attacks, rogue guests can :ref:`poison
+ <poison_btb>` the branch target buffer or return stack buffer, causing
+ the kernel to jump to gadget code in the speculative execution paths.
+
+ To mitigate variant 2, the host kernel can use return trampolines
+ for indirect branches to bypass the poisoned branch target buffer,
+ and flushing the return stack buffer on VM exit. This prevents rogue
+ guests from affecting indirect branching in the host kernel.
+
+ To protect host processes from rogue guests, host processes can have
+ indirect branch speculation disabled via prctl(). The branch target
+ buffer is cleared before context switching to such processes.
+
+4. A virtualized guest attacking other guest
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ A rogue guest may attack another guest to get data accessible by the
+ other guest.
+
+ Spectre variant 1 attacks are possible if parameters can be passed
+ between guests. This may be done via mechanisms such as shared memory
+ or message passing. Such parameters could be used to derive data
+ pointers to privileged data in guest. The privileged data could be
+ accessed by gadget code in the victim's speculation paths.
+
+ Spectre variant 2 attacks can be launched from a rogue guest by
+ :ref:`poisoning <poison_btb>` the branch target buffer or the return
+ stack buffer. Such poisoned entries could be used to influence
+ speculation execution paths in the victim guest.
+
+ Linux kernel mitigates attacks to other guests running in the same
+ CPU hardware thread by flushing the return stack buffer on VM exit,
+ and clearing the branch target buffer before switching to a new guest.
+
+ If SMT is used, Spectre variant 2 attacks from an untrusted guest
+ in the sibling hyperthread can be mitigated by the administrator,
+ by turning off the unsafe guest's indirect branch speculation via
+ prctl(). A guest can also protect itself by turning on microcode
+ based mitigations (such as IBPB or STIBP on x86) within the guest.
+
+.. _spectre_sys_info:
+
+Spectre system information
+--------------------------
+
+The Linux kernel provides a sysfs interface to enumerate the current
+mitigation status of the system for Spectre: whether the system is
+vulnerable, and which mitigations are active.
+
+The sysfs file showing Spectre variant 1 mitigation status is:
+
+ /sys/devices/system/cpu/vulnerabilities/spectre_v1
+
+The possible values in this file are:
+
+ .. list-table::
+
+ * - 'Not affected'
+ - The processor is not vulnerable.
+ * - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
+ - The swapgs protections are disabled; otherwise it has
+ protection in the kernel on a case by case base with explicit
+ pointer sanitation and usercopy LFENCE barriers.
+ * - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
+ - Protection in the kernel on a case by case base with explicit
+ pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE
+ barriers.
+
+However, the protections are put in place on a case by case basis,
+and there is no guarantee that all possible attack vectors for Spectre
+variant 1 are covered.
+
+The spectre_v2 kernel file reports if the kernel has been compiled with
+retpoline mitigation or if the CPU has hardware mitigation, and if the
+CPU has support for additional process-specific mitigation.
+
+This file also reports CPU features enabled by microcode to mitigate
+attack between user processes:
+
+1. Indirect Branch Prediction Barrier (IBPB) to add additional
+ isolation between processes of different users.
+2. Single Thread Indirect Branch Predictors (STIBP) to add additional
+ isolation between CPU threads running on the same core.
+
+These CPU features may impact performance when used and can be enabled
+per process on a case-by-case base.
+
+The sysfs file showing Spectre variant 2 mitigation status is:
+
+ /sys/devices/system/cpu/vulnerabilities/spectre_v2
+
+The possible values in this file are:
+
+ - Kernel status:
+
+ ==================================== =================================
+ 'Not affected' The processor is not vulnerable
+ 'Vulnerable' Vulnerable, no mitigation
+ 'Mitigation: Full generic retpoline' Software-focused mitigation
+ 'Mitigation: Full AMD retpoline' AMD-specific software mitigation
+ 'Mitigation: Enhanced IBRS' Hardware-focused mitigation
+ ==================================== =================================
+
+ - Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is
+ used to protect against Spectre variant 2 attacks when calling firmware (x86 only).
+
+ ========== =============================================================
+ 'IBRS_FW' Protection against user program attacks when calling firmware
+ ========== =============================================================
+
+ - Indirect branch prediction barrier (IBPB) status for protection between
+ processes of different users. This feature can be controlled through
+ prctl() per process, or through kernel command line options. This is
+ an x86 only feature. For more details see below.
+
+ =================== ========================================================
+ 'IBPB: disabled' IBPB unused
+ 'IBPB: always-on' Use IBPB on all tasks
+ 'IBPB: conditional' Use IBPB on SECCOMP or indirect branch restricted tasks
+ =================== ========================================================
+
+ - Single threaded indirect branch prediction (STIBP) status for protection
+ between different hyper threads. This feature can be controlled through
+ prctl per process, or through kernel command line options. This is x86
+ only feature. For more details see below.
+
+ ==================== ========================================================
+ 'STIBP: disabled' STIBP unused
+ 'STIBP: forced' Use STIBP on all tasks
+ 'STIBP: conditional' Use STIBP on SECCOMP or indirect branch restricted tasks
+ ==================== ========================================================
+
+ - Return stack buffer (RSB) protection status:
+
+ ============= ===========================================
+ 'RSB filling' Protection of RSB on context switch enabled
+ ============= ===========================================
+
+Full mitigation might require a microcode update from the CPU
+vendor. When the necessary microcode is not available, the kernel will
+report vulnerability.
+
+Turning on mitigation for Spectre variant 1 and Spectre variant 2
+-----------------------------------------------------------------
+
+1. Kernel mitigation
+^^^^^^^^^^^^^^^^^^^^
+
+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
+ For the Spectre variant 1, vulnerable kernel code (as determined
+ by code audit or scanning tools) is annotated on a case by case
+ basis to use nospec accessor macros for bounds clipping :ref:`[2]
+ <spec_ref2>` to avoid any usable disclosure gadgets. However, it may
+ not cover all attack vectors for Spectre variant 1.
+
+ Copy-from-user code has an LFENCE barrier to prevent the access_ok()
+ check from being mis-speculated. The barrier is done by the
+ barrier_nospec() macro.
+
+ For the swapgs variant of Spectre variant 1, LFENCE barriers are
+ added to interrupt, exception and NMI entry where needed. These
+ barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
+ FENCE_SWAPGS_USER_ENTRY macros.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
+ For Spectre variant 2 mitigation, the compiler turns indirect calls or
+ jumps in the kernel into equivalent return trampolines (retpolines)
+ :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
+ addresses. Speculative execution paths under retpolines are trapped
+ in an infinite loop to prevent any speculative execution jumping to
+ a gadget.
+
+ To turn on retpoline mitigation on a vulnerable CPU, the kernel
+ needs to be compiled with a gcc compiler that supports the
+ -mindirect-branch=thunk-extern -mindirect-branch-register options.
+ If the kernel is compiled with a Clang compiler, the compiler needs
+ to support -mretpoline-external-thunk option. The kernel config
+ CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with
+ the latest updated microcode.
+
+ On Intel Skylake-era systems the mitigation covers most, but not all,
+ cases. See :ref:`[3] <spec_ref3>` for more details.
+
+ On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced
+ IBRS on x86), retpoline is automatically disabled at run time.
+
+ The retpoline mitigation is turned on by default on vulnerable
+ CPUs. It can be forced on or off by the administrator
+ via the kernel command line and sysfs control files. See
+ :ref:`spectre_mitigation_control_command_line`.
+
+ On x86, indirect branch restricted speculation is turned on by default
+ before invoking any firmware code to prevent Spectre variant 2 exploits
+ using the firmware.
+
+ Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y
+ and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
+ attacks on the kernel generally more difficult.
+
+2. User program mitigation
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ User programs can mitigate Spectre variant 1 using LFENCE or "bounds
+ clipping". For more details see :ref:`[2] <spec_ref2>`.
+
+ For Spectre variant 2 mitigation, individual user programs
+ can be compiled with return trampolines for indirect branches.
+ This protects them from consuming poisoned entries in the branch
+ target buffer left by malicious software. Alternatively, the
+ programs can disable their indirect branch speculation via prctl()
+ (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
+ On x86, this will turn on STIBP to guard against attacks from the
+ sibling thread when the user program is running, and use IBPB to
+ flush the branch target buffer when switching to/from the program.
+
+ Restricting indirect branch speculation on a user program will
+ also prevent the program from launching a variant 2 attack
+ on x86. All sand-boxed SECCOMP programs have indirect branch
+ speculation restricted by default. Administrators can change
+ that behavior via the kernel command line and sysfs control files.
+ See :ref:`spectre_mitigation_control_command_line`.
+
+ Programs that disable their indirect branch speculation will have
+ more overhead and run slower.
+
+ User programs should use address space randomization
+ (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
+ difficult.
+
+3. VM mitigation
+^^^^^^^^^^^^^^^^
+
+ Within the kernel, Spectre variant 1 attacks from rogue guests are
+ mitigated on a case by case basis in VM exit paths. Vulnerable code
+ uses nospec accessor macros for "bounds clipping", to avoid any
+ usable disclosure gadgets. However, this may not cover all variant
+ 1 attack vectors.
+
+ For Spectre variant 2 attacks from rogue guests to the kernel, the
+ Linux kernel uses retpoline or Enhanced IBRS to prevent consumption of
+ poisoned entries in branch target buffer left by rogue guests. It also
+ flushes the return stack buffer on every VM exit to prevent a return
+ stack buffer underflow so poisoned branch target buffer could be used,
+ or attacker guests leaving poisoned entries in the return stack buffer.
+
+ To mitigate guest-to-guest attacks in the same CPU hardware thread,
+ the branch target buffer is sanitized by flushing before switching
+ to a new guest on a CPU.
+
+ The above mitigations are turned on by default on vulnerable CPUs.
+
+ To mitigate guest-to-guest attacks from sibling thread when SMT is
+ in use, an untrusted guest running in the sibling thread can have
+ its indirect branch speculation disabled by administrator via prctl().
+
+ The kernel also allows guests to use any microcode based mitigation
+ they choose to use (such as IBPB or STIBP on x86) to protect themselves.
+
+.. _spectre_mitigation_control_command_line:
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+Spectre variant 2 mitigation can be disabled or force enabled at the
+kernel command line.
+
+ nospectre_v1
+
+ [X86,PPC] Disable mitigations for Spectre Variant 1
+ (bounds check bypass). With this option data leaks are
+ possible in the system.
+
+ nospectre_v2
+
+ [X86] Disable all mitigations for the Spectre variant 2
+ (indirect branch prediction) vulnerability. System may
+ allow data leaks with this option, which is equivalent
+ to spectre_v2=off.
+
+
+ spectre_v2=
+
+ [X86] Control mitigation of Spectre variant 2
+ (indirect branch speculation) vulnerability.
+ The default operation protects the kernel from
+ user space attacks.
+
+ on
+ unconditionally enable, implies
+ spectre_v2_user=on
+ off
+ unconditionally disable, implies
+ spectre_v2_user=off
+ auto
+ kernel detects whether your CPU model is
+ vulnerable
+
+ Selecting 'on' will, and 'auto' may, choose a
+ mitigation method at run time according to the
+ CPU, the available microcode, the setting of the
+ CONFIG_RETPOLINE configuration option, and the
+ compiler with which the kernel was built.
+
+ Selecting 'on' will also enable the mitigation
+ against user space to user space task attacks.
+
+ Selecting 'off' will disable both the kernel and
+ the user space protections.
+
+ Specific mitigations can also be selected manually:
+
+ retpoline
+ replace indirect branches
+ retpoline,generic
+ google's original retpoline
+ retpoline,amd
+ AMD-specific minimal thunk
+
+ Not specifying this option is equivalent to
+ spectre_v2=auto.
+
+For user space mitigation:
+
+ spectre_v2_user=
+
+ [X86] Control mitigation of Spectre variant 2
+ (indirect branch speculation) vulnerability between
+ user space tasks
+
+ on
+ Unconditionally enable mitigations. Is
+ enforced by spectre_v2=on
+
+ off
+ Unconditionally disable mitigations. Is
+ enforced by spectre_v2=off
+
+ prctl
+ Indirect branch speculation is enabled,
+ but mitigation can be enabled via prctl
+ per thread. The mitigation control state
+ is inherited on fork.
+
+ prctl,ibpb
+ Like "prctl" above, but only STIBP is
+ controlled per thread. IBPB is issued
+ always when switching between different user
+ space processes.
+
+ seccomp
+ Same as "prctl" above, but all seccomp
+ threads will enable the mitigation unless
+ they explicitly opt out.
+
+ seccomp,ibpb
+ Like "seccomp" above, but only STIBP is
+ controlled per thread. IBPB is issued
+ always when switching between different
+ user space processes.
+
+ auto
+ Kernel selects the mitigation depending on
+ the available CPU features and vulnerability.
+
+ Default mitigation:
+ If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+
+ Not specifying this option is equivalent to
+ spectre_v2_user=auto.
+
+ In general the kernel by default selects
+ reasonable mitigations for the current CPU. To
+ disable Spectre variant 2 mitigations, boot with
+ spectre_v2=off. Spectre variant 1 mitigations
+ cannot be disabled.
+
+Mitigation selection guide
+--------------------------
+
+1. Trusted userspace
+^^^^^^^^^^^^^^^^^^^^
+
+ If all userspace applications are from trusted sources and do not
+ execute externally supplied untrusted code, then the mitigations can
+ be disabled.
+
+2. Protect sensitive programs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ For security-sensitive programs that have secrets (e.g. crypto
+ keys), protection against Spectre variant 2 can be put in place by
+ disabling indirect branch speculation when the program is running
+ (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
+
+3. Sandbox untrusted programs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Untrusted programs that could be a source of attacks can be cordoned
+ off by disabling their indirect branch speculation when they are run
+ (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
+ This prevents untrusted programs from polluting the branch target
+ buffer. All programs running in SECCOMP sandboxes have indirect
+ branch speculation restricted by default. This behavior can be
+ changed via the kernel command line and sysfs control files. See
+ :ref:`spectre_mitigation_control_command_line`.
+
+3. High security mode
+^^^^^^^^^^^^^^^^^^^^^
+
+ All Spectre variant 2 mitigations can be forced on
+ at boot time for all programs (See the "on" option in
+ :ref:`spectre_mitigation_control_command_line`). This will add
+ overhead as indirect branch speculations for all programs will be
+ restricted.
+
+ On x86, branch target buffer will be flushed with IBPB when switching
+ to a new program. STIBP is left on all the time to protect programs
+ against variant 2 attacks originating from programs running on
+ sibling threads.
+
+ Alternatively, STIBP can be used only when running programs
+ whose indirect branch speculation is explicitly disabled,
+ while IBPB is still used all the time when switching to a new
+ program to clear the branch target buffer (See "ibpb" option in
+ :ref:`spectre_mitigation_control_command_line`). This "ibpb" option
+ has less performance cost than the "on" option, which leaves STIBP
+ on all the time.
+
+References on Spectre
+---------------------
+
+Intel white papers:
+
+.. _spec_ref1:
+
+[1] `Intel analysis of speculative execution side channels <https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf>`_.
+
+.. _spec_ref2:
+
+[2] `Bounds check bypass <https://software.intel.com/security-software-guidance/software-guidance/bounds-check-bypass>`_.
+
+.. _spec_ref3:
+
+[3] `Deep dive: Retpoline: A branch target injection mitigation <https://software.intel.com/security-software-guidance/insights/deep-dive-retpoline-branch-target-injection-mitigation>`_.
+
+.. _spec_ref4:
+
+[4] `Deep Dive: Single Thread Indirect Branch Predictors <https://software.intel.com/security-software-guidance/insights/deep-dive-single-thread-indirect-branch-predictors>`_.
+
+AMD white papers:
+
+.. _spec_ref5:
+
+[5] `AMD64 technology indirect branch control extension <https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf>`_.
+
+.. _spec_ref6:
+
+[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_.
+
+ARM white papers:
+
+.. _spec_ref7:
+
+[7] `Cache speculation side-channels <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper>`_.
+
+.. _spec_ref8:
+
+[8] `Cache speculation issues update <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/latest-updates/cache-speculation-issues-update>`_.
+
+Google white paper:
+
+.. _spec_ref9:
+
+[9] `Retpoline: a software construct for preventing branch-target-injection <https://support.google.com/faqs/answer/7625886>`_.
+
+MIPS white paper:
+
+.. _spec_ref10:
+
+[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_.
+
+Academic papers:
+
+.. _spec_ref11:
+
+[11] `Spectre Attacks: Exploiting Speculative Execution <https://spectreattack.com/spectre.pdf>`_.
+
+.. _spec_ref12:
+
+[12] `NetSpectre: Read Arbitrary Memory over Network <https://arxiv.org/abs/1807.10535>`_.
+
+.. _spec_ref13:
+
+[13] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://www.usenix.org/system/files/conference/woot18/woot18-paper-koruyeh.pdf>`_.
diff --git a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
new file mode 100644
index 000000000000..af6865b822d2
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
@@ -0,0 +1,279 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+TAA - TSX Asynchronous Abort
+======================================
+
+TAA is a hardware vulnerability that allows unprivileged speculative access to
+data which is available in various CPU internal buffers by using asynchronous
+aborts within an Intel TSX transactional region.
+
+Affected processors
+-------------------
+
+This vulnerability only affects Intel processors that support Intel
+Transactional Synchronization Extensions (TSX) when the TAA_NO bit (bit 8)
+is 0 in the IA32_ARCH_CAPABILITIES MSR. On processors where the MDS_NO bit
+(bit 5) is 0 in the IA32_ARCH_CAPABILITIES MSR, the existing MDS mitigations
+also mitigate against TAA.
+
+Whether a processor is affected or not can be read out from the TAA
+vulnerability file in sysfs. See :ref:`tsx_async_abort_sys_info`.
+
+Related CVEs
+------------
+
+The following CVE entry is related to this TAA issue:
+
+ ============== ===== ===================================================
+ CVE-2019-11135 TAA TSX Asynchronous Abort (TAA) condition on some
+ microprocessors utilizing speculative execution may
+ allow an authenticated user to potentially enable
+ information disclosure via a side channel with
+ local access.
+ ============== ===== ===================================================
+
+Problem
+-------
+
+When performing store, load or L1 refill operations, processors write
+data into temporary microarchitectural structures (buffers). The data in
+those buffers can be forwarded to load operations as an optimization.
+
+Intel TSX is an extension to the x86 instruction set architecture that adds
+hardware transactional memory support to improve performance of multi-threaded
+software. TSX lets the processor expose and exploit concurrency hidden in an
+application due to dynamically avoiding unnecessary synchronization.
+
+TSX supports atomic memory transactions that are either committed (success) or
+aborted. During an abort, operations that happened within the transactional region
+are rolled back. An asynchronous abort takes place, among other options, when a
+different thread accesses a cache line that is also used within the transactional
+region when that access might lead to a data race.
+
+Immediately after an uncompleted asynchronous abort, certain speculatively
+executed loads may read data from those internal buffers and pass it to dependent
+operations. This can be then used to infer the value via a cache side channel
+attack.
+
+Because the buffers are potentially shared between Hyper-Threads cross
+Hyper-Thread attacks are possible.
+
+The victim of a malicious actor does not need to make use of TSX. Only the
+attacker needs to begin a TSX transaction and raise an asynchronous abort
+which in turn potenitally leaks data stored in the buffers.
+
+More detailed technical information is available in the TAA specific x86
+architecture section: :ref:`Documentation/x86/tsx_async_abort.rst <tsx_async_abort>`.
+
+
+Attack scenarios
+----------------
+
+Attacks against the TAA vulnerability can be implemented from unprivileged
+applications running on hosts or guests.
+
+As for MDS, the attacker has no control over the memory addresses that can
+be leaked. Only the victim is responsible for bringing data to the CPU. As
+a result, the malicious actor has to sample as much data as possible and
+then postprocess it to try to infer any useful information from it.
+
+A potential attacker only has read access to the data. Also, there is no direct
+privilege escalation by using this technique.
+
+
+.. _tsx_async_abort_sys_info:
+
+TAA system information
+-----------------------
+
+The Linux kernel provides a sysfs interface to enumerate the current TAA status
+of mitigated systems. The relevant sysfs file is:
+
+/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
+
+The possible values in this file are:
+
+.. list-table::
+
+ * - 'Vulnerable'
+ - The CPU is affected by this vulnerability and the microcode and kernel mitigation are not applied.
+ * - 'Vulnerable: Clear CPU buffers attempted, no microcode'
+ - The system tries to clear the buffers but the microcode might not support the operation.
+ * - 'Mitigation: Clear CPU buffers'
+ - The microcode has been updated to clear the buffers. TSX is still enabled.
+ * - 'Mitigation: TSX disabled'
+ - TSX is disabled.
+ * - 'Not affected'
+ - The CPU is not affected by this issue.
+
+.. _ucode_needed:
+
+Best effort mitigation mode
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the processor is vulnerable, but the availability of the microcode-based
+mitigation mechanism is not advertised via CPUID the kernel selects a best
+effort mitigation mode. This mode invokes the mitigation instructions
+without a guarantee that they clear the CPU buffers.
+
+This is done to address virtualization scenarios where the host has the
+microcode update applied, but the hypervisor is not yet updated to expose the
+CPUID to the guest. If the host has updated microcode the protection takes
+effect; otherwise a few CPU cycles are wasted pointlessly.
+
+The state in the tsx_async_abort sysfs file reflects this situation
+accordingly.
+
+
+Mitigation mechanism
+--------------------
+
+The kernel detects the affected CPUs and the presence of the microcode which is
+required. If a CPU is affected and the microcode is available, then the kernel
+enables the mitigation by default.
+
+
+The mitigation can be controlled at boot time via a kernel command line option.
+See :ref:`taa_mitigation_control_command_line`.
+
+.. _virt_mechanism:
+
+Virtualization mitigation
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Affected systems where the host has TAA microcode and TAA is mitigated by
+having disabled TSX previously, are not vulnerable regardless of the status
+of the VMs.
+
+In all other cases, if the host either does not have the TAA microcode or
+the kernel is not mitigated, the system might be vulnerable.
+
+
+.. _taa_mitigation_control_command_line:
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+The kernel command line allows to control the TAA mitigations at boot time with
+the option "tsx_async_abort=". The valid arguments for this option are:
+
+ ============ =============================================================
+ off This option disables the TAA mitigation on affected platforms.
+ If the system has TSX enabled (see next parameter) and the CPU
+ is affected, the system is vulnerable.
+
+ full TAA mitigation is enabled. If TSX is enabled, on an affected
+ system it will clear CPU buffers on ring transitions. On
+ systems which are MDS-affected and deploy MDS mitigation,
+ TAA is also mitigated. Specifying this option on those
+ systems will have no effect.
+
+ full,nosmt The same as tsx_async_abort=full, with SMT disabled on
+ vulnerable CPUs that have TSX enabled. This is the complete
+ mitigation. When TSX is disabled, SMT is not disabled because
+ CPU is not vulnerable to cross-thread TAA attacks.
+ ============ =============================================================
+
+Not specifying this option is equivalent to "tsx_async_abort=full". For
+processors that are affected by both TAA and MDS, specifying just
+"tsx_async_abort=off" without an accompanying "mds=off" will have no
+effect as the same mitigation is used for both vulnerabilities.
+
+The kernel command line also allows to control the TSX feature using the
+parameter "tsx=" on CPUs which support TSX control. MSR_IA32_TSX_CTRL is used
+to control the TSX feature and the enumeration of the TSX feature bits (RTM
+and HLE) in CPUID.
+
+The valid options are:
+
+ ============ =============================================================
+ off Disables TSX on the system.
+
+ Note that this option takes effect only on newer CPUs which are
+ not vulnerable to MDS, i.e., have MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1
+ and which get the new IA32_TSX_CTRL MSR through a microcode
+ update. This new MSR allows for the reliable deactivation of
+ the TSX functionality.
+
+ on Enables TSX.
+
+ Although there are mitigations for all known security
+ vulnerabilities, TSX has been known to be an accelerator for
+ several previous speculation-related CVEs, and so there may be
+ unknown security risks associated with leaving it enabled.
+
+ auto Disables TSX if X86_BUG_TAA is present, otherwise enables TSX
+ on the system.
+ ============ =============================================================
+
+Not specifying this option is equivalent to "tsx=off".
+
+The following combinations of the "tsx_async_abort" and "tsx" are possible. For
+affected platforms tsx=auto is equivalent to tsx=off and the result will be:
+
+ ========= ========================== =========================================
+ tsx=on tsx_async_abort=full The system will use VERW to clear CPU
+ buffers. Cross-thread attacks are still
+ possible on SMT machines.
+ tsx=on tsx_async_abort=full,nosmt As above, cross-thread attacks on SMT
+ mitigated.
+ tsx=on tsx_async_abort=off The system is vulnerable.
+ tsx=off tsx_async_abort=full TSX might be disabled if microcode
+ provides a TSX control MSR. If so,
+ system is not vulnerable.
+ tsx=off tsx_async_abort=full,nosmt Ditto
+ tsx=off tsx_async_abort=off ditto
+ ========= ========================== =========================================
+
+
+For unaffected platforms "tsx=on" and "tsx_async_abort=full" does not clear CPU
+buffers. For platforms without TSX control (MSR_IA32_ARCH_CAPABILITIES.MDS_NO=0)
+"tsx" command line argument has no effect.
+
+For the affected platforms below table indicates the mitigation status for the
+combinations of CPUID bit MD_CLEAR and IA32_ARCH_CAPABILITIES MSR bits MDS_NO
+and TSX_CTRL_MSR.
+
+ ======= ========= ============= ========================================
+ MDS_NO MD_CLEAR TSX_CTRL_MSR Status
+ ======= ========= ============= ========================================
+ 0 0 0 Vulnerable (needs microcode)
+ 0 1 0 MDS and TAA mitigated via VERW
+ 1 1 0 MDS fixed, TAA vulnerable if TSX enabled
+ because MD_CLEAR has no meaning and
+ VERW is not guaranteed to clear buffers
+ 1 X 1 MDS fixed, TAA can be mitigated by
+ VERW or TSX_CTRL_MSR
+ ======= ========= ============= ========================================
+
+Mitigation selection guide
+--------------------------
+
+1. Trusted userspace and guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If all user space applications are from a trusted source and do not execute
+untrusted code which is supplied externally, then the mitigation can be
+disabled. The same applies to virtualized environments with trusted guests.
+
+
+2. Untrusted userspace and guests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If there are untrusted applications or guests on the system, enabling TSX
+might allow a malicious actor to leak data from the host or from other
+processes running on the same physical core.
+
+If the microcode is available and the TSX is disabled on the host, attacks
+are prevented in a virtualized environment as well, even if the VMs do not
+explicitly enable the mitigation.
+
+
+.. _taa_default_mitigations:
+
+Default mitigations
+-------------------
+
+The kernel's default action for vulnerable processors is:
+
+ - Deploy TSX disable mitigation (tsx_async_abort=full tsx=off).
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9240b2caa0b1..933465eff40e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1852,6 +1852,25 @@
KVM MMU at runtime.
Default is 0 (off)
+ kvm.nx_huge_pages=
+ [KVM] Controls the software workaround for the
+ X86_BUG_ITLB_MULTIHIT bug.
+ force : Always deploy workaround.
+ off : Never deploy workaround.
+ auto : Deploy workaround based on the presence of
+ X86_BUG_ITLB_MULTIHIT.
+
+ Default is 'auto'.
+
+ If the software workaround is enabled for the host,
+ guests do need not to enable it for nested guests.
+
+ kvm.nx_huge_pages_recovery_ratio=
+ [KVM] Controls how many 4KiB pages are periodically zapped
+ back to huge pages. 0 disables the recovery, otherwise if
+ the value is N KVM will zap 1/Nth of the 4KiB pages every
+ minute. The default is 60.
+
kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
Default is 1 (enabled)
@@ -2235,6 +2254,12 @@
SMT on vulnerable CPUs
off - Unconditionally disable MDS mitigation
+ On TAA-affected machines, mds=off can be prevented by
+ an active TAA mitigation as both vulnerabilities are
+ mitigated with the same mechanism so in order to disable
+ this mitigation, you need to specify tsx_async_abort=off
+ too.
+
Not specifying this option is equivalent to
mds=full.
@@ -2389,8 +2414,8 @@
http://repo.or.cz/w/linux-2.6/mini2440.git
mitigations=
- [X86,PPC,S390] Control optional mitigations for CPU
- vulnerabilities. This is a set of curated,
+ [X86,PPC,S390,ARM64] Control optional mitigations for
+ CPU vulnerabilities. This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.
@@ -2399,13 +2424,23 @@
improves system performance, but it may also
expose users to several CPU vulnerabilities.
Equivalent to: nopti [X86,PPC]
+ kpti=0 [ARM64]
nospectre_v1 [PPC]
nobp=0 [S390]
- nospectre_v2 [X86,PPC,S390]
+ nospectre_v1 [X86]
+ nospectre_v2 [X86,PPC,S390,ARM64]
spectre_v2_user=off [X86]
spec_store_bypass_disable=off [X86,PPC]
+ ssbd=force-off [ARM64]
l1tf=off [X86]
mds=off [X86]
+ tsx_async_abort=off [X86]
+ kvm.nx_huge_pages=off [X86]
+
+ Exceptions:
+ This does not have any effect on
+ kvm.nx_huge_pages when
+ kvm.nx_huge_pages=force.
auto (default)
Mitigate all CPU vulnerabilities, but leave SMT
@@ -2421,6 +2456,7 @@
be fully mitigated, even if it means losing SMT.
Equivalent to: l1tf=flush,nosmt [X86]
mds=full,nosmt [X86]
+ tsx_async_abort=full,nosmt [X86]
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
@@ -2740,14 +2776,14 @@
nosmt=force: Force disable SMT, cannot be undone
via the sysfs control file.
- nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds
- check bypass). With this option data leaks are possible
- in the system.
+ nospectre_v1 [X66, PPC] Disable mitigations for Spectre Variant 1
+ (bounds check bypass). With this option data leaks
+ are possible in the system.
- nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2
- (indirect branch prediction) vulnerability. System may
- allow data leaks with this option, which is equivalent
- to spectre_v2=off.
+ nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
+ the Spectre variant 2 (indirect branch prediction)
+ vulnerability. System may allow data leaks with this
+ option.
nospec_store_bypass_disable
[HW] Disable all mitigations for the Speculative Store Bypass vulnerability
@@ -3787,6 +3823,13 @@
Run specified binary instead of /init from the ramdisk,
used for early userspace startup. See initrd.
+ rdrand= [X86]
+ force - Override the decision by the kernel to hide the
+ advertisement of RDRAND support (this affects
+ certain AMD processors because of buggy BIOS
+ support, specifically around the suspend/resume
+ path).
+
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, mba.
@@ -4495,6 +4538,76 @@
platforms where RDTSC is slow and this accounting
can add overhead.
+ tsx= [X86] Control Transactional Synchronization
+ Extensions (TSX) feature in Intel processors that
+ support TSX control.
+
+ This parameter controls the TSX feature. The options are:
+
+ on - Enable TSX on the system. Although there are
+ mitigations for all known security vulnerabilities,
+ TSX has been known to be an accelerator for
+ several previous speculation-related CVEs, and
+ so there may be unknown security risks associated
+ with leaving it enabled.
+
+ off - Disable TSX on the system. (Note that this
+ option takes effect only on newer CPUs which are
+ not vulnerable to MDS, i.e., have
+ MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1 and which get
+ the new IA32_TSX_CTRL MSR through a microcode
+ update. This new MSR allows for the reliable
+ deactivation of the TSX functionality.)
+
+ auto - Disable TSX if X86_BUG_TAA is present,
+ otherwise enable TSX on the system.
+
+ Not specifying this option is equivalent to tsx=off.
+
+ See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
+ for more details.
+
+ tsx_async_abort= [X86,INTEL] Control mitigation for the TSX Async
+ Abort (TAA) vulnerability.
+
+ Similar to Micro-architectural Data Sampling (MDS)
+ certain CPUs that support Transactional
+ Synchronization Extensions (TSX) are vulnerable to an
+ exploit against CPU internal buffers which can forward
+ information to a disclosure gadget under certain
+ conditions.
+
+ In vulnerable processors, the speculatively forwarded
+ data can be used in a cache side channel attack, to
+ access data to which the attacker does not have direct
+ access.
+
+ This parameter controls the TAA mitigation. The
+ options are:
+
+ full - Enable TAA mitigation on vulnerable CPUs
+ if TSX is enabled.
+
+ full,nosmt - Enable TAA mitigation and disable SMT on
+ vulnerable CPUs. If TSX is disabled, SMT
+ is not disabled because CPU is not
+ vulnerable to cross-thread TAA attacks.
+ off - Unconditionally disable TAA mitigation
+
+ On MDS-affected machines, tsx_async_abort=off can be
+ prevented by an active MDS mitigation as both vulnerabilities
+ are mitigated with the same mechanism so in order to disable
+ this mitigation, you need to specify mds=off too.
+
+ Not specifying this option is equivalent to
+ tsx_async_abort=full. On CPUs which are MDS affected
+ and deploy MDS mitigation, TAA mitigation is not
+ required and doesn't provide any additional
+ mitigation.
+
+ For details see:
+ Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
+
turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
Format:
@@ -4580,13 +4693,13 @@
Flags is a set of characters, each corresponding
to a common usb-storage quirk flag as follows:
a = SANE_SENSE (collect more than 18 bytes
- of sense data);
+ of sense data, not on uas);
b = BAD_SENSE (don't collect more than 18
- bytes of sense data);
+ bytes of sense data, not on uas);
c = FIX_CAPACITY (decrease the reported
device capacity by one sector);
d = NO_READ_DISC_INFO (don't use
- READ_DISC_INFO command);
+ READ_DISC_INFO command, not on uas);
e = NO_READ_CAPACITY_16 (don't use
READ_CAPACITY_16 command);
f = NO_REPORT_OPCODES (don't use report opcodes
@@ -4601,17 +4714,18 @@
j = NO_REPORT_LUNS (don't use report luns
command, uas only);
l = NOT_LOCKABLE (don't try to lock and
- unlock ejectable media);
+ unlock ejectable media, not on uas);
m = MAX_SECTORS_64 (don't transfer more
- than 64 sectors = 32 KB at a time);
+ than 64 sectors = 32 KB at a time,
+ not on uas);
n = INITIAL_READ10 (force a retry of the
- initial READ(10) command);
+ initial READ(10) command, not on uas);
o = CAPACITY_OK (accept the capacity
- reported by the device);
+ reported by the device, not on uas);
p = WRITE_CACHE (the device cache is ON
- by default);
+ by default, not on uas);
r = IGNORE_RESIDUE (the device reports
- bogus residue values);
+ bogus residue values, not on uas);
s = SINGLE_LUN (the device has only one
Logical Unit);
t = NO_ATA_1X (don't allow ATA(12) and ATA(16)
@@ -4620,7 +4734,8 @@
w = NO_WP_DETECT (don't test whether the
medium is write-protected).
y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE
- even if the device claims no cache)
+ even if the device claims no cache,
+ not on uas)
Example: quirks=0419:aaf5:rl,0421:0433:rc
user_debug= [KNL,ARM]
@@ -4865,6 +4980,10 @@
the unplug protocol
never -- do not unplug even if version check succeeds
+ xen_legacy_crash [X86,XEN]
+ Crash from Xen panic notifier, without executing late
+ panic() code such as dumping handler.
+
xen_nopvspin [X86,XEN]
Disables the ticketlock slowpath using Xen PV
optimizations.
diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt
index dad411d635d8..7964f03846b1 100644
--- a/Documentation/arm64/cpu-feature-registers.txt
+++ b/Documentation/arm64/cpu-feature-registers.txt
@@ -110,7 +110,17 @@ infrastructure:
x--------------------------------------------------x
| Name | bits | visible |
|--------------------------------------------------|
- | RES0 | [63-32] | n |
+ | TS | [55-52] | y |
+ |--------------------------------------------------|
+ | FHM | [51-48] | y |
+ |--------------------------------------------------|
+ | DP | [47-44] | y |
+ |--------------------------------------------------|
+ | SM4 | [43-40] | y |
+ |--------------------------------------------------|
+ | SM3 | [39-36] | y |
+ |--------------------------------------------------|
+ | SHA3 | [35-32] | y |
|--------------------------------------------------|
| RDM | [31-28] | y |
|--------------------------------------------------|
@@ -123,8 +133,6 @@ infrastructure:
| SHA1 | [11-8] | y |
|--------------------------------------------------|
| AES | [7-4] | y |
- |--------------------------------------------------|
- | RES0 | [3-0] | n |
x--------------------------------------------------x
@@ -132,7 +140,9 @@ infrastructure:
x--------------------------------------------------x
| Name | bits | visible |
|--------------------------------------------------|
- | RES0 | [63-28] | n |
+ | DIT | [51-48] | y |
+ |--------------------------------------------------|
+ | SVE | [35-32] | y |
|--------------------------------------------------|
| GIC | [27-24] | n |
|--------------------------------------------------|
@@ -183,6 +193,14 @@ infrastructure:
| DPB | [3-0] | y |
x--------------------------------------------------x
+ 5) ID_AA64MMFR2_EL1 - Memory model feature register 2
+
+ x--------------------------------------------------x
+ | Name | bits | visible |
+ |--------------------------------------------------|
+ | AT | [35-32] | y |
+ x--------------------------------------------------x
+
Appendix I: Example
---------------------------
diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index 913396ac5824..ed0d814df7e0 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -177,6 +177,9 @@ These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example our TSO architectures
provide full ordered atomics and these barriers are no-ops.
+NOTE: when the atomic RmW ops are fully ordered, they should also imply a
+compiler barrier.
+
Thus:
atomic_fetch_add();
diff --git a/Documentation/devicetree/bindings/media/i2c/adv748x.txt b/Documentation/devicetree/bindings/media/i2c/adv748x.txt
index 21ffb5ed8183..54d1d3bc1869 100644
--- a/Documentation/devicetree/bindings/media/i2c/adv748x.txt
+++ b/Documentation/devicetree/bindings/media/i2c/adv748x.txt
@@ -73,7 +73,7 @@ Example:
};
};
- port@10 {
+ port@a {
reg = <10>;
adv7482_txa: endpoint {
@@ -83,7 +83,7 @@ Example:
};
};
- port@11 {
+ port@b {
reg = <11>;
adv7482_txb: endpoint {
diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt
index 4648948f7c3b..e15589f47787 100644
--- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt
+++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt
@@ -19,6 +19,9 @@ Optional properties:
- interrupt-names: must be "mdio_done_error" when there is a share interrupt fed
to this hardware block, or must be "mdio_done" for the first interrupt and
"mdio_error" for the second when there are separate interrupts
+- clocks: A reference to the clock supplying the MDIO bus controller
+- clock-frequency: the MDIO bus clock that must be output by the MDIO bus
+ hardware, if absent, the default hardware values are used
Child nodes of this MDIO bus controller node are standard Ethernet PHY device
nodes as described in Documentation/devicetree/bindings/net/phy.txt
diff --git a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt b/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt
index ee3723beb701..33b38716b77f 100644
--- a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt
+++ b/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt
@@ -4,6 +4,7 @@ Required properties:
- compatible: Should be one of the following:
- "microchip,mcp2510" for MCP2510.
- "microchip,mcp2515" for MCP2515.
+ - "microchip,mcp25625" for MCP25625.
- reg: SPI chip select.
- clocks: The clock feeding the CAN controller.
- interrupt-parent: The parent interrupt controller.
diff --git a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
index 42cd81090a2c..3f3cfc1d8d4d 100644
--- a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
+++ b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
@@ -16,7 +16,7 @@ Required properties:
Optional properties:
- interrupts: interrupt line number for the SMI error/done interrupt
-- clocks: phandle for up to three required clocks for the MDIO instance
+- clocks: phandle for up to four required clocks for the MDIO instance
The child nodes of the MDIO driver are the individual PHY devices
connected to this MDIO bus. They must have a "reg" property given the
diff --git a/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt b/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt
index be789685a1c2..18b892d010d8 100644
--- a/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt
+++ b/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt
@@ -27,4 +27,4 @@ and valid to enable charging:
- "abracon,tc-diode": should be "standard" (0.6V) or "schottky" (0.3V)
- "abracon,tc-resistor": should be <0>, <3>, <6> or <11>. 0 disables the output
- resistor, the other values are in ohm.
+ resistor, the other values are in kOhm.
diff --git a/Documentation/devicetree/bindings/serial/mvebu-uart.txt b/Documentation/devicetree/bindings/serial/mvebu-uart.txt
index 6087defd9f93..d37fabe17bd1 100644
--- a/Documentation/devicetree/bindings/serial/mvebu-uart.txt
+++ b/Documentation/devicetree/bindings/serial/mvebu-uart.txt
@@ -8,6 +8,6 @@ Required properties:
Example:
serial@12000 {
compatible = "marvell,armada-3700-uart";
- reg = <0x12000 0x400>;
+ reg = <0x12000 0x200>;
interrupts = <43>;
};
diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index e4d76b66f272..681bcd9590b6 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -47,6 +47,8 @@ Optional properties:
from P0 to P1/P2/P3 without delay.
- snps,dis-tx-ipgap-linecheck-quirk: when set, disable u2mac linestate check
during HS transmit.
+ - snps,dis_metastability_quirk: when set, disable metastability workaround.
+ CAUTION: use only if you are absolutely sure of it.
- snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
utmi_l1_suspend_n, false when asserts utmi_sleep_n
- snps,hird-threshold: HIRD threshold
diff --git a/Documentation/hid/uhid.txt b/Documentation/hid/uhid.txt
index c8656dd029a9..958fff945304 100644
--- a/Documentation/hid/uhid.txt
+++ b/Documentation/hid/uhid.txt
@@ -160,7 +160,7 @@ them but you should handle them according to your needs.
UHID_OUTPUT:
This is sent if the HID device driver wants to send raw data to the I/O
device on the interrupt channel. You should read the payload and forward it to
- the device. The payload is of type "struct uhid_data_req".
+ the device. The payload is of type "struct uhid_output_req".
This may be received even though you haven't received UHID_OPEN, yet.
UHID_GET_REPORT:
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 828fcd6711b3..3bbe6fb2b086 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -241,6 +241,14 @@ tcp_base_mss - INTEGER
Path MTU discovery (MTU probing). If MTU probing is enabled,
this is the initial MSS used by the connection.
+tcp_min_snd_mss - INTEGER
+ TCP SYN and SYNACK messages usually advertise an ADVMSS option,
+ as described in RFC 1122 and RFC 6691.
+ If this ADVMSS option is smaller than tcp_min_snd_mss,
+ it is silently capped to tcp_min_snd_mss.
+
+ Default : 48 (at least 8 bytes of payload per segment)
+
tcp_congestion_control - STRING
Set the congestion control algorithm to be used for new
connections. The algorithm "reno" is always available, but
diff --git a/Documentation/robust-futexes.txt b/Documentation/robust-futexes.txt
index 6c42c75103eb..6361fb01c9c1 100644
--- a/Documentation/robust-futexes.txt
+++ b/Documentation/robust-futexes.txt
@@ -218,5 +218,4 @@ All other architectures should build just fine too - but they won't have
the new syscalls yet.
Architectures need to implement the new futex_atomic_cmpxchg_inatomic()
-inline function before writing up the syscalls (that function returns
--ENOSYS right now).
+inline function before writing up the syscalls.
diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt
index f6b1873f68ab..de583fbbfe42 100644
--- a/Documentation/scheduler/sched-bwc.txt
+++ b/Documentation/scheduler/sched-bwc.txt
@@ -90,6 +90,51 @@ There are two ways in which a group may become throttled:
In case b) above, even though the child may have runtime remaining it will not
be allowed to until the parent's runtime is refreshed.
+CFS Bandwidth Quota Caveats
+---------------------------
+Once a slice is assigned to a cpu it does not expire. However all but 1ms of
+the slice may be returned to the global pool if all threads on that cpu become
+unrunnable. This is configured at compile time by the min_cfs_rq_runtime
+variable. This is a performance tweak that helps prevent added contention on
+the global lock.
+
+The fact that cpu-local slices do not expire results in some interesting corner
+cases that should be understood.
+
+For cgroup cpu constrained applications that are cpu limited this is a
+relatively moot point because they will naturally consume the entirety of their
+quota as well as the entirety of each cpu-local slice in each period. As a
+result it is expected that nr_periods roughly equal nr_throttled, and that
+cpuacct.usage will increase roughly equal to cfs_quota_us in each period.
+
+For highly-threaded, non-cpu bound applications this non-expiration nuance
+allows applications to briefly burst past their quota limits by the amount of
+unused slice on each cpu that the task group is running on (typically at most
+1ms per cpu or as defined by min_cfs_rq_runtime). This slight burst only
+applies if quota had been assigned to a cpu and then not fully used or returned
+in previous periods. This burst amount will not be transferred between cores.
+As a result, this mechanism still strictly limits the task group to quota
+average usage, albeit over a longer time window than a single period. This
+also limits the burst ability to no more than 1ms per cpu. This provides
+better more predictable user experience for highly threaded applications with
+small quota limits on high core count machines. It also eliminates the
+propensity to throttle these applications while simultanously using less than
+quota amounts of cpu. Another way to say this, is that by allowing the unused
+portion of a slice to remain valid across periods we have decreased the
+possibility of wastefully expiring quota on cpu-local silos that don't need a
+full slice's amount of cpu time.
+
+The interaction between cpu-bound and non-cpu-bound-interactive applications
+should also be considered, especially when single core usage hits 100%. If you
+gave each of these applications half of a cpu-core and they both got scheduled
+on the same CPU it is theoretically possible that the non-cpu bound application
+will use up to 1ms additional quota in some periods, thereby preventing the
+cpu-bound application from fully using its quota by that same amount. In these
+instances it will be up to the CFS algorithm (see sched-design-CFS.rst) to
+decide which application is chosen to run, as they will both be runnable and
+have remaining quota. This runtime discrepancy will be made up in the following
+periods when the interactive application idles.
+
Examples
--------
1. Limit a group to 1 CPU worth of runtime.
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index b67044a2575f..e12b39f40a6b 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -91,6 +91,14 @@ Values :
0 - disable JIT kallsyms export (default value)
1 - enable JIT kallsyms export for privileged users only
+bpf_jit_limit
+-------------
+
+This enforces a global limit for memory allocations to the BPF JIT
+compiler in order to reject unprivileged JIT requests once it has
+been surpassed. bpf_jit_limit contains the value of the global limit
+in bytes.
+
dev_weight
--------------
diff --git a/Documentation/usb/rio.txt b/Documentation/usb/rio.txt
deleted file mode 100644
index aee715af7db7..000000000000
--- a/Documentation/usb/rio.txt
+++ /dev/null
@@ -1,138 +0,0 @@
-Copyright (C) 1999, 2000 Bruce Tenison
-Portions Copyright (C) 1999, 2000 David Nelson
-Thanks to David Nelson for guidance and the usage of the scanner.txt
-and scanner.c files to model our driver and this informative file.
-
-Mar. 2, 2000
-
-CHANGES
-
-- Initial Revision
-
-
-OVERVIEW
-
-This README will address issues regarding how to configure the kernel
-to access a RIO 500 mp3 player.
-Before I explain how to use this to access the Rio500 please be warned:
-
-W A R N I N G:
---------------
-
-Please note that this software is still under development. The authors
-are in no way responsible for any damage that may occur, no matter how
-inconsequential.
-
-It seems that the Rio has a problem when sending .mp3 with low batteries.
-I suggest when the batteries are low and you want to transfer stuff that you
-replace it with a fresh one. In my case, what happened is I lost two 16kb
-blocks (they are no longer usable to store information to it). But I don't
-know if that's normal or not; it could simply be a problem with the flash
-memory.
-
-In an extreme case, I left my Rio playing overnight and the batteries wore
-down to nothing and appear to have corrupted the flash memory. My RIO
-needed to be replaced as a result. Diamond tech support is aware of the
-problem. Do NOT allow your batteries to wear down to nothing before
-changing them. It appears RIO 500 firmware does not handle low battery
-power well at all.
-
-On systems with OHCI controllers, the kernel OHCI code appears to have
-power on problems with some chipsets. If you are having problems
-connecting to your RIO 500, try turning it on first and then plugging it
-into the USB cable.
-
-Contact information:
---------------------
-
- The main page for the project is hosted at sourceforge.net in the following
- URL: <http://rio500.sourceforge.net>. You can also go to the project's
- sourceforge home page at: <http://sourceforge.net/projects/rio500/>.
- There is also a mailing list: rio500-users@lists.sourceforge.net
-
-Authors:
--------
-
-Most of the code was written by Cesar Miquel <miquel@df.uba.ar>. Keith
-Clayton <kclayton@jps.net> is incharge of the PPC port and making sure
-things work there. Bruce Tenison <btenison@dibbs.net> is adding support
-for .fon files and also does testing. The program will mostly sure be
-re-written and Pete Ikusz along with the rest will re-design it. I would
-also like to thank Tri Nguyen <tmn_3022000@hotmail.com> who provided use
-with some important information regarding the communication with the Rio.
-
-ADDITIONAL INFORMATION and Userspace tools
-
-http://rio500.sourceforge.net/
-
-
-REQUIREMENTS
-
-A host with a USB port. Ideally, either a UHCI (Intel) or OHCI
-(Compaq and others) hardware port should work.
-
-A Linux development kernel (2.3.x) with USB support enabled or a
-backported version to linux-2.2.x. See http://www.linux-usb.org for
-more information on accomplishing this.
-
-A Linux kernel with RIO 500 support enabled.
-
-'lspci' which is only needed to determine the type of USB hardware
-available in your machine.
-
-CONFIGURATION
-
-Using `lspci -v`, determine the type of USB hardware available.
-
- If you see something like:
-
- USB Controller: ......
- Flags: .....
- I/O ports at ....
-
- Then you have a UHCI based controller.
-
- If you see something like:
-
- USB Controller: .....
- Flags: ....
- Memory at .....
-
- Then you have a OHCI based controller.
-
-Using `make menuconfig` or your preferred method for configuring the
-kernel, select 'Support for USB', 'OHCI/UHCI' depending on your
-hardware (determined from the steps above), 'USB Diamond Rio500 support', and
-'Preliminary USB device filesystem'. Compile and install the modules
-(you may need to execute `depmod -a` to update the module
-dependencies).
-
-Add a device for the USB rio500:
- `mknod /dev/usb/rio500 c 180 64`
-
-Set appropriate permissions for /dev/usb/rio500 (don't forget about
-group and world permissions). Both read and write permissions are
-required for proper operation.
-
-Load the appropriate modules (if compiled as modules):
-
- OHCI:
- modprobe usbcore
- modprobe usb-ohci
- modprobe rio500
-
- UHCI:
- modprobe usbcore
- modprobe usb-uhci (or uhci)
- modprobe rio500
-
-That's it. The Rio500 Utils at: http://rio500.sourceforge.net should
-be able to access the rio500.
-
-BUGS
-
-If you encounter any problems feel free to drop me an email.
-
-Bruce Tenison
-btenison@dibbs.net
-
diff --git a/Documentation/userspace-api/spec_ctrl.rst b/Documentation/userspace-api/spec_ctrl.rst
index c4dbe6f7cdae..0fda8f614110 100644
--- a/Documentation/userspace-api/spec_ctrl.rst
+++ b/Documentation/userspace-api/spec_ctrl.rst
@@ -47,6 +47,8 @@ If PR_SPEC_PRCTL is set, then the per-task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.
+.. _set_spec_ctrl:
+
PR_SET_SPECULATION_CTRL
-----------------------
diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
index 1bb8bcaf8497..635cd6eaf714 100644
--- a/Documentation/virtual/kvm/locking.txt
+++ b/Documentation/virtual/kvm/locking.txt
@@ -15,8 +15,6 @@ The acquisition orders for mutexes are as follows:
On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock.
-For spinlocks, kvm_lock is taken outside kvm->mmu_lock.
-
Everything else is a leaf: no other lock is taken inside the critical
sections.
@@ -169,7 +167,7 @@ which time it will be set using the Dirty tracking mechanism described above.
------------
Name: kvm_lock
-Type: spinlock_t
+Type: mutex
Arch: any
Protects: - vm_list
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index ef389dcf1b1d..0780d55c5aa8 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -6,3 +6,4 @@ x86 architecture specifics
:maxdepth: 1
mds
+ tsx_async_abort
diff --git a/Documentation/x86/tsx_async_abort.rst b/Documentation/x86/tsx_async_abort.rst
new file mode 100644
index 000000000000..583ddc185ba2
--- /dev/null
+++ b/Documentation/x86/tsx_async_abort.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+TSX Async Abort (TAA) mitigation
+================================
+
+.. _tsx_async_abort:
+
+Overview
+--------
+
+TSX Async Abort (TAA) is a side channel attack on internal buffers in some
+Intel processors similar to Microachitectural Data Sampling (MDS). In this
+case certain loads may speculatively pass invalid data to dependent operations
+when an asynchronous abort condition is pending in a Transactional
+Synchronization Extensions (TSX) transaction. This includes loads with no
+fault or assist condition. Such loads may speculatively expose stale data from
+the same uarch data structures as in MDS, with same scope of exposure i.e.
+same-thread and cross-thread. This issue affects all current processors that
+support TSX.
+
+Mitigation strategy
+-------------------
+
+a) TSX disable - one of the mitigations is to disable TSX. A new MSR
+IA32_TSX_CTRL will be available in future and current processors after
+microcode update which can be used to disable TSX. In addition, it
+controls the enumeration of the TSX feature bits (RTM and HLE) in CPUID.
+
+b) Clear CPU buffers - similar to MDS, clearing the CPU buffers mitigates this
+vulnerability. More details on this approach can be found in
+:ref:`Documentation/admin-guide/hw-vuln/mds.rst <mds>`.
+
+Kernel internal mitigation modes
+--------------------------------
+
+ ============= ============================================================
+ off Mitigation is disabled. Either the CPU is not affected or
+ tsx_async_abort=off is supplied on the kernel command line.
+
+ tsx disabled Mitigation is enabled. TSX feature is disabled by default at
+ bootup on processors that support TSX control.
+
+ verw Mitigation is enabled. CPU is affected and MD_CLEAR is
+ advertised in CPUID.
+
+ ucode needed Mitigation is enabled. CPU is affected and MD_CLEAR is not
+ advertised in CPUID. That is mainly for virtualization
+ scenarios where the host has the updated microcode but the
+ hypervisor does not expose MD_CLEAR in CPUID. It's a best
+ effort approach without guarantee.
+ ============= ============================================================
+
+If the CPU is affected and the "tsx_async_abort" kernel command line parameter is
+not provided then the kernel selects an appropriate mitigation depending on the
+status of RTM and MD_CLEAR CPUID bits.
+
+Below tables indicate the impact of tsx=on|off|auto cmdline options on state of
+TAA mitigation, VERW behavior and TSX feature for various combinations of
+MSR_IA32_ARCH_CAPABILITIES bits.
+
+1. "tsx=off"
+
+========= ========= ============ ============ ============== =================== ======================
+MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=off
+---------------------------------- -------------------------------------------------------------------------
+TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation
+ after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full
+========= ========= ============ ============ ============== =================== ======================
+ 0 0 0 HW default Yes Same as MDS Same as MDS
+ 0 0 1 Invalid case Invalid case Invalid case Invalid case
+ 0 1 0 HW default No Need ucode update Need ucode update
+ 0 1 1 Disabled Yes TSX disabled TSX disabled
+ 1 X 1 Disabled X None needed None needed
+========= ========= ============ ============ ============== =================== ======================
+
+2. "tsx=on"
+
+========= ========= ============ ============ ============== =================== ======================
+MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=on
+---------------------------------- -------------------------------------------------------------------------
+TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation
+ after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full
+========= ========= ============ ============ ============== =================== ======================
+ 0 0 0 HW default Yes Same as MDS Same as MDS
+ 0 0 1 Invalid case Invalid case Invalid case Invalid case
+ 0 1 0 HW default No Need ucode update Need ucode update
+ 0 1 1 Enabled Yes None Same as MDS
+ 1 X 1 Enabled X None needed None needed
+========= ========= ============ ============ ============== =================== ======================
+
+3. "tsx=auto"
+
+========= ========= ============ ============ ============== =================== ======================
+MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=auto
+---------------------------------- -------------------------------------------------------------------------
+TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation
+ after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full
+========= ========= ============ ============ ============== =================== ======================
+ 0 0 0 HW default Yes Same as MDS Same as MDS
+ 0 0 1 Invalid case Invalid case Invalid case Invalid case
+ 0 1 0 HW default No Need ucode update Need ucode update
+ 0 1 1 Disabled Yes TSX disabled TSX disabled
+ 1 X 1 Enabled X None needed None needed
+========= ========= ============ ============ ============== =================== ======================
+
+In the tables, TSX_CTRL_MSR is a new bit in MSR_IA32_ARCH_CAPABILITIES that
+indicates whether MSR_IA32_TSX_CTRL is supported.
+
+There are two control bits in IA32_TSX_CTRL MSR:
+
+ Bit 0: When set it disables the Restricted Transactional Memory (RTM)
+ sub-feature of TSX (will force all transactions to abort on the
+ XBEGIN instruction).
+
+ Bit 1: When set it disables the enumeration of the RTM and HLE feature
+ (i.e. it will make CPUID(EAX=7).EBX{bit4} and
+ CPUID(EAX=7).EBX{bit11} read as 0).