From cd6f35b8421ff20365ff711c0ac7647fd70e9af7 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2019 17:44:24 -0700 Subject: tcp: add tcp_min_snd_mss sysctl commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream. Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet Suggested-by: Jonathan Looney Acked-by: Neal Cardwell Cc: Yuchung Cheng Cc: Tyler Hicks Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- Documentation/networking/ip-sysctl.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 828fcd6711b3..3bbe6fb2b086 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -241,6 +241,14 @@ tcp_base_mss - INTEGER Path MTU discovery (MTU probing). If MTU probing is enabled, this is the initial MSS used by the connection. +tcp_min_snd_mss - INTEGER + TCP SYN and SYNACK messages usually advertise an ADVMSS option, + as described in RFC 1122 and RFC 6691. + If this ADVMSS option is smaller than tcp_min_snd_mss, + it is silently capped to tcp_min_snd_mss. + + Default : 48 (at least 8 bytes of payload per segment) + tcp_congestion_control - STRING Set the congestion control algorithm to be used for new connections. The algorithm "reno" is always available, but -- cgit v1.2.3 From b5800872a9c8216d0a80582a24b4bad6e7102f50 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 10 Apr 2019 11:51:54 +0100 Subject: futex: Update comments and docs about return values of arch futex code commit 427503519739e779c0db8afe876c1b33f3ac60ae upstream. The architecture implementations of 'arch_futex_atomic_op_inuser()' and 'futex_atomic_cmpxchg_inatomic()' are permitted to return only -EFAULT, -EAGAIN or -ENOSYS in the case of failure. Update the comments in the asm-generic/ implementation and also a stray reference in the robust futex documentation. Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- Documentation/robust-futexes.txt | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/robust-futexes.txt b/Documentation/robust-futexes.txt index 6c42c75103eb..6361fb01c9c1 100644 --- a/Documentation/robust-futexes.txt +++ b/Documentation/robust-futexes.txt @@ -218,5 +218,4 @@ All other architectures should build just fine too - but they won't have the new syscalls yet. Architectures need to implement the new futex_atomic_cmpxchg_inatomic() -inline function before writing up the syscalls (that function returns --ENOSYS right now). +inline function before writing up the syscalls. -- cgit v1.2.3 From 2b0ce4096f1aad23fdc541156299bd7b00728e68 Mon Sep 17 00:00:00 2001 From: Sean Nyekjaer Date: Tue, 7 May 2019 11:34:37 +0200 Subject: dt-bindings: can: mcp251x: add mcp25625 support [ Upstream commit 0df82dcd55832a99363ab7f9fab954fcacdac3ae ] Fully compatible with mcp2515, the mcp25625 have integrated transceiver. This patch add the mcp25625 to the device tree bindings documentation. Signed-off-by: Sean Nyekjaer Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt b/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt index ee3723beb701..33b38716b77f 100644 --- a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt +++ b/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt @@ -4,6 +4,7 @@ Required properties: - compatible: Should be one of the following: - "microchip,mcp2510" for MCP2510. - "microchip,mcp2515" for MCP2515. + - "microchip,mcp25625" for MCP25625. - reg: SPI chip select. - clocks: The clock feeding the CAN controller. - interrupt-parent: The parent interrupt controller. -- cgit v1.2.3 From 065cf4159258caf507d96fa5449c601e5a9e865a Mon Sep 17 00:00:00 2001 From: Reinhard Speyerer Date: Wed, 12 Jun 2019 19:03:50 +0200 Subject: qmi_wwan: extend permitted QMAP mux_id value range [ Upstream commit 36815b416fa48766ac5a98e4b2dc3ebc5887222e ] Permit mux_id values up to 254 to be used in qmimux_register_device() for compatibility with ip(8) and the rmnet driver. Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support") Cc: Daniele Palmas Signed-off-by: Reinhard Speyerer Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- Documentation/ABI/testing/sysfs-class-net-qmi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-class-net-qmi b/Documentation/ABI/testing/sysfs-class-net-qmi index 7122d6264c49..c310db4ccbc2 100644 --- a/Documentation/ABI/testing/sysfs-class-net-qmi +++ b/Documentation/ABI/testing/sysfs-class-net-qmi @@ -29,7 +29,7 @@ Contact: Bjørn Mork Description: Unsigned integer. - Write a number ranging from 1 to 127 to add a qmap mux + Write a number ranging from 1 to 254 to add a qmap mux based network device, supported by recent Qualcomm based modems. @@ -46,5 +46,5 @@ Contact: Bjørn Mork Description: Unsigned integer. - Write a number ranging from 1 to 127 to delete a previously + Write a number ranging from 1 to 254 to delete a previously created qmap mux based network device. -- cgit v1.2.3 From d30cb648f3ec77ca5f773a9b74d565cca1561aa1 Mon Sep 17 00:00:00 2001 From: Tim Chen Date: Thu, 20 Jun 2019 16:10:50 -0700 Subject: Documentation: Add section about CPU vulnerabilities for Spectre commit 6e88559470f581741bcd0f2794f9054814ac9740 upstream. Add documentation for Spectre vulnerability and the mitigation mechanisms: - Explain the problem and risks - Document the mitigation mechanisms - Document the command line controls - Document the sysfs files Co-developed-by: Andi Kleen Signed-off-by: Andi Kleen Co-developed-by: Tim Chen Signed-off-by: Tim Chen Reviewed-by: Randy Dunlap Reviewed-by: Thomas Gleixner Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/hw-vuln/index.rst | 1 + Documentation/admin-guide/hw-vuln/spectre.rst | 697 ++++++++++++++++++++++++++ Documentation/userspace-api/spec_ctrl.rst | 2 + 3 files changed, 700 insertions(+) create mode 100644 Documentation/admin-guide/hw-vuln/spectre.rst (limited to 'Documentation') diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst index ffc064c1ec68..49311f3da6f2 100644 --- a/Documentation/admin-guide/hw-vuln/index.rst +++ b/Documentation/admin-guide/hw-vuln/index.rst @@ -9,5 +9,6 @@ are configurable at compile, boot or run time. .. toctree:: :maxdepth: 1 + spectre l1tf mds diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst new file mode 100644 index 000000000000..25f3b2532198 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -0,0 +1,697 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Spectre Side Channels +===================== + +Spectre is a class of side channel attacks that exploit branch prediction +and speculative execution on modern CPUs to read memory, possibly +bypassing access controls. Speculative execution side channel exploits +do not modify memory but attempt to infer privileged data in the memory. + +This document covers Spectre variant 1 and Spectre variant 2. + +Affected processors +------------------- + +Speculative execution side channel methods affect a wide range of modern +high performance processors, since most modern high speed processors +use branch prediction and speculative execution. + +The following CPUs are vulnerable: + + - Intel Core, Atom, Pentium, and Xeon processors + + - AMD Phenom, EPYC, and Zen processors + + - IBM POWER and zSeries processors + + - Higher end ARM processors + + - Apple CPUs + + - Higher end MIPS CPUs + + - Likely most other high performance CPUs. Contact your CPU vendor for details. + +Whether a processor is affected or not can be read out from the Spectre +vulnerability files in sysfs. See :ref:`spectre_sys_info`. + +Related CVEs +------------ + +The following CVE entries describe Spectre variants: + + ============= ======================= ================= + CVE-2017-5753 Bounds check bypass Spectre variant 1 + CVE-2017-5715 Branch target injection Spectre variant 2 + ============= ======================= ================= + +Problem +------- + +CPUs use speculative operations to improve performance. That may leave +traces of memory accesses or computations in the processor's caches, +buffers, and branch predictors. Malicious software may be able to +influence the speculative execution paths, and then use the side effects +of the speculative execution in the CPUs' caches and buffers to infer +privileged data touched during the speculative execution. + +Spectre variant 1 attacks take advantage of speculative execution of +conditional branches, while Spectre variant 2 attacks use speculative +execution of indirect branches to leak privileged memory. +See :ref:`[1] ` :ref:`[5] ` :ref:`[7] ` +:ref:`[10] ` :ref:`[11] `. + +Spectre variant 1 (Bounds Check Bypass) +--------------------------------------- + +The bounds check bypass attack :ref:`[2] ` takes advantage +of speculative execution that bypasses conditional branch instructions +used for memory access bounds check (e.g. checking if the index of an +array results in memory access within a valid range). This results in +memory accesses to invalid memory (with out-of-bound index) that are +done speculatively before validation checks resolve. Such speculative +memory accesses can leave side effects, creating side channels which +leak information to the attacker. + +There are some extensions of Spectre variant 1 attacks for reading data +over the network, see :ref:`[12] `. However such attacks +are difficult, low bandwidth, fragile, and are considered low risk. + +Spectre variant 2 (Branch Target Injection) +------------------------------------------- + +The branch target injection attack takes advantage of speculative +execution of indirect branches :ref:`[3] `. The indirect +branch predictors inside the processor used to guess the target of +indirect branches can be influenced by an attacker, causing gadget code +to be speculatively executed, thus exposing sensitive data touched by +the victim. The side effects left in the CPU's caches during speculative +execution can be measured to infer data values. + +.. _poison_btb: + +In Spectre variant 2 attacks, the attacker can steer speculative indirect +branches in the victim to gadget code by poisoning the branch target +buffer of a CPU used for predicting indirect branch addresses. Such +poisoning could be done by indirect branching into existing code, +with the address offset of the indirect branch under the attacker's +control. Since the branch prediction on impacted hardware does not +fully disambiguate branch address and uses the offset for prediction, +this could cause privileged code's indirect branch to jump to a gadget +code with the same offset. + +The most useful gadgets take an attacker-controlled input parameter (such +as a register value) so that the memory read can be controlled. Gadgets +without input parameters might be possible, but the attacker would have +very little control over what memory can be read, reducing the risk of +the attack revealing useful data. + +One other variant 2 attack vector is for the attacker to poison the +return stack buffer (RSB) :ref:`[13] ` to cause speculative +subroutine return instruction execution to go to a gadget. An attacker's +imbalanced subroutine call instructions might "poison" entries in the +return stack buffer which are later consumed by a victim's subroutine +return instructions. This attack can be mitigated by flushing the return +stack buffer on context switch, or virtual machine (VM) exit. + +On systems with simultaneous multi-threading (SMT), attacks are possible +from the sibling thread, as level 1 cache and branch target buffer +(BTB) may be shared between hardware threads in a CPU core. A malicious +program running on the sibling thread may influence its peer's BTB to +steer its indirect branch speculations to gadget code, and measure the +speculative execution's side effects left in level 1 cache to infer the +victim's data. + +Attack scenarios +---------------- + +The following list of attack scenarios have been anticipated, but may +not cover all possible attack vectors. + +1. A user process attacking the kernel +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The attacker passes a parameter to the kernel via a register or + via a known address in memory during a syscall. Such parameter may + be used later by the kernel as an index to an array or to derive + a pointer for a Spectre variant 1 attack. The index or pointer + is invalid, but bound checks are bypassed in the code branch taken + for speculative execution. This could cause privileged memory to be + accessed and leaked. + + For kernel code that has been identified where data pointers could + potentially be influenced for Spectre attacks, new "nospec" accessor + macros are used to prevent speculative loading of data. + + Spectre variant 2 attacker can :ref:`poison ` the branch + target buffer (BTB) before issuing syscall to launch an attack. + After entering the kernel, the kernel could use the poisoned branch + target buffer on indirect jump and jump to gadget code in speculative + execution. + + If an attacker tries to control the memory addresses leaked during + speculative execution, he would also need to pass a parameter to the + gadget, either through a register or a known address in memory. After + the gadget has executed, he can measure the side effect. + + The kernel can protect itself against consuming poisoned branch + target buffer entries by using return trampolines (also known as + "retpoline") :ref:`[3] ` :ref:`[9] ` for all + indirect branches. Return trampolines trap speculative execution paths + to prevent jumping to gadget code during speculative execution. + x86 CPUs with Enhanced Indirect Branch Restricted Speculation + (Enhanced IBRS) available in hardware should use the feature to + mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is + more efficient than retpoline. + + There may be gadget code in firmware which could be exploited with + Spectre variant 2 attack by a rogue user process. To mitigate such + attacks on x86, Indirect Branch Restricted Speculation (IBRS) feature + is turned on before the kernel invokes any firmware code. + +2. A user process attacking another user process +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + A malicious user process can try to attack another user process, + either via a context switch on the same hardware thread, or from the + sibling hyperthread sharing a physical processor core on simultaneous + multi-threading (SMT) system. + + Spectre variant 1 attacks generally require passing parameters + between the processes, which needs a data passing relationship, such + as remote procedure calls (RPC). Those parameters are used in gadget + code to derive invalid data pointers accessing privileged memory in + the attacked process. + + Spectre variant 2 attacks can be launched from a rogue process by + :ref:`poisoning ` the branch target buffer. This can + influence the indirect branch targets for a victim process that either + runs later on the same hardware thread, or running concurrently on + a sibling hardware thread sharing the same physical core. + + A user process can protect itself against Spectre variant 2 attacks + by using the prctl() syscall to disable indirect branch speculation + for itself. An administrator can also cordon off an unsafe process + from polluting the branch target buffer by disabling the process's + indirect branch speculation. This comes with a performance cost + from not using indirect branch speculation and clearing the branch + target buffer. When SMT is enabled on x86, for a process that has + indirect branch speculation disabled, Single Threaded Indirect Branch + Predictors (STIBP) :ref:`[4] ` are turned on to prevent the + sibling thread from controlling branch target buffer. In addition, + the Indirect Branch Prediction Barrier (IBPB) is issued to clear the + branch target buffer when context switching to and from such process. + + On x86, the return stack buffer is stuffed on context switch. + This prevents the branch target buffer from being used for branch + prediction when the return stack buffer underflows while switching to + a deeper call stack. Any poisoned entries in the return stack buffer + left by the previous process will also be cleared. + + User programs should use address space randomization to make attacks + more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2). + +3. A virtualized guest attacking the host +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The attack mechanism is similar to how user processes attack the + kernel. The kernel is entered via hyper-calls or other virtualization + exit paths. + + For Spectre variant 1 attacks, rogue guests can pass parameters + (e.g. in registers) via hyper-calls to derive invalid pointers to + speculate into privileged memory after entering the kernel. For places + where such kernel code has been identified, nospec accessor macros + are used to stop speculative memory access. + + For Spectre variant 2 attacks, rogue guests can :ref:`poison + ` the branch target buffer or return stack buffer, causing + the kernel to jump to gadget code in the speculative execution paths. + + To mitigate variant 2, the host kernel can use return trampolines + for indirect branches to bypass the poisoned branch target buffer, + and flushing the return stack buffer on VM exit. This prevents rogue + guests from affecting indirect branching in the host kernel. + + To protect host processes from rogue guests, host processes can have + indirect branch speculation disabled via prctl(). The branch target + buffer is cleared before context switching to such processes. + +4. A virtualized guest attacking other guest +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + A rogue guest may attack another guest to get data accessible by the + other guest. + + Spectre variant 1 attacks are possible if parameters can be passed + between guests. This may be done via mechanisms such as shared memory + or message passing. Such parameters could be used to derive data + pointers to privileged data in guest. The privileged data could be + accessed by gadget code in the victim's speculation paths. + + Spectre variant 2 attacks can be launched from a rogue guest by + :ref:`poisoning ` the branch target buffer or the return + stack buffer. Such poisoned entries could be used to influence + speculation execution paths in the victim guest. + + Linux kernel mitigates attacks to other guests running in the same + CPU hardware thread by flushing the return stack buffer on VM exit, + and clearing the branch target buffer before switching to a new guest. + + If SMT is used, Spectre variant 2 attacks from an untrusted guest + in the sibling hyperthread can be mitigated by the administrator, + by turning off the unsafe guest's indirect branch speculation via + prctl(). A guest can also protect itself by turning on microcode + based mitigations (such as IBPB or STIBP on x86) within the guest. + +.. _spectre_sys_info: + +Spectre system information +-------------------------- + +The Linux kernel provides a sysfs interface to enumerate the current +mitigation status of the system for Spectre: whether the system is +vulnerable, and which mitigations are active. + +The sysfs file showing Spectre variant 1 mitigation status is: + + /sys/devices/system/cpu/vulnerabilities/spectre_v1 + +The possible values in this file are: + + ======================================= ================================= + 'Mitigation: __user pointer sanitation' Protection in kernel on a case by + case base with explicit pointer + sanitation. + ======================================= ================================= + +However, the protections are put in place on a case by case basis, +and there is no guarantee that all possible attack vectors for Spectre +variant 1 are covered. + +The spectre_v2 kernel file reports if the kernel has been compiled with +retpoline mitigation or if the CPU has hardware mitigation, and if the +CPU has support for additional process-specific mitigation. + +This file also reports CPU features enabled by microcode to mitigate +attack between user processes: + +1. Indirect Branch Prediction Barrier (IBPB) to add additional + isolation between processes of different users. +2. Single Thread Indirect Branch Predictors (STIBP) to add additional + isolation between CPU threads running on the same core. + +These CPU features may impact performance when used and can be enabled +per process on a case-by-case base. + +The sysfs file showing Spectre variant 2 mitigation status is: + + /sys/devices/system/cpu/vulnerabilities/spectre_v2 + +The possible values in this file are: + + - Kernel status: + + ==================================== ================================= + 'Not affected' The processor is not vulnerable + 'Vulnerable' Vulnerable, no mitigation + 'Mitigation: Full generic retpoline' Software-focused mitigation + 'Mitigation: Full AMD retpoline' AMD-specific software mitigation + 'Mitigation: Enhanced IBRS' Hardware-focused mitigation + ==================================== ================================= + + - Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is + used to protect against Spectre variant 2 attacks when calling firmware (x86 only). + + ========== ============================================================= + 'IBRS_FW' Protection against user program attacks when calling firmware + ========== ============================================================= + + - Indirect branch prediction barrier (IBPB) status for protection between + processes of different users. This feature can be controlled through + prctl() per process, or through kernel command line options. This is + an x86 only feature. For more details see below. + + =================== ======================================================== + 'IBPB: disabled' IBPB unused + 'IBPB: always-on' Use IBPB on all tasks + 'IBPB: conditional' Use IBPB on SECCOMP or indirect branch restricted tasks + =================== ======================================================== + + - Single threaded indirect branch prediction (STIBP) status for protection + between different hyper threads. This feature can be controlled through + prctl per process, or through kernel command line options. This is x86 + only feature. For more details see below. + + ==================== ======================================================== + 'STIBP: disabled' STIBP unused + 'STIBP: forced' Use STIBP on all tasks + 'STIBP: conditional' Use STIBP on SECCOMP or indirect branch restricted tasks + ==================== ======================================================== + + - Return stack buffer (RSB) protection status: + + ============= =========================================== + 'RSB filling' Protection of RSB on context switch enabled + ============= =========================================== + +Full mitigation might require a microcode update from the CPU +vendor. When the necessary microcode is not available, the kernel will +report vulnerability. + +Turning on mitigation for Spectre variant 1 and Spectre variant 2 +----------------------------------------------------------------- + +1. Kernel mitigation +^^^^^^^^^^^^^^^^^^^^ + + For the Spectre variant 1, vulnerable kernel code (as determined + by code audit or scanning tools) is annotated on a case by case + basis to use nospec accessor macros for bounds clipping :ref:`[2] + ` to avoid any usable disclosure gadgets. However, it may + not cover all attack vectors for Spectre variant 1. + + For Spectre variant 2 mitigation, the compiler turns indirect calls or + jumps in the kernel into equivalent return trampolines (retpolines) + :ref:`[3] ` :ref:`[9] ` to go to the target + addresses. Speculative execution paths under retpolines are trapped + in an infinite loop to prevent any speculative execution jumping to + a gadget. + + To turn on retpoline mitigation on a vulnerable CPU, the kernel + needs to be compiled with a gcc compiler that supports the + -mindirect-branch=thunk-extern -mindirect-branch-register options. + If the kernel is compiled with a Clang compiler, the compiler needs + to support -mretpoline-external-thunk option. The kernel config + CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with + the latest updated microcode. + + On Intel Skylake-era systems the mitigation covers most, but not all, + cases. See :ref:`[3] ` for more details. + + On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced + IBRS on x86), retpoline is automatically disabled at run time. + + The retpoline mitigation is turned on by default on vulnerable + CPUs. It can be forced on or off by the administrator + via the kernel command line and sysfs control files. See + :ref:`spectre_mitigation_control_command_line`. + + On x86, indirect branch restricted speculation is turned on by default + before invoking any firmware code to prevent Spectre variant 2 exploits + using the firmware. + + Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y + and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes + attacks on the kernel generally more difficult. + +2. User program mitigation +^^^^^^^^^^^^^^^^^^^^^^^^^^ + + User programs can mitigate Spectre variant 1 using LFENCE or "bounds + clipping". For more details see :ref:`[2] `. + + For Spectre variant 2 mitigation, individual user programs + can be compiled with return trampolines for indirect branches. + This protects them from consuming poisoned entries in the branch + target buffer left by malicious software. Alternatively, the + programs can disable their indirect branch speculation via prctl() + (See :ref:`Documentation/userspace-api/spec_ctrl.rst `). + On x86, this will turn on STIBP to guard against attacks from the + sibling thread when the user program is running, and use IBPB to + flush the branch target buffer when switching to/from the program. + + Restricting indirect branch speculation on a user program will + also prevent the program from launching a variant 2 attack + on x86. All sand-boxed SECCOMP programs have indirect branch + speculation restricted by default. Administrators can change + that behavior via the kernel command line and sysfs control files. + See :ref:`spectre_mitigation_control_command_line`. + + Programs that disable their indirect branch speculation will have + more overhead and run slower. + + User programs should use address space randomization + (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more + difficult. + +3. VM mitigation +^^^^^^^^^^^^^^^^ + + Within the kernel, Spectre variant 1 attacks from rogue guests are + mitigated on a case by case basis in VM exit paths. Vulnerable code + uses nospec accessor macros for "bounds clipping", to avoid any + usable disclosure gadgets. However, this may not cover all variant + 1 attack vectors. + + For Spectre variant 2 attacks from rogue guests to the kernel, the + Linux kernel uses retpoline or Enhanced IBRS to prevent consumption of + poisoned entries in branch target buffer left by rogue guests. It also + flushes the return stack buffer on every VM exit to prevent a return + stack buffer underflow so poisoned branch target buffer could be used, + or attacker guests leaving poisoned entries in the return stack buffer. + + To mitigate guest-to-guest attacks in the same CPU hardware thread, + the branch target buffer is sanitized by flushing before switching + to a new guest on a CPU. + + The above mitigations are turned on by default on vulnerable CPUs. + + To mitigate guest-to-guest attacks from sibling thread when SMT is + in use, an untrusted guest running in the sibling thread can have + its indirect branch speculation disabled by administrator via prctl(). + + The kernel also allows guests to use any microcode based mitigation + they choose to use (such as IBPB or STIBP on x86) to protect themselves. + +.. _spectre_mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +Spectre variant 2 mitigation can be disabled or force enabled at the +kernel command line. + + nospectre_v2 + + [X86] Disable all mitigations for the Spectre variant 2 + (indirect branch prediction) vulnerability. System may + allow data leaks with this option, which is equivalent + to spectre_v2=off. + + + spectre_v2= + + [X86] Control mitigation of Spectre variant 2 + (indirect branch speculation) vulnerability. + The default operation protects the kernel from + user space attacks. + + on + unconditionally enable, implies + spectre_v2_user=on + off + unconditionally disable, implies + spectre_v2_user=off + auto + kernel detects whether your CPU model is + vulnerable + + Selecting 'on' will, and 'auto' may, choose a + mitigation method at run time according to the + CPU, the available microcode, the setting of the + CONFIG_RETPOLINE configuration option, and the + compiler with which the kernel was built. + + Selecting 'on' will also enable the mitigation + against user space to user space task attacks. + + Selecting 'off' will disable both the kernel and + the user space protections. + + Specific mitigations can also be selected manually: + + retpoline + replace indirect branches + retpoline,generic + google's original retpoline + retpoline,amd + AMD-specific minimal thunk + + Not specifying this option is equivalent to + spectre_v2=auto. + +For user space mitigation: + + spectre_v2_user= + + [X86] Control mitigation of Spectre variant 2 + (indirect branch speculation) vulnerability between + user space tasks + + on + Unconditionally enable mitigations. Is + enforced by spectre_v2=on + + off + Unconditionally disable mitigations. Is + enforced by spectre_v2=off + + prctl + Indirect branch speculation is enabled, + but mitigation can be enabled via prctl + per thread. The mitigation control state + is inherited on fork. + + prctl,ibpb + Like "prctl" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different user + space processes. + + seccomp + Same as "prctl" above, but all seccomp + threads will enable the mitigation unless + they explicitly opt out. + + seccomp,ibpb + Like "seccomp" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different + user space processes. + + auto + Kernel selects the mitigation depending on + the available CPU features and vulnerability. + + Default mitigation: + If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl" + + Not specifying this option is equivalent to + spectre_v2_user=auto. + + In general the kernel by default selects + reasonable mitigations for the current CPU. To + disable Spectre variant 2 mitigations, boot with + spectre_v2=off. Spectre variant 1 mitigations + cannot be disabled. + +Mitigation selection guide +-------------------------- + +1. Trusted userspace +^^^^^^^^^^^^^^^^^^^^ + + If all userspace applications are from trusted sources and do not + execute externally supplied untrusted code, then the mitigations can + be disabled. + +2. Protect sensitive programs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + For security-sensitive programs that have secrets (e.g. crypto + keys), protection against Spectre variant 2 can be put in place by + disabling indirect branch speculation when the program is running + (See :ref:`Documentation/userspace-api/spec_ctrl.rst `). + +3. Sandbox untrusted programs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + Untrusted programs that could be a source of attacks can be cordoned + off by disabling their indirect branch speculation when they are run + (See :ref:`Documentation/userspace-api/spec_ctrl.rst `). + This prevents untrusted programs from polluting the branch target + buffer. All programs running in SECCOMP sandboxes have indirect + branch speculation restricted by default. This behavior can be + changed via the kernel command line and sysfs control files. See + :ref:`spectre_mitigation_control_command_line`. + +3. High security mode +^^^^^^^^^^^^^^^^^^^^^ + + All Spectre variant 2 mitigations can be forced on + at boot time for all programs (See the "on" option in + :ref:`spectre_mitigation_control_command_line`). This will add + overhead as indirect branch speculations for all programs will be + restricted. + + On x86, branch target buffer will be flushed with IBPB when switching + to a new program. STIBP is left on all the time to protect programs + against variant 2 attacks originating from programs running on + sibling threads. + + Alternatively, STIBP can be used only when running programs + whose indirect branch speculation is explicitly disabled, + while IBPB is still used all the time when switching to a new + program to clear the branch target buffer (See "ibpb" option in + :ref:`spectre_mitigation_control_command_line`). This "ibpb" option + has less performance cost than the "on" option, which leaves STIBP + on all the time. + +References on Spectre +--------------------- + +Intel white papers: + +.. _spec_ref1: + +[1] `Intel analysis of speculative execution side channels `_. + +.. _spec_ref2: + +[2] `Bounds check bypass `_. + +.. _spec_ref3: + +[3] `Deep dive: Retpoline: A branch target injection mitigation `_. + +.. _spec_ref4: + +[4] `Deep Dive: Single Thread Indirect Branch Predictors `_. + +AMD white papers: + +.. _spec_ref5: + +[5] `AMD64 technology indirect branch control extension `_. + +.. _spec_ref6: + +[6] `Software techniques for managing speculation on AMD processors `_. + +ARM white papers: + +.. _spec_ref7: + +[7] `Cache speculation side-channels `_. + +.. _spec_ref8: + +[8] `Cache speculation issues update `_. + +Google white paper: + +.. _spec_ref9: + +[9] `Retpoline: a software construct for preventing branch-target-injection `_. + +MIPS white paper: + +.. _spec_ref10: + +[10] `MIPS: response on speculative execution and side channel vulnerabilities `_. + +Academic papers: + +.. _spec_ref11: + +[11] `Spectre Attacks: Exploiting Speculative Execution `_. + +.. _spec_ref12: + +[12] `NetSpectre: Read Arbitrary Memory over Network `_. + +.. _spec_ref13: + +[13] `Spectre Returns! Speculation Attacks using the Return Stack Buffer `_. diff --git a/Documentation/userspace-api/spec_ctrl.rst b/Documentation/userspace-api/spec_ctrl.rst index c4dbe6f7cdae..0fda8f614110 100644 --- a/Documentation/userspace-api/spec_ctrl.rst +++ b/Documentation/userspace-api/spec_ctrl.rst @@ -47,6 +47,8 @@ If PR_SPEC_PRCTL is set, then the per-task control of the mitigation is available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation misfeature will fail. +.. _set_spec_ctrl: + PR_SET_SPECULATION_CTRL ----------------------- -- cgit v1.2.3 From 11a6dd0034b6fc3a55ffe1382262797f4aec9f5d Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 24 Apr 2019 13:38:23 +0200 Subject: x86/atomic: Fix smp_mb__{before,after}_atomic() [ Upstream commit 69d927bba39517d0980462efc051875b7f4db185 ] Recent probing at the Linux Kernel Memory Model uncovered a 'surprise'. Strongly ordered architectures where the atomic RmW primitive implies full memory ordering and smp_mb__{before,after}_atomic() are a simple barrier() (such as x86) fail for: *x = 1; atomic_inc(u); smp_mb__after_atomic(); r0 = *y; Because, while the atomic_inc() implies memory order, it (surprisingly) does not provide a compiler barrier. This then allows the compiler to re-order like so: atomic_inc(u); *x = 1; smp_mb__after_atomic(); r0 = *y; Which the CPU is then allowed to re-order (under TSO rules) like: atomic_inc(u); r0 = *y; *x = 1; And this very much was not intended. Therefore strengthen the atomic RmW ops to include a compiler barrier. NOTE: atomic_{or,and,xor} and the bitops already had the compiler barrier. Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- Documentation/atomic_t.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt index 913396ac5824..ed0d814df7e0 100644 --- a/Documentation/atomic_t.txt +++ b/Documentation/atomic_t.txt @@ -177,6 +177,9 @@ These helper barriers exist because architectures have varying implicit ordering on their SMP atomic primitives. For example our TSO architectures provide full ordered atomics and these barriers are no-ops. +NOTE: when the atomic RmW ops are fully ordered, they should also imply a +compiler barrier. + Thus: atomic_fetch_add(); -- cgit v1.2.3 From 3533b12dfc3e04ef4ea6d995fefd13cc3dc0fa5d Mon Sep 17 00:00:00 2001 From: Josua Mayer Date: Tue, 9 Jul 2019 15:00:58 +0200 Subject: dt-bindings: allow up to four clocks for orion-mdio commit 80785f5a22e9073e2ded5958feb7f220e066d17b upstream. Armada 8040 needs four clocks to be enabled for MDIO accesses to work. Update the binding to allow the extra clock to be specified. Cc: stable@vger.kernel.org Fixes: 6d6a331f44a1 ("dt-bindings: allow up to three clocks for orion-mdio") Reviewed-by: Andrew Lunn Signed-off-by: Josua Mayer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/net/marvell-orion-mdio.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt index 42cd81090a2c..3f3cfc1d8d4d 100644 --- a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt +++ b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt @@ -16,7 +16,7 @@ Required properties: Optional properties: - interrupts: interrupt line number for the SMI error/done interrupt -- clocks: phandle for up to three required clocks for the MDIO instance +- clocks: phandle for up to four required clocks for the MDIO instance The child nodes of the MDIO driver are the individual PHY devices connected to this MDIO bus. They must have a "reg" property given the -- cgit v1.2.3 From 8604ac922424d5930a672f4d31eb858e82cd1111 Mon Sep 17 00:00:00 2001 From: allen yan Date: Thu, 7 Sep 2017 15:04:53 +0200 Subject: arm64: dts: marvell: Fix A37xx UART0 register size commit c737abc193d16e62e23e2fb585b8b7398ab380d8 upstream. Armada-37xx UART0 registers are 0x200 bytes wide. Right next to them are the UART1 registers that should not be declared in this node. Update the example in DT bindings document accordingly. Signed-off-by: allen yan Signed-off-by: Miquel Raynal Signed-off-by: Gregory CLEMENT Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/serial/mvebu-uart.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/serial/mvebu-uart.txt b/Documentation/devicetree/bindings/serial/mvebu-uart.txt index 6087defd9f93..d37fabe17bd1 100644 --- a/Documentation/devicetree/bindings/serial/mvebu-uart.txt +++ b/Documentation/devicetree/bindings/serial/mvebu-uart.txt @@ -8,6 +8,6 @@ Required properties: Example: serial@12000 { compatible = "marvell,armada-3700-uart"; - reg = <0x12000 0x400>; + reg = <0x12000 0x200>; interrupts = <43>; }; -- cgit v1.2.3 From ee524c14856d2eab5d3d2202c12c06d2e7190c2a Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 8 Jul 2019 11:52:26 -0500 Subject: x86/speculation: Enable Spectre v1 swapgs mitigations commit a2059825986a1c8143fd6698774fa9d83733bb11 upstream The previous commit added macro calls in the entry code which mitigate the Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are enabled. Enable those features where applicable. The mitigations may be disabled with "nospectre_v1" or "mitigations=off". There are different features which can affect the risk of attack: - When FSGSBASE is enabled, unprivileged users are able to place any value in GS, using the wrgsbase instruction. This means they can write a GS value which points to any value in kernel space, which can be useful with the following gadget in an interrupt/exception/NMI handler: if (coming from user space) swapgs mov %gs:, %reg1 // dependent load or store based on the value of %reg // for example: mov %(reg1), %reg2 If an interrupt is coming from user space, and the entry code speculatively skips the swapgs (due to user branch mistraining), it may speculatively execute the GS-based load and a subsequent dependent load or store, exposing the kernel data to an L1 side channel leak. Note that, on Intel, a similar attack exists in the above gadget when coming from kernel space, if the swapgs gets speculatively executed to switch back to the user GS. On AMD, this variant isn't possible because swapgs is serializing with respect to future GS-based accesses. NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case doesn't exist quite yet. - When FSGSBASE is disabled, the issue is mitigated somewhat because unprivileged users must use prctl(ARCH_SET_GS) to set GS, which restricts GS values to user space addresses only. That means the gadget would need an additional step, since the target kernel address needs to be read from user space first. Something like: if (coming from user space) swapgs mov %gs:, %reg1 mov (%reg1), %reg2 // dependent load or store based on the value of %reg2 // for example: mov %(reg2), %reg3 It's difficult to audit for this gadget in all the handlers, so while there are no known instances of it, it's entirely possible that it exists somewhere (or could be introduced in the future). Without tooling to analyze all such code paths, consider it vulnerable. Effects of SMAP on the !FSGSBASE case: - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not susceptible to Meltdown), the kernel is prevented from speculatively reading user space memory, even L1 cached values. This effectively disables the !FSGSBASE attack vector. - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP still prevents the kernel from speculatively reading user space memory. But it does *not* prevent the kernel from reading the user value from L1, if it has already been cached. This is probably only a small hurdle for an attacker to overcome. Thanks to Dave Hansen for contributing the speculative_smap() function. Thanks to Andrew Cooper for providing the inside scoop on whether swapgs is serializing on AMD. [ tglx: Fixed the USER fence decision and polished the comment as suggested by Dave Hansen ] Signed-off-by: Josh Poimboeuf Signed-off-by: Thomas Gleixner Reviewed-by: Dave Hansen Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9240b2caa0b1..13d80111bc1f 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2401,6 +2401,7 @@ Equivalent to: nopti [X86,PPC] nospectre_v1 [PPC] nobp=0 [S390] + nospectre_v1 [X86] nospectre_v2 [X86,PPC,S390] spectre_v2_user=off [X86] spec_store_bypass_disable=off [X86,PPC] @@ -2740,9 +2741,9 @@ nosmt=force: Force disable SMT, cannot be undone via the sysfs control file. - nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds - check bypass). With this option data leaks are possible - in the system. + nospectre_v1 [X66, PPC] Disable mitigations for Spectre Variant 1 + (bounds check bypass). With this option data leaks + are possible in the system. nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may -- cgit v1.2.3 From d4187064959c0df9f3e610de02cb3b822c64b88d Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Sat, 3 Aug 2019 21:21:54 +0200 Subject: Documentation: Add swapgs description to the Spectre v1 documentation commit 4c92057661a3412f547ede95715641d7ee16ddac upstream Add documentation to the Spectre document about the new swapgs variant of Spectre v1. Signed-off-by: Josh Poimboeuf Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/hw-vuln/spectre.rst | 88 ++++++++++++++++++++++++--- 1 file changed, 80 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index 25f3b2532198..e05e581af5cf 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -41,10 +41,11 @@ Related CVEs The following CVE entries describe Spectre variants: - ============= ======================= ================= + ============= ======================= ========================== CVE-2017-5753 Bounds check bypass Spectre variant 1 CVE-2017-5715 Branch target injection Spectre variant 2 - ============= ======================= ================= + CVE-2019-1125 Spectre v1 swapgs Spectre variant 1 (swapgs) + ============= ======================= ========================== Problem ------- @@ -78,6 +79,13 @@ There are some extensions of Spectre variant 1 attacks for reading data over the network, see :ref:`[12] `. However such attacks are difficult, low bandwidth, fragile, and are considered low risk. +Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not +only about user-controlled array bounds checks. It can affect any +conditional checks. The kernel entry code interrupt, exception, and NMI +handlers all have conditional swapgs checks. Those may be problematic +in the context of Spectre v1, as kernel code can speculatively run with +a user GS. + Spectre variant 2 (Branch Target Injection) ------------------------------------------- @@ -132,6 +140,9 @@ not cover all possible attack vectors. 1. A user process attacking the kernel ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Spectre variant 1 +~~~~~~~~~~~~~~~~~ + The attacker passes a parameter to the kernel via a register or via a known address in memory during a syscall. Such parameter may be used later by the kernel as an index to an array or to derive @@ -144,7 +155,40 @@ not cover all possible attack vectors. potentially be influenced for Spectre attacks, new "nospec" accessor macros are used to prevent speculative loading of data. - Spectre variant 2 attacker can :ref:`poison ` the branch +Spectre variant 1 (swapgs) +~~~~~~~~~~~~~~~~~~~~~~~~~~ + + An attacker can train the branch predictor to speculatively skip the + swapgs path for an interrupt or exception. If they initialize + the GS register to a user-space value, if the swapgs is speculatively + skipped, subsequent GS-related percpu accesses in the speculation + window will be done with the attacker-controlled GS value. This + could cause privileged memory to be accessed and leaked. + + For example: + + :: + + if (coming from user space) + swapgs + mov %gs:, %reg + mov (%reg), %reg1 + + When coming from user space, the CPU can speculatively skip the + swapgs, and then do a speculative percpu load using the user GS + value. So the user can speculatively force a read of any kernel + value. If a gadget exists which uses the percpu value as an address + in another load/store, then the contents of the kernel value may + become visible via an L1 side channel attack. + + A similar attack exists when coming from kernel space. The CPU can + speculatively do the swapgs, causing the user GS to get used for the + rest of the speculative window. + +Spectre variant 2 +~~~~~~~~~~~~~~~~~ + + A spectre variant 2 attacker can :ref:`poison ` the branch target buffer (BTB) before issuing syscall to launch an attack. After entering the kernel, the kernel could use the poisoned branch target buffer on indirect jump and jump to gadget code in speculative @@ -280,11 +324,18 @@ The sysfs file showing Spectre variant 1 mitigation status is: The possible values in this file are: - ======================================= ================================= - 'Mitigation: __user pointer sanitation' Protection in kernel on a case by - case base with explicit pointer - sanitation. - ======================================= ================================= + .. list-table:: + + * - 'Not affected' + - The processor is not vulnerable. + * - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers' + - The swapgs protections are disabled; otherwise it has + protection in the kernel on a case by case base with explicit + pointer sanitation and usercopy LFENCE barriers. + * - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization' + - Protection in the kernel on a case by case base with explicit + pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE + barriers. However, the protections are put in place on a case by case basis, and there is no guarantee that all possible attack vectors for Spectre @@ -366,12 +417,27 @@ Turning on mitigation for Spectre variant 1 and Spectre variant 2 1. Kernel mitigation ^^^^^^^^^^^^^^^^^^^^ +Spectre variant 1 +~~~~~~~~~~~~~~~~~ + For the Spectre variant 1, vulnerable kernel code (as determined by code audit or scanning tools) is annotated on a case by case basis to use nospec accessor macros for bounds clipping :ref:`[2] ` to avoid any usable disclosure gadgets. However, it may not cover all attack vectors for Spectre variant 1. + Copy-from-user code has an LFENCE barrier to prevent the access_ok() + check from being mis-speculated. The barrier is done by the + barrier_nospec() macro. + + For the swapgs variant of Spectre variant 1, LFENCE barriers are + added to interrupt, exception and NMI entry where needed. These + barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and + FENCE_SWAPGS_USER_ENTRY macros. + +Spectre variant 2 +~~~~~~~~~~~~~~~~~ + For Spectre variant 2 mitigation, the compiler turns indirect calls or jumps in the kernel into equivalent return trampolines (retpolines) :ref:`[3] ` :ref:`[9] ` to go to the target @@ -473,6 +539,12 @@ Mitigation control on the kernel command line Spectre variant 2 mitigation can be disabled or force enabled at the kernel command line. + nospectre_v1 + + [X86,PPC] Disable mitigations for Spectre Variant 1 + (bounds check bypass). With this option data leaks are + possible in the system. + nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2 -- cgit v1.2.3 From a1fe647042affe713a17243cd10e9b25f3d83948 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 16 Aug 2019 23:05:36 +0100 Subject: bpf: add bpf_jit_limit knob to restrict unpriv allocations commit ede95a63b5e84ddeea6b0c473b36ab8bfd8c6ce3 upstream. Rick reported that the BPF JIT could potentially fill the entire module space with BPF programs from unprivileged users which would prevent later attempts to load normal kernel modules or privileged BPF programs, for example. If JIT was enabled but unsuccessful to generate the image, then before commit 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") we would always fall back to the BPF interpreter. Nowadays in the case where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort with a failure since the BPF interpreter was compiled out. Add a global limit and enforce it for unprivileged users such that in case of BPF interpreter compiled out we fail once the limit has been reached or we fall back to BPF interpreter earlier w/o using module mem if latter was compiled in. In a next step, fair share among unprivileged users can be resolved in particular for the case where we would fail hard once limit is reached. Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Co-Developed-by: Rick Edgecombe Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Cc: Eric Dumazet Cc: Jann Horn Cc: Kees Cook Cc: LKML Signed-off-by: Alexei Starovoitov Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- Documentation/sysctl/net.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt index b67044a2575f..e12b39f40a6b 100644 --- a/Documentation/sysctl/net.txt +++ b/Documentation/sysctl/net.txt @@ -91,6 +91,14 @@ Values : 0 - disable JIT kallsyms export (default value) 1 - enable JIT kallsyms export for privileged users only +bpf_jit_limit +------------- + +This enforces a global limit for memory allocations to the BPF JIT +compiler in order to reject unprivileged JIT requests once it has +been surpassed. bpf_jit_limit contains the value of the global limit +in bytes. + dev_weight -------------- -- cgit v1.2.3 From 0bfaf1c88b589d51d9cbf0758e5fd987004ce789 Mon Sep 17 00:00:00 2001 From: Tom Lendacky Date: Mon, 19 Aug 2019 15:52:35 +0000 Subject: x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24 upstream. There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly. RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported. Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit. Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend. Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov Cc: Andrew Cooper Cc: Andrew Morton Cc: Chen Yu Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: "linux-doc@vger.kernel.org" Cc: "linux-pm@vger.kernel.org" Cc: Nathan Chancellor Cc: Paolo Bonzini Cc: Pavel Machek Cc: "Rafael J. Wysocki" Cc: Cc: Thomas Gleixner Cc: "x86@kernel.org" Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 13d80111bc1f..188a7db8501b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3788,6 +3788,13 @@ Run specified binary instead of /init from the ramdisk, used for early userspace startup. See initrd. + rdrand= [X86] + force - Override the decision by the kernel to hide the + advertisement of RDRAND support (this affects + certain AMD processors because of buggy BIOS + support, specifically around the suspend/resume + path). + rdt= [HW,X86,RDT] Turn on/off individual RDT features. List is: cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, mba. -- cgit v1.2.3 From 63469e0ab70251f247e58761f046fe5de5e2156d Mon Sep 17 00:00:00 2001 From: Bastien Nocera Date: Mon, 23 Sep 2019 18:18:43 +0200 Subject: USB: rio500: Remove Rio 500 kernel driver commit 015664d15270a112c2371d812f03f7c579b35a73 upstream. The Rio500 kernel driver has not been used by Rio500 owners since 2001 not long after the rio500 project added support for a user-space USB stack through the very first versions of usbdevfs and then libusb. Support for the kernel driver was removed from the upstream utilities in 2008: https://gitlab.freedesktop.org/hadess/rio500/commit/943f624ab721eb8281c287650fcc9e2026f6f5db Cc: Cesar Miquel Signed-off-by: Bastien Nocera Cc: stable Link: https://lore.kernel.org/r/6251c17584d220472ce882a3d9c199c401a51a71.camel@hadess.net Signed-off-by: Greg Kroah-Hartman --- Documentation/usb/rio.txt | 138 ---------------------------------------------- 1 file changed, 138 deletions(-) delete mode 100644 Documentation/usb/rio.txt (limited to 'Documentation') diff --git a/Documentation/usb/rio.txt b/Documentation/usb/rio.txt deleted file mode 100644 index aee715af7db7..000000000000 --- a/Documentation/usb/rio.txt +++ /dev/null @@ -1,138 +0,0 @@ -Copyright (C) 1999, 2000 Bruce Tenison -Portions Copyright (C) 1999, 2000 David Nelson -Thanks to David Nelson for guidance and the usage of the scanner.txt -and scanner.c files to model our driver and this informative file. - -Mar. 2, 2000 - -CHANGES - -- Initial Revision - - -OVERVIEW - -This README will address issues regarding how to configure the kernel -to access a RIO 500 mp3 player. -Before I explain how to use this to access the Rio500 please be warned: - -W A R N I N G: --------------- - -Please note that this software is still under development. The authors -are in no way responsible for any damage that may occur, no matter how -inconsequential. - -It seems that the Rio has a problem when sending .mp3 with low batteries. -I suggest when the batteries are low and you want to transfer stuff that you -replace it with a fresh one. In my case, what happened is I lost two 16kb -blocks (they are no longer usable to store information to it). But I don't -know if that's normal or not; it could simply be a problem with the flash -memory. - -In an extreme case, I left my Rio playing overnight and the batteries wore -down to nothing and appear to have corrupted the flash memory. My RIO -needed to be replaced as a result. Diamond tech support is aware of the -problem. Do NOT allow your batteries to wear down to nothing before -changing them. It appears RIO 500 firmware does not handle low battery -power well at all. - -On systems with OHCI controllers, the kernel OHCI code appears to have -power on problems with some chipsets. If you are having problems -connecting to your RIO 500, try turning it on first and then plugging it -into the USB cable. - -Contact information: --------------------- - - The main page for the project is hosted at sourceforge.net in the following - URL: . You can also go to the project's - sourceforge home page at: . - There is also a mailing list: rio500-users@lists.sourceforge.net - -Authors: -------- - -Most of the code was written by Cesar Miquel . Keith -Clayton is incharge of the PPC port and making sure -things work there. Bruce Tenison is adding support -for .fon files and also does testing. The program will mostly sure be -re-written and Pete Ikusz along with the rest will re-design it. I would -also like to thank Tri Nguyen who provided use -with some important information regarding the communication with the Rio. - -ADDITIONAL INFORMATION and Userspace tools - -http://rio500.sourceforge.net/ - - -REQUIREMENTS - -A host with a USB port. Ideally, either a UHCI (Intel) or OHCI -(Compaq and others) hardware port should work. - -A Linux development kernel (2.3.x) with USB support enabled or a -backported version to linux-2.2.x. See http://www.linux-usb.org for -more information on accomplishing this. - -A Linux kernel with RIO 500 support enabled. - -'lspci' which is only needed to determine the type of USB hardware -available in your machine. - -CONFIGURATION - -Using `lspci -v`, determine the type of USB hardware available. - - If you see something like: - - USB Controller: ...... - Flags: ..... - I/O ports at .... - - Then you have a UHCI based controller. - - If you see something like: - - USB Controller: ..... - Flags: .... - Memory at ..... - - Then you have a OHCI based controller. - -Using `make menuconfig` or your preferred method for configuring the -kernel, select 'Support for USB', 'OHCI/UHCI' depending on your -hardware (determined from the steps above), 'USB Diamond Rio500 support', and -'Preliminary USB device filesystem'. Compile and install the modules -(you may need to execute `depmod -a` to update the module -dependencies). - -Add a device for the USB rio500: - `mknod /dev/usb/rio500 c 180 64` - -Set appropriate permissions for /dev/usb/rio500 (don't forget about -group and world permissions). Both read and write permissions are -required for proper operation. - -Load the appropriate modules (if compiled as modules): - - OHCI: - modprobe usbcore - modprobe usb-ohci - modprobe rio500 - - UHCI: - modprobe usbcore - modprobe usb-uhci (or uhci) - modprobe rio500 - -That's it. The Rio500 Utils at: http://rio500.sourceforge.net should -be able to access the rio500. - -BUGS - -If you encounter any problems feel free to drop me an email. - -Bruce Tenison -btenison@dibbs.net - -- cgit v1.2.3 From 5f005c7e4d37213a76a1d64ed558fe7e1ff138b6 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Thu, 24 Oct 2019 14:47:47 +0200 Subject: arm64: Expose support for optional ARMv8-A features [ Upstream commit f5e035f8694c3bdddc66ea46ecda965ee6853718 ] ARMv8-A adds a few optional features for ARMv8.2 and ARMv8.3. Expose them to the userspace via HWCAPs and mrs emulation. SHA2-512 - Instruction support for SHA512 Hash algorithm (e.g SHA512H, SHA512H2, SHA512U0, SHA512SU1) SHA3 - SHA3 crypto instructions (EOR3, RAX1, XAR, BCAX). SM3 - Instruction support for Chinese cryptography algorithm SM3 SM4 - Instruction support for Chinese cryptography algorithm SM4 DP - Dot Product instructions (UDOT, SDOT). Cc: Will Deacon Cc: Mark Rutland Cc: Dave Martin Cc: Marc Zyngier Reviewed-by: Catalin Marinas Signed-off-by: Suzuki K Poulose Signed-off-by: Will Deacon Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- Documentation/arm64/cpu-feature-registers.txt | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt index dad411d635d8..011ddfc1e570 100644 --- a/Documentation/arm64/cpu-feature-registers.txt +++ b/Documentation/arm64/cpu-feature-registers.txt @@ -110,10 +110,20 @@ infrastructure: x--------------------------------------------------x | Name | bits | visible | |--------------------------------------------------| - | RES0 | [63-32] | n | + | RES0 | [63-48] | n | + |--------------------------------------------------| + | DP | [47-44] | y | + |--------------------------------------------------| + | SM4 | [43-40] | y | + |--------------------------------------------------| + | SM3 | [39-36] | y | + |--------------------------------------------------| + | SHA3 | [35-32] | y | |--------------------------------------------------| | RDM | [31-28] | y | |--------------------------------------------------| + | RES0 | [27-24] | n | + |--------------------------------------------------| | ATOMICS | [23-20] | y | |--------------------------------------------------| | CRC32 | [19-16] | y | -- cgit v1.2.3 From 1caa4f72dfc4a0401ec7fad210cfb0ed73d06b4d Mon Sep 17 00:00:00 2001 From: Dongjiu Geng Date: Thu, 24 Oct 2019 14:47:49 +0200 Subject: arm64: v8.4: Support for new floating point multiplication instructions [ Upstream commit 3b3b681097fae73b7f5dcdd42db6cfdf32943d4c ] ARM v8.4 extensions add new neon instructions for performing a multiplication of each FP16 element of one vector with the corresponding FP16 element of a second vector, and to add or subtract this without an intermediate rounding to the corresponding FP32 element in a third vector. This patch detects this feature and let the userspace know about it via a HWCAP bit and MRS emulation. Cc: Dave Martin Reviewed-by: Suzuki K Poulose Signed-off-by: Dongjiu Geng Reviewed-by: Dave Martin Signed-off-by: Catalin Marinas [ardb: fix up for missing SVE in context] Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- Documentation/arm64/cpu-feature-registers.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt index 011ddfc1e570..ddd566fea3f2 100644 --- a/Documentation/arm64/cpu-feature-registers.txt +++ b/Documentation/arm64/cpu-feature-registers.txt @@ -110,7 +110,9 @@ infrastructure: x--------------------------------------------------x | Name | bits | visible | |--------------------------------------------------| - | RES0 | [63-48] | n | + | RES0 | [63-52] | n | + |--------------------------------------------------| + | FHM | [51-48] | y | |--------------------------------------------------| | DP | [47-44] | y | |--------------------------------------------------| -- cgit v1.2.3 From eb952c6bce53b5389c41ffc3698594cb43b5aeb9 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Thu, 24 Oct 2019 14:47:50 +0200 Subject: arm64: Documentation: cpu-feature-registers: Remove RES0 fields [ Upstream commit 847ecd3fa311cde0f10a1b66c572abb136742b1d ] Remove the invisible RES0 field entries from the table, listing fields in CPU ID feature registers, as : 1) We are only interested in the user visible fields. 2) The field description may not be up-to-date, as the field could be assigned a new meaning. 3) We already explain the rules of the fields which are not visible. Cc: Catalin Marinas Cc: Will Deacon Acked-by: Mark Rutland Reviewed-by: Dave Martin Signed-off-by: Suzuki K Poulose Signed-off-by: Will Deacon [ardb: fix up for missing SVE in context] Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- Documentation/arm64/cpu-feature-registers.txt | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt index ddd566fea3f2..22cfb86143ee 100644 --- a/Documentation/arm64/cpu-feature-registers.txt +++ b/Documentation/arm64/cpu-feature-registers.txt @@ -110,7 +110,6 @@ infrastructure: x--------------------------------------------------x | Name | bits | visible | |--------------------------------------------------| - | RES0 | [63-52] | n | |--------------------------------------------------| | FHM | [51-48] | y | |--------------------------------------------------| @@ -124,8 +123,6 @@ infrastructure: |--------------------------------------------------| | RDM | [31-28] | y | |--------------------------------------------------| - | RES0 | [27-24] | n | - |--------------------------------------------------| | ATOMICS | [23-20] | y | |--------------------------------------------------| | CRC32 | [19-16] | y | @@ -135,8 +132,6 @@ infrastructure: | SHA1 | [11-8] | y | |--------------------------------------------------| | AES | [7-4] | y | - |--------------------------------------------------| - | RES0 | [3-0] | n | x--------------------------------------------------x @@ -144,7 +139,8 @@ infrastructure: x--------------------------------------------------x | Name | bits | visible | |--------------------------------------------------| - | RES0 | [63-28] | n | + |--------------------------------------------------| + | SVE | [35-32] | y | |--------------------------------------------------| | GIC | [27-24] | n | |--------------------------------------------------| -- cgit v1.2.3 From 053cdffad3dd5e4a9330433ba2b400956f33bd2b Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Thu, 24 Oct 2019 14:47:51 +0200 Subject: arm64: Expose Arm v8.4 features [ Upstream commit 7206dc93a58fb76421c4411eefa3c003337bcb2d ] Expose the new features introduced by Arm v8.4 extensions to Arm v8-A profile. These include : 1) Data indpendent timing of instructions. (DIT, exposed as HWCAP_DIT) 2) Unaligned atomic instructions and Single-copy atomicity of loads and stores. (AT, expose as HWCAP_USCAT) 3) LDAPR and STLR instructions with immediate offsets (extension to LRCPC, exposed as HWCAP_ILRCPC) 4) Flag manipulation instructions (TS, exposed as HWCAP_FLAGM). Cc: Catalin Marinas Cc: Will Deacon Cc: Mark Rutland Reviewed-by: Dave Martin Signed-off-by: Suzuki K Poulose Signed-off-by: Will Deacon [ardb: fix up context for missing SVE] Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- Documentation/arm64/cpu-feature-registers.txt | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt index 22cfb86143ee..7964f03846b1 100644 --- a/Documentation/arm64/cpu-feature-registers.txt +++ b/Documentation/arm64/cpu-feature-registers.txt @@ -110,6 +110,7 @@ infrastructure: x--------------------------------------------------x | Name | bits | visible | |--------------------------------------------------| + | TS | [55-52] | y | |--------------------------------------------------| | FHM | [51-48] | y | |--------------------------------------------------| @@ -139,6 +140,7 @@ infrastructure: x--------------------------------------------------x | Name | bits | visible | |--------------------------------------------------| + | DIT | [51-48] | y | |--------------------------------------------------| | SVE | [35-32] | y | |--------------------------------------------------| @@ -191,6 +193,14 @@ infrastructure: | DPB | [3-0] | y | x--------------------------------------------------x + 5) ID_AA64MMFR2_EL1 - Memory model feature register 2 + + x--------------------------------------------------x + | Name | bits | visible | + |--------------------------------------------------| + | AT | [35-32] | y | + x--------------------------------------------------x + Appendix I: Example --------------------------- -- cgit v1.2.3 From 02fd5d7f6d027912901a68d671c0449295966cd7 Mon Sep 17 00:00:00 2001 From: Jeremy Linton Date: Thu, 24 Oct 2019 14:48:25 +0200 Subject: arm64: Provide a command line to disable spectre_v2 mitigation [ Upstream commit e5ce5e7267ddcbe13ab9ead2542524e1b7993e5a ] There are various reasons, such as benchmarking, to disable spectrev2 mitigation on a machine. Provide a command-line option to do so. Signed-off-by: Jeremy Linton Reviewed-by: Suzuki K Poulose Reviewed-by: Andre Przywara Reviewed-by: Catalin Marinas Tested-by: Stefan Wahren Cc: Jonathan Corbet Cc: linux-doc@vger.kernel.org Signed-off-by: Will Deacon Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 188a7db8501b..5205740ed39b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2745,10 +2745,10 @@ (bounds check bypass). With this option data leaks are possible in the system. - nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2 - (indirect branch prediction) vulnerability. System may - allow data leaks with this option, which is equivalent - to spectre_v2=off. + nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for + the Spectre variant 2 (indirect branch prediction) + vulnerability. System may allow data leaks with this + option. nospec_store_bypass_disable [HW] Disable all mitigations for the Speculative Store Bypass vulnerability -- cgit v1.2.3 From 5fff7a398c266c8c202a24e573327ba2c1566524 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Thu, 24 Oct 2019 14:48:33 +0200 Subject: arm64/speculation: Support 'mitigations=' cmdline option [ Upstream commit a111b7c0f20e13b54df2fa959b3dc0bdf1925ae6 ] Configure arm64 runtime CPU speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2, and Speculative Store Bypass. The default behavior is unchanged. Signed-off-by: Josh Poimboeuf [will: reorder checks so KASLR implies KPTI and SSBS is affected by cmdline] Signed-off-by: Will Deacon Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 5205740ed39b..b67a6cd08ca1 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2389,8 +2389,8 @@ http://repo.or.cz/w/linux-2.6/mini2440.git mitigations= - [X86,PPC,S390] Control optional mitigations for CPU - vulnerabilities. This is a set of curated, + [X86,PPC,S390,ARM64] Control optional mitigations for + CPU vulnerabilities. This is a set of curated, arch-independent options, each of which is an aggregation of existing arch-specific options. @@ -2399,12 +2399,14 @@ improves system performance, but it may also expose users to several CPU vulnerabilities. Equivalent to: nopti [X86,PPC] + kpti=0 [ARM64] nospectre_v1 [PPC] nobp=0 [S390] nospectre_v1 [X86] - nospectre_v2 [X86,PPC,S390] + nospectre_v2 [X86,PPC,S390,ARM64] spectre_v2_user=off [X86] spec_store_bypass_disable=off [X86,PPC] + ssbd=force-off [ARM64] l1tf=off [X86] mds=off [X86] -- cgit v1.2.3 From 588c0f3282207a7b25951de0e96e5485b3bfa62d Mon Sep 17 00:00:00 2001 From: Boris Ostrovsky Date: Mon, 30 Sep 2019 16:44:41 -0400 Subject: x86/xen: Return from panic notifier [ Upstream commit c6875f3aacf2a5a913205accddabf0bfb75cac76 ] Currently execution of panic() continues until Xen's panic notifier (xen_panic_event()) is called at which point we make a hypercall that never returns. This means that any notifier that is supposed to be called later as well as significant part of panic() code (such as pstore writes from kmsg_dump()) is never executed. There is no reason for xen_panic_event() to be this last point in execution since panic()'s emergency_restart() will call into xen_emergency_restart() from where we can perform our hypercall. Nevertheless, we will provide xen_legacy_crash boot option that will preserve original behavior during crash. This option could be used, for example, if running kernel dumper (which happens after panic notifiers) is undesirable. Reported-by: James Dingwall Signed-off-by: Boris Ostrovsky Reviewed-by: Juergen Gross Signed-off-by: Sasha Levin --- Documentation/admin-guide/kernel-parameters.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index b67a6cd08ca1..671f518b09ee 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4875,6 +4875,10 @@ the unplug protocol never -- do not unplug even if version check succeeds + xen_legacy_crash [X86,XEN] + Crash from Xen panic notifier, without executing late + panic() code such as dumping handler. + xen_nopvspin [X86,XEN] Disables the ticketlock slowpath using Xen PV optimizations. -- cgit v1.2.3 From c813b4467a2291d14dd9c90f9542ad55eb1ec315 Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Thu, 5 Sep 2019 10:17:45 -0600 Subject: usb: dwc3: Allow disabling of metastability workaround commit 42bf02ec6e420e541af9a47437d0bdf961ca2972 upstream Some platforms (e.g. TI's DRA7 USB2 instance) have more trouble with the metastability workaround as it supports only a High-Speed PHY and the PHY can enter into an Erratic state [1] when the controller is set in SuperSpeed mode as part of the metastability workaround. This causes upto 2 seconds delay in enumeration on DRA7's USB2 instance in gadget mode. If these platforms can be better off without the workaround, provide a device tree property to suggest that so the workaround is avoided. [1] Device mode enumeration trace showing PHY Erratic Error. irq/90-dwc3-969 [000] d... 52.323145: dwc3_event: event (00000901): Erratic Error [U0] irq/90-dwc3-969 [000] d... 52.560646: dwc3_event: event (00000901): Erratic Error [U0] irq/90-dwc3-969 [000] d... 52.798144: dwc3_event: event (00000901): Erratic Error [U0] Signed-off-by: Roger Quadros Signed-off-by: Felipe Balbi Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/usb/dwc3.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt index 52fb41046b34..44e8bab159ad 100644 --- a/Documentation/devicetree/bindings/usb/dwc3.txt +++ b/Documentation/devicetree/bindings/usb/dwc3.txt @@ -47,6 +47,8 @@ Optional properties: from P0 to P1/P2/P3 without delay. - snps,dis-tx-ipgap-linecheck-quirk: when set, disable u2mac linestate check during HS transmit. + - snps,dis_metastability_quirk: when set, disable metastability workaround. + CAUTION: use only if you are absolutely sure of it. - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal utmi_l1_suspend_n, false when asserts utmi_sleep_n - snps,hird-threshold: HIRD threshold -- cgit v1.2.3 From 3dec71e388f95382d83ebb5589f0016eac4a6d2b Mon Sep 17 00:00:00 2001 From: Dave Chiluk Date: Tue, 23 Jul 2019 11:44:26 -0500 Subject: sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices commit de53fd7aedb100f03e5d2231cfce0e4993282425 upstream. It has been observed, that highly-threaded, non-cpu-bound applications running under cpu.cfs_quota_us constraints can hit a high percentage of periods throttled while simultaneously not consuming the allocated amount of quota. This use case is typical of user-interactive non-cpu bound applications, such as those running in kubernetes or mesos when run on multiple cpu cores. This has been root caused to cpu-local run queue being allocated per cpu bandwidth slices, and then not fully using that slice within the period. At which point the slice and quota expires. This expiration of unused slice results in applications not being able to utilize the quota for which they are allocated. The non-expiration of per-cpu slices was recently fixed by 'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")'. Prior to that it appears that this had been broken since at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That added the following conditional which resulted in slices never being expired. if (cfs_rq->runtime_expires != cfs_b->runtime_expires) { /* extend local deadline, drift is bounded above by 2 ticks */ cfs_rq->runtime_expires += TICK_NSEC; Because this was broken for nearly 5 years, and has recently been fixed and is now being noticed by many users running kubernetes (https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion that the mechanisms around expiring runtime should be removed altogether. This allows quota already allocated to per-cpu run-queues to live longer than the period boundary. This allows threads on runqueues that do not use much CPU to continue to use their remaining slice over a longer period of time than cpu.cfs_period_us. However, this helps prevent the above condition of hitting throttling while also not fully utilizing your cpu quota. This theoretically allows a machine to use slightly more than its allotted quota in some periods. This overflow would be bounded by the remaining quota left on each per-cpu runqueueu. This is typically no more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will change nothing, as they should theoretically fully utilize all of their quota in each period. For user-interactive tasks as described above this provides a much better user/application experience as their cpu utilization will more closely match the amount they requested when they hit throttling. This means that cpu limits no longer strictly apply per period for non-cpu bound applications, but that they are still accurate over longer timeframes. This greatly improves performance of high-thread-count, non-cpu bound applications with low cfs_quota_us allocation on high-core-count machines. In the case of an artificial testcase (10ms/100ms of quota on 80 CPU machine), this commit resulted in almost 30x performance improvement, while still maintaining correct cpu quota restrictions. That testcase is available at https://github.com/indeedeng/fibtest. Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition") Signed-off-by: Dave Chiluk Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Phil Auld Reviewed-by: Ben Segall Cc: Ingo Molnar Cc: John Hammond Cc: Jonathan Corbet Cc: Kyle Anderson Cc: Gabriel Munos Cc: Peter Oskolkov Cc: Cong Wang Cc: Brendan Gregg Link: https://lkml.kernel.org/r/1563900266-19734-2-git-send-email-chiluk+linux@indeed.com Signed-off-by: Greg Kroah-Hartman --- Documentation/scheduler/sched-bwc.txt | 45 +++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) (limited to 'Documentation') diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt index f6b1873f68ab..de583fbbfe42 100644 --- a/Documentation/scheduler/sched-bwc.txt +++ b/Documentation/scheduler/sched-bwc.txt @@ -90,6 +90,51 @@ There are two ways in which a group may become throttled: In case b) above, even though the child may have runtime remaining it will not be allowed to until the parent's runtime is refreshed. +CFS Bandwidth Quota Caveats +--------------------------- +Once a slice is assigned to a cpu it does not expire. However all but 1ms of +the slice may be returned to the global pool if all threads on that cpu become +unrunnable. This is configured at compile time by the min_cfs_rq_runtime +variable. This is a performance tweak that helps prevent added contention on +the global lock. + +The fact that cpu-local slices do not expire results in some interesting corner +cases that should be understood. + +For cgroup cpu constrained applications that are cpu limited this is a +relatively moot point because they will naturally consume the entirety of their +quota as well as the entirety of each cpu-local slice in each period. As a +result it is expected that nr_periods roughly equal nr_throttled, and that +cpuacct.usage will increase roughly equal to cfs_quota_us in each period. + +For highly-threaded, non-cpu bound applications this non-expiration nuance +allows applications to briefly burst past their quota limits by the amount of +unused slice on each cpu that the task group is running on (typically at most +1ms per cpu or as defined by min_cfs_rq_runtime). This slight burst only +applies if quota had been assigned to a cpu and then not fully used or returned +in previous periods. This burst amount will not be transferred between cores. +As a result, this mechanism still strictly limits the task group to quota +average usage, albeit over a longer time window than a single period. This +also limits the burst ability to no more than 1ms per cpu. This provides +better more predictable user experience for highly threaded applications with +small quota limits on high core count machines. It also eliminates the +propensity to throttle these applications while simultanously using less than +quota amounts of cpu. Another way to say this, is that by allowing the unused +portion of a slice to remain valid across periods we have decreased the +possibility of wastefully expiring quota on cpu-local silos that don't need a +full slice's amount of cpu time. + +The interaction between cpu-bound and non-cpu-bound-interactive applications +should also be considered, especially when single core usage hits 100%. If you +gave each of these applications half of a cpu-core and they both got scheduled +on the same CPU it is theoretically possible that the non-cpu bound application +will use up to 1ms additional quota in some periods, thereby preventing the +cpu-bound application from fully using its quota by that same amount. In these +instances it will be up to the CFS algorithm (see sched-design-CFS.rst) to +decide which application is chosen to run, as they will both be runnable and +have remaining quota. This runtime discrepancy will be made up in the following +periods when the interactive application idles. + Examples -------- 1. Limit a group to 1 CPU worth of runtime. -- cgit v1.2.3 From 4b708ea4e5e772747b89619489ab96e9d1a1a44d Mon Sep 17 00:00:00 2001 From: Pawan Gupta Date: Wed, 23 Oct 2019 11:01:53 +0200 Subject: x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default commit 95c5824f75f3ba4c9e8e5a4b1a623c95390ac266 upstream. Add a kernel cmdline parameter "tsx" to control the Transactional Synchronization Extensions (TSX) feature. On CPUs that support TSX control, use "tsx=on|off" to enable or disable TSX. Not specifying this option is equivalent to "tsx=off". This is because on certain processors TSX may be used as a part of a speculative side channel attack. Carve out the TSX controlling functionality into a separate compilation unit because TSX is a CPU feature while the TSX async abort control machinery will go to cpu/bugs.c. [ bp: - Massage, shorten and clear the arg buffer. - Clarifications of the tsx= possible options - Josh. - Expand on TSX_CTRL availability - Pawan. ] Signed-off-by: Pawan Gupta Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 26 +++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 671f518b09ee..01463ac89a00 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4505,6 +4505,32 @@ platforms where RDTSC is slow and this accounting can add overhead. + tsx= [X86] Control Transactional Synchronization + Extensions (TSX) feature in Intel processors that + support TSX control. + + This parameter controls the TSX feature. The options are: + + on - Enable TSX on the system. Although there are + mitigations for all known security vulnerabilities, + TSX has been known to be an accelerator for + several previous speculation-related CVEs, and + so there may be unknown security risks associated + with leaving it enabled. + + off - Disable TSX on the system. (Note that this + option takes effect only on newer CPUs which are + not vulnerable to MDS, i.e., have + MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1 and which get + the new IA32_TSX_CTRL MSR through a microcode + update. This new MSR allows for the reliable + deactivation of the TSX functionality.) + + Not specifying this option is equivalent to tsx=off. + + See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst + for more details. + turbografx.map[2|3]= [HW,JOY] TurboGraFX parallel port interface Format: -- cgit v1.2.3 From 8c99df217f8e36fde46cbf2af50b5b191857d9d4 Mon Sep 17 00:00:00 2001 From: Pawan Gupta Date: Wed, 23 Oct 2019 12:28:57 +0200 Subject: x86/tsx: Add "auto" option to the tsx= cmdline parameter commit 7531a3596e3272d1f6841e0d601a614555dc6b65 upstream. Platforms which are not affected by X86_BUG_TAA may want the TSX feature enabled. Add "auto" option to the TSX cmdline parameter. When tsx=auto disable TSX when X86_BUG_TAA is present, otherwise enable TSX. More details on X86_BUG_TAA can be found here: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html [ bp: Extend the arg buffer to accommodate "auto\0". ] Signed-off-by: Pawan Gupta Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Tony Luck Reviewed-by: Josh Poimboeuf Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 01463ac89a00..3cae24bff0a1 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4526,6 +4526,9 @@ update. This new MSR allows for the reliable deactivation of the TSX functionality.) + auto - Disable TSX if X86_BUG_TAA is present, + otherwise enable TSX on the system. + Not specifying this option is equivalent to tsx=off. See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst -- cgit v1.2.3 From a4f14d5a0795fe7c4f75d31ef4abf816570e3872 Mon Sep 17 00:00:00 2001 From: Pawan Gupta Date: Wed, 23 Oct 2019 12:32:55 +0200 Subject: x86/speculation/taa: Add documentation for TSX Async Abort commit a7a248c593e4fd7a67c50b5f5318fe42a0db335e upstream. Add the documenation for TSX Async Abort. Include the description of the issue, how to check the mitigation state, control the mitigation, guidance for system administrators. [ bp: Add proper SPDX tags, touch ups by Josh and me. ] Co-developed-by: Antonio Gomez Iglesias Signed-off-by: Pawan Gupta Signed-off-by: Antonio Gomez Iglesias Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Reviewed-by: Mark Gross Reviewed-by: Tony Luck Reviewed-by: Josh Poimboeuf Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 1 + Documentation/admin-guide/hw-vuln/index.rst | 1 + .../admin-guide/hw-vuln/tsx_async_abort.rst | 276 +++++++++++++++++++++ Documentation/admin-guide/kernel-parameters.txt | 38 +++ Documentation/x86/index.rst | 1 + Documentation/x86/tsx_async_abort.rst | 117 +++++++++ 6 files changed, 434 insertions(+) create mode 100644 Documentation/admin-guide/hw-vuln/tsx_async_abort.rst create mode 100644 Documentation/x86/tsx_async_abort.rst (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 645687b1870d..0fffc1c66da1 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -381,6 +381,7 @@ What: /sys/devices/system/cpu/vulnerabilities /sys/devices/system/cpu/vulnerabilities/spec_store_bypass /sys/devices/system/cpu/vulnerabilities/l1tf /sys/devices/system/cpu/vulnerabilities/mds + /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Date: January 2018 Contact: Linux kernel mailing list Description: Information about CPU vulnerabilities diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst index 49311f3da6f2..0802b1c67452 100644 --- a/Documentation/admin-guide/hw-vuln/index.rst +++ b/Documentation/admin-guide/hw-vuln/index.rst @@ -12,3 +12,4 @@ are configurable at compile, boot or run time. spectre l1tf mds + tsx_async_abort diff --git a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst new file mode 100644 index 000000000000..fddbd7579c53 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst @@ -0,0 +1,276 @@ +.. SPDX-License-Identifier: GPL-2.0 + +TAA - TSX Asynchronous Abort +====================================== + +TAA is a hardware vulnerability that allows unprivileged speculative access to +data which is available in various CPU internal buffers by using asynchronous +aborts within an Intel TSX transactional region. + +Affected processors +------------------- + +This vulnerability only affects Intel processors that support Intel +Transactional Synchronization Extensions (TSX) when the TAA_NO bit (bit 8) +is 0 in the IA32_ARCH_CAPABILITIES MSR. On processors where the MDS_NO bit +(bit 5) is 0 in the IA32_ARCH_CAPABILITIES MSR, the existing MDS mitigations +also mitigate against TAA. + +Whether a processor is affected or not can be read out from the TAA +vulnerability file in sysfs. See :ref:`tsx_async_abort_sys_info`. + +Related CVEs +------------ + +The following CVE entry is related to this TAA issue: + + ============== ===== =================================================== + CVE-2019-11135 TAA TSX Asynchronous Abort (TAA) condition on some + microprocessors utilizing speculative execution may + allow an authenticated user to potentially enable + information disclosure via a side channel with + local access. + ============== ===== =================================================== + +Problem +------- + +When performing store, load or L1 refill operations, processors write +data into temporary microarchitectural structures (buffers). The data in +those buffers can be forwarded to load operations as an optimization. + +Intel TSX is an extension to the x86 instruction set architecture that adds +hardware transactional memory support to improve performance of multi-threaded +software. TSX lets the processor expose and exploit concurrency hidden in an +application due to dynamically avoiding unnecessary synchronization. + +TSX supports atomic memory transactions that are either committed (success) or +aborted. During an abort, operations that happened within the transactional region +are rolled back. An asynchronous abort takes place, among other options, when a +different thread accesses a cache line that is also used within the transactional +region when that access might lead to a data race. + +Immediately after an uncompleted asynchronous abort, certain speculatively +executed loads may read data from those internal buffers and pass it to dependent +operations. This can be then used to infer the value via a cache side channel +attack. + +Because the buffers are potentially shared between Hyper-Threads cross +Hyper-Thread attacks are possible. + +The victim of a malicious actor does not need to make use of TSX. Only the +attacker needs to begin a TSX transaction and raise an asynchronous abort +which in turn potenitally leaks data stored in the buffers. + +More detailed technical information is available in the TAA specific x86 +architecture section: :ref:`Documentation/x86/tsx_async_abort.rst `. + + +Attack scenarios +---------------- + +Attacks against the TAA vulnerability can be implemented from unprivileged +applications running on hosts or guests. + +As for MDS, the attacker has no control over the memory addresses that can +be leaked. Only the victim is responsible for bringing data to the CPU. As +a result, the malicious actor has to sample as much data as possible and +then postprocess it to try to infer any useful information from it. + +A potential attacker only has read access to the data. Also, there is no direct +privilege escalation by using this technique. + + +.. _tsx_async_abort_sys_info: + +TAA system information +----------------------- + +The Linux kernel provides a sysfs interface to enumerate the current TAA status +of mitigated systems. The relevant sysfs file is: + +/sys/devices/system/cpu/vulnerabilities/tsx_async_abort + +The possible values in this file are: + +.. list-table:: + + * - 'Vulnerable' + - The CPU is affected by this vulnerability and the microcode and kernel mitigation are not applied. + * - 'Vulnerable: Clear CPU buffers attempted, no microcode' + - The system tries to clear the buffers but the microcode might not support the operation. + * - 'Mitigation: Clear CPU buffers' + - The microcode has been updated to clear the buffers. TSX is still enabled. + * - 'Mitigation: TSX disabled' + - TSX is disabled. + * - 'Not affected' + - The CPU is not affected by this issue. + +.. _ucode_needed: + +Best effort mitigation mode +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If the processor is vulnerable, but the availability of the microcode-based +mitigation mechanism is not advertised via CPUID the kernel selects a best +effort mitigation mode. This mode invokes the mitigation instructions +without a guarantee that they clear the CPU buffers. + +This is done to address virtualization scenarios where the host has the +microcode update applied, but the hypervisor is not yet updated to expose the +CPUID to the guest. If the host has updated microcode the protection takes +effect; otherwise a few CPU cycles are wasted pointlessly. + +The state in the tsx_async_abort sysfs file reflects this situation +accordingly. + + +Mitigation mechanism +-------------------- + +The kernel detects the affected CPUs and the presence of the microcode which is +required. If a CPU is affected and the microcode is available, then the kernel +enables the mitigation by default. + + +The mitigation can be controlled at boot time via a kernel command line option. +See :ref:`taa_mitigation_control_command_line`. + +.. _virt_mechanism: + +Virtualization mitigation +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Affected systems where the host has TAA microcode and TAA is mitigated by +having disabled TSX previously, are not vulnerable regardless of the status +of the VMs. + +In all other cases, if the host either does not have the TAA microcode or +the kernel is not mitigated, the system might be vulnerable. + + +.. _taa_mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +The kernel command line allows to control the TAA mitigations at boot time with +the option "tsx_async_abort=". The valid arguments for this option are: + + ============ ============================================================= + off This option disables the TAA mitigation on affected platforms. + If the system has TSX enabled (see next parameter) and the CPU + is affected, the system is vulnerable. + + full TAA mitigation is enabled. If TSX is enabled, on an affected + system it will clear CPU buffers on ring transitions. On + systems which are MDS-affected and deploy MDS mitigation, + TAA is also mitigated. Specifying this option on those + systems will have no effect. + + full,nosmt The same as tsx_async_abort=full, with SMT disabled on + vulnerable CPUs that have TSX enabled. This is the complete + mitigation. When TSX is disabled, SMT is not disabled because + CPU is not vulnerable to cross-thread TAA attacks. + ============ ============================================================= + +Not specifying this option is equivalent to "tsx_async_abort=full". + +The kernel command line also allows to control the TSX feature using the +parameter "tsx=" on CPUs which support TSX control. MSR_IA32_TSX_CTRL is used +to control the TSX feature and the enumeration of the TSX feature bits (RTM +and HLE) in CPUID. + +The valid options are: + + ============ ============================================================= + off Disables TSX on the system. + + Note that this option takes effect only on newer CPUs which are + not vulnerable to MDS, i.e., have MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1 + and which get the new IA32_TSX_CTRL MSR through a microcode + update. This new MSR allows for the reliable deactivation of + the TSX functionality. + + on Enables TSX. + + Although there are mitigations for all known security + vulnerabilities, TSX has been known to be an accelerator for + several previous speculation-related CVEs, and so there may be + unknown security risks associated with leaving it enabled. + + auto Disables TSX if X86_BUG_TAA is present, otherwise enables TSX + on the system. + ============ ============================================================= + +Not specifying this option is equivalent to "tsx=off". + +The following combinations of the "tsx_async_abort" and "tsx" are possible. For +affected platforms tsx=auto is equivalent to tsx=off and the result will be: + + ========= ========================== ========================================= + tsx=on tsx_async_abort=full The system will use VERW to clear CPU + buffers. Cross-thread attacks are still + possible on SMT machines. + tsx=on tsx_async_abort=full,nosmt As above, cross-thread attacks on SMT + mitigated. + tsx=on tsx_async_abort=off The system is vulnerable. + tsx=off tsx_async_abort=full TSX might be disabled if microcode + provides a TSX control MSR. If so, + system is not vulnerable. + tsx=off tsx_async_abort=full,nosmt Ditto + tsx=off tsx_async_abort=off ditto + ========= ========================== ========================================= + + +For unaffected platforms "tsx=on" and "tsx_async_abort=full" does not clear CPU +buffers. For platforms without TSX control (MSR_IA32_ARCH_CAPABILITIES.MDS_NO=0) +"tsx" command line argument has no effect. + +For the affected platforms below table indicates the mitigation status for the +combinations of CPUID bit MD_CLEAR and IA32_ARCH_CAPABILITIES MSR bits MDS_NO +and TSX_CTRL_MSR. + + ======= ========= ============= ======================================== + MDS_NO MD_CLEAR TSX_CTRL_MSR Status + ======= ========= ============= ======================================== + 0 0 0 Vulnerable (needs microcode) + 0 1 0 MDS and TAA mitigated via VERW + 1 1 0 MDS fixed, TAA vulnerable if TSX enabled + because MD_CLEAR has no meaning and + VERW is not guaranteed to clear buffers + 1 X 1 MDS fixed, TAA can be mitigated by + VERW or TSX_CTRL_MSR + ======= ========= ============= ======================================== + +Mitigation selection guide +-------------------------- + +1. Trusted userspace and guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If all user space applications are from a trusted source and do not execute +untrusted code which is supplied externally, then the mitigation can be +disabled. The same applies to virtualized environments with trusted guests. + + +2. Untrusted userspace and guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If there are untrusted applications or guests on the system, enabling TSX +might allow a malicious actor to leak data from the host or from other +processes running on the same physical core. + +If the microcode is available and the TSX is disabled on the host, attacks +are prevented in a virtualized environment as well, even if the VMs do not +explicitly enable the mitigation. + + +.. _taa_default_mitigations: + +Default mitigations +------------------- + +The kernel's default action for vulnerable processors is: + + - Deploy TSX disable mitigation (tsx_async_abort=full tsx=off). diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 3cae24bff0a1..d16b3d41ffe5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2409,6 +2409,7 @@ ssbd=force-off [ARM64] l1tf=off [X86] mds=off [X86] + tsx_async_abort=off [X86] auto (default) Mitigate all CPU vulnerabilities, but leave SMT @@ -2424,6 +2425,7 @@ be fully mitigated, even if it means losing SMT. Equivalent to: l1tf=flush,nosmt [X86] mds=full,nosmt [X86] + tsx_async_abort=full,nosmt [X86] mminit_loglevel= [KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this @@ -4534,6 +4536,42 @@ See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst for more details. + tsx_async_abort= [X86,INTEL] Control mitigation for the TSX Async + Abort (TAA) vulnerability. + + Similar to Micro-architectural Data Sampling (MDS) + certain CPUs that support Transactional + Synchronization Extensions (TSX) are vulnerable to an + exploit against CPU internal buffers which can forward + information to a disclosure gadget under certain + conditions. + + In vulnerable processors, the speculatively forwarded + data can be used in a cache side channel attack, to + access data to which the attacker does not have direct + access. + + This parameter controls the TAA mitigation. The + options are: + + full - Enable TAA mitigation on vulnerable CPUs + if TSX is enabled. + + full,nosmt - Enable TAA mitigation and disable SMT on + vulnerable CPUs. If TSX is disabled, SMT + is not disabled because CPU is not + vulnerable to cross-thread TAA attacks. + off - Unconditionally disable TAA mitigation + + Not specifying this option is equivalent to + tsx_async_abort=full. On CPUs which are MDS affected + and deploy MDS mitigation, TAA mitigation is not + required and doesn't provide any additional + mitigation. + + For details see: + Documentation/admin-guide/hw-vuln/tsx_async_abort.rst + turbografx.map[2|3]= [HW,JOY] TurboGraFX parallel port interface Format: diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index ef389dcf1b1d..0780d55c5aa8 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -6,3 +6,4 @@ x86 architecture specifics :maxdepth: 1 mds + tsx_async_abort diff --git a/Documentation/x86/tsx_async_abort.rst b/Documentation/x86/tsx_async_abort.rst new file mode 100644 index 000000000000..583ddc185ba2 --- /dev/null +++ b/Documentation/x86/tsx_async_abort.rst @@ -0,0 +1,117 @@ +.. SPDX-License-Identifier: GPL-2.0 + +TSX Async Abort (TAA) mitigation +================================ + +.. _tsx_async_abort: + +Overview +-------- + +TSX Async Abort (TAA) is a side channel attack on internal buffers in some +Intel processors similar to Microachitectural Data Sampling (MDS). In this +case certain loads may speculatively pass invalid data to dependent operations +when an asynchronous abort condition is pending in a Transactional +Synchronization Extensions (TSX) transaction. This includes loads with no +fault or assist condition. Such loads may speculatively expose stale data from +the same uarch data structures as in MDS, with same scope of exposure i.e. +same-thread and cross-thread. This issue affects all current processors that +support TSX. + +Mitigation strategy +------------------- + +a) TSX disable - one of the mitigations is to disable TSX. A new MSR +IA32_TSX_CTRL will be available in future and current processors after +microcode update which can be used to disable TSX. In addition, it +controls the enumeration of the TSX feature bits (RTM and HLE) in CPUID. + +b) Clear CPU buffers - similar to MDS, clearing the CPU buffers mitigates this +vulnerability. More details on this approach can be found in +:ref:`Documentation/admin-guide/hw-vuln/mds.rst `. + +Kernel internal mitigation modes +-------------------------------- + + ============= ============================================================ + off Mitigation is disabled. Either the CPU is not affected or + tsx_async_abort=off is supplied on the kernel command line. + + tsx disabled Mitigation is enabled. TSX feature is disabled by default at + bootup on processors that support TSX control. + + verw Mitigation is enabled. CPU is affected and MD_CLEAR is + advertised in CPUID. + + ucode needed Mitigation is enabled. CPU is affected and MD_CLEAR is not + advertised in CPUID. That is mainly for virtualization + scenarios where the host has the updated microcode but the + hypervisor does not expose MD_CLEAR in CPUID. It's a best + effort approach without guarantee. + ============= ============================================================ + +If the CPU is affected and the "tsx_async_abort" kernel command line parameter is +not provided then the kernel selects an appropriate mitigation depending on the +status of RTM and MD_CLEAR CPUID bits. + +Below tables indicate the impact of tsx=on|off|auto cmdline options on state of +TAA mitigation, VERW behavior and TSX feature for various combinations of +MSR_IA32_ARCH_CAPABILITIES bits. + +1. "tsx=off" + +========= ========= ============ ============ ============== =================== ====================== +MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=off +---------------------------------- ------------------------------------------------------------------------- +TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation + after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full +========= ========= ============ ============ ============== =================== ====================== + 0 0 0 HW default Yes Same as MDS Same as MDS + 0 0 1 Invalid case Invalid case Invalid case Invalid case + 0 1 0 HW default No Need ucode update Need ucode update + 0 1 1 Disabled Yes TSX disabled TSX disabled + 1 X 1 Disabled X None needed None needed +========= ========= ============ ============ ============== =================== ====================== + +2. "tsx=on" + +========= ========= ============ ============ ============== =================== ====================== +MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=on +---------------------------------- ------------------------------------------------------------------------- +TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation + after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full +========= ========= ============ ============ ============== =================== ====================== + 0 0 0 HW default Yes Same as MDS Same as MDS + 0 0 1 Invalid case Invalid case Invalid case Invalid case + 0 1 0 HW default No Need ucode update Need ucode update + 0 1 1 Enabled Yes None Same as MDS + 1 X 1 Enabled X None needed None needed +========= ========= ============ ============ ============== =================== ====================== + +3. "tsx=auto" + +========= ========= ============ ============ ============== =================== ====================== +MSR_IA32_ARCH_CAPABILITIES bits Result with cmdline tsx=auto +---------------------------------- ------------------------------------------------------------------------- +TAA_NO MDS_NO TSX_CTRL_MSR TSX state VERW can clear TAA mitigation TAA mitigation + after bootup CPU buffers tsx_async_abort=off tsx_async_abort=full +========= ========= ============ ============ ============== =================== ====================== + 0 0 0 HW default Yes Same as MDS Same as MDS + 0 0 1 Invalid case Invalid case Invalid case Invalid case + 0 1 0 HW default No Need ucode update Need ucode update + 0 1 1 Disabled Yes TSX disabled TSX disabled + 1 X 1 Enabled X None needed None needed +========= ========= ============ ============ ============== =================== ====================== + +In the tables, TSX_CTRL_MSR is a new bit in MSR_IA32_ARCH_CAPABILITIES that +indicates whether MSR_IA32_TSX_CTRL is supported. + +There are two control bits in IA32_TSX_CTRL MSR: + + Bit 0: When set it disables the Restricted Transactional Memory (RTM) + sub-feature of TSX (will force all transactions to abort on the + XBEGIN instruction). + + Bit 1: When set it disables the enumeration of the RTM and HLE feature + (i.e. it will make CPUID(EAX=7).EBX{bit4} and + CPUID(EAX=7).EBX{bit11} read as 0). -- cgit v1.2.3 From 56a0f3867c1bc40c6f155c780a284cc881a89488 Mon Sep 17 00:00:00 2001 From: Vineela Tummalapalli Date: Mon, 4 Nov 2019 12:22:01 +0100 Subject: x86/bugs: Add ITLB_MULTIHIT bug infrastructure commit db4d30fbb71b47e4ecb11c4efa5d8aad4b03dfae upstream. Some processors may incur a machine check error possibly resulting in an unrecoverable CPU lockup when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here: https://bugzilla.kernel.org/show_bug.cgi?id=205195 There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. Signed-off-by: Vineela Tummalapalli Co-developed-by: Pawan Gupta Signed-off-by: Pawan Gupta Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 0fffc1c66da1..9ebca6a750f3 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -382,6 +382,7 @@ What: /sys/devices/system/cpu/vulnerabilities /sys/devices/system/cpu/vulnerabilities/l1tf /sys/devices/system/cpu/vulnerabilities/mds /sys/devices/system/cpu/vulnerabilities/tsx_async_abort + /sys/devices/system/cpu/vulnerabilities/itlb_multihit Date: January 2018 Contact: Linux kernel mailing list Description: Information about CPU vulnerabilities -- cgit v1.2.3 From cc5b0b7602f6f56cf6a03cbf091fc3ac2e4bb744 Mon Sep 17 00:00:00 2001 From: "Gomez Iglesias, Antonio" Date: Mon, 4 Nov 2019 12:22:03 +0100 Subject: Documentation: Add ITLB_MULTIHIT documentation commit 7f00cc8d4a51074eb0ad4c3f16c15757b1ddfb7d upstream. Add the initial ITLB_MULTIHIT documentation. [ tglx: Add it to the index so it gets actually built. ] Signed-off-by: Antonio Gomez Iglesias Signed-off-by: Nelson D'Souza Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/hw-vuln/index.rst | 1 + Documentation/admin-guide/hw-vuln/multihit.rst | 163 +++++++++++++++++++++++++ 2 files changed, 164 insertions(+) create mode 100644 Documentation/admin-guide/hw-vuln/multihit.rst (limited to 'Documentation') diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst index 0802b1c67452..0795e3c2643f 100644 --- a/Documentation/admin-guide/hw-vuln/index.rst +++ b/Documentation/admin-guide/hw-vuln/index.rst @@ -13,3 +13,4 @@ are configurable at compile, boot or run time. l1tf mds tsx_async_abort + multihit.rst diff --git a/Documentation/admin-guide/hw-vuln/multihit.rst b/Documentation/admin-guide/hw-vuln/multihit.rst new file mode 100644 index 000000000000..ba9988d8bce5 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/multihit.rst @@ -0,0 +1,163 @@ +iTLB multihit +============= + +iTLB multihit is an erratum where some processors may incur a machine check +error, possibly resulting in an unrecoverable CPU lockup, when an +instruction fetch hits multiple entries in the instruction TLB. This can +occur when the page size is changed along with either the physical address +or cache type. A malicious guest running on a virtualized system can +exploit this erratum to perform a denial of service attack. + + +Affected processors +------------------- + +Variations of this erratum are present on most Intel Core and Xeon processor +models. The erratum is not present on: + + - non-Intel processors + + - Some Atoms (Airmont, Bonnell, Goldmont, GoldmontPlus, Saltwell, Silvermont) + + - Intel processors that have the PSCHANGE_MC_NO bit set in the + IA32_ARCH_CAPABILITIES MSR. + + +Related CVEs +------------ + +The following CVE entry is related to this issue: + + ============== ================================================= + CVE-2018-12207 Machine Check Error Avoidance on Page Size Change + ============== ================================================= + + +Problem +------- + +Privileged software, including OS and virtual machine managers (VMM), are in +charge of memory management. A key component in memory management is the control +of the page tables. Modern processors use virtual memory, a technique that creates +the illusion of a very large memory for processors. This virtual space is split +into pages of a given size. Page tables translate virtual addresses to physical +addresses. + +To reduce latency when performing a virtual to physical address translation, +processors include a structure, called TLB, that caches recent translations. +There are separate TLBs for instruction (iTLB) and data (dTLB). + +Under this errata, instructions are fetched from a linear address translated +using a 4 KB translation cached in the iTLB. Privileged software modifies the +paging structure so that the same linear address using large page size (2 MB, 4 +MB, 1 GB) with a different physical address or memory type. After the page +structure modification but before the software invalidates any iTLB entries for +the linear address, a code fetch that happens on the same linear address may +cause a machine-check error which can result in a system hang or shutdown. + + +Attack scenarios +---------------- + +Attacks against the iTLB multihit erratum can be mounted from malicious +guests in a virtualized system. + + +iTLB multihit system information +-------------------------------- + +The Linux kernel provides a sysfs interface to enumerate the current iTLB +multihit status of the system:whether the system is vulnerable and which +mitigations are active. The relevant sysfs file is: + +/sys/devices/system/cpu/vulnerabilities/itlb_multihit + +The possible values in this file are: + +.. list-table:: + + * - Not affected + - The processor is not vulnerable. + * - KVM: Mitigation: Split huge pages + - Software changes mitigate this issue. + * - KVM: Vulnerable + - The processor is vulnerable, but no mitigation enabled + + +Enumeration of the erratum +-------------------------------- + +A new bit has been allocated in the IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) msr +and will be set on CPU's which are mitigated against this issue. + + ======================================= =========== =============================== + IA32_ARCH_CAPABILITIES MSR Not present Possibly vulnerable,check model + IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '0' Likely vulnerable,check model + IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '1' Not vulnerable + ======================================= =========== =============================== + + +Mitigation mechanism +------------------------- + +This erratum can be mitigated by restricting the use of large page sizes to +non-executable pages. This forces all iTLB entries to be 4K, and removes +the possibility of multiple hits. + +In order to mitigate the vulnerability, KVM initially marks all huge pages +as non-executable. If the guest attempts to execute in one of those pages, +the page is broken down into 4K pages, which are then marked executable. + +If EPT is disabled or not available on the host, KVM is in control of TLB +flushes and the problematic situation cannot happen. However, the shadow +EPT paging mechanism used by nested virtualization is vulnerable, because +the nested guest can trigger multiple iTLB hits by modifying its own +(non-nested) page tables. For simplicity, KVM will make large pages +non-executable in all shadow paging modes. + +Mitigation control on the kernel command line and KVM - module parameter +------------------------------------------------------------------------ + +The KVM hypervisor mitigation mechanism for marking huge pages as +non-executable can be controlled with a module parameter "nx_huge_pages=". +The kernel command line allows to control the iTLB multihit mitigations at +boot time with the option "kvm.nx_huge_pages=". + +The valid arguments for these options are: + + ========== ================================================================ + force Mitigation is enabled. In this case, the mitigation implements + non-executable huge pages in Linux kernel KVM module. All huge + pages in the EPT are marked as non-executable. + If a guest attempts to execute in one of those pages, the page is + broken down into 4K pages, which are then marked executable. + + off Mitigation is disabled. + + auto Enable mitigation only if the platform is affected and the kernel + was not booted with the "mitigations=off" command line parameter. + This is the default option. + ========== ================================================================ + + +Mitigation selection guide +-------------------------- + +1. No virtualization in use +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The system is protected by the kernel unconditionally and no further + action is required. + +2. Virtualization with trusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + If the guest comes from a trusted source, you may assume that the guest will + not attempt to maliciously exploit these errata and no further action is + required. + +3. Virtualization with untrusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + If the guest comes from an untrusted source, the guest host kernel will need + to apply iTLB multihit mitigation via the kernel command line or kvm + module parameter. -- cgit v1.2.3 From 05fe997e30d439e3ff0c7e7e46499e9d41d98ba7 Mon Sep 17 00:00:00 2001 From: Junaid Shahid Date: Thu, 3 Jan 2019 17:14:28 -0800 Subject: kvm: Convert kvm_lock to a mutex commit 0d9ce162cf46c99628cc5da9510b959c7976735b upstream. It doesn't seem as if there is any particular need for kvm_lock to be a spinlock, so convert the lock to a mutex so that sleepable functions (in particular cond_resched()) can be called while holding it. Signed-off-by: Junaid Shahid Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- Documentation/virtual/kvm/locking.txt | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt index 1bb8bcaf8497..635cd6eaf714 100644 --- a/Documentation/virtual/kvm/locking.txt +++ b/Documentation/virtual/kvm/locking.txt @@ -15,8 +15,6 @@ The acquisition orders for mutexes are as follows: On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock. -For spinlocks, kvm_lock is taken outside kvm->mmu_lock. - Everything else is a leaf: no other lock is taken inside the critical sections. @@ -169,7 +167,7 @@ which time it will be set using the Dirty tracking mechanism described above. ------------ Name: kvm_lock -Type: spinlock_t +Type: mutex Arch: any Protects: - vm_list -- cgit v1.2.3 From bb16a6ba5d1ed79b40caea8d924e237f63205b7c Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 4 Nov 2019 12:22:02 +0100 Subject: kvm: mmu: ITLB_MULTIHIT mitigation commit b8e8c8303ff28c61046a4d0f6ea99aea609a7dc0 upstream. With some Intel processors, putting the same virtual address in the TLB as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit and cause the processor to issue a machine check resulting in a CPU lockup. Unfortunately when EPT page tables use huge pages, it is possible for a malicious guest to cause this situation. Add a knob to mark huge pages as non-executable. When the nx_huge_pages parameter is enabled (and we are using EPT), all huge pages are marked as NX. If the guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable. This is not an issue for shadow paging (except nested EPT), because then the host is in control of TLB flushes and the problematic situation cannot happen. With nested EPT, again the nested guest can cause problems shadow and direct EPT is treated in the same way. [ tglx: Fixup default to auto and massage wording a bit ] Originally-by: Junaid Shahid Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index d16b3d41ffe5..496bc24733a6 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1852,6 +1852,19 @@ KVM MMU at runtime. Default is 0 (off) + kvm.nx_huge_pages= + [KVM] Controls the software workaround for the + X86_BUG_ITLB_MULTIHIT bug. + force : Always deploy workaround. + off : Never deploy workaround. + auto : Deploy workaround based on the presence of + X86_BUG_ITLB_MULTIHIT. + + Default is 'auto'. + + If the software workaround is enabled for the host, + guests do need not to enable it for nested guests. + kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM. Default is 1 (enabled) @@ -2410,6 +2423,12 @@ l1tf=off [X86] mds=off [X86] tsx_async_abort=off [X86] + kvm.nx_huge_pages=off [X86] + + Exceptions: + This does not have any effect on + kvm.nx_huge_pages when + kvm.nx_huge_pages=force. auto (default) Mitigate all CPU vulnerabilities, but leave SMT -- cgit v1.2.3 From 2d371f8836c5d633f9f495c9165eaf814643539d Mon Sep 17 00:00:00 2001 From: Junaid Shahid Date: Fri, 1 Nov 2019 00:14:14 +0100 Subject: kvm: x86: mmu: Recovery of shattered NX large pages commit 1aa9b9572b10529c2e64e2b8f44025d86e124308 upstream. The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered, if it is not longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by: Junaid Shahid Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 496bc24733a6..05596e05bc71 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1865,6 +1865,12 @@ If the software workaround is enabled for the host, guests do need not to enable it for nested guests. + kvm.nx_huge_pages_recovery_ratio= + [KVM] Controls how many 4KiB pages are periodically zapped + back to huge pages. 0 disables the recovery, otherwise if + the value is N KVM will zap 1/Nth of the 4KiB pages every + minute. The default is 60. + kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM. Default is 1 (enabled) -- cgit v1.2.3 From 9b8ba684bee9254e3a31f4675b7efca7589324d0 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 14 Jun 2018 09:48:07 -0400 Subject: media: dt-bindings: adv748x: Fix decimal unit addresses [ Upstream commit 27582f0ea97fe3e4a38beb98ab36cce4b6f029d5 ] With recent dtc and W=1: Warning (graph_port): video-receiver@70/port@10: graph node unit address error, expected "a" Warning (graph_port): video-receiver@70/port@11: graph node unit address error, expected "b" Unit addresses are always hexadecimal (without prefix), while the bases of reg property values depend on their prefixes. Fixes: e69595170b1cad85 ("media: adv748x: Add adv7481, adv7482 bindings") Signed-off-by: Geert Uytterhoeven Reviewed-by: Rob Herring Reviewed-by: Kieran Bingham Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- Documentation/devicetree/bindings/media/i2c/adv748x.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/media/i2c/adv748x.txt b/Documentation/devicetree/bindings/media/i2c/adv748x.txt index 21ffb5ed8183..54d1d3bc1869 100644 --- a/Documentation/devicetree/bindings/media/i2c/adv748x.txt +++ b/Documentation/devicetree/bindings/media/i2c/adv748x.txt @@ -73,7 +73,7 @@ Example: }; }; - port@10 { + port@a { reg = <10>; adv7482_txa: endpoint { @@ -83,7 +83,7 @@ Example: }; }; - port@11 { + port@b { reg = <11>; adv7482_txb: endpoint { -- cgit v1.2.3 From 700feda306559979fe5daccefa07365332408f50 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 20 Sep 2018 17:05:40 -0700 Subject: net: phy: mdio-bcm-unimac: Allow configuring MDIO clock divider [ Upstream commit b78ac6ecd1b6b46f8767cbafa95a7b0b51b87ad8 ] Allow the configuration of the MDIO clock divider when the Device Tree contains 'clock-frequency' property (similar to I2C and SPI buses). Because the hardware may have lost its state during suspend/resume, re-apply the MDIO clock divider upon resumption. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt index 4648948f7c3b..e15589f47787 100644 --- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt +++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt @@ -19,6 +19,9 @@ Optional properties: - interrupt-names: must be "mdio_done_error" when there is a share interrupt fed to this hardware block, or must be "mdio_done" for the first interrupt and "mdio_error" for the second when there are separate interrupts +- clocks: A reference to the clock supplying the MDIO bus controller +- clock-frequency: the MDIO bus clock that must be output by the MDIO bus + hardware, if absent, the default hardware values are used Child nodes of this MDIO bus controller node are standard Ethernet PHY device nodes as described in Documentation/devicetree/bindings/net/phy.txt -- cgit v1.2.3 From d68d0c043eaa7d2f3e1ef3070116076983026fd5 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Fri, 15 Nov 2019 11:14:44 -0500 Subject: x86/speculation: Fix incorrect MDS/TAA mitigation status commit 64870ed1b12e235cfca3f6c6da75b542c973ff78 upstream. For MDS vulnerable processors with TSX support, enabling either MDS or TAA mitigations will enable the use of VERW to flush internal processor buffers at the right code path. IOW, they are either both mitigated or both not. However, if the command line options are inconsistent, the vulnerabilites sysfs files may not report the mitigation status correctly. For example, with only the "mds=off" option: vulnerabilities/mds:Vulnerable; SMT vulnerable vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable The mds vulnerabilities file has wrong status in this case. Similarly, the taa vulnerability file will be wrong with mds mitigation on, but taa off. Change taa_select_mitigation() to sync up the two mitigation status and have them turned off if both "mds=off" and "tsx_async_abort=off" are present. Update documentation to emphasize the fact that both "mds=off" and "tsx_async_abort=off" have to be specified together for processors that are affected by both TAA and MDS to be effective. [ bp: Massage and add kernel-parameters.txt change too. ] Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort") Signed-off-by: Waiman Long Signed-off-by: Borislav Petkov Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jiri Kosina Cc: Jonathan Corbet Cc: Josh Poimboeuf Cc: linux-doc@vger.kernel.org Cc: Mark Gross Cc: Cc: Pawan Gupta Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tim Chen Cc: Tony Luck Cc: Tyler Hicks Cc: x86-ml Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/hw-vuln/mds.rst | 7 +++++-- Documentation/admin-guide/hw-vuln/tsx_async_abort.rst | 5 ++++- Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++ 3 files changed, 20 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst index e3a796c0d3a2..2d19c9f4c1fe 100644 --- a/Documentation/admin-guide/hw-vuln/mds.rst +++ b/Documentation/admin-guide/hw-vuln/mds.rst @@ -265,8 +265,11 @@ time with the option "mds=". The valid arguments for this option are: ============ ============================================================= -Not specifying this option is equivalent to "mds=full". - +Not specifying this option is equivalent to "mds=full". For processors +that are affected by both TAA (TSX Asynchronous Abort) and MDS, +specifying just "mds=off" without an accompanying "tsx_async_abort=off" +will have no effect as the same mitigation is used for both +vulnerabilities. Mitigation selection guide -------------------------- diff --git a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst index fddbd7579c53..af6865b822d2 100644 --- a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst +++ b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst @@ -174,7 +174,10 @@ the option "tsx_async_abort=". The valid arguments for this option are: CPU is not vulnerable to cross-thread TAA attacks. ============ ============================================================= -Not specifying this option is equivalent to "tsx_async_abort=full". +Not specifying this option is equivalent to "tsx_async_abort=full". For +processors that are affected by both TAA and MDS, specifying just +"tsx_async_abort=off" without an accompanying "mds=off" will have no +effect as the same mitigation is used for both vulnerabilities. The kernel command line also allows to control the TSX feature using the parameter "tsx=" on CPUs which support TSX control. MSR_IA32_TSX_CTRL is used diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 05596e05bc71..b0da6050a254 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2254,6 +2254,12 @@ SMT on vulnerable CPUs off - Unconditionally disable MDS mitigation + On TAA-affected machines, mds=off can be prevented by + an active TAA mitigation as both vulnerabilities are + mitigated with the same mechanism so in order to disable + this mitigation, you need to specify tsx_async_abort=off + too. + Not specifying this option is equivalent to mds=full. @@ -4588,6 +4594,11 @@ vulnerable to cross-thread TAA attacks. off - Unconditionally disable TAA mitigation + On MDS-affected machines, tsx_async_abort=off can be + prevented by an active MDS mitigation as both vulnerabilities + are mitigated with the same mechanism so in order to disable + this mitigation, you need to specify mds=off too. + Not specifying this option is equivalent to tsx_async_abort=full. On CPUs which are MDS affected and deploy MDS mitigation, TAA mitigation is not -- cgit v1.2.3 From 0b2db05c8e561a53494fcac016d145dcfaa4fa7f Mon Sep 17 00:00:00 2001 From: Peter Hutterer Date: Thu, 13 Dec 2018 11:28:51 +1000 Subject: HID: doc: fix wrong data structure reference for UHID_OUTPUT [ Upstream commit 46b14eef59a8157138dc02f916a7f97c73b3ec53 ] Signed-off-by: Peter Hutterer Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin --- Documentation/hid/uhid.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hid/uhid.txt b/Documentation/hid/uhid.txt index c8656dd029a9..958fff945304 100644 --- a/Documentation/hid/uhid.txt +++ b/Documentation/hid/uhid.txt @@ -160,7 +160,7 @@ them but you should handle them according to your needs. UHID_OUTPUT: This is sent if the HID device driver wants to send raw data to the I/O device on the interrupt channel. You should read the payload and forward it to - the device. The payload is of type "struct uhid_data_req". + the device. The payload is of type "struct uhid_output_req". This may be received even though you haven't received UHID_OPEN, yet. UHID_GET_REPORT: -- cgit v1.2.3 From ecebcf49dded44233c891da987cc2b6224b2ff90 Mon Sep 17 00:00:00 2001 From: Baruch Siach Date: Mon, 19 Nov 2018 14:34:02 +0200 Subject: rtc: dt-binding: abx80x: fix resistance scale [ Upstream commit 73852e56827f5cb5db9d6e8dd8191fc2f2e8f424 ] The abracon,tc-resistor property value is in kOhm. Signed-off-by: Baruch Siach Signed-off-by: Alexandre Belloni Signed-off-by: Sasha Levin --- Documentation/devicetree/bindings/rtc/abracon,abx80x.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt b/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt index be789685a1c2..18b892d010d8 100644 --- a/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt +++ b/Documentation/devicetree/bindings/rtc/abracon,abx80x.txt @@ -27,4 +27,4 @@ and valid to enable charging: - "abracon,tc-diode": should be "standard" (0.6V) or "schottky" (0.3V) - "abracon,tc-resistor": should be <0>, <3>, <6> or <11>. 0 disables the output - resistor, the other values are in ohm. + resistor, the other values are in kOhm. -- cgit v1.2.3 From 657d54ede01239ea658a436f0d3407febfbbef91 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Thu, 14 Nov 2019 12:27:58 +0100 Subject: USB: documentation: flags on usb-storage versus UAS commit 65cc8bf99349f651a0a2cee69333525fe581f306 upstream. Document which flags work storage, UAS or both Signed-off-by: Oliver Neukum Cc: stable Link: https://lore.kernel.org/r/20191114112758.32747-4-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index b0da6050a254..933465eff40e 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4693,13 +4693,13 @@ Flags is a set of characters, each corresponding to a common usb-storage quirk flag as follows: a = SANE_SENSE (collect more than 18 bytes - of sense data); + of sense data, not on uas); b = BAD_SENSE (don't collect more than 18 - bytes of sense data); + bytes of sense data, not on uas); c = FIX_CAPACITY (decrease the reported device capacity by one sector); d = NO_READ_DISC_INFO (don't use - READ_DISC_INFO command); + READ_DISC_INFO command, not on uas); e = NO_READ_CAPACITY_16 (don't use READ_CAPACITY_16 command); f = NO_REPORT_OPCODES (don't use report opcodes @@ -4714,17 +4714,18 @@ j = NO_REPORT_LUNS (don't use report luns command, uas only); l = NOT_LOCKABLE (don't try to lock and - unlock ejectable media); + unlock ejectable media, not on uas); m = MAX_SECTORS_64 (don't transfer more - than 64 sectors = 32 KB at a time); + than 64 sectors = 32 KB at a time, + not on uas); n = INITIAL_READ10 (force a retry of the - initial READ(10) command); + initial READ(10) command, not on uas); o = CAPACITY_OK (accept the capacity - reported by the device); + reported by the device, not on uas); p = WRITE_CACHE (the device cache is ON - by default); + by default, not on uas); r = IGNORE_RESIDUE (the device reports - bogus residue values); + bogus residue values, not on uas); s = SINGLE_LUN (the device has only one Logical Unit); t = NO_ATA_1X (don't allow ATA(12) and ATA(16) @@ -4733,7 +4734,8 @@ w = NO_WP_DETECT (don't test whether the medium is write-protected). y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE - even if the device claims no cache) + even if the device claims no cache, + not on uas) Example: quirks=0419:aaf5:rl,0421:0433:rc user_debug= [KNL,ARM] -- cgit v1.2.3