summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
authorThomas Gleixner <tglx@linutronix.de>2018-06-29 16:05:47 +0200
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2018-08-15 18:12:53 +0200
commitf3e68ab4e778e575ab9fba10894387d177af510a (patch)
tree7cc808186aed3eec2b554d7f790a43ef7c018de2 /Documentation/admin-guide
parent49221fc7e775656e87978aa56519fab12bb2560b (diff)
Revert "x86/apic: Ignore secondary threads if nosmt=force"
commit 506a66f374891ff08e064a058c446b336c5ac760 upstream Dave Hansen reported, that it's outright dangerous to keep SMT siblings disabled completely so they are stuck in the BIOS and wait for SIPI. The reason is that Machine Check Exceptions are broadcasted to siblings and the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or reboots the machine. The MCE chapter in the SDM contains the following blurb: Because the logical processors within a physical package are tightly coupled with respect to shared hardware resources, both logical processors are notified of machine check errors that occur within a given physical processor. If machine-check exceptions are enabled when a fatal error is reported, all the logical processors within a physical package are dispatched to the machine-check exception handler. If machine-check exceptions are disabled, the logical processors enter the shutdown state and assert the IERR# signal. When enabling machine-check exceptions, the MCE flag in control register CR4 should be set for each logical processor. Reverting the commit which ignores siblings at enumeration time solves only half of the problem. The core cpuhotplug logic needs to be adjusted as well. This thoughtful engineered mechanism also turns the boot process on all Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU before the secondary CPUs are brought up. Depending on the number of physical cores the window in which this situation can happen is smaller or larger. On a HSW-EX it's about 750ms: MCE is enabled on the boot CPU: [ 0.244017] mce: CPU supports 22 MCE banks The corresponding sibling #72 boots: [ 1.008005] .... node #0, CPUs: #72 That means if an MCE hits on physical core 0 (logical CPUs 0 and 72) between these two points the machine is going to shutdown. At least it's a known safe state. It's obvious that the early boot can be hit by an MCE as well and then runs into the same situation because MCEs are not yet enabled on the boot CPU. But after enabling them on the boot CPU, it does not make any sense to prevent the kernel from recovering. Adjust the nosmt kernel parameter documentation as well. Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force") Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt8
1 files changed, 2 insertions, 6 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 38d75cc78a3d..9f3ffe23a7c9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2596,12 +2596,8 @@
Equivalent to smt=1.
[KNL,x86] Disable symmetric multithreading (SMT).
- nosmt=force: Force disable SMT, similar to disabling
- it in the BIOS except that some of the
- resource partitioning effects which are
- caused by having SMT enabled in the BIOS
- cannot be undone. Depending on the CPU
- type this might have a performance impact.
+ nosmt=force: Force disable SMT, cannot be undone
+ via the sysfs control file.
nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
(indirect branch prediction) vulnerability. System may