summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/amd.c
AgeCommit message (Collapse)Author
2018-01-17x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSCTom Lendacky
commit 9c6a73c75864ad9fa49e5fa6513e4c4071c0e29f upstream. With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference to MFENCE_RDTSC. However, since the kernel could be running under a hypervisor that does not support writing that MSR, read the MSR back and verify that the bit has been set successfully. If the MSR can be read and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the MFENCE_RDTSC feature. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdoffice.net Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17x86/cpu/AMD: Make LFENCE a serializing instructionTom Lendacky
commit e4d0e84e490790798691aaa0f2e598637f1867ec upstream. To aid in speculation control, make LFENCE a serializing instruction since it has less overhead than MFENCE. This is done by setting bit 1 of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not have this MSR. For these families, the LFENCE instruction is already serializing. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdoffice.net Signed-off-by: Razvan Ghitulete <rga@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14x86/CPU/AMD: Fix Zen SMT topologyYazen Ghannam
commit 08b259631b5a1d912af4832847b5642f377d9101 upstream. After: a33d331761bc ("x86/CPU/AMD: Fix Bulldozer topology") our SMT scheduling topology for Fam17h systems is broken, because the ThreadId is included in the ApicId when SMT is enabled. So, without further decoding cpu_core_id is unique for each thread rather than the same for threads on the same core. This didn't affect systems with SMT disabled. Make cpu_core_id be what it is defined to be. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170205105022.8705-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14x86/CPU/AMD: Bring back Compute Unit IDBorislav Petkov
commit 79a8b9aa388b0620cc1d525d7c0f0d9a8a85e08e upstream. Commit: a33d331761bc ("x86/CPU/AMD: Fix Bulldozer topology") restored the initial approach we had with the Fam15h topology of enumerating CU (Compute Unit) threads as cores. And this is still correct - they're beefier than HT threads but still have some shared functionality. Our current approach has a problem with the Mad Max Steam game, for example. Yves Dionne reported a certain "choppiness" while playing on v4.9.5. That problem stems most likely from the fact that the CU threads share resources within one CU and when we schedule to a thread of a different compute unit, this incurs latency due to migrating the working set to a different CU through the caches. When the thread siblings mask mirrors that aspect of the CUs and threads, the scheduler pays attention to it and tries to schedule within one CU first. Which takes care of the latency, of course. Reported-by: Yves Dionne <yves.dionne@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19x86/CPU/AMD: Fix Bulldozer topologyBorislav Petkov
commit a33d331761bc5dd330499ca5ceceb67f0640a8e6 upstream. The following commit: 8196dab4fc15 ("x86/cpu: Get rid of compute_unit_id") ... broke the initial strategy for Bulldozer-based cores' topology, where we consider each thread of a compute unit a standalone core and not a HT or SMT thread. Revert to the firmware-supplied core_id numbering and do not make them thread siblings as we don't consider them for such even if they technically are, more or less. Reported-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8196dab4fc15 ("x86/cpu: Get rid of compute_unit_id") Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19x86/bugs: Separate AMD E400 erratum and C1E bugThomas Gleixner
commit 3344ed30791af66dbbad5f375008f3d1863b6c99 upstream. The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E state) is a two step process: - Selection of the E400 aware idle routine - Detection whether the platform is affected The idle routine selection happens for possibly affected CPUs depending on family/model/stepping information. These range of CPUs is not necessarily affected as the decision whether to enable the C1E feature is made by the firmware. Unfortunately there is no way to query this at early boot. The current implementation polls a MSR in the E400 aware idle routine to detect whether the CPU is affected. This is inefficient on non affected CPUs because every idle entry has to do the MSR read. There is a better way to detect this before going idle for the first time which requires to seperate the bug flags: X86_BUG_AMD_E400 - Selects the E400 aware idle routine and enables the detection X86_BUG_AMD_APIC_C1E - Set when the platform is affected by E400 Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400 bug bit to select the idle routine which currently does an unconditional detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches to remove the MSR polling and simplify the handling of this misfeature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19x86/cpu/AMD: Clean up cpu_llc_id assignment per topology featureYazen Ghannam
commit b6a50cddbcbda7105355898ead18f1a647c22520 upstream. These changes do not affect current hw - just a cleanup: Currently, we assume that a system has a single Last Level Cache (LLC) per node, and that the cpu_llc_id is thus equal to the node_id. This no longer applies since Fam17h can have multiple last level caches within a node. So group the cpu_llc_id assignment by topology feature and family in order to make the computation of cpu_llc_id on the different families more clear. Here is how the LLC ID is being computed on the different families: The NODEID_MSR feature only applies to Fam10h in which case the LLC is at the node level. The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only see multiple last level caches if L3 caches are available. Otherwise, the cpu_llc_id will default to be the phys_proc_id. We have L3 caches only on families 15h and 17h: - on Fam15h, the LLC is at the node level. - on Fam17h, the LLC is at the core complex level and can be found by right shifting the APIC ID. Also, keep the family checks explicit so that new families will fall back to the default, which will be node_id for TOPOEXT systems. Single node systems in families 10h and 15h will have a Node ID of 0 which will be the same as the phys_proc_id, so we don't need to check for multiple nodes before using the node_id. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> [ Rewrote the commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161108153054.bs3sajbyevq6a6uu@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-09x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systemsYazen Ghannam
cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an underflow bug when extracting the socket_id value. It starts from 0 so subtracting 1 from it will result in an invalid value. This breaks scheduling topology later on since the cpu_llc_id will be incorrect. For example, the the cpu_llc_id of the *other* CPU in the loops in set_cpu_sibling_map() underflows and we're generating the funniest thread_siblings masks and then when I run 8 threads of nbench, they get spread around the LLC domains in a very strange pattern which doesn't give you the normal scheduling spread one would expect for performance. Other things like EDAC use cpu_llc_id so they will be b0rked too. So, the APIC ID is preset in APICx020 for bits 3 and above: they contain the core complex, node and socket IDs. The LLC is at the core complex level so we can find a unique cpu_llc_id by right shifting the APICID by 3 because then the least significant bit will be the Core Complex ID. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> [ Cleaned up and extended the commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v4.4.. Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems") Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-02x86/AMD: Apply erratum 665 on machines without a BIOS fixEmanuel Czirai
AMD F12h machines have an erratum which can cause DIV/IDIV to behave unpredictably. The workaround is to set MSRC001_1029[31] but sometimes there is no BIOS update containing that workaround so let's do it ourselves unconditionally. It is simple enough. [ Borislav: Wrote commit message. ] Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yaowu Xu <yaowu@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-06-08x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer modelsBorislav Petkov
We need to reenable the topology extensions CPUID leafs on newer models too, if BIOS has disabled them, as we rely on them to get proper compute unit topology. Make the printk a once thing, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rui Huang <ray.huang@amd.com> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-hwmon@vger.kernel.org Link: http://lkml.kernel.org/r/1464775468-23355-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13x86/cpu: Simplify extended APIC ID detection on AMDBorislav Petkov
Both if-branches are under if (boot_cpu_has(X86_FEATURE_APIC)), unify them. Also, simplify the test for bits: - 17 ("ApicExtBrdCst: APIC extended broadcast enable") and - 18 ("ApicExtId: APIC extended ID enable.") in "D18F0x68 Link Transaction Control." No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1459837795-2588-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usageBorislav Petkov
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: iommu@lists.linux-foundation.org Cc: linux-pm@vger.kernel.org Cc: oprofile-list@lists.sf.net Link: http://lkml.kernel.org/r/1459801503-15600-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13Merge branch 'x86/urgent' into x86/asm to pick up dependent fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13Merge branch 'x86/cpu' into x86/asm, to merge more patchesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13x86/cpu: Add Erratum 88 detection on AMDBorislav Petkov
Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input dependency on GS. Therefore, we need to MFENCE before it. But that MFENCE is expensive and unnecessary on the remaining x86 CPUs out there so patch it out on the CPUs which don't require it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/aec6b2df1bfc56101d4e9e2e5d5d570bf41663c6.1460075211.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31x86/cpufeature: Remove cpu_has_xmm2Borislav Petkov
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-crypto@vger.kernel.org Link: http://lkml.kernel.org/r/1459266123-21878-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29x86/cpu: Get rid of compute_unit_idBorislav Petkov
It is cpu_core_id anyway. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1458917557-8757-3-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29x86/topology: Fix AMD core countPeter Zijlstra
It turns out AMD gets x86_max_cores wrong when there are compute units. The issue is that Linux assumes: nr_logical_cpus = nr_cores * nr_siblings But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings to 2 as well. Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core properly, according to the new nomenclature. Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id") Reported-by: Xiong Zhou <jencce.kernel@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andreas Herrmann <aherrmann@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-24Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree contains various perf fixes on the kernel side, plus three hw/event-enablement late additions: - Intel Memory Bandwidth Monitoring events and handling - the AMD Accumulated Power Mechanism reporting facility - more IOMMU events ... and a final round of perf tooling updates/fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) perf llvm: Use strerror_r instead of the thread unsafe strerror one perf llvm: Use realpath to canonicalize paths perf tools: Unexport some methods unused outside strbuf.c perf probe: No need to use formatting strbuf method perf help: Use asprintf instead of adhoc equivalents perf tools: Remove unused perf_pathdup, xstrdup functions perf tools: Do not include stringify.h from the kernel sources tools include: Copy linux/stringify.h from the kernel tools lib traceevent: Remove redundant CPU output perf tools: Remove needless 'extern' from function prototypes perf tools: Simplify die() mechanism perf tools: Remove unused DIE_IF macro perf script: Remove lots of unused arguments perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve perf machine: Rename perf_event__preprocess_sample to machine__resolve perf tools: Add cpumode to struct perf_sample perf tests: Forward the perf_sample in the dwarf unwind test perf tools: Remove misplaced __maybe_unused perf list: Fix documentation of :ppp perf bench numa: Fix assertion for nodes bitfield ...
2016-03-24Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - fix hotplug bugs - fix irq live lock - fix various topology handling bugs - fix APIC ACK ordering - fix PV iopl handling - fix speling - fix/tweak memcpy_mcsafe() return value - fix fbcon bug - remove stray prototypes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Remove unused native_read_tscp() x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck x86/oprofile/nmi: Add missing hotplug FROZEN handling x86/hpet: Use proper mask to modify hotplug action x86/apic/uv: Fix the hotplug notifier x86/apb/timer: Use proper mask to modify hotplug action x86/topology: Use total_cpus not nr_cpu_ids for logical packages x86/topology: Fix Intel HT disable x86/topology: Fix logical package mapping x86/irq: Cure live lock in fixup_irqs() x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known() x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt() x86/iopl: Fix iopl capability check on Xen PV x86/iopl/64: Properly context-switch IOPL on Xen PV selftests/x86: Add an iopl test x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe() x86/video: Don't assume all FB devices are PCI devices arch/x86/irq: Purge useless handler declarations from hw_irq.h x86: Fix misspellings in comments
2016-03-21x86/cpufeature, perf/x86: Add AMD Accumulated Power Mechanism feature flagHuang Rui
AMD CPU family 15h model 0x60 introduces a mechanism for measuring accumulated power. It is used to report the processor power consumption and support for it is indicated by CPUID Fn8000_0007_EDX[12]. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wan Zongshun <Vincent.Wan@amd.com> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com [ Resolved conflict and moved the synthetic CPUID slot to 19. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21perf/x86/amd: Move nodes_per_socket into bsp_init_amd()Huang Rui
nodes_per_socket is static and it needn't be initialized many times during every CPU core init. So move its initialization into bsp_init_amd(). Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-2-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-20Merge branch 'core-objtool-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull 'objtool' stack frame validation from Ingo Molnar: "This tree adds a new kernel build-time object file validation feature (ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation. It was written by and is maintained by Josh Poimboeuf. The motivation: there's a category of hard to find kernel bugs, most of them in assembly code (but also occasionally in C code), that degrades the quality of kernel stack dumps/backtraces. These bugs are hard to detect at the source code level. Such bugs result in incorrect/incomplete backtraces most of time - but can also in some rare cases result in crashes or other undefined behavior. The build time correctness checking is done via the new 'objtool' user-space utility that was written for this purpose and which is hosted in the kernel repository in tools/objtool/. The tool's (very simple) UI and source code design is shaped after Git and perf and shares quite a bit of infrastructure with tools/perf (which tooling infrastructure sharing effort got merged via perf and is already upstream). Objtool follows the well-known kernel coding style. Objtool does not try to check .c or .S files, it instead analyzes the resulting .o generated machine code from first principles: it decodes the instruction stream and interprets it. (Right now objtool supports the x86-64 architecture.) From tools/objtool/Documentation/stack-validation.txt: "The kernel CONFIG_STACK_VALIDATION option enables a host tool named objtool which runs at compile time. It has a "check" subcommand which analyzes every .o file and ensures the validity of its stack metadata. It enforces a set of rules on asm code and C inline assembly code so that stack traces can be reliable. Currently it only checks frame pointer usage, but there are plans to add CFI validation for C files and CFI generation for asm files. For each function, it recursively follows all possible code paths and validates the correct frame pointer state at each instruction. It also follows code paths involving special sections, like .altinstructions, __jump_table, and __ex_table, which can add alternative execution paths to a given instruction (or set of instructions). Similarly, it knows how to follow switch statements, for which gcc sometimes uses jump tables." When this new kernel option is enabled (it's disabled by default), the tool, if it finds any suspicious assembly code pattern, outputs warnings in compiler warning format: warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup warning: objtool:__schedule()+0x3c0: duplicate frame pointer save warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer ... so that scripts that pick up compiler warnings will notice them. All known warnings triggered by the tool are fixed by the tree, most of the commits in fact prepare the kernel to be warning-free. Most of them are bugfixes or cleanups that stand on their own, but there are also some annotations of 'special' stack frames for justified cases such entries to JIT-ed code (BPF) or really special boot time code. There are two other long-term motivations behind this tool as well: - To improve the quality and reliability of kernel stack frames, so that they can be used for optimized live patching. - To create independent infrastructure to check the correctness of CFI stack frames at build time. CFI debuginfo is notoriously unreliable and we cannot use it in the kernel as-is without extra checking done both on the kernel side and on the build side. The quality of kernel stack frames matters to debuggability as well, so IMO we can merge this without having to consider the live patching or CFI debuginfo angle" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) objtool: Only print one warning per function objtool: Add several performance improvements tools: Copy hashtable.h into tools directory objtool: Fix false positive warnings for functions with multiple switch statements objtool: Rename some variables and functions objtool: Remove superflous INIT_LIST_HEAD objtool: Add helper macros for traversing instructions objtool: Fix false positive warnings related to sibling calls objtool: Compile with debugging symbols objtool: Detect infinite recursion objtool: Prevent infinite recursion in noreturn detection objtool: Detect and warn if libelf is missing and don't break the build tools: Support relative directory path for 'O=' objtool: Support CROSS_COMPILE x86/asm/decoder: Use explicitly signed chars objtool: Enable stack metadata validation on 64-bit x86 objtool: Add CONFIG_STACK_VALIDATION option objtool: Add tool to perform compile-time stack metadata validation x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard sched: Always inline context_switch() ...
2016-02-24x86: Fix misspellings in commentsAdam Buchbinder
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24x86/amd: Set ELF function type for vide()Josh Poimboeuf
vide() is a callable function, but is missing the ELF function type, which confuses tools like stacktool. Properly annotate it to be a callable function. The generated code is unchanged. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/a324095f5c9390ff39b15b4562ea1bbeda1a8282.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-03x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)Chen Yucong
- Use the more current logging style pr_<level>(...) instead of the old printk(KERN_<LEVEL> ...). - Convert pr_warning() to pr_warn(). Signed-off-by: Chen Yucong <slaoub@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454384702-21707-1-git-send-email-slaoub@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-14x86/cpu/amd: Remove an unneeded condition in srat_detect_node()Dan Carpenter
Originally we calculated ht_nodeid as "ht_nodeid = apicid - boot_cpu_id;" so presumably it could be negative. But after commit: 01aaea1afbcd ('x86: introduce initial apicid') we use c->initial_apicid which is an unsigned short and thus always >= 0. It causes a static checker warning to test for impossible conditions so let's remove it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Link: http://lkml.kernel.org/r/20160113123940.GE19993@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-11Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "The main changes in this cycle were: - Improved CPU ID handling code and related enhancements (Borislav Petkov) - RDRAND fix (Len Brown)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Replace RDRAND forced-reseed with simple sanity check x86/MSR: Chop off lower 32-bit value x86/cpu: Fix MSR value truncation issue x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR kvm: Add accessors for guest CPU's family, model, stepping x86/cpu: Unify CPU family, model, stepping calculation
2015-12-19x86/cpufeature: Remove unused and seldomly used cpu_has_xx macrosBorislav Petkov
Those are stupid and code should use static_cpu_has_safe() or boot_cpu_has() instead. Kill the least used and unused ones. The remaining ones need more careful inspection before a conversion can happen. On the TODO. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de Cc: David Sterba <dsterba@suse.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matt Mackall <mpm@selenic.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-24x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSRBorislav Petkov
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it checks whether the way access filter is enabled on some F15h models, and, if so, disables it. kvm doesn't handle that MSR access and complains about it, which can get really noisy in dmesg when one starts kvm guests all the time for testing. And it is useless anyway - guest kernel shouldn't be doing such changes anyway so tell it that that filter is disabled. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-07x86/AMD: Fix last level cache topology for AMD Fam17h systemsAravind Gopalakrishnan
On AMD Fam17h systems, the last level cache is not resident in the northbridge. Therefore, we cannot assign cpu_llc_id to the same value as Node ID as we have been doing until now. We should rather look at the ApicID bits of the core to provide us the last level cache ID info. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jacob Shin <jacob.w.shin@gmail.com> Link: http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-22x86/asm/delay: Introduce an MWAITX-based delay with a configurable timerHuang Rui
MWAITX can enable a timer and a corresponding timer value specified in SW P0 clocks. The SW P0 frequency is the same as TSC. The timer provides an upper bound on how long the instruction waits before exiting. This way, a delay function in the kernel can leverage that MWAITX timer of MWAITX. When a CPU core executes MWAITX, it will be quiesced in a waiting phase, diminishing its power consumption. This way, we can save power in comparison to our default TSC-based delays. A simple test shows that: $ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc $ sleep 10000s $ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc Results: * TSC-based default delay: 485115 uWatts average power * MWAITX-based delay: 252738 uWatts average power Thus, that's about 240 milliWatts less power consumption. The test method relies on the support of AMD CPU accumulated power algorithm in fam15h_power for which patches are forthcoming. Suggested-by: Andy Lutomirski <luto@amacapital.net> Suggested-by: Borislav Petkov <bp@suse.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Huang Rui <ray.huang@amd.com> [ Fix delay truncation. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andreas Herrmann <herrmann.der.user@gmail.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Li <tony.li@amd.com> Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/asm/tsc: Rename native_read_tsc() to rdtsc()Andy Lutomirski
Now that there is no paravirt TSC, the "native" is inappropriate. The function does RDTSC, so give it the obvious name: rdtsc(). Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org [ Ported it to v4.2-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/asm/tsc, x86/cpu/amd: Use the full 64-bit TSC to detect the 2.6.2 bugAndy Lutomirski
This code is timing 100k indirect calls, so the added overhead of counting the number of cycles elapsed as a 64-bit number should be insignificant. Drop the optimization of using a 32-bit count. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/d58f339a9c0dd8352b50d2f7a216f67ec2844f20.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-22Merge branch 'x86-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: "There were so many changes in the x86/asm, x86/apic and x86/mm topics in this cycle that the topical separation of -tip broke down somewhat - so the result is a more traditional architecture pull request, collected into the 'x86/core' topic. The topics were still maintained separately as far as possible, so bisectability and conceptual separation should still be pretty good - but there were a handful of merge points to avoid excessive dependencies (and conflicts) that would have been poorly tested in the end. The next cycle will hopefully be much more quiet (or at least will have fewer dependencies). The main changes in this cycle were: * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas Gleixner) - This is the second and most intrusive part of changes to the x86 interrupt handling - full conversion to hierarchical interrupt domains: [IOAPIC domain] ----- | [MSI domain] --------[Remapping domain] ----- [ Vector domain ] | (optional) | [HPET MSI domain] ----- | | [DMAR domain] ----------------------------- | [Legacy domain] ----------------------------- This now reflects the actual hardware and allowed us to distangle the domain specific code from the underlying parent domain, which can be optional in the case of interrupt remapping. It's a clear separation of functionality and removes quite some duct tape constructs which plugged the remap code between ioapic/msi/hpet and the vector management. - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt injection into guests (Feng Wu) * x86/asm changes: - Tons of cleanups and small speedups, micro-optimizations. This is in preparation to move a good chunk of the low level entry code from assembly to C code (Denys Vlasenko, Andy Lutomirski, Brian Gerst) - Moved all system entry related code to a new home under arch/x86/entry/ (Ingo Molnar) - Removal of the fragile and ugly CFI dwarf debuginfo annotations. Conversion to C will reintroduce many of them - but meanwhile they are only getting in the way, and the upstream kernel does not rely on them (Ingo Molnar) - NOP handling refinements. (Borislav Petkov) * x86/mm changes: - Big PAT and MTRR rework: making the code more robust and preparing to phase out exposing direct MTRR interfaces to drivers - in favor of using PAT driven interfaces (Toshi Kani, Luis R Rodriguez, Borislav Petkov) - New ioremap_wt()/set_memory_wt() interfaces to support Write-Through cached memory mappings. This is especially important for good performance on NVDIMM hardware (Toshi Kani) * x86/ras changes: - Add support for deferred errors on AMD (Aravind Gopalakrishnan) This is an important RAS feature which adds hardware support for poisoned data. That means roughly that the hardware marks data which it has detected as corrupted but wasn't able to correct, as poisoned data and raises an APIC interrupt to signal that in the form of a deferred error. It is the OS's responsibility then to take proper recovery action and thus prolonge system lifetime as far as possible. - Add support for Intel "Local MCE"s: upcoming CPUs will support CPU-local MCE interrupts, as opposed to the traditional system- wide broadcasted MCE interrupts (Ashok Raj) - Misc cleanups (Borislav Petkov) * x86/platform changes: - Intel Atom SoC updates ... and lots of other cleanups, fixlets and other changes - see the shortlog and the Git log for details" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits) x86/hpet: Use proper hpet device number for MSI allocation x86/hpet: Check for irq==0 when allocating hpet MSI interrupts x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail genirq: Prevent crash in irq_move_irq() genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain iommu, x86: Properly handle posted interrupts for IOMMU hotplug iommu, x86: Provide irq_remapping_cap() interface iommu, x86: Setup Posted-Interrupts capability for Intel iommu iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Avoid migrating VT-d posted interrupts iommu, x86: Save the mode (posted or remapped) of an IRTE iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip iommu: dmar: Provide helper to copy shared irte fields iommu: dmar: Extend struct irte for VT-d Posted-Interrupts iommu: Add new member capability to struct irq_remap_ops x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry() ...
2015-06-18x86/cpu/amd: Give access to the number of nodes in a physical packageAravind Gopalakrishnan
Stash the number of nodes in a physical processor package locally and add an accessor to be called by interested parties. The first user is the MCE injection module which uses it to find the node base core in a package for injecting a certain type of errors. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Rewrote the commit message, merged it with the accessor patch and unified naming. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Link: http://lkml.kernel.org/r/1433868317-18417-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86: Kill CONFIG_X86_HTBorislav Petkov
In talking to Aravind recently about making certain AMD topology attributes available to the MCE injection module, it seemed like that CONFIG_X86_HT thing is more or less superfluous. It is def_bool y, depends on SMP and gets enabled in the majority of .configs - distro and otherwise - out there. So let's kill it and make code behind it depend directly on SMP. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Walter <dwalter@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06x86/cpu/amd: Set X86_FEATURE_EXTD_APICID for future processorsAravind Gopalakrishnan
Decision to use a 4-bit mask or 8-bit mask in default_get_apic_id() is controlled by setting capability bit X86_FEATURE_EXTD_APICID. Currently, we detect extended APIC ID support by accessing Link Transaction Control register D18F0x68 in PCI config space. But, not even that is needed as we can safely postulate that future AMD processors will support 8-bit APIC IDs and we can simply set that feature bit on them, without the PCI access. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@linux.intel.com Cc: hecmargi@upv.es Cc: mgorman@suse.de Link: http://lkml.kernel.org/r/1430148351-9013-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-26x86_64, asm: Work around AMD SYSRET SS descriptor attribute issueAndy Lutomirski
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with SS == 0 results in an invalid usermode state in which SS is apparently equal to __USER_DS but causes #SS if used. Work around the issue by setting SS to __KERNEL_DS __switch_to, thus ensuring that SYSRET never happens with SS set to NULL. This was exposed by a recent vDSO cleanup. Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-13Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main changes in this cycle were: - reduce the x86/32 PAE per task PGD allocation overhead from 4K to 0.032k (Fenghua Yu) - early_ioremap/memunmap() usage cleanups (Juergen Gross) - gbpages support cleanups (Luis R Rodriguez) - improve AMD Bulldozer (family 0x15) ASLR I$ aliasing workaround to increase randomization by 3 bits (per bootup) (Hector Marco-Gisbert) - misc fixlets" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Improve AMD Bulldozer ASLR workaround x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion init.h: Clean up the __setup()/early_param() macros x86/mm: Simplify probe_page_size_mask() x86/mm: Further simplify 1 GB kernel linear mappings handling x86/mm: Use early_param_on_off() for direct_gbpages init.h: Add early_param_on_off() x86/mm: Simplify enabling direct_gbpages x86/mm: Use IS_ENABLED() for direct_gbpages x86/mm: Unexport set_memory_ro() and set_memory_rw() x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c x86/mm: Use early_memunmap() instead of early_iounmap() x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes
2015-03-31x86/mm: Improve AMD Bulldozer ASLR workaroundHector Marco-Gisbert
The ASLR implementation needs to special-case AMD F15h processors by clearing out bits [14:12] of the virtual address in order to avoid I$ cross invalidations and thus performance penalty for certain workloads. For details, see: dfb09f9b7ab0 ("x86, amd: Avoid cache aliasing penalties on AMD family 15h") This special case reduces the mmapped file's entropy by 3 bits. The following output is the run on an AMD Opteron 62xx class CPU processor under x86_64 Linux 4.0.0: $ for i in `seq 1 10`; do cat /proc/self/maps | grep "r-xp.*libc" ; done b7588000-b7736000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b7570000-b771e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b75d0000-b777e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b75b0000-b775e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 b7578000-b7726000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6 ... Bits [12:14] are always 0, i.e. the address always ends in 0x8000 or 0x0000. 32-bit systems, as in the example above, are especially sensitive to this issue because 32-bit randomness for VA space is 8 bits (see mmap_rnd()). With the Bulldozer special case, this diminishes to only 32 different slots of mmap virtual addresses. This patch randomizes per boot the three affected bits rather than setting them to zero. Since all the shared pages have the same value at bits [12..14], there is no cache aliasing problems. This value gets generated during system boot and it is thus not known to a potential remote attacker. Therefore, the impact from the Bulldozer workaround gets diminished and ASLR randomness increased. More details at: http://hmarco.org/bugs/AMD-Bulldozer-linux-ASLR-weakness-reducing-mmaped-files-by-eight.html Original white paper by AMD dealing with the issue: http://developer.amd.com/wordpress/media/2012/10/SharedL1InstructionCacheonAMD15hCPU.pdf Mentored-by: Ismael Ripoll <iripoll@disca.upv.es> Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan-Simon <dl9pf@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Link: http://lkml.kernel.org/r/1427456301-3764-1-git-send-email-hecmargi@upv.es Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-23x86/asm: Cleanup prefetch primitivesBorislav Petkov
This is based on a patch originally by hpa. With the current improvements to the alternatives, we can simply use %P1 as a mem8 operand constraint and rely on the toolchain to generate the proper instruction sizes. For example, on 32-bit, where we use an empty old instruction we get: apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4) c104648b: alt_insn: 90 90 90 90 c195566c: rpl_insn: 0f 0d 4b 5c ... apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3) c18e09b4: alt_insn: 90 90 90 c1955948: rpl_insn: 0f 0d 08 ... apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7) c1190cf9: alt_insn: 90 90 90 90 90 90 90 c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1 all with the proper padding done depending on the size of the replacement instruction the compiler generates. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com>
2015-01-28Merge branch 'perf/hw_breakpoints' into perf/coreIngo Molnar
The new hw_breakpoint bits are now ready for v3.20, merge them into the main branch, to avoid conflicts. Conflicts: tools/perf/Documentation/perf-record.txt Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-03perf/x86/amd: AMD support for bp_len > HW_BREAKPOINT_LEN_8Jacob Shin
Implement hardware breakpoint address mask for AMD Family 16h and above processors. CPUID feature bit indicates hardware support for DRn_ADDR_MASK MSRs. These masks further qualify DRn/DR7 hardware breakpoint addresses to allow matching of larger addresses ranges. Valuable advice and pseudo code from Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jacob Shin <jacob.w.shin@gmail.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiakaixu <xiakaixu@huawei.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-11-11x86, CPU, AMD: Move K8 TLB flush filter workaround to K8 codeBorislav Petkov
This belongs with the rest of the code in init_amd_k8() which gets executed on family 0xf. Signed-off-by: Borislav Petkov <bp@suse.de>
2014-09-24x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-onlyPaolo Bonzini
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. In that case, KVM will fail to patch VMCALL instructions to VMMCALL as required on AMD processors. The failure mode is currently a divide-by-zero exception, which obviously is a KVM bug that has to be fixed. However, picking the right instruction between VMCALL and VMMCALL will be faster and will help if you cannot upgrade the hypervisor. Reported-by: Chris Webb <chris@arachsys.com> Tested-by: Chris Webb <chris@arachsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main change in this cycle is the rework of the TLB range flushing code, to simplify, fix and consolidate the code. By Dave Hansen" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Set TLB flush tunable to sane value (33) x86/mm: New tunable for single vs full TLB flush x86/mm: Add tracepoints for TLB flushes x86/mm: Unify remote INVLPG code x86/mm: Fix missed global TLB flush stat x86/mm: Rip out complicated, out-of-date, buggy TLB flushing x86/mm: Clean up the TLB flushing code x86/smep: Be more informative when signalling an SMEP fault
2014-07-31x86/mm: Rip out complicated, out-of-date, buggy TLB flushingDave Hansen
I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly be using some measure of RSS, *NOT* ->total_vm since only resident memory can populate the TLB. 2. Haswell, and several other CPUs are missing from the intel_tlb_flushall_shift_set() function. Thus, it has been demonstrated to bitrot quickly in practice. 3. It is plain wrong in my vm: [ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] tlb_flushall_shift: 6 Which leads to it to never use invlpg. 4. The assumptions about TLB refill costs are wrong: http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com (more on this in later patches) 5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59 I believe the sample times were too short. Running the benchmark in a loop yields times that vary quite a bit. Note that this leaves us with a static ceiling of 1 page. This is a conservative, dumb setting, and will be revised in a later patch. This also removes the code which attempts to predict whether we are flushing data or instructions. We expect instruction flushes to be relatively rare and not worth tuning for explicitly. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-14x86, amd: Cleanup init_amdBorislav Petkov
Distribute family-specific code to corresponding functions. Also, * move the direct mapping splitting around the TSEG SMM area to bsp_init_amd(). * kill ancient comment about what we should do for K5. * merge amd_k7_smp_check() into its only caller init_amd_k7 and drop cpu_has_mp macro. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1403609105-8332-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-18x86, cpufeature: Convert more "features" to bugsBorislav Petkov
X86_FEATURE_FXSAVE_LEAK, X86_FEATURE_11AP and X86_FEATURE_CLFLUSH_MONITOR are not really features but synthetic bits we use for applying different bug workarounds. Call them what they really are, and make sure they get the proper cross-CPU behavior (OR rather than AND). Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1403042783-23278-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>