summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2015-09-18x86, kvm: fix kvm's usage of kernel_fpu_begin/end()Suresh Siddha
commit b1a74bf8212367be2b1d6685c11a84e056eaaaf1 upstream. Preemption is disabled between kernel_fpu_begin/end() and as such it is not a good idea to use these routines in kvm_load/put_guest_fpu() which can be very far apart. kvm_load/put_guest_fpu() routines are already called with preemption disabled and KVM already uses the preempt notifier to save the guest fpu state using kvm_put_guest_fpu(). So introduce __kernel_fpu_begin/end() routines which don't touch preemption and use them instead of kernel_fpu_begin/end() for KVM's use model of saving/restoring guest FPU state. Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit state in the case of VMX. For eagerFPU case, host cr0.TS is always clear. So no need to worry about it. For the traditional lazyFPU restore case, change the cr0.TS bit for the host state during vm-exit to be always clear and cr0.TS bit is set in the __vmx_load_host_state() when the FPU (guest FPU or the host task's FPU) state is not active. This ensures that the host/guest FPU state is properly saved, restored during context-switch and with interrupts (using irq_fpu_usable()) not stomping on the active FPU state. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com Cc: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Zefan Li <lizefan@huawei.com> [xr: Backported to 3.4: Adjust context] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu()Suresh Siddha
commit 9c1c3fac53378c9782c18f80107965578d7b7167 upstream. kvm's guest fpu save/restore should be wrapped around kernel_fpu_begin/end(). This will avoid for example taking a DNA in kvm_load_guest_fpu() when it tries to load the fpu immediately after doing unlazy_fpu() on the host side. More importantly this will prevent the host process fpu from being corrupted. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-4-git-send-email-suresh.b.siddha@intel.com Cc: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18MIPS: Fix enabling of DEBUG_STACKOVERFLOWJames Hogan
commit 5f35b9cd553fd64415b563497d05a563c988dbd6 upstream. Commit 334c86c494b9 ("MIPS: IRQ: Add stackoverflow detection") added kernel stack overflow detection, however it only enabled it conditional upon the preprocessor definition DEBUG_STACKOVERFLOW, which is never actually defined. The Kconfig option is called DEBUG_STACKOVERFLOW, which manifests to the preprocessor as CONFIG_DEBUG_STACKOVERFLOW, so switch it to using that definition instead. Fixes: 334c86c494b9 ("MIPS: IRQ: Add stackoverflow detection") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Adam Jiang <jiang.adam@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/10531/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18x86: bpf_jit: fix compilation of large bpf programsAlexei Starovoitov
commit 3f7352bf21f8fd7ba3e2fcef9488756f188e12be upstream. x86 has variable length encoding. x86 JIT compiler is trying to pick the shortest encoding for given bpf instruction. While doing so the jump targets are changing, so JIT is doing multiple passes over the program. Typical program needs 3 passes. Some very short programs converge with 2 passes. Large programs may need 4 or 5. But specially crafted bpf programs may hit the pass limit and if the program converges on the last iteration the JIT compiler will be producing an image full of 'int 3' insns. Fix this corner case by doing final iteration over bpf program. Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18x86/mce: Fix MCE severity messagesBorislav Petkov
commit 17fea54bf0ab34fa09a06bbde2f58ed7bbdf9299 upstream. Derek noticed that a critical MCE gets reported with the wrong error type description: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0 [Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170} [Hardware Error]: TSC 49587b8e321cb [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29 [Hardware Error]: Some CPUs didn't answer in synchronization [Hardware Error]: Machine check: Invalid ^^^^^^^ The last line with 'Invalid' should have printed the high level MCE error type description we get from mce_severity, i.e. something like: [Hardware Error]: Machine check: Action required: data load error in a user process this happens due to the fact that mce_no_way_out() iterates over all MCA banks and possibly overwrites the @msg argument which is used in the panic printing later. Change behavior to take the message of only and the (last) critical MCE it detects. Reported-by: Derek <denc716@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18ARM: dts: imx27: only map 4 Kbyte for fec registersPhilippe Reynes
commit a29ef819f3f34f89a1b9b6a939b4c1cdfe1e85ce upstream. According to the imx27 documentation, fec has a 4 Kbyte memory space map. Moreover, the actual 16 Kbyte mapping overlaps the SCC (Security Controller) memory register space. So, we reduce the memory register space to 4 Kbyte. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 9f0749e3eb88 ("ARM i.MX27: Add devicetree support") Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18crypto: s390/ghash - Fix incorrect ghash icv buffer handling.Harald Freudenberger
commit a1cae34e23b1293eccbcc8ee9b39298039c3952a upstream. Multitheaded tests showed that the icv buffer in the current ghash implementation is not handled correctly. A move of this working ghash buffer value to the descriptor context fixed this. Code is tested and verified with an multithreaded application via af_alg interface. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Gerald Schaefer <geraldsc@linux.vnet.ibm.com> Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [lizf: Backported to 3.4: - adjust context - drop the change to memcpy()] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18powerpc: Align TOC to 256 bytesAnton Blanchard
commit 5e95235ccd5442d4a4fe11ec4eb99ba1b7959368 upstream. Recent toolchains force the TOC to be 256 byte aligned. We need to enforce this alignment in our linker script, otherwise pointers to our TOC variables (__toc_start, __prom_init_toc_start) could be incorrect. If they are bad, we die a few hundred instructions into boot. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pagesPaolo Bonzini
commit 898761158be7682082955e3efa4ad24725305fc7 upstream. smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages should not be reused for different values of it. Thus, it has to be added to the mask in kvm_mmu_pte_write. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18ARM: net: delegate filter to kernel interpreter when imm_offset() return ↵Nicolas Schichan
value can't fit into 12bits. commit 0b59d8806a31bb0267b3a461e8fef20c727bdbf6 upstream. The ARM JIT code emits "ldr rX, [pc, #offset]" to access the literal pool. #offset maximum value is 4095 and if the generated code is too large, the #offset value can overflow and not point to the expected slot in the literal pool. Additionally, when overflow occurs, bits of the overflow can end up changing the destination register of the ldr instruction. Fix that by detecting the overflow in imm_offset() and setting a flag that is checked for each BPF instructions converted in build_body(). As of now it can only be detected in the second pass. As a result the second build_body() call can now fail, so add the corresponding cleanup code in that case. Using multiple literal pools in the JITed code is going to require lots of intrusive changes to the JIT code (which would better be done as a feature instead of fix), just delegating to the kernel BPF interpreter in that case is a more straight forward, minimal fix and easy to backport. Fixes: ddecdfcea0ae ("ARM: 7259/3: net: JIT compiler for packet filters") Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18ARM: pxa: lubbock: use new pxa_cplds driverRobert Jarzmik
commit fc9e38c0f4d38bfc68b405cf48365d65f7b6319e upstream. As the interrupt handling was transferred to the pxa_cplds driver, make the switch in lubbock platform code. Fixes: 157d2644cb0c ("ARM: pxa: change gpio to platform device") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18ARM: pxa: mainstone: use new pxa_cplds driverRobert Jarzmik
commit 277688639f98a9e34a6f109f9cd6129f92e718c1 upstream. As the interrupt handling was transferred to the pxa_cplds driver, make the switch in mainstone platform code. Fixes: 157d2644cb0c ("ARM: pxa: change gpio to platform device") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18ARM: pxa: pxa_cplds: add lubbock and mainstone IORobert Jarzmik
commit aa8d6b73ea33c2167c543663ab66039ec94d58f1 upstream. Historically, this support was in arch/arm/mach-pxa/lubbock.c and arch/arm/mach-pxa/mainstone.c. When gpio-pxa was moved to drivers/pxa, it became a driver, and its initialization and probing happened at postcore initcall. The lubbock code used to install the chained lubbock interrupt handler at init_irq() time. The consequence of the gpio-pxa change is that the installed chained irq handler lubbock_irq_handler() was overwritten in pxa_gpio_probe(_dt)(), removing : - the handler - the falling edge detection setting of GPIO0, which revealed the interrupt request from the lubbock IO board. As a fix, move the gpio0 chained handler setup to a place where we have the guarantee that pxa_gpio_probe() was called before, so that lubbock handler becomes the true IRQ chained handler of GPIO0, demuxing the lubbock IO board interrupts. This patch moves all that handling to a mfd driver. It's only purpose for the time being is the interrupt handling, but in the future it should encompass all the motherboard CPLDs handling : - leds - switches - hexleds The same logic applies to mainstone board. Fixes: 157d2644cb0c ("ARM: pxa: change gpio to platform device") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18powerpc/pseries: Correct cpu affinity for dlpar added cpusNathan Fontenot
commit f32393c943e297b8ae180c8f83d81a156c7d0412 upstream. The incorrect ordering of operations during cpu dlpar add results in invalid affinity for the cpu being added. The ibm,associativity property in the device tree is populated with all zeroes for the added cpu which results in invalid affinity mappings and all cpus appear to belong to node 0. This occurs because rtas configure-connector is called prior to making the rtas set-indicator calls. Phyp does not assign affinity information for a cpu until the rtas set-indicator calls are made to set the isolation and allocation state. Correct the order of operations to make the rtas set-indicator calls (done in dlpar_acquire_drc) before calling rtas configure-connector. Fixes: 1a8061c46c46 ("powerpc/pseries: Add kernel based CPU DLPAR handling") Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [lizf: Backported to 3.4: - adjust context - jump to the "out" lable instead of returning -EINVAL] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTHAnton Blanchard
commit 9a5cbce421a283e6aea3c4007f141735bf9da8c3 upstream. We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH (currently 127), but we forgot to do the same for 64bit backtraces. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18s390/hibernate: fix save and restore of kernel text sectionHeiko Carstens
commit d74419495633493c9cd3f2bbeb7f3529d0edded6 upstream. Sebastian reported a crash caused by a jump label mismatch after resume. This happens because we do not save the kernel text section during suspend and therefore also do not restore it during resume, but use the kernel image that restores the old system. This means that after a suspend/resume cycle we lost all modifications done to the kernel text section. The reason for this is the pfn_is_nosave() function, which incorrectly returns that read-only pages don't need to be saved. This is incorrect since we mark the kernel text section read-only. We still need to make sure to not save and restore pages contained within NSS and DCSS segment. To fix this add an extra case for the kernel text section and only save those pages if they are not contained within an NSS segment. Fixes the following crash (and the above bugs as well): Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0 Found: c0 04 00 00 00 00 Expected: c0 f4 00 00 00 11 New: c0 04 00 00 00 00 Kernel panic - not syncing: Corrupted kernel text CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4 Call Trace: [<0000000000113972>] show_stack+0x72/0xf0 [<000000000081f15e>] dump_stack+0x6e/0x90 [<000000000081c4e8>] panic+0x108/0x2b0 [<000000000081be64>] jump_label_bug.isra.2+0x104/0x108 [<0000000000112176>] __jump_label_transform+0x9e/0xd0 [<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50 [<00000000001d1136>] multi_cpu_stop+0x12e/0x170 [<00000000001d1472>] cpu_stopper_thread+0xb2/0x168 [<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0 [<0000000000158baa>] kthread+0x10a/0x110 [<0000000000824a86>] kernel_thread_starter+0x6/0xc Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [lizf: Backported to 3.4: add necessary includes] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18powerpc: Fix missing L2 cache size in /sys/devices/system/cpuDave Olson
commit f7e9e358362557c3aa2c1ec47490f29fe880a09e upstream. This problem appears to have been introduced in 2.6.29 by commit 93197a36a9c1 "Rewrite sysfs processor cache info code". This caused lscpu to error out on at least e500v2 devices, eg: error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory Some embedded powerpc systems use cache-size in DTS for the unified L2 cache size, not d-cache-size, so we need to allow for both DTS names. Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle this. Fixes: 93197a36a9c1 ("powerpc: Rewrite sysfs processor cache info code") Signed-off-by: Dave Olson <olson@cumulusnetworks.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18MIPS: Hibernate: flush TLB entries earlierHuacai Chen
commit a843d00d038b11267279e3b5388222320f9ddc1d upstream. We found that TLB mismatch not only happens after kernel resume, but also happens during snapshot restore. So move it to the beginning of swsusp_arch_suspend(). Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/9621/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18x86/iommu: Fix header comments regarding standard and _FINISH macrosAravind Gopalakrishnan
commit b44915927ca88084a7292e4ddd4cf91036f365e1 upstream. The comment line regarding IOMMU_INIT and IOMMU_INIT_FINISH macros is incorrect: "The standard vs the _FINISH differs in that the _FINISH variant will continue detecting other IOMMUs in the call list..." It should be "..the *standard* variant will continue detecting..." Fix that. Also, make it readable while at it. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: konrad.wilk@oracle.com Fixes: 6e9636693373 ("x86, iommu: Update header comments with appropriate naming") Link: http://lkml.kernel.org/r/1428508017-5316-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18ARM: 8320/1: fix integer overflow in ELF_ET_DYN_BASEAndrey Ryabinin
commit 8defb3367fcd19d1af64c07792aade0747b54e0f upstream. Usually ELF_ET_DYN_BASE is 2/3 of TASK_SIZE. With 3G/1G user/kernel split this is not so, because 2*TASK_SIZE overflows 32 bits, so the actual value of ELF_ET_DYN_BASE is: (2 * TASK_SIZE / 3) = 0x2a000000 When ASLR is disabled PIE binaries will load at ELF_ET_DYN_BASE address. On 32bit platforms AddressSanitzer uses addresses [0x20000000 - 0x40000000] for shadow memory [1]. So ASan doesn't work for PIE binaries when ASLR disabled as it fails to map shadow memory. Also after Kees's 'split ET_DYN ASLR from mmap ASLR' patchset PIE binaries has a high chance of loading somewhere in between [0x2a000000 - 0x40000000] even if ASLR enabled. This makes ASan with PIE absolutely incompatible. Fix overflow by dividing TASK_SIZE prior to multiplying. After this patch ELF_ET_DYN_BASE equals to (for CONFIG_VMSPLIT_3G=y): (TASK_SIZE / 3 * 2) = 0x7f555554 [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm#Mapping Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reported-by: Maria Guseva <m.guseva@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18C6x: time: Ensure consistency in __initNishanth Menon
commit f4831605f2dacd12730fe73961c77253cc2ea425 upstream. time_init invokes timer64_init (which is __init annotation) since all of these are invoked at init time, lets maintain consistency by ensuring time_init is marked appropriately as well. This fixes the following warning with CONFIG_DEBUG_SECTION_MISMATCH=y WARNING: vmlinux.o(.text+0x3bfc): Section mismatch in reference from the function time_init() to the function .init.text:timer64_init() The function time_init() references the function __init timer64_init(). This is often because time_init lacks a __init annotation or the annotation of timer64_init is wrong. Fixes: 546a39546c64 ("C6X: time management") Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-09-18KVM: s390: Zero out current VMDB of STSI before including level3 data.Ekaterina Tumanova
commit b75f4c9afac2604feb971441116c07a24ecca1ec upstream. s390 documentation requires words 0 and 10-15 to be reserved and stored as zeros. As we fill out all other fields, we can memset the full structure. Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_devYinghai Lu
commit fc2798502f860b18f3c7121e4dc659d3d9d28d74 upstream. These interfaces: pcibios_resource_to_bus(struct pci_dev *dev, *bus_region, *resource) pcibios_bus_to_resource(struct pci_dev *dev, *resource, *bus_region) took a pci_dev, but they really depend only on the pci_bus. And we want to use them in resource allocation paths where we have the bus but not a device, so this patch converts them to take the pci_bus instead of the pci_dev: pcibios_resource_to_bus(struct pci_bus *bus, *bus_region, *resource) pcibios_bus_to_resource(struct pci_bus *bus, *resource, *bus_region) In fact, with standard PCI-PCI bridges, they only depend on the host bridge, because that's the only place address translation occurs, but we aren't going that far yet. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Dirk Behme <dirk.behme@gmail.com> [lizf: Backported to 3.4: - make changes to pci_host_bridge() instead of find_pci_root_bus() - adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selectedKonrad Rzeszutek Wilk
commit a6dfa128ce5c414ab46b1d690f7a1b8decb8526d upstream. A huge amount of NIC drivers use the DMA API, however if compiled under 32-bit an very important part of the DMA API can be ommitted leading to the drivers not working at all (especially if used with 'swiotlb=force iommu=soft'). As Prashant Sreedharan explains it: "the driver [tg3] uses DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the dma "mapping" and dma_unmap_addr() to get the "mapping" value. On most of the platforms this is a no-op, but ... with "iommu=soft and swiotlb=force" this house keeping is required, ... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_ instead of the DMA address." As such enable this even when using 32-bit kernels. Reported-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Chan <mchan@broadcom.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: cascardo@linux.vnet.ibm.com Cc: david.vrabel@citrix.com Cc: sanjeevb@broadcom.com Cc: siva.kallam@broadcom.com Cc: vyasevich@gmail.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/20150417190448.GA9462@l.oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19powerpc/mpc85xx: Add ranges to etsec2 nodesScott Wood
commit bb344ca5b90df62b1a3b7a35c6a9d00b306a170d upstream. Commit 746c9e9f92dd "of/base: Fix PowerPC address parsing hack" limited the applicability of the workaround whereby a missing ranges is treated as an empty ranges. This workaround was hiding a bug in the etsec2 device tree nodes, which have children with reg, but did not have ranges. Signed-off-by: Scott Wood <scottwood@freescale.com> Reported-by: Alexander Graf <agraf@suse.de> Cc: Scott Wood <scottwood@freescale.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19x86/reboot: Fix a warning message triggered by stop_other_cpus()Feng Tang
commit 55c844a4dd16a4d1fdc0cf2a283ec631a02ec448 upstream. When rebooting our 24 CPU Westmere servers with 3.4-rc6, we always see this warning msg: Restarting system. machine restart ------------[ cut here ]------------ WARNING: at arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN Modules linked in: igb [last unloaded: scsi_wait_scan] Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22 Call Trace: <IRQ> [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96 [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17 [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7 [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6 [<ffffffff81050112>] scheduler_tick+0xe0/0xe9 [<ffffffff81036768>] update_process_times+0x60/0x70 [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92 [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0 [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198 [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94 [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70 <EOI> [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4 [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14 [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6 [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67 [<ffffffff81018926>] machine_shutdown+0xa/0xc [<ffffffff8101897e>] native_machine_restart+0x20/0x32 [<ffffffff810189b0>] machine_restart+0xa/0xc [<ffffffff8103b196>] kernel_restart+0x47/0x4c [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12 [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8 [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7 [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7 [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b ---[ end trace 320af5cb1cb60c5b ]--- The root cause seems to be the default_send_IPI_mask_allbutself_phys() takes quite some time (I measured it could be several ms) to complete sending NMIs to all the other 23 CPUs, and for HZ=250/1000 system, the time is long enough for a timer interrupt to happen, which will in turn trigger to kick load balance to a stopped CPU and cause this warning in native_smp_send_reschedule(). So disabling the local irq before stop_other_cpu() can fix this problem (tested 25 times reboot ok), and it is fine as there should be nobody caring the timer interrupt in such reboot stage. The latest 3.4 kernel slightly changes this behavior by sending REBOOT_VECTOR first and only send NMI_VECTOR if the REBOOT_VCTOR fails, and this patch is still needed to prevent the problem. Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7 Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Vinson Lee <vlee@twopensource.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirkStefan Lippers-Hollmann
commit 80313b3078fcd2ca51970880d90757f05879a193 upstream. The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in both BIOS and UEFI mode while rebooting unless reboot=pci is used. Add a quirk to reboot via the pci method. The problem is very intermittent and hard to debug, it might succeed rebooting just fine 40 times in a row - but fails half a dozen times the next day. It seems to be slightly less common in BIOS CSM mode than native UEFI (with the CSM disabled), but it does happen in either mode. Since I've started testing this patch in late january, rebooting has been 100% reliable. Most of the time it already hangs during POST, but occasionally it might even make it through the bootloader and the kernel might even start booting, but then hangs before the mode switch. The same symptoms occur with grub-efi, gummiboot and grub-pc, just as well as (at least) kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16). Upgrading to the most current mainboard firmware of the ASRock Q1900DC-ITX, version 1.20, does not improve the situation. ( Searching the web seems to suggest that other Bay Trail-D mainboards might be affected as well. ) -- Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir Signed-off-by: Ingo Molnar <mingo@kernel.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19crypto: aesni - fix memory usage in GCM decryptionStephan Mueller
commit ccfe8c3f7e52ae83155cb038753f4c75b774ca8a upstream. The kernel crypto API logic requires the caller to provide the length of (ciphertext || authentication tag) as cryptlen for the AEAD decryption operation. Thus, the cipher implementation must calculate the size of the plaintext output itself and cannot simply use cryptlen. The RFC4106 GCM decryption operation tries to overwrite cryptlen memory in req->dst. As the destination buffer for decryption only needs to hold the plaintext memory but cryptlen references the input buffer holding (ciphertext || authentication tag), the assumption of the destination buffer length in RFC4106 GCM operation leads to a too large size. This patch simply uses the already calculated plaintext size. In addition, this patch fixes the offset calculation of the AAD buffer pointer: as mentioned before, cryptlen already includes the size of the tag. Thus, the tag does not need to be added. With the addition, the AAD will be written beyond the already allocated buffer. Note, this fixes a kernel crash that can be triggered from user space via AF_ALG(aead) -- simply use the libkcapi test application from [1] and update it to use rfc4106-gcm-aes. Using [1], the changes were tested using CAVS vectors to demonstrate that the crypto operation still delivers the right results. [1] http://www.chronox.de/libkcapi.html CC: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19x86/asm/entry/32: Fix user_mode() misusesAndy Lutomirski
commit 394838c96013ba414a24ffe7a2a593a9154daadf upstream. The one in do_debug() is probably harmless, but better safe than sorry. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d67deaa9df5458363623001f252d1aee3215d014.1425948056.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [lizf: Backported to 3.4: drop the change to do_bounds()] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19x86/vdso: Fix the build on GCC5Jiri Slaby
commit e893286918d2cde3a94850d8f7101cd1039e0c62 upstream. On gcc5 the kernel does not link: ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670. Because prior GCC versions always emitted NOPs on ALIGN directives, but gcc5 started omitting them. .LSTARTFDEDLSI1 says: /* HACK: The dwarf2 unwind routines will subtract 1 from the return address to get an address in the middle of the presumed call instruction. Since we didn't get here via a call, we need to include the nop before the real start to make up for it. */ .long .LSTART_sigreturn-1-. /* PC-relative start address */ But commit 69d0627a7f6e ("x86 vDSO: reorder vdso32 code") from 2.6.25 replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before __kernel_sigreturn. Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN". So fix this by adding to that point at least a single NOP and make the function ALIGN possibly with more NOPs then. Kudos for reporting and diagnosing should go to Richard. Reported-by: Richard Biener <rguenther@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19ARM: at91: pm: fix at91rm9200 standbyAlexandre Belloni
commit 84e871660bebfddb9a62ebd6f19d02536e782f0a upstream. at91rm9200 standby and suspend to ram has been broken since 00482a4078f4. It is wrongly using AT91_BASE_SYS which is a physical address and actually doesn't correspond to any register on at91rm9200. Use the correct at91_ramc_base[0] instead. Fixes: 00482a4078f4 (ARM: at91: implement the standby function for pm/cpuidle) Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimizationAndy Lutomirski
commit 956421fbb74c3a6261903f3836c0740187cf038b upstream. 'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and the related state make sense for 'ret_from_sys_call'. This is entirely the wrong check. TS_COMPAT would make a little more sense, but there's really no point in keeping this optimization at all. This fixes a return to the wrong user CS if we came from int 0x80 in a 64-bit task. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net [ Backported from tip:x86/asm. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19KVM: emulate: fix CMPXCHG8B on 32-bit hostsPaolo Bonzini
commit 4ff6f8e61eb7f96d3ca535c6d240f863ccd6fb7d upstream. This has been broken for a long time: it broke first in 2.6.35, then was almost fixed in 2.6.36 but this one-liner slipped through the cracks. The bug shows up as an infinite loop in Windows 7 (and newer) boot on 32-bit hosts without EPT. Windows uses CMPXCHG8B to write to page tables, which causes a page fault if running without EPT; the emulator is then called from kvm_mmu_page_fault. The loop then happens if the higher 4 bytes are not 0; the common case for this is that the NX bit (bit 63) is 1. Fixes: 6550e1f165f384f3a46b60a1be9aba4bc3c2adad Fixes: 16518d5ada690643453eb0aef3cc7841d3623c2d Reported-by: Erik Rull <erik.rull@rdsoftware.de> Tested-by: Erik Rull <erik.rull@rdsoftware.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19ARM: 8284/1: sa1100: clear RCSR_SMR on resumeDmitry Eremin-Solenikov
commit e461894dc2ce7778ccde1c3483c9b15a85a7fc5f upstream. StrongARM core uses RCSR SMR bit to tell to bootloader that it was reset by entering the sleep mode. After we have resumed, there is little point in having that bit enabled. Moreover, if this bit is set before reboot, the bootloader can become confused. Thus clear the SMR bit on resume just before clearing the scratchpad (resume address) register. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19KVM: s390: base hrtimer on a monotonic clockDavid Hildenbrand
commit 0ac96caf0f9381088c673a16d910b1d329670edf upstream. The hrtimer that handles the wait with enabled timer interrupts should not be disturbed by changes of the host time. This patch changes our hrtimer to be based on a monotonic clock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19axonram: Fix bug in direct_accessMatthew Wilcox
commit 91117a20245b59f70b563523edbf998a62fc6383 upstream. The 'pfn' returned by axonram was completely bogus, and has been since 2008. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19hx4700: regulator: declare full constraintsMartin Vajnar
commit a52d209336f8fc7483a8c7f4a8a7d2a8e1692a6c upstream. Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped working. This patch enables the "replacement" for REGULATOR_DUMMY and allows the touchscreen to work even though there is no regulator for "vcc". Signed-off-by: Martin Vajnar <martin.vajnar@gmail.com> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19ARM: pxa: add regulator_has_full_constraints to spitz board fileDmitry Eremin-Solenikov
commit baad2dc49c5d970ea881d92981a1b76c94a7b7a1 upstream. Add regulator_has_full_constraints() call to spitz board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on spitz if regulators are enabled: ads7846 spi2.0: unable to get regulator: -517 spi spi2.0: Driver ads7846 requests probe deferral Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19ARM: pxa: add regulator_has_full_constraints to poodle board fileDmitry Eremin-Solenikov
commit 9bc78f32c2e430aebf6def965b316aa95e37a20c upstream. Add regulator_has_full_constraints() call to poodle board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on poodle if regulators are enabled: ads7846 spi1.0: unable to get regulator: -517 spi spi1.0: Driver ads7846 requests probe deferral wm8731 0-001b: Failed to get supply 'AVDD': -517 wm8731 0-001b: Failed to request supplies: -517 wm8731 0-001b: ASoC: failed to probe component -517 Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19ARM: pxa: add regulator_has_full_constraints to corgi board fileDmitry Eremin-Solenikov
commit 271e80176aae4e5b481f4bb92df9768c6075bbca upstream. Add regulator_has_full_constraints() call to corgi board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on corgi if regulators are enabled: ads7846 spi1.0: unable to get regulator: -517 spi spi1.0: Driver ads7846 requests probe deferral wm8731 0-001b: Failed to get supply 'AVDD': -517 wm8731 0-001b: Failed to request supplies: -517 wm8731 0-001b: ASoC: failed to probe component -517 corgi-audio corgi-audio: ASoC: failed to instantiate card -517 Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14x86, cpu, amd: Add workaround for family 16h, erratum 793Borislav Petkov
commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream. This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> [bwh: Backported to 3.2: - Adjust filename - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to avoid crashing on KVM. This was fixed upstream by commit 8f86a7373a1c ("x86, AMD: Convert to the new bit access MSR accessors") but that's too much trouble to backport. Here we must use {rd,wr}msrl_amd_safe().] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Moritz Muehlenhoff <jmm@debian.org> Cc: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14MIPS: Fix kernel lockup or crash after CPU offline/onlineHemmo Nieminen
commit c7754e75100ed5e3068ac5085747f2bfc386c8d6 upstream. As printk() invocation can cause e.g. a TLB miss, printk() cannot be called before the exception handlers have been properly initialized. This can happen e.g. when netconsole has been loaded as a kernel module and the TLB table has been cleared when a CPU was offline. Call cpu_report() in start_secondary() only after the exception handlers have been initialized to fix this. Without the patch the kernel will randomly either lockup or crash after a CPU is onlined and the console driver is a module. Signed-off-by: Hemmo Nieminen <hemmo.nieminen@iki.fi> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/8953/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14MIPS: IRQ: Fix disable_irq on CPU IRQsFelix Fietkau
commit a3e6c1eff54878506b2dddcc202df9cc8180facb upstream. If the irq_chip does not define .irq_disable, any call to disable_irq will defer disabling the IRQ until it fires while marked as disabled. This assumes that the handler function checks for this condition, which handle_percpu_irq does not. In this case, calling disable_irq leads to an IRQ storm, if the interrupt fires while disabled. This optimization is only useful when disabling the IRQ is slow, which is not true for the MIPS CPU IRQ. Disable this optimization by implementing .irq_disable and .irq_enable Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8949/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14move d_rcu from overlapping d_child to overlapping d_aliasAl Viro
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [bwh: Backported to 3.2: - Apply name changes in all the different places we use d_alias and d_child - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [lizf: Backported to 3.4: - adjust context - need one more name change in debugfs]
2015-04-14x86, mm/ASLR: Fix stack randomization on 64-bit systemsHector Marco-Gisbert
commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream. The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14vm: add VM_FAULT_SIGSEGV handling supportLinus Torvalds
commit 33692f27597fcab536d7cbbcc8f52905133e4aa7 upstream. The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjust filenames, context - Drop arc, metag, nios2 and lustre changes - For sh, patch both 32-bit and 64-bit implementations to use goto bad_area - For s390, pass int_code and trans_exc_code as arguments to do_no_context() and do_sigsegv()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [lizf: Backported to 3.4: - adjust context in arch/power/mm/fault.c - apply the original change in upstream commit for s390] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14powerpc/xmon: Fix another endiannes issue in RTAS call from xmonLaurent Dufour
commit e6eb2eba494d6f99e69ca3c3748cd37a2544ab38 upstream. The commit 3b8a3c010969 ("powerpc/pseries: Fix endiannes issue in RTAS call from xmon") was fixing an endianness issue in the call made from xmon to RTAS. However, as Michael Ellerman noticed, this fix was not complete, the token value was not byte swapped. This lead to call an unexpected and most of the time unexisting RTAS function, which is silently ignored by RTAS. This fix addresses this hole. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14x86, hyperv: Mark the Hyper-V clocksource as being continuousK. Y. Srinivasan
commit 32c6590d126836a062b3140ed52d898507987017 upstream. The Hyper-V clocksource is continuous; mark it accordingly. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: jasowang@redhat.com Cc: gregkh@linuxfoundation.org Cc: devel@linuxdriverproject.org Cc: olaf@aepfle.de Cc: apw@canonical.com Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracingSteven Rostedt (Red Hat)
commit 237d28db036e411f22c03cfd5b0f6dc2aa9bf3bc upstream. If the function graph tracer traces a jprobe callback, the system will crash. This can easily be demonstrated by compiling the jprobe sample module that is in the kernel tree, loading it and running the function graph tracer. # modprobe jprobe_example.ko # echo function_graph > /sys/kernel/debug/tracing/current_tracer # ls The first two commands end up in a nice crash after the first fork. (do_fork has a jprobe attached to it, so "ls" just triggers that fork) The problem is caused by the jprobe_return() that all jprobe callbacks must end with. The way jprobes works is that the function a jprobe is attached to has a breakpoint placed at the start of it (or it uses ftrace if fentry is supported). The breakpoint handler (or ftrace callback) will copy the stack frame and change the ip address to return to the jprobe handler instead of the function. The jprobe handler must end with jprobe_return() which swaps the stack and does an int3 (breakpoint). This breakpoint handler will then put back the saved stack frame, simulate the instruction at the beginning of the function it added a breakpoint to, and then continue on. For function tracing to work, it hijakes the return address from the stack frame, and replaces it with a hook function that will trace the end of the call. This hook function will restore the return address of the function call. If the function tracer traces the jprobe handler, the hook function for that handler will not be called, and its saved return address will be used for the next function. This will result in a kernel crash. To solve this, pause function tracing before the jprobe handler is called and unpause it before it returns back to the function it probed. Some other updates: Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the code look a bit cleaner and easier to understand (various tries to fix this bug required this change). Note, if fentry is being used, jprobes will change the ip address before the function graph tracer runs and it will not be able to trace the function that the jprobe is probing. Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [lizf: Backported to 3.4: - adjust filename - adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14x86, um: actually mark system call tables readonlyDaniel Borkmann
commit b485342bd79af363c77ef1a421c4a0aef2de9812 upstream. Commit a074335a370e ("x86, um: Mark system call tables readonly") was supposed to mark the sys_call_table in UML as RO by adding the const, but it doesn't have the desired effect as it's nevertheless being placed into the data section since __cacheline_aligned enforces sys_call_table being placed into .data..cacheline_aligned instead. We need to use the ____cacheline_aligned version instead to fix this issue. Before: $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table" U sys_writev 0000000000000000 D sys_call_table 0000000000000000 D syscall_table_size After: $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table" U sys_writev 0000000000000000 R sys_call_table 0000000000000000 D syscall_table_size Fixes: a074335a370e ("x86, um: Mark system call tables readonly") Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Zefan Li <lizefan@huawei.com>