summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2017-07-05swiotlb: ensure that page-sized mappings are page-alignedNikita Yushchenko
[ Upstream commit 602d9858f07c72eab64f5f00e2fae55f9902cfbe ] Some drivers do depend on page mappings to be page aligned. Swiotlb already enforces such alignment for mappings greater than page, extend that to page-sized mappings as well. Without this fix, nvme hits BUG() in nvme_setup_prps(), because that routine assumes page-aligned mappings. Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29lib/cmdline.c: fix get_options() overflow while parsing rangesIlya Matveychikov
commit a91e0f680bcd9e10c253ae8b62462a38bd48f09f upstream. When using get_options() it's possible to specify a range of numbers, like 1-100500. The problem is that it doesn't track array size while calling internally to get_range() which iterates over the range and fills the memory with numbers. Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.com Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24crypto: Work around deallocated stack frame reference gcc bug on sparc.David Miller
commit d41519a69b35b10af7fda867fb9100df24fdf403 upstream. On sparc, if we have an alloca() like situation, as is the case with SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack memory. The result can be that the value is clobbered if a trap or interrupt arrives at just the right instruction. It only occurs if the function ends returning a value from that alloca() area and that value can be placed into the return value register using a single instruction. For example, in lib/libcrc32c.c:crc32c() we end up with a return sequence like: return %i7+8 lduw [%o5+16], %o0 ! MEM[(u32 *)__shash_desc.1_10 + 16B], %o5 holds the base of the on-stack area allocated for the shash descriptor. But the return released the stack frame and the register window. So if an intererupt arrives between 'return' and 'lduw', then the value read at %o5+16 can be corrupted. Add a data compiler barrier to work around this problem. This is exactly what the gcc fix will end up doing as well, and it absolutely should not change the code generated for other cpus (unless gcc on them has the same bug :-) With crucial insight from Eric Sandeen. Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14usercopy: Adjust tests to deal with SMAP/PANKees Cook
commit f5f893c57e37ca730808cb2eee3820abd05e7507 upstream. Under SMAP/PAN/etc, we cannot write directly to userspace memory, so this rearranges the test bytes to get written through copy_to_user(). Additionally drops the bad copy_from_user() test that would trigger a memcpy() against userspace on failure. [arnd: the test module was added in 3.14, and this backported patch should apply cleanly on all version from 3.14 to 4.10. The original patch was in 4.11 on top of a context change I saw the bug triggered with kselftest on a 4.4.y stable kernel] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14bpf, arm64: fix jit branch offset related to ldimm64Daniel Borkmann
[ Upstream commit ddc665a4bb4b728b4e6ecec8db1b64efa9184b9c ] When the instruction right before the branch destination is a 64 bit load immediate, we currently calculate the wrong jump offset in the ctx->offset[] array as we only account one instruction slot for the 64 bit load immediate although it uses two BPF instructions. Fix it up by setting the offset into the right slot after we incremented the index. Before (ldimm64 test 1): [...] 00000020: 52800007 mov w7, #0x0 // #0 00000024: d2800060 mov x0, #0x3 // #3 00000028: d2800041 mov x1, #0x2 // #2 0000002c: eb01001f cmp x0, x1 00000030: 54ffff82 b.cs 0x00000020 00000034: d29fffe7 mov x7, #0xffff // #65535 00000038: f2bfffe7 movk x7, #0xffff, lsl #16 0000003c: f2dfffe7 movk x7, #0xffff, lsl #32 00000040: f2ffffe7 movk x7, #0xffff, lsl #48 00000044: d29dddc7 mov x7, #0xeeee // #61166 00000048: f2bdddc7 movk x7, #0xeeee, lsl #16 0000004c: f2ddddc7 movk x7, #0xeeee, lsl #32 00000050: f2fdddc7 movk x7, #0xeeee, lsl #48 [...] After (ldimm64 test 1): [...] 00000020: 52800007 mov w7, #0x0 // #0 00000024: d2800060 mov x0, #0x3 // #3 00000028: d2800041 mov x1, #0x2 // #2 0000002c: eb01001f cmp x0, x1 00000030: 540000a2 b.cs 0x00000044 00000034: d29fffe7 mov x7, #0xffff // #65535 00000038: f2bfffe7 movk x7, #0xffff, lsl #16 0000003c: f2dfffe7 movk x7, #0xffff, lsl #32 00000040: f2ffffe7 movk x7, #0xffff, lsl #48 00000044: d29dddc7 mov x7, #0xeeee // #61166 00000048: f2bdddc7 movk x7, #0xeeee, lsl #16 0000004c: f2ddddc7 movk x7, #0xeeee, lsl #32 00000050: f2fdddc7 movk x7, #0xeeee, lsl #48 [...] Also, add a couple of test cases to make sure JITs pass this test. Tested on Cavium ThunderX ARMv8. The added test cases all pass after the fix. Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()") Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21new privimitive: iov_iter_revert()Al Viro
commit 27c0e3748e41ca79171ffa3e97415a20af6facd0 upstream. opposite to iov_iter_advance(); the caller is responsible for never using it to move back past the initial position. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08lib/syscall: Clear return values when no stackKees Cook
commit 854fbd6e5f60fe99e8e3a569865409fca378f143 upstream. Commit: aa1f1a639621 ("lib/syscall: Pin the task stack in collect_syscall()") ... added logic to handle a process stack not existing, but left sp and pc uninitialized, which can be later reported via /proc/$pid/syscall for zombie processes, potentially exposing kernel memory to userspace. Zombie /proc/$pid/syscall before: -1 0xffffffff9a060100 0xffff92f42d6ad900 Zombie /proc/$pid/syscall after: -1 0x0 0x0 Reported-by: Robert Święcki <robert@swiecki.net> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: aa1f1a639621 ("lib/syscall: Pin the task stack in collect_syscall()") Link: http://lkml.kernel.org/r/20170323224616.GA92694@beast Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26swiotlb: Add swiotlb=noforce debug optionGeert Uytterhoeven
commit fff5d99225107f5f13fe4a9805adc2a1c4b5fb00 upstream. On architectures like arm64, swiotlb is tied intimately to the core architecture DMA support. In addition, ZONE_DMA cannot be disabled. To aid debugging and catch devices not supporting DMA to memory outside the 32-bit address space, add a kernel command line option "swiotlb=noforce", which disables the use of bounce buffers. If specified, trying to map memory that cannot be used with DMA will fail, and a rate-limited warning will be printed. Note that io_tlb_nslabs is set to 1, which is the minimal supported value. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26swiotlb: Convert swiotlb_force from int to enumGeert Uytterhoeven
commit ae7871be189cb41184f1e05742b4a99e2c59774d upstream. Convert the flag swiotlb_force from an int to an enum, to prepare for the advent of more possible values. Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19fix a fencepost error in pipe_advance()Al Viro
commit b9dc6f65bc5e232d1c05fe34b5daadc7e8bbf1fb upstream. The logics in pipe_advance() used to release all buffers past the new position failed in cases when the number of buffers to release was equal to pipe->buffers. If that happened, none of them had been released, leaving pipe full. Worse, it was trivial to trigger and we end up with pipe full of uninitialized pages. IOW, it's an infoleak. Reported-by: "Alan J. Wylie" <alan@wylie.me.uk> Tested-by: "Alan J. Wylie" <alan@wylie.me.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-09Revert "radix tree test suite: fix compilation"Linus Torvalds
This reverts commit 53855d10f4567a0577360b6448d52a863929775b. It shouldn't have come in yet - it depends on the changes in linux-next that will come in during the next merge window. As Matthew Wilcox says, the test suite is broken with the current state without the revert. Requested-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07radix tree test suite: fix compilationMatthew Wilcox
Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test suite as it adds a call to cpuhp_setup_state_nocalls() which is not currently emulated in the test suite. Add it, and delete the emulation of the old CPU hotplug mechanism. Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two rtmutex race fixes (which miraculously never triggered, that we know of), plus two lockdep printk formatting regression fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Fix report formatting locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() locking/rtmutex: Prevent dequeue vs. unlock race locking/selftest: Fix output since KERN_CONT changes
2016-11-30kasan: support use-after-scope detectionDmitry Vyukov
Gcc revision 241896 implements use-after-scope detection. Will be available in gcc 7. Support it in KASAN. Gcc emits 2 new callbacks to poison/unpoison large stack objects when they go in/out of scope. Implement the callbacks and add a test. [dvyukov@google.com: v3] Link: http://lkml.kernel.org/r/1479998292-144502-1-git-send-email-dvyukov@google.com Link: http://lkml.kernel.org/r/1479226045-145148-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-30lib/debugobjects: export for use in modulesChris Wilson
Drivers, or other modules, that use a mixture of objects (especially objects embedded within other objects) would like to take advantage of the debugobjects facilities to help catch misuse. Currently, the debugobjects interface is only available to builtin drivers and requires a set of EXPORT_SYMBOL_GPL for use by modules. I am using the debugobjects in i915.ko to try and catch some invalid operations on embedded objects. The problem currently only presents itself across module unload so forcing i915 to be builtin is not an option. Link: http://lkml.kernel.org/r/20161122143039.6433-1-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Du, Changbin" <changbin.du@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-25locking/selftest: Fix output since KERN_CONT changesMichael Ellerman
Since the KERN_CONT changes the locking-selftest output is messed up, eg: ---------------------------------------------------------------------------- | spin |wlock |rlock |mutex | wsem | rsem | -------------------------------------------------------------------------- A-A deadlock: ok | ok | ok | ok | ok | ok | Use pr_cont() to get it looking normal again: ---------------------------------------------------------------------------- | spin |wlock |rlock |mutex | wsem | rsem | -------------------------------------------------------------------------- A-A deadlock: ok | ok | ok | ok | ok | ok | Reported-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/1480027528-934-1-git-send-email-mpe@ellerman.id.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-25mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]Andrey Ryabinin
This fixes CVE-2016-8650. If mpi_powm() is given a zero exponent, it wants to immediately return either 1 or 0, depending on the modulus. However, if the result was initalised with zero limb space, no limbs space is allocated and a NULL-pointer exception ensues. Fix this by allocating a minimal amount of limb space for the result when the 0-exponent case when the result is 1 and not touching the limb space when the result is 0. This affects the use of RSA keys and X.509 certificates that carry them. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 PGD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff8804011944c0 task.stack: ffff880401294000 RIP: 0010:[<ffffffff8138ce5d>] [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP: 0018:ffff880401297ad8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0 RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0 RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000 R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50 FS: 00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0 Stack: ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8 Call Trace: [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66 [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146 [<ffffffff8132a95c>] rsa_verify+0x9d/0xee [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1 [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228 [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4 [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1 [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1 [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61 [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399 [<ffffffff812fe227>] SyS_add_key+0x154/0x19e [<ffffffff81001c2b>] do_syscall_64+0x80/0x191 [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f RIP [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP <ffff880401297ad8> CR2: 0000000000000000 ---[ end trace d82015255d4a5d8d ]--- Basically, this is a backport of a libgcrypt patch: http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526 Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> cc: linux-ima-devel@lists.sourceforge.net cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: 1) With modern networking cards we can run out of 32-bit DMA space, so support 64-bit DMA addressing when possible on sparc64. From Dave Tushar. 2) Some signal frame validation checks are inverted on sparc32, fix from Andreas Larsson. 3) Lockdep tables can get too large in some circumstances on sparc64, add a way to adjust the size a bit. From Babu Moger. 4) Fix NUMA node probing on some sun4v systems, from Thomas Tai. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: drop duplicate header scatterlist.h lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc sunbmac: Fix compiler warning sunqe: Fix compiler warnings sparc64: Enable 64-bit DMA sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Initialize iommu_map_table and iommu_pool sparc64: Add ATU (new IOMMU) support sparc64: Add FORCE_MAX_ZONEORDER and default to 13 sparc64: fix compile warning section mismatch in find_node() sparc32: Fix inverted invalid_frame_pointer checks on sigreturns sparc64: Fix find_node warning if numa node cannot be found
2016-11-18config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparcBabu Moger
This new config parameter limits the space used for "Lock debugging: prove locking correctness" by about 4MB. The current sparc systems have the limitation of 32MB size for kernel size including .text, .data and .bss sections. With PROVE_LOCKING feature, the kernel size could grow beyond this limit and causing system boot-up issues. With this option, kernel limits the size of the entries of lock_chains, stack_trace etc., so that kernel fits in required size limit. This is not visible to user and only used for sparc. Signed-off-by: Babu Moger <babu.moger@oracle.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17fix iov_iter_advance() for ITER_PIPEAbhi Das
iov_iter_advance() needs to decrement iter->count by the number of bytes we'd moved beyond. Normal flavours do that, but ITER_PIPE doesn't and ITER_PIPE generic_file_read_iter() for O_DIRECT files ends up with a bogus fallback to page cache read, resulting in incorrect values for file offset and bytes read. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-11lib/stackdepot: export save/fetch stack for driversChris Wilson
Some drivers would like to record stacktraces in order to aide leak tracing. As stackdepot already provides a facility for only storing the unique traces, thereby reducing the memory required, export that functionality for use by drivers. The code was originally created for KASAN and moved under lib in commit cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") so that it could be shared with mm/. In turn, we want to share it now with drivers. Link: http://lkml.kernel.org/r/20161108133209.22704-1-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Lots of fixes, mostly drivers as is usually the case. 1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey Khoroshilov. 2) Fix element timeouts in netfilter's nft_dynset, from Anders K. Pedersen. 3) Don't put aead_req crypto struct on the stack in mac80211, from Ard Biesheuvel. 4) Several uninitialized variable warning fixes from Arnd Bergmann. 5) Fix memory leak in cxgb4, from Colin Ian King. 6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann. 7) Several VRF semantic fixes from David Ahern. 8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper. 9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet. 10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev. 11) Fix stale link state during failover in NCSCI driver, from Gavin Shan. 12) Fix netdev lower adjacency list traversal, from Ido Schimmel. 13) Propvide proper handle when emitting notifications of filter deletes, from Jamal Hadi Salim. 14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen. 15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac. 16) Several routing offload fixes in mlxsw driver, from Jiri Pirko. 17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy. 18) Validate chunk len before using it in SCTP, from Marcelo Ricardo Leitner. 19) Revert a netns locking change that causes regressions, from Paul Moore. 20) Add recursion limit to GRO handling, from Sabrina Dubroca. 21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon. 22) Avoid accessing stale vxlan/geneve socket in data path, from Pravin Shelar" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits) geneve: avoid using stale geneve socket. vxlan: avoid using stale vxlan socket. qede: Fix out-of-bound fastpath memory access net: phy: dp83848: add dp83822 PHY support enic: fix rq disable tipc: fix broadcast link synchronization problem ibmvnic: Fix missing brackets in init_sub_crq_irqs ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context" arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold net/mlx4_en: Save slave ethtool stats command net/mlx4_en: Fix potential deadlock in port statistics flow net/mlx4: Fix firmware command timeout during interrupt test net/mlx4_core: Do not access comm channel if it has not yet been initialized net/mlx4_en: Fix panic during reboot net/mlx4_en: Process all completions in RX rings after port goes up net/mlx4_en: Resolve dividing by zero in 32-bit system net/mlx4_core: Change the default value of enable_qos net/mlx4_core: Avoid setting ports to auto when only one port type is supported net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec ...
2016-10-27lib/genalloc.c: start search from start of chunkDaniel Mentz
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find a contiguous block of memory that satisfies the allocation request. The shortcut if (size > atomic_read(&chunk->avail)) continue; makes the loop skip over chunks that do not have enough bytes left to fulfill the request. There are two situations, though, where an allocation might still fail: (1) The available memory is not contiguous, i.e. the request cannot be fulfilled due to external fragmentation. (2) A race condition. Another thread runs the same code concurrently and is quicker to grab the available memory. In those situations, the loop calls pool->algo() to search the entire chunk, and pool->algo() returns some value that is >= end_bit to indicate that the search failed. This return value is then assigned to start_bit. The variables start_bit and end_bit describe the range that should be searched, and this range should be reset for every chunk that is searched. Today, the code fails to reset start_bit to 0. As a result, prefixes of subsequent chunks are ignored. Memory allocations might fail even though there is plenty of room left in these prefixes of those other chunks. Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com Signed-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MBDmitry Vyukov
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls. Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on second level). Size of each stack is (num_frames + 3) * sizeof(long). Which gives us ~84K stacks. This capacity was chosen empirically and it is enough to run kernel normally. However, when lots of configs are enabled and a fuzzer tries to maximize code coverage, it easily hits the limit within tens of minutes. I've tested for long a time with number of top level entries bumped 4x (4096). And I think I've seen overflow only once. But I don't have all configs enabled and code coverage has not reached maximum yet. So bump it 8x to 8192. Since we have two-level table, memory cost of this is very moderate -- currently the top-level table is 8KB, with this patch it is 64KB, which is negligible under KASAN. Here is some approx math. 128MB allows us to memorize ~670K stacks (assuming stack is ~200b). I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free| kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of alloc/free call sites are reachable with only one stack. But some utility functions can have large fanout. Assuming average fanout is 5x, total number of alloc/free stacks is ~300K. Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27latent_entropy: raise CONFIG_FRAME_WARN by defaultKees Cook
When building with the latent_entropy plugin, set the default CONFIG_FRAME_WARN to 2048, since some __init functions have many basic blocks that, when instrumented by the latent_entropy plugin, grow beyond 1024 byte stack size on 32-bit builds. Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-20bpf, test: fix ld_abs + vlan push/pop stress testDaniel Borkmann
After commit 636c2628086e ("net: skbuff: Remove errornous length validation in skb_vlan_pop()") mentioned test case stopped working, throwing a -12 (ENOMEM) return code. The issue however is not due to 636c2628086e, but rather due to a buggy test case that got uncovered from the change in behaviour in 636c2628086e. The data_size of that test case for the skb was set to 1. In the bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that loop with: reading skb data, pushing 68 tags, reading skb data, popping 68 tags, reading skb data, etc, in order to force a skb expansion and thus trigger that JITs recache skb->data. Problem is that initial data_size is too small. While before 636c2628086e, the test silently bailed out due to the skb->len < VLAN_ETH_HLEN check with returning 0, and now throwing an error from failing skb_ensure_writable(). Set at least minimum of ETH_HLEN as an initial length so that on first push of data, equivalent pop will succeed. Fixes: 4d9c5c53ac99 ("test_bpf: add bpf_skb_vlan_push/pop() tests") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-15Merge tag 'gcc-plugins-v4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc plugins update from Kees Cook: "This adds a new gcc plugin named "latent_entropy". It is designed to extract as much possible uncertainty from a running system at boot time as possible, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). At the very least, this plugin is a much more comprehensive example for how to manipulate kernel code using the gcc plugin internals" * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: latent_entropy: Mark functions with __latent_entropy gcc-plugins: Add latent_entropy plugin
2016-10-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more misc uaccess and vfs updates from Al Viro: "The rest of the stuff from -next (more uaccess work) + assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: score: traps: Add missing include file to fix build error fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths fs/super.c: fix race between freeze_super() and thaw_super() overlayfs: Fix setting IOP_XATTR flag iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector() blackfin: no access_ok() for __copy_{to,from}_user() arm64: don't zero in __copy_from_user{,_inatomic} arm: don't zero in __copy_from_user_inatomic()/__copy_from_user() arc: don't leak bits of kernel stack into coredump alpha: get rid of tail-zeroing in __copy_user()
2016-10-14iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()Vegard Nossum
Both import_iovec() and rw_copy_check_uvector() take an array (typically small and on-stack) which is used to hold an iovec array copy from userspace. This is to avoid an expensive memory allocation in the fast path (i.e. few iovec elements). The caller may have to check whether these functions actually used the provided buffer or allocated a new one -- but this differs between the too. Let's just add a kernel doc to clarify what the semantics are for each function. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14Merge tag 'linux-kselftest-4.9-rc1-update' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest updates from Shuah Khan: "This update consists of: - Fixes and improvements to existing tests - Moving code from Documentation to selftests, samples, and tools: * Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and networking tests from Documentation to selftests. * Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay, and blackfin examples from Documentation to samples. * Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from Documentation to tools. * Deletes BUILD_DOCSRC and its dependencies" * tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits) selftests/futex: Check ANSI terminal color support Doc: update 00-INDEX files to reflect the runnable code move samples: move blackfin gptimers-example from Documentation tools: move pcmcia crc32hash tool from Documentation tools: move laptops dslm tool from Documentation tools: move accounting tool from Documentation samples: move auxdisplay example code from Documentation samples: move watchdog example code from Documentation samples: move timers example code from Documentation samples: move misc-devices/mei example code from Documentation samples: move mic/mpssd example code from Documentation selftests: Move networking/timestamping from Documentation selftests: move watchdog tests from Documentation/watchdog selftests: move ia64 tests from Documentation/ia64 selftests: move vDSO tests from Documentation/vDSO selftests: move ptp tests from Documentation/ptp selftests: move prctl tests from Documentation/prctl selftests: move dnotify_test from Documentation/filesystems selftests/timers: Add missing error code assignment before test selftests/zram: replace ZRAM_LZ4_COMPRESS ...
2016-10-14Merge branch 'for-4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Nick improved generic implementations of percpu operations which modify the variable and return so that they calculate the physical address only once. - percpu_ref percpu <-> atomic mode switching improvements. The patchset was originally posted about a year ago but fell through the crack. - misc non-critical fixes. * 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk() mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk() percpu: eliminate two sparse warnings percpu: improve generic percpu modify-return implementation percpu-refcount: init ->confirm_switch member properly percpu_ref: allow operation mode switching operations to be called concurrently percpu_ref: restructure operation mode switching percpu_ref: unify staggered atomic switching wait behavior percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic() percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation
2016-10-11kcov: do not instrument lib/stackdepot.cAlexander Potapenko
There's no point in collecting coverage from lib/stackdepot.c, as it is not a function of syscall inputs. Disabling kcov instrumentation for that file will reduce the coverage noise level. Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11lib/bitmap.c: enhance bitmap syntaxNoam Camus
Today there are platforms with many CPUs (up to 4K). Trying to boot only part of the CPUs may result in too long string. For example lets take NPS platform that is part of arch/arc. This platform have SMP system with 256 cores each with 16 HW threads (SMT machine) where HW thread appears as CPU to the kernel. In this example there is total of 4K CPUs. When one tries to boot only part of the HW threads from each core the string representing the map may be long... For example if for sake of performance we decided to boot only first half of HW threads of each core the map will look like: 0-7,16-23,32-39,...,4080-4087 This patch introduce new syntax to accommodate with such use case. I added an optional postfix to a range of CPUs which will choose according to given modulo the desired range of reminders i.e.: <cpus range>:sed_size/group_size For example, above map can be described in new syntax like this: 0-4095:8/16 Note that this patch is backward compatible with current syntax. [akpm@linux-foundation.org: rework documentation] Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com Signed-off-by: Noam Camus <noamca@mellanox.com> Cc: David Decotigny <decot@googlers.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11lib/kstrtox.c: smaller _parse_integer()Alexey Dobriyan
Set "overflow" bit upon encountering it instead of postponing to the end of the conversion. Somehow gcc unwedges itself and generates better code: $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux _parse_integer 177 139 -38 Inspired by patch from Zhaoxiu Zeng. Link: http://lkml.kernel.org/r/20160826221920.GA1909@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11lib: harden strncpy_from_userMark Rutland
The strncpy_from_user() accessor is effectively a copy_from_user() specialised to copy strings, terminating early at a NUL byte if possible. In other respects it is identical, and can be used to copy an arbitrarily large buffer from userspace into the kernel. Conceptually, it exposes a similar attack surface. As with copy_from_user(), we check the destination range when the kernel is built with KASAN, but unlike copy_from_user() we do not check the destination buffer when using HARDENED_USERCOPY. As strncpy_from_user() calls get_user() in a loop, we must call check_object_size() explicitly. This patch adds this instrumentation to strncpy_from_user(), per the same rationale as with the regular copy_from_user(). In the absence of hardened usercopy this will have no impact as the instrumentation expands to an empty static inline function. Link: http://lkml.kernel.org/r/1472221903-31181-1-git-send-email-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11Fix off-by-one in __pipe_get_pages()Al Viro
it actually worked only when requested area ended on the page boundary... Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-10latent_entropy: Mark functions with __latent_entropyEmese Revfy
The __latent_entropy gcc attribute can be used only on functions and variables. If it is on a function then the plugin will instrument it for gathering control-flow entropy. If the attribute is on a variable then the plugin will initialize it with random contents. The variable must be an integer, an integer array type or a structure with integer fields. These specific functions have been selected because they are init functions (to help gather boot-time entropy), are called at unpredictable times, or they have variable loops, each of which provide some level of latent entropy. Signed-off-by: Emese Revfy <re.emese@gmail.com> [kees: expanded commit message] Signed-off-by: Kees Cook <keescook@chromium.org>
2016-10-10Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...
2016-10-10samples: move blackfin gptimers-example from DocumentationShuah Khan
Move blackfin gptimers-example to samples and remove it from Documentation Makefile. Update samples Kconfig and Makefile to build gptimers-example. blackfin is the last CONFIG_BUILD_DOCSRC target in Documentation/Makefile. Hence this patch also includes changes to remove CONFIG_BUILD_DOCSRC from Makefile and lib/Kconfig.debug and updates VIDEO_PCI_SKELETON dependency on BUILD_DOCSRC. Documentation/Makefile is not deleted to avoid braking make htmldocs and make distclean. Acked-by: Michal Marek <mmarek@suse.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Kees Cook <keescook@chromium.org> Reported-by: Valentin Rothberg <valentinrothberg@gmail.com> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2016-10-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - fsnotify updates - ocfs2 updates - all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) console: don't prefer first registered if DT specifies stdout-path cred: simpler, 1D supplementary groups CREDITS: update Pavel's information, add GPG key, remove snail mail address mailmap: add Johan Hovold .gitattributes: set git diff driver for C source code files uprobes: remove function declarations from arch/{mips,s390} spelling.txt: "modeled" is spelt correctly nmi_backtrace: generate one-line reports for idle cpus arch/tile: adopt the new nmi_backtrace framework nmi_backtrace: do a local dump_stack() instead of a self-NMI nmi_backtrace: add more trigger_*_cpu_backtrace() methods min/max: remove sparse warnings when they're nested Documentation/filesystems/proc.txt: add more description for maps/smaps mm, proc: fix region lost in /proc/self/smaps proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self proc: add LSM hook checks to /proc/<tid>/timerslack_ns proc: relax /proc/<tid>/timerslack_ns capability requirements meminfo: break apart a very long seq_printf with #ifdefs seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char proc: faster /proc/*/status ...
2016-10-07nmi_backtrace: generate one-line reports for idle cpusChris Metcalf
When doing an nmi backtrace of many cores, most of which are idle, the output is a little overwhelming and very uninformative. Suppress messages for cpus that are idling when they are interrupted and just emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN". We do this by grouping all the cpuidle code together into a new .cpuidle.text section, and then checking the address of the interrupted PC to see if it lies within that section. This commit suitably tags x86 and tile idle routines, and only adds in the minimal framework for other architectures. Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Tested-by: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07nmi_backtrace: do a local dump_stack() instead of a self-NMIChris Metcalf
Currently on arm there is code that checks whether it should call dump_stack() explicitly, to avoid trying to raise an NMI when the current context is not preemptible by the backtrace IPI. Similarly, the forthcoming arch/tile support uses an IPI mechanism that does not support generating an NMI to self. Accordingly, move the code that guards this case into the generic mechanism, and invoke it unconditionally whenever we want a backtrace of the current cpu. It seems plausible that in all cases, dump_stack() will generate better information than generating a stack from the NMI handler. The register state will be missing, but that state is likely not particularly helpful in any case. Or, if we think it is helpful, we should be capturing and emitting the current register state in all cases when regs == NULL is passed to nmi_cpu_backtrace(). Link: http://lkml.kernel.org/r/1472487169-14923-3-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Aaron Tomlin <atomlin@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07nmi_backtrace: add more trigger_*_cpu_backtrace() methodsChris Metcalf
Patch series "improvements to the nmi_backtrace code" v9. This patch series modifies the trigger_xxx_backtrace() NMI-based remote backtracing code to make it more flexible, and makes a few small improvements along the way. The motivation comes from the task isolation code, where there are scenarios where we want to be able to diagnose a case where some cpu is about to interrupt a task-isolated cpu. It can be helpful to see both where the interrupting cpu is, and also an approximation of where the cpu that is being interrupted is. The nmi_backtrace framework allows us to discover the stack of the interrupted cpu. I've tested that the change works as desired on tile, and build-tested x86, arm, mips, and sparc64. For x86 I confirmed that the generic cpuidle stuff as well as the architecture-specific routines are in the new cpuidle section. For arm, mips, and sparc I just build-tested it and made sure the generic cpuidle routines were in the new cpuidle section, but I didn't attempt to figure out which the platform-specific idle routines might be. That might be more usefully done by someone with platform experience in follow-up patches. This patch (of 4): Currently you can only request a backtrace of either all cpus, or all cpus but yourself. It can also be helpful to request a remote backtrace of a single cpu, and since we want that, the logical extension is to support a cpumask as the underlying primitive. This change modifies the existing lib/nmi_backtrace.c code to take a cpumask as its basic primitive, and modifies the linux/nmi.h code to use the new "cpumask" method instead. The existing clients of nmi_backtrace (arm and x86) are converted to using the new cpumask approach in this change. The other users of the backtracing API (sparc64 and mips) are converted to use the cpumask approach rather than the all/allbutself approach. The mips code ignored the "include_self" boolean but with this change it will now also dump a local backtrace if requested. Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVEVineet Gupta
This came to light when implementing native 64-bit atomics for ARCv2. The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE to check whether atomic64_dec_if_positive() is available. It seems it was needed when not every arch defined it. However as of current code the Kconfig option seems needless - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a generic definition of API is present lib/atomic64.c - arches with native 64-bit atomics select it in arch/*/Kconfig and define the API in their headers So I see no point in keeping the Kconfig option Compile tested for: - blackfin (CONFIG_GENERIC_ATOMIC64) - x86 (!CONFIG_GENERIC_ATOMIC64) - ia64 Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ming Lin <ming.l@ssi.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07Merge branch 'work.splice_read' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS splice updates from Al Viro: "There's a bunch of branches this cycle, both mine and from other folks and I'd rather send pull requests separately. This one is the conversion of ->splice_read() to ITER_PIPE iov_iter (and introduction of such). Gets rid of a lot of code in fs/splice.c and elsewhere; there will be followups, but these are for the next cycle... Some pipe/splice-related cleanups from Miklos in the same branch as well" * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pipe: fix comment in pipe_buf_operations pipe: add pipe_buf_steal() helper pipe: add pipe_buf_confirm() helper pipe: add pipe_buf_release() helper pipe: add pipe_buf_get() helper relay: simplify relay_file_read() switch default_file_splice_read() to use of pipe-backed iov_iter switch generic_file_splice_read() to use of ->read_iter() new iov_iter flavour: pipe-backed fuse_dev_splice_read(): switch to add_to_pipe() skb_splice_bits(): get rid of callback new helper: add_to_pipe() splice: lift pipe_lock out of splice_to_pipe() splice: switch get_iovec_page_array() to iov_iter splice_to_pipe(): don't open-code wakeup_pipe_readers() consistent treatment of EFAULT on O_DIRECT read/write
2016-10-07Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "This is the main pull request for block layer changes in 4.9. As mentioned at the last merge window, I've changed things up and now do just one branch for core block layer changes, and driver changes. This avoids dependencies between the two branches. Outside of this main pull request, there are two topical branches coming as well. This pull request contains: - A set of fixes, and a conversion to blk-mq, of nbd. From Josef. - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd. Followup dependency fix from Geert. - General fixes from Bart, Baoyou, Guoqing, and Linus W. - CFQ async write starvation fix from Glauber. - Add supprot for delayed kick of the requeue list, from Mike. - Pull out the scalable bitmap code from blk-mq-tag.c and make it generally available under the name of sbitmap. Only blk-mq-tag uses it for now, but the blk-mq scheduling bits will use it as well. From Omar. - bdev thaw error progagation from Pierre. - Improve the blk polling statistics, and allow the user to clear them. From Stephen. - Set of minor cleanups from Christoph in block/blk-mq. - Set of cleanups and optimizations from me for block/blk-mq. - Various nvme/nvmet/nvmeof fixes from the various folks" * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits) fs/block_dev.c: return the right error in thaw_bdev() nvme: Pass pointers, not dma addresses, to nvme_get/set_features() nvme/scsi: Remove power management support nvmet: Make dsm number of ranges zero based nvmet: Use direct IO for writes admin-cmd: Added smart-log command support. nvme-fabrics: Add host_traddr options field to host infrastructure nvme-fabrics: revise host transport option descriptions nvme-fabrics: rework nvmf_get_address() for variable options nbd: use BLK_MQ_F_BLOCKING blkcg: Annotate blkg_hint correctly cfq: fix starvation of asynchronous writes blk-mq: add flag for drivers wanting blocking ->queue_rq() blk-mq: remove non-blocking pass in blk_mq_map_request blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() block: export bio_free_pages to other modules lightnvm: propagate device_add() error code lightnvm: expose device geometry through sysfs lightnvm: control life of nvm_dev in driver blk-mq: register device instead of disk ...
2016-10-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "The usual rocket science from the trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: tracing/syscalls: fix multiline in error message text lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description doc: vfs: fix fadvise() sycall name x86/entry: spell EBX register correctly in documentation securityfs: fix securityfs_create_dir comment irq: Fix typo in tracepoint.xml
2016-10-07Merge tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds
Pull MD updates from Shaohua Li: "This update includes: - new AVX512 instruction based raid6 gen/recovery algorithm - a couple of md-cluster related bug fixes - fix a potential deadlock - set nonrotational bit for raid array with SSD - set correct max_hw_sectors for raid5/6, which hopefuly can improve performance a little bit - other minor fixes" * tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md: set rotational bit raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays raid5: handle register_shrinker failure raid5: fix to detect failure of register_shrinker md: fix a potential deadlock md/bitmap: fix wrong cleanup raid5: allow arbitrary max_hw_sectors lib/raid6: Add AVX512 optimized xor_syndrome functions lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions lib/raid6: Add AVX512 optimized recovery functions lib/raid6: Add AVX512 optimized gen_syndrome functions md-cluster: make resync lock also could be interruptted md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang md-cluster: convert the completion to wait queue md-cluster: protect md_find_rdev_nr_rcu with rcu lock md-cluster: clean related infos of cluster md: changes for MD_STILL_CLOSED flag md-cluster: remove some unnecessary dlm_unlock_sync md-cluster: use FORCEUNLOCK in lockres_free md-cluster: call md_kick_rdev_from_array once ack failed
2016-10-06Merge tag 'dmaengine-4.9-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull dmaengine updates from Vinod Koul: "This is bit large pile of code which bring in some nice additions: - Error reporting: we have added a new mechanism for users of dmaenegine to register a callback_result which tells them the result of the dma transaction. Right now only one user (ntb) is using it. - As we discussed on KS mailing list and pointed out NO_IRQ has no place in kernel, this also remove NO_IRQ from dmaengine subsystem (both arm and ppc users) - Support for IOMMU slave transfers and its implementation for arm. - To get better build coverage, enable COMPILE_TEST for bunch of driver, and fix the warning and sparse complaints on these. - Apart from above, usual updates spread across drivers" * tag 'dmaengine-4.9-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (169 commits) async_pq_val: fix DMA memory leak dmaengine: virt-dma: move function declarations dmaengine: omap-dma: Enable burst and data pack for SG DT: dmaengine: rcar-dmac: document R8A7743/5 support dmaengine: fsldma: Unmap region obtained by of_iomap dmaengine: jz4780: fix resource leaks on error exit return dma-debug: fix ia64 build, use PHYS_PFN dmaengine: coh901318: fix integer overflow when shifting more than 32 places dmaengine: edma: avoid uninitialized variable use dma-mapping: fix m32r build warning dma-mapping: fix ia64 build, use PHYS_PFN dmaengine: ti-dma-crossbar: enable COMPILE_TEST dmaengine: omap-dma: enable COMPILE_TEST dmaengine: edma: enable COMPILE_TEST dmaengine: ti-dma-crossbar: Fix of_device_id data parameter usage dmaengine: ti-dma-crossbar: Correct type for of_find_property() third parameter dmaengine/ARM: omap-dma: Fix the DMAengine compile test on non OMAP configs dmaengine: edma: Rename set_bits and remove unused clear_bits helper dmaengine: edma: Use correct type for of_find_property() third parameter dmaengine: edma: Fix of_device_id data parameter usage (legacy vs TPCC) ...
2016-10-05pipe: add pipe_buf_release() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>