summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-08-20SUNRPC: Don't allocate a full sockaddr_storage for tracingTrond Myklebust
commit db1bb44c4c7e8d49ed674dc59e5222d99c698088 upstream. We're always tracing IPv4 or IPv6 addresses, so we can save a lot of space on the ringbuffer by allocating the correct sockaddr size. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: 83a712e0afef "sunrpc: add some tracepoints around ..." Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20target: Fix ordered task CHECK_CONDITION early exception handlingNicholas Bellinger
commit 410c29dfbfdf73d0d0b5d14a21868ab038eca703 upstream. If a Simple command is sent with a failure, target_setup_cmd_from_cdb returns with TCM_UNSUPPORTED_SCSI_OPCODE or TCM_INVALID_CDB_FIELD. So in the cases where target_setup_cmd_from_cdb returns an error, we never get far enough to call target_execute_cmd to increment simple_cmds. Since simple_cmds isn't incremented, the result of the failure from target_setup_cmd_from_cdb causes transport_generic_request_failure to decrement simple_cmds, due to call to transport_complete_task_attr. With this dev->simple_cmds or dev->dev_ordered_sync is now -1, not 0. So when a subsequent command with an Ordered Task is sent, it causes a hang, since dev->simple_cmds is at -1. Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: Michael Cyr <mikecyr@linux.vnet.ibm.com> Signed-off-by: Michael Cyr <mikecyr@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20target: Fix max_unmap_lba_count calc overflowMike Christie
commit ea263c7fada4af8ec7fe5fcfd6e7d7705a89351b upstream. max_discard_sectors only 32bits, and some non scsi backend devices will set this to the max 0xffffffff, so we can end up overflowing during the max_unmap_lba_count calculation. This fixes a regression caused by my patch: commit 8a9ebe717a133ba7bc90b06047f43cc6b8bcb8b3 Author: Mike Christie <mchristi@redhat.com> Date: Mon Jan 18 14:09:27 2016 -0600 target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors which can result in extra discards being sent to due the overflow causing max_unmap_lba_count to be smaller than what the backing device can actually support. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20target: Fix ordered task target_setup_cmd_from_cdb exception hangNicholas Bellinger
commit dff0ca9ea7dc8be2181a62df4a722c32ce68ff4a upstream. If a command with a Simple task attribute is failed due to a Unit Attention, then a subsequent command with an Ordered task attribute will hang forever. The reason for this is that the Unit Attention status is checked for in target_setup_cmd_from_cdb, before the call to target_execute_cmd, which calls target_handle_task_attr, which in turn increments dev->simple_cmds. However, transport_generic_request_failure still calls transport_complete_task_attr, which will decrement dev->simple_cmds. In this case, simple_cmds is now -1. So when a command with the Ordered task attribute is sent, target_handle_task_attr sees that dev->simple_cmds is not 0, so it decides it can't execute the command until all the (nonexistent) Simple commands have completed. Reported-by: Michael Cyr <mikecyr@linux.vnet.ibm.com> Tested-by: Michael Cyr <mikecyr@linux.vnet.ibm.com> Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20IB/mlx5: Fix post send fence logicEli Cohen
commit c9b254955b9f8814966f5dabd34c39d0e0a2b437 upstream. If the caller specified IB_SEND_FENCE in the send flags of the work request and no previous work request stated that the successive one should be fenced, the work request would be executed without a fence. This could result in RDMA read or atomic operations failure due to a MR being invalidated. Fix this by adding the mlx5 enumeration for fencing RDMA/atomic operations and fix the logic to apply this. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20IB/mlx5: Fix MODIFY_QP command input structureArtemy Kovalyov
commit e3353c268b06236d6c40fa1714c114f21f44451c upstream. Make MODIFY_QP command input structure compliant to specification Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20block: fix bdi vs gendisk lifetime mismatchDan Williams
commit df08c32ce3be5be138c1dbfcba203314a3a7cd6f upstream. The name for a bdi of a gendisk is derived from the gendisk's devt. However, since the gendisk is destroyed before the bdi it leaves a window where a new gendisk could dynamically reuse the same devt while a bdi with the same name is still live. Arrange for the bdi to hold a reference against its "owner" disk device while it is registered. Otherwise we can hit sysfs duplicate name collisions like the following: WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1' Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015 0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351 0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8 Call Trace: [<ffffffff8134caec>] dump_stack+0x63/0x87 [<ffffffff8108c351>] __warn+0xd1/0xf0 [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80 [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90 [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320 [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0 [<ffffffff8134ff55>] kobject_add+0x75/0xd0 [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f [<ffffffff8148b0a5>] device_add+0x125/0x610 [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100 [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20 [<ffffffff811b775c>] bdi_register+0x8c/0x180 [<ffffffff811b7877>] bdi_register_dev+0x27/0x30 [<ffffffff813317f5>] add_disk+0x175/0x4a0 Reported-by: Yi Zhang <yizhan@redhat.com> Tested-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixed up missing 0 return in bdi_register_owner(). Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-20block: add missing group association in bio-cloning functionsPaolo Valente
commit 20bd723ec6a3261df5e02250cd3a1fbb09a343f2 upstream. When a bio is cloned, the newly created bio must be associated with the same blkcg as the original bio (if BLK_CGROUP is enabled). If this operation is not performed, then the new bio is not associated with any group, and the group of the current task is returned when the group of the bio is requested. Depending on the cloning frequency, this may cause a large percentage of the bios belonging to a given group to be treated as if belonging to other groups (in most cases as if belonging to the root group). The expected group isolation may thereby be broken. This commit adds the missing association in bio-cloning functions. Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers") Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Nikolay Borisov <kernel@kyup.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16mm: memcontrol: fix cgroup creation failure after many small jobsJohannes Weiner
commit 73f576c04b9410ed19660f74f97521bee6e1c546 upstream. The memory controller has quite a bit of state that usually outlives the cgroup and pins its CSS until said state disappears. At the same time it imposes a 16-bit limit on the CSS ID space to economically store IDs in the wild. Consequently, when we use cgroups to contain frequent but small and short-lived jobs that leave behind some page cache, we quickly run into the 64k limitations of outstanding CSSs. Creating a new cgroup fails with -ENOSPC while there are only a few, or even no user-visible cgroups in existence. Although pinning CSSs past cgroup removal is common, there are only two instances that actually need an ID after a cgroup is deleted: cache shadow entries and swapout records. Cache shadow entries reference the ID weakly and can deal with the CSS having disappeared when it's looked up later. They pose no hurdle. Swap-out records do need to pin the css to hierarchically attribute swapins after the cgroup has been deleted; though the only pages that remain swapped out after offlining are tmpfs/shmem pages. And those references are under the user's control, so they are manageable. This patch introduces a private 16-bit memcg ID and switches swap and cache shadow entries over to using that. This ID can then be recycled after offlining when the CSS remains pinned only by objects that don't specifically need it. This script demonstrates the problem by faulting one cache page in a new cgroup and deleting it again: set -e mkdir -p pages for x in `seq 128000`; do [ $((x % 1000)) -eq 0 ] && echo $x mkdir /cgroup/foo echo $$ >/cgroup/foo/cgroup.procs echo trex >pages/$x echo $$ >/cgroup/cgroup.procs rmdir /cgroup/foo done When run on an unpatched kernel, we eventually run out of possible IDs even though there are no visible cgroups: [root@ham ~]# ./cssidstress.sh [...] 65000 mkdir: cannot create directory '/cgroup/foo': No space left on device After this patch, the IDs get released upon cgroup destruction and the cache and css objects get released once memory reclaim kicks in. [hannes@cmpxchg.org: init the IDR] Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups") Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: John Garcia <john.garcia@mesosphere.io> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16devpts: clean up interface to pty driversLinus Torvalds
commit 67245ff332064c01b760afa7a384ccda024bfd24 upstream. This gets rid of the horrible notion of having that struct inode *ptmx_inode be the linchpin of the interface between the pty code and devpts. By de-emphasizing the ptmx inode, a lot of things actually get cleaner, and we will have a much saner way forward. In particular, this will allow us to associate with any particular devpts instance at open-time, and not be artificially tied to one particular ptmx inode. The patch itself is actually fairly straightforward, and apart from some locking and return path cleanups it's pretty mechanical: - the interfaces that devpts exposes all take "struct pts_fs_info *" instead of "struct inode *ptmx_inode" now. NOTE! The "struct pts_fs_info" thing is a completely opaque structure as far as the pty driver is concerned: it's still declared entirely internally to devpts. So the pty code can't actually access it in any way, just pass it as a "cookie" to the devpts code. - the "look up the pts fs info" is now a single clear operation, that also does the reference count increment on the pts superblock. So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get ref" operation (devpts_get_ref(inode)), along with a "put ref" op (devpts_put_ref()). - the pty master "tty->driver_data" field now contains the pts_fs_info, not the ptmx inode. - because we don't care about the ptmx inode any more as some kind of base index, the ref counting can now drop the inode games - it just gets the ref on the superblock. - the pts_fs_info now has a back-pointer to the super_block. That's so that we can easily look up the information we actually need. Although quite often, the pts fs info was actually all we wanted, and not having to look it up based on some magical inode makes things more straightforward. In particular, now that "devpts_get_ref(inode)" operation should really be the *only* place we need to look up what devpts instance we're associated with, and we do it exactly once, at ptmx_open() time. The other side of this is that one ptmx node could now be associated with multiple different devpts instances - you could have a single /dev/ptmx node, and then have multiple mount namespaces with their own instances of devpts mounted on /dev/pts/. And that's all perfectly sane in a model where we just look up the pts instance at open time. This will eventually allow us to get rid of our odd single-vs-multiple pts instance model, but this patch in itself changes no semantics, only an internal binding model. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Francesco Ruggeri <fruggeri@arista.com> Cc: "Herton R. Krzesinski" <herton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10vmlinux.lds: account for destructor sectionsDmitry Vyukov
commit e41f501d391265ff568f3e49d6128cc30856a36f upstream. If CONFIG_KASAN is enabled and gcc is configured with --disable-initfini-array and/or gold linker is used, gcc emits .ctors/.dtors and .text.startup/.text.exit sections instead of .init_array/.fini_array. .dtors section is not explicitly accounted in the linker script and messes vvar/percpu layout. We want: ffffffff822bfd80 D _edata ffffffff822c0000 D __vvar_beginning_hack ffffffff822c0000 A __vvar_page ffffffff822c0080 0000000000000098 D vsyscall_gtod_data ffffffff822c1000 A __init_begin ffffffff822c1000 D init_per_cpu__irq_stack_union ffffffff822c1000 A __per_cpu_load ffffffff822d3000 D init_per_cpu__gdt_page We got: ffffffff8279a600 D _edata ffffffff8279b000 A __vvar_page ffffffff8279c000 A __init_begin ffffffff8279c000 D init_per_cpu__irq_stack_union ffffffff8279c000 A __per_cpu_load ffffffff8279e000 D __vvar_beginning_hack ffffffff8279e080 0000000000000098 D vsyscall_gtod_data ffffffff827ae000 D init_per_cpu__gdt_page This happens because __vvar_page and .vvar get different addresses in arch/x86/kernel/vmlinux.lds.S: . = ALIGN(PAGE_SIZE); __vvar_page = .; .vvar : AT(ADDR(.vvar) - LOAD_OFFSET) { /* work around gold bug 13023 */ __vvar_beginning_hack = .; Discard .dtors/.fini_array/.text.exit, since we don't call dtors. Merge .text.startup into init text. Link: http://lkml.kernel.org/r/1467386363-120030-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10x86/quirks: Add early quirk to reset Apple AirPort cardLukas Wunner
commit abb2bafd295fe962bbadc329dbfb2146457283ac upstream. The EFI firmware on Macs contains a full-fledged network stack for downloading OS X images from osrecovery.apple.com. Unfortunately on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 wireless card on every boot and leaves it enabled even after ExitBootServices has been called. The card continues to assert its IRQ line, causing spurious interrupts if the IRQ is shared. It also corrupts memory by DMAing received packets, allowing for remote code execution over the air. This only stops when a driver is loaded for the wireless card, which may be never if the driver is not installed or blacklisted. The issue seems to be constrained to the Broadcom 4331. Chris Milsted has verified that the newer Broadcom 4360 built into the MacBookPro11,3 (2013/2014) does not exhibit this behaviour. The chances that Apple will ever supply a firmware fix for the older machines appear to be zero. The solution is to reset the card on boot by writing to a reset bit in its mmio space. This must be done as an early quirk and not as a plain vanilla PCI quirk to successfully combat memory corruption by DMAed packets: Matthew Garrett found out in 2012 that the packets are written to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html). This type of memory is made available to the page allocator by efi_free_boot_services(). Plain vanilla PCI quirks run much later, in subsys initcall level. In-between a time window would be open for memory corruption. Random crashes occurring in this time window and attributed to DMAed packets have indeed been observed in the wild by Chris Bainbridge. When Matthew Garrett analyzed the memory corruption issue in 2012, he sought to fix it with a grub quirk which transitions the card to D3hot: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56 This approach does not help users with other bootloaders and while it may prevent DMAed packets, it does not cure the spurious interrupts emanating from the card. Unfortunately the card's mmio space is inaccessible in D3hot, so to reset it, we have to undo the effect of Matthew's grub patch and transition the card back to D0. Note that the quirk takes a few shortcuts to reduce the amount of code: The size of BAR 0 and the location of the PM capability is identical on all affected machines and therefore hardcoded. Only the address of BAR 0 differs between models. Also, it is assumed that the BCMA core currently mapped is the 802.11 core. The EFI driver seems to always take care of this. Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback towards finding the best solution to this problem. The following should be a comprehensive list of affected models: iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16] iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16] Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12] Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12] MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12] MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] For posterity, spurious interrupts caused by the Broadcom 4331 wireless card resulted in splats like this (stacktrace omitted): irq 17: nobody cared (try booting with the "irqpoll" option) handlers: [<ffffffff81374370>] pcie_isr [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci] [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec] Disabling IRQ #17 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732 Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1] Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1] Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2] Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1] Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2] Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris Milsted <cmilsted@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Michael Buesch <m@bues.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: b43-dev@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linux-wireless@vger.kernel.org Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de [ Did minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27drm/ttm: Make ttm_bo_mem_compat availableSinclair Yeh
commit 94477bff390aa4612d2332c8abafaae0a13d6923 upstream. There are cases where it is desired to see if a proposed placement is compatible with a buffer object before calling ttm_bo_validate(). Signed-off-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27vfs: add d_real_inode() helperMiklos Szeredi
commit a118084432d642eeccb961c7c8cc61525a941fcb upstream. Needed by the following fix. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27net_sched: fix mirrored packets checksumWANG Cong
[ Upstream commit 82a31b9231f02d9c1b7b290a46999d517b0d312a ] Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation") we need to fixup the checksum for CHECKSUM_COMPLETE when pushing skb on RX path. Otherwise we get similar splats. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27packet: Use symmetric hash for PACKET_FANOUT_HASH.David S. Miller
[ Upstream commit eb70db8756717b90c01ccc765fdefc4dd969fc74 ] People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same bucket. The core kernel SKB hash became non-symmetric when the ipv6 flow label and other entities were incorporated into the standard flow hash order to increase entropy. But there are no users of PACKET_FANOUT_HASH who want an assymetric hash, they all want a symmetric one. Therefore, use the flow dissector to compute a flat symmetric hash over only the protocol, addresses and ports. This hash does not get installed into and override the normal skb hash, so this change has no effect whatsoever on the rest of the stack. Reported-by: Eric Leblond <eric@regit.org> Tested-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27nfsd4/rpc: move backchannel create logic into rpc codeJ. Bruce Fields
commit d50039ea5ee63c589b0434baa5ecf6e5075bb6f9 upstream. Also simplify the logic a bit. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27locking/static_key: Fix concurrent static_key_slow_inc()Paolo Bonzini
commit 4c5ea0a9cd02d6aa8adc86e100b2a4cff8d614ff upstream. The following scenario is possible: CPU 1 CPU 2 static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 0, no increment jump_label_lock() atomic_inc_return() -> key.enabled == 1 now static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 1, inc to 2 return ** static key is wrong! jump_label_update() jump_label_unlock() Testing the static key at the point marked by (**) will follow the wrong path for jumps that have not been patched yet. This can actually happen when creating many KVM virtual machines with userspace LAPIC emulation; just run several copies of the following program: #include <fcntl.h> #include <unistd.h> #include <sys/ioctl.h> #include <linux/kvm.h> int main(void) { for (;;) { int kvmfd = open("/dev/kvm", O_RDONLY); int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); close(ioctl(vmfd, KVM_CREATE_VCPU, 1)); close(vmfd); close(kvmfd); } return 0; } Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call. The static key's purpose is to skip NULL pointer checks and indeed one of the processes eventually dereferences NULL. As explained in the commit that introduced the bug: 706249c222f6 ("locking/static_keys: Rework update logic") jump_label_update() needs key.enabled to be true. The solution adopted here is to temporarily make key.enabled == -1, and use go down the slow path when key.enabled <= 0. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 706249c222f6 ("locking/static_keys: Rework update logic") Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com [ Small stylistic edits to the changelog and the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27locking/qspinlock: Fix spin_unlock_wait() some morePeter Zijlstra
commit 2c610022711675ee908b903d242f0b90e1db661f upstream. While this prior commit: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") ... fixes spin_is_locked() and spin_unlock_wait() for the usage in ipc/sem and netfilter, it does not in fact work right for the usage in task_work and futex. So while the 2 locks crossed problem: spin_lock(A) spin_lock(B) if (!spin_is_locked(B)) spin_unlock_wait(A) foo() foo(); ... works with the smp_mb() injected by both spin_is_locked() and spin_unlock_wait(), this is not sufficient for: flag = 1; smp_mb(); spin_lock() spin_unlock_wait() if (!flag) // add to lockless list // iterate lockless list ... because in this scenario, the store from spin_lock() can be delayed past the load of flag, uncrossing the variables and loosing the guarantee. This patch reworks spin_is_locked() and spin_unlock_wait() to work in both cases by exploiting the observation that while the lock byte store can be delayed, the contender must have registered itself visibly in other state contained in the word. It also allows for architectures to override both functions, as PPC and ARM64 have an additional issue for which we currently have no generic solution. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27USB: EHCI: declare hostpc register as zero-length arrayAlan Stern
commit 7e8b3dfef16375dbfeb1f36a83eb9f27117c51fd upstream. The HOSTPC extension registers found in some EHCI implementations form a variable-length array, with one element for each port. Therefore the hostpc field in struct ehci_regs should be declared as a zero-length array, not a single-element array. This fixes a problem reported by UBSAN. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11bpf: try harder on clones when writing into skbDaniel Borkmann
[ Upstream commit 3697649ff29e0f647565eed04b27a7779c646a22 ] When we're dealing with clones and the area is not writeable, try harder and get a copy via pskb_expand_head(). Replace also other occurences in tc actions with the new skb_try_make_writable(). Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11bpf, perf: delay release of BPF prog after grace periodDaniel Borkmann
[ Upstream commit ceb56070359b7329b5678b5d95a376fcb24767be ] Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved destruction of BPF program from free_event_rcu() callback to __free_event(), which is problematic if used with tail calls: if prog A is attached as trace event directly, but at the same time present in a tail call map used by another trace event program elsewhere, then we need to delay destruction via RCU grace period since it can still be in use by the program doing the tail call (the prog first needs to be dropped from the tail call map, then trace event with prog A attached destroyed, so we get immediate destruction). Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Jann Horn <jann@thejh.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11sock_diag: do not broadcast raw socket destructionWillem de Bruijn
[ Upstream commit 9a0fee2b552b1235fb1706ae1fc664ae74573be8 ] Diag intends to broadcast tcp_sk and udp_sk socket destruction. Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not sufficient for this. Raw sockets can have the same type. Add a test for sk->sk_type. Fixes: eb4cb008529c ("sock_diag: define destruction multicast groups") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUGJason A. Donenfeld
[ Upstream commit daddef76c3deaaa7922f9d7b18edbf0a061215c3 ] The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the usual definition of the dynamic_pr_debug macro, but alter it by adding a call to "net_ratelimit()" in the if statement. This is, in fact, the correct approach. However, while doing this, the author of the commit forgot to surround fmt by pr_fmt, resulting in unprefixed log messages appearing in the console. So, this commit adds back the pr_fmt(fmt) invocation, making net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and DYNAMIC_DEBUG cases, and bringing parity with the behavior of dynamic_pr_debug as well. Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Tim Bingham <tbingham@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: introduce and use xt_copy_counters_from_userFlorian Westphal
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream. The three variants use same copy&pasted code, condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: xt_compat_match_from_user doesn't need a retvalFlorian Westphal
commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream. Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: check for bogus target offsetFlorian Westphal
commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream. We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add compat version of xt_check_entry_offsetsFlorian Westphal
commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream. 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add and use xt_check_entry_offsetsFlorian Westphal
commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream. Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding maskMarc Zyngier
commit dd5f1b049dc139876801db3cdd0f20d21fd428cc upstream. The INTID mask is wrong, and is made a signed value, which has nteresting effects in the KVM emulation. Let's sanitize it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devicesDavid Wragg
[ Upstream commit 7e059158d57b79159eaf1f504825d19866ef2c42 ] Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24uapi glibc compat: fix compilation when !__USE_MISC in glibcNicolas Dichtel
[ Upstream commit f0a3fdca794d1e68ae284ef4caefe681f7c18e89 ] These structures are defined only if __USE_MISC is set in glibc net/if.h headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined. CC: Jan Engelhardt <jengelh@inai.de> CC: Josh Boyer <jwboyer@fedoraproject.org> CC: Stephen Hemminger <shemming@brocade.com> CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de> CC: Gabriel Laskar <gabriel@lse.epita.fr> CC: Mikko Rapeli <mikko.rapeli@iki.fi> Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24switchdev: pass pointer to fib_info instead of copyJiri Pirko
[ Upstream commit da4ed55165d41b1073f9a476f1c18493e9bf8c8e ] The problem is that fib_info->nh is [0] so the struct fib_info allocation size depends on number of nexthops. If we just copy fib_info, we do not copy the nexthops info and driver accesses memory which is not ours. Given the fact that fib4 does not defer operations and therefore it does not need copy, just pass the pointer down to drivers as it was done before. Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07drm/imx: Match imx-ipuv3-crtc components using device node in platform dataPhilipp Zabel
commit 310944d148e3600dcff8b346bee7fa01d34903b1 upstream. The component master driver imx-drm-core matches component devices using their of_node. Since commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading"), the imx-ipuv3-crtc dev->of_node is not set during probing. Before that, of_node was set and caused an of: modalias to be used instead of the platform: modalias, which broke module autoloading. On the other hand, if dev->of_node is not set yet when the imx-ipuv3-crtc probe function calls component_add, component matching in imx-drm-core fails. While dev->of_node will be set once the next component tries to bring up the component master, imx-drm-core component binding will never succeed if one of the crtc devices is probed last. Add of_node to the component platform data and match against the pdata->of_node instead of dev->of_node in imx-drm-core to work around this problem. Fixes: 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading") Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Tested-by: Lothar Waßmann <LW@KARO-electronics.de> Tested-by: Heiko Schocher <hs@denx.de> Tested-by: Chris Ruehl <chris.ruehl@gtsys.com.hk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07pipe: limit the per-user amount of pages allocated in pipesWilly Tarreau
commit 759c01142a5d0f364a462346168a56de28a80f52 upstream. On no-so-small systems, it is possible for a single process to cause an OOM condition by filling large pipes with data that are never read. A typical process filling 4000 pipes with 1 MB of data will use 4 GB of memory. On small systems it may be tricky to set the pipe max size to prevent this from happening. This patch makes it possible to enforce a per-user soft limit above which new pipes will be limited to a single page, effectively limiting them to 4 kB each, as well as a hard limit above which no new pipes may be created for this user. This has the effect of protecting the system against memory abuse without hurting other users, and still allowing pipes to work correctly though with less data at once. The limit are controlled by two new sysctls : pipe-user-pages-soft, and pipe-user-pages-hard. Both may be disabled by setting them to zero. The default soft limit allows the default number of FDs per process (1024) to create pipes of the default size (64kB), thus reaching a limit of 64MB before starting to create only smaller pipes. With 256 processes limited to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB = 1084 MB of memory allocated for a user. The hard limit is disabled by default to avoid breaking existing applications that make intensive use of pipes (eg: for splicing). Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Moritz Muehlenhoff <moritz@wikimedia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07mm: use phys_addr_t for reserve_bootmem_region() argumentsStefan Bader
commit 4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5 upstream. Since commit 92923ca3aace ("mm: meminit: only set page reserved in the memblock region") the reserved bit is set on reserved memblock regions. However start and end address are passed as unsigned long. This is only 32bit on i386, so it can end up marking the wrong pages reserved for ranges at 4GB and above. This was observed on a 32bit Xen dom0 which was booted with initial memory set to a value below 4G but allowing to balloon in memory (dom0_mem=1024M for example). This would define a reserved bootmem region for the additional memory (for example on a 8GB system there was a reverved region covering the 4GB-8GB range). But since the addresses were passed on as unsigned long, this was actually marking all pages from 0 to 4GB as reserved. Fixes: 92923ca3aacef63 ("mm: meminit: only set page reserved in the memblock region") Link: http://lkml.kernel.org/r/1463491221-10573-1-git-send-email-stefan.bader@canonical.com Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01scsi: Add intermediate STARGET_REMOVE state to scsi_target_stateJohannes Thumshirn
commit f05795d3d771f30a7bdc3a138bf714b06d42aa95 upstream. Add intermediate STARGET_REMOVE state to scsi_target_state to avoid running into the BUG_ON() in scsi_target_reap(). The STARGET_REMOVE state is only valid in the path from scsi_remove_target() to scsi_target_destroy() indicating this target is going to be removed. This re-fixes the problem introduced in commits bc3f02a795d3 ("[SCSI] scsi_remove_target: fix softlockup regression on hot remove") and 40998193560d ("scsi: restart list search after unlock in scsi_remove_target") in a more comprehensive way. [mkp: Included James' fix for scsi_target_destroy()] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38 Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01SIGNAL: Move generic copy_siginfo() to signal.hJames Hogan
commit ca9eb49aa9562eaadf3cea071ec7018ad6800425 upstream. The generic copy_siginfo() is currently defined in asm-generic/siginfo.h, after including uapi/asm-generic/siginfo.h which defines the generic struct siginfo. However this makes it awkward for an architecture to use it if it has to define its own struct siginfo (e.g. MIPS and potentially IA64), since it means that asm-generic/siginfo.h can only be included after defining the arch-specific siginfo, which may be problematic if the arch-specific definition needs definitions from uapi/asm-generic/siginfo.h. It is possible to work around this by first including uapi/asm-generic/siginfo.h to get the constants before defining the arch-specific siginfo, and include asm-generic/siginfo.h after. However uapi headers can't be included by other uapi headers, so that first include has to be in an ifdef __kernel__, with the non __kernel__ case including the non-UAPI header instead. Instead of that mess, move the generic copy_siginfo() definition into linux/signal.h, which allows an arch-specific uapi/asm/siginfo.h to include asm-generic/siginfo.h and define the arch-specific siginfo, and for the generic copy_siginfo() to see that arch-specific definition. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Petr Malat <oss@malat.biz> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Christopher Ferris <cferris@google.com> Cc: linux-arch@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-ia64@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12478/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()Peter Zijlstra
commit 54cf809b9512be95f53ed4a5e3b631d1ac42f0fa upstream. Similar to commits: 51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()") d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") qspinlock suffers from the fact that the _Q_LOCKED_VAL store is unordered inside the ACQUIRE of the lock. And while this is not a problem for the regular mutual exclusive critical section usage of spinlocks, it breaks creative locking like: spin_lock(A) spin_lock(B) spin_unlock_wait(B) if (!spin_is_locked(A)) do_something() do_something() In that both CPUs can end up running do_something at the same time, because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait() spin_is_locked() loads (even on x86!!). To avoid making the normal case slower, add smp_mb()s to the less used spin_unlock_wait() / spin_is_locked() side of things to avoid this problem. Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net> Reported-by: Giovanni Gherdovich <ggherdovich@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01Fix OpenSSH pty regression on closeBrian Bloniarz
commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa upstream. OpenSSH expects the (non-blocking) read() of pty master to return EAGAIN only if it has received all of the slave-side output after it has received SIGCHLD. This used to work on pre-3.12 kernels. This fix effectively forces non-blocking read() and poll() to block for parallel i/o to complete for all ttys. It also unwinds these changes: 1) f8747d4a466ab2cafe56112c51b3379f9fdb7a12 tty: Fix pty master read() after slave closes 2) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73 pty, n_tty: Simplify input processing on final close 3) 1a48632ffed61352a7810ce089dc5a8bcd505a60 pty: Fix input race when closing Inspired by analysis and patch from Marc Aurele La France <tsi@tuyoix.net> Reported-by: Volth <openssh@volth.com> Reported-by: Marc Aurele La France <tsi@tuyoix.net> BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52 BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492 Signed-off-by: Brian Bloniarz <brian.bloniarz@gmail.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01USB: leave LPM alone if possible when binding/unbinding interface driversAlan Stern
commit 6fb650d43da3e7054984dc548eaa88765a94d49f upstream. When a USB driver is bound to an interface (either through probing or by claiming it) or is unbound from an interface, the USB core always disables Link Power Management during the transition and then re-enables it afterward. The reason is because the driver might want to prevent hub-initiated link power transitions, in which case the HCD would have to recalculate the various LPM parameters. This recalculation takes place when LPM is re-enabled and the new parameters are sent to the device and its parent hub. However, if the driver does not want to prevent hub-initiated link power transitions then none of this work is necessary. The parameters don't need to be recalculated, and LPM doesn't need to be disabled and re-enabled. It turns out that disabling and enabling LPM can be time-consuming, enough so that it interferes with user programs that want to claim and release interfaces rapidly via usbfs. Since the usbfs kernel driver doesn't set the disable_hub_initiated_lpm flag, we can speed things up and get the user programs to work by leaving LPM alone whenever the flag isn't set. And while we're improving the way disable_hub_initiated_lpm gets used, let's also fix its kerneldoc. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Matthew Giassa <matthew@giassa.net> CC: Mathias Nyman <mathias.nyman@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01can: fix handling of unmodifiable configuration optionsOliver Hartkopp
commit bb208f144cf3f59d8f89a09a80efd04389718907 upstream. As described in 'can: m_can: tag current CAN FD controllers as non-ISO' (6cfda7fbebe) it is possible to define fixed configuration options by setting the according bit in 'ctrlmode' and clear it in 'ctrlmode_supported'. This leads to the incovenience that the fixed configuration bits can not be passed by netlink even when they have the correct values (e.g. non-ISO, FD). This patch fixes that issue and not only allows fixed set bit values to be set again but now requires(!) to provide these fixed values at configuration time. A valid CAN FD configuration consists of a nominal/arbitration bittiming, a data bittiming and a control mode with CAN_CTRLMODE_FD set - which is now enforced by a new can_validate() function. This fix additionally removed the inconsistency that was prohibiting the support of 'CANFD-only' controller drivers, like the RCar CAN FD. For this reason a new helper can_set_static_ctrlmode() has been introduced to provide a proper interface to handle static enabled CAN controller options. Reported-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18regulator: s2mps11: Fix invalid selector mask and voltages for buck9Krzysztof Kozlowski
commit 3b672623079bb3e5685b8549e514f2dfaa564406 upstream. The buck9 regulator of S2MPS11 PMIC had incorrect vsel_mask (0xff instead of 0x1f) thus reading entire register as buck9's voltage. This effectively caused regulator core to interpret values as higher voltages than they were and then to set real voltage much lower than intended. The buck9 provides power to other regulators, including LDO13 and LDO19 which supply the MMC2 (SD card). On Odroid XU3/XU4 the lower voltage caused SD card detection errors on Odroid XU3/XU4: mmc1: card never left busy state mmc1: error -110 whilst initialising SD card During driver probe the regulator core was checking whether initial voltage matches the constraints. With incorrect vsel_mask of 0xff and default value of 0x50, the core interpreted this as 5 V which is outside of constraints (3-3.775 V). Then the regulator core was adjusting the voltage to match the constraints. With incorrect vsel_mask this new voltage mapped to a vere low voltage in the driver. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18vfs: add vfs_select_inode() helperMiklos Szeredi
commit 54d5ca871e72f2bb172ec9323497f01cd5091ec7 upstream. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18uapi glibc compat: fix compile errors when glibc net/if.h included before ↵Mikko Rapeli
linux/if.h MIME-Version: 1.0 [ Upstream commit 4a91cb61bb995e5571098188092e296192309c77 ] glibc's net/if.h contains copies of definitions from linux/if.h and these conflict and cause build failures if both files are included by application source code. Changes in uapi headers, which fixed header file dependencies to include linux/if.h when it was needed, e.g. commit 1ffad83d, made the net/if.h and linux/if.h incompatibilities visible as build failures for userspace applications like iproute2 and xtables-addons. This patch fixes compile errors when glibc net/if.h is included before linux/if.h: ./linux/if.h:99:21: error: redeclaration of enumerator ‘IFF_NOARP’ ./linux/if.h:98:23: error: redeclaration of enumerator ‘IFF_RUNNING’ ./linux/if.h:97:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’ ./linux/if.h:96:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’ ./linux/if.h:95:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’ ./linux/if.h:94:21: error: redeclaration of enumerator ‘IFF_DEBUG’ ./linux/if.h:93:25: error: redeclaration of enumerator ‘IFF_BROADCAST’ ./linux/if.h:92:19: error: redeclaration of enumerator ‘IFF_UP’ ./linux/if.h:252:8: error: redefinition of ‘struct ifconf’ ./linux/if.h:203:8: error: redefinition of ‘struct ifreq’ ./linux/if.h:169:8: error: redefinition of ‘struct ifmap’ ./linux/if.h:107:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’ ./linux/if.h:106:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’ ./linux/if.h:105:23: error: redeclaration of enumerator ‘IFF_PORTSEL’ ./linux/if.h:104:25: error: redeclaration of enumerator ‘IFF_MULTICAST’ ./linux/if.h:103:21: error: redeclaration of enumerator ‘IFF_SLAVE’ ./linux/if.h:102:22: error: redeclaration of enumerator ‘IFF_MASTER’ ./linux/if.h:101:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’ ./linux/if.h:100:23: error: redeclaration of enumerator ‘IFF_PROMISC’ The cases where linux/if.h is included before net/if.h need a similar fix in the glibc side, or the order of include files can be changed userspace code as a workaround. This change was tested in x86 userspace on Debian unstable with scripts/headers_compile_test.sh: $ make headers_install && \ cd usr/include && ../../scripts/headers_compile_test.sh -l -k ... cc -Wall -c -nostdinc -I /usr/lib/gcc/i586-linux-gnu/5/include -I /usr/lib/gcc/i586-linux-gnu/5/include-fixed -I . -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH/i586-linux-gnu -o /dev/null ./linux/if.h_libc_before_kernel.h PASSED libc before kernel test: ./linux/if.h Reported-by: Jan Engelhardt <jengelh@inai.de> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Reported-by: Stephen Hemminger <shemming@brocade.com> Reported-by: Waldemar Brodkorb <mail@waldemar-brodkorb.de> Cc: Gabriel Laskar <gabriel@lse.epita.fr> Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18net_sched: update hierarchical backlog tooWANG Cong
[ Upstream commit 2ccccf5fb43ff62b2b96cc58d95fc0b3596516e4 ] When the bottom qdisc decides to, for example, drop some packet, it calls qdisc_tree_decrease_qlen() to update the queue length for all its ancestors, we need to update the backlog too to keep the stats on root qdisc accurate. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18net_sched: introduce qdisc_replace() helperWANG Cong
[ Upstream commit 86a7996cc8a078793670d82ed97d5a99bb4e8496 ] Remove nearly duplicated code and prepare for the following patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG caseTim Bingham
[ Upstream commit 2c94b53738549d81dc7464a32117d1f5112c64d3 ] Prior to commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") the implementation of net_dbg_ratelimited() was buggy for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases. The bug was that net_ratelimit() was being called and, despite returning true, nothing was being printed to the console. This resulted in messages like the following - "net_ratelimit: %d callbacks suppressed" with no other output nearby. After commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") the bug is fixed for the DEBUG case. However, there's no output at all for CONFIG_DYNAMIC_DEBUG case. This patch restores debug output (if enabled) for the CONFIG_DYNAMIC_DEBUG case. Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG case. The implementation takes care to check that dynamic debugging is enabled before calling net_ratelimit(). Fixes: d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") Signed-off-by: Tim Bingham <tbingham@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18bpf: fix refcnt overflowAlexei Starovoitov
[ Upstream commit 92117d8443bc5afacc8d5ba82e541946310f106e ] On a system with >32Gbyte of phyiscal memory and infinite RLIMIT_MEMLOCK, the malicious application may overflow 32-bit bpf program refcnt. It's also possible to overflow map refcnt on 1Tb system. Impose 32k hard limit which means that the same bpf program or map cannot be shared by more than 32k processes. Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18net/mlx5e: Device's mtu field is u16 and not intSaeed Mahameed
[ Upstream commit 046339eaab26804f52f6604877f5674f70815b26 ] For set/query MTU port firmware commands the MTU field is 16 bits, here I changed all the "int mtu" parameters of the functions wrapping those firmware commands to be u16. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>