summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-10-09jiffies: Fix timeval conversion to jiffiesAndrew Hunter
commit d78c9300c51d6ceed9f6d078d4e9366f259de28c upstream. timeval_to_jiffies tried to round a timeval up to an integral number of jiffies, but the logic for doing so was incorrect: intervals corresponding to exactly N jiffies would become N+1. This manifested itself particularly repeatedly stopping/starting an itimer: setitimer(ITIMER_PROF, &val, NULL); setitimer(ITIMER_PROF, NULL, &val); would add a full tick to val, _even if it was exactly representable in terms of jiffies_ (say, the result of a previous rounding.) Doing this repeatedly would cause unbounded growth in val. So fix the math. Here's what was wrong with the conversion: we essentially computed (eliding seconds) jiffies = usec * (NSEC_PER_USEC/TICK_NSEC) by using scaling arithmetic, which took the best approximation of NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC = x/(2^USEC_JIFFIE_SC), and computed: jiffies = (usec * x) >> USEC_JIFFIE_SC and rounded this calculation up in the intermediate form (since we can't necessarily exactly represent TICK_NSEC in usec.) But the scaling arithmetic is a (very slight) *over*approximation of the true value; that is, instead of dividing by (1 usec/ 1 jiffie), we effectively divided by (1 usec/1 jiffie)-epsilon (rounding down). This would normally be fine, but we want to round timeouts up, and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this would be fine if our division was exact, but dividing this by the slightly smaller factor was equivalent to adding just _over_ 1 to the final result (instead of just _under_ 1, as desired.) In particular, with HZ=1000, we consistently computed that 10000 usec was 11 jiffies; the same was true for any exact multiple of TICK_NSEC. We could possibly still round in the intermediate form, adding something less than 2^USEC_JIFFIE_SC - 1, but easier still is to convert usec->nsec, round in nanoseconds, and then convert using time*spec*_to_jiffies. This adds one constant multiplication, and is not observably slower in microbenchmarks on recent x86 hardware. Tested: the following program: int main() { struct itimerval zero = {{0, 0}, {0, 0}}; /* Initially set to 10 ms. */ struct itimerval initial = zero; initial.it_interval.tv_usec = 10000; setitimer(ITIMER_PROF, &initial, NULL); /* Save and restore several times. */ for (size_t i = 0; i < 10; ++i) { struct itimerval prev; setitimer(ITIMER_PROF, &zero, &prev); /* on old kernels, this goes up by TICK_USEC every iteration */ printf("previous value: %ld %ld %ld %ld\n", prev.it_interval.tv_sec, prev.it_interval.tv_usec, prev.it_value.tv_sec, prev.it_value.tv_usec); setitimer(ITIMER_PROF, &prev, NULL); } return 0; } Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Paul Turner <pjt@google.com> Reported-by: Aaron Jacobs <jacobsa@google.com> Signed-off-by: Andrew Hunter <ahh@google.com> [jstultz: Tweaked to apply to 3.17-rc] Signed-off-by: John Stultz <john.stultz@linaro.org> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09media: vb2: fix VBI/poll regressionHans Verkuil
commit 58d75f4b1ce26324b4d809b18f94819843a98731 upstream. The recent conversion of saa7134 to vb2 unconvered a poll() bug that broke the teletext applications alevt and mtt. These applications expect that calling poll() without having called VIDIOC_STREAMON will cause poll() to return POLLERR. That did not happen in vb2. This patch fixes that behavior. It also fixes what should happen when poll() is called when STREAMON is called but no buffers have been queued. In that case poll() will also return POLLERR, but only for capture queues since output queues will always return POLLOUT anyway in that situation. This brings the vb2 behavior in line with the old videobuf behavior. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05crypto: ccp - Check for CCP before registering crypto algsTom Lendacky
commit c9f21cb6388898bfe69886d001316dae7ecc9a4b upstream. If the ccp is built as a built-in module, then ccp-crypto (whether built as a module or a built-in module) will be able to load and it will register its crypto algorithms. If the system does not have a CCP this will result in -ENODEV being returned whenever a command is attempted to be queued by the registered crypto algorithms. Add an API, ccp_present(), that checks for the presence of a CCP on the system. The ccp-crypto module can use this to determine if it should register it's crypto alogorithms. Reported-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05vgaswitcheroo: add vga_switcheroo_fini_domain_pm_opsAlex Deucher
commit 766a53d059d1500c9755c8af017bd411bd8f1b20 upstream. Drivers should call this on unload to unregister pmops. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=84431 Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05PCI: Add pci_ignore_hotplug() to ignore hotplug events for a deviceBjorn Helgaas
commit b440bde74f043c8ec31081cb59c9a53ade954701 upstream. Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold), normally generates a hot-remove event that unbinds the driver. Some drivers expect to remain bound to a device even while they power it off and back on again. This can be dangerous, because if the device is removed or replaced while it is powered off, the driver doesn't know that anything changed. But some drivers accept that risk. Add pci_ignore_hotplug() for use by drivers that know their device cannot be removed. Using pci_ignore_hotplug() tells the PCI core that hot-plug events for the device should be ignored. The radeon and nouveau drivers use this to switch between a low-power, integrated GPU and a higher-power, higher-performance discrete GPU. They power off the unused GPU, but they want to remain bound to it. This is a reimplementation of f244d8b623da ("ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug") but extends it to work with both acpiphp and pciehp. This fixes a problem where systems with dual GPUs using the radeon drivers become unusable, freezing every few seconds (see bugzillas below). The resume of the radeon device may also fail, e.g., This fixes problems on dual GPU systems where the radeon driver becomes unusable because of problems while suspending the device, as in bug 79701: [drm] radeon: finishing device. radeon 0000:01:00.0: Userspace still has active objects ! radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free ... WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]() trying to unbind memory from uninitialized GART ! or while resuming it, as in bug 77261: radeon 0000:01:00.0: ring 0 stalled for more than 10158msec radeon 0000:01:00.0: GPU lockup ... radeon 0000:01:00.0: GPU pci config reset pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1) radeon 0000:01:00.0: GPU reset succeeded, trying to resume *ERROR* radeon: dpm resume failed radeon 0000:01:00.0: Wait for MC idle timedout ! Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261 Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701 Reported-by: Shawn Starr <shawn.starr@rogers.com> Reported-by: Jose P. <lbdkmjdf@sharklasers.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajat Jain <rajatxjain@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05ftrace: Allow ftrace_ops to use the hashes from other opsSteven Rostedt (Red Hat)
commit 33b7f99cf003ca6c1d31c42b50e1100ad71aaec0 upstream. Currently the top level debug file system function tracer shares its ftrace_ops with the function graph tracer. This was thought to be fine because the tracers are not used together, as one can only enable function or function_graph tracer in the current_tracer file. But that assumption proved to be incorrect. The function profiler can use the function graph tracer when function tracing is enabled. Since all function graph users uses the function tracing ftrace_ops this causes a conflict and when a user enables both function profiling as well as the function tracer it will crash ftrace and disable it. The quick solution so far is to move them as separate ftrace_ops like it was earlier. The problem though is to synchronize the functions that are traced because both function and function_graph tracer are limited by the selections made in the set_ftrace_filter and set_ftrace_notrace files. To handle this, a new structure is made called ftrace_ops_hash. This structure will now hold the filter_hash and notrace_hash, and the ftrace_ops will point to this structure. That will allow two ftrace_ops to share the same hashes. Since most ftrace_ops do not share the hashes, and to keep allocation simple, the ftrace_ops structure will include both a pointer to the ftrace_ops_hash called func_hash, as well as the structure itself, called local_hash. When the ops are registered, the func_hash pointer will be initialized to point to the local_hash within the ftrace_ops structure. Some of the ftrace internal ftrace_ops will be initialized statically. This will allow for the function and function_graph tracer to have separate ops but still share the same hash tables that determine what functions they trace. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05lockdep: Revert lockdep check in raw_seqcount_begin()Trond Myklebust
commit 22fdcf02f6e80d64a927f702dd9d631a927d87d4 upstream. This commit reverts the addition of lockdep checking to raw_seqcount_begin for the following reasons: 1) It violates the naming convention that raw_* functions should not do lockdep checks (a convention that is also followed by the other raw_*_seqcount_begin functions). 2) raw_seqcount_begin does not spin, so it can only be part of an ABBA deadlock in very special circumstances (for instance if a lock is held across the entire raw_seqcount_begin()+read_seqcount_retry() loop while also being taken inside the write_seqcount protected area). 3) It is causing false positives with some existing callers, and there is no non-lockdep alternative for those callers to use. None of the three existing callers (__d_lookup_rcu, netdev_get_name, and the NFS state code) appear to use the function in a manner that is ABBA deadlock prone. Fixes: 1ca7d67cf5d5: seqcount: Add lockdep functionality to seqcount/seqlock Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CAHQdGtRR6SvEhXiqWo24hoUh9AU9cL82Z8Z-d8-7u951F_d+5g@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05regulatory: add NUL to alpha2Eliad Peller
commit a5fe8e7695dc3f547e955ad2b662e3e72969e506 upstream. alpha2 is defined as 2-chars array, but is used in multiple places as string (e.g. with nla_put_string calls), which might leak kernel data. Solve it by simply adding an extra char for the NULL terminator, making such operations safe. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()Tejun Heo
commit e09c2c295468476a239d13324ce9042ec4de05eb upstream. create_singlethread_workqueue() is a compat interface for single threaded workqueue which maps to ordered workqueue w/ rescuer in the current implementation. create_singlethread_workqueue() currently implemented by invoking alloc_workqueue() w/ appropriate parameters. 8719dceae2f9 ("workqueue: reject adjusting max_active or applying attrs to ordered workqueues") introduced __WQ_ORDERED to protect ordered workqueues against dynamic attribute changes which can break ordering guarantees but forgot to apply it to create_singlethread_workqueue(). This in itself is okay as nobody currently uses dynamic attribute change on workqueues created with create_singlethread_workqueue(). However, 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues") broke singlethreaded guarantee for ordered workqueues through allocating a separate pool_workqueue on each NUMA node by default. A later change 8a2b75384444 ("workqueue: fix ordered workqueues in NUMA setups") fixed it by allocating only one global pool_workqueue if __WQ_ORDERED is set. Combined, the __WQ_ORDERED omission in create_singlethread_workqueue() became critical breaking its single threadedness and ordering guarantee. Let's make create_singlethread_workqueue() wrap alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and can implicitly track future ordered_workqueue changes. v2: I missed that __WQ_ORDERED now protects against pwq splitting across NUMA nodes and incorrectly described the patch as a nice-to-have fix to protect against future dynamic attribute usages. Oleg pointed out that this is actually a critical breakage due to 8a2b75384444 ("workqueue: fix ordered workqueues in NUMA setups"). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Mike Anderson <mike.anderson@us.ibm.com> Cc: Oleg Nesterov <onestero@redhat.com> Cc: Gustavo Luiz Duarte <gduarte@redhat.com> Cc: Tomas Henzl <thenzl@redhat.com> Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05iio:trigger: modify return value for iio_trigger_getSrinivas Pandruvada
commit f153566570fb9e32c2f59182883f4f66048788fb upstream. Instead of a void function, return the trigger pointer. Whilst not in of itself a fix, this makes the following set of 7 fixes cleaner than they would otherwise be. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05ACPI / hotplug: Generate online uevents for ACPI containersRafael J. Wysocki
commit 8ab17fc92e49bc2b8fff9d220c19bf50ec9c1158 upstream. Commit 46394fd01 (ACPI / hotplug: Move container-specific code out of the core) removed the generation of "online" uevents for containers, because "add" uevents are now generated for them automatically when container system devices are registered. However, there are user space tools that need to be notified when the container and all of its children have been enumerated, which doesn't happen any more. For this reason, add a mechanism allowing "online" uevents to be generated for ACPI containers after enumerating the container along with all of its children. Fixes: 46394fd01 (ACPI / hotplug: Move container-specific code out of the core) Reported-and-tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05nfs: don't sleep with inode lock in lock_and_join_requestsWeston Andros Adamson
commit 7c3af975257383ece54b83c0505d3e0656cb7daf upstream. This handles the 'nonblock=false' case in nfs_lock_and_join_requests. If the group is already locked and blocking is allowed, drop the inode lock and wait for the group lock to be cleared before trying it all again. This should fix warnings found in peterz's tree (sched/wait branch), where might_sleep() checks are added to wait.[ch]. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Reviewed-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05nfs: check wait_on_bit_lock err in page_group_lockWeston Andros Adamson
commit e7029206ff43f6cf7d6fcb741adb126f47200516 upstream. Return errors from wait_on_bit_lock from nfs_page_group_lock. Add a bool argument @wait to nfs_page_group_lock. If true, loop over wait_on_bit_lock until it returns cleanly. If false, return the error from wait_on_bit_lock. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05nfs: remove pgio_header refcount, related cleanupWeston Andros Adamson
commit 4714fb51fd03a14d8c73001438283e7f7b752f1e upstream. The refcounting on nfs_pgio_header was related to there being (possibly) more than one nfs_pgio_data. Now that nfs_pgio_data has been merged into nfs_pgio_header, there is no reason to do this ref counting. Just call the completion callback on nfs_pgio_release/nfs_pgio_error. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05nfs: merge nfs_pgio_data into _headerWeston Andros Adamson
commit d45f60c67848b9f19160692581d78e5b4757a000 upstream. struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is passed around everywhere, because there used to be multiple _data structs per _header. Many of these functions then use the _data to find a pointer to the _header. This patch cleans this up by merging the nfs_pgio_data structure into nfs_pgio_header and passing nfs_pgio_header around instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05nfs: rename members of nfs_pgio_dataWeston Andros Adamson
commit 823b0c9d9800e712374cda89ac3565bd29f6701b upstream. Rename "verf" to "writeverf" and "pages" to "page_array" to prepare for merge of nfs_pgio_data and nfs_pgio_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05nfs: move nfs_pgio_data and remove nfs_rw_headerWeston Andros Adamson
commit 1e7f3a485922211b6e4a082ebc6bf05810b0b6ea upstream. nfs_rw_header was used to allocate an nfs_pgio_header along with an nfs_pgio_data, because a _header would need at least one _data. Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move it to nfs_pgio_header and get rid of nfs_rw_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05drm/radeon: properly document reloc priority maskChristian König
commit 701e1e789142042144c8cc10b8f6d1554e960144 upstream. Instead of hard coding the value properly document that this is an userspace interface. No intended functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05xattr: fix check for simultaneous glibc header inclusionFilipe Brandenburger
commit bfcfd44cce2774f19daeb59fb4e43fc9aa80e7b8 upstream. The guard was introduced in commit ea1a8217b06b ("xattr: guard against simultaneous glibc header inclusion") but it is using #ifdef to check for a define that is either set to 1 or 0. Fix it to use #if instead. * Without this patch: $ { echo "#include <sys/xattr.h>"; echo "#include <linux/xattr.h>"; } | gcc -E -Iinclude/uapi - >/dev/null include/uapi/linux/xattr.h:19:0: warning: "XATTR_CREATE" redefined [enabled by default] #define XATTR_CREATE 0x1 /* set value, fail if attr already exists */ ^ /usr/include/x86_64-linux-gnu/sys/xattr.h:32:0: note: this is the location of the previous definition #define XATTR_CREATE XATTR_CREATE ^ * With this patch: $ { echo "#include <sys/xattr.h>"; echo "#include <linux/xattr.h>"; } | gcc -E -Iinclude/uapi - >/dev/null (no warnings) Signed-off-by: Filipe Brandenburger <filbranden@google.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Allan McRae <allan@archlinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05drm/ttm: fix handling of TTM_PL_FLAG_TOPDOWN v2Christian König
commit e3f202798aaa808e7a38faa8c3a9f0aa93b85cc0 upstream. bo->mem.placement is not initialized when ttm_bo_man_get_node is called, so the flag had no effect at all. v2: change nouveau and vmwgfx as well Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17RDMA/uapi: Include socket.h in rdma_user_cm.hDoug Ledford
commit db1044d458a287c18c4d413adc4ad12e92e253b5 upstream. added struct sockaddr_storage to rdma_user_cm.h without also adding an include for linux/socket.h to make sure it is defined. Systemtap needs the header files to build standalone and cannot rely on other files to pre-include other headers, so add linux/socket.h to the list of includes in this file. Fixes: ee7aed4528f ("RDMA/ucma: Support querying for AF_IB addresses") Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17mnt: Correct permission checks in do_remountEric W. Biederman
commit 9566d6742852c527bf5af38af5cbb878dad75705 upstream. While invesgiating the issue where in "mount --bind -oremount,ro ..." would result in later "mount --bind -oremount,rw" succeeding even if the mount started off locked I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags in addition to MNT_READONLY should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changable if set by a more privileged user. The following additions to the current logic are added in this patch. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privielged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17mnt: Only change user settable mount flags in remountEric W. Biederman
commit a6138db815df5ee542d848318e5dae681590fccd upstream. Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17ACPI / scan: not cache _SUN value in struct acpi_device_pnpYasuaki Ishimatsu
commit a383b68d9fe9864c4d3b86f67ad6488f58136435 upstream. The _SUN device indentification object is not guaranteed to return the same value every time it is executed, so we should not cache its return value, but rather execute it every time as needed. If it is cached, an incorrect stale value may be used in some situations. This issue was exposed by commit 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace). Fix it by avoiding to cache the return value of _SUN. Fixes: 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace) Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17scsi: do not issue SCSI RSOC command to Promise Vtrak E610fJanusz Dziemidowicz
commit 0213436a2cc5e4a5ca2fabfaa4d3877097f3b13f upstream. Some devices don't like REPORT SUPPORTED OPERATION CODES and will simply timeout causing sd_mod init to take a very very long time. Introduce BLIST_NO_RSOC scsi scan flag, that stops RSOC from being issued. Add it to Promise Vtrak E610f entry in scsi scan blacklist. Fixes bug #79901 reported at https://bugzilla.kernel.org/show_bug.cgi?id=79901 Fixes: 98dcc2946adb ("SCSI: sd: Update WRITE SAME heuristics") Signed-off-by: Janusz Dziemidowicz <rraptorr@nails.eu.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17scsi: add a blacklist flag which enables VPD page inquiriesMartin K. Petersen
commit c1d40a527e885a40bb9ea6c46a1b1145d42b66a0 upstream. Despite supporting modern SCSI features some storage devices continue to claim conformance to an older version of the SPC spec. This is done for compatibility with legacy operating systems. Linux by default will not attempt to read VPD pages on devices that claim SPC-2 or older. Introduce a blacklist flag that can be used to trigger VPD page inquiries on devices that are known to support them. Reported-by: KY Srinivasan <kys@microsoft.com> Tested-by: KY Srinivasan <kys@microsoft.com> Reviewed-by: KY Srinivasan <kys@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17scsi_scan: Restrict sequential scan to 256 LUNsHannes Reinecke
commit 22ffeb48b7584d6cd50f2a595ed6065d86a87459 upstream. Sequential scan for more than 256 LUNs is very fragile as LUNs might not be numbered sequentially after that point. SAM revisions later than SCSI-3 impose a structure on LUNs larger than 256, making LUN numbers between 256 and 16384 illegal. SCSI-3, however allows for plain 64-bit numbers with no internal structure. So restrict sequential LUN scan to 256 LUNs and add a new blacklist flag 'BLIST_SCSI3LUN' to scan up to max_lun devices. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17fanotify: fix double free of pending permission eventsJan Kara
commit 5838d4442bd5971687b72221736222637e03140d upstream. Commit 85816794240b ("fanotify: Fix use after free for permission events") introduced a double free issue for permission events which are pending in group's notification queue while group is being destroyed. These events are freed from fanotify_handle_event() but they are not removed from groups notification queue and thus they get freed again from fsnotify_flush_notify(). Fix the problem by removing permission events from notification queue before freeing them if we skip processing access response. Also expand comments in fanotify_release() to explain group shutdown in detail. Fixes: 85816794240b9659e66e4d9b0df7c6e814e5f603 Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Douglas Leeder <douglas.leeder@sophos.com> Tested-by: Douglas Leeder <douglas.leeder@sophos.com> Reported-by: Heinrich Schuchard <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17CAPABILITIES: remove undefined caps from all processesEric Paris
commit 7d8b6c63751cfbbe5eef81a48c22978b3407a3ad upstream. This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744 plus fixing it a different way... We found, when trying to run an application from an application which had dropped privs that the kernel does security checks on undefined capability bits. This was ESPECIALLY difficult to debug as those undefined bits are hidden from /proc/$PID/status. Consider a root application which drops all capabilities from ALL 4 capability sets. We assume, since the application is going to set eff/perm/inh from an array that it will clear not only the defined caps less than CAP_LAST_CAP, but also the higher 28ish bits which are undefined future capabilities. The BSET gets cleared differently. Instead it is cleared one bit at a time. The problem here is that in security/commoncap.c::cap_task_prctl() we actually check the validity of a capability being read. So any task which attempts to 'read all things set in bset' followed by 'unset all things set in bset' will not even attempt to unset the undefined bits higher than CAP_LAST_CAP. So the 'parent' will look something like: CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffc000000000 All of this 'should' be fine. Given that these are undefined bits that aren't supposed to have anything to do with permissions. But they do... So lets now consider a task which cleared the eff/perm/inh completely and cleared all of the valid caps in the bset (but not the invalid caps it couldn't read out of the kernel). We know that this is exactly what the libcap-ng library does and what the go capabilities library does. They both leave you in that above situation if you try to clear all of you capapabilities from all 4 sets. If that root task calls execve() the child task will pick up all caps not blocked by the bset. The bset however does not block bits higher than CAP_LAST_CAP. So now the child task has bits in eff which are not in the parent. These are 'meaningless' undefined bits, but still bits which the parent doesn't have. The problem is now in cred_cap_issubset() (or any operation which does a subset test) as the child, while a subset for valid cap bits, is not a subset for invalid cap bits! So now we set durring commit creds that the child is not dumpable. Given it is 'more priv' than its parent. It also means the parent cannot ptrace the child and other stupidity. The solution here: 1) stop hiding capability bits in status This makes debugging easier! 2) stop giving any task undefined capability bits. it's simple, it you don't put those invalid bits in CAP_FULL_SET you won't get them in init and you won't get them in any other task either. This fixes the cap_issubset() tests and resulting fallout (which made the init task in a docker container untraceable among other things) 3) mask out undefined bits when sys_capset() is called as it might use ~0, ~0 to denote 'all capabilities' for backward/forward compatibility. This lets 'capsh --caps="all=eip" -- -c /bin/bash' run. 4) mask out undefined bit when we read a file capability off of disk as again likely all bits are set in the xattr for forward/backward compatibility. This lets 'setcap all+pe /bin/bash; /bin/bash' run Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Dan Walsh <dwalsh@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-17tpm: Provide a generic means to override the chip returned timeoutsJason Gunthorpe
commit 8e54caf407b98efa05409e1fee0e5381abd2b088 upstream. Some Atmel TPMs provide completely wrong timeouts from their TPM_CAP_PROP_TIS_TIMEOUT query. This patch detects that and returns new correct values via a DID/VID table in the TIS driver. Tested on ARM using an AT97SC3204T FW version 37.16 [PHuewe: without this fix these 'broken' Atmel TPMs won't function on older kernels] Signed-off-by: "Berg, Christopher" <Christopher.Berg@atmel.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2014-09-05svcrdma: Select NFSv4.1 backchannel transport based on forward channelChuck Lever
commit 3c45ddf823d679a820adddd53b52c6699c9a05ac upstream. The current code always selects XPRT_TRANSPORT_BC_TCP for the back channel, even when the forward channel was not TCP (eg, RDMA). When a 4.1 mount is attempted with RDMA, the server panics in the TCP BC code when trying to send CB_NULL. Instead, construct the transport protocol number from the forward channel transport or'd with XPRT_TRANSPORT_BC. Transports that do not support bi-directional RPC will not have registered a "BC" transport, causing create_backchannel_client() to fail immediately. Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05jbd2: fix descriptor block size handling errors with journal_csumDarrick J. Wong
commit db9ee220361de03ee86388f9ea5e529eaad5323c upstream. It turns out that there are some serious problems with the on-disk format of journal checksum v2. The foremost is that the function to calculate descriptor tag size returns sizes that are too big. This causes alignment issues on some architectures and is compounded by the fact that some parts of jbd2 use the structure size (incorrectly) to determine the presence of a 64bit journal instead of checking the feature flags. Therefore, introduce journal checksum v3, which enlarges the descriptor block tag format to allow for full 32-bit checksums of journal blocks, fix the journal tag function to return the correct sizes, and fix the jbd2 recovery code to use feature flags to determine 64bitness. Add a few function helpers so we don't have to open-code quite so many pieces. Switching to a 16-byte block size was found to increase journal size overhead by a maximum of 0.1%, to convert a 32-bit journal with no checksumming to a 32-bit journal with checksum v3 enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05drm/radeon: add additional SI pci idsAlex Deucher
commit 37dbeab788a8f23fd946c0be083e5484d6f929a1 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05drm/radeon: add new bonaire pci idsAlex Deucher
commit 5fc540edc8ea1297c76685f74bc82a2107fe6731 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-05drm/radeon: add new KV pci idAlex Deucher
commit 6dc14baf4ced769017c7a7045019c7a19f373865 upstream. bug: https://bugs.freedesktop.org/show_bug.cgi?id=82912 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"Dmitry Popov
[ Upstream commit 95cb5745983c222867cc9ac593aebb2ad67d72c0 ] Ipv4 tunnels created with "local any remote $ip" didn't work properly since 7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels had src addr = 0.0.0.0. That was because only dst_entry was cached, although fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry (tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit(). This patch adds saddr to ip_tunnel->dst_cache, fixing this issue. Reported-by: Sergey Popov <pinkbyte@gentoo.org> Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-30kexec: export free_huge_page to VMCOREINFOAtsushi Kumagai
PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe1 ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need another symbol to filter *hugetlbfs* pages. If a user hope to filter user pages, makedumpfile tries to exclude them by checking the condition whether the page is anonymous, but hugetlbfs pages aren't anonymous while they also be user pages. We know it's possible to detect them in the same way as PageHuge(), so we need the start address of free_huge_page(): int PageHuge(struct page *page) { if (!PageCompound(page)) return 0; page = compound_head(page); return get_compound_page_dtor(page) == free_huge_page; } For that reason, this patch changes free_huge_page() into public to export it to VMCOREINFO. Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Acked-by: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linuxLinus Torvalds
Pull Exynos platform DT fix from Grant Likely: "Device tree Exynos bug fix for v3.16-rc7 This bug fix has been brewing for a while. I hate sending it to you so late, but I only got confirmation that it solves the problem this past weekend. The diff looks big for a bug fix, but the majority of it is only executed in the Exynos quirk case. Unfortunately it required splitting early_init_dt_scan() in two and adding quirk handling in the middle of it on ARM. Exynos has buggy firmware that puts bad data into the memory node. Commit 1c2f87c22566 ("ARM: Get rid of meminfo") exposed the bug by dropping the artificial upper bound on the number of memory banks that can be added. Exynos fails to boot after that commit. This branch fixes it by splitting the early DT parse function and inserting a fixup hook. Exynos uses the hook to correct the DT before parsing memory regions" * tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: arm: Add devicetree fixup machine function of: Add memory limiting function for flattened devicetrees of: Split early_init_dt_scan into two parts
2014-07-30Merge tag 'stable/for-linus-3.16-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fix from David Vrabel: "Fix BUG when trying to expand the grant table. This seems to occur often during boot with Ubuntu 14.04 PV guests" * tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: safely map and unmap grant frames when in atomic context
2014-07-30Revert "cdc_subset: deal with a device that needs reset for timeout"Linus Torvalds
This reverts commit 20fbe3ae990fd54fc7d1f889d61958bc8b38f254. As reported by Stephen Rothwell, it causes compile failures in certain configurations: drivers/net/usb/cdc_subset.c:360:15: error: 'dummy_prereset' undeclared here (not in a function) .pre_reset = dummy_prereset, ^ drivers/net/usb/cdc_subset.c:361:16: error: 'dummy_postreset' undeclared here (not in a function) .post_reset = dummy_postreset, ^ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: David Miller <davem@davemloft.net> Cc: Oliver Neukum <oneukum@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Make fragmentation IDs less predictable, from Eric Dumazet. 2) TSO tunneling can crash in bnx2x driver, fix from Dmitry Kravkov. 3) Don't allow NULL msg->msg_name just because msg->msg_namelen is non-zero, from Andrey Ryabinin. 4) ndm->ndm_type set using wrong macros, from Jun Zhao. 5) cdc-ether devices can come up with entries in their address filter, so explicitly clear the filter after the device initializes. From Oliver Neukum. 6) Forgotten refcount bump in xfrm_lookup(), from Steffen Klassert. 7) Short packets not padded properly, exposing random data, in bcmgenet driver. Fix from Florian Fainelli. 8) xgbe_probe() doesn't return an error code, but rather zero, when netif_set_real_num_tx_queues() fails. Fix from Wei Yongjun. 9) USB speed not probed properly in r8152 driver, from Hayes Wang. 10) Transmit logic choosing the outgoing port in the sunvnet driver needs to consider a) is the port actually up and b) whether it is a switch port. Fix from David L Stevens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits) net: phy: re-apply PHY fixups during phy_register_device cdc-ether: clean packet filter upon probe cdc_subset: deal with a device that needs reset for timeout net: sendmsg: fix NULL pointer dereference isdn/bas_gigaset: fix a leak on failure path in gigaset_probe() ip: make IP identifiers less predictable neighbour : fix ndm_type type error issue sunvnet: only use connected ports when sending can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource() bnx2x: fix crash during TSO tunneling r8152: fix the checking of the usb speed net: phy: Ensure the MDIO bus module is held net: phy: Set the driver when registering an MDIO bus device bnx2x: fix set_setting for some PHYs hyperv: Fix error return code in netvsc_init_buf() amd-xgbe: Fix error return code in xgbe_probe() ath9k: fix aggregation session lockup net: bcmgenet: correctly pad short packets net: sctp: inherit auth_capable on INIT collisions mac80211: fix crash on getting sta info with uninitialized rate control ...
2014-07-30x86/xen: safely map and unmap grant frames when in atomic contextDavid Vrabel
arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in atomic context but were calling alloc_vm_area() which might sleep. Also, if a driver attempts to allocate a grant ref from an interrupt and the table needs expanding, then the CPU may already by in lazy MMU mode and apply_to_page_range() will BUG when it tries to re-enable lazy MMU mode. These two functions are only used in PV guests. Introduce arch_gnttab_init() to allocates the virtual address space in advance. Avoid the use of apply_to_page_range() by using saving and using the array of PTE addresses from the alloc_vm_area() call (which ensures that the required page tables are pre-allocated). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-07-29of: Add memory limiting function for flattened devicetreesLaura Abbott
Buggy bootloaders may pass bogus memory entries in the devicetree. Add of_fdt_limit_memory to add an upper bound on the number of entries that can be present in the devicetree. Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Tested-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Grant Likely <grant.likely@linaro.org>
2014-07-29of: Split early_init_dt_scan into two partsLaura Abbott
Currently, early_init_dt_scan validates the header, sets the boot params, and scans for chosen/memory all in one function. Split this up into two separate functions (validation/setting boot params in one, scanning in another) to allow for additional setup between boot params and scanning the memory. Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Tested-by: Andreas Färber <afaerber@suse.de> [glikely: s/early_init_dt_scan_all/early_init_dt_scan_nodes/] Signed-off-by: Grant Likely <grant.likely@linaro.org>
2014-07-29cdc_subset: deal with a device that needs reset for timeoutOliver Neukum
This device needs to be reset to recover from a timeout. Unfortunately this can be handled only at the level of the subdrivers. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "A nice small set of bug fixes for arm-soc: - two incorrect register addresses in DT files on shmobile and hisilicon - one revert for a regression on omap - one bug fix for a newly introduced pin controller binding - one regression fix for the memory controller on omap - one patch to avoid a harmless WARN_ON" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: Revert enabling of twl configuration for n900 ARM: dts: fix L2 address in Hi3620 ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable() pinctrl: dra: dt-bindings: Fix pull enable/disable ARM: shmobile: r8a7791: Fix SD2CKCR register address ARM: OMAP2+: l2c: squelch warning dump on power control setting
2014-07-28ip: make IP identifiers less predictableEric Dumazet
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways exploiting linux IP identifier generation to infer whether two machines are exchanging packets. With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if generator is used at most once per jiffy, this patch inserts no hole in the ID suite and do not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions uses a per flow ID generator not changed by this patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu> Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Hannes Frederic Sowa <hannes@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "These two pathes fix issues with the kernel-userspace protocol changes in v3.15" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT fuse: s_time_gran fix
2014-07-24Merge tag 'omap-for-v3.16/fixes-rc6' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Merge "Two regression fixes for omaps and one fix for device signaling" from Tony Lindgren: - L2 cache regression fix for a warning about trying to access a read-only register - GPMC ECC software fallback regression fix for omap3 - Fix for dra7 pinctrl pull-up direction that causes signal issues for anybody trying to use the internal pull up or down * tag 'omap-for-v3.16/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable() pinctrl: dra: dt-bindings: Fix pull enable/disable ARM: OMAP2+: l2c: squelch warning dump on power control setting Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-07-23Merge branch 'for-3.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata regression fix from Tejun Heo: "The last libata/for-3.16-fixes pull contained a regression introduced by 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32") which in turn was a fix for a regression introduced earlier while changing queue tag order to accomodate hard drives which perform poorly if tags are not allocated in circular order (ugh...). The regression happens only for SAS controllers making use of libata to serve ATA devices. They don't fill an ata_host field which is used by the new tag allocation function leading to NULL dereference. This patch adds a new intermediate field ata_host->n_tags which is initialized for both SAS and !SAS cases to fix the issue" * 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata: introduce ata_host->n_tags to avoid oops on SAS controllers