summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2012-01-12usb: ch9: fix up MaxStreams helperFelipe Balbi
commit 18b7ede5f7ee2092aedcb578d3ac30bd5d4fc23c upstream. [ removed the dwc3 portion of the patch as it didn't apply to older kernels - gregkh] According to USB 3.0 Specification Table 9-22, if bmAttributes [4:0] are set to zero, it means "no streams supported", but the way this helper was defined on Linux, we will *always* have one stream which might cause several problems. For example on DWC3, we would tell the controller endpoint has streams enabled and yet start transfers with Stream ID set to 0, which would goof up the host side. While doing that, convert the macro to an inline function due to the different checks we now need. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-12usb: fix number of mapped SG DMA entriesClemens Ladisch
commit bc677d5b64644c399cd3db6a905453e611f402ab upstream. Add a new field num_mapped_sgs to struct urb so that we have a place to store the number of mapped entries and can also retain the original value of entries in num_sgs. Previously, usb_hcd_map_urb_for_dma() would overwrite this with the number of mapped entries, which would break dma_unmap_sg() because it requires the original number of entries. This fixes warnings like the following when using USB storage devices: ------------[ cut here ]------------ WARNING: at lib/dma-debug.c:902 check_unmap+0x4e4/0x695() ehci_hcd 0000:00:12.2: DMA-API: device driver frees DMA sg list with different entry count [map count=4] [unmap count=1] Modules linked in: ohci_hcd ehci_hcd Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #319 Call Trace: <IRQ> [<ffffffff81036d3b>] warn_slowpath_common+0x80/0x98 [<ffffffff81036de7>] warn_slowpath_fmt+0x41/0x43 [<ffffffff811fa5ae>] check_unmap+0x4e4/0x695 [<ffffffff8105e92c>] ? trace_hardirqs_off+0xd/0xf [<ffffffff8147208b>] ? _raw_spin_unlock_irqrestore+0x33/0x50 [<ffffffff811fa84a>] debug_dma_unmap_sg+0xeb/0x117 [<ffffffff8137b02f>] usb_hcd_unmap_urb_for_dma+0x71/0x188 [<ffffffff8137b166>] unmap_urb_for_dma+0x20/0x22 [<ffffffff8137b1c5>] usb_hcd_giveback_urb+0x5d/0xc0 [<ffffffffa0000d02>] ehci_urb_done+0xf7/0x10c [ehci_hcd] [<ffffffffa0001140>] qh_completions+0x429/0x4bd [ehci_hcd] [<ffffffffa000340a>] ehci_work+0x95/0x9c0 [ehci_hcd] ... ---[ end trace f29ac88a5a48c580 ]--- Mapped at: [<ffffffff811faac4>] debug_dma_map_sg+0x45/0x139 [<ffffffff8137bc0b>] usb_hcd_map_urb_for_dma+0x22e/0x478 [<ffffffff8137c494>] usb_hcd_submit_urb+0x63f/0x6fa [<ffffffff8137d01c>] usb_submit_urb+0x2c7/0x2de [<ffffffff8137dcd4>] usb_sg_wait+0x55/0x161 Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06mfd: Turn on the twl4030-madc MADC clockKyle Manna
commit 3d6271f92e98094584fd1e609a9969cd33e61122 upstream. Without turning the MADC clock on, no MADC conversions occur. $ cat /sys/class/hwmon/hwmon0/device/in8_input [ 53.428436] twl4030_madc twl4030_madc: conversion timeout! cat: read error: Resource temporarily unavailable Signed-off-by: Kyle Manna <kyle@kylemanna.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06net: introduce DST_NOPEER dst flagEric Dumazet
[ Upstream commit e688a604807647c9450f9c12a7cb6d027150a895 ] Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06net: Add a flow_cache_flush_deferred functionSteffen Klassert
[ Upstream commit c0ed1c14a72ca9ebacd51fb94a8aca488b0d361e ] flow_cach_flush() might sleep but can be called from atomic context via the xfrm garbage collector. So add a flow_cache_flush_deferred() function and use this if the xfrm garbage colector is invoked from within the packet path. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06sctp: fix incorrect overflow check on autocloseXi Wang
[ Upstream commit 2692ba61a82203404abd7dd2a027bda962861f74 ] Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for limiting the autoclose value. If userspace passes in -1 on 32-bit platform, the overflow check didn't work and autoclose would be set to 0xffffffff. This patch defines a max_autoclose (in seconds) for limiting the value and exposes it through sysctl, with the following intentions. 1) Avoid overflowing autoclose * HZ. 2) Keep the default autoclose bound consistent across 32- and 64-bit platforms (INT_MAX / HZ in this patch). 3) Keep the autoclose value consistent between setsockopt() and getsockopt() calls. Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06VFS: Fix race between CPU hotplug and lglocksSrivatsa S. Bhat
commit e30e2fdfe56288576ee9e04dbb06b4bd5f282203 upstream. Currently, the *_global_[un]lock_online() routines are not at all synchronized with CPU hotplug. Soft-lockups detected as a consequence of this race was reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng for finding out that the root-cause of this issue is the race condition between br_write_[un]lock() and CPU hotplug, which results in the lock states getting messed up). Fixing this race by just adding {get,put}_online_cpus() at appropriate places in *_global_[un]lock_online() is not a good option, because, then suddenly br_write_[un]lock() would become blocking, whereas they have been kept as non-blocking all this time, and we would want to keep them that way. So, overall, we want to ensure 3 things: 1. br_write_lock() and br_write_unlock() must remain as non-blocking. 2. The corresponding lock and unlock of the per-cpu spinlocks must not happen for different sets of CPUs. 3. Either prevent any new CPU online operation in between this lock-unlock, or ensure that the newly onlined CPU does not proceed with its corresponding per-cpu spinlock unlocked. To achieve all this: (a) We introduce a new spinlock that is taken by the *_global_lock_online() routine and released by the *_global_unlock_online() routine. (b) We register a callback for CPU hotplug notifications, and this callback takes the same spinlock as above. (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is initialized in the lock_init() code, all future updates to it are done in the callback, under the above spinlock. (d) The above bitmap is used (instead of cpu_online_mask) while locking and unlocking the per-cpu locks. The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the br_write_lock-unlock sequence is in progress, the callback keeps spinning, thus preventing the CPU online operation till the lock-unlock sequence is complete. This takes care of requirement (3). The bitmap that we maintain remains unmodified throughout the lock-unlock sequence, since all updates to it are managed by the callback, which takes the same spinlock as the one taken by the lock code and released only by the unlock routine. Combining this with (d) above, satisfies requirement (2). Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug operations from racing with br_write_lock-unlock, requirement (1) is also taken care of. By the way, it is to be noted that a CPU offline operation can actually run in parallel with our lock-unlock sequence, because our callback doesn't react to notifications earlier than CPU_DEAD (in order to maintain our bitmap properly). And this means, since we use our own bitmap (which is stale, on purpose) during the lock-unlock sequence, we could end up unlocking the per-cpu lock of an offline CPU (because we had locked it earlier, when the CPU was online), in order to satisfy requirement (2). But this is harmless, though it looks a bit awkward. Debugged-by: Cong Meng <mc@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06block: initialize request_queue's numa node duringMike Snitzer
commit 5151412dd4338b273afdb107c3772528e9e67d92 upstream. struct request_queue is allocated with __GFP_ZERO so its "node" field is zero before initialization. This causes an oops if node 0 is offline in the page allocator because its zonelists are not initialized. From Dave Young's dmesg: SRAT: Node 1 PXM 2 0-d0000000 SRAT: Node 1 PXM 2 100000000-330000000 SRAT: Node 0 PXM 1 330000000-630000000 Initmem setup node 1 0000000000000000-000000000affb000 ... Built 1 zonelists in Node order, mobility grouping on. ... BUG: unable to handle kernel paging request at 0000000000001c08 IP: [<ffffffff8111c355>] __alloc_pages_nodemask+0xb5/0x870 and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on zonelist->_zonerefs. The fix is to initialize q->node at the time of allocation so the correct node is passed to the slab allocator later. Since blk_init_allocated_queue_node() is no longer needed, merge it with blk_init_allocated_queue(). [rientjes@google.com: changelog, initializing q->node] Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21drm/radeon/kms: add some new pci idsAlex Deucher
commit cd5cfce856684e13b9b57d46b78bb827e9c4da3c upstream. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=43739 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21linux/log2.h: Fix rounddown_pow_of_two(1)Linus Torvalds
commit 13c07b0286d340275f2d97adf085cecda37ede37 upstream. Exactly like roundup_pow_of_two(1), the rounddown version was buggy for the case of a compile-time constant '1' argument. Probably because it originated from the same code, sharing history with the roundup version from before the bugfix (for that one, see commit 1a06a52ee1b0: "Fix roundup_pow_of_two(1)"). However, unlike the roundup version, the fix for rounddown is to just remove the broken special case entirely. It's simply not needed - the generic code 1UL << ilog2(n) does the right thing for the constant '1' argment too. The only reason roundup needed that special case was because rounding up does so by subtracting one from the argument (and then adding one to the result) causing the obvious problems with "ilog2(0)". But rounddown doesn't do any of that, since ilog2() naturally truncates (ie "rounds down") to the right rounded down value. And without the ilog2(0) case, there's no reason for the special case that had the wrong value. tl;dr: rounddown_pow_of_two(1) should be 1, not 0. Acked-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21fix apparmor dereferencing potentially freed dentry, sanitize __d_path() APIAl Viro
commit 02125a826459a6ad142f8d91c5b6357562f96615 upstream. __d_path() API is asking for trouble and in case of apparmor d_namespace_path() getting just that. The root cause is that when __d_path() misses the root it had been told to look for, it stores the location of the most remote ancestor in *root. Without grabbing references. Sure, at the moment of call it had been pinned down by what we have in *path. And if we raced with umount -l, we could have very well stopped at vfsmount/dentry that got freed as soon as prepend_path() dropped vfsmount_lock. It is safe to compare these pointers with pre-existing (and known to be still alive) vfsmount and dentry, as long as all we are asking is "is it the same address?". Dereferencing is not safe and apparmor ended up stepping into that. d_namespace_path() really wants to examine the place where we stopped, even if it's not connected to our namespace. As the result, it looked at ->d_sb->s_magic of a dentry that might've been already freed by that point. All other callers had been careful enough to avoid that, but it's really a bad interface - it invites that kind of trouble. The fix is fairly straightforward, even though it's bigger than I'd like: * prepend_path() root argument becomes const. * __d_path() is never called with NULL/NULL root. It was a kludge to start with. Instead, we have an explicit function - d_absolute_root(). Same as __d_path(), except that it doesn't get root passed and stops where it stops. apparmor and tomoyo are using it. * __d_path() returns NULL on path outside of root. The main caller is show_mountinfo() and that's precisely what we pass root for - to skip those outside chroot jail. Those who don't want that can (and do) use d_path(). * __d_path() root argument becomes const. Everyone agrees, I hope. * apparmor does *NOT* try to use __d_path() or any of its variants when it sees that path->mnt is an internal vfsmount. In that case it's definitely not mounted anywhere and dentry_path() is exactly what we want there. Handling of sysctl()-triggered weirdness is moved to that place. * if apparmor is asked to do pathname relative to chroot jail and __d_path() tells it we it's not in that jail, the sucker just calls d_absolute_path() instead. That's the other remaining caller of __d_path(), BTW. * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway - the normal seq_file logics will take care of growing the buffer and redoing the call of ->show() just fine). However, if it gets path not reachable from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped ignoring the return value as it used to do). Reviewed-by: John Johansen <john.johansen@canonical.com> ACKed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09sch_red: fix red_calc_qavg_from_idle_timeEric Dumazet
[ Upstream commit ea6a5d3b97b768561db6358f15e4c84ced0f4f7e ] Since commit a4a710c4a7490587 (pkt_sched: Change PSCHED_SHIFT from 10 to 6) it seems RED/GRED are broken. red_calc_qavg_from_idle_time() computes a delay in us units, but this delay is now 16 times bigger than real delay, so the final qavg result smaller than expected. Use standard kernel time services since there is no need to obfuscate them. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09inet: add a redirect generation id in inetpeerEric Dumazet
[ Upstream commit de68dca1816660b0d3ac89fa59ffb410007a143f ] Now inetpeer is the place where we cache redirect information for ipv4 destinations, we must be able to invalidate informations when a route is added/removed on host. As inetpeer is not yet namespace aware, this patch adds a shared redirect_genid, and a per inetpeer redirect_genid. This might be changed later if inetpeer becomes ns aware. Cache information for one inerpeer is valid as long as its redirect_genid has the same value than global redirect_genid. Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09firmware: Sigma: Fix endianess issuesLars-Peter Clausen
commit bda63586bc5929e97288cdb371bb6456504867ed upstream. Currently the SigmaDSP firmware loader only works correctly on little-endian systems. Fix this by using the proper endianess conversion functions. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09firmware: Sigma: Prevent out of bounds memory accessLars-Peter Clausen
commit 4f718a29fe4908c2cea782f751e9805319684e2b upstream. The SigmaDSP firmware loader currently does not perform enough boundary size checks when processing the firmware. As a result it is possible that a malformed firmware can cause an out of bounds memory access. This patch adds checks which ensure that both the action header and the payload are completely inside the firmware data boundaries before processing them. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09drm/radeon/kms: add some new pci idsAlex Deucher
commit 2ed4d9d648cbd4fb1c232a646dbdbdfdd373ca94 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09clocksource: Avoid selecting mult values that might overflow when adjustedJohn Stultz
commit d65670a78cdbfae94f20a9e05ec705871d7cdf2b upstream. For some frequencies, the clocks_calc_mult_shift() function will unfortunately select mult values very close to 0xffffffff. This has the potential to overflow when NTP adjusts the clock, adding to the mult value. This patch adds a clocksource.maxadj value, which provides an approximation of an 11% adjustment(NTP limits adjustments to 500ppm and the tick adjustment is limited to 10%), which could be made to the clocksource.mult value. This is then used to both check that the current mult value won't overflow/underflow, as well as warning us if the timekeeping_adjust() code pushes over that 11% boundary. v2: Fix max_adjustment calculation, and improve WARN_ONCE messages. v3: Don't warn before maxadj has actually been set CC: Yong Zhang <yong.zhang0@gmail.com> CC: David Daney <ddaney.cavm@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Chen Jie <chenj@lemote.com> CC: zhangfx <zhangfx@lemote.com> Reported-by: Chen Jie <chenj@lemote.com> Reported-by: zhangfx <zhangfx@lemote.com> Tested-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09drm: integer overflow in drm_mode_dirtyfb_ioctl()Xi Wang
commit a5cd335165e31db9dbab636fd29895d41da55dd2 upstream. There is a potential integer overflow in drm_mode_dirtyfb_ioctl() if userspace passes in a large num_clips. The call to kmalloc would allocate a small buffer, and the call to fb->funcs->dirty may result in a memory corruption. Reported-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26vmscan: fix shrinker callback bug in fs/super.cMikulas Patocka
commit 09f363c7363eb10cfb4b82094bd7064e5608258b upstream. The callback must not return -1 when nr_to_scan is zero. Fix the bug in fs/super.c and add this requirement to the callback specification. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26nfs: when attempting to open a directory, fall back on normal lookup (try #5)Jeff Layton
commit 1788ea6e3b2a58cf4fb00206e362d9caff8d86a7 upstream. commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return from an OPEN call. Prior to that patch, that caused the client to fall back to doing a normal lookup. When that patch went in, the code began returning that error to userspace. The d_revalidate codepath however never had the corresponding change, so it was still possible to end up with a NULL ctx->state pointer after that. That patch caused a regression. When we attempt to open a directory that does not have a cached dentry, that open now errors out with EISDIR. If you attempt the same open with a cached dentry, it will succeed. Fix this by reverting the change in nfs_atomic_lookup and allowing attempts to open directories to fall back to a normal lookup Also, add a NFSv4-specific f_ops->open routine that just returns -ENOTDIR. This should never be called if things are working properly, but if it ever is, then the dprintk may help in debugging. To facilitate this, a new file_operations field is also added to the nfs_rpc_ops struct. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-21drm/radeon: add some missing FireMV pci idsAlex Deucher
commit b872a37437e93df9d112ce674752b3b3a0a17020 upstream. Noticed by Egbert. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Egbert Eich <eich@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11ACPI: Fix CONFIG_ACPI_DOCK=n compiler warningBart Van Assche
commit c1056b42a87b59375f8f81a92ef029165f44fcce upstream. Recently the ACPI ops structs were constified but the inline version of register_hotplug_dock_device() was overlooked (see also commit 9c8b04b, June 25 2011). Update the inline function register_hotplug_dock_device() that is enabled with CONFIG_ACPI_DOCK=n too. This patch fixes at least the following compiler warnings: drivers/ata/libata-acpi.c: In function .ata_acpi_associate.: drivers/ata/libata-acpi.c:266:11: warning: passing argument 2 of .register_hotplug_dock_device. discards qualifiers from pointer target type include/acpi/acpi_drivers.h:146:19: note: expected .struct acpi_dock_ops *. but argument is of type .const struct acpi_dock_ops *. drivers/ata/libata-acpi.c:275:11: warning: passing argument 2 of .register_hotplug_dock_device. discards qualifiers from pointer target type include/acpi/acpi_drivers.h:146:19: note: expected .struct acpi_dock_ops *. but argument is of type .const struct acpi_dock_ops *. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11drm/radeon/kms: properly set panel mode for eDPAlex Deucher
commit 00dfb8df5bf8c3afe4c0bb8361133156b06b7a2c upstream. This should make eDP more reliable. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11thp: share get_huge_page_tail()Andrea Arcangeli
commit b35a35b556f5e6b7993ad0baf20173e75c09ce8c upstream. This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodesTheodore Ts'o
commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream. This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11mm: thp: tail page refcounting fixAndrea Arcangeli
commit 70b50f94f1644e2aa7cb374819cfd93f3c28d725 upstream. Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11readlinkat: ensure we return ENOENT for the empty pathname for normal lookupsAndy Whitcroft
commit 1fa1e7f615f4d3ae436fa319af6e4eebdd4026a8 upstream. Since the commit below which added O_PATH support to the *at() calls, the error return for readlink/readlinkat for the empty pathname has switched from ENOENT to EINVAL: commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames This is both unexpected for userspace and makes readlink/readlinkat inconsistant with all other interfaces; and inconsistant with our stated return for these pathnames. As the readlinkat call does not have a flags parameter we cannot use the AT_EMPTY_PATH approach used in the other calls. Therefore expose whether the original path is infact entry via a new user_path_at_empty() path lookup function. Use this to determine whether to default to EINVAL or ENOENT for failures. Addresses http://bugs.launchpad.net/bugs/817187 [akpm@linux-foundation.org: remove unused getname_flags()] Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11mm: avoid null pointer access in vm_struct via /proc/vmallocinfoMitsuo Hayasaka
commit f5252e009d5b87071a919221e4f6624184005368 upstream. The /proc/vmallocinfo shows information about vmalloc allocations in vmlist that is a linklist of vm_struct. It, however, may access pages field of vm_struct where a page was not allocated. This results in a null pointer access and leads to a kernel panic. Why this happens: In __vmalloc_node_range() called from vmalloc(), newly allocated vm_struct is added to vmlist at __get_vm_area_node() and then, some fields of vm_struct such as nr_pages and pages are set at __vmalloc_area_node(). In other words, it is added to vmlist before it is fully initialized. At the same time, when the /proc/vmallocinfo is read, it accesses the pages field of vm_struct according to the nr_pages field at show_numa_info(). Thus, a null pointer access happens. The patch adds the newly allocated vm_struct to the vmlist *after* it is fully initialized. So, it can avoid accessing the pages field with unallocated page when show_numa_info() is called. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11io-mapping: ensure io_mapping_map_atomic _is_ atomicDaniel Vetter
commit 24dd85ff723f142093f44244764b9b5c152235b8 upstream. For the !HAVE_ATOMIC_IOMAP case the stub functions did not call pagefault_disable/_enable. The i915 driver relies on the map actually being atomic, otherwise it can deadlock with it's own pagefault handler in the gtt pwrite fastpath. This is exercised by gem_mmap_gtt from the intel-gpu-toosl gem testsuite. v2: Chris Wilson noted the lack of an include. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlierIan Campbell
commit 9bab0b7fbaceec47d32db51cd9e59c82fb071f5a upstream. This adds a mechanism to resume selected IRQs during syscore_resume instead of dpm_resume_noirq. Under Xen we need to resume IRQs associated with IPIs early enough that the resched IPI is unmasked and we can therefore schedule ourselves out of the stop_machine where the suspend/resume takes place. This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME". Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11time: Change jiffies_to_clock_t() argument type to unsigned longhank
commit cbbc719fccdb8cbd87350a05c0d33167c9b79365 upstream. The parameter's origin type is long. On an i386 architecture, it can easily be larger than 0x80000000, causing this function to convert it to a sign-extended u64 type. Change the type to unsigned long so we get the correct result. Signed-off-by: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> [ build fix ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11net: hold sock reference while processing tx timestampsRichard Cochran
commit da92b194cc36b5dc1fbd85206aeeffd80bee0c39 upstream. The pair of functions, * skb_clone_tx_timestamp() * skb_complete_tx_timestamp() were designed to allow timestamping in PHY devices. The first function, called during the MAC driver's hard_xmit method, identifies PTP protocol packets, clones them, and gives them to the PHY device driver. The PHY driver may hold onto the packet and deliver it at a later time using the second function, which adds the packet to the socket's error queue. As pointed out by Johannes, nothing prevents the socket from disappearing while the cloned packet is sitting in the PHY driver awaiting a timestamp. This patch fixes the issue by taking a reference on the socket for each such packet. In addition, the comments regarding the usage of these function are expanded to highlight the rule that PHY drivers must use skb_complete_tx_timestamp() to release the packet, in order to release the socket reference, too. These functions first appeared in v2.6.36. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11SUNRPC/NFS: make rpc pipe upcall genericPeng Tao
commit c1225158a8dad9e9d5eee8a17dbbd9c7cda05ab9 upstream. The same function is used by idmap, gss and blocklayout code. Make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11USB: fix ehci alignment errorHarro Haan
commit 276532ba9666b36974cbe16f303fc8be99c9da17 upstream. The Kirkwood gave an unaligned memory access error on line 742 of drivers/usb/host/echi-hcd.c: "ehci->last_periodic_enable = ktime_get_real();" Signed-off-by: Harro Haan <hrhaan@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11TTY: make tty_add_file non-failingJiri Slaby
commit fa90e1c935472281de314e6d7c9a37db9cbc2e4e upstream. If tty_add_file fails at the point it is now, we have to revert all the changes we did to the tty. It means either decrease all refcounts if this was a tty reopen or delete the tty if it was newly allocated. There was a try to fix this in v3.0-rc2 using tty_release in 0259894c7 (TTY: fix fail path in tty_open). But instead it introduced a NULL dereference. It's because tty_release dereferences filp->private_data, but that one is set even in our tty_add_file. And when tty_add_file fails, it's still NULL/garbage. Hence tty_release cannot be called there. To circumvent the original leak (and the current NULL deref) we split tty_add_file into two functions, making the latter non-failing. In that case we may do the former early in open, where handling failures is easy. The latter stays as it is now. So there is no change in functionality. The original bug (leak) was introduced by f573bd176 (tty: Remove __GFP_NOFAIL from tty_add_file()). Thanks Dan for reporting this. Later, we may split tty_release into more functions and call only some of them in this fail path instead. (If at all possible.) Introduced-in: v2.6.37-rc2 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-17Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller
2011-10-17udplite: fast-path computation of checksum coverageGerrit Renker
Commit 903ab86d195cca295379699299c5fc10beba31c7 of 1 March this year ("udp: Add lockless transmit path") introduced a new fast TX path that broke the checksum coverage computation of UDP-lite, which so far depended on up->len (only set if the socket is locked and 0 in the fast path). Fixed by providing both fast- and slow-path computation of checksum coverage. The latter can be removed when UDP(-lite)v6 also uses a lockless transmit path. Reported-by: Thomas Volkert <thomas@homer-conferencing.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-12IPVS netns shutdown/startup dead-lockHans Schillstrom
ip_vs_mutext is used by both netns shutdown code and startup and both implicit uses sk_lock-AF_INET mutex. cleanup CPU-1 startup CPU-2 ip_vs_dst_event() ip_vs_genl_set_cmd() sk_lock-AF_INET __ip_vs_mutex sk_lock-AF_INET __ip_vs_mutex * DEAD LOCK * A new mutex placed in ip_vs netns struct called sync_mutex is added. Comments from Julian and Simon added. This patch has been running for more than 3 month now and it seems to work. Ver. 3 IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex instead of __ip_vs_mutex as sugested by Julian. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-10-06Merge branch 'for-linus' of http://people.redhat.com/agk/git/linux-dmLinus Torvalds
* 'for-linus' of http://people.redhat.com/agk/git/linux-dm: dm crypt: always disable discard_zeroes_data dm: raid fix write_mostly arg validation dm table: avoid crash if integrity profile changes dm: flakey fix corrupt_bio_byte error path
2011-10-04Merge git://github.com/davem330/netLinus Torvalds
* git://github.com/davem330/net: pch_gbe: Fixed the issue on which a network freezes pch_gbe: Fixed the issue on which PC was frozen when link was downed. make PACKET_STATISTICS getsockopt report consistently between ring and non-ring net: xen-netback: correctly restart Tx after a VM restore/migrate bonding: properly stop queuing work when requested can bcm: fix incomplete tx_setup fix RDSRDMA: Fix cleanup of rds_iw_mr_pool net: Documentation: Fix type of variables ibmveth: Fix oops on request_irq failure ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket cxgb4: Fix EEH on IBM P7IOC can bcm: fix tx_setup off-by-one errors MAINTAINERS: tehuti: Alexander Indenbaum's address bounces dp83640: reduce driver noise ptp: fix L2 event message recognition
2011-10-04PCI: Disable MPS configuration by defaultJon Mason
Add the ability to disable PCI-E MPS turning and using the BIOS configured MPS defaults. Due to the number of issues recently discovered on some x86 chipsets, make this the default behavior. Also, add the option for peer to peer DMA MPS configuration. Peer to peer DMA is outside the scope of this patch, but MPS configuration could prevent it from working by having the MPS on one root port different than the MPS on another. To work around this, simply make the system wide MPS the smallest possible value (128B). Signed-off-by: Jon Mason <mason@myri.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-01Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and ↵Linus Torvalds
'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: irq: Fix check for already initialized irq_domain in irq_domain_add irq: Add declaration of irq_domain_simple_ops to irqdomain.h * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86/rtc: Don't recursively acquire rtc_lock * 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: posix-cpu-timers: Cure SMP wobbles sched: Fix up wchan borkage sched/rt: Migrate equal priority tasks to available CPUs
2011-09-30posix-cpu-timers: Cure SMP wobblesPeter Zijlstra
David reported: Attached below is a watered-down version of rt/tst-cpuclock2.c from GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or similar. Run it several times, and you will see cases where the main thread will measure a process clock difference before and after the nanosleep which is smaller than the cpu-burner thread's individual thread clock difference. This doesn't make any sense since the cpu-burner thread is part of the top-level process's thread group. I've reproduced this on both x86-64 and sparc64 (using both 32-bit and 64-bit binaries). For example: [davem@boricha build-x86_64-linux]$ ./test process: before(0.001221967) after(0.498624371) diff(497402404) thread: before(0.000081692) after(0.498316431) diff(498234739) self: before(0.001223521) after(0.001240219) diff(16698) [davem@boricha build-x86_64-linux]$ The diff of 'process' should always be >= the diff of 'thread'. I make sure to wrap the 'thread' clock measurements the most tightly around the nanosleep() call, and that the 'process' clock measurements are the outer-most ones. --- #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <fcntl.h> #include <string.h> #include <errno.h> #include <pthread.h> static pthread_barrier_t barrier; static void *chew_cpu(void *arg) { pthread_barrier_wait(&barrier); while (1) __asm__ __volatile__("" : : : "memory"); return NULL; } int main(void) { clockid_t process_clock, my_thread_clock, th_clock; struct timespec process_before, process_after; struct timespec me_before, me_after; struct timespec th_before, th_after; struct timespec sleeptime; unsigned long diff; pthread_t th; int err; err = clock_getcpuclockid(0, &process_clock); if (err) return 1; err = pthread_getcpuclockid(pthread_self(), &my_thread_clock); if (err) return 1; pthread_barrier_init(&barrier, NULL, 2); err = pthread_create(&th, NULL, chew_cpu, NULL); if (err) return 1; err = pthread_getcpuclockid(th, &th_clock); if (err) return 1; pthread_barrier_wait(&barrier); err = clock_gettime(process_clock, &process_before); if (err) return 1; err = clock_gettime(my_thread_clock, &me_before); if (err) return 1; err = clock_gettime(th_clock, &th_before); if (err) return 1; sleeptime.tv_sec = 0; sleeptime.tv_nsec = 500000000; nanosleep(&sleeptime, NULL); err = clock_gettime(th_clock, &th_after); if (err) return 1; err = clock_gettime(my_thread_clock, &me_after); if (err) return 1; err = clock_gettime(process_clock, &process_after); if (err) return 1; diff = process_after.tv_nsec - process_before.tv_nsec; printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", process_before.tv_sec, process_before.tv_nsec, process_after.tv_sec, process_after.tv_nsec, diff); diff = th_after.tv_nsec - th_before.tv_nsec; printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", th_before.tv_sec, th_before.tv_nsec, th_after.tv_sec, th_after.tv_nsec, diff); diff = me_after.tv_nsec - me_before.tv_nsec; printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", me_before.tv_sec, me_before.tv_nsec, me_after.tv_sec, me_after.tv_nsec, diff); return 0; } This is due to us using p->se.sum_exec_runtime in thread_group_cputime() where we iterate the thread group and sum all data. This does not take time since the last schedule operation (tick or otherwise) into account. We can cure this by using task_sched_runtime() at the cost of having to take locks. This also means we can (and must) do away with thread_group_sched_runtime() since the modified thread_group_cputime() is now more accurate and would deadlock when called from thread_group_sched_runtime(). Aside of that it makes the function safe on 32 bit systems. The old code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a 64bit value and could be changed on another cpu at the same time. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-29ptp: fix L2 event message recognitionRichard Cochran
The IEEE 1588 standard defines two kinds of messages, event and general messages. Event messages require time stamping, and general do not. When using UDP transport, two separate ports are used for the two message types. The BPF designed to recognize event messages incorrectly classifies L2 general messages as event messages. This commit fixes the issue by extending the filter to check the message type field for L2 PTP packets. Event messages are be distinguished from general messages by testing the "general" bit. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Cc: <stable@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-28Merge branch 'writeback-for-linus' of git://github.com/fengguang/linuxLinus Torvalds
* 'writeback-for-linus' of git://github.com/fengguang/linux: writeback: show raw dirtied_when in trace writeback_single_inode
2011-09-27vfs: remove LOOKUP_NO_AUTOMOUNT flagLinus Torvalds
That flag no longer makes sense, since we don't look up automount points as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT handling was buggy to begin with: it would avoid automounting even for cases where we really *needed* to do the automount handling, and could return ENOENT for autofs entries that hadn't been instantiated yet. With our new non-eager automount semantics, one discussion has been about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the newfstatat() and fstatat64() system calls), but it's probably not worth it: you can always force at least directory automounting by simply adding the final '/' to the filename, which works for *all* of the stat family system calls, old and new. So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a result of our bad default behavior. Acked-by: Ian Kent <raven@themaw.net> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26vfs pathname lookup: Add LOOKUP_AUTOMOUNT flagLinus Torvalds
Since we've now turned around and made LOOKUP_FOLLOW *not* force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-25dm crypt: always disable discard_zeroes_dataMilan Broz
If optional discard support in dm-crypt is enabled, discards requests bypass the crypt queue and blocks of the underlying device are discarded. For the read path, discarded blocks are handled the same as normal ciphertext blocks, thus decrypted. So if the underlying device announces discarded regions return zeroes, dm-crypt must disable this flag because after decryption there is just random noise instead of zeroes. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-09-22Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] kvm: extension capability for new address space layout [S390] kvm: fix address mode switching
2011-09-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-block: floppy: use del_timer_sync() in init cleanup blk-cgroup: be able to remove the record of unplugged device block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request mm: Add comment explaining task state setting in bdi_forker_thread() mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread() block: simplify force plug flush code a little bit block: change force plug flush call order block: Fix queue_flag update when rq_affinity goes from 2 to 1 block: separate priority boosting from REQ_META block: remove READ_META and WRITE_META xen-blkback: fixed indentation and comments xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.