summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2010-04-01block: Backport of various I/O topology fixes from 2.6.33 and 2.6.34Martin K. Petersen
block: Backport of various I/O topology fixes from 2.6.33 and 2.6.34 The stacking code incorrectly scaled up the data offset in some cases causing misaligned devices to report alignment. Rewrite the stacking algorithm to remedy this. (Upstream commit 9504e0864b58b4a304820dcf3755f1da80d5e72f) The top device misalignment flag would not be set if the added bottom device was already misaligned as opposed to causing a stacking failure. Also massage the reporting so that an error is only returned if adding the bottom device caused the misalignment. I.e. don't return an error if the top is already flagged as misaligned. (Upstream commit fe0b393f2c0a0d23a9bc9ed7dc51a1ee511098bd) lcm() was defined to take integer-sized arguments. The supplied arguments are multiplied, however, causing us to overflow given sufficiently large input. That in turn led to incorrect optimal I/O size reporting in some cases. Switch lcm() over to unsigned long similar to gcd() and move the function from blk-settings.c to lib. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01quota: manage reserved space when quota is not active [v2]Dmitry Monakhov
commit c469070aea5a0ada45a836937c776fd3083dae2b upstream. Since we implemented generic reserved space management interface, then it is possible to account reserved space even when quota is not active (similar to i_blocks/i_bytes). Without this patch following testcase result in massive comlain from WARN_ON in dquot_claim_space() TEST_CASE: mount /dev/sdb /mnt -oquota dd if=/dev/zero of=/mnt/test bs=1M count=1 quotaon /mnt # fs_reserved_spave == 1Mb # quota_reserved_space == 0, because quota was disabled dd if=/dev/zero of=/mnt/test seek=1 bs=1M count=1 # fs_reserved_spave == 2Mb # quota_reserved_space == 1Mb sync # ->dquot_claim_space() -> WARN_ON Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01mac80211: Retry null data frame for power saveVivek Natarajan
commit 375177bf35efc08e1bd37bbda4cc0c8cc4db8500 upstream. Even if the null data frame is not acked by the AP, mac80211 goes into power save. This might lead to loss of frames from the AP. Prevent this by restarting dynamic_ps_timer when ack is not received for null data frames. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01if_tunnel.h: add missing ams/byteorder.h includePaulius Zaleckas
commit 9bf35c8dddd56f7f247a27346f74f5adc18071f4 upstream. When compiling userspace application which includes if_tunnel.h and uses GRE_* defines you will get undefined reference to __cpu_to_be16. Fix this by adding missing #include <asm/byteorder.h> Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01tty: Take a 256 byte padding into account when buffering below sub-page unitsMel Gorman
commit 352fa6ad16b89f8ffd1a93b4419b1a8f2259feab upstream. The TTY layer takes some care to ensure that only sub-page allocations are made with interrupts disabled. It does this by setting a goal of "TTY_BUFFER_PAGE" to allocate. Unfortunately, while TTY_BUFFER_PAGE takes the size of tty_buffer into account, it fails to account that tty_buffer_find() rounds the buffer size out to the next 256 byte boundary before adding on the size of the tty_buffer. This patch adjusts the TTY_BUFFER_PAGE calculation to take into account the size of the tty_buffer and the padding. Once applied, tty_buffer_alloc() should not require high-order allocations. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01tty: Keep the default buffering to sub-page unitsAlan Cox
commit d9661adfb8e53a7647360140af3b92284cbe52d4 upstream. We allocate during interrupts so while our buffering is normally diced up small anyway on some hardware at speed we can pressure the VM excessively for page pairs. We don't really need big buffers to be linear so don't try so hard. In order to make this work well we will tidy up excess callers to request_room, which cannot itself enforce this break up. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01hrtimer: Tune hrtimer_interrupt hang logicThomas Gleixner
commit 41d2e494937715d3150e5c75d01f0e75ae899337 upstream. The hrtimer_interrupt hang logic adjusts min_delta_ns based on the execution time of the hrtimer callbacks. This is error-prone for virtual machines, where a guest vcpu can be scheduled out during the execution of the callbacks (and the callbacks themselves can do operations that translate to blocking operations in the hypervisor), which in can lead to large min_delta_ns rendering the system unusable. Replace the current heuristics with something more reliable. Allow the interrupt code to try 3 times to catch up with the lost time. If that fails use the total time spent in the interrupt handler to defer the next timer interrupt so the system can catch up with other things which got delayed. Limit that deferment to 100ms. The retry events and the maximum time spent in the interrupt handler are recorded and exposed via /proc/timer_list Inspired by a patch from Marcelo. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01decompress: fix new decompressor for PICRussell King
commit 5ceaa2f39bfa73c4398cd01e78f1c3ebde3d3383 upstream. The ARM kernel decompressor wants to be able to relocate r/w data independently from the rest of the image, and we do this by ensuring that r/w data has global visibility. Define STATIC_RW_DATA to be empty to achieve this. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Alain Knaff <alain@knaff.lu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15sched: Fix sched_mv_power_savings for !SMTVaidyanathan Srinivasan
commit 28f5318167adf23b16c844b9c2253f355cb21796 upstream. Fix for sched_mc_powersavigs for pre-Nehalem platforms. Child sched domain should clear SD_PREFER_SIBLING if parent will have SD_POWERSAVINGS_BALANCE because they are contradicting. Sets the flags correctly based on sched_mc_power_savings. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15x86: Avoid race condition in pci_enable_msix()Brandon Phiilps
commit ced5b697a76d325e7a7ac7d382dbbb632c765093 upstream. Keep chip_data in create_irq_nr and destroy_irq. When two drivers are setting up MSI-X at the same time via pci_enable_msix() there is a race. See this dmesg excerpt: [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X [ 85.170611] alloc irq_desc for 99 on node -1 [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X [ 85.170614] alloc kstat_irqs on node -1 [ 85.170616] alloc irq_2_iommu on node -1 [ 85.170617] alloc irq_desc for 100 on node -1 [ 85.170619] alloc kstat_irqs on node -1 [ 85.170621] alloc irq_2_iommu on node -1 [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X [ 85.170626] alloc irq_desc for 101 on node -1 [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X [ 85.170630] alloc kstat_irqs on node -1 [ 85.170631] alloc irq_2_iommu on node -1 [ 85.170635] alloc irq_desc for 102 on node -1 [ 85.170636] alloc kstat_irqs on node -1 [ 85.170639] alloc irq_2_iommu on node -1 [ 85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 As you can see igb and ixgbe are both alternating on create_irq_nr() via pci_enable_msix() in their probe function. ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data = NULL via dynamic_irq_init(). igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[] via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this: cfg_new = irq_desc_ptrs[102]->chip_data; if (cfg_new->vector != 0) continue; This hits the NULL deref. Another possible race exists via pci_disable_msix() in a driver or in the number of error paths that call free_msi_irqs(): destroy_irq() dynamic_irq_cleanup() which sets desc->chip_data = NULL ...race window... desc->chip_data = cfg; Remove the save and restore code for cfg in create_irq_nr() and destroy_irq() and take the desc->lock when checking the irq_cfg. Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org> Signed-off-by: Brandon Phililps <bphilips@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOMWu Fengguang
commit 0141450f66c3c12a3aaa869748caa64241885cdf upstream. This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM. POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance: a 16K read will be carried out in 4 _sync_ 1-page reads. In other places, ra_pages==0 means - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs - some IO error happened where multi-page read IO won't help or should be avoided. POSIX_FADV_RANDOM actually want a different semantics: to disable the *heuristic* readahead algorithm, and to use a dumb one which faithfully submit read IO for whatever application requests. So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM. Note that the random hint is not likely to help random reads performance noticeably. And it may be too permissive on huge request size (its IO size is not limited by read_ahead_kb). In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall (NFS read) performance of the application increased by 313%! Tested-by: Quentin Barnes <qbarnes+nfs@yahoo-inc.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <qbarnes+nfs@yahoo-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23drm/i915: add i915_lp_ring_sync helperDaniel Vetter
commit 48764bf43f746113fc77877d7e80f2df23ca4cbb upstream. This just waits until the hw passed the current ring position with cmd execution. This slightly changes the existing i915_wait_request function to make uninterruptible waiting possible - no point in returning to userspace while mucking around with the overlay, that piece of hw is just too fragile. Also replace a magic 0 with the symbolic constant (and kill the then superflous comment) while I was looking at the code. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23netfilter: nf_conntrack: fix hash resizing with namespacesPatrick McHardy
commit d696c7bdaa55e2208e56c6f98e6bc1599f34286d upstream. As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instanciating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23netfilter: nf_conntrack: per netns nf_conntrack_cachepEric Dumazet
commit 5b3501faa8741d50617ce4191c20061c6ef36cb3 upstream. nf_conntrack_cachep is currently shared by all netns instances, but because of SLAB_DESTROY_BY_RCU special semantics, this is wrong. If we use a shared slab cache, one object can instantly flight between one hash table (netns ONE) to another one (netns TWO), and concurrent reader (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be fooled without notice, because no RCU grace period has to be observed between object freeing and its reuse. We dont have this problem with UDP/TCP slab caches because TCP/UDP hashtables are global to the machine (and each object has a pointer to its netns). If we use per netns conntrack hash tables, we also *must* use per netns conntrack slab caches, to guarantee an object can not escape from one namespace to another one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> [Patrick: added unique slab name allocation] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23resource: add helpers for fetching rlimitsJiri Slaby
commit 3e10e716abf3c71bdb5d86b8f507f9e72236c9cd upstream. We want to be sure that compiler fetches the limit variable only once, so add helpers for fetching current and maximal resource limits which do that. Add them to sched.h (instead of resource.h) due to circular dependency sched.h->resource.h->task_struct Alternative would be to create a separate res_access.h or similar. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: James Morris <jmorris@namei.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09connector: Delete buggy notification code.Evgeniy Polyakov
commit f98bfbd78c37c5946cc53089da32a5f741efdeb7 upstream. On Tue, Feb 02, 2010 at 02:57:14PM -0800, Greg KH (gregkh@suse.de) wrote: > > There are at least two ways to fix it: using a big cannon and a small > > one. The former way is to disable notification registration, since it is > > not used by anyone at all. Second way is to check whether calling > > process is root and its destination group is -1 (kind of priveledged > > one) before command is dispatched to workqueue. > > Well if no one is using it, removing it makes the most sense, right? > > No objection from me, care to make up a patch either way for this? Getting it is not used, let's drop support for notifications about (un)registered events from connector. Another option was to check credentials on receiving, but we can always restore it without bugs if needed, but genetlink has a wider code base and none complained, that userspace can not get notification when some other clients were (un)registered. Kudos for Sebastian Krahmer <krahmer@suse.de>, who found a bug in the code. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09libata: retry link resume if necessaryTejun Heo
commit 5040ab67a2c6d5710ba497dc52a8f7035729d7b0 upstream. Interestingly, when SIDPR is used in ata_piix, writes to DET in SControl sometimes get ignored leading to detection failure. Update sata_link_resume() such that it reads back SControl after clearing DET and retry if it's not clear. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: fengxiangjun <fengxiangjun@neusoft.com> Reported-by: Jim Faulkner <jfaulkne@ccs.neu.edu> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09KVM: allow userspace to adjust kvmclock offsetGlauber Costa
(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d) When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [marcelo: future-proof abi with a flags field] [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it] Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09ax25: netrom: rose: Fix timer oopsesJarek Poplawski
[ Upstream commit d00c362f1b0ff54161e0a42b4554ac621a9ef92d ] Wrong ax25_cb refcounting in ax25_send_frame() and by its callers can cause timer oopses (first reported with 2.6.29.6 kernel). Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14905 Reported-by: Bernard Pidoux <bpidoux@free.fr> Tested-by: Bernard Pidoux <bpidoux@free.fr> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09net: restore ip source validationJamal Hadi Salim
[ Upstream commit 28f6aeea3f12d37bd258b2c0d5ba891bff4ec479 ] when using policy routing and the skb mark: there are cases where a back path validation requires us to use a different routing table for src ip validation than the one used for mapping ingress dst ip. One such a case is transparent proxying where we pretend to be the destination system and therefore the local table is used for incoming packets but possibly a main table would be used on outbound. Make the default behavior to allow the above and if users need to turn on the symmetry via sysctl src_valid_mark Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09Split 'flush_old_exec' into two functionsLinus Torvalds
commit 221af7f87b97431e3ee21ce4b0e77d5411cf1549 upstream. 'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09ACPI: Add platform-wide _OSC support.Shaohua Li
commit 3563ff964fdc36358cef0330936fdac28e65142a upstream. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09ACPI: Add a generic API for _OSC -v2Shaohua Li
commit 70023de88c58a81a730ab4d13c51a30e537ec76e upstream. v2->v1: .improve debug info as suggedted by Bjorn,Kenji .API is using uuid string as suggested by Alexey Add an API to execute _OSC. A lot of devices can have this method, so add a generic API. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09mm: add new 'read_cache_page_gfp()' helper functionLinus Torvalds
commit 0531b2aac59c2296570ac52bfc032ef2ace7d5e1 upstream. It's a simplified 'read_cache_page()' which takes a page allocation flag, so that different paths can control how aggressive the memory allocations are that populate a address space. In particular, the intel GPU object mapping code wants to be able to do a certain amount of own internal memory management by automatically shrinking the address space when memory starts getting tight. This allows it to dynamically use different memory allocation policies on a per-allocation basis, rather than depend on the (static) address space gfp policy. The actual new function is a one-liner, but re-organizing the helper functions to the point where you can do this with a single line of code is what most of the patch is all about. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: fix free of fc_rport_priv with timer pendingJoe Eykholt
commit b4a9c7ede96e90f7b1ec009ce7256059295e76df upstream. Timer crashes were caused by freeing a struct fc_rport_priv with a timer pending, causing the timer facility list to be corrupted. This was during FC uplink flap tests with a lot of targets. After discovery, we were doing an PLOGI on an rdata that was in DELETE state but not yet removed from the lookup list. This moved the rdata from DELETE state to PLOGI state. If the PLOGI exchange allocation failed and needed to be retried, the timer scheduling could race with the free being done by fc_rport_work(). When fc_rport_login() is called on a rport in DELETE state, move it to a new state RESTART. In fc_rport_work, when handling a LOGO, STOPPED or FAILED event, look for restart state. In the RESTART case, don't take the rdata off the list and after the transport remote port is deleted and exchanges are reset, re-login to the remote port. Note that the new RESTART state also corrects a problem we had when re-discovering a port that had moved to DELETE state. In that case, a new rdata was created, but the old rdata would do an exchange manager reset affecting the FC_ID for both the new rdata and old rdata. With the new state, the new port isn't logged into until after any old exchanges are reset. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: Fix frags in frame exceeding SKB_MAX_FRAGS in fc_fcp_send_dataYi Zou
commit d37322a43ebac79eef417149f5696390cf8872db upstream. In case of sequence offload, in fc_fcp_send_data(), the skb_fill_page_info() called may end up adding more frags to the skb_shinfo(fp_skb(fp))->frags[], exceeding SKB_MAX_FRAGS, this eventually corrupts the memory. I am adding the FR_FRAME_SG_LEN back, but as SKB_MAX_FRAGS -1, leaving 1 for our fcoe_eof_crc page. And send will be broken into multiple large sends if the frame already contains more frags than skb handle. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28HID: fixup quirk for NCR devicesJiri Kosina
commit 5b915d9e6dc3d22fedde91dfef1cb1a8fa9a1870 upstream. NCR devices are terminally broken by design -- they claim themselves to contain proper input applications in their HID report descriptor, but behave very badly if treated in standard way. According to NCR developers, the devices get confused when queried for reports in a standard way, rendering them unusable. NCR is shipping application called "RPSL" that can be used to drive these devices through hiddev, under the assumption that in-kernel driver doesn't perform initial report query. If it does, neither in-kernel nor hiddev-based driver can operate with these devices any more. Introduce a quirk that skips the report query for all NCR devices. The previous NOGET quirk was wrong and had been introduced because I misunderstood the nature of brokenness of these devices. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28nohz: Prevent clocksource wrapping during idleJon Hunter
commit 98962465ed9e6ea99c38e0af63fe1dcb5a79dc25 upstream. The dynamic tick allows the kernel to sleep for periods longer than a single tick, but it does not limit the sleep time currently. In the worst case the kernel could sleep longer than the wrap around time of the time keeping clock source which would result in losing track of time. Prevent this by limiting it to the safe maximum sleep time of the current time keeping clock source. The value is calculated when the clock source is registered. [ tglx: simplified the code a bit and massaged the commit msg ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28powerpc/fsl: Add PCI device ids for new QoirQ chipsBen Hutchings
commit a3f62bd2b20c769ddc989b242ddd274179e19ee6 upstream by Kumar Gala <galak@kernel.crashing.org>. I have adjusted the patch context for 2.6.32. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28ACPI: don't cond_resched if irq is disabledXiaotian Feng
commit c084ca704a3661bf77690a05bc6bd2c305d87c34 upstream. commit 8bd108d adds preemption point after each opcode parse, then a sleeping function called from invalid context bug was founded during suspend/resume stage. this was fixed in commit abe1dfa by don't cond_resched when irq_disabled. But recent commit 138d156 changes the behaviour to don't cond_resched when in_atomic. This makes the sleeping function called from invalid context bug happen again, which is reported in http://lkml.org/lkml/2009/12/1/371. This patch also fixes http://bugzilla.kernel.org/show_bug.cgi?id=14483 Reported-and-bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-and-bisected-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25block: bdev_stack_limits wrapperMartin K. Petersen
commit 17be8c245054b9c7786545af3ba3ca4e54cd4ad9 upstream. DM does not want to know about partition offsets. Add a partition-aware wrapper that DM can use when stacking block devices. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25SCSI: enclosure: fix oops while iterating enclosure_status arrayJames Bottomley
commit cc9b2e9f6603190c009e5d2629ce8e3f99571346 upstream. Based on patch originally by Jeff Mahoney <jeffm@suse.com> enclosure_status is expected to be a NULL terminated array of strings but isn't actually NULL terminated. When writing an invalid value to /sys/class/enclosure/.../.../status, it goes off the end of the array and Oopses. Fix by making the assumption true and adding NULL at the end. Reported-by: Artur Wojcik <artur.wojcik@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22PCI/cardbus: Add a fixup hook and fix powerpcBenjamin Herrenschmidt
commit 2d1c861871d767153538a77c498752b36d4bb4b8 upstream The cardbus code creates PCI devices without ever going through the necessary fixup bits and pieces that normal PCI devices go through. There's in fact a commented out call to pcibios_fixup_bus() in there, it's commented because ... it doesn't work. I could make pcibios_fixup_bus() do the right thing on powerpc easily but I felt it cleaner instead to provide a specific hook pci_fixup_cardbus for which a weak empty implementation is provided by the PCI core. This fixes cardbus on powerbooks and probably all other PowerPC platforms which was broken completely for ever on some platforms and since 2.6.31 on others such as PowerBooks when we made the DMA ops mandatory (since those are setup by the fixups). Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22mfd: Correct WM835x ISINK ramp time definesMark Brown
commit 9dffe2a32b0deef52605d50527c0d240b15cabf7 upstream. The constants used to specify ISINK ramp times for WM835x had the wrong shifts so that the on times applied to the off ramp and vice versa. The masks for the bitfields are correct. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22block: Fix incorrect reporting of partition alignmentMartin K. Petersen
commit 81744ee44ab2845c16ffd7d6f762f7b4a49a4750 upstream queue_sector_alignment_offset returned the wrong value which caused partitions to report an incorrect alignment_offset. Since offset calculation is needed several places it has been split into a separate helper function. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18drm: remove address mask param for drm_pci_alloc()Zhenyu Wang
commit e6be8d9d17bd44061116f601fe2609b3ace7aa69 upstream. drm_pci_alloc() has input of address mask for setting pci dma mask on the device, which should be properly setup by drm driver. And leave it as a param for drm_pci_alloc() would cause confusion or mistake would corrupt the correct dma mask setting, as seen on intel hw which set wrong dma mask for hw status page. So remove it from drm_pci_alloc() function. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18untangle the do_mremap() messAl Viro
This backports the following upstream commits all as one patch: 54f5de709984bae0d31d823ff03de755f9dcac54 ecc1a8993751de4e82eb18640d631dae1f626bd6 1a0ef85f84feb13f07b604fcf5b90ef7c2b5c82f f106af4e90eadd76cfc0b5325f659619e08fb762 097eed103862f9c6a97f2e415e21d1134017b135 935874141df839c706cd6cdc438e85eb69d1525e 0ec62d290912bb4b989be7563851bc364ec73b56 c4caa778157dbbf04116f0ac2111e389b5cd7a29 2ea1d13f64efdf49319e86c87d9ba38c30902782 570dcf2c15463842e384eb597a87c1e39bead99b 564b3bffc619dcbdd160de597b0547a7017ea010 0067bd8a55862ac9dd212bd1c4f6f5bff1ca1301 f8b7256096a20436f6d0926747e3ac3d64c81d24 8c7b49b3ecd48923eb64ff57e07a1cdb74782970 9206de95b1ea68357996ec02be5db0638a0de2c1 2c6a10161d0b5fc047b5bd81b03693b9af99fab5 05d72faa6d13c9d857478a5d35c85db9adada685 bb52d6694002b9d632bb355f64daa045c6293a4e e77414e0aad6a1b063ba5e5750c582c75327ea6a aa65607373a4daf2010e8c3867b6317619f3c1a3 Backport done by Greg Kroah-Hartman. Only minor tweaks were needed. Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06quota: decouple fs reserved space from quota reservationDmitry Monakhov
commit fd8fbfc1709822bd94247c5b2ab15a5f5041e103 upstream. Currently inode_reservation is managed by fs itself and this reservation is transfered on dquot_transfer(). This means what inode_reservation must always be in sync with dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be triggered) This is not easy because of complex locking order issues for example http://bugzilla.kernel.org/show_bug.cgi?id=14739 The patch introduce quota reservation field for each fs-inode (fs specific inode is used in order to prevent bloating generic vfs inode). This reservation is managed by quota code internally similar to i_blocks/i_bytes and may not be always in sync with internal fs reservation. Also perform some code rearrangement: - Unify dquot_reserve_space() and dquot_reserve_space() - Unify dquot_release_reserved_space() and dquot_free_space() - Also this patch add missing warning update to release_rsv() dquot_release_reserved_space() must call flush_warnings() as dquot_free_space() does. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06Add unlocked version of inode_add_bytes() functionDmitry Monakhov
commit b462707e7ccad058ae151e5c5b06eb5cadcb737f upstream. Quota code requires unlocked version of this function. Off course we can just copy-paste the code, but copy-pasting is always an evil. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06sched: Fix balance vs hotplug racePeter Zijlstra
commit 6ad4c18884e864cf4c77f9074d3d1816063f99cd upstream. Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment) we have cpu_active_mask which is suppose to rule scheduler migration and load-balancing, except it never (fully) did. The particular problem being solved here is a crash in try_to_wake_up() where select_task_rq() ends up selecting an offline cpu because select_task_rq_fair() trusts the sched_domain tree to reflect the current state of affairs, similarly select_task_rq_rt() trusts the root_domain. However, the sched_domains are updated from CPU_DEAD, which is after the cpu is taken offline and after stop_machine is done. Therefore it can race perfectly well with code assuming the domains are right. Cure this by building the domains from cpu_active_mask on CPU_DOWN_PREPARE. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06netfilter: fix crashes in bridge netfilter caused by fragment jumpsPatrick McHardy
commit 8fa9ff6849bb86c59cc2ea9faadf3cb2d5223497 upstream. When fragments from bridge netfilter are passed to IPv4 or IPv6 conntrack and a reassembly queue with the same fragment key already exists from reassembling a similar packet received on a different device (f.i. with multicasted fragments), the reassembled packet might continue on a different codepath than where the head fragment originated. This can cause crashes in bridge netfilter when a fragment received on a non-bridge device (and thus with skb->nf_bridge == NULL) continues through the bridge netfilter code. Add a new reassembly identifier for packets originating from bridge netfilter and use it to put those packets in insolated queues. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14805 Reported-and-Tested-by: Chong Qiao <qiaochong@loongson.cn> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06ipv6: reassembly: use seperate reassembly queues for conntrack and local ↵Patrick McHardy
delivery commit 0b5ccb2ee250136dd7385b1c7da28417d0d4d32d upstream. Currently the same reassembly queue might be used for packets reassembled by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT), as well as local delivery. This can cause "packet jumps" when the fragment completing a reassembled packet is queued from a different position in the stack than the previous ones. Add a "user" identifier to the reassembly queue key to seperate the queues of each caller, similar to what we do for IPv4. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06NOMMU: Optimise away the {dac_,}mmap_min_addr testsDavid Howells
commit 6e1415467614e854fee660ff6648bd10fa976e95 upstream. In NOMMU mode clamp dac_mmap_min_addr to zero to cause the tests on it to be skipped by the compiler. We do this as the minimum mmap address doesn't make any sense in NOMMU mode. mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18vmalloc: conditionalize build of pcpu_get_vm_areas()Tejun Heo
No matching upstream commit as it was resolved differently there. pcpu_get_vm_areas() is used only when dynamic percpu allocator is used by the architecture. In 2.6.32, ia64 doesn't use dynamic percpu allocator and has a macro which makes pcpu_get_vm_areas() buggy via local/global variable aliasing and triggers compile warning. The problem is fixed in upstream and ia64 uses dynamic percpu allocators, so the only left issue is inclusion of unnecessary code and compile warning on ia64 on 2.6.32. Don't build pcpu_get_vm_areas() if legacy percpu allocator is in use. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18drm/i915: Fix sync to vblank when VGA output is turned offLi Peng
commit 778c902640530371a169ad1c03566e7c51b09874 upstream In current vblank-wait implementation, if we turn off VGA output, drm_wait_vblank will still wait on the disabled pipe until timeout, because vblank on the pipe is assumed be enabled. This would cause slow system response on some system such as moblin. This patch resolve the issue by adding a drm helper function drm_vblank_off which explicitly clear vblank_enabled[crtc], wake up any waiting queue and save last vblank counter before turning off crtc. It also slightly change drm_vblank_get to ensure that we will will return immediately if trying to wait on a disabled pipe. Signed-off-by: Li Peng <peng.li@intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> [anholt: hand-applied for conflicts with overlay changes] Signed-off-by: Eric Anholt <eric@anholt.net> Cc: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18tracing: Fix event format exportJohannes Berg
commit 811cb50baf63461ce0bdb234927046131fc7fa8b upstream. For some reason the export of the event print format to userspace uses '#fmt' which breaks if the format string is anything but a plain string, for example if it is built with macros then the macro names are exported instead of their contents. Use "\"%s\"", fmt instead of "%s", #fmt to export the string and not the way it is built. For example, in net/mac80211/driver-trace.h for the trace event drv_start there is: TP_printk( LOCAL_PR_FMT, LOCAL_PR_ARG ) Which use to produce: print fmt: LOCAL_PR_FMT, REC->wiphy_name Now produces: print fmt: "%s", REC->wiphy_name Signed-off-by: Johannes Berg <johannes@sipsolutions.net> LKML-Reference: <20091113224009.GB23942@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18tcp: Stalling connections: Fix timeout calculation routineDamian Lukowski
[ Upstream commit 07f29bc5bbae4e53e982ab956fed7207990a7786 ] This patch fixes a problem in the TCP connection timeout calculation. Currently, timeout decisions are made on the basis of the current tcp_time_stamp and retrans_stamp, which is usually set at the first retransmission. However, if the retransmission fails in tcp_retransmit_skb(), retrans_stamp is not updated and remains zero. This leads to wrong decisions in retransmits_timed_out() if tcp_time_stamp is larger than the specified timeout, which is very likely. In this case, the TCP connection dies after the first attempted (and unsuccessful) retransmission. With this patch, tcp_skb_cb->when is used instead, when retrans_stamp is not available. This bug has been introduced together with retransmits_timed_out() in 2.6.32, as the number of retransmissions has been used for timeout decisions before. The corresponding commit was 6fa12c85031485dff38ce550c24f10da23b0adaa (Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.). Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for testing. Reported-by: Frederic Leroy <fredo@starox.org> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18drm/ttm: Fix build failure due to missing struct pageMartin Michlmayr
commit c3a73ba13bac7fd96030f39202b2d37fb19c46a6 upstream. drm/ttm fails to build on MIPS because "struct page" is not known: | In file included from drivers/gpu/drm/ttm/ttm_memory.c:28: | include/drm/ttm/ttm_memory.h:154: warning: 'struct page' declared inside parameter list | include/drm/ttm/ttm_memory.h:154: warning: its scope is only this definition or declaration, which is probably not what you want | include/drm/ttm/ttm_memory.h:156: warning: 'struct page' declared inside parameter list | drivers/gpu/drm/ttm/ttm_memory.c:540: error: conflicting types for 'ttm_mem_global_alloc_page' | include/drm/ttm/ttm_memory.h:154: error: previous declaration of 'ttm_mem_global_alloc_page' was here | drivers/gpu/drm/ttm/ttm_memory.c:561: error: conflicting types for 'ttm_mem_global_free_page' | include/drm/ttm/ttm_memory.h:156: error: previous declaration of 'ttm_mem_global_free_page' was here Signed-off-by: Martin Michlmayr <tbm@cyrius.com> Acked-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18USB: usb-storage: add BAD_SENSE flagAlan Stern
commit a0bb108112a872c0b0c4b3ef4974f95fb75b155d upstream. This patch (as1311) fixes a problem in usb-storage: Some devices are pretty broken when it comes to reporting sense data. The information they send back indicates that they have more than 18 bytes of sense data available, but when the system asks for more than 18 they fail or hang. The symptom is that probing fails with multiple resets. The patch adds a new BAD_SENSE flag to indicate that usb-storage should never ask for more than 18 bytes of sense data. The flag can be set in an unusual_devs entry or via the "quirks=" module parameter, and it is set automatically whenever a REQUEST SENSE command for more than 18 bytes fails or times out. An unusual_devs entry is added for the Agfa photo frame, which uses a Prolific chip having this bug. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Daniel Kukula <daniel.kuku@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18KVM: s390: Make psw available on all exits, not just a subsetCarsten Otte
commit d7b0b5eb3000c6fb902f08c619fcd673a23d8fab upstream. This patch moves s390 processor status word into the base kvm_run struct and keeps it up-to date on all userspace exits. The userspace ABI is broken by this, however there are no applications in the wild using this. A capability check is provided so users can verify the updated API exists. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>