summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2010-05-12tracing: Fix ftrace_event_call alignment for use with gcc 4.5Jeff Mahoney
commit 86c38a31aa7f2dd6e74a262710bf8ebf7455acc5 upstream. GCC 4.5 introduces behavior that forces the alignment of structures to use the largest possible value. The default value is 32 bytes, so if some structures are defined with a 4-byte alignment and others aren't declared with an alignment constraint at all - it will align at 32-bytes. For things like the ftrace events, this results in a non-standard array. When initializing the ftrace subsystem, we traverse the _ftrace_events section and call the initialization callback for each event. When the structures are misaligned, we could be treating another part of the structure (or the zeroed out space between them) as a function pointer. This patch forces the alignment for all the ftrace_event_call structures to 4 bytes. Without this patch, the kernel fails to boot very early when built with gcc 4.5. It's trivial to check the alignment of the members of the array, so it might be worthwhile to add something to the build system to do that automatically. Unfortunately, that only covers this case. I've asked one of the gcc developers about adding a warning when this condition is seen. Cc: stable@kernel.org Signed-off-by: Jeff Mahoney <jeffm@suse.com> LKML-Reference: <4B85770B.6010901@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andreas Radke <a.radke@arcor.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12ACPI: introduce kernel parameter acpi_sleep=sci_force_enableZhang Rui
commit d7f0eea9e431e1b8b0742a74db1a9490730b2a25 upstream. Introduce kernel parameter acpi_sleep=sci_force_enable some laptop requires SCI_EN being set directly on resume, or else they hung somewhere in the resume code path. We already have a blacklist for these laptops but we still need this option, especially when debugging some suspend/resume problems, in case there are systems that need this workaround and are not yet in the blacklist. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12libata: Fix accesses at LBA28 boundary (old bug, but nasty) (v2)Mark Lord
commit 45c4d015a92f72ec47acd0c7557abdc0c8a6499d upstream. Most drives from Seagate, Hitachi, and possibly other brands, do not allow LBA28 access to sector number 0x0fffffff (2^28 - 1). So instead use LBA48 for such accesses. This bug could bite a lot of systems, especially when the user has taken care to align partitions to 4KB boundaries. On misaligned systems, it is less likely to be encountered, since a 4KB read would end at 0x10000000 rather than at 0x0fffffff. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12hugetlb: fix infinite loop in get_futex_key() when backed by huge pagesMel Gorman
commit 23be7468e8802a2ac1de6ee3eecb3ec7f14dc703 upstream. If a futex key happens to be located within a huge page mapped MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a page->mapping that will never exist. See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details about the problem. This patch makes page->mapping a poisoned value that includes PAGE_MAPPING_ANON mapped MAP_PRIVATE. This is enough for futex to continue but because of PAGE_MAPPING_ANON, the poisoned value is not dereferenced or used by futex. No other part of the VM should be dereferencing the page->mapping of a hugetlbfs page as its page cache is not on the LRU. This patch fixes the problem with the test case described in the bugzilla. [akpm@linux-foundation.org: mel cant spel] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Darren Hart <darren@dvhart.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-12core, x86: make LIST_POISON less deadlyAvi Kivity
commit a29815a333c6c6e677294bbe5958e771d0aad3fd upstream. The list macros use LIST_POISON1 and LIST_POISON2 as undereferencable pointers in order to trap erronous use of freed list_heads. Unfortunately userspace can arrange for those pointers to actually be dereferencable, potentially turning an oops to an expolit. To avoid this allow architectures (currently x86_64 only) to override the default values for these pointers with truly-undereferencable values. This is easy on x86_64 as the virtual address space is large and contains areas that cannot be mapped. Other 64-bit architectures will likely find similar unmapped ranges. [ingo: switch to 0xdead000000000000 as the unmapped area] [ingo: add comments, cleanup] [jaswinder: eliminate sparse warnings] Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26KVM: Increase NR_IOBUS_DEVS limit to 200Sridhar Samudrala
(Cherry-picked from commit e80e2a60ff7914dae691345a976c80bbbff3ec74) This patch increases the current hardcoded limit of NR_IOBUS_DEVS from 6 to 200. We are hitting this limit when creating a guest with more than 1 virtio-net device using vhost-net backend. Each virtio-net device requires 2 such devices to service notifications from rx/tx queues. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26KVM: fix the handling of dirty bitmaps to avoid overflowsTakuya Yoshikawa
(Cherry-picked from commit 87bf6e7de1134f48681fd2ce4b7c1ec45458cb6d) Int is not long enough to store the size of a dirty bitmap. This patch fixes this problem with the introduction of a wrapper function to calculate the sizes of dirty bitmaps. Note: in mark_page_dirty(), we have to consider the fact that __set_bit() takes the offset as int, not long. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26fs-writeback: Add helper function to start writeback if idleEric Sandeen
commit 17bd55d037a02b04d9119511cfd1a4b985d20f63 upstream. ext4, at least, would like to start pushing on writeback if it starts to get close to ENOSPC when reserving worst-case blocks for delalloc writes. Writing out delalloc data will convert those worst-case predictions into usually smaller actual usage, freeing up space before we hit ENOSPC based on this speculation. Thanks to Jens for the suggestion for the helper function, & the naming help. I've made the helper return status on whether writeback was started even though I don't plan to use it in the ext4 patch; it seems like it would be potentially useful to test this in some cases. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26module: fix __module_ref_addr()Mathieu Desnoyers
The __module_ref_addr() problem disappears in 2.6.34-rc kernels because these percpu accesses were re-factored. __module_ref_addr() should use per_cpu_ptr() to obfuscate the pointer (RELOC_HIDE is needed for per cpu pointers). This non-standard per-cpu pointer use has been introduced by commit 720eba31f47aeade8ec130ca7f4353223c49170f It causes a NULL pointer exception on some configurations when CONFIG_TRACING is enabled on 2.6.33. This patch fixes the problem (acknowledged by Randy who reported the bug). It did not appear to hurt previously because most of the accesses were done through local_inc, which probably obfuscated the access enough that no compiler optimizations were done. But with local_read() done when CONFIG_TRACING is active, this becomes a problem. Non-CONFIG_TRACING is probably affected as well (module.c contains local_set and local_read that use __module_ref_addr()), but I guess nobody noticed because we've been lucky enough that the compiler did not generate the inappropriate optimization pattern there. This patch should be queued for the 2.6.29.x through 2.6.33.x stable branches. (tested on 2.6.33.1 x86_64) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> CC: Eric Dumazet <dada1@cosmosbay.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Tejun Heo <tj@kernel.org> CC: Ingo Molnar <mingo@elte.hu> CC: Andrew Morton <akpm@linux-foundation.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26x86/PCI: irq and pci_ids patch for Intel Cougar Point DeviceIDsSeth Heasley
commit 93da6202264ce1256b04db8008a43882ae62d060 upstream. This patch adds the Intel Cougar Point (PCH) LPC and SMBus Controller DeviceIDs. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26pci: Update pci_set_vga_state() to call arch functionsMike Travis
commit 95a8b6efc5d07103583f706c8a5889437d537939 upstream. Update pci_set_vga_state to call arch dependent functions to enable Legacy VGA I/O transactions to be redirected to correct target. [akpm@linux-foundation.org: make pci_register_set_vga_state() __init] Signed-off-by: Mike Travis <travis@sgi.com> LKML-Reference: <201002022238.o12McE1J018723@imap1.linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Robin Holt <holt@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: David Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26SCSI: fc-transport: Use packed modifier for fc_bsg_request structure.Harish Zunjarrao
commit eda05a28ec52be40086400a1b606d211276f0e41 upstream. The 32bit kernel does not add padding bytes in the fc_bsg_request structure whereas the 64bit kernel adds padding bytes in the fc_bsg_request structure. Due to this, structure elements gets mismatched with 32bit application and 64bit kernel.To resolve this, used packed modifier to avoid adding padding bytes. Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26drm/radeon/kms: add FireMV 2400 PCI ID.Dave Airlie
commit 79b9517a33a283c5d9db875c263670ed1e055f7e upstream. This is an M24/X600 chip. From RH# 581927 Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26NFSv4: fix delegated lockingTrond Myklebust
commit 0df5dd4aae211edeeeb84f7f84f6d093406d7c22 upstream. Arnaud Giersch reports that NFSv4 locking is broken when we hold a delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4: Don't allow posix locking against servers that don't support it). According to Arnaud, the lock succeeds the first time he opens the file (since we cannot do a delegated open) but then fails after we start using delegated opens. The following patch fixes it by ensuring that locking behaviour is governed by a per-filesystem capability flag that is initially set, but gets cleared if the server ever returns an OPEN without the NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set. Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26resource: move kernel function inside __KERNEL__Jiri Slaby
commit 96d07d211739fd2450ac54e81d00fa40fcd4b1bd upstream From: Jiri Slaby <jslaby@suse.cz> resource: move kernel function inside __KERNEL__ It is an internal function. Move it inside __KERNEL__ ifdef, along with task_struct declaration. Then we get: #--- /usr/include/linux/resource.h 2009-09-14 15:09:29.000000000 +0200 #+++ usr/include/linux/resource.h 2010-01-04 11:30:54.000000000 +0100 #@@ -3,8 +3,6 @@ # ##include <linux/time.h> # #-struct task_struct; #- #/* #* Resource control/accounting header file for linux #*/ #@@ -70,6 +68,5 @@ #*/ ##include <asm/resource.h> # #-int getrusage(struct task_struct *p, int who, struct rusage *ru); # ##endif # #*********** include/linux/Kbuild is untouched, since unifdef is run even on headers-y nowadays. backport to 2.6.32 by maximilian attems <max@stro.at> Patch commented out by gregkh due to quilt complaining. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26raw: fsync method is now requiredAnton Blanchard
commit 55ab3a1ff843e3f0e24d2da44e71bffa5d853010 upstream. Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke the raw driver. We now call through generic_file_aio_write -> generic_write_sync -> vfs_fsync_range. vfs_fsync_range has: if (!fop || !fop->fsync) { ret = -EINVAL; goto out; } But drivers/char/raw.c doesn't set an fsync method. We have two options: fix it or remove the raw driver completely. I'm happy to do either, the fact this has been broken for so long suggests it is rarely used. The patch below adds an fsync method to the raw driver. My knowledge of the block layer is pretty sketchy so this could do with a once over. If we instead decide to remove the raw driver, this patch might still be useful as a backport to 2.6.33 and 2.6.32. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26Freezer: Fix buggy resume test for tasks frozen with cgroup freezerMatt Helsley
commit 5a7aadfe2fcb0f69e2acc1fbefe22a096e792fc9 upstream. When the cgroup freezer is used to freeze tasks we do not want to thaw those tasks during resume. Currently we test the cgroup freezer state of the resuming tasks to see if the cgroup is FROZEN. If so then we don't thaw the task. However, the FREEZING state also indicates that the task should remain frozen. This also avoids a problem pointed out by Oren Ladaan: the freezer state transition from FREEZING to FROZEN is updated lazily when userspace reads or writes the freezer.state file in the cgroup filesystem. This means that resume will thaw tasks in cgroups which should be in the FROZEN state if there is no read/write of the freezer.state file to trigger this transition before suspend. NOTE: Another "simple" solution would be to always update the cgroup freezer state during resume. However it's a bad choice for several reasons: Updating the cgroup freezer state is somewhat expensive because it requires walking all the tasks in the cgroup and checking if they are each frozen. Worse, this could easily make resume run in N^2 time where N is the number of tasks in the cgroup. Finally, updating the freezer state from this code path requires trickier locking because of the way locks must be ordered. Instead of updating the freezer state we rely on the fact that lazy updates only manage the transition from FREEZING to FROZEN. We know that a cgroup with the FREEZING state may actually be FROZEN so test for that state too. This makes sense in the resume path even for partially-frozen cgroups -- those that really are FREEZING but not FROZEN. Reported-by: Oren Ladaan <orenl@cs.columbia.edu> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26drm/radeon: add new RS880 pci idAlex Deucher
commit 338e2b1d571e4873908b199c90d6a31f65137fe3 upstream. This should go to 2.6.33 stable as well. Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01block: Backport of various I/O topology fixes from 2.6.33 and 2.6.34Martin K. Petersen
block: Backport of various I/O topology fixes from 2.6.33 and 2.6.34 The stacking code incorrectly scaled up the data offset in some cases causing misaligned devices to report alignment. Rewrite the stacking algorithm to remedy this. (Upstream commit 9504e0864b58b4a304820dcf3755f1da80d5e72f) The top device misalignment flag would not be set if the added bottom device was already misaligned as opposed to causing a stacking failure. Also massage the reporting so that an error is only returned if adding the bottom device caused the misalignment. I.e. don't return an error if the top is already flagged as misaligned. (Upstream commit fe0b393f2c0a0d23a9bc9ed7dc51a1ee511098bd) lcm() was defined to take integer-sized arguments. The supplied arguments are multiplied, however, causing us to overflow given sufficiently large input. That in turn led to incorrect optimal I/O size reporting in some cases. Switch lcm() over to unsigned long similar to gcd() and move the function from blk-settings.c to lib. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01quota: manage reserved space when quota is not active [v2]Dmitry Monakhov
commit c469070aea5a0ada45a836937c776fd3083dae2b upstream. Since we implemented generic reserved space management interface, then it is possible to account reserved space even when quota is not active (similar to i_blocks/i_bytes). Without this patch following testcase result in massive comlain from WARN_ON in dquot_claim_space() TEST_CASE: mount /dev/sdb /mnt -oquota dd if=/dev/zero of=/mnt/test bs=1M count=1 quotaon /mnt # fs_reserved_spave == 1Mb # quota_reserved_space == 0, because quota was disabled dd if=/dev/zero of=/mnt/test seek=1 bs=1M count=1 # fs_reserved_spave == 2Mb # quota_reserved_space == 1Mb sync # ->dquot_claim_space() -> WARN_ON Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01mac80211: Retry null data frame for power saveVivek Natarajan
commit 375177bf35efc08e1bd37bbda4cc0c8cc4db8500 upstream. Even if the null data frame is not acked by the AP, mac80211 goes into power save. This might lead to loss of frames from the AP. Prevent this by restarting dynamic_ps_timer when ack is not received for null data frames. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01if_tunnel.h: add missing ams/byteorder.h includePaulius Zaleckas
commit 9bf35c8dddd56f7f247a27346f74f5adc18071f4 upstream. When compiling userspace application which includes if_tunnel.h and uses GRE_* defines you will get undefined reference to __cpu_to_be16. Fix this by adding missing #include <asm/byteorder.h> Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01tty: Take a 256 byte padding into account when buffering below sub-page unitsMel Gorman
commit 352fa6ad16b89f8ffd1a93b4419b1a8f2259feab upstream. The TTY layer takes some care to ensure that only sub-page allocations are made with interrupts disabled. It does this by setting a goal of "TTY_BUFFER_PAGE" to allocate. Unfortunately, while TTY_BUFFER_PAGE takes the size of tty_buffer into account, it fails to account that tty_buffer_find() rounds the buffer size out to the next 256 byte boundary before adding on the size of the tty_buffer. This patch adjusts the TTY_BUFFER_PAGE calculation to take into account the size of the tty_buffer and the padding. Once applied, tty_buffer_alloc() should not require high-order allocations. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01tty: Keep the default buffering to sub-page unitsAlan Cox
commit d9661adfb8e53a7647360140af3b92284cbe52d4 upstream. We allocate during interrupts so while our buffering is normally diced up small anyway on some hardware at speed we can pressure the VM excessively for page pairs. We don't really need big buffers to be linear so don't try so hard. In order to make this work well we will tidy up excess callers to request_room, which cannot itself enforce this break up. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01hrtimer: Tune hrtimer_interrupt hang logicThomas Gleixner
commit 41d2e494937715d3150e5c75d01f0e75ae899337 upstream. The hrtimer_interrupt hang logic adjusts min_delta_ns based on the execution time of the hrtimer callbacks. This is error-prone for virtual machines, where a guest vcpu can be scheduled out during the execution of the callbacks (and the callbacks themselves can do operations that translate to blocking operations in the hypervisor), which in can lead to large min_delta_ns rendering the system unusable. Replace the current heuristics with something more reliable. Allow the interrupt code to try 3 times to catch up with the lost time. If that fails use the total time spent in the interrupt handler to defer the next timer interrupt so the system can catch up with other things which got delayed. Limit that deferment to 100ms. The retry events and the maximum time spent in the interrupt handler are recorded and exposed via /proc/timer_list Inspired by a patch from Marcelo. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01decompress: fix new decompressor for PICRussell King
commit 5ceaa2f39bfa73c4398cd01e78f1c3ebde3d3383 upstream. The ARM kernel decompressor wants to be able to relocate r/w data independently from the rest of the image, and we do this by ensuring that r/w data has global visibility. Define STATIC_RW_DATA to be empty to achieve this. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Alain Knaff <alain@knaff.lu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15sched: Fix sched_mv_power_savings for !SMTVaidyanathan Srinivasan
commit 28f5318167adf23b16c844b9c2253f355cb21796 upstream. Fix for sched_mc_powersavigs for pre-Nehalem platforms. Child sched domain should clear SD_PREFER_SIBLING if parent will have SD_POWERSAVINGS_BALANCE because they are contradicting. Sets the flags correctly based on sched_mc_power_savings. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15x86: Avoid race condition in pci_enable_msix()Brandon Phiilps
commit ced5b697a76d325e7a7ac7d382dbbb632c765093 upstream. Keep chip_data in create_irq_nr and destroy_irq. When two drivers are setting up MSI-X at the same time via pci_enable_msix() there is a race. See this dmesg excerpt: [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X [ 85.170611] alloc irq_desc for 99 on node -1 [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X [ 85.170614] alloc kstat_irqs on node -1 [ 85.170616] alloc irq_2_iommu on node -1 [ 85.170617] alloc irq_desc for 100 on node -1 [ 85.170619] alloc kstat_irqs on node -1 [ 85.170621] alloc irq_2_iommu on node -1 [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X [ 85.170626] alloc irq_desc for 101 on node -1 [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X [ 85.170630] alloc kstat_irqs on node -1 [ 85.170631] alloc irq_2_iommu on node -1 [ 85.170635] alloc irq_desc for 102 on node -1 [ 85.170636] alloc kstat_irqs on node -1 [ 85.170639] alloc irq_2_iommu on node -1 [ 85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 As you can see igb and ixgbe are both alternating on create_irq_nr() via pci_enable_msix() in their probe function. ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data = NULL via dynamic_irq_init(). igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[] via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this: cfg_new = irq_desc_ptrs[102]->chip_data; if (cfg_new->vector != 0) continue; This hits the NULL deref. Another possible race exists via pci_disable_msix() in a driver or in the number of error paths that call free_msi_irqs(): destroy_irq() dynamic_irq_cleanup() which sets desc->chip_data = NULL ...race window... desc->chip_data = cfg; Remove the save and restore code for cfg in create_irq_nr() and destroy_irq() and take the desc->lock when checking the irq_cfg. Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org> Signed-off-by: Brandon Phililps <bphilips@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOMWu Fengguang
commit 0141450f66c3c12a3aaa869748caa64241885cdf upstream. This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM. POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance: a 16K read will be carried out in 4 _sync_ 1-page reads. In other places, ra_pages==0 means - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs - some IO error happened where multi-page read IO won't help or should be avoided. POSIX_FADV_RANDOM actually want a different semantics: to disable the *heuristic* readahead algorithm, and to use a dumb one which faithfully submit read IO for whatever application requests. So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM. Note that the random hint is not likely to help random reads performance noticeably. And it may be too permissive on huge request size (its IO size is not limited by read_ahead_kb). In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall (NFS read) performance of the application increased by 313%! Tested-by: Quentin Barnes <qbarnes+nfs@yahoo-inc.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <qbarnes+nfs@yahoo-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23drm/i915: add i915_lp_ring_sync helperDaniel Vetter
commit 48764bf43f746113fc77877d7e80f2df23ca4cbb upstream. This just waits until the hw passed the current ring position with cmd execution. This slightly changes the existing i915_wait_request function to make uninterruptible waiting possible - no point in returning to userspace while mucking around with the overlay, that piece of hw is just too fragile. Also replace a magic 0 with the symbolic constant (and kill the then superflous comment) while I was looking at the code. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23netfilter: nf_conntrack: fix hash resizing with namespacesPatrick McHardy
commit d696c7bdaa55e2208e56c6f98e6bc1599f34286d upstream. As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instanciating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23netfilter: nf_conntrack: per netns nf_conntrack_cachepEric Dumazet
commit 5b3501faa8741d50617ce4191c20061c6ef36cb3 upstream. nf_conntrack_cachep is currently shared by all netns instances, but because of SLAB_DESTROY_BY_RCU special semantics, this is wrong. If we use a shared slab cache, one object can instantly flight between one hash table (netns ONE) to another one (netns TWO), and concurrent reader (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be fooled without notice, because no RCU grace period has to be observed between object freeing and its reuse. We dont have this problem with UDP/TCP slab caches because TCP/UDP hashtables are global to the machine (and each object has a pointer to its netns). If we use per netns conntrack hash tables, we also *must* use per netns conntrack slab caches, to guarantee an object can not escape from one namespace to another one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> [Patrick: added unique slab name allocation] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23resource: add helpers for fetching rlimitsJiri Slaby
commit 3e10e716abf3c71bdb5d86b8f507f9e72236c9cd upstream. We want to be sure that compiler fetches the limit variable only once, so add helpers for fetching current and maximal resource limits which do that. Add them to sched.h (instead of resource.h) due to circular dependency sched.h->resource.h->task_struct Alternative would be to create a separate res_access.h or similar. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: James Morris <jmorris@namei.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09connector: Delete buggy notification code.Evgeniy Polyakov
commit f98bfbd78c37c5946cc53089da32a5f741efdeb7 upstream. On Tue, Feb 02, 2010 at 02:57:14PM -0800, Greg KH (gregkh@suse.de) wrote: > > There are at least two ways to fix it: using a big cannon and a small > > one. The former way is to disable notification registration, since it is > > not used by anyone at all. Second way is to check whether calling > > process is root and its destination group is -1 (kind of priveledged > > one) before command is dispatched to workqueue. > > Well if no one is using it, removing it makes the most sense, right? > > No objection from me, care to make up a patch either way for this? Getting it is not used, let's drop support for notifications about (un)registered events from connector. Another option was to check credentials on receiving, but we can always restore it without bugs if needed, but genetlink has a wider code base and none complained, that userspace can not get notification when some other clients were (un)registered. Kudos for Sebastian Krahmer <krahmer@suse.de>, who found a bug in the code. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09libata: retry link resume if necessaryTejun Heo
commit 5040ab67a2c6d5710ba497dc52a8f7035729d7b0 upstream. Interestingly, when SIDPR is used in ata_piix, writes to DET in SControl sometimes get ignored leading to detection failure. Update sata_link_resume() such that it reads back SControl after clearing DET and retry if it's not clear. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: fengxiangjun <fengxiangjun@neusoft.com> Reported-by: Jim Faulkner <jfaulkne@ccs.neu.edu> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09KVM: allow userspace to adjust kvmclock offsetGlauber Costa
(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d) When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [marcelo: future-proof abi with a flags field] [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it] Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09ax25: netrom: rose: Fix timer oopsesJarek Poplawski
[ Upstream commit d00c362f1b0ff54161e0a42b4554ac621a9ef92d ] Wrong ax25_cb refcounting in ax25_send_frame() and by its callers can cause timer oopses (first reported with 2.6.29.6 kernel). Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14905 Reported-by: Bernard Pidoux <bpidoux@free.fr> Tested-by: Bernard Pidoux <bpidoux@free.fr> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09net: restore ip source validationJamal Hadi Salim
[ Upstream commit 28f6aeea3f12d37bd258b2c0d5ba891bff4ec479 ] when using policy routing and the skb mark: there are cases where a back path validation requires us to use a different routing table for src ip validation than the one used for mapping ingress dst ip. One such a case is transparent proxying where we pretend to be the destination system and therefore the local table is used for incoming packets but possibly a main table would be used on outbound. Make the default behavior to allow the above and if users need to turn on the symmetry via sysctl src_valid_mark Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09Split 'flush_old_exec' into two functionsLinus Torvalds
commit 221af7f87b97431e3ee21ce4b0e77d5411cf1549 upstream. 'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09ACPI: Add platform-wide _OSC support.Shaohua Li
commit 3563ff964fdc36358cef0330936fdac28e65142a upstream. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09ACPI: Add a generic API for _OSC -v2Shaohua Li
commit 70023de88c58a81a730ab4d13c51a30e537ec76e upstream. v2->v1: .improve debug info as suggedted by Bjorn,Kenji .API is using uuid string as suggested by Alexey Add an API to execute _OSC. A lot of devices can have this method, so add a generic API. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09mm: add new 'read_cache_page_gfp()' helper functionLinus Torvalds
commit 0531b2aac59c2296570ac52bfc032ef2ace7d5e1 upstream. It's a simplified 'read_cache_page()' which takes a page allocation flag, so that different paths can control how aggressive the memory allocations are that populate a address space. In particular, the intel GPU object mapping code wants to be able to do a certain amount of own internal memory management by automatically shrinking the address space when memory starts getting tight. This allows it to dynamically use different memory allocation policies on a per-allocation basis, rather than depend on the (static) address space gfp policy. The actual new function is a one-liner, but re-organizing the helper functions to the point where you can do this with a single line of code is what most of the patch is all about. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: fix free of fc_rport_priv with timer pendingJoe Eykholt
commit b4a9c7ede96e90f7b1ec009ce7256059295e76df upstream. Timer crashes were caused by freeing a struct fc_rport_priv with a timer pending, causing the timer facility list to be corrupted. This was during FC uplink flap tests with a lot of targets. After discovery, we were doing an PLOGI on an rdata that was in DELETE state but not yet removed from the lookup list. This moved the rdata from DELETE state to PLOGI state. If the PLOGI exchange allocation failed and needed to be retried, the timer scheduling could race with the free being done by fc_rport_work(). When fc_rport_login() is called on a rport in DELETE state, move it to a new state RESTART. In fc_rport_work, when handling a LOGO, STOPPED or FAILED event, look for restart state. In the RESTART case, don't take the rdata off the list and after the transport remote port is deleted and exchanges are reset, re-login to the remote port. Note that the new RESTART state also corrects a problem we had when re-discovering a port that had moved to DELETE state. In that case, a new rdata was created, but the old rdata would do an exchange manager reset affecting the FC_ID for both the new rdata and old rdata. With the new state, the new port isn't logged into until after any old exchanges are reset. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28libfc: Fix frags in frame exceeding SKB_MAX_FRAGS in fc_fcp_send_dataYi Zou
commit d37322a43ebac79eef417149f5696390cf8872db upstream. In case of sequence offload, in fc_fcp_send_data(), the skb_fill_page_info() called may end up adding more frags to the skb_shinfo(fp_skb(fp))->frags[], exceeding SKB_MAX_FRAGS, this eventually corrupts the memory. I am adding the FR_FRAME_SG_LEN back, but as SKB_MAX_FRAGS -1, leaving 1 for our fcoe_eof_crc page. And send will be broken into multiple large sends if the frame already contains more frags than skb handle. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28HID: fixup quirk for NCR devicesJiri Kosina
commit 5b915d9e6dc3d22fedde91dfef1cb1a8fa9a1870 upstream. NCR devices are terminally broken by design -- they claim themselves to contain proper input applications in their HID report descriptor, but behave very badly if treated in standard way. According to NCR developers, the devices get confused when queried for reports in a standard way, rendering them unusable. NCR is shipping application called "RPSL" that can be used to drive these devices through hiddev, under the assumption that in-kernel driver doesn't perform initial report query. If it does, neither in-kernel nor hiddev-based driver can operate with these devices any more. Introduce a quirk that skips the report query for all NCR devices. The previous NOGET quirk was wrong and had been introduced because I misunderstood the nature of brokenness of these devices. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28nohz: Prevent clocksource wrapping during idleJon Hunter
commit 98962465ed9e6ea99c38e0af63fe1dcb5a79dc25 upstream. The dynamic tick allows the kernel to sleep for periods longer than a single tick, but it does not limit the sleep time currently. In the worst case the kernel could sleep longer than the wrap around time of the time keeping clock source which would result in losing track of time. Prevent this by limiting it to the safe maximum sleep time of the current time keeping clock source. The value is calculated when the clock source is registered. [ tglx: simplified the code a bit and massaged the commit msg ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28powerpc/fsl: Add PCI device ids for new QoirQ chipsBen Hutchings
commit a3f62bd2b20c769ddc989b242ddd274179e19ee6 upstream by Kumar Gala <galak@kernel.crashing.org>. I have adjusted the patch context for 2.6.32. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28ACPI: don't cond_resched if irq is disabledXiaotian Feng
commit c084ca704a3661bf77690a05bc6bd2c305d87c34 upstream. commit 8bd108d adds preemption point after each opcode parse, then a sleeping function called from invalid context bug was founded during suspend/resume stage. this was fixed in commit abe1dfa by don't cond_resched when irq_disabled. But recent commit 138d156 changes the behaviour to don't cond_resched when in_atomic. This makes the sleeping function called from invalid context bug happen again, which is reported in http://lkml.org/lkml/2009/12/1/371. This patch also fixes http://bugzilla.kernel.org/show_bug.cgi?id=14483 Reported-and-bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-and-bisected-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25block: bdev_stack_limits wrapperMartin K. Petersen
commit 17be8c245054b9c7786545af3ba3ca4e54cd4ad9 upstream. DM does not want to know about partition offsets. Add a partition-aware wrapper that DM can use when stacking block devices. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25SCSI: enclosure: fix oops while iterating enclosure_status arrayJames Bottomley
commit cc9b2e9f6603190c009e5d2629ce8e3f99571346 upstream. Based on patch originally by Jeff Mahoney <jeffm@suse.com> enclosure_status is expected to be a NULL terminated array of strings but isn't actually NULL terminated. When writing an invalid value to /sys/class/enclosure/.../.../status, it goes off the end of the array and Oopses. Fix by making the assumption true and adding NULL at the end. Reported-by: Artur Wojcik <artur.wojcik@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>