summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2006-06-22[PATCH] I2O: Bugfixes to get I2O working againMarkus Lidel
- Fixed locking of struct i2o_exec_wait in Executive-OSM - Removed LCT Notify in i2o_exec_probe() which caused freeing memory and accessing freed memory during first enumeration of I2O devices - Added missing locking in i2o_exec_lct_notify() - removed put_device() of I2O controller in i2o_iop_remove() which caused the controller structure get freed to early - Fixed size of mempool in i2o_iop_alloc() - Fixed access to freed memory in i2o_msg_get() See http://bugzilla.kernel.org/show_bug.cgi?id=6561 Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-22[PATCH] SPARC64: Respect gfp_t argument to dma_alloc_coherent().David Miller
Using asm-generic/dma-mapping.h does not work because pushing the call down to pci_alloc_coherent() causes the gfp_t argument of dma_alloc_coherent() to be ignored. Fix this by implementing things directly, and adding a gfp_t argument we can use in the internal call down to the PCI DMA implementation of pci_alloc_coherent(). This fixes massive memory corruption when using the sound driver layer, which passes things like __GFP_COMP down into these routines and (correctly) expects that to work. This is a disk eater when sound is used, so it's pretty critical. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-22[PATCH] SPARC64: Fix D-cache corruption in mremapDavid Miller
If we move a mapping from one virtual address to another, and this changes the virtual color of the mapping to those pages, we can see corrupt data due to D-cache aliasing. Check for and deal with this by overriding the move_pte() macro. Set things up so that other platforms can cleanly override the move_pte() macro too. This long standing bug corrupts user memory, and in particular has been notorious for corrupting Debian package database files on sparc64 boxes. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-20[PATCH] SCTP: Respect the real chunk length when walking parameters ↵Vladislav Yasevich
(CVE-2006-1858) When performing bound checks during the parameter processing, we want to use the real chunk and paramter lengths for bounds instead of the rounded ones. This prevents us from potentially walking of the end if the chunk length was miscalculated. We still use rounded lengths when advancing the pointer. This was found during a conformance test that changed the chunk length without modifying parameters. (Vlad noted elsewhere: the most you'd overflow is 3 bytes, so problem is parameter dependent). Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2006-05-09[PATCH] SCTP: Allow spillover of receive buffer to avoid deadlock. ↵Neil Horman
(CVE-2006-2275) This patch fixes a deadlock situation in the receive path by allowing temporary spillover of the receive buffer. - If the chunk we receive has a tsn that immediately follows the ctsn, accept it even if we run out of receive buffer space and renege data with higher TSNs. - Once we accept one chunk in a packet, accept all the remaining chunks even if we run out of receive buffer space. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Mark Butler <butlerm@middle.net> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2006-05-01[PATCH] i386: fix broken FP exception handlingChuck Ebbert
The FXSAVE information leak patch introduced a bug in FP exception handling: it clears FP exceptions only when there are already none outstanding. Mikael Pettersson reported that causes problems with the Erlang runtime and has tested this fix. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Acked-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-01[PATCH] MIPS: R2 build fixes for gcc < 3.4.Ralf Baechle
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-01[PATCH] MIPS: Use "R" constraint for cache_op.Ralf Baechle
Gcc might emit an absolute address for the the "m" constraint which gas unfortunately does not permit. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-01[PATCH] x86/PAE: Fix pte_clear for the >4GB RAM caseZachary Amsden
Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the same bug, so use the same fix for both. Turns out pmd_clear had it as well, but pgds are not affected. The problem is rather intricate. Page table entries in PAE mode are 64-bits wide, but the only atomic 8-byte write operation available in 32-bit mode is cmpxchg8b, which is expensive (at least on P4), and thus avoided. But it can happen that the processor may prefetch entries into the TLB in the middle of an operation which clears a page table entry. So one must always clear the P-bit in the low word of the page table entry first when clearing it. Since the sequence *ptep = __pte(0) leaves the order of the write dependent on the compiler, it must be coded explicitly as a clear of the low word followed by a clear of the high word. Further, there must be a write memory barrier here to enforce proper ordering by the compiler (and, in the future, by the processor as well). On > 4GB memory machines, the implementation of pte_clear for PAE was clearly deficient, as it could leave virtual mappings of physical memory above 4GB aliased to memory below 4GB in the TLB. The implementation of ptep_get_and_clear_full has a similar bug, although not nearly as likely to occur, since the mappings being cleared are in the process of being destroyed, and should never be dereferenced again. But, as luck would have it, it is possible to trigger bugs even without ever dereferencing these bogus TLB mappings, even if the clear is followed fairly soon after with a TLB flush or invalidation. The problem is that memory above 4GB may now be aliased into the first 4GB of memory, and in fact, may hit a region of memory with non-memory semantics. These regions include AGP and PCI space. As such, these memory regions are not cached by the processor. This introduces the bug. The processor can speculate memory operations, including memory writes, as long as they are committed with the proper ordering. Speculating a memory write to a linear address that has a bogus TLB mapping is possible. Normally, the speculation is harmless. But for cached memory, it does leave the falsely speculated cacheline unmodified, but in a dirty state. This cache line will be eventually written back. If this cacheline happens to intersect a region of memory that is not protected by the cache coherency protocol, it can corrupt data in I/O memory, which is generally a very bad thing to do, and can cause total system failure or just plain undefined behavior. These bugs are extremely unlikely, but the severity is of such magnitude, and the fix so simple that I think fixing them immediately is justified. Also, they are nearly impossible to debug. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-01[PATCH] Simplify proc/devices and fix early termination regressionAndrew Morton
Repair /proc/devices early-termination regression. 2.6.16 broke /proc/devices. An application often gets an EOF before the end of data is reached, if that application uses a series of short read(2)s to access the data. I have used read buffers of varying sizes with varying degrees of unsuccess (larger sizes get further into the data than smaller sizes, following a simple pattern). It appears that the only safe way to get the data is to use a single read buffer larger than all the data in /proc/devices. The following example demonstates the problem: # dd if=/proc/devices bs=1 Character devices: 1 mem 27+0 records in 27+0 records out This patch is a backport of the fix recently accepted to Linus's tree: commit 68eef3b4791572ecb70249c7fb145bb3742dd899 [PATCH] Simplify proc/devices and fix early termination regression It replaces the complex, state-machine algorithm introduced in 2.6.16 with a simple algorithm, modeled on the implementation of /proc/interrupts. [akpm@osdl.org: cleanups, simplifications] Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-01[PATCH] for_each_possible_cpuAndrew Morton
Backport for_each_possible_cpu() into 2.6.16. Fixes the alpha build, and any future occurrences. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-18[PATCH] i386/x86-64: Fix x87 information leak between processes (CVE-2006-1056)Andi Kleen
AMD K7/K8 CPUs only save/restore the FOP/FIP/FDP x87 registers in FXSAVE when an exception is pending. This means the value leak through context switches and allow processes to observe some x87 instruction state of other processes. This was actually documented by AMD, but nobody recognized it as being different from Intel before. The fix first adds an optimization: instead of unconditionally calling FNCLEX after each FXSAVE test if ES is pending and skip it when not needed. Then do a x87 load from a kernel variable to clear FOP/FIP/FDP. This means other processes always will only see a constant value defined by the kernel in their FP state. I took some pain to make sure to chose a variable that's already in L1 during context switch to make the overhead of this low. Also alternative() is used to patch away the new code on CPUs who don't need it. Patch for both i386/x86-64. The problem was discovered originally by Jan Beulich. Richard Brunner provided the basic code for the workarounds, with contribution from Jan. This is CVE-2006-1056 Cc: richard.brunner@amd.com Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17[PATCH] Fix buddy list race that could lead to page lru list corruptionsNick Piggin
Rohit found an obscure bug causing buddy list corruption. page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0) to determine whether or not a free page's buddy is itself free and in the buddy lists. Each of the conjuncts may be true at different times due to unrelated conditions, so the non-atomic page_is_buddy test may find each conjunct to be true even if they were not both true at the same time (ie. the page was not on the buddy lists). Signed-off-by: Martin Bligh <mbligh@google.com> Signed-off-by: Rohit Seth <rohitseth@google.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17[PATCH] m32r: Fix cpu_possible_map and cpu_present_map initialization for ↵Hirokazu Takata
SMP kernel This patch fixes a boot problem of the m32r SMP kernel 2.6.16-rc1-mm3 or later. In this patch, cpu_possible_map is statically initialized, and cpu_present_map is also copied from cpu_possible_map in smp_prepare_cpus(), because the m32r architecture has not supported CPU hotplug yet. Signed-off-by: Hayato Fujiwara <fujiwara.hayato@renesas.com> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17[PATCH] m32r: security fix of {get, put}_user macrosHirokazu Takata
Update {get,put}_user macros for m32r kernel. - Modify get_user to use __get_user_asm macro, instead of __get_user_x macro. - Remove arch/m32r/lib/{get,put}user.S. - Some cosmetic updates. I would like to thank NIIBE Yutaka for his reporting about the m32r kernel's security problem in {get,put}_user macros. There were no address checking for user space access in {get,put}_user macros. ;-) Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Cc: NIIBE Yutaka <gniibe@fsij.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17[PATCH] NETFILTER: Fix fragmentation issues with bridge netfilterPatrick McHardy
[NETFILTER]: Fix fragmentation issues with bridge netfilter The conntrack code doesn't do re-fragmentation of defragmented packets anymore but relies on fragmentation in the IP layer. Purely bridged packets don't pass through the IP layer, so the bridge netfilter code needs to take care of fragmentation itself. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-07[PATCH] kdump proc vmcore size oveflow fixVivek Goyal
A couple of /proc/vmcore data structures overflow with 32bit systems having memory more than 4G. This patch fixes those. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-07[PATCH] fbcon: Fix big-endian bogosity in slow_imageblit()Antonino A. Daplas
The monochrome->color expansion routine that handles bitmaps which have (widths % 8) != 0 (slow_imageblit) produces corrupt characters in big-endian. This is caused by a bogus bit test in slow_imageblit(). Fix. Signed-off-by: Antonino Daplas <adaplas@pol.net> Acked-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-07[PATCH] powerpc: make ISA floppies work againStephen Rothwell
We used to assume that a DMA mapping request with a NULL dev was for ISA DMA. This assumption was broken at some point. Now we explicitly pass the detected ISA PCI device in the floppy setup. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-27[PATCH] DM: Fix bug: BIO_RW_BARRIER requests to md/raid1 hang.Neil Brown
Both R1BIO_Barrier and R1BIO_Returned are 4 !!!! This means that barrier requests don't get returned (i.e. b_endio called) because it looks like they already have been. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-27[PATCH] rtc.h broke strace(1) buildsJoe Korty
Git patch 52dfa9a64cfb3dd01fa1ee1150d589481e54e28e [PATCH] move rtc_interrupt() prototype to rtc.h broke strace(1) builds. The below moves the kernel-only additions lower, under the already provided #ifdef __KERNEL__ statement. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-27[PATCH] get_cpu_sysdev() signedness fixAndrew Morton
Doing (int < NR_CPUS) doesn't dtrt if it's negative.. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-19Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel. [MIPS] Sibyte: Fix race in sb1250_gettimeoffset(). [MIPS] Sibyte: Fix interrupt timer off by one bug. [MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width. [MIPS] Protect more of timer_interrupt() by xtime_lock. [MIPS] Work around bad code generation for <asm/io.h>. [MIPS] Simple patch to power off DBAU1200 [MIPS] Fix DBAu1550 software power off. [MIPS] local_r4k_flush_cache_page fix [MIPS] SB1: Fix interrupt disable hazard. [MIPS] Get rid of the IP22-specific code in arclib. Update MAINTAINERS entry for MIPS.
2006-03-19[TG3]: 40-bit DMA workaround part 2Michael Chan
The 40-bit DMA workaround recently implemented for 5714, 5715, and 5780 needs to be expanded because there may be other tg3 devices behind the EPB Express to PCIX bridge in the 5780 class device. For example, some 4-port card or mother board designs have 5704 behind the 5714. All devices behind the EPB require the 40-bit DMA workaround. Thanks to Chris Elmquist again for reporting the problem and testing the patch. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-19[AX.25]: Fix potencial memory hole.Ralf Baechle DL5RB
If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3 (or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall through the switch statement without calling ax25_send_iframe() or any other function that would eventually free skbn thus leaking the packet. Fix by restricting the sysctl inferface to allow only actually supported AX.25 dialects. The system administration mistake needed for this to happen is rather unlikely, so this is an uncritical hole. Coverity #651. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-18[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().Ralf Baechle
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>: sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining value, however once this counter reaches 0 and the interrupt is raised, it immediately resets and begins to count down again. If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after the timer has reset but prior to cpu 0 processing the interrupt and taking write_seqlock() in timer_interrupt() it will return a full value (or close to it) causing time to jump backwards 1ms. Once cpu 0 handles the interrupt and timer_interrupt() gets far enough along it will jump forward 1ms. Fix this problem by implementing mips_hpt_*() on sb1250 using a spare timer unrelated to the existing periodic interrupt timers. It runs at 1Mhz with a full 23bit counter. This eliminated the custom do_gettimeoffset() for sb1250 and allowed use of the generic fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18[MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.Ralf Baechle
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>: Field width should be 23 bits not 20 bits. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18[MIPS] Work around bad code generation for <asm/io.h>.Ralf Baechle
If a call to set_io_port_base() was being followed by usage of mips_io_port_base in the same function gcc was possibly using the old value due to some clever abuse of const. Adding a barrier will keep the optimization and result in correct code with latest gcc. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18[MIPS] local_r4k_flush_cache_page fixAtsushi Nemoto
If dcache_size != icache_size or dcache_size != scache_size, or set-associative cache, icache/scache does not flushed properly. Make blast_?cache_page_indexed() masks its index value correctly. Also, use physical address for physically indexed pcache/scache. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18[MIPS] SB1: Fix interrupt disable hazard.Ralf Baechle
The SB1 core has a three cycle interrupt disable hazard but we were wrongly treating it as fully interlocked. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-17[NET]: Fix race condition in sk_wait_event().Alexey Kuznetsov
It is broken, the condition is checked out of socket lock. It is wonderful the bug survived for so long time. [ This fixes bugzilla #6233: race condition in tcp_sendmsg when connection became established ] Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-mergeLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge: powerpc: update defconfigs [PATCH] powerpc: properly configure DDR/P5IOC children devs [PATCH] powerpc: remove duplicate EXPORT_SYMBOLS [PATCH] powerpc: RTC memory corruption [PATCH] powerpc: enable NAP only on cpus who support it to avoid memory corruption [PATCH] powerpc: Clarify wording for CRASH_DUMP Kconfig option [PATCH] powerpc/64: enable CONFIG_BLK_DEV_SL82C105 [PATCH] powerpc: correct cacheflush loop in zImage powerpc: Fix problem with time going backwards powerpc: Disallow lparcfg being a module
2006-03-16[PATCH] powerpc: properly configure DDR/P5IOC children devsJohn Rose
The dynamic add path for PCI Host Bridges can fail to configure children adapters under P5IOC controllers. It fails to properly fixup bus/device resources, and it fails to properly enable EEH. Both of these steps need to occur before any children devices are enabled in pci_bus_add_devices(). Signed-off-by: John Rose <johnrose@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-15[ARM] 3364/1: [cleanup] warning fix - definitions for enable_hlt and disable_hltBen Dooks
Patch from Ben Dooks The enable_hlt and disable_hlt should be declared in include/asm/setup.h. This fixes sparse errors from arch/arm/kernel/process.c Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-12Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* master.kernel.org:/home/rmk/linux-2.6-arm: [ARM] iwmmxt thread state alignment [ARM] 3350/1: Enable 1-wire on ARM [ARM] 3356/1: Workaround for the ARM1136 I-cache invalidation problem [ARM] 3355/1: NSLU2: remove propmt depends [ARM] 3354/1: NAS100d: fix power led handling [ARM] Fix muldi3.S
2006-03-12[ARM] iwmmxt thread state alignmentRussell King
This patch removes the reliance of iwmmxt on hand coded alignments. Since thread_info is always 8K aligned, specifying that fpstate is 8-byte aligned achieves the same effect without needing to resort to hand coded alignments. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-11[PATCH] remove __put_task_struct_cb export againChristoph Hellwig
The patch '[PATCH] RCU signal handling' [1] added an export for __put_task_struct_cb, a put_task_struct helper newly introduced in that patch. But the put_task_struct couldn't be used modular previously as __put_task_struct wasn't exported. There are not callers of it in modular code, and it shouldn't be exported because we don't want drivers to hold references to task_structs. This patch removes the export and folds __put_task_struct into __put_task_struct_cb as there's no other caller. [1] http://www2.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e56d090310d7625ecb43a1eeebd479f04affb48b Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-11[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations insideKirill Korotaev
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in ext3_symlink(). Such allocation may re-enter ext3 code from try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal handle in task_struct and, hence, is not reentrable. This bug led to "Assertion failure in journal_dirty_metadata()" messages. http://bugzilla.openvz.org/show_bug.cgi?id=115 Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.Christoph Lameter
The cache reaper currently tries to free all alien caches and all remote per cpu pages in each pass of cache_reap. For a machines with large number of nodes (such as Altix) this may lead to sporadic delays of around ~10ms. Interrupts are disabled while reclaiming creating unacceptable delays. This patch changes that behavior by adding a per cpu reap_node variable. Instead of attempting to free all caches, we free only one alien cache and the per cpu pages from one remote node. That reduces the time spend in cache_reap. However, doing so will lengthen the time it takes to completely drain all remote per cpu pagesets and all alien caches. The time needed will grow with the number of nodes in the system. All caches are drained when they overflow their respective capacity. So the drawback here is only that a bit of memory may be wasted for awhile longer. Details: 1. Rename drain_remote_pages to drain_node_pages to allow the specification of the node to drain of pcp pages. 2. Add additional functions init_reap_node, next_reap_node for NUMA that manage a per cpu reap_node counter. 3. Add a reap_alien function that reaps only from the current reap_node. For us this seems to be a critical issue. Holdoffs of an average of ~7ms cause some HPC benchmarks to slow down significantly. F.e. NAS parallel slows down dramatically. NAS parallel has a 12-16 seconds runtime w/o rotor compared to 5.8 secs with the rotor patches. It gets down to 5.05 secs with the additional interrupt holdoff reductions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] m68k: fix cmpxchg compile errors if CONFIG_RMW_INSNS=nRoman Zippel
We require that all archs implement atomic_cmpxchg(), for the generic version of atomic_add_unless(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] mtd: 64 bit fixesAtsushi Nemoto
Fix some bugs in mtd/jffs2 on 64bit platform. The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h. And some variables in jffs2 are declared as uint32_t but used to hold size_t values. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[MIPS] Undefine scr_writew and scr_readw in <asm/vga.h>.Ralf Baechle
This is gluing the build of cirrusfb but really the mess that would need cleaning and fixing is <video/vga.h> and <linux/vt_buffer.h> ... Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-mergeLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge: powerpc: Fix various syscall/signal/swapcontext bugs [PATCH] powerpc: incorrect rmo_top handling in prom_init [PATCH] powerpc: Fix incorrect pud_ERROR() message [PATCH] powerpc: Expose SMT and L1 icache snoop userland features [PATCH] powerpc: Fix windfarm_pm112 not starting all control loops [PATCH] powerpc: Fix old g5 issues with windfarm powerpc32: Fix timebase synchronization on 32-bit powermacs powerpc: Turn off verbose debug output in powermac platform functions powerpc: Fix might-sleep warning in program check exception handler
2006-03-08[PATCH] i386: port ATI timer fix from x86_64 to i386 IIAndi Kleen
ATI chipsets tend to generate double timer interrupts for the local APIC timer when both the 8254 and the IO-APIC timer pins are enabled. This is because they route it to both and the result is anded together and the CPU ends up processing it twice. This patch changes check_timer to disable the 8254 routing for interrupt 0. I think it would be safe on all chipsets actually (i tested it on a couple and it worked everywhere) and Windows seems to do it in a similar way, but to be conservative this patch only enables this mode on ATI (and adds options to enable/disable too) Ported over from a similar x86-64 change. I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but tweaked it a bit to work even without ACPI. Inspired by a patch from Chuck Ebbert, but redone. Cc: Chuck Ebbert <76306.1226@compuserve.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] fix kexec asmMichael Matz
While testing kexec and kdump we hit problems where the new kernel would freeze or instantly reboot. The easiest way to trigger it was to kexec a kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7 instead would work fine. The patch fixes a few problems with the kexec inline asm. Signed-off-by: Chris Mason <mason@suse.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] fix file countingDipankar Sarma
I have benchmarked this on an x86_64 NUMA system and see no significant performance difference on kernbench. Tested on both x86_64 and powerpc. The way we do file struct accounting is not very suitable for batched freeing. For scalability reasons, file accounting was constructor/destructor based. This meant that nr_files was decremented only when the object was removed from the slab cache. This is susceptible to slab fragmentation. With RCU based file structure, consequent batched freeing and a test program like Serge's, we just speed this up and end up with a very fragmented slab - llm22:~ # cat /proc/sys/fs/file-nr 587730 0 758844 At the same time, I see only a 2000+ objects in filp cache. The following patch I fixes this problem. This patch changes the file counting by removing the filp_count_lock. Instead we use a separate percpu counter, nr_files, for now and all accesses to it are through get_nr_files() api. In the sysctl handler for nr_files, we populate files_stat.nr_files before returning to user. Counting files as an when they are created and destroyed (as opposed to inside slab) allows us to correctly count open files with RCU. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] rcu batch tuningDipankar Sarma
This patch adds new tunables for RCU queue and finished batches. There are two types of controls - number of completed RCU updates invoked in a batch (blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark, qlowmark). By default, the per-cpu batch limit is set to a small value. If the input RCU rate exceeds the high watermark, we do two things - force quiescent state on all cpus and set the batch limit of the CPU to INTMAX. Setting batch limit to INTMAX forces all finished RCUs to be processed in one shot. If we have more than INTMAX RCUs queued up, then we have bigger problems anyway. Once the incoming queued RCUs fall below the low watermark, the batch limit is set to the default. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] percpu_counter_sum()Andrew Morton
Implement percpu_counter_sum(). This is a more accurate but slower version of percpu_counter_read_positive(). We need this for Alex's speedup-ext3_statfs patch and for the nr_file accounting fix. Otherwise these things would be too inaccurate on large CPU counts. Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Alex Tomas <alex@clusterfs.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] __get_unaligned() gcc-4 fixAtsushi Nemoto
If the 'ptr' is a const, this code cause "assignment of read-only variable" error on gcc 4.x. Use __u64 instead of __typeof__(*(ptr)) for temporary variable to get rid of errors on gcc 4.x. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] powerpc: restore eeh_add_device_late() prototype stubMark Fasheh
We fixed this: arch/powerpc/platforms/pseries/eeh.c: In function `eeh_add_device_tree_late': arch/powerpc/platforms/pseries/eeh.c:901: warning: implicit declaration of function `eeh_add_device_late' arch/powerpc/platforms/pseries/eeh.c: At top level: arch/powerpc/platforms/pseries/eeh.c:918: error: conflicting types for 'eeh_add_device_late' arch/powerpc/platforms/pseries/eeh.c:901: error: previous implicit declaration of 'eeh_add_device_late' was here make[2]: *** [arch/powerpc/platforms/pseries/eeh.o] Error 1 But we forgot the !CONFIG_EEH stub. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>