summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2009-02-06PCI: irq and pci_ids patch for Intel Tigerpoint DeviceIDsSeth Heasley
commit 57064d213d2e44654d4f13c66df135b5e7389a26 upstream. This patch adds the Intel Tigerpoint LPC Controller DeviceIDs. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-06kmalloc: return NULL instead of link failureJeff Mahoney
commit 1cf3eb2ff6b0844c678f2f48d0053b9d12b7da67 upstream. The SLAB kmalloc with a constant value isn't consistent with the other implementations because it bails out with __you_cannot_kmalloc_that_much rather than returning NULL and properly allowing the caller to fall back to vmalloc or take other action. This doesn't happen with a non-constant value or with SLOB or SLUB. Starting with 2.6.28, I've been seeing build failures on s390x. This is due to init_section_page_cgroup trying to allocate 2.5MB when the max size for a kmalloc on s390x is 2MB. It's failing because the value is constant. The workarounds at the call size are ugly and the caller shouldn't have to change behavior depending on what the backend of the API is. So, this patch eliminates the link failure and returns NULL like the other implementations. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02include/linux: Add bsg.h to the Kernel exported headersBoaz Harrosh
commit a229fc61ef0ee3c30fd193beee0eeb87410227f1 upstream. bsg.h in current form is perfectly suitable for user-mode consumption. It is needed together with scsi/sg.h for applications that want to interface with the bsg driver. Currently the few projects that use it would copy it over into the projects. But that is not acceptable for projects that need to provide source and devel packages for distros. This should also be submitted to stable 2.6.28 and 2.6.27 since bsg had a stable API since these Kernels and distro users will need the header for these kernels a swell Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02epoll: drop max_user_instances and rely only on max_user_watchesDavide Libenzi
commit 9df04e1f25effde823a600e755b51475d438f56b upstream. Linus suggested to put limits where the money is, and max_user_watches already does that w/out the need of max_user_instances. That has the advantage to mitigate the potential DoS while allowing pretty generous default behavior. Allowing top 4% of low memory (per user) to be allocated in epoll watches, we have: LOMEM MAX_WATCHES (per user) 512MB ~178000 1GB ~356000 2GB ~712000 A box with 512MB of lomem, will meet some challenge in hitting 180K watches, socket buffers math teaches us. No more max_user_instances limits then. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Willy Tarreau <w@1wt.eu> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Bron Gondwana <brong@fastmail.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02serial_8250: support for Sealevel Systems Model 7803 COMM+8Flavio Leitner
commit e65f0f8271b1b0452334e5da37fd35413a000de4 upstream. Add support for Sealevel Systems Model 7803 COMM+8 Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02libata: pata_via: support VX855, future chips whose IDE controller use 0x0571JosephChan@via.com.tw
commit e4d866cdea24543ee16ce6d07d80c513e86ba983 upstream. It supports VX855 and future chips whose IDE controller uses PCI ID 0x0571. Signed-off-by: Joseph Chan <josephchan@via.com.tw> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02it821x: Add ultra_mask quirk for Vortex86SXBrandon Philips
commit b94b898f3107046b5c97c556e23529283ea5eadd upstream. On Vortex86SX with IDE controller revision 0x11 ultra DMA must be disabled. This patch was tested by DMP and seems to work. It is a cleaned up version of their older Kernel patch: http://www.dmp.com.tw/tech/vortex86sx/patch-2.6.24-DMP.gz Tested-by: Shawn Lin <shawn@dmp.com.tw> Signed-off-by: Brandon Philips <bphilips@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02klist.c: bit 0 in pointer can't be used as flagJesper Nilsson
commit c0e69a5bbc6fc74184aa043aadb9a53bc58f953b upstream. The commit a1ed5b0cffe4b16a93a6a3390e8cee0fbef94f86 (klist: don't iterate over deleted entries) introduces use of the low bit in a pointer to indicate if the knode is dead or not, assuming that this bit is always free. This is not true for all architectures, CRIS for example may align data on byte borders. The result is a bunch of warnings on bootup, devices not being added correctly etc, reported by Hinko Kocevar <hinko.kocevar@cetrtapot.si>: ------------[ cut here ]------------ WARNING: at lib/klist.c:62 () Modules linked in: Stack from c1fe1cf0: c01cc7f4 c1fe1d11 c000eb4e c000e4de 00000000 00000000 c1f4f78f c1f50c2d c01d008c c1fdd1a0 c1fdd1a0 c1fe1d38 c0192954 c1fe0000 00000000 c1fe1dc0 00000002 7fffffff c1fe1da8 c0192d50 c1fe1dc0 00000002 7fffffff c1ff9fcc Call Trace: [<c000eb4e>] [<c000e4de>] [<c0192954>] [<c0192d50>] [<c001d49e>] [<c000b688>] [<c0192a3c>] [<c000b63e>] [<c000b63e>] [<c001a542>] [<c00b55b0>] [<c00411c0>] [<c00b559c>] [<c01918e6>] [<c0191988>] [<c01919d0>] [<c00cd9c8>] [<c00cdd6a>] [<c0034178>] [<c000409a>] [<c0015576>] [<c0029130>] [<c0029078>] [<c0029170>] [<c0012336>] [<c00b4076>] [<c00b4770>] [<c006d6e4>] [<c006d974>] [<c006dca0>] [<c0028d6c>] [<c0028e12>] [<c0006424>] <4>---[ end trace 4eaa2a86a8e2da22 ]--- ------------[ cut here ]------------ Repeat ad nauseam. Wed, Jan 14, 2009 at 12:11:32AM +0100, Bastien ROUCARIES wrote: > Perhaps using a pointerhackalign trick on this structure where > #define pointerhackalign(x) __attribute__ ((aligned (x))) > and declare > struct klist_node { > ... > } pointerhackalign(2); > > Because __attribute__ ((aligned (x))) could only increase alignment > it will safe to do that and serve as documentation purpose :) That works, but we need to do it not for the struct klist_node, but for the struct we insert into the void * in klist_node, which is struct klist. Reported-by: Hinko Kocevar <hinko.kocevar@cetrtapot.si Cc: Bastien ROUCARIES <roucaries.bastien@gmail.com> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24fs: sys_sync fixNick Piggin
commit 856bf4d717feb8c55d4e2f817b71ebb70cfbc67b upstream. s_syncing livelock avoidance was breaking data integrity guarantee of sys_sync, by allowing sys_sync to skip writing or waiting for superblocks if there is a concurrent sys_sync happening. This livelock avoidance is much less important now that we don't have the get_super_to_sync() call after every sb that we sync. This was replaced by __put_super_and_need_restart. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24fs: remove WB_SYNC_HOLDNick Piggin
commit 4f5a99d64c17470a784a6c68064207d82e3e74a5 upstream. Remove WB_SYNC_HOLD. The primary motiviation is the design of my anti-starvation code for fsync. It requires taking an inode lock over the sync operation, so we could run into lock ordering problems with multiple inodes. It is possible to take a single global lock to solve the ordering problem, but then that would prevent a future nice implementation of "sync multiple inodes" based on lock order via inode address. Seems like a backward step to remove this, but actually it is busted anyway: we can't use the inode lists for data integrity wait: an inode can be taken off the dirty lists but still be under writeback. In order to satisfy data integrity semantics, we should wait for it to finish writeback, but if we only search the dirty lists, we'll miss it. It would be possible to have a "writeback" list, for sys_sync, I suppose. But why complicate things by prematurely optimise? For unmounting, we could avoid the "livelock avoidance" code, which would be easier, but again premature IMO. Fixing the existing data integrity problem will come next. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24usb-storage: add last-sector hacksAlan Stern
commit 25ff1c316f6a763f1eefe7f8984b2d8c03888432 upstream. This patch (as1189c) adds some hacks to usb-storage for dealing with the growing problems involving bad capacity values and last-sector accesses: A new flag, US_FL_CAPACITY_OK, is created to indicate that the device is known to report its capacity correctly. An unusual_devs entry for Linux's own File-backed Storage Gadget is added with this flag set, since g_file_storage always reports the correct capacity and since the capacity need not be even (it is determined by the size of the backing file). An entry in unusual_devs.h which has only the CAPACITY_OK flag set shouldn't prejudice libusual, since the device will work perfectly well with either usb-storage or ub. So a new macro, COMPLIANT_DEV, is added to let libusual know about these entries. When a last-sector access fails three times in a row and neither the FIX_CAPACITY nor the CAPACITY_OK flag is set, we assume the last-sector bug is present. We replace the existing status and sense data with values that will cause the SCSI core to fail the access immediately rather than retry indefinitely. This should fix the difficulties people have been having with Nokia phones. This version of the patch differs from the version accepted into the mainline only in that it does not trigger a WARN() when an odd-numbered last-sector access succeeds. In a stable kernel series we don't want to go around spamming users' logs and consoles for no good reason. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18mm: fix assertionNick Piggin
commit 18e6959c385f3edf3991fa6662a53dac4eb10d5b upstream. This assertion is incorrect for lockless pagecache. By definition if we have an unpinned page that we are trying to take a speculative reference to, it may become the tail of a compound page at any time (if it is freed, then reallocated as a compound page). It was still a valid assertion for the vmscan.c LRU isolation case, but it doesn't seem incredibly helpful... if somebody wants it, they can put it back directly where it applies in the vmscan code. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18mm lockless pagecache barrier fixNick Piggin
commit e8c82c2e23e3527e0c9dc195e432c16784d270fa upstream. An XFS workload showed up a bug in the lockless pagecache patch. Basically it would go into an "infinite" loop, although it would sometimes be able to break out of the loop! The reason is a missing compiler barrier in the "increment reference count unless it was zero" case of the lockless pagecache protocol in the gang lookup functions. This would cause the compiler to use a cached value of struct page pointer to retry the operation with, rather than reload it. So the page might have been removed from pagecache and freed (refcount==0) but the lookup would not correctly notice the page is no longer in pagecache, and keep attempting to increment the refcount and failing, until the page gets reallocated for something else. This isn't a data corruption because the condition will be detected if the page has been reallocated. However it can result in a lockup. Linus points out that ACCESS_ONCE is also required in that pointer load, even if it's absence is not causing a bug on our particular build. The most general way to solve this is just to put an rcu_dereference in radix_tree_deref_slot. Assembly of find_get_pages, before: .L220: movq (%rbx), %rax #* ivtmp.1162, tmp82 movq (%rax), %rdi #, prephitmp.1149 .L218: testb $1, %dil #, prephitmp.1149 jne .L217 #, testq %rdi, %rdi # prephitmp.1149 je .L203 #, cmpq $-1, %rdi #, prephitmp.1149 je .L217 #, movl 8(%rdi), %esi # <variable>._count.counter, c testl %esi, %esi # c je .L218 #, after: .L212: movq (%rbx), %rax #* ivtmp.1109, tmp81 movq (%rax), %rdi #, ret testb $1, %dil #, ret jne .L211 #, testq %rdi, %rdi # ret je .L197 #, cmpq $-1, %rdi #, ret je .L211 #, movl 8(%rdi), %esi # <variable>._count.counter, c testl %esi, %esi # c je .L212 #, (notice the obvious infinite loop in the first example, if page->count remains 0) Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 33Heiko Carstens
commit 2b66421995d2e93c9d1a0111acf2581f8529c6e5 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 32Heiko Carstens
commit d4e82042c4cfa87a7d51710b71f568fe80132551 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18powerpc: Enable syscall wrappers for 64-bitBenjamin Herrenschmidt
commit ee6a093222549ac0c72cfd296c69fa5e7d6daa34 upstream. This enables the use of syscall wrappers to do proper sign extension for 64-bit programs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrapper infrastructureHeiko Carstens
commit 1a94bc34768e463a93cb3751819709ab0ea80a01 upstream. From: Martin Schwidefsky <schwidefsky@de.ibm.com> By selecting HAVE_SYSCALL_WRAPPERS architectures can activate system call wrappers in order to sign extend system call arguments. All architectures where the ABI defines that the caller of a function has to perform sign extension probably need this. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Rename old_readdir to sys_old_readdirHeiko Carstens
commit e55380edf68796d75bf41391a781c68ee678587d upstream. This way it matches the generic system call name convention. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Convert all system calls to return a longHeiko Carstens
commit 2ed7c03ec17779afb4fcfa3b8c61df61bd4879ba upstream. Convert all system calls to return a long. This should be a NOP since all converted types should have the same size anyway. With the exception of sys_exit_group which returned void. But that doesn't matter since the system call doesn't return. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Move compat system call declarations to compat header fileHeiko Carstens
commit 4c696ba7982501d43dea11dbbaabd2aa8a19cc42 upstream. Move declarations to correct header file. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18inotify: fix type errors in interfacesMichael Kerrisk
commit 4ae8978cf92a96257cd8998a49e781be83571d64 upstream. The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes. For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is currently 'u32', it should be '__s32' . That is Robert's suggestion, and is consistent with the other declarations of watch descriptors in the kernel source, in particular, the inotify_event structure in include/linux/inotify.h: struct inotify_event { __s32 wd; /* watch descriptor */ __u32 mask; /* watch mask */ __u32 cookie; /* cookie to synchronize two events */ __u32 len; /* length (including nulls) of name */ char name[0]; /* stub for possible name */ }; The patch makes the changes needed for inotify_rm_watch(). Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Robert Love <rlove@google.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18sched_clock: prevent scd->clock from moving backwards, take #2Thomas Gleixner
commit 1c5745aa380efb6417b5681104b007c8612fb496 upstream. Redo: 5b7dba4: sched_clock: prevent scd->clock from moving backwards which had to be reverted due to s2ram hangs: ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards" ... this time with resume restoring GTOD later in the sequence taken into account as well. The "timekeeping_suspended" flag is not very nice but we cannot call into GTOD before it has been properly resumed and the scheduler will run very early in the resume sequence. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18fs: symlink write_begin allocation context fixNick Piggin
commit 54566b2c1594c2326a645a3551f9d989f7ba3c5e upstream. With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-19ACPI: don't cond_resched() when irqs_disabled()Wu Fengguang
The ACPI interpreter usually runs with irqs enabled. However, during suspend/resume it runs with irqs disabled to evaluate _GTS/_BFS, as well as by irqrouter_resume() which evaluates _CRS, _PRS, _SRS. http://bugzilla.kernel.org/show_bug.cgi?id=12252 Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-19ACPI: fix 2.6.28 acpi.debug_level regressionBjorn Helgaas
acpi_early_init() was changed to over-write the cmdline param, making it really inconvenient to set debug flags at boot-time. Also, This sets the default level to "info", which is what all the ACPI drivers use. So to enable messages from drivers, you only have to supply the "layer" (a.k.a. "component"). For non-"info" ACPI core and ACPI interpreter messages, you have to supply both level and layer masks, as before. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: bnx2: Fix bug in bnx2_free_rx_mem(). irda: Add irda_skb_cb qdisc related padding jme: Fixed a typo net: kernel BUG at drivers/net/phy/mdio_bus.c:165! drivers/net: starfire: Fix napi ->poll() weight handling tlan: Fix pci memory unmapping enc28j60: use netif_rx_ni() to deliver RX packets tlan: Fix small (< 64 bytes) datagram transmissions netfilter: ctnetlink: fix missing CTA_NAT_SEQ_UNSPEC
2008-12-17irda: Add irda_skb_cb qdisc related paddingSamuel Ortiz
We need to pad irda_skb_cb in order to keep it safe accross dev_queue_xmit() calls. This is some ugly and temporary hack triggered by recent qisc code changes. Even though it fixes bugzilla.kernel.org bug #11795, it will be replaced by a proper fix before 2.6.29 is released. Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-17USB: fix comment about endianness of descriptorsPhil Endecott
This patch fixes a comment and clarifies the documentation about the endianness of descriptors. The current policy is that descriptors will be little-endian at the API even on big-endian systems; however the /proc/bus/usb API predates this policy and presents descriptors with some multibyte fields byte-swapped. Signed-off-by: Phil Endecott <usb_endian_patch@chezphil.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-16netfilter: ctnetlink: fix missing CTA_NAT_SEQ_UNSPECPablo Neira Ayuso
This patch fixes an inconsistency in nfnetlink_conntrack.h that I introduced myself. The problem is that CTA_NAT_SEQ_UNSPEC is missing from enum ctattr_natseq. This inconsistency may lead to problems in the message parsing in userspace (if the message contains the CTA_NAT_SEQ_* attributes, of course). This patch breaks backward compatibility, however, the only known client of this code is libnetfilter_conntrack which indeed crashes because it assumes the existence of CTA_NAT_SEQ_UNSPEC to do the parsing. The CTA_NAT_SEQ_* attributes were introduced in 2.6.25. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: Phonet: keep TX queue disabled when the device is off SCHED: netem: Correct documentation comment in code. netfilter: update rwlock initialization for nat_table netlabel: Compiler warning and NULL pointer dereference fix e1000e: fix double release of mutex IA64: HP_SIMETH needs to depend upon NET netpoll: fix race on poll_list resulting in garbage entry ipv6: silence log messages for locally generated multicast sungem: improve ethtool output with internal pcs and serdes tcp: tcp_vegas cong avoid fix sungem: Make PCS PHY support partially work again.
2008-12-15Define smp_call_function_many for UPRusty Russell
Otherwise those using it in transition patches (eg. kvm) can't compile with CONFIG_SMP=n: arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function 'make_all_cpus_request': arch/x86/kvm/../../../virt/kvm/kvm_main.c:380: error: implicit declaration of function 'smp_call_function_many' Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10Revert "radeonfb: accelerate imageblit and other improvements"Linus Torvalds
This reverts commit b1ee26bab14886350ba12a5c10cbc0696ac679bf, along with the "fixes" for it that all just caused problems: - c4c6fa9891f3d1bcaae4f39fb751d5302965b566 "radeonfb: fix problem with color expansion & alignment" - f3179748a157c21d44d929fd3779421ebfbeaa93 "radeonfb: Disable new color expand acceleration unless explicitely enabled" because even when disabled, it breaks for people. See http://bugzilla.kernel.org/show_bug.cgi?id=12191 for the latest example. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Krzysztof Halasa <khc@pm.waw.pl> Cc: James Cloos <cloos@jhcloos.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: Jean-Luc Coulon <jean.luc.coulon@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10MN10300: Fix __put_user_asm8()Akira Takeuchi
Fix __put_user_asm8() by jumping to the end label (3:) from the exception handler, rather than jumping back to retry the second store instruction (label 2:). Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10KSYM_SYMBOL_LEN fixesHugh Dickins
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my 966c8c12dc9e77f931e2281ba25d2f0244b06949 sprint_symbol(): use less stack exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided. The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before. Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them. [akpm@linux-foundation.org: ftrace.h needs module.h] Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc Miles Lane <miles.lane@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10atomic: fix a typo in atomic_long_xchg()Eric Dumazet
atomic_long_xchg() is not correctly defined for 32bit arches. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10revert "percpu_counter: new function percpu_counter_sum_and_set"Andrew Morton
Revert commit e8ced39d5e8911c662d4d69a342b9d053eaaac4e Author: Mingming Cao <cmm@us.ibm.com> Date: Fri Jul 11 19:27:31 2008 -0400 percpu_counter: new function percpu_counter_sum_and_set As described in revert "percpu counter: clean up percpu_counter_sum_and_set()" the new percpu_counter_sum_and_set() is racy against updates to the cpu-local accumulators on other CPUs. Revert that change. This means that ext4 will be slow again. But correct. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10revert "percpu counter: clean up percpu_counter_sum_and_set()"Andrew Morton
Revert commit 1f7c14c62ce63805f9574664a6c6de3633d4a354 Author: Mingming Cao <cmm@us.ibm.com> Date: Thu Oct 9 12:50:59 2008 -0400 percpu counter: clean up percpu_counter_sum_and_set() Before this patch we had the following: percpu_counter_sum(): return the percpu_counter's value percpu_counter_sum_and_set(): return the percpu_counter's value, copying that value into the central value and zeroing the per-cpu counters before returning. After this patch, percpu_counter_sum_and_set() has gone, and percpu_counter_sum() gets the old percpu_counter_sum_and_set() functionality. Problem is, as Eric points out, the old percpu_counter_sum_and_set() functionality was racy and wrong. It zeroes out counters on "other" cpus, without holding any locks which will prevent races agaist updates from those other CPUS. This patch reverts 1f7c14c62ce63805f9574664a6c6de3633d4a354. This means that percpu_counter_sum_and_set() still has the race, but percpu_counter_sum() does not. Note that this is not a simple revert - ext4 has since started using percpu_counter_sum() for its dirty_blocks counter as well. Note that this revert patch changes percpu_counter_sum() semantics. Before the patch, a call to percpu_counter_sum() will bring the counter's central counter mostly up-to-date, so a following percpu_counter_read() will return a close value. After this patch, a call to percpu_counter_sum() will leave the counter's central accumulator unaltered, so a subsequent call to percpu_counter_read() can now return a significantly inaccurate result. If there is any code in the tree which was introduced after e8ced39d5e8911c662d4d69a342b9d053eaaac4e was merged, and which depends upon the new percpu_counter_sum() semantics, that code will break. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-09netpoll: fix race on poll_list resulting in garbage entryNeil Horman
A few months back a race was discused between the netpoll napi service path, and the fast path through net_rx_action: http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470 A patch was submitted for that bug, but I think we missed a case. Consider the following scenario: INITIAL STATE CPU0 has one napi_struct A on its poll_list CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same napi_struct A that CPU0 has on its list CPU0 CPU1 net_rx_action poll_napi !list_empty (returns true) locks poll_lock for A poll_one_napi napi->poll netif_rx_complete __napi_complete (removes A from poll_list) list_entry(list->next) In the above scenario, net_rx_action assumes that the per-cpu poll_list is exclusive to that cpu. netpoll of course violates that, and because the netpoll path can dequeue from the poll list, its possible for CPU0 to detect a non-empty list at the top of the while loop in net_rx_action, but have it become empty by the time it calls list_entry. Since the poll_list isn't surrounded by any other structure, the returned data from that list_entry call in this situation is garbage, and any number of crashes can result based on what exactly that garbage is. Given that its not fasible for performance reasons to place exclusive locks arround each cpus poll list to provide that mutal exclusion, I think the best solution is modify the netpoll path in such a way that we continue to guarantee that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've implemented the patch below. It adds an additional bit to the state field in the napi_struct. When executing napi->poll from the netpoll_path, this bit will be set. When a driver calls netif_rx_complete, if that bit is set, it will not remove the napi_struct from the poll_list. That work will be saved for the next iteration of net_rx_action. I've tested this and it seems to work well. About the biggest drawback I can see to it is the fact that it might result in an extra loop through net_rx_action in the event that the device is actually contended for (i.e. the netpoll path actually preforms all the needed work no the device, and the call to net_rx_action winds up doing nothing, except removing the napi_struct from the poll_list. However I think this is probably a small price to pay, given that the alternative is a crash. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-09Merge branch 'audit.b59' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b59' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] fix broken timestamps in AVC generated by kernel threads [patch 1/1] audit: remove excess kernel-doc [PATCH] asm/generic: fix bug - kernel fails to build when enable some common audit code on Blackfin [PATCH] return records for fork() both to child and parent [PATCH] Audit: make audit=0 actually turn off audit
2008-12-09[PATCH] fix broken timestamps in AVC generated by kernel threadsAl Viro
Timestamp in audit_context is valid only if ->in_syscall is set. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09[PATCH] asm/generic: fix bug - kernel fails to build when enable some common ↵Mike Frysinger
audit code on Blackfin If you enable some common audit code, the kernel fails to build. In file included from lib/audit.c:17: include/asm-generic/audit_write.h:3: error: '__NR_swapon' undeclared here (not in a function) make[1]: *** [lib/audit.o] Error 1 make: *** [lib] Error 2 So do not use __NR_swapon if it isnt defined for a port. Signed-off-by: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09[PATCH] return records for fork() both to child and parentAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: tproxy: fixe a possible read from an invalid location in the socket match zd1211rw: use unaligned safe memcmp() in-place of compare_ether_addr() mac80211: use unaligned safe memcmp() in-place of compare_ether_addr() ipw2200: fix netif_*_queue() removal regression iwlwifi: clean key table in iwl_clear_stations_table function tcp: tcp_vegas ssthresh bug fix can: omit received RTR frames for single ID filter lists ATM: CVE-2008-5079: duplicate listen() on socket corrupts the vcc table netx-eth: initialize per device spinlock tcp: make urg+gso work for real this time enc28j60: Fix sporadic packet loss (corrected again) hysdn: fix writing outside the field on 64 bits b1isa: fix b1isa_exit() to really remove registered capi controllers can: Fix CAN_(EFF|RTR)_FLAG handling in can_filter Phonet: do not dump addresses from other namespaces netlabel: Fix a potential NULL pointer dereference bnx2: Add workaround to handle missed MSI. xfrm: Fix kernel panic when flush and dump SPD entries
2008-12-05Enforce a minimum SG_IO timeoutLinus Torvalds
There's no point in having too short SG_IO timeouts, since if the command does end up timing out, we'll end up through the reset sequence that is several seconds long in order to abort the command that timed out. As a result, shorter timeouts than a few seconds simply do not make sense, as the recovery would be longer than the timeout itself. Add a BLK_MIN_SG_TIMEOUT to match the existign BLK_DEFAULT_SG_TIMEOUT. Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-04[PATCH 2/2] documnt FMODE_ constantsChristoph Hellwig
Make sure all FMODE_ constants are documents, and ensure a coherent style for the already existing comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04[PATCH 1/2] kill FMODE_NDELAY_NOWChristoph Hellwig
Update FMODE_NDELAY before each ioctl call so that we can kill the magic FMODE_NDELAY_NOW. It would be even better to do this directly in setfl(), but for that we'd need to have FMODE_NDELAY for all files, not just block special files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-03can: Fix CAN_(EFF|RTR)_FLAG handling in can_filterOliver Hartkopp
Due to a wrong safety check in af_can.c it was not possible to filter for SFF frames with a specific CAN identifier without getting the same selected CAN identifier from a received EFF frame also. This fix has a minimum (but user visible) impact on the CAN filter API and therefore the CAN version is set to a new date. Indeed the 'old' API is still working as-is. But when now setting CAN_(EFF|RTR)_FLAG in can_filter.can_mask you might get less traffic than before - but still the stuff that you expected to get for your defined filter ... Thanks to Kurt Van Dijck for pointing at this issue and for the review. Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03block: fix setting of max_segment_size and seg_boundary maskMilan Broz
Fix setting of max_segment_size and seg_boundary mask for stacked md/dm devices. When stacking devices (LVM over MD over SCSI) some of the request queue parameters are not set up correctly in some cases by default, namely max_segment_size and and seg_boundary mask. If you create MD device over SCSI, these attributes are zeroed. Problem become when there is over this mapping next device-mapper mapping - queue attributes are set in DM this way: request_queue max_segment_size seg_boundary_mask SCSI 65536 0xffffffff MD RAID1 0 0 LVM 65536 -1 (64bit) Unfortunately bio_add_page (resp. bio_phys_segments) calculates number of physical segments according to these parameters. During the generic_make_request() is segment cout recalculated and can increase bio->bi_phys_segments count over the allowed limit. (After bio_clone() in stack operation.) Thi is specially problem in CCISS driver, where it produce OOPS here BUG_ON(creq->nr_phys_segments > MAXSGENTRIES); (MAXSEGENTRIES is 31 by default.) Sometimes even this command is enough to cause oops: dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10 This command generates bios with 250 sectors, allocated in 32 4k-pages (last page uses only 1024 bytes). For LVM layer, it allocates bio with 31 segments (still OK for CCISS), unfortunatelly on lower layer it is recalculated to 32 segments and this violates CCISS restriction and triggers BUG_ON(). The patch tries to fix it by: * initializing attributes above in queue request constructor blk_queue_make_request() * make sure that blk_queue_stack_limits() inherits setting (DM uses its own function to set the limits because it blk_queue_stack_limits() was introduced later. It should probably switch to use generic stack limit function too.) * sets the default seg_boundary value in one place (blkdev.h) * use this mask as default in DM (instead of -1, which differs in 64bit) Bugs related to this: https://bugzilla.redhat.com/show_bug.cgi?id=471639 http://bugzilla.kernel.org/show_bug.cgi?id=8672 Signed-off-by: Milan Broz <mbroz@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-03block: internal dequeue shouldn't start timerTejun Heo
blkdev_dequeue_request() and elv_dequeue_request() are equivalent and both start the timeout timer. Barrier code dequeues the original barrier request but doesn't passes the request itself to lower level driver, only broken down proxy requests; however, as the original barrier code goes through the same dequeue path and timeout timer is started on it. If barrier sequence takes long enough, this timer expires but the low level driver has no idea about this request and oops follows. Timeout timer shouldn't have been started on the original barrier request as it never goes through actual IO. This patch unexports elv_dequeue_request(), which has no external user anyway, and makes it operate on elevator proper w/o adding the timer and make blkdev_dequeue_request() call elv_dequeue_request() and add timer. Internal users which don't pass the request to driver - barrier code and end_that_request_last() - are converted to use elv_dequeue_request(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits) MAINTAINERS: add netdev to ATM ATM: horizon, fix hrz_probe fail path pppol2tp: Add missing sock_put() in pppol2tp_release() net: Fix soft lockups/OOM issues w/ unix garbage collector macvlan: don't broadcast PAUSE frames to macvlan devices Phonet: fix oops in phonet_address_del() on non-Phonet device netfilter: ctnetlink: fix GFP_KERNEL allocation under spinlock sungem: Fix PCS_MIICTRL register write in gem_init_phy(). net: make skb_truesize_bug() call WARN() net: hp-plus uses eip_poll net/wireless/reg.c: fix bad WARN_ON in if statement ath5k: disable beacon filter when station is not associated ath5k: fix Security issue in DebugFS part of ath5k ath9k: correct expected max RX buffer size ath9k: Fix SW-IOMMU bounce buffer starvation mac80211 : Fix setting ad-hoc mode and non-ibss channel iwlagn: fix DMA sync phylib: Add Vitesse VSC8221 SGMII PHY rose: zero length frame filtering in af_rose.c bridge: netfilter: fix update_pmtu crash with GRE ...